C Programming
The notes on these pages are for the courses in C Programming I used to teach in the Experimental
College at the University of Washington in Seattle, WA. Normally these notes accompany fairly
traditional classroom lecture presentations, but they are intended to be reasonably complete (more so, for
that matter, than the lectures!) and should be usable as standalone tutorials.
I originally designed the first, Introductory course around The C Programming Language (2nd Edition)
by Kernighan and Ritchie, and the notes were designed to complement that text, highlighting important
points and explaining subtleties which might be lost on the general reader. Later, I rewrote the notes to
stand on their own (in part because, in spite of the first set of notes, too many of my students found K&R
a bit too technical for an informal, introductory course). Finally, I occasionally teach an Intermediate
course, which covers the topics which tend to be skipped or glossed over in introductory courses (bitwise
operators, structures, file I/O, etc.). The Intermediate course has its own set of notes.
All three sets of notes are available here. If you have a copy of K&R2 and would like a thorough
treatment of the language, read K&R and the ``Notes to Accompany K&R'' side by side. If you're just
getting your feet wet and would like a somewhat simpler introduction, read the ``Introductory Class
Notes.'' If you have had an introduction to C (either here or elsewhere) and are now looking to fill in
some of the missing pieces, read the ``Intermediate Class Notes.''
Of course, just reading a book or these notes won't really teach you C; you will also want to write and run
your own programs, for practice and so that the language concepts will make some kind of practical
sense. Most of my programming assignments (including review questions) are here as well, along with
their solution sets. (No peeking at the answers until you've given the problems your best shot!)
These notes are arranged for the web in the usual hierarchy by section and subsection. If you want to read
through all of them, without keeping track of your own stack to implement a depth-first tree traversal,
just follow the ``read sequentially'' links at the bottom of each page.
Depending on your background, you might want to read one or both of the two preliminary handouts:
one on programming in general, and one which reviews some math which is relevant to programming.
(And there are some other miscellaneous handouts, too.)
One note about the HTML: these pages were produced automatically from the base manuscripts for my
class notes, using a program of my own devising which is, all too typically, not (yet?) perfect. I apologize
in advance for any formatting glitches. In particular, when you see <sup>...</sup> or
<sub>...</sub> in the text, these do not represent bugs in your browser or accidental bugs in my
markup; instead, these are my interim compromise way of representing superscripts and subscripts to
you, since there's no way to do so in portable HTML.
http://www.eskimo.com/~scs/cclass/cclass.html (1 of 2) [22/07/2003 5:07:43 PM]
Finally, I realize that reading these notes on the net is not always as convenient as it might be,
particularly when the net is slow. Please realize, though, that the net is what it is, and that I have gone to
a certain amount of effort to place these notes here at all. Please do not ask me to send you a set of these
notes for browsing on your own machine, as I am currently unable to do so.
Experimental College
The Experimental College is (I think) Washington's oldest and largest alternative educational resource,
typically offering hundreds of classes serving thousands of students each quarter. Experimental College
classes are taught by local members of the community who love what they're doing and want to teach
you to love it, too. Most classes are conducted in the Seattle area. For much more information, visit the
official Experimental College home page.
C Programming Notes
Notes to Accompany The C Programming Language, by Kernighan and Ritchie (``K&R'')
Steve Summit
Preface
Preface to the First Edition
Introduction
Chapter 1: A Tutorial Introduction
Chapter 2: Types, Operators, and Expressions
Chapter 3: Control Flow
Chapter 4: Functions and Program Structure
Chapter 5: Pointers and Arrays
Chapter 6: Structures
Chapter 7: Input and Output
Read Sequentially
Preface
page ix
You'll get some hint here that C has become a bit more formal as it has ``grown up.'' That formality is
appropriate, and for the second edition of K&R to acknowledge it is appropriate, and for any modern
course in C programming to teach it is appropriate. Personally, I learned C before it had become quite so
formalized, and occasionally my traditional biases will leak through. I'll try to admit it when they do.
As the authors note, C is a relatively small language, but one which (to its admirers, anyway) wears well.
C's small, unambitious feature set is a real advantage: there's less to learn; there isn't excess baggage in
the way when you don't need it. It can also be a disadvantage: since it doesn't do everything for you,
there's a lot you have to do yourself. (Actually, this is viewed by many as an additional advantage:
anything the language doesn't do for you, it doesn't dictate to you, either, so you're free to do that
something however you want.)
Introduction
page 2
Deep sentence:
...C deals with the same sort of objects that most computers do, namely characters,
numbers, and addresses.
C is sometimes referred to as a ``high-level assembly language.'' Some people think that's an insult, but
it's actually a deliberate and significant aspect of the language. If you have programmed in assembly
language, you'll probably find C very natural and comfortable (although if you continue to focus too
heavily on machine-level details, you'll probably end up with unnecessarily nonportable programs). If
you haven't programmed in assembly language, you may be frustrated by C's lack of certain higher-level
features. In either case, you should understand why C was designed this way: so that seemingly-simple
constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine
language constructions when compiled. If you write a C program simply and succinctly, it is likely to
result in a succinct, efficient machine language executable. If you find that the executable resulting from
a C program is not efficient, it's probably because of something silly you did, not because of something
the compiler did behind your back which you have no control over. In any case, there's no point in
complaining about C's low-level flavor: C is what it is.
Next we see a more detailed list of the things that are not ``part of C.'' It's good to understand exactly
what we mean by this. When we say that the C language proper does not do things like memory
allocation or I/O, or even string manipulation, we obviously do not mean that there is no way to do these
things in C. In fact, the usual functions for doing these things are specified by the ANSI C Standard with
as much rigor as is the core language itself.
The fact that things like memory allocation and I/O are done through function calls has three
implications:
1. the function calls to do memory allocation, I/O, etc. are no different from any other function calls;
2. the functions which do memory allocation, I/O, etc. do not know any more about the data they're
acting on than ordinary functions do (we'll have more to say about this later); and
3. if you have specialized needs, you can do nonstandard memory allocation or I/O whenever you wish,
by using your own functions and ignoring the standard ones provided.
The sentence that says ``Most C implementations have included a reasonably standard collection of such
functions'' is historical; today, all implementations conforming to the ANSI C Standard have a very
standard collection.
page 3
Deep sentence:
...C retains the basic philosophy that programmers know what they are doing; it only
requires that they state their intentions explicitly.
This aspect of C is very widely criticized; it is also used (justifiably) to argue that C is not a good
teaching language. C aficionados love this aspect of C because it means that C does not try to protect
them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it.
Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy
specifically designed to lead them into booby traps and ``gotcha!''s.
This is another aspect of the language which it's fairly pointless to complain about. If you take care and
pay attention, you can avoid many of the pitfalls. These notes will point out many of the obvious (and not
so obvious) trouble spots.
page 4
The last sentence of the Introduction is misleading: as we'll see, it's risky to defer to any particular
compiler as a ``final authority on the language.'' A compiler is only a final authority on the language it
accepts, and the language that a particular compiler accepts is not necessarily exactly C, no matter what
the name of the compiler suggests. Most compilers accept extensions which are not part of standard C
and which are not supported by other compilers; some compilers are deficient and fail to accept certain
constructs which are in standard C. From time to time, you may have questions about what is truly
standard and which neither you nor anyone you've talked to is able to answer. If you don't have a copy of
the standard (or if you do, but you discover that the standardese in which it's written is impenetrable),
you may have to temporarily accept the jurisdiction of your particular compiler, in order to get some
program working today and under that particular compiler, but you'd do well to mark the code in
question as suspect and the question in your head as ``don't know; still unanswered.''
The second thing to note is that style issues (such as how a program is laid out) are important, but they're
not something to be too dogmatic about, and there are also other, deeper style issues besides mere layout
and typography.
There is some value in having a reasonably standard style (or a few standard styles) for code layout.
Please don't take the authors' advice to ``pick a style that suits you'' as an invitation to invent your own
brand-new style. If (perhaps after you've been programming in C for a while) you have specific
objections to specific facets of existing styles, you're welcome to modify them, but if you don't have any
particular leanings, you're probably best off copying an existing style at first. (If you want to place your
own stamp of originality on the programs that you write, there are better avenues for your creativity than
inventing a bizarre layout; you might instead try to make the logic easier to follow, or the user interface
easier to use, or the code freer of bugs.)
Deep sentence:
...in C, as in many other languages, integer division truncates: any fractional part is
discarded.
The authors say all there is to say here, but remember it: just when you've forgotten this sentence, you'll
wonder why something is coming out zero when you thought it was supposed to be the quotient of two
nonzero numbers.
page 12
Here is more discussion on the difference between integer and floating-point division. Nothing deep; just
something to remember.
page 13
Hidden here are descriptions of some more of printf's ``conversion specifiers.'' %o and %x print
integers, in octal (base 8) and hexadecimal (base 16), respectively. Since a percent sign normally tells
printf to expect an additional argument and insert its value, you might wonder how to get printf to
just print a %. The answer is to double it: %%.
Also, note (as was mentioned on page 11) that you must match up the arguments to printf with the
conversion specification; the compiler can't (or won't) generally check them for you or fix things up if
you get them wrong. If fahr is a float, the code
printf("%d\n", fahr);
will not work. You might ask, ``Can't the compiler see that %d needs an integer and fahr is floating-point
and do the conversion automatically, just like in the assignments and comparisons on page 12?''
And the answer is, no. As far as the compiler knows, you've just passed a character string and some other
arguments to printf; it doesn't know that there's a connection between the arguments and some special
characters inside the string. This is one of the implications of the fact, stated earlier, that functions like
printf are not special. (Actually, some compilers or other program checkers do know that a function
named printf is special, and will do some extra checking for you, but you can't count on it.)
1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it again).
We can't eliminate any of these steps. We have to assign getchar's value to a variable (we can't just use
it directly) because we have to do two different things with it (test, and print). Therefore, compressing the
assignment and test into the same line (as on page 17) is the only good way of avoiding two distinct calls
to getchar (as on page 16). You may not agree that the compressed idiom is better for being more
compact or easier to read, but the fact that there is now only one call to getchar is a real virtue.
In a tiny program like this, the repeated call to getchar isn't much of a problem. But in a real program,
if the thing being read is at all complicated (not just a single character read with getchar), and if the
processing is at all complicated (such that the input call before the loop and the input call at the end of the
loop become widely separated), and if the way that input is done is ever changed some day, it's just too
likely that one of the input calls will get changed but not the other.
(Also, note that when an assignment like c = getchar() appears within a larger expression, the
surrounding expression receives the same value that is assigned. Using an assignment as a subexpression
in this way is perfectly legal and quite common in C.)
When you run the character copying program, and it begins copying its input (your typing) to its output
(your screen), you may find yourself wondering how to stop it. It stops when it receives end-of-file
(EOF), but how do you send EOF? The answer depends on what kind of computer you're using. On Unix
and Unix-related systems, it's almost always control-D. On MS-DOS machines, it's control-Z followed by
the RETURN key. Under Think C on the Macintosh, it's control-D, just like Unix. On other systems, you
may have to do some research to learn how to send EOF.
(Note, too, that the character you type to generate an end-of-file condition from the keyboard has nothing
to do with the EOF value returned by getchar. The EOF value returned by getchar is a code
indicating that the input system has detected an end-of-file condition, whether it's reading the keyboard or
a file or a magnetic tape or a network connection or anything else.)
Another excellent thing to know when doing any kind of programming is how to terminate a runaway
program. If a program is running forever waiting for input, you can usually stop it by sending it an
end-of-file, as above, but if it's running forever not waiting for something (i.e. if it's in an infinite loop) you'll
have to take more drastic measures. Under Unix, control-C will terminate the current program, almost no
matter what. Under MS-DOS, control-C or control-BREAK will sometimes terminate the current
program, but by default MS-DOS only checks for control-C when it's looking for input, so an infinite loop
can be unkillable. There's a DOS command, I think it's
break on
which tells DOS to look for control-C more often, and I recommend using this command if you're doing
any programming. (If a program is in a really tight infinite loop under MS-DOS, there can be no way of
killing it short of rebooting.) On the Mac, try command-period or command-option-ESCAPE.
Finally, don't be disappointed (as I was) the first time you run the character copying program. You'll type
a character, and see it on the screen right away, and assume it's your program working, but it's only your
computer echoing every key you type, as it always does. When you hit RETURN, a full line of characters
is made available to your program, which it reads all at once, and then copies to the screen (again). In
other words, when you run this program, it will probably seem to echo the input a line at a time, rather
than a character at a time. You may wonder how a program can read a character right away, without
waiting for the user to hit RETURN. That's an excellent question, but unfortunately the answer is rather
complicated, and beyond the scope of this introduction. (Among other things, how to read a character
right away is one of the things that's not defined by the C language, and it's not defined by any of the
standard library functions, either. How to do it depends on which operating system you're using.)
pages 26-7
It's probably a good idea if you're aware of this ``old style'' function syntax, so that you won't be taken
aback when you come across it, perhaps in code written by reactionary old fogies (such as the author of
these notes) who still tend to use it out of habit when they're not paying attention.
fact that a '\0' character is always present to mark the end of a string. (It's easy to forget to count the
'\0' character when allocating space for a string, for instance.) Notice the nice picture on page 30; this
is a good way of thinking about data structures (and not just simple character arrays, either).
page 29
Note that the program explicitly allocates space for the two strings it manipulates: the current line line,
and the longest line longest. (It only needs these two strings at any one time, even though the input
consists of arbitrarily many lines.) Note that it cannot simply assign one string to another (because C
provides no built-in support for composite objects such as character strings); the program calls the copy
function to do so. (The authors write their own copy function for explanatory purposes; the standard
library contains a string-copying function which would normally be used.) The only strings that aren't
explicitly allocated are the arrays in the getline and copy functions; as the discussion briefly
mentions, these do not need to be allocated because they're already allocated in the caller. (There are a
number of subtleties about array parameters to functions; we'll have more to say about them later.)
The code on page 29 contains a number of examples of compressed assignments and tests; evidently the
authors expect you to get used to this style in a hurry. The line
while ((len = getline(line, MAXLINE)) > 0)
is similar to the getchar loops earlier in this chapter; it calls getline, saves its return value in the
variable len, and tests it against 0.
The comparison
i<lim-1 && (c=getchar())!=EOF && c!='\n'
in the for loop in the getline function does several things: it makes sure there is room for another
character in the array; it calls, assigns, and tests getchar's return value against EOF, as before; and it
also tests the returned character against '\n', to detect end of line. The surrounding code is mildly
clumsy in that it has to check for \n a second time; later, when we learn more about loops, we may find
a way of writing it more cleanly. You may also notice that the code deals correctly with the possibility
that EOF is seen without a \n.
The line
while ((to[i] = from[i]) != '\0')
in the copy function does two things at once: it copies characters from the from array to the to array,
and at the same time it compares the copied character against '\0', so that it stops at the end of the
string. (If you think this is cryptic, wait 'til we get to page 106 in chapter 5!)
We've also just learned another printf conversion specifier: %s prints a string.
page 30
Deep sentence:
There is no way for a user of getline to know in advance how long an input line might
be, so getline checks for overflow.
Because dynamically allocating memory for arbitrary-length strings is mildly tedious in C, it's tempting
to use fixed-size arrays. (It's so tempting, in fact, that that's what most programs do, and since fixed-size
arrays are also considerably easier to discuss, all of our early example programs will use them.) Using
fixed-size arrays is fine, as long as some assurance is made that they don't overflow. Unfortunately, it's
also tempting (and easy) to forget to guard against array overflow, perhaps by deluding yourself into
thinking that too-long inputs ``can't happen.'' Murphy's law says that they do happen, and the various
corollaries to Murphy's law say that they happen in the most unpleasant way and at the least convenient
time. Don't be cavalier about arrays; do make sure that they're big enough and that you guard against
overflowing them. (In another mark of C's general insensitivity to beginning programmers, most
compilers do not check for array overflow; if you write more data to an array than it is declared to hold,
you quietly scribble on other parts of memory, usually with disastrous results.)
Deep sentence:
You should note that we are using the words definition and declaration carefully when we
refer to external variables in this section. ``Definition'' refers to the place where the
variable is created or assigned storage; ``declaration'' refers to places where the nature of
the variable is stated but no storage is allocated.
Do note the careful distinction; it's an important one and one which I'll be using, too.
page 34
The authors' criticism of the second (page 32) version of the longest-line program is accurate. The
revision of the longest-line program to use external variables was done only to demonstrate the use of
external variables, not to improve the program in any way (nor does it improve the program in any way).
As a general rule, external variables are acceptable for storing certain kinds of global state information
which never changes, which is needed in many functions, and which would be a nuisance to pass around.
I don't think of external variables as ``communicating between functions'' but rather as ``setting common
state for the entire program.'' When you start thinking of an external variable as being one of the ways
you communicate with a particular function, and in particular when you find yourself changing the value
of some external variable just before calling some function, to affect its operation in some way, you start
getting into the troublesome uses of external variables, which you should avoid.
/* WRONG */
about dividing by zero. (If, on the other hand, the compiler always evaluated both sides of the && before
checking to see whether they were both true, the code above could divide by zero.)
page 42
Note the extra parentheses in
(c = getchar()) != '\n'
Since this is a common idiom, you'll need to remember the parentheses. What would
c = getchar() != '\n'
do?
C's treatment of Boolean values (that is, those where we only care whether they're true or false) is
straightforward. We'll have more to say about it later, but for now, note that a value of zero is ``false,''
and any nonzero value is ``true.'' You might also note that there is no necessary connection between
statements like if() which expect a true/false value and operators like >= and && which generate
true/false values. You can use operators like >= and && in any expression, and you can use any
expression in an if() statement.
The authors make a good point about style: if valid is conceptually a Boolean variable (that is, it's an
integer, but we only care about whether it's zero or nonzero, in other words, ``false'' or ``true''), then
if(valid)
is a perfectly reasonable and readable condition. However, when values are not conceptually Boolean, I
encourage you to make explicit comparisons against 0. For example, we could have expressed our
average-taking code as
if(n && sum / n > 1)
but I think it's clearer to be explicit and say
if(n != 0 && sum / n > 1)
(However, many C programmers feel that expressions like
if(n && sum / n > 1)
are ``more concise,'' so you will see them all the time and you should be able to read them.)
page 43
To illustrate the ``schizophrenic'' nature of characters (are they characters, or are they small integer
values?), it's useful to look at an implementation of the standard library function atoi. (If you're getting
overwhelmed, though, you may skip this example for now, and come back to it later.) The atoi routine
converts a string like "123" into an integer having the corresponding value.
As you study the atoi code at the top of page 43, figure out why it does not seem to explicitly check for
the terminating '\0' character.
The expression
s[i] - '0'
is an example of the ``crossing over'' between thinking about a character and its value. Since the value of
the character '0' is not zero (and, similarly, the other numeric characters don't have their ``obvious''
values, either), we have to do a little conversion to get the value 0 from the character '0', the value 1 from
the character '1', etc. Since the character set values for the digit characters '0' to '9' are contiguous
(48-57, if you must know), the conversion involves simply subtracting an offset, and the offset (if you think
about it) is simply the value of the character '0'. We could write
s[i] - 48
if we really wanted to, but that would require knowing what the value actually is. We shouldn't have to
know (and it might be different in some other character set), so we can let the compiler do the dirty work
by using '0' as the offset (since subtracting '0' is, by definition, the same as subtracting the value of the
character '0').
The functions from <ctype.h> are being introduced here without a lot of fanfare. Here is the main loop
of the atoi routine, rewritten to use isdigit:
for (i = 0; isdigit(s[i]); ++i)
n = 10 * n + (s[i] - '0');
Don't worry too much about the discussion of signed vs. unsigned characters for now. (Don't forget about it
completely, though; eventually, you'll find yourself working with a program where the issue is significant.)
For now, just remember:
1. Use int as the type of any variable which receives the return value from getchar, as discussed in
section 1.5.1 on page 16.
2. If you're ever dealing with arbitrary ``bytes'' of binary data, you'll usually want to use unsigned
char.
page 44
As we saw in section 2.6 on page 42, relational and logical operators always ``return'' 1 for ``true'' and 0 for
``false.'' However, when C wants to know whether something is true or false, it just looks at whether it's
nonzero or zero, so any nonzero value is considered ``true.'' Finally, some functions which return true/false
values (the text mentions isdigit) may return ``true'' values of other than 1.
You don't have to worry about these distinctions too much, and you also don't have to worry about the
fragment
d = c >= '0' && c <= '9'
as long as you write conditionals in a sensible way. If you wanted to see whether two variables a and b
were equal, you'd never write
if((a == b) == 1)
(although it would work: the == operator ``returns'' 1 if they're equal). Similarly, you don't want to write
if(isdigit(c) == 1)
because it's equally silly-looking, and in this case it might not work. Just write things like
if(a == b)
and
if(isdigit(c))
and you'll steer clear of most problems. (Make sure, though, that you never try something like if('0'
<= c <= '9'), since this wouldn't do at all what it looks like it's supposed to.)
The set of implicit conversions on page 44, though informally stated, is exactly the set to remember for
now. They're easy to remember if you notice that, as the authors say, ``the `lower' type is promoted to the
`higher' type,'' where the ``order'' of the types is
char < short int < int < long int < float < double < long double
(We won't be using long double, so you don't need to worry about it.) We'll have more to say about
these rules on the next page.
Don't worry too much for now about the additional rules for unsigned values, because we won't be using
them at first.
Do notice that implicit (automatic) conversions do happen across assignments. It's perfectly acceptable to
assign a char to an int or vice versa, or assign an int to a float or vice versa (or any other
combination). Obviously, when you assign a value from a larger type to a smaller one, there's a chance that
it might not fit. Therefore, compilers will often warn you about such assignments.
page 45
Casts can be a bit confusing at first. A cast is the syntax used to request an explicit type conversion;
coercion is just a more formal word for ``conversion.'' A cast consists of a type name in parentheses and is
used as a unary operator. You may have used languages which had conversion operators which looked
more like function calls:
integer i = 2;
floating f = floating(i); /* not C */
integer i2 = integer(f); /* not C */
i2 = c1 + i1;
the fourth rule on page 44 tells us to convert the char to an int, as if we'd written
i2 = (int)c1 + i1;
If we multiply a long int and a double:
d2 = L1 * d1;
the second rule tells us to convert the long int to a double, as if we'd written
d2 = (double)L1 * d1;
An assignment of a char to an int
i1 = c1;
is as if we'd written
i1 = (int)c1;
and an assignment of a float to an int
i1 = f1;
is as if we'd written
i1 = (int)f1;
Some programmers worry that implicit conversions are somehow unreliable and prefer to insert lots of
explicit conversions. I recommend that you get comfortable with implicit conversions--they're quite
useful--and don't clutter your code with extra casts.
There are a few places where you do need casts, however. Consider the code
i1 = 200;
i2 = 400;
L1 = i1 * i2;
The product 200 x 400 is 80000, which is not guaranteed to fit into an int. (Remember that an int is only
guaranteed to hold values up to 32767.) Since 80000 will fit into a long int, you might think that you're
okay, but you're not: the two sides of the multiplication are of the same type, so the compiler doesn't see the
need to perform any automatic conversions (none of the rules on page 44 apply). The multiplication is
carried out as an int, which overflows with unpredictable results, and only after the damage has been
done is the unpredictable value converted to a long int for assignment to L1. To get a multiplication
like this to work, you have to explicitly convert at least one of the int's to long int:
L1 = (long int)i1 * i2;
Now, the two sides of the * are of different types, so they're both converted to long int (by the fifth rule
on page 44), and the multiplication is carried out as a long int. If it makes you feel safer, you can use
two casts:
L1 = (long int)i1 * (long int)i2;
but only one is strictly required.
A similar problem arises when two integers are being divided. The code
i1 = 1;
f1 = i1 / 2;
does not set f1 to 0.5, it sets it to 0. Again, the two operands of the / operator are already of the same type
(the rules on page 44 still don't apply), so an integer division is performed, which discards any fractional
part. (We saw a similar problem in section 1.2 on page 12.) Again, an explicit conversion saves the day:
f1 = (float)i1 / 2;
Alternately, in a case like this, you can use a floating-point constant:
f1 = i1 / 2.0;
In either case, as soon as one of the operands is floating point, the division is carried out in floating point,
and you get the result you expect.
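The three variations can be put side by side in a small sketch. The function names here are invented for illustration; each one returns "half of i1" computed in one of the ways just described:

```c
/* Both operands are int, so the fraction is discarded by the division
   itself, before the result is converted to float for the return. */
float half_truncated(int i1)
{
    return i1 / 2;
}

/* Casting one operand forces the division into floating point. */
float half_with_cast(int i1)
{
    return (float)i1 / 2;
}

/* A floating-point constant has the same effect (2.0 is a double). */
float half_with_constant(int i1)
{
    return i1 / 2.0;
}
```

With i1 equal to 1, the first function returns 0 and the other two return 0.5.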
Implicit conversions always happen during arithmetic and assignment to variables. The situation is a bit
more complicated when functions are being called, however.
The authors use the example of the sqrt function, which is as good an example as any. sqrt accepts an
argument of type double and returns a value of type double. If the compiler didn't know that sqrt
took a double, and if you called
sqrt(4);
or
int n = 4;
sqrt(n);
the compiler would pass an int to sqrt. Since sqrt expects a double, it will not work correctly if it
receives an int. Therefore, it was once always necessary to use explicit conversions in cases like this, by
calling
sqrt((double)4)
or
sqrt((double)n)
or
sqrt(4.0)
However, it is now possible, with a function prototype, to tell the compiler what types of arguments a
function expects. The prototype for sqrt is
double sqrt(double);
and as long as a prototype is in effect (``in scope,'' as the cognoscenti would say), you can call sqrt
without worrying about conversions. When a prototype is in effect, the compiler performs implicit
conversions during function calls (specifically, while passing the arguments) exactly as it does during
simple assignments.
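Here is a small sketch of a prototype at work. To keep it free of any library dependencies it uses a made-up function, halve, in place of sqrt; halve_int is likewise hypothetical. The point is only that an int argument is converted to double during the call because the prototype is in scope:

```c
/* Prototype: tells the compiler that halve takes a double. */
double halve(double x);

double halve_int(int n)
{
    /* n is an int, but with the prototype in scope it is converted
       to double while being passed, just as in "double x = n;". */
    return halve(n);
}

double halve(double x)
{
    return x / 2;
}
```

Without the prototype (and under the older rules the text describes), the call halve(n) would pass an int to a function expecting a double.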
Obviously, using prototypes makes for much safer programming, and it is recommended that you use them
whenever possible. For the standard library functions (the ones already written for you), you get prototypes
automatically when you include the header files which describe sets of library functions. For example, you
get prototypes for all of C's built-in math functions by putting the line
#include <math.h>
at the top of your program. For functions that you write, you can supply your own prototypes, which we'll
be learning more about later.
However, there are a few situations (we'll talk about them later) where prototypes do not apply, so it's
important to remember that function calls are a bit different and that explicit conversions (i.e. casts) may
occasionally be required. Don't imagine that prototypes are a panacea.
page 46
Don't worry about the rand example.
  052525
& 000177
  ------
     125
In the second example, if SET_ON is 012 and x is 0, then x | SET_ON looks like
  000000000
| 000001010
  ---------
       1010
or, in octal,
  000000
| 000012
  ------
      12
If x is instead 0402, the octal picture is
  000402
| 000012
  ------
     412
Note that with &, anywhere we have a 0 we turn bits off, and anywhere we have a 1 we copy bits through
from the other side. With |, anywhere we have a 1 we turn bits on, and anywhere we have a 0 we leave
bits alone.
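The same arithmetic can be checked in C. This is only a sketch, and the function names keep_bits and set_bits are made up for illustration; the octal values are the ones used in the diagrams:

```c
/* Keep only the bits of n that line up with 1's in mask;
   every bit that lines up with a 0 is turned off. */
unsigned keep_bits(unsigned n, unsigned mask)
{
    return n & mask;
}

/* Turn on every bit of x that lines up with a 1 in mask;
   bits that line up with a 0 are left alone. */
unsigned set_bits(unsigned x, unsigned mask)
{
    return x | mask;
}
```

For example, keep_bits(052525, 0177) is 0125, set_bits(0, 012) is 012, and set_bits(0402, 012) is 0412.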
You'll frequently see the word mask used, both as a noun and a verb. You can imagine that we've cut a
mask or stencil out of cardboard, and are using spray paint to spray through the mask onto some other
piece of paper. For |, the holes in the mask are like 1's, and the spray paint is like 1's, and we paint more
1's onto the underlying paper. (If there was already paint under a hole, nothing really changes if we get
++ just says that the increment happens later, not that it happens immediately, so this code could print 49
(if it chose to perform the multiplication first, and both increments later). And, it turns out that
ambiguous expressions like this are such a bad idea that the ANSI C Standard does not require compilers
to do anything reasonable with them at all, such that the above code might end up printing 42, or
8923409342, or 0, or crashing your computer.
Finally, note that parentheses don't dictate overall evaluation order any more than precedence does.
Parentheses override precedence and say which operands go with which operators, and they therefore
affect the overall meaning of an expression, but they don't say anything about the order of subexpressions
or side effects. We could not ``fix'' the evaluation order of any of the expressions we've been discussing
by adding parentheses. If we wrote
f() + (g() * h())
we still wouldn't know whether f(), g(), or h() would be called first. (The parentheses would force
the multiplication to happen before the addition, but precedence already would have forced that,
anyway.) If we wrote
(i++) * (i++)
the parentheses wouldn't force the increments to happen before the multiplication or in any well-defined
order; this parenthesized version would be just as undefined as i++ * i++ was.
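If the order of the calls really matters, the dependable way to pin it down is to give each call its own statement. The sketch below assumes three stand-in functions f, g, and h that record when they run in a hypothetical call_log array; none of these definitions come from the book:

```c
char call_log[4];    /* records the order in which f, g, h ran */
int  ncalls = 0;

int f(void) { call_log[ncalls++] = 'f'; return 2; }
int g(void) { call_log[ncalls++] = 'g'; return 3; }
int h(void) { call_log[ncalls++] = 'h'; return 4; }

/* Each call is a separate statement, so the calls happen in the
   order written: f, then g, then h. */
int sum_in_known_order(void)
{
    int a, b, c;
    ncalls = 0;
    a = f();
    b = g();
    c = h();
    return a + b * c;
}
```

Writing f() + g() * h() directly would compute the same value here, but the contents of call_log would then depend on the compiler.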
page 53
Deep sentence:
Function calls, nested assignment statements, and increment and decrement operators
cause ``side effects''--some variable is changed as a by-product of the evaluation of an
expression.
(There's a slight inaccuracy in this sentence: any assignment expression counts as a side effect.)
It's these ``side effects'' that you want to keep in mind when you're making sure that your programs are
well-defined and don't suffer any of the undefined behavior we've been discussing. (When we informally
said that complex expressions had several things going on ``at once,'' we were actually referring to
expressions with multiple side effects.) As a general rule, you should make sure that each expression
only has one side effect, or if it has several, that different variables are changed by the several side
effects.
page 54
Deep sentence:
The moral is that writing code that depends on order of evaluation is a bad programming
practice in any language. Naturally, it is necessary to know what things to avoid, but if you
don't know how they are done on various machines, you won't be tempted to take
advantage of a particular implementation.
The first edition of K&R said
...if you don't know how they are done on various machines, that innocence may help to
protect you.
I actually prefer the first edition wording. Many textbooks encourage you to write small programs to find
out how your compiler implements some of these ambiguous expressions, but it's just one step from
writing a small program to find out, to writing a real program which makes use of what you've just
learned. And you don't want to write programs that work only under one particular compiler, that take
advantage of the way that compiler (but perhaps no other) happens to implement the undefined
expressions. It's fine to be curious about what goes on ``under the hood,'' and many of you will be curious
enough about what's going on with these ``forbidden'' expressions that you'll want to investigate them,
but please keep very firmly in mind that, for real programs, the very easiest way of dealing with
ambiguous, undefined expressions (which one compiler interprets one way and another interprets another
way and a third crashes on) is not to write them in the first place.
momentarily misled.)
page 61
When the authors say that ``the index and limit of a C for loop can be altered from within the loop,''
they mean that a loop like
int i, n = 10;
for(i = 0; i < n; i++) {
    if(i == 5)
        i++;
    printf("%d\n", i);
    if(i == 8)
        n++;
}
where i and n are modified within the loop, is legal. (Obviously, such a loop can be very confusing, so
you'll probably be better off not making use of this freedom too much.)
When they say that ``the index variable... retains its value when the loop terminates for any reason,'' you
may not find this too surprising, unless you've used other languages where it's not the case. The fact that
loop control variables retain their values after a loop can make some code much easier to write; for
example, the atoi function at the bottom of this page depends on having its i counter manipulated by
several loops as it steps over three different parts of the string (whitespace, sign, digits) with i's value
preserved between each step.
Deep sentence:
Each step does its part, and leaves things in a clean state for the next.
This is an extremely important observation on how to write clean code. As you study the atoi code,
notice that it falls into three parts, each implementing one step of the pseudocode description: skip white
space, get sign, get integer part and convert it. At each step, i points at the next character which that
step is to inspect. (If a step is skipped, because there is no leading whitespace or no sign, the later steps
don't care.)
You may hear the term invariant used: this refers to some condition which exists at all stages of a
program or function. In this case, the invariant is that i always points to the next character to be
inspected. Having some well-chosen invariants can make code much easier to write and maintain. If there
aren't enough invariants--if i is sometimes the next character to look at and sometimes the character that
was just looked at--debugging and maintaining the code can be a nightmare.
The line
sign = (s[i] == '-') ? -1 : 1;
may seem a bit cryptic at first, though it's a textbook example of the use of ?: . The line is equivalent to
sign = 1;
if(s[i] == '-')
    sign = -1;
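To see the three steps and the invariant together, here is a sketch written for these notes rather than copied from the book; the name my_atoi is made up to avoid clashing with the library's atoi. At the start of each step, i indexes the next character that step should inspect:

```c
#include <ctype.h>

/* Sketch of a three-step string-to-int conversion. */
int my_atoi(const char s[])
{
    int i = 0, n = 0, sign;

    while (isspace((unsigned char)s[i]))    /* step 1: skip white space */
        i++;

    sign = (s[i] == '-') ? -1 : 1;          /* step 2: get sign */
    if (s[i] == '+' || s[i] == '-')
        i++;

    while (isdigit((unsigned char)s[i])) {  /* step 3: convert digits */
        n = 10 * n + (s[i] - '0');
        i++;
    }
    return sign * n;
}
```

Notice that no step needs to know whether the earlier steps actually consumed anything; each simply picks up at s[i].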
pages 61-62
It's instructive to study this Shell or ``gap'' sort, but don't worry if you find it a bit bewildering.
Deep sentence:
Notice how the generality of for makes the outer loop fit the same form as the others,
even though it is not an arithmetic progression.
The point is that loops don't have to count 0, 1, 2... or 1, 2, 3... . (This one counts n/2, n/4, n/8... .
Later we'll see loops which don't step over numbers at all.)
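For reference, a Shell sort along these lines can be sketched as follows. This is an illustrative version and may differ in detail from the one printed in the book; the thing to look at is the outer loop, whose control variable gap is halved each time rather than counted up by one:

```c
/* Sort v[0]..v[n-1] into increasing order with a Shell (gap) sort. */
void shellsort(int v[], int n)
{
    int gap, i, j, temp;

    for (gap = n / 2; gap > 0; gap /= 2)        /* n/2, n/4, ... 1 */
        for (i = gap; i < n; i++)               /* ordinary counting loop */
            for (j = i - gap; j >= 0 && v[j] > v[j + gap]; j -= gap) {
                temp = v[j];                    /* swap out-of-order pair */
                v[j] = v[j + gap];
                v[j + gap] = temp;
            }
}
```

All three loops have the same for(init; test; step) shape, even though only the middle one is a plain count.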
page 63
Deep sentence:
The commas that separate function arguments, variables in declarations, etc. are not
comma operators...
/* WRONG */
/* WRONG */
or
(The only disadvantage here is that it's trickier to return i and j if we need them both.)
and return from a function call, as opposed to simply doing the function's statements in-line. It's a risky
thing to worry about, though, because as soon as you start worrying about it, you have a bit of a
disincentive to use functions. If you're reluctant to use functions, your programs will probably be bigger
and more complicated and harder to maintain (and perhaps, for various reasons, actually less efficient).
The authors choose not to get involved with the system-specific aspects of separate compilation, but we'll
take a stab at it here. We'll cover two possibilities, depending on whether you're using a traditional
command-line compiler or a newer integrated development environment (IDE) or other graphical user
interface (GUI) compiler.
When using a command-line compiler, there are usually two main steps involved in building an
executable program from one or more source files. First, each source file is compiled, resulting in an
object file containing the machine instructions (generated by the compiler) corresponding to the code in
that source file. Second, the various object files are linked together, with each other and with libraries
containing code for functions which you did not write (such as printf), to produce a final, executable
program.
Under Unix, the cc command can perform one or both steps. So far, we've been using extremely simple
invocations of cc such as
cc hello.c
(section 1.1, page 6). This invocation compiles a single source file, links it, and places the executable
(somewhat inconveniently) in a file named a.out.
Suppose we have a program which we're trying to build from three separate source files, x.c, y.c, and
z.c. We could compile all three of them, and link them together, all at once, with the command
cc x.c y.c z.c
(see also page 70). Alternatively, we could compile them separately: the -c option to cc tells it to
compile only, but not to link. Instead of building an executable, it merely creates an object file, with a
name ending in .o, for each source file compiled. So the three commands
cc -c x.c
cc -c y.c
cc -c z.c
would compile x.c, y.c, and z.c and create object files x.o, y.o, and z.o. Then, the three object
files could be linked together using
cc x.o y.o z.o
When the cc command is given an .o file, it knows that it does not have to compile it (it's an object file,
already compiled); it just sends it through to the link process.
Here we begin to see one of the advantages of separate compilation: if we later make a change to y.c,
only it will need recompiling. (At some point you may want to learn about a program called make,
which keeps track of which parts need recompiling and issues the appropriate commands for you.)
Above we mentioned that the second, linking step also involves pulling in library functions. Normally,
the functions from the Standard C library are linked in automatically. Occasionally, you must request a
library manually; one common situation under Unix is that certain math routines are in a separate math
library, which is requested by using -lm on the command line. Since the libraries must typically be
searched after your program's own object files are linked (so that the linker knows which library
functions your program uses), any -l option must appear after the names of your files on the command
line. For example, to link the object file mymath.o (previously compiled with cc -c mymath.c)
together with the math library, you might use
cc mymath.o -lm
Two final notes on the Unix cc command: if you're tired of using the nonsense name a.out for all of
your programs, you can use -o to give another name to the output (executable) file:
cc -o hello hello.c
would create an executable file named hello, not a.out. Finally, everything we've said about cc also
applies to most other Unix C compilers. Many of you will be using acc (a semistandard name for a
version of cc which does accept ANSI Standard C) or gcc (the FSF's GNU C Compiler, which also
accepts ANSI C and is free).
There are command-line compilers for MS-DOS systems which work similarly. For example, the
Microsoft C compiler comes with a CL (``compile and link'') command, which works almost the same as
Unix cc. You can compile and link in one step:
cl hello.c
or you can compile only:
cl /c hello.c
creating an object file named hello.obj which you can link later.
The preceding has all been about command-line compilers. If you're using some kind of integrated
development environment, such as Turbo C or the Microsoft Programmer's Workbench or Think C, most
of the mechanical details are taken care of for you. (There's also less I can say here about these
environments, because they're all different.) Typically there's a way to specify the list of files (modules)
which make up your project, and a single ``build'' button which does whatever's required to build (and
perhaps even execute) your program.
section 4.1: Basics of Functions
section 4.2: Functions Returning Non-Integers
section 4.3: External Variables
section 4.4: Scope Rules
section 4.5: Header Files
section 4.6: Static Variables
section 4.7: Register Variables
section 4.8: Block Structure
section 4.9: Initialization
section 4.10: Recursion
section 4.11: The C Preprocessor
section 4.11.1: File Inclusion
section 4.11.2: Macro Substitution
section 4.11.3: Conditional Inclusion
matching program fill this role very well: getline's job is to read one line, and strindex's job is to
find one string in another string. printf's specification is considerably broader: its job is to print stuff.
(It's not surprising that printf can therefore be the harder routine to call, and is certainly much harder to
implement. Its saving virtue is that it is nonetheless broadly applicable and infinitely reusable.)
When you find that a program is hard to manage, it's often because it has not been designed and broken up
into functions cleanly. Two obvious reasons for moving code down into a function are:
1. It appeared in the main program several times, such that by making it a function, it can be written just
once, and the several places where it used to appear can be replaced with calls to the new function.
2. The main program was getting too big, so it could be made (presumably) smaller and more manageable
by lopping part of it off and making it a function.
These two reasons are important, and they represent significant benefits of well-chosen functions, but they
are not sufficient to automatically identify a good function. A good function has at least these two
additional attributes:
3. It does just one well-defined task, and does it well.
4. Its interface to the rest of the program is clean and narrow.
Attribute 3 is just a restatement of something we said above. Attribute 4 says that you shouldn't have to
keep track of too many things when calling a function. If you know what a function is supposed to do, and
if its task is simple and well-defined, there should be just a few pieces of information you have to give it to
act upon, and one or just a few pieces of information which it returns to you when it's done. If you find
yourself having to pass lots and lots of information to a function, or remember details of its internal
implementation to make sure that it will work properly this time, it's often a sign that the function is not
sufficiently well-defined. (It may be an arbitrary chunk of code that was ripped out of a main program that
was getting too big, such that it essentially has to have access to all of that main function's variables.)
The whole point of breaking a program up into functions is so that you don't have to think about the entire
program at once; ideally, you can think about just one function at a time. A good function is a ``black box'':
when you call it, you only have to know what it does (not how it does it); and when you're writing it, you
only have to know what it's supposed to do (and you don't have to know why or under what circumstances
its caller will be calling it). Some functions may be hard to write (if they have a hard job to do, or if it's
hard to make them do it truly well), but that difficulty should be compartmentalized along with the function
itself. Once you've written a ``hard'' function, you should be able to sit back and relax and watch it do that
hard work on call from the rest of your program. If you find that difficulties pervade a program, that the
hard parts can't be buried inside black-box functions and then forgotten about, if you find that there are hard
parts which involve complicated interactions among multiple functions, then the program probably needs
redesigning.
For the purposes of explanation, we've been seeming to talk so far only about ``main programs'' and the
functions they call and the rationale behind moving some piece of code down out of a ``main program'' into
a function. But in reality, there's obviously no need to restrict ourselves to a two-tier scheme. The ``main
program,'' main(), is itself just a function, and any function we find ourselves writing will often be
appropriately written in terms of sub-functions, sub-sub-functions, etc.
That's probably enough for now about functions in general. Here are a few more notes about the line-matching program.
The authors mention that ``The standard library provides a function strstr that is similar to strindex,
except that it returns a pointer instead of an index.'' We haven't met pointers yet (they're in chapter 5), so
we aren't quite in a position to appreciate the difference between an index and a pointer. Generally, an
index is a small number referring to some element of an array. A pointer is more general: it can point to any
data object of a particular type, whether it's one element of an array, or some other object anywhere in
memory. (Don't worry too much about the distinction yet, but bear in mind that there is a distinction. Note,
too, that the distinction is not absolute; in fact, the word ``index'' seems to derive from the concept of
pointing, as you can see if you think about what you use your index finger for, or if you notice that the
entries in a book's index point at the referenced parts of the book. We frequently speak casually of an index
variable ``pointing at'' some cell of an array, even though it's not a true pointer variable.)
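Looking ahead slightly, here is a sketch of how the two ideas relate. The function index_of is made up for illustration: it calls the standard strstr, which returns a pointer to the match (or a null pointer if there is none), and turns that pointer back into an index by subtracting the start of the string:

```c
#include <stddef.h>
#include <string.h>

/* Return the index in s where t first occurs, or -1 if it does not. */
int index_of(const char s[], const char t[])
{
    const char *p = strstr(s, t);           /* pointer to match, or NULL */
    return (p == NULL) ? -1 : (int)(p - s); /* pointer minus start = index */
}
```

For instance, index_of("hello, world", "world") is 7. The pointer syntax will make more sense after chapter 5.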
One facet of the getline function's interface might bear mentioning: its first argument, the character
array s, is being used to return the line that it reads. This may seem to contradict the rule that a function
can never modify the value of a variable in its caller. As was briefly mentioned on page 28, there's an
exception for arrays, which we'll be learning about in chapter 5; for now, we'll gloss over the point.
(Actually, we're glossing over two points: not only is getline able to return a value via an argument, but
the argument isn't really an array, although it's declared as and looks like one. Please forgive these gentle
fictions; explaining them completely would really be premature at this point. Perhaps they weren't worth
mentioning yet, after all.)
For comparison, here is yet another version of getline:
int getline(char s[], int lim)
{
    int c, i = 0;
    while(--lim > 0 && (c=getchar()) != EOF) {
        s[i++] = c;
        if(c == '\n')
            break;
    }
    s[i] = '\0';
    return i;
}
Note that by using break, we avoid having to test for '\n' in two different places.
If you're having trouble seeing how the strindex function works, its algorithm is
for (each position i in s)
    if (t occurs at position i in s)
        return i;
(else) return -1;
Filling in the details of ``if (t occurs at position i in s)'', we have:
for (each position i in s)
    for (each character in t)
        if (it matches the corresponding character in s)
            if (it's '\0')
                return i;
            else
                keep going
        else
            no match at position i
(else) return -1;
A slightly less compressed implementation than the one on page 69 would be:
int strindex(char s[], char t[])
{
    int i, j, k;
    for (i = 0; s[i] != '\0'; i++) {
        for(j = i, k = 0; t[k] != '\0'; j++, k++)
            if(s[j] != t[k])
                break;
        if(t[k] == '\0')
            return i;
    }
    return -1;
}
Note that we have to check for the end of the string t twice: once to see if we're at the end of it in the
innermost loop, and again to see why we terminated the innermost loop. (If we terminated the innermost
loop because we reached the end of t, we found a match; otherwise, we didn't.) We could rearrange things
to remove the duplicated test:
int strindex(char s[], char t[])
{
    int i, j, k;
    for (i = 0; s[i] != '\0'; i++) {
        j = i;
        k = 0;
        do {
            if(t[k] == '\0')
                return i;
        } while(s[j++] == t[k++]);
    }
    return -1;
}
It's a matter of style which implementation of strindex is preferable; it's impossible to say which is
``best.'' (Can you see a slight difference in the behavior of the version on page 69 versus the two here?
Under what circumstance(s) would this difference be significant? How would the version on page 69
behave under those circumstances, and how would the two routines here behave?)
page 70
Deep sentence:
A program is just a set of definitions of variables and functions.
This sentence may or may not seem deep, and it may or may not be deep, but it's a fundamental definition
of what a C program is.
Note that a function's return value is automatically converted to the return type of the function, if necessary,
just as in assignments like
f = i;
where f is float and i is int.
Most programmers do use parentheses around the expression in a return statement, because that way it
looks more like while(), for(), etc. The reason the parentheses are optional is that the formal syntax is
return expression ;
and, as we know, any expression surrounded by parentheses is another expression.
It's debatable whether it's ``not illegal'' for a function to have return statements with and without values.
It's a ``sign of trouble'' at best, and undefined at worst. Another clear sign of trouble (which is equally
undefined) is when a function returns no value, or is declared as void, but a caller attempts to use the
return value.
The main program on page 69 returns the number of matching lines found. This is probably better than
returning nothing, but the convention is usually that a C program returns 0 when it succeeds and a positive
number when it fails.
``The standard library includes an atof'' means that we're reimplementing something which would
otherwise be provided for us anyway (i.e. just like printf). In general, it's a bad idea to rewrite
standard library routines, because by doing so you negate the advantage of having someone else write
them for you, and also because the compiler or linker is allowed to complain if you redefine a standard
routine. (On the other hand, seeing how the standard library routines are implemented can be a good
learning experience.)
page 72
In the ``primitive calculator'' code at the top of page 72, note that the call to atof is buried in the
argument list of the call to printf.
Deep sentences:
The function atof must be declared and defined consistently. If atof itself and the call
to it in main have inconsistent types in the same source file, the error will be detected by
the compiler. But if (as is more likely) atof were compiled separately, the mismatch
would not be detected, atof would return a double that main would treat as an int,
and meaningless answers would result.
The problems of mismatched function declarations are somewhat reduced today by the widespread use of
ANSI function prototypes, but they're still important to be aware of.
The implicit function declarations mentioned at the bottom of page 72 are an older feature of the
language. They were handy back in the days when most functions returned int and function prototypes
hadn't been invented yet, but today, if you want to use prototypes, you won't want to rely on implicit
declarations. If you don't like depending on defaults and implicit declarations, or if you do want to use
function prototypes religiously, you're under no compulsion to make use of (or even learn about)
implicit function declarations, and you'll want to configure your compiler so that it will warn you if you
call a function which does not have an explicit, prototyped declaration in scope.
You may wonder why the compiler is able to get some things right (such as implicit conversions between
integers and floating-point within expressions) whether or not you're explicit about your intentions, while
in other circumstances (such as while calling functions returning non-integers) you must be explicit. The
question of when to be explicit and when to rely on the compiler hinges on several questions:
1. How much information does the compiler have available to it?
2. How likely is it that the compiler will infer the right action?
3. How likely is it that a mistake which you the programmer might make will be caught by the
compiler, or silently compiled into incorrect code?
It's fine to depend on things like implicit conversions as long as the compiler has all the information it
needs to get them right, unambiguously. (Relying on implicit conversions can make code cleaner, clearer,
and easier to maintain.) Relying on implicit declarations, however, is discouraged, for several reasons.
First, there are generally fewer declarations than expressions in a program, so the impact (i.e. work) of
making them all explicit is less. Second, thinking about declarations is good discipline, and requiring that
everything normally be declared explicitly can let the compiler catch a number of errors for you (such as
misspelled functions or variables). Finally, since the compiler only compiles one source file at a time, it
is never able to detect inconsistencies between files (such as a function or variable declared one way in
one source file and some other way in another), so it's important that cross-file declarations be explicit
and consistent. (Various strategies, such as placing common declarations in header files so that they can
be #included wherever they're needed, and requesting that the compiler warn about function calls without
prototypes in scope, can help to reduce the number of errors having to do with improper declarations.)
For the most part, you can also ignore the ``old style'' function syntax, which hardly anyone is using any
more. The only thing to watch out for is that an empty set of parentheses in a function declaration is an
old-style declaration and means ``unspecified arguments,'' not ``no arguments.'' To declare a new-style
function taking no arguments, you must include the keyword void between the parentheses, which
makes the lack of arguments explicit. (A declaration like
int f(void);
does not declare a function accepting one argument of type void, which would be meaningless, since
the definition of type void is that it is a type with no values. Instead, as a special case, a single,
unnamed parameter of type void indicates that a function takes no arguments.) For example, the
definition of the getchar function might look like
int getchar(void)
{
    int c;
    read next character into c somehow
    if (no next character)
        return EOF;
    return c;
}
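A compilable counterpart to that outline, using a made-up function named answer, might look like the sketch below. It follows the ANSI/ISO rules these notes describe; be aware that the C23 revision of the standard changes the meaning of empty parentheses so that they, too, mean "no arguments".

```c
/* New-style declaration: (void) says "takes no arguments", so a
   call such as answer(5) is rejected at compile time. */
int answer(void);

/* The definition repeats the same parameter list. */
int answer(void)
{
    return 42;
}
```

Had the declaration been written int answer(); under the older rules, a mistaken call like answer(5) would have compiled without complaint.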
page 73
Note that this version of atoi, written in terms of atof, has very slightly different behavior: it reads
past a '.' (and, assuming a fully-functional version of atof, an 'e').
The use of an explicit cast when returning a floating-point expression from a routine declared as
returning int represents another point on the spectrum of what you should worry about explicitly versus
what you should feel comfortable making use of implicitly. This is a case where the compiler can do the
``right thing'' safely and unambiguously, as long as what you said (in this case, to return a floating-point
expression from a routine declared as returning int) is in fact what you meant. But since the real
possibility exists that discarding the fractional part is not what you meant, some compilers will warn you
about it. Typically, compilers which warn about such things can be quieted by using an explicit cast; the
explicit cast (even though it appears to ask for the same conversion that would have happened implicitly)
serves to silence the warning. (In general, it's best to silence spurious warnings rather than just ignoring
them. If you get in the habit of ignoring them, sooner or later you'll overlook a significant one that you
would have cared about.)
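As a tiny illustration of the explicit cast on a return value (whole_part is a hypothetical name, not a function from the book):

```c
/* Return the integer part of x. The conversion from double to int
   would happen implicitly anyway; the cast states that discarding
   the fraction is intended, which quiets compilers that warn. */
int whole_part(double x)
{
    return (int)x;
}
```

The result is the same with or without the cast: whole_part(3.75) is 3 either way.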
A ``stack'' is simply a last-in, first-out list. You ``push'' data items onto a stack, and whenever you ``pop''
an item from the stack, you get the one most recently pushed.
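A stack of numbers can be sketched with an array and an index, as below. This is an illustrative version, not the book's code: the capacity MAXVAL is an arbitrary choice, and instead of printing error messages this push reports failure through its return value and this pop returns 0.0 when the stack is empty.

```c
#define MAXVAL 100          /* capacity chosen arbitrarily for the sketch */

static double val[MAXVAL];  /* the stacked values */
static int sp = 0;          /* index of the next free slot */

/* push: put f on top of the stack; returns 1 on success, 0 if full */
int push(double f)
{
    if (sp >= MAXVAL)
        return 0;
    val[sp++] = f;
    return 1;
}

/* pop: remove and return the top value, or 0.0 if the stack is empty */
double pop(void)
{
    return (sp > 0) ? val[--sp] : 0.0;
}
```

After push(1.0), push(2.0), and push(3.0), three calls to pop return 3.0, 2.0, and 1.0, in that order: last in, first out.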
pages 76-79
The code for the calculator may seem daunting at first, but it's much easier to follow if you look at each
part in isolation (as good functions are meant to be looked at), and notice that the routines fall into three
levels. At the top level is the calculator itself, which resides in the function main. The main function
calls three lower-level functions: push, pop, and getop. getop, in turn, is written in terms of the still
lower-level functions getch and ungetch.
A few details of the communication among these functions deserve mention. The getop routine actually
returns two values. Its formal return value is a character representing the next operation to be performed.
Usually, that character is just the character the user typed, that is, +, -, *, or /. In the case of a number
typed by the user, the special code NUMBER is returned (which happens to be #defined to be the
character '0', but that's arbitrary). A return value of NUMBER indicates that an entire string of digits has
been typed, and the string itself is copied into the array s passed to getop. In this case, therefore, the
array s is the second return value.
In some printings, the second line on page 76 reads
#include <math.h> /* for atof() */
This is incorrect; atof is declared in <stdlib.h>, so the line should read
#include <stdlib.h> /* for atof() */
page 77
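Make sure you understand why the code
push(pop() - pop()); /* WRONG */
might not work correctly: the order in which the two calls to pop are evaluated is not defined, so the
operands might be subtracted in the wrong order.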
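The problem and its cure can be sketched with a tiny stand-in stack (the array size and the function
name subtract_top_two are assumptions of this example). C does not specify which of the two calls to
pop in pop() - pop() happens first, so the operands of a noncommutative operator like - can come out
reversed. Popping the right-hand operand into a temporary variable first pins the order down.

```c
#define STACKSIZE 16        /* arbitrary capacity for this sketch */

static double stack[STACKSIZE];
static int depth = 0;

void push(double f)
{
    if (depth < STACKSIZE)
        stack[depth++] = f;
}

double pop(void)
{
    return depth > 0 ? stack[--depth] : 0.0;
}

/* subtract_top_two: replace the top two items with (first pushed) - (last pushed) */
void subtract_top_two(void)
{
    double op2 = pop();     /* the item pushed last is the right-hand operand */
    push(pop() - op2);      /* only one pop remains, so the order is fixed */
}
```

After pushing 10.0 and then 3.0, a call to subtract_top_two leaves 7.0 on the stack.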
Note that getop does not incorporate the functionality of atoi or atof--it collects and returns the
digits as a string, and main calls atof to convert the string to a floating-point number (prior to pushing
it on the stack). (There's nothing profound about this arrangement; there's no particular reason why
getop couldn't have been set up to do the conversion itself.)
The reasons for using a routine like ungetch are good and sufficient, but they may not be obvious at
first. The essential motivation, as the authors explain, is that when we're reading a string of digits, we
don't know when we've reached the end of the string of digits until we've read a non-digit, and that
non-digit is not part of the string of digits, so we really shouldn't have read it yet, after all. The rest of the
program is set up based on the assumption that one call to getop will return the string of digits, and the
next call will return whatever operator followed the string of digits.
To understand why the surprising and perhaps kludgey-sounding getch/ungetch approach is in fact a
good one, let's consider the alternatives. getop could keep track of the one-too-far character somehow,
and remember to use it next time instead of reading a new character. (Exercise 4-11 asks you to
implement exactly this.) But this arrangement of getop is considerably less clean from the standpoint of
the ``invariants'' we were discussing earlier. getop can be written relatively cleanly if one of its
invariants is that the operator it's getting is always formed by reading the next character(s) from the input
stream. getop would be considerably messier if it always had to remember to use an old character if it
had one, or read a new character otherwise. If getop were modified later to read new kinds of operators,
and if reading them involved reading more characters, it would be easy to forget to take into account the
possibility of an old character each time a new character was needed. In other words, everywhere that
getop wanted to do the operation
read the next character
it would instead have to do
if (there's an old character)
use it
else
read the next character
It's much cleaner to push the checking for an old character down into the getch routine.
Devising a pair of routines like getch and ungetch is an excellent example of the process of
abstraction. We had a problem: while reading a string of digits, we always read one character too far.
The obvious solution--remembering the one-too-far character and using it later--would have been clumsy
if we'd implemented it directly within getop. So we invented some new functions to centralize and
encapsulate the functionality of remembering accidentally-read characters, so that getop could be
written cleanly in terms of a simple ``get next character'' operation. By centralizing the functionality, we
make it easy for getop to use it consistently, and by encapsulating it, we hide the (potentially ugly)
details from the rest of the program. getch and ungetch may be tricky to write, but once we've
written them, we can seal up the little black boxes they're in and not worry about them any more, and the
rest of the program (especially getop) is cleaner.
page 79
If you're not used to the conditional operator ?: yet, here's how getch would look without it:
int getch(void)
{
if (bufp > 0)
return buf[--bufp];
else
return getchar();
}
Also, the extra generality of these two routines (namely, that they can push back and remember several
characters, a feature which the calculator program doesn't even use) makes them a bit harder to follow.
Exercise 4-8 asks you to write simpler versions which allow only one character of pushback. (Also, as
the text notes, we don't really have to be writing ungetch at all, because the standard library already
provides an ungetc which can provide one character of pushback for getchar.)
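For comparison, here is a minimal sketch of the same read-one-too-far situation handled with the
standard library's ungetc. The helper name read_digits and its parameters are hypothetical; getc,
ungetc, and isdigit are the standard functions. The sketch assumes size is at least 1.

```c
#include <ctype.h>
#include <stdio.h>

/* read_digits: hypothetical helper for this sketch.
   Reads a run of digits from fp into buf (storing at most size-1 of
   them; any extra digits are read and dropped) and returns the number
   stored. The first non-digit is pushed back onto the stream. */
int read_digits(FILE *fp, char *buf, int size)
{
    int c, n = 0;

    while ((c = getc(fp)) != EOF && isdigit(c)) {
        if (n < size - 1)
            buf[n++] = (char) c;
    }
    buf[n] = '\0';

    if (c != EOF)
        ungetc(c, fp);      /* we read one character too far: give it back */
    return n;
}
```

If the input holds 123+, a call to read_digits stores "123", and the next getc on the same stream
returns the '+' that ended the number.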
When we defined a stack, we said that it was ``last-in, first-out.'' Are the versions of getch and
ungetch on page 79 last-in, first-out or first-in, first-out? Do you agree with this choice?
One last note: the name of the variable bufp suggests that it is a pointer, but it's actually an index into
the buf array.
(for example) four characters actually contains five, because of the terminating '\0'.
But if you're writing a recursive function, as long as you do a little bit of work yourself, and only pass on
a portion of the job to another instance of yourself, you haven't completely reneged on your
responsibilities. Furthermore, if you're ever called with such a small job to do that the little bit you're
willing to do encompasses the whole job, you don't have to call yourself again (there's no remaining
portion that you can't do). Finally, since each recursive call does some work, passing on smaller and
smaller portions to succeeding recursive calls, and since the last call (where the remaining portion is
empty) doesn't generate any more recursive calls, the recursion is broken and doesn't constitute an
infinite loop.
Don't worry about the quicksort example if it seems impenetrable--quicksort is an important algorithm,
but it is not easy to understand completely at first.
Note that the qsort routine described here is very different from the standard library qsort (in fact, it
probably shouldn't even have the same name).
page 90
The correct way to write the square() macro is
#define square(x) ((x) * (x))
There are three rules to remember when defining function-like macros:
1. The macro expansion must always be parenthesized so that any low-precedence operators it
contains will still be evaluated first. If we didn't write the square() macro carefully, the
invocation
1 / square(n)
might expand to
1 / n * n
while it should expand to
1 / (n * n)
2. Within the macro definition, all occurrences of the parameters must be parenthesized so that any
low-precedence operators the actual arguments contain will be evaluated first. If we didn't write
the square() macro carefully, the invocation
square(n + 1)
might expand to
n + 1 * n + 1
while it should expand to
(n + 1) * (n + 1)
3. If a parameter appears several times in the expansion, the macro may not work properly if the
actual argument is an expression with side effects. No matter how we parenthesize the
square() macro, the invocation
square(i++)
would result in
i++ * i++
(perhaps with some parentheses), but this expression is undefined, because we don't know when
the two increments will happen with respect to each other or the multiplication.
Since the square() macro can't be written perfectly safely (arguments with side effects will always be
troublesome), its callers will always have to be careful (i.e. not to call it with arguments with side
effects). One convention is to capitalize the names of macros which can't be treated exactly as if they
were functions:
#define Square(x) ((x) * (x))
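The first two rules can be checked with a short sketch. BAD_SQUARE is a deliberately careless macro
introduced only for this comparison, and the four wrapper functions are likewise hypothetical.

```c
#define BAD_SQUARE(x) x * x          /* careless: no parentheses at all */
#define square(x) ((x) * (x))        /* the careful version from the text */

int outer_bad(int n)  { return 100 / BAD_SQUARE(n); }   /* 100 / n * n */
int outer_good(int n) { return 100 / square(n); }       /* 100 / ((n) * (n)) */
int inner_bad(int n)  { return BAD_SQUARE(n + 1); }     /* n + 1 * n + 1 */
int inner_good(int n) { return square(n + 1); }         /* ((n + 1) * (n + 1)) */
```

With n equal to 5, outer_bad returns 100 while outer_good returns 4; with n equal to 2, inner_bad
returns 5 while inner_good returns 9. (The third rule is not demonstrated, since calling either macro
with an argument like i++ is undefined.)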
page 90 continued
#undef can be used when you want to give a macro restricted scope, if you can remember to undefine it
when you want it to go out of scope. Don't worry about ``[ensuring] that a routine is really a function, not
a macro'' or the getchar example.
Also, don't worry about the # and ## operators. These are new ANSI features which aren't needed except
in relatively special circumstances.
purposes. At the top of the page, HDR is a simple on-off switch; it is #defined (with no replacement
text) when hdr.h is #included for the first time, and any subsequent #inclusion merely tests whether
HDR is #defined. (Note that it is in fact quite possible to define a macro with no replacement text; a
macro so defined is distinguishable from a macro which has not been #defined at all. One common
use of a macro with no replacement text is precisely as a simple #if switch like this.)
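The on-off switch can be sketched in a single file by writing out the guarded contents twice, as if a
hypothetical hdr.h had been #included two times. The struct point declaration and the function
point_sum are stand-ins invented for this example.

```c
/* First "inclusion" of the hypothetical hdr.h: HDR is not yet defined. */
#if !defined(HDR)
#define HDR                     /* defined, but with no replacement text */
struct point { int x; int y; };
#endif

/* Second "inclusion": HDR is now defined, so everything up to the
   #endif is skipped and struct point is not declared a second time
   (which would be an error). */
#if !defined(HDR)
#define HDR
struct point { int x; int y; };
#endif

int point_sum(struct point p)
{
    return p.x + p.y;
}
```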
At the bottom of the page, HDR ends up containing the name of a header file to be #included; the
name depends on the #if and #elif directives. The line
#include HDR
#includes one of them, depending on the final value of HDR.
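A minimal sketch of the same idea using standard headers, so that it can be compiled as is. The
switch USE_STDIO and the function format_answer are assumptions of this example; the book's version
chooses among its own system-specific header files.

```c
#define USE_STDIO 1             /* hypothetical switch for this sketch */

#if USE_STDIO
#define HDR <stdio.h>
#else
#define HDR <stdlib.h>
#endif

#include HDR                    /* here this becomes #include <stdio.h> */

/* format_answer: uses sprintf, which is declared in <stdio.h> */
int format_answer(char *buf)
{
    return sprintf(buf, "%d", 42);
}
```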
int i = 1;              /* an integer */
int *ip;                /* a pointer-to-int */
ip = &i;                /* ip points to i */
printf("%d\n", *ip);    /* prints i, which is 1 */
*ip = 5;                /* sets i to 5 */
(The obvious questions are, ``if you want to print i, or set it to 5, why not just do it? Why mess around
with this `pointer' thing?'' More on that in a minute.)
The unary & and * operators are complementary. Given an object (i.e. a variable), & generates a pointer
to it; given a pointer, * ``returns'' the value of the pointed-to object. ``Returns'' is in quotes because, as
you may have noticed in the examples, you're not restricted to fetching values via pointers: you can also
store values via pointers. In an assignment like
*ip = 0;
the subexpression *ip is conceptually ``replaced'' by the object which ip points to, and since *ip
appears on the left-hand side of the assignment operator, what happens to the pointed-to object is that it
gets assigned to.
One of the things that's hard about pointers is simply talking about what's going on. We've been using the
words ``return'' and ``replace'' in quotes, because they don't quite reflect what's actually going on, and
we've been using clumsy locutions like ``fetch via pointers'' and ``store via pointers.'' There is some
jargon for referring to pointer use; one word you'll often see is dereference, a term which, though its
derivation is suspect, is used to mean ``follow a pointer to get at, and use, the object it points to.'' Thus,
we sometimes call unary * the ``pointer dereferencing operator,'' and we may say that the expressions
printf("%d\n", *ip);
and
*ip = 5;
both ``dereference the pointer ip.'' We may also talk about indirecting on a pointer: to indirect on a
pointer is again to follow it to see what it points to; and * may also be called the ``pointer indirection
operator.''
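Both directions of dereferencing, fetching and storing, can be sketched as a pair of tiny functions.
The names fetch_via and store_via are invented for illustration.

```c
/* fetch_via: follow ip and return the value of the object it points to */
int fetch_via(const int *ip)
{
    return *ip;
}

/* store_via: follow ip and assign to the object it points to */
void store_via(int *ip, int value)
{
    *ip = value;
}
```

Given int i = 1; and int *ip = &i;, the call fetch_via(ip) returns 1, and after store_via(ip, 5) the
variable i itself holds 5.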
Our examples of pointers so far have been, admittedly, artificial and rather meaningless. Let's try a
slightly more realistic example. In the previous chapter, we used the routines atoi and atof to convert
strings representing numbers to the actual numbers represented. Often the strings were typed by the user,
and read with getline. As you may have noticed, neither atoi nor atof does any validity or error
checking: both simply stop reading when they reach a character that can't be part of the number they're
converting, and if there aren't any numeric characters in the string, they simply return 0. (For example,
atoi("49er") is 49, and atoi("three") is 0, and atof("1.2.3") is 1.2 .) These attributes
make atoi and atof easy to write and easy (for the programmer) to use, but they are not the most
user-friendly routines possible. A good user interface would warn the user and prompt again in case of
invalid, non-numeric input.
Suppose we were writing a simple inventory-control system. For each part stored in our warehouse, we
might record the part number, location, and number of parts on hand. For simplicity, we'll assume that
the location is always a simple bin number.
Somewhere in the inventory-control program, we might find the variables
int part_number;
int location;
int number_on_hand;
and there might be a routine that lets the user enter any of these numbers. Suppose that there is another
variable,
int which_entry;
which indicates which of the three numbers is being entered (1 for part_number, 2 for location, or
3 for number_on_hand). We might have code like this:
char instring[30];
switch (which_entry) {
case 1:
printf("enter part number:\n");
getline(instring, 30);
part_number = atoi(instring);
break;
case 2:
printf("enter location:\n");
getline(instring, 30);
location = atoi(instring);
break;
case 3:
printf("enter number on hand:\n");
getline(instring, 30);
number_on_hand = atoi(instring);
break;
}
Suppose that we now begin to add a bit of rudimentary verification to the input routines. The first case
might look like
case 1:
do {
printf("enter part number:\n");
getline(instring, 30);
if(!isdigit(instring[0]))
continue;
part_number = atoi(instring);
} while (part_number == 0);
break;
If the first character is not a digit, or if atoi returns 0, the code goes around the loop another time, and
prompts the user again, in hopes that the user will type some proper numeric input this time. (The tests
for numeric input are not sufficient, nor even wise if 0 is a possible input value, as it presumably is for
number on hand. In fact, the two tests really do the same thing! But please overlook these faults. If you're
curious, you can learn about a new ANSI function, strtol, which is like atoi but gives you a bit
more control, and would be a better routine to use here.)
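For the curious, here is a rough sketch of what strtol-based checking might look like. The helper
name parse_int and its return convention (1 for valid, 0 for invalid) are assumptions of this
example; strtol, errno, ERANGE, and isspace are the standard facilities.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* parse_int: hypothetical helper for this sketch.
   Returns 1 and stores the number in *result if s holds a valid int,
   optionally followed by white space (such as a trailing newline);
   returns 0 and leaves *result alone otherwise. */
int parse_int(const char *s, int *result)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(s, &end, 10);

    if (end == s)
        return 0;               /* no digits at all, e.g. "three" */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;               /* out of range for an int */
    while (isspace((unsigned char) *end))
        end++;
    if (*end != '\0')
        return 0;               /* trailing junk, e.g. "49er" */

    *result = (int) val;
    return 1;
}
```

Unlike the two tests in the loop above, this accepts 0 as a perfectly good input value, and it
rejects "49er" instead of quietly returning 49.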
The code fragment above is for just one of the three input cases. The obvious way to perform the same
checking for the other two cases would be to repeat the same code two more times, changing the prompt
string and the name of the variable assigned to (location or number_on_hand instead of
part_number). Duplicating the code is a nuisance, though, especially if we later come up with a better
way to do input verification (perhaps one not suffering from the imperfections mentioned above). Is there
a better way?
One way would be to use a temporary variable in the input loop, and then set one of the three real
variables to the value of the temporary variable, depending on which_entry:
int temp;
do {
printf("enter the number:\n");
getline(instring, 30);
if(!isdigit(instring[0]))
continue;
temp = atoi(instring);
do {
printf("enter %s:\n", prompt);
getline(instring, 30);
if(!isdigit(instring[0]))
continue;
*numpointer = atoi(instring);
} while (*numpointer == 0);
The idea here is that prompt is the prompt string and numpointer points to the particular numeric
value we're entering. That way, a single input verification loop can print any of the three prompts and set
any of the three numeric variables, depending on where numpointer points. (We won't officially see
character pointers and strings until section 5.5, so don't worry if the use of the prompt pointer seems
new or inexplicable.)
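One way prompt and numpointer might be chosen is sketched below, together with a single generic
piece of code that stores through the pointer. The three functions are hypothetical helpers written
for this example, and the interactive prompting and validation are left out so that the pointer
aspect stands alone.

```c
#include <stdlib.h>

int part_number, location, number_on_hand;

/* entry_variable: pick the variable that which_entry refers to
   (a null pointer if which_entry is out of range) */
int *entry_variable(int which_entry)
{
    switch (which_entry) {
    case 1:  return &part_number;
    case 2:  return &location;
    case 3:  return &number_on_hand;
    default: return NULL;
    }
}

/* entry_prompt: pick the matching prompt string */
const char *entry_prompt(int which_entry)
{
    switch (which_entry) {
    case 1:  return "part number";
    case 2:  return "location";
    case 3:  return "number on hand";
    default: return "number";
    }
}

/* store_entry: one piece of code that can set any of the three variables */
int store_entry(int which_entry, const char *instring)
{
    int *numpointer = entry_variable(which_entry);

    if (numpointer == NULL)
        return 0;
    *numpointer = atoi(instring);   /* stores into whichever variable was chosen */
    return 1;
}
```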
This example is, in its own ways, quite artificial. (In a real inventory-control program, we'd obviously
need to keep track of many parts; we couldn't use single variables for the part number, location, and
quantity. We probably wouldn't really have a which_entry variable telling us which number to
prompt for, and we'd do the numeric validation quite differently. We might well do numeric entry and
validation in a separate function, removing this need for the pointers.) However, the pointer aspect of this
example--using a pointer to refer to one of several different things, so that one generic piece of code can
access any of the things--is a very typical (i.e. realistic) use of pointers.
There's one nuance of pointer declarations which deserves mention. We've seen that
int *ip;
declares the variable ip as a pointer to an int. We might look at that declaration and imagine that int
* is the type and ip is the name of the variable being declared. (Actually, so far, these assumptions are
both true.) We might therefore imagine that a more ``obvious'' way of writing the declaration would be
int* ip;
This would work, but it is misleading, as we'll see if we try to declare two int pointers at once. How
shall we do it? If we try
int* ip1, ip2; /* WRONG */
we don't succeed; this would declare ip1 as a pointer-to-int, but ip2 as an int (not a pointer). The
correct declaration for two pointers is
int *ip1, *ip2;
int getint(int *pn)
{
char line[20];
if (getline(line, 20) <= 0)
return EOF;
*pn = atoi(line);
return 1;
}
The getint function on page 97 is documented as returning nonzero for a valid number and 0 for
something other than a number. Our stripped-down version does not, and as it happens, the example code
at the bottom of page 96 does not make use of the valid/invalid distinction. Can you see a way to rewrite
the code at the bottom of page 96 to fill in the cells of the array with only valid numbers?
You might also notice, again from the code at the bottom of page 96, that & need not be restricted to
single, simple variables; it can take the address of any data object, in this case, one cell of the array.
Just as for all of C's other operators, & can be applied to arbitrary expressions, although it is restricted to
expressions which represent addressable objects. Expressions like &1 or &(2+3) are meaningless and
illegal.
You may remember a discussion from section 1.5.1 on page 16 of how C's getchar routine is able to
return all possible characters, plus an end-of-file indication, in its single return value. Why does getint
need two return values? Why can't it use the same trick that getchar does?
The examples in this section are again relatively safe. The pointers have all been parameters, and the
callers have passed pointers (that is, the ``addresses'' of) their own, properly-allocated variables. That is,
code like
int a = 1, b = 2;
swap(&a, &b);
and
int a;
getint(&a);
is correct and quite safe.
Something to beware of, though, is the temptation to inadvertently pass an uninitialized pointer variable
(rather than the ``address'' of some other variable) to a routine which expects a pointer. We know that the
getint routine expects as its argument a pointer to an int in which it is to store the integer it gets.
Suppose we took that description literally, and wrote
int *ip; /* a pointer to an int */
getint(ip);
Here we have in fact passed a pointer-to-int to getint, but the pointer we passed doesn't point
anywhere! When getint writes to (``dereferences'') the pointer, in an expression like *pn = 0, it will
scribble on some random part of memory, and the program may crash. When people get caught in this
trap, they often think that to fix it they need to use the & operator, or perhaps the * operator:
getint(&ip); /* WRONG */
getint(*ip); /* WRONG */
but these go from bad to worse. (If you think about them carefully, &ip is a pointer-to-pointer-to-int,
and *ip is an int, and neither of these types matches the pointer-to-int which getint expects.) The
correct usage for now, as we showed already, is something like
int a;
getint(&a);
In this case, a is an honest-to-goodness, allocated int, so when we generate a pointer to it (with &a) and
call getint, getint receives a pointer that does point somewhere.
The meaning of ``adding 1 to a pointer,'' and by extension, all pointer arithmetic, is that
pa+1 points to the next object, and pa+i points to the i-th object beyond pa.
This aspect of pointers--that arithmetic works on them, and in this way--is one of several vital facts about
pointers in C. On the next page, we'll see the others.
page 99
Deep sentences:
The correspondence between indexing and pointer arithmetic is very close. By definition,
the value of a variable or expression of type array is the address of element zero of the
array.
This is a fundamental definition, which we'll now spend several pages discussing.
Don't worry too much yet about the assertion that ``pa and a have identical values.'' We're not surprised
about the value of pa after the assignment pa = &a[0]; we've been taking the address of array
elements for several pages now. What we don't know--we're not yet in a position to be surprised about it
or not--is what the ``value'' of the array a is. What is the value of the array a?
In some languages, the value of an array is the entire array. If an array appears on the right-hand side of
an assignment, the entire array is assigned, and the left-hand side had better be an array, too. C does not
work this way; C never lets you manipulate entire arrays.
In C, by definition, the value of an array, when it appears in an expression, is a pointer to its first
element. In other words, the value of the array a simply is &a[0]. If this statement makes any kind of
intuitive sense to you at this point, that's great, but if it doesn't, please just take it on faith for a while.
This statement is a fundamental (in fact the fundamental) definition about arrays and pointers in C, and if
you don't remember it, or don't believe it, then pointers and arrays will never make proper sense. (You
will also need to know another bit of jargon: we often say that, when an array appears in an expression, it
decays into a pointer to its first element.)
Given the above definition, let's explore some of the consequences. First of all, though we've been saying
pa = &a[0];
we could also say
pa = a;
because by definition the value of a in an expression (i.e. as it sits there all alone on the right-hand side)
is &a[0]. Secondly, anywhere we've been using square brackets [] to subscript an array, we could also
have used the pointer dereferencing operator *. That is, instead of writing
i = a[5];
we could, if we wanted to, instead write
i = *(a+5);
Why would this possibly work? How could this possibly work? Let's look at the expression *(a+5) step
by step. It contains a reference to the array a, which is by definition a pointer to its first element. So
*(a+5) is equivalent to *(&a[0]+5). To make things clear, let's pretend that we'd assigned the
pointer to the first element to an actual pointer variable:
int *pa = &a[0];
Now we have *(a+5) is equivalent to *(&a[0]+5) is equivalent to *(pa+5). But we learned on
page 98 that *(pa+5) is simply the contents of the location 5 cells past where pa points to. Since pa
points to a[0], *(pa+5) is a[5]. Thus, for whatever it's worth, any time you have an array subscript
a[i], you could write it as *(a+i).
The idea of the previous paragraph isn't worth much, because if you've got an array a, indexing it using
the notation a[i] is considerably more natural and convenient than the alternate *(a+i). The
significant fact is that this little correspondence between the expressions a[i] and *(a+i) holds for
more than just arrays. If pa is a pointer, we can get at locations near it by using *(pa+i), as we learned
on page 98, but we can also use pa[i]. This time, using the ``other'' notation (array instead of pointer,
when we thought we had a pointer) can be more convenient.
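The correspondence can be checked mechanically. This sketch fills a small array and confirms that
all four spellings reach the same cells; the function name forms_agree is hypothetical.

```c
/* forms_agree: returns 1 if a[i], *(a+i), *(pa+i), and pa[i]
   all name the same element for every i, 0 otherwise */
int forms_agree(void)
{
    int a[10];
    int *pa = a;            /* a decays to &a[0] */
    int i;

    for (i = 0; i < 10; i++)
        a[i] = i * i;

    for (i = 0; i < 10; i++) {
        if (a[i] != *(a + i))
            return 0;
        if (*(pa + i) != pa[i])
            return 0;
        if (a[i] != pa[i])
            return 0;
    }
    return 1;
}
```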
At this point, you may be asking why you can write pa[i] instead of *(pa+i). You may be
wondering how you're going to remember that you can do this, or remember what it means if you see it
in someone else's code, when it's such a surprising fact in the first place. There are several ways to
remember it; pick whichever one suits you:
1. It's an arbitrary fact, true by definition; just memorize it.
2. If, for an array a, instead of writing a[i], you can also write *(a+i) (as we proved a few
paragraphs back); then it's only fair that for a pointer pa, instead of writing *(pa+i), you can
also write pa[i].
3. Deep sentence: ``In evaluating a[i], C converts it to *(a+i) immediately; the two forms are
equivalent.''
4. An array is a contiguous block of elements of a particular type. A pointer often points to a
contiguous block of elements of a particular type. Therefore, it's very handy to treat a pointer to a
Whenever we wrote a function like getline or getop which seemed to accept an array of characters,
and whenever we thought we were passing arrays of characters to these routines, we were actually
passing pointers. This explains, among other things, how getline and getop were able to modify the
arrays in the caller, even though we said that call-by-value meant that functions can't modify variables in
their callers since they receive copies of the parameters. When a function receives a pointer, it cannot
modify the original pointer in the caller, but it can definitely modify what the pointer points to.
If that doesn't make sense, make sure you appreciate the full difference between a pointer and what it
points to! It is entirely possible to modify one without modifying the other. Let's illustrate this with an
example. If we say
char a[] = "hello";
char b[] = "world";
we've declared two character arrays, a and b, each containing a string. If we say
char *p = a;
we've declared p as a pointer-to-char, and initialized it to point to the first character of the array a. If
we then say
*p = 'H';
we've modified what p points to. We have not modified p itself. After saying *p = 'H'; the string in
the array a has been modified to contain "Hello".
If we say
p = b;
on the other hand, we have modified the pointer p itself. We have not really modified what p points to.
In a sense, ``what p points to'' has changed--it used to be the string in the array a, and now it's the string
in the array b. But saying p = b didn't modify either of the strings.
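The same distinction applies across a function call, which is what makes call-by-value and pointers
fit together. In this sketch (modify_and_repoint is a hypothetical name), the function changes what
its pointer points to, which the caller can see, and then changes its own copy of the pointer, which
the caller cannot see.

```c
/* modify_and_repoint: hypothetical demonstration function */
void modify_and_repoint(char *p, char *other)
{
    *p = 'H';       /* modifies what p points to: visible in the caller */
    p = other;      /* modifies only this function's copy of the pointer */
    *p = 'W';       /* now modifies the first character of other */
}
```

If the caller has char a[] = "hello", char b[] = "world", and a pointer set to a, then after
modify_and_repoint(pointer, b) the arrays hold "Hello" and "World", but the caller's pointer still
points at a.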
page 100
Since, as we've just seen, functions never receive arrays as parameters, but instead always receive
pointers, how have we been able to get away with defining functions (like getline and getop) which
seemed to accept arrays? The answer is that whenever you declare an array parameter to a function, the
compiler pretends that you actually declared a pointer. (It does this mostly so that we can get away with
the ``gentle fiction'' of pretending that we can pass arrays to functions.)
When you see a statement like ``char s[]; and char *s; are equivalent'' (as in fact you see at the
top of page 100), you can be sure that (and you must remember that) it is only function formal parameters
that are being talked about. Anywhere else, arrays and pointers are quite different, as we've discussed.
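One visible consequence: inside the function, a parameter declared with [] is an ordinary pointer
variable, so it can be incremented, which a real array never could be. A minimal sketch follows
(count_chars is a hypothetical name; it does the same job as the standard strlen).

```c
/* s is declared as an array, but the compiler treats the parameter
   exactly as if it had been declared const char *s */
int count_chars(const char s[])
{
    int n = 0;

    while (*s != '\0') {
        s++;        /* legal only because s is really a pointer */
        n++;
    }
    return n;
}
```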
Expressions like p[-1] (at the end of section 5.3) may be easier to understand if we convert them back
to the pointer form *(p + -1) and thence to *(p-1) which, as we've seen, is the object one before
what p points to.
With the examples in this section, we begin to see how pointer manipulations can go awry. In sections
5.1 and 5.2, most of our pointers were to simple variables. When we use pointers into arrays, and when
we begin using pointer arithmetic to access nearby cells of the array, we must be careful never to go off
the end of the array, in either direction. A pointer is only valid if it points to one of the allocated cells of
an array. (There is also an exception for a pointer just past the end of an array, which we'll talk about
later.) Given the declarations
int a[10];
int *pa;
the statements
pa = a;
*pa = 0;
*(pa+1) = 1;
pa[2] = 2;
pa = &a[5];
*pa = 5;
*(pa-1) = 4;
pa[1] = 6;
pa = &a[9];
*pa = 9;
pa[-1] = 8;
are all valid. These statements set the pointer pa pointing to various cells of the array a, and modify
some of those cells by indirecting on the pointer pa. (As an exercise, verify that each cell of a that
receives a value receives the value of its own index. For example, a[6] is set to 6.)
However, the statements
pa = a;
*(pa+10) = 0; /* WRONG */
*(pa-1) = 0; /* WRONG */
pa = &a[5];
*(pa+10) = 0; /* WRONG */
pa = &a[10];
*pa = 0; /* WRONG */
and
int *pa2;
pa = &a[5];
pa2 = pa + 10; /* WRONG */
pa2 = pa - 10; /* WRONG */
are all invalid. The first examples set pa to point into the array a but then use overly-large offsets
(+10, -1) which end up trying to store a value outside of the array a. The statements in the last set of examples
set pa2 to point outside of the array a. Even though no attempt is made to access the nonexistent cells,
these statements are illegal, too. Finally, the code
int a[10];
int *pa, *pa2;
pa = &a[5];
pa2 = pa + 10; /* WRONG */
*pa2 = 0; /* WRONG */
would be very wrong, because it not only computes a pointer to the nonexistent 15th cell
of a 10-element array, but it also tries to store something there.
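One defensive habit, sketched here under the assumption that the code knows the array's length, is
to check the index before doing the pointer arithmetic, so that an out-of-range pointer is never even
computed. store_checked is a hypothetical helper, not something from the text.

```c
/* store_checked: store value in cell index of the n-element array
   that base points to; returns 1 on success, 0 if index is out of range */
int store_checked(int *base, int n, int index, int value)
{
    if (index < 0 || index >= n)
        return 0;               /* would fall outside the array */
    *(base + index) = value;    /* same as base[index] = value */
    return 1;
}
```

For a 10-element array a, store_checked(a, 10, 5, 5) succeeds, while offsets of 15, 10, or -1 are
refused rather than scribbling outside the array.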
The 0 is just a shorthand; it does not necessarily mean machine address 0. To make it clear that we're
talking about the null pointer and not the integer 0, we often use a macro definition like
#define NULL 0
so that we can say things like
int *ip = NULL;
(If you've used Pascal or LISP, the nil pointer in those languages is analogous.)
In fact, the above #definition of NULL has been placed in the standard header file <stdio.h> for us
(and in several other standard header files as well), so we don't even need to #define it. I agree
completely with the authors that using NULL instead of 0 makes it more clear that we're talking about a
null pointer, so I'll always be using NULL, too.
Just as we can set a pointer to NULL, we can also test a pointer to see if it's NULL. The code
if(p != NULL)
*p = 0;
else
printf("p doesn't point anywhere\n");
tests p to see if it's non-NULL. If it's not NULL, it assumes that it points somewhere valid, and writes a 0
there. Otherwise (i.e. if p is the null pointer) the code complains.
Though we can use null pointers as markers to remind ourselves of which of our pointers don't point
anywhere, it's up to us to do so. It is not guaranteed that all uninitialized pointer variables (which
obviously don't point anywhere) are initialized to NULL, so if we want to use the null pointer convention
to remind ourselves, we'd best explicitly initialize all unused pointers to NULL. Furthermore, there is no
general mechanism that automatically checks whether a pointer is non-null before we use it. If we think
that a pointer might not point anywhere, and if we're using the convention that pointers that don't point
anywhere are set to NULL, it's up to us to compare the pointer to NULL to decide whether it's safe to use
it.
The next new piece of information in this section (which we've already alluded to) is pointer comparison.
You can compare two pointers for equality or inequality (== or !=): they're equal if they point to the
same place or are both null pointers; they're unequal if they point to different places, or if one points
somewhere and one is a null pointer. If two pointers point into the same array, the relational comparisons
<, <=, >, and >= can also be used.
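A typical use of a relational comparison between pointers is to step through an array until the
stepping pointer reaches the end. In this sketch (sum_cells is a hypothetical name), the limit a + n
is the pointer just past the end of the array, which may be computed and compared against but never
dereferenced.

```c
/* sum_cells: add up the n ints starting at a */
int sum_cells(const int *a, int n)
{
    const int *p;
    int total = 0;

    for (p = a; p < a + n; p++)     /* both pointers refer to the same array */
        total += *p;
    return total;
}
```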
page 103
The sentences
...n is scaled according to the size of the objects p points to, which is determined by the
declaration of p. If an int is four bytes, for example, the int will be scaled by four.
say something we've seen already, but may only confuse the issue. We've said informally that in the code
int a[10];
int *pa = &a[0];
*(pa+1) = 1;
pa contains the ``address'' of the int object a[0], but we've discouraged thinking about this address as
an actual machine memory address. We've said that the expression pa+1 moves to the next int in the
array (in this case, a[1]). Thinking at this abstract level, we don't even need to worry about any
``scaling by the size of the objects pointed to.''
If we do look at a lower, machine level of addressing, we may learn that an int occupies some number
of bytes (usually two or four), such that when we add 1 to a pointer-to-int, the machine address is
actually increased by 2 or 4. If you like to consider the situation from this angle, you're welcome to, but
if you don't, you certainly don't have to. If you do start thinking about machine addresses and sizes, make
extra sure that you remember that C does do the necessary scaling for you. Don't write something like
int a[10];
int *pa = &a[0];
*(pa+sizeof(int)) = 1;
where sizeof(int) is the size of an int in bytes, and expect it to access a[1].
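For readers who do want to peek at the machine level, here is a minimal sketch of the scaling at work. The helper names byte_distance and set_next are made up for illustration. The cast to char * is there only so that we can count in bytes for once; ordinary code never needs it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the distance in bytes between two int
   pointers into the same array.  Converting to char * makes the
   subtraction count bytes instead of ints. */
size_t byte_distance(int *from, int *to)
{
    assert(from <= to);
    return (size_t)((char *)to - (char *)from);
}

/* Store 1 in the element after the one pa points to.  Note the
   plain + 1: C scales it by sizeof(int) on our behalf. */
void set_next(int *pa)
{
    *(pa + 1) = 1;
}
```

Called as set_next(&a[0]), this sets a[1], while byte_distance(&a[0], &a[1]) reports sizeof(int), whatever that happens to be on the machine at hand.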
Since adding an int to a pointer gives us another pointer:
int a[10];
int *pa1 = &a[0];
int *pa2 = pa1 + 5;
we might wonder if we can rearrange the expression
pa2 = pa1 + 5
to get
pa2 - pa1
(where this is no longer a C assignment, we're just wondering if we can subtract pa1 from pa2, and
what the result might be). The answer is yes: just as you can compare two pointers which point into the
same array, you can subtract them, and the result is, naturally enough, the distance between them, in cells
or elements.
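As a small sketch of both ideas (the function names elements_apart and find_index are hypothetical, and the results are converted to long only to keep the declarations simple), pointer subtraction and relational comparison within one array might be used like this:

```c
#include <assert.h>

/* Hypothetical helper: number of elements from pa1 up to pa2.
   Both pointers must point into the same array. */
long elements_apart(int *pa1, int *pa2)
{
    return (long)(pa2 - pa1);
}

/* Hypothetical helper: index of the first element of a[0..n-1]
   equal to value, or -1 if there is none. */
long find_index(int *a, long n, int value)
{
    int *p;

    assert(n >= 0);
    for (p = a; p < a + n; p++)      /* < is fine: same array */
        if (*p == value)
            return (long)(p - a);    /* pointer back to index */
    return -1;
}
```

With pa2 = pa1 + 5 as above, elements_apart(pa1, pa2) is 5, and swapping the arguments gives -5.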
(In the large parenthetical statement in the middle of the page, don't worry too much about ptrdiff_t,
size_t, and sizeof.)
}
Note that we have to manually append the '\0' to s after the loop. In doing so we depend
upon i retaining its final value after the loop, but this is guaranteed in C, as we learned in Chapter 3.
Here is a similar function, using pointer notation:
void strcpy(char *s, char *t)
{
    while(*t != '\0')
        *s++ = *t++;
    *s = '\0';
}
Again, we have to manually append the '\0'. Yet another option might be to use a do/while loop.
All of these versions of strcpy are quite similar to the copy function we saw on page 29 in section
1.9.
page 106
The version of strcpy at the top of this page is my least favorite example in the whole book. Yes,
many experienced C programmers would write strcpy this way, and yes, you'll eventually need to be
able to read and decipher code like this, but my own recommendation against this kind of cryptic code is
strong enough that I'd rather not show this example yet, if at all.
We need strcmp for about the same reason we need strcpy. Just as we cannot assign one string to
another using =, we cannot compare two strings using ==. (If we try to use ==, all we'll compare is the
two pointers. If the pointers are equal, they point to the same place, so they certainly point to the same
string, but if we have two strings in two different parts of memory, pointers to them will always compare
different even if the strings pointed to contain identical sequences of characters.)
Note that strcmp returns a positive number if s is greater than t, a negative number if s is less than t,
and zero if s compares equal to t. ``Greater than'' and ``less than'' are interpreted based on the relative
values of the characters in the machine's character set. This means that 'a' < 'b', but (in the ASCII
character set, at least) it also means that 'B' < 'a'. (In other words, capital letters will sort before
lower-case letters.) The positive or negative number which strcmp returns is, in this implementation at
least, actually the difference between the values of the first two characters that differ.
Note that strcmp returns 0 when the strings are equal. Therefore, the condition
if(strcmp(a, b))
    do something...
doesn't do what you probably think it does. Remember that C considers zero to be ``false'' and nonzero to
be ``true,'' so this code does something if the strings a and b are unequal. If you want to do something if
two strings are equal, use code like
if(strcmp(a, b) == 0)
    do something...
(There's nothing fancy going on here: strcmp returns 0 when the two strings are equal, so that's what
we explicitly test for.)
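One way to keep the sense of the test straight is to wrap it in a small function with an honest name. This is only a sketch; same_string and compare_sign are hypothetical helpers, not part of the standard library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: 1 ("true") if the two strings contain the
   same characters, 0 ("false") otherwise. */
int same_string(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: reduce strcmp's result to -1, 0, or +1.
   Portable code should rely only on the sign of strcmp's result,
   not on its exact value. */
int compare_sign(const char *s, const char *t)
{
    int r = strcmp(s, t);
    return (r > 0) - (r < 0);
}
```

Now if(same_string(a, b)) reads the way it behaves. In ASCII, compare_sign("B", "a") is -1, since capital letters sort before lower-case ones.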
To continue our ongoing discussion of which pointer manipulations are safe and which are risky or must
be done with care, let's consider character pointers. As we've mentioned, one thing to beware of is that a
pointer derived from a string literal, as in
char *pmessage = "now is the time";
is usable but not writable (that is, the characters pointed to are not writable). Another thing to be careful
of is that any time you copy strings, using strcpy or some other method, you must be sure that the
destination string is a writable array with enough space for the string you're writing. Remember, too, that
the space you need is the number of characters in the string you're copying, plus one for the terminating
'\0'.
For the above reasons, all three of these examples are incorrect:
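char *p1 = "Hello, world!";
char *p2;
strcpy(p2, p1);                 /* WRONG */
char *p = "Hello, world!";
char a[13];
strcpy(a, p);                   /* WRONG */
char *p3 = "Hello, world!";
char *p4 = "A string to overwrite";
strcpy(p4, p3);                 /* WRONG */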
In the first example, p2 doesn't point anywhere. In the second example, a is a writable array, but it
doesn't have room for the terminating '\0'. In the third example, p4 points to memory which we're not
allowed to overwrite. A correct example would be
char *p = "Hello, world!";
char a[14];
strcpy(a, p);
(Another option would be to obtain some memory for the string copy, i.e. the destination for strcpy,
using dynamic memory allocation, but we're not talking about that yet.)
page 106 continued (bottom)
Expressions like *p++ and *--p may seem cryptic at first sight, but they're actually analogous to array
subscript expressions like a[i++] and a[--i], some of which we were using back on page 47 in
section 2.8.
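To make the analogy concrete, here is a minimal sketch of a stack written both ways. The array size, the variable names, and the use of assert for overflow checking are all choices made for this illustration.

```c
#include <assert.h>

#define STACKSIZE 10              /* hypothetical capacity */

static int stack[STACKSIZE];
static int *sp = stack;           /* points to the next free slot */

/* Push with *p++: store, then move the pointer forward. */
void push(int val)
{
    assert(sp < stack + STACKSIZE);
    *sp++ = val;
}

/* Pop with *--p: move the pointer back, then fetch. */
int pop(void)
{
    assert(sp > stack);
    return *--sp;
}

/* The same pair written with an array index, for comparison. */
static int stack2[STACKSIZE];
static int top = 0;               /* index of the next free slot */

void push2(int val)
{
    assert(top < STACKSIZE);
    stack2[top++] = val;
}

int pop2(void)
{
    assert(top > 0);
    return stack2[--top];
}
```

In both versions the postfix ++ uses the old value and then increments, and the prefix -- decrements and then uses the new value, so a push followed by a pop returns the value just pushed.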
On the left are the pointers before sorting, and on the right are the pointers after sorting. The three strings
have not been moved, but by reshuffling the pointers, the three pointers in order now point to the lines
abc
defghi
jklmnopqrst
page 108
Once again, we see a nice simple decomposition of the problem, which might seem deceptively simple
except that when problems are decomposed in simple ways like this, and then implemented faithfully,
they really can be this simple. Deferring the sorting step is an excellent idea, especially if we didn't quite
follow the details of the sorting functions in the previous chapter. (Actually, in practice, we can usually
defer the sorting step forever, since there's often a general-purpose sort routine provided for us
somewhere. C is no exception: a qsort function is a required part of its standard library. For the most
part, the only people who have to write sort routines are programming students and the few people who
get stuck implementing system functions.)
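As a sketch of what using the library routine looks like (the names compare_lines and sort_lines are made up here; the book's program uses its own qsort with a different calling sequence), sorting an array of line pointers with the standard qsort might go like this:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function in the form the library qsort requires.
   Each argument points to an array element; since the elements
   are char *, each argument is really a pointer to a char *. */
static int compare_lines(const void *a, const void *b)
{
    const char *s = *(char *const *)a;
    const char *t = *(char *const *)b;
    return strcmp(s, t);
}

/* Hypothetical helper: sort nlines string pointers in place.
   Only the pointers are rearranged; the strings stay put. */
void sort_lines(char *lineptr[], size_t nlines)
{
    assert(lineptr != NULL);
    qsort(lineptr, nlines, sizeof(lineptr[0]), compare_lines);
}
```

The extra level of indirection in compare_lines is the usual stumbling block: qsort hands the comparison function pointers to the elements, not the elements themselves.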
The main program at the bottom of page 108 looks a bit more elaborate than the pseudocode at the top
of the page, but the essence of the program is the three calls to readlines, qsort, and
writelines. Everything else is declarations, plus an error message which is printed if readlines is
for some reason not able to read the input. Eventually, you should be able to understand why all of the
various declarations are required, but you can skim over them at first.
page 109
The readlines function first calls our old friend getline to read each line into a local array, line.
On page 29 in section 1.9, we saw a program for finding the longest line in the input: it read each line
into a local array line, and kept a copy of the longest line in a second array longest. In that program,
it didn't matter that the input array line was continually overwritten with each new input line, and that
most lines (except the longest one) were lost and forgotten. Here, however, we do need to save all of the
input lines somewhere, so that we can sort them and print them later.
The lines are saved by calling alloc, a function which we wrote in section 5.4 but may have skimmed
over. alloc allocates n bytes of new memory for something which we need to save. Each time we read
another line, we call alloc to allocate some new memory to store it, then call strcpy to copy the line
from the line array to the newly allocated memory. This way, it's okay that the next line is read into the
same line array; we save each line, as it's read, into its own little alloc'ed piece of memory.
Note that memory allocated with a routine such as alloc persists, just as global and static variables
do; it does not disappear when the function that allocated it returns.
Hopefully you're getting used to reading compressed condition statements by now, because here's
another doozy:
if (nlines >= maxlines || (p = alloc(len)) == NULL)
This line checks to make sure we have enough room to store the new line we just read. We need two
things: (1) a slot in the lineptr array to store the pointer, and (2) space allocated by alloc to store
the line itself. If we don't have either of these things, we return -1, indicating that we ran out of memory.
We don't have a slot in the lineptr array if we've already read maxlines lines, and we don't have
room to store the line itself if alloc returns NULL. The subexpression (p = alloc(len)) ==
NULL is equivalent in form to the other assign-and-test combinations we've been using involving
getchar and getline: it assigns alloc's return value to p, then compares it to NULL.
Normally, we might be suspicious of the call alloc(len). Why? Remember that strings are always
terminated by '\0', so the space required to store a string is always one more than the number of
characters in it. Normally, we'll call things like alloc(len + 1), and accidentally calling
alloc(len) is usually a bug. Here, it happens to be okay, because before we copy the line to the
newly-allocated memory, we strip the newline '\n' from the end of it, by overwriting it with '\0',
hence making the string one shorter than len. (Why is the last character in line, namely the '\n', at
line[len-1], and not line[len]?)
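The two cases can be put side by side in a short sketch. Here malloc stands in for the book's alloc, and save_line and save_string are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: save a copy of line, which holds len
   characters, the last of which must be '\n'.  Returns NULL if
   no memory is available. */
char *save_line(char *line, size_t len)
{
    char *p;

    assert(len > 0 && line[len - 1] == '\n');
    line[len - 1] = '\0';     /* strip newline: now len-1 characters */
    p = malloc(len);          /* (len-1) characters plus 1 for '\0' */
    if (p != NULL)
        strcpy(p, line);
    return p;
}

/* The general case: a string of n characters needs n + 1 bytes. */
char *save_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL)
        strcpy(p, s);
    return p;
}
```

In save_line the count len is enough only because one character has just been removed; in save_string, where nothing is removed, the + 1 is essential.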
The fragments
if (nlines >= maxlines ...
and
lineptr[nlines++] = p;
deserve some attention. These represent a common way of filling in an array in C. nlines always holds
the number of lines we've read so far (it's another invariant). It starts out as 0 (we haven't read any lines
yet) and it ends up as the total number of lines we've read. Each time we read a new line, we store the
line (more precisely, a pointer to it) in lineptr[nlines++]. By using postfix ++, we store the
pointer in the slot indexed by the previous value of nlines, which is what we want, because arrays are
0-based in C. The first time through the loop, nlines is 0, so we store a pointer to the first line in
lineptr[0], and then increment nlines to 1. If nlines ever becomes equal to maxlines, we've
filled in all the slots of the array, and we can't use any more (even though, at that point, the highest-filled
cell in the array is lineptr[maxlines-1], which is the last cell in the array, again because arrays
are 0-based). We test for this condition by checking nlines >= maxlines, as a little measure of
paranoia. The test nlines == maxlines would also work, but if we ever accidentally introduce a
bug into the program such that we fill past the last slot without noticing it, we wouldn't want to keep on
filling farther and farther past the end.
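Stripped of everything else, the array-filling idiom looks something like the following sketch. The function append is hypothetical; it plays the role that the two fragments play inside readlines.

```c
#include <assert.h>

/* Hypothetical helper: append item to a[], which has room for max
   entries and currently holds *n of them.  Returns 0 on success,
   or -1 if the array is already full. */
int append(int a[], int max, int *n, int item)
{
    assert(*n >= 0);
    if (*n >= max)            /* >= rather than ==, out of paranoia */
        return -1;
    a[(*n)++] = item;         /* store at the old count, then increment */
    return 0;
}
```

After three successful calls on a 3-element array, *n is 3, the last item sits in a[2], and a fourth call is refused.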
Deep sentences:
Precomputing an array like that might make things a tiny bit easier on the computer (it wouldn't have to loop
through the entire array each time, as it does in the day_of_year function), but it makes it considerably
harder to see what the numbers mean, and to verify that they are correct. The simple table of individual month
lengths is much clearer, and if the computer has to do a bit more grunge work, well, that's what computers are
for. As explained in another book co-authored by Brian Kernighan:
A cumulative table of days must be calculated by someone and checked by someone else. Since
few people are familiar with the number of days up to the end of a particular month, neither
writing nor checking is easy. But if instead we use a table of days per month, we can let the
computer count them for us. (``Let the machine do the dirty work.'')
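Here is a sketch of letting the machine do the dirty work, modeled on the book's day_of_year function. The helper days_before is an addition for this illustration: it computes, on demand, exactly the number that a cumulative table would have had to store.

```c
#include <assert.h>

/* Days per month; row 0 is for non-leap years, row 1 for leap
   years.  Column 0 is unused so that month numbers run 1..12. */
static char daytab[2][13] = {
    {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
    {0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}
};

/* Hypothetical helper: number of days in the year that come
   before the first day of the given month. */
int days_before(int leap, int month)
{
    int i, total = 0;

    assert(month >= 1 && month <= 12);
    for (i = 1; i < month; i++)
        total += daytab[leap][i];
    return total;
}

/* Day of the year, from year, month, and day of the month. */
int day_of_year(int year, int month, int day)
{
    int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;

    return days_before(leap, month) + day;
}
```

Anyone can check the table by reciting ``thirty days hath September''; checking that 59 or 334 is right takes a pencil.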
The bottom of page 112 begins to get confusing. The ``number of rows'' of an array like daytab ``is
irrelevant'' when passed to a function such as the hypothetical f because the compiler doesn't need to know the
number of rows when calculating subscripts. It does need to know the number of columns or ``width,'' because
that's how it knows that the second element on the second row of a 10-column array is actually 11 cells past the
beginning of the array, which is essentially what it needs to know when it goes off and actually accesses the
array in memory. But it doesn't need to know how long the overall array is, as long as we promise not to run off
the end of it, and that's always up to us. (This is why we haven't specified the array sizes in the definitions of
functions such as getline on pages 29 and 69, or atoi on pages 43, 61, and 73, or readlines on page
109, although we did carry the array size as a separate argument to getline and readlines, to assist us in
our promise not to run off the end.)
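The subscript arithmetic can be checked with a small sketch. The name cells_past_start and the width NCOLS are assumptions made for this illustration. Counting from zero, a[1][1] in a 10-column array lies 1*10 + 1 = 11 cells past a[0][0], and the function needs only the width to work that out.

```c
#include <assert.h>
#include <stddef.h>

#define NCOLS 10      /* hypothetical width of each row */

/* Hypothetical helper: how many cells past the start of the array
   is a[row][col]?  The distance is measured in bytes and divided
   by the cell size.  Note that the parameter declaration gives the
   number of columns but not the number of rows. */
size_t cells_past_start(int a[][NCOLS], int row, int col)
{
    size_t bytes = (size_t)((char *)&a[row][col] - (char *)a);

    /* the same figure, by the formula the compiler uses */
    assert(bytes / sizeof(int) == (size_t)row * NCOLS + (size_t)col);
    return bytes / sizeof(int);
}
```

The same function works unchanged for a 3-row array or a 300-row array, which is the sense in which the number of rows ``is irrelevant.''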
The third version of f on page 112 comes about because of the ``gentle fiction'' involving array parameters. We
learned on page 99 that functions don't really receive arrays as parameters; they receive pointers (since any array
passed by the caller decays immediately to a pointer). On page 39 we wrote a strlen function as
int strlen(char s[])
but on page 99 we rewrote it as
int strlen(char *s)
which is closer to the way the compiler sees the situation. (In fact, when we write int strlen(char
s[]), the compiler essentially rewrites it as int strlen(char *s) for us.) In the same way, a function
declared as
f(int daytab[][13])
can be rewritten by us (or if not, is rewritten by the compiler) to
f(int (*daytab)[13])
which declares the daytab parameter as a pointer-to-array-of-13-ints. Here we see two things: (1) the rewrite
which changes an array parameter to a pointer parameter happens only once (we end up with a pointer to an
array, not a pointer to a pointer), and (2) the syntax for pointers to arrays is a bit messy, because of some
required extra parentheses, as explained in the text.
If this seems obscure, don't worry about it too much; just declare functions with array parameters matching the
arrays you call them with, like
f(int daytab[2][13])
and let the compiler worry about the rewriting.
Deep sentence:
More generally, only the first dimension (subscript) of an array is free; all the others have to be
specified.
This just says what we said already: when declaring an array as a function parameter, you can leave off the first
dimension because it is the overall length and not knowing it causes no immediate problems (unless you
accidentally go off the end). But the compiler always needs to know the other dimensions, so that it knows how
the rows and columns line up.
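As a sketch of the equivalence (row_total is a hypothetical function, not one from the book), all three spellings of the parameter can be declared for one and the same function, and the compiler accepts them as agreeing with each other:

```c
#include <assert.h>

/* Three declarations of the same function.  The compiler rewrites
   the first two into the third, so they are all compatible. */
int row_total(int daytab[2][13], int row);
int row_total(int daytab[][13], int row);
int row_total(int (*daytab)[13], int row);

/* Hypothetical function: add up one row of a 13-column table. */
int row_total(int (*daytab)[13], int row)
{
    int i, total = 0;

    assert(row >= 0);
    for (i = 0; i < 13; i++)
        total += daytab[row][i];   /* needs the width, not the row count */
    return total;
}
```

What would not be accepted is a declaration such as int daytab[][], which leaves out the width, or int **daytab, which describes a pointer to a pointer rather than a pointer to an array.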
if you don't feel comfortable with it. I used to try to write code like this, because it seemed to be what
everybody else did, but it never sat well, and it was always just a bit too hard to write and to prove
correct. I've reverted to simple, obvious loops like
int argi;
char *sep = "";
for (argi = 1; argi < argc; argi++) {
    printf("%s%s", sep, argv[argi]);
    sep = " ";
}
printf("\n");
Often, it's handy to have the original argc and argv around later, anyway. (This loop also shows
another way of handling space separators.)
page 116
Page 116 shows a simple improvement on the matching-lines program first presented on page 69; page
117 adds a few more improvements. The differences between page 69 and page 116 are that the pattern is
read from the command line, and strstr is used instead of strindex. The difference between page
116 and page 117 is the handling of the -n and -x options. (The next obvious improvement, which we're
not quite in a position to make yet, is to allow a file name to be specified on the command line, rather
than always reading from the standard input.)
page 117
Several aspects of this code deserve note.
The line
while (c = *++argv[0])
is not in error. (In isolation, it might look like an example of the classic error of accidentally writing =
instead of == in a comparison.) What it's actually doing is another version of a combined set-and-test: it
assigns the next character pointed to by argv[0] to c, and compares it against '\0'. You can't see the
comparison against '\0', because it's implicit in the usual interpretation of a nonzero expression as
``true.'' An explicit test would look like this:
while ((c = *++argv[0]) != '\0')
argv[0] is a pointer to a character in a string; ++argv[0] increments that pointer to point to the next
character in the string; and *++argv[0] increments the pointer while returning the next character
pointed to. argv[0] is not the first string on the command line, but rather whichever one we're looking
at now, since elsewhere in the loop we increment argv itself.
Some of the extra complexity in this loop is to make sure that it can handle both
-x -n
and
-xn
In pseudocode, the option-parsing loop is
for ( each word on the command line )
    if ( it begins with '-' )
        for ( each character c in that word )
            switch ( c )
                ...
For comparison, here is another way of writing effectively the same loop:
int argi;
char *p;
for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (p = &argv[argi][1]; *p != '\0'; p++)
        switch (*p) {
        case 'x':
            ...
This uses array notation to access the words on the command line, but pointer notation to access the
characters within a word (more specifically, a word that begins with '-'). We could also use array
notation for both:
int argi, chari;
for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (chari = 1; argv[argi][chari] != '\0'; chari++)
        switch (argv[argi][chari]) {
        case 'x':
            ...
In either case, the inner, character loop starts at the second character (index [1]), not the first, because
the first character (index [0]) is the '-'.
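Filled out into a complete function, the first of these loops might look like the sketch below. The function name parse_options, its return convention, and the use of pointer arguments for the two flags are assumptions made for this illustration; the book's program does the same work inline in main.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scan the leading words that begin with '-',
   setting *number for 'n' and *except for 'x'.  Returns the index
   of the first non-option word, or -1 on an unknown option. */
int parse_options(int argc, char *argv[], int *number, int *except)
{
    int argi;
    char *p;

    assert(number != NULL && except != NULL);
    *number = *except = 0;
    for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
        for (p = &argv[argi][1]; *p != '\0'; p++)
            switch (*p) {
            case 'x':
                *except = 1;
                break;
            case 'n':
                *number = 1;
                break;
            default:
                return -1;        /* illegal option */
            }
    return argi;
}
```

Because argv is indexed rather than incremented, the original argc and argv are still intact afterwards, and the returned index says where the pattern is.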
It's easy to see how the -n option is implemented. If -n is seen, the number flag is set to 1 (a.k.a.
``true''), and later, in the line-matching loop, each time a line is printed, if the number flag is true, the
line number is printed first. It's harder to see how -x works. An except flag is set to 1 if -x is present,
but how is except used? It's buried down there in the line
if ((strstr(line, *argv) != NULL) != except)
What does that mean? The subexpression
(strstr(line, *argv) != NULL)
is 1 if the line contains the pattern, and 0 if it does not. except is 0 if we should print matching lines,
and 1 if we should print non-matching lines. What we've actually implemented here is an ``exclusive
OR,'' which is ``if A or B but not both.'' Other ways of writing this would be
int matched = (strstr(line, *argv) != NULL);
if ((matched && !except) || (!matched && except)) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or
int matched = (strstr(line, *argv) != NULL);
if (except ? !matched : matched) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or
Chapter 6: Structures
page 127
There's one other piece of motivation behind structures that it's useful to discuss. Suppose we didn't have
structures (or didn't know what they were or how to use them). Suppose we wanted to implement payroll
records. We might set up a bunch of parallel arrays, holding the names, mailing addresses, social security
numbers, and salaries of all of our employees:
char *name[100];
char *address[100];
long ssn[100];
float salary[100];
The idea here is that name[0], address[0], ssn[0], and salary[0] would describe one
employee, array slots with subscript [1] would describe the second employee, etc. There are at least two
problems with this scheme: first, if we someday want to handle more than 100 employees, we have to
remember to change the size of several arrays. (Using a symbolic constant like
#define MAXEMPLOYEES 100
would certainly help.)
More importantly, there would be no easy way to pass around all the information associated with a single
employee. Suppose we wanted to write the function print_employee, which will print all the
information associated with a particular employee. What arguments would this function take? We could
pass it the index to use to retrieve the information from the arrays, but that would mean that all of the
arrays would have to be global. We could pass the function an individual name, address, SSN, and salary,
but that would mean that whenever we added a new piece of information to the database (perhaps next
week we'll want to keep track of employees' shoe sizes), we would have to add another argument to the
print_employee function, and change all of the calls. (Pretty soon, the number of arguments to the
print_employee function would become unwieldy.) What we'd really like is a way to encapsulate all
of the data about a single employee into a single data structure, so we could just pass that data structure
around.
The right solution to this problem, in languages such as C which support the idea, is to define a structure
describing an employee. We can make one array of these structures to describe all the employees, and we
can pass around single instances of the structure where they're needed.
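As a preview, here is a minimal sketch of what that might look like. The structure tag employee, the array name employees, and the output format are all made up for this illustration; the structure syntax itself is explained in the sections that follow.

```c
#include <assert.h>
#include <stdio.h>

#define MAXEMPLOYEES 100

/* Hypothetical structure gathering one employee's data, which
   used to be spread across four parallel arrays. */
struct employee {
    char *name;
    char *address;
    long ssn;
    float salary;
};

struct employee employees[MAXEMPLOYEES];    /* one array instead of four */

/* Print one employee.  If a member is added to struct employee
   later, this function's argument list does not change. */
void print_employee(struct employee e)
{
    assert(e.name != NULL && e.address != NULL);
    printf("name:    %s\n", e.name);
    printf("address: %s\n", e.address);
    printf("SSN:     %ld\n", e.ssn);
    printf("salary:  %.2f\n", e.salary);
}
```

A call such as print_employee(employees[0]) passes everything about the first employee in one argument, and none of the data has to be global for the function's sake.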
section 6.1: Basics of Structures
/* 1 */
/* 2 */
/* 3 */
The list of members (if present) describes what the new structure ``looks like inside.'' The list of
variables (if present) is (obviously) the list of variables of this new type which we're defining and which
the rest of the program will use. The tag (if present) is just an arbitrary name for the structure type itself
(not for any variable we're defining). The tag is used to associate a structure definition (as in fragment 1)
with a later declaration of variables of that same type (as in fragment 2).
One thing to beware of: when you declare the members of a structure without defining any variables,
always remember the trailing semicolon, as shown in fragment 1 above. (If you forget it, the compiler
will wait until the next thing it finds in your source file, and try to define that as a variable or function of
the structure type.)
/* WRONG */
point thepoint;
/* WRONG */
and
would be errors.
The above list of things the language lets us do with structures lets us keep them and move them around,
but there isn't really anything defined by the language that we can do with structures. It's up to us to
define any operations on structures, usually by writing functions. (The addpoint function on page 130
is a good example. It will make a bit more sense if you think of it as adding not isolated points, but rather
vectors. [We can't add Seattle plus Los Angeles, but we could add (two miles south, one mile east) plus
(one mile east, two miles north).])
page 131
As an aside, how safe are the min() and max() macros defined at the top of page 131, with respect to
the criteria discussed on pages 15 and 16 of the notes on section 4.11.2 (page 90 in the text)?
The precise meaning of the ``shorthand'' -> operator is that sp->m is, by definition, equivalent to
(*sp).m, for any structure pointer sp and member m of the pointed-to structure.
structures that is linked together in some way, and as we're about to see, we're going to use a set of linked
structures to implement a binary tree. Just as we talk about a ``cell'' or ``element'' of an array, we talk
about a ``node'' in a tree or linked list.
When we look at the description of the algorithm for finding out whether a word is already in the tree, we
may begin to see why the binary tree is more efficient than the linear list. When searching through a
linear list, each time we discard a value that's not the one we're looking for, we've only discarded that one
value; we still have the entire rest of the list to search. In a binary tree, however, whenever we move
down the tree, we've just eliminated half of the tree. (We might say that a binary tree is a data structure
which makes binary search automatic.) Consider guessing a number between one and a hundred by
asking ``Is it 1? Is it 2? Is it 3?'' etc., versus asking ``Is it less than 50? Is it greater than 25? Is it less than
12?''
page 140
Make sure you're comfortable with the idea of a structure which contains pointers to other instances of
itself. If you draw some little box-and-arrow diagrams for a binary tree, the idea should fall into place
easily. (As the authors point out, what would be impossible would be for a structure to contain not a
pointer but rather another entire instance of itself, because that instance would contain another, and
another, and the structure would be infinitely big.)
page 141
Note that addtree accepts as an argument the tree to be added to, and returns a pointer to a tree,
because it may have to modify the tree in the process of adding a new node to it. If it doesn't have to
modify the tree (more precisely, if it doesn't have to modify the top or root of the tree) it returns the same
pointer it was handed.
Another thing to note is the technique used to mark the edges or ``leaves'' of the tree. We said that a null
pointer was a special pointer value guaranteed not to point anywhere, and it is therefore an excellent
marker to use when a left or right subtree does not exist. Whenever a new node is built, addtree
initializes both subtree pointers (``children'') to null pointers. Later, another chain of calls to addtree
may replace one or the other of these with a new subtree. (Eventually, both might be replaced.)
If you don't completely see how addtree works, leave it for a moment and look at treeprint on the
next page first.
The bottom of page 141 discusses a tremendously important issue: memory allocation. Although we only
have one copy of the addtree function (which may call itself recursively many times), by the time
we're done, we'll have many instances of the tnode structure (one for each unique word in the input).
Therefore, we have to arrange somehow that memory for these multiple instances is properly allocated.
We can't use a local variable of type struct tnode in addtree, because local variables disappear
when their containing function returns. We can't use a static variable of type struct tnode in
addtree, or a global variable of type struct tnode, because then we'd have only one node in the
whole program, and we need many.
What we need is some brand-new memory. Furthermore, we have to arrange it so that each time
addtree builds a brand-new node, it does so in another new piece of brand-new memory. Since each
node contains a pointer (char *) to a string, the memory for that string has to be dynamically allocated,
too. (If we didn't allocate memory for each new string, all the strings would end up being stored in the
word array in main on page 140, and they'd step all over each other, and we'd only be able to see the
last word we read.)
For the moment, we defer the questions of exactly where this brand-new memory is to come from by
defining two functions to do it. talloc is going to return a (pointer to a) brand-new piece of memory
suitable for holding a struct tnode, and strdup is going to return a (pointer to a) brand-new piece
of memory containing a copy of a string.
page 142
treeprint is probably the cleanest, simplest recursive function there is. If you've been putting off
getting comfortable with recursive functions, now is the time.
Suppose it's our job to print a binary tree: we've just been handed a pointer to the base (root) of the tree.
What do we do? The only node we've got easy access to is the root node, but as we saw, that's not the
first or the last element to print or anything; it's generally a random node somewhere in the middle of the
eventual sorted list (distinguished only by the fact that it happened to be inserted first). The node that
needs to be printed first is buried somewhere down in the left subtree, and the node to print just before
the node we've got easy access to is buried somewhere else down in the left subtree, and the node to print
next (after the one we've got) is buried somewhere down in the right subtree. In fact, everything down in
the left subtree is to be printed before the node we've got, and everything down in the right subtree is to
be printed after. A pseudocode description of our task, therefore, might be
print the left subtree (in order)
print the node we're at
print the right subtree (in order)
How can we print the left subtree, in order? The left subtree is, in general, another tree, so printing it out
sounds about as hard as printing an entire tree, which is what we were supposed to do. In fact, it's exactly
as hard: it's the same problem. Are we going in circles? Are we getting anywhere? Yes, we are: the left
subtree, even though it is still a tree, is at least smaller than the full tree we started with. The same is true
of the right subtree. Therefore, we can use a recursive call to do the hard work of printing the subtrees,
and all we have to do is the easy part: print the node we're at. The fact that the subtrees are smaller gives
us the leverage we need to make a recursive algorithm work.
In any recursive function, it is (obviously) important to terminate the recursion, that is, to make sure that
the function doesn't recursively call itself forever. In the case of binary trees, when you reach a ``leaf'' of
the tree (more precisely, when the left or right subtree is a null pointer), there's nothing more to visit, so
the recursion can stop. We can test for this in two different ways, either before or after we make the
``last'' recursive call:
void treeprint(struct tnode *p)
{
    if(p->left != NULL)
        treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    if(p->right != NULL)
        treeprint(p->right);
}
or
void treeprint(struct tnode *p)
{
    if(p == NULL)
        return;
    treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    treeprint(p->right);
}
Sometimes, there's little difference between one approach and the other. Here, though, the second
approach (which is equivalent to the code on page 142) has a distinct advantage: it will work even if the
very first call is on an empty tree (in this case, if there were no words in the input). As we mentioned
earlier, it's extremely nice if programs work well at their boundary conditions, even if we don't think
those conditions are likely to occur.
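The same test-first shape carries over to any other in-order walk of the tree. As a sketch, here is a hypothetical companion to treeprint, called treewords, which stores the words in an array instead of printing them. The structure is declared as in the book; everything else is an assumption made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* Hypothetical companion to treeprint: store the tree's words, in
   order, in out[] starting at slot n, and return the new count.
   The caller must make out[] big enough.  Because the null test
   comes first, an empty tree simply yields no words. */
int treewords(struct tnode *p, char *out[], int n)
{
    assert(out != NULL && n >= 0);
    if (p == NULL)
        return n;
    n = treewords(p->left, out, n);       /* everything that sorts earlier */
    out[n++] = p->word;                   /* the node we're at */
    return treewords(p->right, out, n);   /* everything that sorts later */
}
```

Calling treewords(NULL, out, 0) is perfectly safe and returns 0, which is the boundary condition the first style of test would have stumbled over.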
(One more thing to notice is that it's quite possible for a node to have a left subtree but not a right, or vice
versa; one example is the node labeled ``of'' in the tree on page 139.)
Another impressive thing about a recursive treeprint function is that it's not just a way of writing it,
or a nifty way of writing it; it's really the only way of writing it. You might try to figure out how to write
a nonrecursive version. Once you've printed something down in the left subtree, how do you know where
to go back up to? Our struct tnode only has pointers down the tree, there aren't any pointers back to
the ``parent'' of each node. If you write a nonrecursive version, you have to keep track of how you got to
where you are, and it's not enough to keep track of the parent of the node you're at; you have to keep a
stack of all the nodes you've passed down through. When you write a recursive version, on the other
hand, the normal function-call stack essentially keeps track of all this for you.
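To make that bookkeeping concrete, here is a minimal sketch of a nonrecursive traversal which keeps its own stack of the nodes it has passed down through. The name treeprint_iter, the fixed MAXDEPTH limit, and the return value are assumptions made for this illustration (they are not from the text), and a real version would have to cope with trees deeper than the limit.

```c
#include <stdio.h>

#define MAXDEPTH 100   /* assumed limit on tree depth, for this sketch only */

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* in-order traversal using an explicit stack instead of recursion;
   returns the number of nodes printed, or -1 if the stack overflowed */
int treeprint_iter(struct tnode *p)
{
    struct tnode *stack[MAXDEPTH];
    int sp = 0;
    int nprinted = 0;

    while (p != NULL || sp > 0) {
        /* walk down to the left, remembering each node we pass through */
        while (p != NULL) {
            if (sp >= MAXDEPTH)
                return -1;
            stack[sp++] = p;
            p = p->left;
        }
        /* nothing more to the left: go back up to the most recent node */
        p = stack[--sp];
        printf("%4d %s\n", p->count, p->word);
        nprinted++;
        p = p->right;
    }
    return nprinted;
}
```

Notice how much of the function is devoted to managing the stack; in the recursive version, all of that disappears into the ordinary function-call mechanism.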
We now return to the problem of dynamic memory allocation. The basic approach builds on something
we've been seeing glimpses of for a few chapters now: we use a general-purpose function which returns a
pointer to a block of n bytes of memory. (The authors presented a primitive version of such a function in
section 5.4, and we used it in the sorting program in section 5.6.) Our problem is then reduced to (1)
remembering to call this allocation function when we need to, and (2) figuring out how many bytes we
need. Problem 1 is stubborn, but problem 2 is solved by the sizeof operator we met in section 6.3.
You don't need to worry about all the details of the ``digression on a problem related to storage
allocators.'' The vast majority of the time, this problem is taken care of for you, because you use the
system library function malloc.
The problem of malloc's return type is not quite as bad as the authors make it out to be. In ANSI C, the
void * type is a ``generic'' pointer type, specifically intended to be used where you need a pointer
which can be a pointer to any data type. Since void * is never a pointer to anything by itself, but is
always intended to be converted (``coerced'') into some other type, it turns out that a cast is not strictly
required: in code like
struct tnode *tp = malloc(sizeof(struct tnode));
or
return malloc(sizeof(struct tnode));
the compiler is willing to convert the pointer types implicitly, without warning you and without requiring
you to insert explicit casts. (If you feel more comfortable with the casts, though, you're welcome to leave
them in.)
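As a small sketch of what this looks like in practice, here is a cast-free talloc along the lines of the book's function of the same name, plus a caller which checks the result. The helper make_node is hypothetical and is included only to show the implicit conversion in an initialization.

```c
#include <stdlib.h>

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* allocate one node; the void * from malloc converts implicitly */
struct tnode *talloc(void)
{
    return malloc(sizeof(struct tnode));
}

/* allocate and initialize a node, or return NULL if out of memory
   (hypothetical helper, for illustration) */
struct tnode *make_node(void)
{
    struct tnode *tp = talloc();    /* no cast needed here, either */

    if (tp == NULL)
        return NULL;
    tp->word = NULL;
    tp->count = 0;
    tp->left = tp->right = NULL;
    return tp;
}
```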
page 143
strdup is a handy little function that does two things: it allocates enough memory for one of your
strings, and it copies your string to the new memory, returning a pointer to it. (It encapsulates a pattern
which we first saw in the readlines function on page 109 in section 5.6.) Note the +1 in the call to
malloc! Accidentally calling malloc(strlen(s)) is an easy but serious mistake.
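Here is a minimal sketch of such a function, to show where the +1 goes. It is named my_strdup (an assumed name) so that it can't collide with a strdup your system's library may already provide; it is not necessarily identical to the version in the text.

```c
#include <stdlib.h>
#include <string.h>

/* make a copy of s in newly allocated memory; returns NULL if out of memory */
char *my_strdup(const char *s)
{
    char *p = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */

    if (p != NULL)
        strcpy(p, s);
    return p;
}
```

The caller owns the returned memory and is responsible for passing it to free eventually.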
As we mentioned at the beginning of chapter 5, memory allocation can be hard to get right, and is at the
root of many difficulties and bugs in many C programs. Here are some rules and other things to
remember:
1. Make sure you know where things are allocated, either by the compiler or by you. Watch out for
   things like the local line array we've been tending to use with getline, and the local word
   array on page 140. When a function writes to an array or a pointer supplied by the caller, it
   depends on the caller to have allocated storage correctly. When you're the caller, make sure you
   pass a valid pointer! Make sure you understand why
       char *ptr;
       getline(ptr, 100);
   is wrong and can't work. (For one thing: what does that 100 mean? If getline is allowed
   to read at most 100 characters, where have we allocated the 100 characters that getline is
   allowed to write to?)
2. Be aware of any situations where a single array or data structure is used to store multiple different
   things, in succession. Think again about the local line array we've been tending to use with
   getline, and the local word array on page 140. These arrays are overwritten with each new
   line, word, etc., so if you need to keep all of the lines or words around, you must copy them
   immediately to allocated memory (as the line-sorting program on pages 108-9 in section 5.6 did,
   but as the longest line program on page 29 in section 1.9 and the pattern-matching programs on
   page 69 in section 4.1 and pages 116-7 in section 5.10 did not have to do).
3. Make sure you allocate enough memory! If you allocate memory for an array of 10 things, don't
   accidentally store 11 things in it. If you have a string that's 10 characters long, make sure you
   always allocate 11 characters for it (including one for the terminating '\0').
4. When you free (deallocate) memory, make sure that you don't have any pointers lying around
   which still point to it (or if you do, make sure not to use them any more).
5. Always check the return value from memory-allocation functions. Memory is never infinite:
   sooner or later, you will run out of memory, and allocation functions generally return a null
   pointer when this happens.
6. When you're not using dynamically-allocated memory any more, do try to free it, if it's convenient
   to do so and the program's not just about to exit. Otherwise, you may eventually have so much
   memory allocated to stuff you're not using any more that there's no more memory left for new
   stuff you need to allocate. (However, on all but a few broken systems, all memory is
   automatically and definitively returned to the operating system when your program exits, so if one
   of your programs doesn't free some memory, you shouldn't have to worry that it's wasted forever.)
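Several of these rules tend to show up together: the reused buffer, the room for the '\0', and the check of the allocator's return value. The sketch below passes a series of strings through one local buffer, the way a line-reading loop would, and keeps a separate allocated copy of each. The names copy_string and save_all, and the 100-character buffer, are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* copy s into freshly allocated memory, leaving room for the '\0' */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL)              /* always check the return value */
        strcpy(p, s);
    return p;
}

/* Pass each of the n input strings through one reused buffer and save
   a separate copy of each in saved[].  Returns 0 on success, or -1 if
   memory ran out (after freeing the copies made so far). */
int save_all(const char *input[], int n, char *saved[])
{
    char buf[100];              /* overwritten on every pass */
    int i;

    for (i = 0; i < n; i++) {
        strncpy(buf, input[i], sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        saved[i] = copy_string(buf);    /* not saved[i] = buf! */
        if (saved[i] == NULL) {
            while (i > 0)
                free(saved[--i]);
            return -1;
        }
    }
    return 0;
}
```

If the loop had stored buf itself in saved[i], every element would have ended up pointing at the same array, and that array would have disappeared when save_all returned.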
Unfortunately, checking the return values from memory allocation functions (point 5 above) requires a
few more lines of code, so it is often left out of sample code in textbooks, including this one. Here are
versions of main and addtree for the word-counting program (pages 140-1 in the text) which do
check for out-of-memory conditions:
/* word frequency count */
main()
{
    struct tnode *root;
    char word[MAXWORD];

    root = NULL;
    while (getword(word, MAXWORD) != EOF) {
        if (isalpha(word[0])) {
            root = addtree(root, word);
            if(root == NULL) {
                printf("out of memory\n");
                return 1;
            }
        }
    }
    treeprint(root);
    return 0;
}

struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {            /* a new word has arrived */
        p = talloc();           /* make a new node */
        if (p == NULL)
            return NULL;
        p->word = strdup(w);
        if (p->word == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;             /* repeated word */
    else if (cond < 0) {        /* less than: into left subtree */
        p->left = addtree(p->left, w);
        if(p->left == NULL)
            return NULL;
    }
    else {                      /* greater than: into right subtree */
        p->right = addtree(p->right, w);
        if(p->right == NULL)
            return NULL;
    }
    return p;
}
In practice, many programmers would collapse the calls and tests:
struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {            /* a new word has arrived */
        if ((p = talloc()) == NULL)
            return NULL;
        if ((p->word = strdup(w)) == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;             /* repeated word */
    else if (cond < 0) {        /* less than: into left subtree */
        if ((p->left = addtree(p->left, w)) == NULL)
            return NULL;
    }
    else {                      /* greater than: into right subtree */
        if ((p->right = addtree(p->right, w)) == NULL)
            return NULL;
    }
    return p;
}
you have complete control over what you do next.'' Here, ``what you do next'' is try calling sscanf
again, on the very same input string (thus effectively backing up to the very beginning of it), using a
different format string, to try parsing the input a different way.
something you'll want to learn about. It's extremely annoying for a program to say ``can't open file''
without saying why. (Some particularly unhelpful programs don't even tell you which file they couldn't
open.)
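As a sketch of a more helpful message, the function below names the file and also says why the open failed. It assumes that fopen sets errno when it fails, which is true on most systems (POSIX requires it) but is not promised by the C Standard itself; the name open_or_complain is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* try to open a file for reading; on failure, say which file and why
   (hypothetical helper, for illustration) */
FILE *open_or_complain(const char *filename)
{
    FILE *fp = fopen(filename, "r");

    if (fp == NULL)
        fprintf(stderr, "can't open %s: %s\n", filename, strerror(errno));
    return fp;
}
```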
On this page we learn about four new functions, getc, putc, fprintf, and fscanf, which are just
like functions that we've already been using except that they let you specify a file pointer to tell them
which file (or other I/O stream) to read from or write to. (Note that for putc, the extra FILE *
argument comes last, while for fprintf and fscanf, it comes first.)
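The argument order is easy to get backwards, so here is a short sketch which uses all four functions. The helper names copy_stream and roundtrip are assumptions made for this illustration; roundtrip expects a stream which has been opened for both reading and writing.

```c
#include <stdio.h>

/* copy everything from ifp to ofp; returns the number of characters copied */
long copy_stream(FILE *ifp, FILE *ofp)
{
    int c;
    long n = 0;

    while ((c = getc(ifp)) != EOF) {
        putc(c, ofp);           /* FILE * argument comes last */
        n++;
    }
    return n;
}

/* write a number to fp, rewind, and read it back;
   returns 1 if the value read matches the value written */
int roundtrip(FILE *fp, int value)
{
    int got = 0;

    fprintf(fp, "%d\n", value); /* FILE * argument comes first */
    rewind(fp);
    if (fscanf(fp, "%d", &got) != 1)
        return 0;
    return got == value;
}
```

Calling copy_stream(stdin, stdout) gives you the familiar copy-input-to-output loop.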
page 162
cat is about the most basic and important file-handling program there is (even if its name is a bit
obscure). The cat program on page 162 is a bit like the ``hello, world'' program on page 6--it may seem
trivial, but if you can get it to work, you're over the biggest first hurdle when it comes to handling files at
all.
Compare the cat program (and especially its filecopy function) to the file copying program on page
16 of section 1.5.1--cat is essentially the same program, except that it accepts filenames on the
command line.
Since the authors advise calling fclose in part to ``flush the buffer in which putc is collecting
output,'' you may wonder why the program at the top of the page does not call fclose on its output
stream. The reason can be found in the next sentence: an implicit fclose happens automatically for any
streams which remain open when the program exits normally.
In general, it's a good idea to close any streams you open, but not to close the preopened streams such as
stdin and stdout. (Since ``the system'' opened them for you as your program was starting up, it's
appropriate to let it close them for you as your program exits.)
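Here is a minimal sketch of that advice for a stream the program opens itself. The function name write_message is hypothetical. Since fclose is the point at which buffered output actually gets written, the sketch checks its return value, too.

```c
#include <stdio.h>

/* write a short message to the named file;
   returns 0 on success, -1 on any error (hypothetical helper) */
int write_message(const char *filename, const char *msg)
{
    FILE *fp = fopen(filename, "w");

    if (fp == NULL)
        return -1;
    if (fputs(msg, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    /* we opened this stream, so we close it; fclose flushes the buffer */
    if (fclose(fp) == EOF)
        return -1;
    return 0;
}
```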
sequentially'' link leads to the first question; you can then follow the ``next'' link at the bottom of each
question's page to read through all of the questions and answers sequentially.
Steve Summit
scs@eskimo.com
17. Style
18. Tools and Resources
19. System Dependencies
20. Miscellaneous
Bibliography
Acknowledgements
All Questions
Read Sequentially
to German, by Jochen Schoof et al. (If that link doesn't work, try this one.)
to Japanese, by Kinichi Kitano. (I don't know of a URL, but it is or was posted regularly to
fj.comp.lang.c, and has been published by Toppan, ISBN 4-8101-8097-2.)
Seong-Kook Cin has completed a Korean translation, which is at
http://pcrc.hongik.ac.kr/~cinsk/cfaqs/.
A French C FAQ list (not a direct translation of this one) is at http://www.istyinfo.uvsq.fr/~rumeau/fclc/.
page          question
----          --------
front cover
xxix
              1.1
                  [...] is at least 8 bits
                  [...] is at least 16 bits
                  [...] is at least 16 bits
                  [...] is at least 32 bits
                  and, in C99,
                  sizeof(long long) is at least 64 bits
3-4           1.3
              1.4
              1.7
              1.7
11            1.14
13            1.15
18            1.22
20            1.24
                  #include "file1.h"
                  [Thanks to Jon Jagger]
23            1.29
24            1.29
24            1.29
24            1.29
25            1.29
32            2.4
33-36         2.6
38            2.10
40            2.12
43            2.20
              3.3
51            3.4
52            3.6
57            3.12
68            4.5
72-3          4.10
73            4.11
75            4.13
84            5.8
95            6.2
104-5         6.15
105-7         6.16
110           6.19
115           7.1
121           7.9
              7.10
132           7.30
134           7.32
136           8.1
136           8.2
143           9.2
151           10.4
152           10.6
158           10.15
161           10.21
163-4         10.29
164-5         10.27
168           11.1
169-70        11.2
174           11.10
175           11.10
180           11.19
182           11.25
183-4         11.27
186           11.29
186           11.29
189           11.33
198           12.11
201           12.16
205           12.19
207-8         12.21
212           12.28
213           12.30
224           13.4
225           13.6
226           13.6
227           13.6
234           13.14
240           13.17
                  The code
                      srand((unsigned int)time((time_t *)NULL));
                  though popular and generally effective is, alas,
                  not strictly conforming. It's theoretically
                  possible that time_t could be defined as a
                  floating-point type under some implementation,
                  and that the time_t values returned by time()
                  could therefore exceed the range of an unsigned
                  int in such a way that a well-defined cast to
                  (unsigned int) is not possible.
242-3         13.20
244           13.21
              14.5
253           14.8
253           14.9
254-5         14.11
260-1         15.4
264           15.5
269-71        15.12
274           16.4
276           16.7
287           18.1
              18.13
294           18.14
296           18.16
308           19.11
310           19.14
314           19.23
315           19.25
317           19.27
318           19.30
319           19.31
324           19.42
339-40
342-44        20.17
350           20.21
355           20.29
363
368
370-1
376
379
382
back cover
Question 20.40
Where can I get extra copies of this list? What about back issues?
An up-to-date copy may be obtained from aw.com in directory xxx or ftp.eskimo.com in directory
u/s/scs/C-faq/. You can also just pull it off the net; it is normally posted to comp.lang.c on the first of
each month, with an Expires: line which should keep it around all month. A parallel, abridged version is
available (and posted), as is a list of changes accompanying each significantly updated version.
The various versions of this list are also posted to the newsgroups comp.answers and news.answers .
Several sites archive news.answers postings and other FAQ lists, including this one; two sites are
rtfm.mit.edu (directories pub/usenet/news.answers/C-faq/ and pub/usenet/comp.lang.c/) and ftp.uu.net
(directory usenet/news.answers/C-faq/). An archie server (see question 18.16) should help you find
others; ask it to ``find C-faq''. If you don't have ftp access, a mailserver at rtfm.mit.edu can mail you FAQ
lists: send a message containing the single word help to mail-server@rtfm.mit.edu . See the meta-FAQ
list in news.answers for more information.
An extended version of this FAQ list is being published by Addison-Wesley as C Programming FAQs:
Frequently Asked Questions (ISBN 0-201-84519-9). It should be available in November 1995.
This list is an evolving document of questions which have been Frequent since before the Great
Renaming, not just a collection of this month's interesting questions. Older copies are obsolete and don't
contain much, except the occasional typo, that the current list doesn't.
Question 18.16
Where and how can I get copies of all these freely distributable programs?
As the number of available programs, the number of publicly accessible archive sites, and the number of
people trying to access them all grow, this question becomes both easier and more difficult to answer.
There are a number of large, public-spirited archive sites out there, such as ftp.uu.net, archive.umich.edu,
oak.oakland.edu, sumex-aim.stanford.edu, and wuarchive.wustl.edu, which have huge amounts of
software and other information all freely available. For the FSF's GNU project, the central distribution
site is prep.ai.mit.edu . These well-known sites tend to be extremely busy and hard to reach, but there are
also numerous ``mirror'' sites which try to spread the load around.
On the connected Internet, the traditional way to retrieve files from an archive site is with anonymous
ftp. For those without ftp access, there are also several ftp-by-mail servers in operation. More and more,
the world-wide web (WWW) is being used to announce, index, and even transfer large data files. There
are probably yet newer access methods, too.
Those are some of the easy parts of the question to answer. The hard part is in the details--this article
cannot begin to track or list all of the available archive sites or all of the various ways of accessing them.
If you have access to the net at all, you probably have access to more up-to-date information about active
sites and useful access methods than this FAQ list does.
The other easy-and-hard aspect of the question, of course, is simply finding which site has what you're
looking for. There is a tremendous amount of work going on in this area, and there are probably new
indexing services springing up every day. One of the first was ``archie'': for any program or resource
available on the net, if you know its name, an archie server can usually tell you which anonymous ftp
sites have it. Your system may have an archie command, or you can send the mail message ``help'' to
archie@archie.cs.mcgill.ca for information.
If you have access to Usenet, see the regular postings in the comp.sources.unix and comp.sources.misc
newsgroups, which describe the archiving policies for those groups and how to access their archives. The
comp.archives newsgroup contains numerous announcements of anonymous ftp availability of various
items. Finally, the newsgroup comp.sources.wanted is generally a more appropriate place to post queries
for source availability, but check its FAQ list, ``How to find sources,'' before posting there.
See also question 14.12.
Question 14.12
I'm looking for some code to do:
Fast Fourier Transforms (FFT's)
matrix arithmetic (multiplication, inversion, etc.)
complex arithmetic
Ajay Shah maintains an index of free numerical software; it is posted periodically, and available where
this FAQ list is archived (see question 20.40). See also question 18.16.
Question 14.11
What's a good way to implement complex numbers in C?
It is straightforward to define a simple structure and some arithmetic functions to manipulate them. See
also questions 2.7, 2.10, and 14.12.
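A minimal sketch of the idea follows. The structure and function names (complex_num, cpx_make, cpx_add, cpx_mul) are assumptions made for this illustration, not part of any standard library; the structures are passed and returned by value.

```c
struct complex_num {        /* hypothetical name, for illustration */
    double re;
    double im;
};

struct complex_num cpx_make(double re, double im)
{
    struct complex_num z;

    z.re = re;
    z.im = im;
    return z;
}

struct complex_num cpx_add(struct complex_num a, struct complex_num b)
{
    return cpx_make(a.re + b.re, a.im + b.im);
}

struct complex_num cpx_mul(struct complex_num a, struct complex_num b)
{
    /* (a.re + a.im i) * (b.re + b.im i) */
    return cpx_make(a.re * b.re - a.im * b.im,
                    a.re * b.im + a.im * b.re);
}
```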
Question 2.7
I heard that structures could be assigned to variables and passed to and from functions, but K&R1 says
not.
What K&R1 said was that the restrictions on structure operations would be lifted in a forthcoming
version of the compiler, and in fact structure assignment and passing were fully functional in Ritchie's
compiler even as K&R1 was being published. Although a few early C compilers lacked these operations,
all modern compilers support them, and they are part of the ANSI C standard, so there should be no
reluctance to use them. [footnote]
(Note that when a structure is assigned, passed, or returned, the copying is done monolithically; anything
pointed to by any pointer fields is not copied.)
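The following sketch illustrates that last point. After the assignment, both structures hold the same pointer value, so a change made through one is visible through the other. The function name is hypothetical; the structure is the list node used elsewhere in this FAQ list.

```c
#include <stddef.h>
#include <string.h>

struct list {
    char *item;
    struct list *next;
};

/* returns 1 if assigning a structure copied the pointer but not the string
   (hypothetical demonstration function) */
int shares_string_after_assignment(void)
{
    char text[] = "hello";
    struct list a, b;

    a.item = text;
    a.next = NULL;
    b = a;                  /* copies the members, including the pointer value */
    b.item[0] = 'j';        /* modifies the one string both structures point to */
    return a.item == b.item && strcmp(a.item, "jello") == 0;
}
```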
References: K&R1 Sec. 6.2 p. 121
K&R2 Sec. 6.2 p. 129
ANSI Sec. 3.1.2.5, Sec. 3.2.2.1, Sec. 3.3.16
ISO Sec. 6.1.2.5, Sec. 6.2.2.1, Sec. 6.3.16
H&S Sec. 5.6.2 p. 133
Footnote 1
However, passing large structures to and from functions can be expensive (see question 2.9), so you may
want to consider using pointers, instead (as long as you don't need pass-by-value semantics, of course).
Question 2.9
How are structure passing and returning implemented?
When structures are passed as arguments to functions, the entire structure is typically pushed on the
stack, using as many words as are required. (Programmers often choose to use pointers to structures
instead, precisely to avoid this overhead.) Some compilers merely pass a pointer to the structure, though
they may have to make a local copy to preserve pass-by-value semantics.
Structures are often returned from functions in a location pointed to by an extra, compiler-supplied
``hidden'' argument to the function. Some older compilers used a special, static location for structure
returns, although this made structure-valued functions non-reentrant, which ANSI C disallows.
References: ANSI Sec. 2.2.3
ISO Sec. 5.2.3
Question 2.8
Why can't you compare structures?
There is no single, good way for a compiler to implement structure comparison which is consistent with
C's low-level flavor. A simple byte-by-byte comparison could founder on random bits present in unused
``holes'' in the structure (such padding is used to keep the alignment of later fields correct; see question
2.12). A field-by-field comparison might require unacceptable amounts of repetitive code for large
structures.
If you need to compare two structures, you'll have to write your own function to do so, field by field.
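Such a function might look like the sketch below. The structure and its members are hypothetical; the point is only that each member is compared in the way appropriate to its type (strcmp for the string, == for the numbers), so any padding between members never enters into it.

```c
#include <string.h>

struct employee {           /* hypothetical structure, for illustration */
    char name[20];
    int id;
    double salary;
};

/* returns 1 if the two structures are equal, member by member, else 0 */
int employee_equal(const struct employee *a, const struct employee *b)
{
    return strcmp(a->name, b->name) == 0
        && a->id == b->id
        && a->salary == b->salary;
}
```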
References: K&R2 Sec. 6.2 p. 129
ANSI Sec. 4.11.4.1 footnote 136
Rationale Sec. 3.3.9
H&S Sec. 5.6.2 p. 133
Question 2.12
My compiler is leaving holes in structures, which is wasting space and preventing ``binary'' I/O to
external data files. Can I turn off the padding, or otherwise control the alignment of structure fields?
Your compiler may provide an extension to give you this control (perhaps a #pragma; see question
11.20), but there is no standard method.
See also question 20.5.
References: K&R2 Sec. 6.4 p. 138
H&S Sec. 5.6.4 p. 135
Question 11.20
What are #pragmas and what are they good for?
The #pragma directive provides a single, well-defined ``escape hatch'' which can be used for all sorts of
implementation-specific controls and extensions: source listing control, structure packing, warning
suppression (like lint's old /* NOTREACHED */ comments), etc.
References: ANSI Sec. 3.8.6
ISO Sec. 6.8.6
H&S Sec. 3.7 p. 61
Question 11.19
I'm getting strange syntax errors inside lines I've #ifdeffed out.
Under ANSI C, the text inside a ``turned off'' #if, #ifdef, or #ifndef must still consist of ``valid
preprocessing tokens.'' This means that there must be no newlines inside quotes, and no unterminated
comments or quotes (note particularly that an apostrophe within a contracted word looks like the
beginning of a character constant). Therefore, natural-language comments and pseudocode should always
be written between the ``official'' comment delimiters /* and */. (But see question 20.20, and also
10.25.)
References: ANSI Sec. 2.1.1.2, Sec. 3.1
ISO Sec. 5.1.1.2, Sec. 6.1
H&S Sec. 3.2 p. 40
Question 20.20
Why don't C comments nest? How am I supposed to comment out code containing comments? Are
comments legal inside quoted strings?
C comments don't nest mostly because PL/I's comments, which C's are borrowed from, don't either.
Therefore, it is usually better to ``comment out'' large sections of code, which might contain comments,
with #ifdef or #if 0 (but see question 11.19).
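As a small sketch, the function below (a hypothetical example) has a statement and its comment disabled with #if 0. Wrapping the same lines in /* and */ would not have worked, because the inner comment's */ would have ended the outer comment early.

```c
/* returns 2: the code between #if 0 and #endif is never compiled
   (hypothetical function, for illustration) */
int commented_out_demo(void)
{
    int n = 1;
#if 0
    /* this assignment is disabled, comment and all */
    n = 100;
#endif
    n++;
    return n;
}
```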
The character sequences /* and */ are not special within double-quoted strings, and do not therefore
introduce comments, because a program (particularly one which is generating C code as output) might
want to print them.
Note also that // comments, as in C++, were not legal in C until C99, so it's not a good idea to use them in
C programs which must also compile under older compilers (even if your compiler supports them as an extension).
References: K&R1 Sec. A2.1 p. 179
K&R2 Sec. A2.2 p. 192
ANSI Sec. 3.1.9 (esp. footnote 26), Appendix E
ISO Sec. 6.1.9, Annex F
Rationale Sec. 3.1.9
H&S Sec. 2.2 pp. 18-9
PCS Sec. 10 p. 130
Question 20.19
Are the outer parentheses in return statements really optional?
Yes.
Long ago, in the early days of C, they were required, and just enough people learned C then, and wrote
code which is still in circulation, that the notion that they might still be required is widespread.
(As it happens, parentheses are optional with the sizeof operator, too, as long as its operand is a
variable or a unary expression.)
References: K&R1 Sec. A18.3 p. 218
ANSI Sec. 3.3.3, Sec. 3.6.6
ISO Sec. 6.3.3, Sec. 6.6.6
H&S Sec. 8.9 p. 254
Question 20.18
Is there a way to have non-constant case labels (i.e. ranges or arbitrary expressions)?
No. The switch statement was originally designed to be quite simple for the compiler to translate;
therefore, case labels are limited to single, constant, integral expressions. You can attach several case
labels to the same statement, which will let you cover a small range if you don't mind listing all cases
explicitly.
If you want to select on arbitrary ranges or non-constant expressions, you'll have to use an if/else chain.
See also question 20.17.
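Both alternatives can be sketched briefly. The function names and the particular ranges below are assumptions chosen for illustration.

```c
/* several case labels attached to one statement cover a small range:
   returns 1 for '0' through '3', else 0 (hypothetical function) */
int is_small_digit(int c)
{
    switch (c) {
    case '0':
    case '1':
    case '2':
    case '3':
        return 1;
    default:
        return 0;
    }
}

/* non-constant bounds need an if/else chain instead:
   returns -1 below the range, 1 above it, 0 within it */
int classify(int n, int low, int high)
{
    if (n < low)
        return -1;
    else if (n > high)
        return 1;
    else
        return 0;
}
```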
References: K&R1 Sec. 3.4 p. 55
K&R2 Sec. 3.4 p. 58
ANSI Sec. 3.6.4.2
ISO Sec. 6.6.4.2
Rationale Sec. 3.6.4.2
H&S Sec. 8.7 p. 248
Question 20.17
Is there a way to switch on strings?
Not directly. Sometimes, it's appropriate to use a separate function to map strings to integer codes, and
then switch on those. Otherwise, of course, you can fall back on strcmp and a conventional if/else
chain. See also questions 10.12, 20.18, and 20.29.
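Here is a minimal sketch of the mapping-function approach. The command names and the enum codes are hypothetical; for more than a handful of strings, a table searched in a loop would probably be tidier than the if/else chain inside lookup_command.

```c
#include <string.h>

enum command { CMD_UNKNOWN, CMD_START, CMD_STOP };   /* hypothetical codes */

/* map a string to an integer code that a switch can use */
enum command lookup_command(const char *s)
{
    if (strcmp(s, "start") == 0)
        return CMD_START;
    else if (strcmp(s, "stop") == 0)
        return CMD_STOP;
    else
        return CMD_UNKNOWN;
}

/* returns 1 to keep running, 0 to stop, -1 for an unrecognized command */
int handle_command(const char *s)
{
    switch (lookup_command(s)) {
    case CMD_START:
        return 1;
    case CMD_STOP:
        return 0;
    default:
        return -1;
    }
}
```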
References: K&R1 Sec. 3.4 p. 55
K&R2 Sec. 3.4 p. 58
ANSI Sec. 3.6.4.2
ISO Sec. 6.6.4.2
H&S Sec. 8.7 p. 248
Question 10.12
How can I construct preprocessor #if expressions which compare strings?
You can't do it directly; preprocessor #if arithmetic uses only integers. You can #define several
manifest constants, however, and implement conditionals on those.
See also question 20.17.
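A sketch of the manifest-constant approach follows. The macro names and values (OS_UNIX, OS_VMS, TARGET_OS) are hypothetical; the idea is simply to compare small integer codes where you might have wished to compare strings.

```c
#define OS_UNIX 1           /* hypothetical codes standing in for strings */
#define OS_VMS  2

#define TARGET_OS OS_UNIX   /* rather than something like "unix" */

/* returns the directory separator for the configured target */
char path_separator(void)
{
#if TARGET_OS == OS_UNIX
    return '/';
#elif TARGET_OS == OS_VMS
    return '.';
#else
    return '?';
#endif
}
```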
References: K&R2 Sec. 4.11.3 p. 91
ANSI Sec. 3.8.1
ISO Sec. 6.8.1
H&S Sec. 7.11.1 p. 225
Question 10.11
I seem to be missing the system header file <sgtty.h>. Can someone send me a copy?
Standard headers exist in part so that definitions appropriate to your compiler, operating system, and
processor can be supplied. You cannot just pick up a copy of someone else's header file and expect it to
work, unless that person is using exactly the same environment. Ask your compiler vendor why the file
was not provided (or to send a replacement copy).
Question 10.9
I'm getting strange syntax errors on the very first declaration in a file, but it looks fine.
Perhaps there's a missing semicolon at the end of the last declaration in the last header file you're
#including. See also questions 2.18 and 11.29.
Question 2.18
This program works correctly, but it dumps core after it finishes. Why?
struct list {
    char *item;
    struct list *next;
}

/* Here is the main program. */

main(argc, argv)
{ ... }
A missing semicolon causes main to be declared as returning a structure. (The connection is hard to see
because of the intervening comment.) Since structure-valued functions are usually implemented by
adding a hidden return pointer (see question 2.9), the generated code for main() tries to accept three
arguments, although only two are passed (in this case, by the C start-up code). See also questions 10.9
and 16.4.
References: CT&P Sec. 2.3 pp. 21-2
Question 11.21
What does ``#pragma once'' mean? I found it in some header files.
Question 10.7
Is it acceptable for one header file to #include another?
It's a question of style, and thus receives considerable debate. Many people believe that ``nested
#include files'' are to be avoided: the prestigious Indian Hill Style Guide (see question 17.9)
disparages them; they can make it harder to find relevant definitions; they can lead to multiple-definition
errors if a file is #included twice; and they make manual Makefile maintenance very difficult. On the
other hand, they make it possible to use header files in a modular way (a header file can #include
what it needs itself, rather than requiring each #includer to do so); a tool like grep (or a tags file)
makes it easy to find definitions no matter where they are; a popular trick along the lines of:
#ifndef HFILENAME_USED
#define HFILENAME_USED
...header file contents...
#endif
(where a different bracketing macro name is used for each header file) makes a header file ``idempotent''
so that it can safely be #included multiple times; and automated Makefile maintenance tools (which
are a virtual necessity in large projects anyway; see question 18.1) handle dependency generation in the
face of nested #include files easily. See also question 17.10.
References: Rationale Sec. 4.1.2
Question 17.9
Where can I get the ``Indian Hill Style Guide'' and other coding standards?
Site:                   File or directory:
cs.washington.edu       pub/cstyle.tar.Z
                        (the updated Indian Hill guide)
ftp.cs.toronto.edu      doc/programming
                        (including Henry Spencer's
                        ``10 Commandments for C Programmers'')
ftp.cs.umd.edu          pub/style-guide
You may also be interested in the books The Elements of Programming Style, Plum Hall Programming
Guidelines, and C Style: Standards and Guidelines; see the Bibliography. (The Standards and Guidelines
book is not in fact a style guide, but a set of guidelines on selecting and creating style guides.)
See also question 18.9.
Question 18.9
Are there any C tutorials or other resources on the net?
Question 18.10
What's a good book for learning C?
There are far too many books on C to list here; it's impossible to rate them all. Many people believe that
the best one was also the first: The C Programming Language, by Kernighan and Ritchie (``K&R,'' now
in its second edition). Opinions vary on K&R's suitability as an initial programming text: many of us did
learn C from it, and learned it well; some, however, feel that it is a bit too clinical as a first tutorial for
those without much programming background.
An excellent reference manual is C: A Reference Manual, by Samuel P. Harbison and Guy L. Steele,
now in its fourth edition.
Though not suitable for learning C from scratch, this FAQ list has been published in book form; see the
Bibliography.
Mitch Wright maintains an annotated bibliography of C and Unix books; it is available for anonymous
ftp from ftp.rahul.net in directory pub/mitch/YABL/.
This FAQ list's editor maintains a collection of previous answers to this question, which is available upon
request. See also question 18.9.
Question 18.13
Where can I find the sources of the standard C libraries?
One source (though not public domain) is The Standard C Library, by P.J. Plauger (see the
Bibliography). Implementations of all or part of the C library have been written and are readily available
as part of the NetBSD and GNU (also Linux) projects. See also question 18.16.
Question 18.14
I need code to parse and evaluate expressions.
Two available packages are ``defunc,'' posted to comp.sources.misc in December, 1993 (V41 i32,33), to
alt.sources in January, 1994, and available from sunsite.unc.edu in
pub/packages/development/libraries/defunc-1.3.tar.Z, and ``parse,'' at lamont.ldgo.columbia.edu. Other
options include the S-Lang interpreter, available via anonymous ftp from amy.tch.harvard.edu in
pub/slang, and the shareware Cmm (``C-minus-minus'' or ``C minus the hard stuff''). See also question
18.16.
There is also some parsing/evaluation code in Software Solutions in C (chapter 12, pp. 235-55).
Question 18.15
Where can I get a BNF or YACC grammar for C?
The definitive grammar is of course the one in the ANSI standard; see question 11.2. Another grammar
(along with one for C++) by Jim Roskind is in pub/c++grammar1.1.tar.Z at ics.uci.edu . A fleshed-out,
working instance of the ANSI grammar (due to Jeff Lee) is on ftp.uu.net (see question 18.16) in
usenet/net.sources/ansi.c.grammar.Z (including a companion lexer). The FSF's GNU C compiler contains
a grammar, as does the appendix to K&R2.
The comp.compilers archives contain more information about grammars; see question 18.3.
References: K&R1 Sec. A18 pp. 214-219
K&R2 Sec. A13 pp. 234-239
ANSI Sec. A.2
ISO Sec. B.2
H&S pp. 423-435 Appendix B
Question 11.2
How can I get a copy of the Standard?
[Late-breaking news: I've been told that copies of the new C99 can be obtained directly from
www.ansi.org; the price for an electronic document is only US $18.00.]
Copies are available in the United States from
American National Standards Institute
11 W. 42nd St., 13th floor
New York, NY 10036 USA
(+1) 212 642 4900
and
Global Engineering Documents
15 Inverness Way E
Englewood, CO 80112 USA
(+1) 303 397 2715
(800) 854 7179 (U.S. & Canada)
In other countries, contact the appropriate national standards body, or ISO in Geneva at:
ISO Sales
Case Postale 56
CH-1211 Geneve 20
Switzerland
(or see URL http://www.iso.ch or check the comp.std.internat FAQ list, Standards.Faq).
At the time of this writing, the cost is $130.00 from ANSI or $410.00 from Global. Copies of the original
X3.159 (including the Rationale) may still be available at $205.00 from ANSI or $162.50 from Global.
Note that ANSI derives revenues to support its operations from the sale of printed standards, so
electronic copies are not available.
In the U.S., it may be possible to get a copy of the original ANSI X3.159 (including the Rationale) as
``FIPS PUB 160'' from
Question 11.1
What is the ``ANSI C Standard?''
In 1983, the American National Standards Institute (ANSI) commissioned a committee, X3J11, to
standardize the C language. After a long, arduous process, including several widespread public reviews,
the committee's work was finally ratified as ANS X3.159-1989 on December 14, 1989, and published in
the spring of 1990. For the most part, ANSI C standardizes existing practice, with a few additions from
C++ (most notably function prototypes) and support for multinational character sets (including the
controversial trigraph sequences). The ANSI C standard also formalizes the C run-time library support
routines.
More recently, the Standard has been adopted as an international standard, ISO/IEC 9899:1990, and this
ISO Standard replaces the earlier X3.159 even within the United States. Its sections are numbered
differently (briefly, ISO sections 5 through 7 correspond roughly to the old ANSI sections 2 through 4).
As an ISO Standard, it is subject to ongoing revision through the release of Technical Corrigenda and
Normative Addenda.
In 1994, Technical Corrigendum 1 amended the Standard in about 40 places, most of them minor
corrections or clarifications. More recently, Normative Addendum 1 added about 50 pages of new
material, mostly specifying new library functions for internationalization. The production of Technical
Corrigenda is an ongoing process, and a second one is expected in late 1995. In addition, both ANSI and
ISO require periodic review of their standards. This process is beginning in 1995, and will likely result in
a completely revised standard (nicknamed ``C9X'' on the assumption of completion by 1999).
The original ANSI Standard included a ``Rationale,'' explaining many of its decisions, and discussing a
number of subtle points, including several of those covered here. (The Rationale was ``not part of ANSI
Standard X3.159-1989, but... included for information only,'' and is not included with the ISO Standard.)
Question 10.26
How can I write a macro which takes a variable number of arguments?
One popular trick is to define and invoke the macro with a single, parenthesized ``argument'' which in the
macro expansion becomes the entire argument list, parentheses and all, for a function such as printf:
#define DEBUG(args) (printf("DEBUG: "), printf args)
if(n != 0) DEBUG(("n is %d\n", n));
The obvious disadvantage is that the caller must always remember to use the extra parentheses.
gcc has an extension which allows a function-like macro to accept a variable number of arguments, but
it's not standard. Other possible solutions are to use different macros (DEBUG1, DEBUG2, etc.)
depending on the number of arguments, or to play games with commas:
#define DEBUG(args) (printf("DEBUG: "), printf(args))
#define _ ,
DEBUG("i = %d" _ i)
It is often better to use a bona-fide function, which can take a variable number of arguments in a well-defined way. See questions 15.4 and 15.5.
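As a minimal sketch of the ``bona-fide function'' approach: the hypothetical debug_format below builds the message in a caller-supplied buffer, so that the result is easy to inspect. The function name, the buffer interface, and the use of vsnprintf (which arrived with C99 and is not part of the original ANSI library) are all assumptions made for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "DEBUG: " followed by a printf-style
 * message into buf, which holds size bytes.  Returns the length the
 * whole message needs, or a negative value if buf cannot hold even
 * the prefix or if formatting fails. */
int debug_format(char *buf, size_t size, const char *fmt, ...)
{
    static const char prefix[] = "DEBUG: ";
    size_t plen = sizeof prefix - 1;
    va_list argp;
    int n;

    if (buf == NULL || size <= plen)
        return -1;              /* no room for the prefix and a \0 */

    memcpy(buf, prefix, plen);

    va_start(argp, fmt);
    n = vsnprintf(buf + plen, size - plen, fmt, argp);
    va_end(argp);

    return n < 0 ? n : n + (int)plen;
}
```

A call then looks like any other function call, with no extra parentheses: debug_format(buf, sizeof buf, "n is %d\n", n).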
Question 15.4
How can I write a function that takes a variable number of arguments?
/* +1 for trailing \0 */
if(retbuf == NULL)
return NULL; /* error */
(void)strcpy(retbuf, first);
va_start(argp, first);
va_end(argp);
return retbuf;
}
Usage is something like
char *str = vstrcat("Hello, ", "world!", (char *)NULL);
Note the cast on the last argument; see questions 5.2 and 15.3. (Also note that the caller must free the
returned, malloc'ed storage.)
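For reference, here is one way a complete function matching that usage might be written. This is a sketch, not necessarily the original listing: the two-pass structure (add up the lengths, then copy) and the local names len and p are assumptions, chosen to be consistent with the fragment and the call shown above.

```c
#include <stdlib.h>     /* for malloc, NULL, size_t */
#include <stdarg.h>     /* for va_list, va_start, va_arg, va_end */
#include <string.h>     /* for strlen, strcpy, strcat */

/* Concatenates first and every following char * argument, up to a
 * terminating (char *)NULL, into newly malloc'ed storage. */
char *vstrcat(char *first, ...)
{
    size_t len;
    char *retbuf;
    va_list argp;
    char *p;

    if (first == NULL)
        return NULL;

    /* first scan: add up the lengths */
    len = strlen(first);
    va_start(argp, first);
    while ((p = va_arg(argp, char *)) != NULL)
        len += strlen(p);
    va_end(argp);

    retbuf = malloc(len + 1);   /* +1 for trailing \0 */
    if (retbuf == NULL)
        return NULL;            /* error */

    (void)strcpy(retbuf, first);

    /* second scan: append the remaining strings */
    va_start(argp, first);
    while ((p = va_arg(argp, char *)) != NULL)
        (void)strcat(retbuf, p);
    va_end(argp);

    return retbuf;
}
```

The argument list has to be walked twice because the total length must be known before any storage can be allocated; each walk needs its own va_start and va_end pair.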
See also question 15.7.
References: K&R2 Sec. 7.3 p. 155, Sec. B7 p. 254
ANSI Sec. 4.8
ISO Sec. 7.8
Rationale Sec. 4.8
H&S Sec. 11.4 pp. 296-9
CT&P Sec. A.3 pp. 139-141
PCS Sec. 11 pp. 184-5, Sec. 13 p. 242
Question 5.2
How do I get a null pointer in my programs?
According to the language definition, a constant 0 in a pointer context is converted into a null pointer at
compile time. That is, in an initialization, assignment, or comparison when one side is a variable or
expression of pointer type, the compiler can tell that a constant 0 on the other side requests a null pointer,
and generate the correctly-typed null pointer value. Therefore, the following fragments are perfectly
legal:
char *p = 0;
if(p != 0)
(See also question 5.3.)
However, an argument being passed to a function is not necessarily recognizable as a pointer context,
and the compiler may not be able to tell that an unadorned 0 ``means'' a null pointer. To generate a null
pointer in a function call context, an explicit cast may be required, to force the 0 to be recognized as a
pointer. For example, the Unix system call execl takes a variable-length, null-pointer-terminated list of
character pointer arguments, and is correctly called like this:
execl("/bin/sh", "sh", "-c", "date", (char *)0);
If the (char *) cast on the last argument were omitted, the compiler would not know to pass a null
pointer, and would pass an integer 0 instead. (Note that many Unix manuals get this example wrong.)
When function prototypes are in scope, argument passing becomes an ``assignment context,'' and most
casts may safely be omitted, since the prototype tells the compiler that a pointer is required, and of which
type, enabling it to correctly convert an unadorned 0. Function prototypes cannot provide the types for
variable arguments in variable-length argument lists however, so explicit casts are still required for those
arguments. (See also question 15.3.) It is safest to properly cast all null pointer constants in function
calls: to guard against varargs functions or those without prototypes, to allow interim use of non-ANSI
compilers, and to demonstrate that you know what you are doing. (Incidentally, it's also a simpler rule to
remember.)
Summary:
Unadorned 0 okay:        Explicit cast required:
initialization           function call,
assignment               no prototype in scope
comparison               variable argument in
function call,           varargs function call
prototype in scope,
fixed argument
References: K&R1 Sec. A7.7 p. 190, Sec. A7.14 p. 192
K&R2 Sec. A7.10 p. 207, Sec. A7.17 p. 209
ANSI Sec. 3.2.2.3
ISO Sec. 6.2.2.3
H&S Sec. 4.6.3 p. 95, Sec. 6.2.7 p. 171
Question 5.3
Is the abbreviated pointer comparison ``if(p)'' to test for non-null pointers valid? What if the internal
representation for null pointers is nonzero?
When C requires the Boolean value of an expression (in the if, while, for, and do statements, and
with the &&, ||, !, and ?: operators), a false value is inferred when the expression compares equal to
zero, and a true value otherwise. That is, whenever one writes
if(expr)
where ``expr'' is any expression at all, the compiler essentially acts as if it had been written as
if((expr) != 0)
Substituting the trivial pointer expression ``p'' for ``expr,'' we have
if(p)
is equivalent to
if(p != 0)
and this is a comparison context, so the compiler can tell that the (implicit) 0 is actually a null pointer
constant, and use the correct null pointer value. There is no trickery involved here; compilers do work
this way, and generate identical code for both constructs. The internal representation of a null pointer
does not matter.
The boolean negation operator, !, can be described as follows:
!expr
is essentially equivalent to
(expr)?0:1
or to
((expr) == 0)
which leads to the conclusion that
if(!p)
is equivalent to
if(p == 0)
``Abbreviations'' such as if(p), though perfectly legal, are considered by some to be bad style (and by
others to be good style; see question 17.10).
See also question 9.2.
References: K&R2 Sec. A7.4.7 p. 204
ANSI Sec. 3.3.3.3, Sec. 3.3.9, Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15, Sec. 3.6.4.1, Sec. 3.6.5
ISO Sec. 6.3.3.3, Sec. 6.3.9, Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15, Sec. 6.6.4.1, Sec. 6.6.5
H&S Sec. 5.3.2 p. 122
Question 17.10
Some people say that goto's are evil and that I should never use them. Isn't that a bit extreme?
Programming style, like writing style, is somewhat of an art and cannot be codified by inflexible rules,
although discussions about style often seem to center exclusively around such rules.
In the case of the goto statement, it has long been observed that unfettered use of goto's quickly leads
to unmaintainable spaghetti code. However, a simple, unthinking ban on the goto statement does not
necessarily lead immediately to beautiful programming: an unstructured programmer is just as capable of
constructing a Byzantine tangle without using any goto's (perhaps substituting oddly-nested loops and
Boolean control variables, instead).
Most observations or ``rules'' about programming style usually work better as guidelines than rules, and
work much better if programmers understand what the guidelines are trying to accomplish. Blindly
avoiding certain constructs or following rules without understanding them can lead to just as many
problems as the rules were supposed to avert.
Furthermore, many opinions on programming style are just that: opinions. It's usually futile to get
dragged into ``style wars,'' because on certain issues (such as those referred to in questions 9.2, 5.3, 5.9,
and 10.7), opponents can never seem to agree, or agree to disagree, or stop arguing.
Question 9.2
Isn't #defining TRUE to be 1 dangerous, since any nonzero value is considered ``true'' in C? What if a
built-in logical or relational operator ``returns'' something other than 1?
It is true (sic) that any nonzero value is considered true in C, but this applies only ``on input'', i.e. where a
Boolean value is expected. When a Boolean value is generated by a built-in operator, it is guaranteed to
be 1 or 0. Therefore, the test
if((a == b) == TRUE)
would work as expected (as long as TRUE is 1), but it is obviously silly. In general, explicit tests against
TRUE and FALSE are inappropriate, because some library functions (notably isupper, isalpha, etc.)
return, on success, a nonzero value which is not necessarily 1. (Besides, if you believe that ``if((a ==
b) == TRUE)'' is an improvement over ``if(a == b)'', why stop there? Why not use ``if(((a ==
b) == TRUE) == TRUE)''?) A good rule of thumb is to use TRUE and FALSE (or the like) only for
assignment to a Boolean variable or function parameter, or as the return value from a Boolean function,
but never in a comparison.
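The rule of thumb can be illustrated with a small sketch. The function name is_upper_bool is hypothetical, and TRUE and FALSE are defined locally for the example. The result of isupper is tested directly, since it is promised only to be nonzero on success, and TRUE and FALSE appear only as return values.

```c
#include <ctype.h>

#define TRUE  1
#define FALSE 0

/* Returns exactly TRUE or FALSE depending on whether c is an
 * upper-case letter.  Writing isupper(c) == TRUE here would be a
 * bug on any system where isupper returns a nonzero value other
 * than 1. */
int is_upper_bool(int c)
{
    if (isupper((unsigned char)c))
        return TRUE;
    return FALSE;
}
```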
The preprocessor macros TRUE and FALSE (and, of course, NULL) are used for code readability, not
because the underlying values might ever change. (See also questions 5.3 and 5.10.)
On the other hand, Boolean values and definitions can evidently be confusing, and some programmers
feel that TRUE and FALSE macros only compound the confusion. (See also question 5.9.)
References: K&R1 Sec. 2.6 p. 39, Sec. 2.7 p. 41
K&R2 Sec. 2.6 p. 42, Sec. 2.7 p. 44, Sec. A7.4.7 p. 204, Sec. A7.9 p. 206
ANSI Sec. 3.3.3.3, Sec. 3.3.8, Sec. 3.3.9, Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15, Sec. 3.6.4.1, Sec. 3.6.5
ISO Sec. 6.3.3.3, Sec. 6.3.8, Sec. 6.3.9, Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15, Sec. 6.6.4.1, Sec. 6.6.5
H&S Sec. 7.5.4 pp. 196-7, Sec. 7.6.4 pp. 207-8, Sec. 7.6.5 pp. 208-9, Sec. 7.7 pp. 217-8, Sec. 7.8 pp. 218-9, Sec. 8.5 pp. 238-9, Sec. 8.6 pp. 241-4
``What the Tortoise Said to Achilles''
Question 5.10
But wouldn't it be better to use NULL (rather than 0), in case the value of NULL changes, perhaps on a
machine with nonzero internal null pointers?
No. (Using NULL may be preferable, but not for this reason.) Although symbolic constants are often used
in place of numbers because the numbers might change, this is not the reason that NULL is used in place
of 0. Once again, the language guarantees that source-code 0's (in pointer contexts) generate null
pointers. NULL is used only as a stylistic convention. See questions 5.5 and 9.2.
Question 5.5
How should NULL be defined on a machine which uses a nonzero bit pattern as the internal
representation of a null pointer?
Question 15.3
I had a frustrating problem which turned out to be caused by the line
printf("%d", n);
where n was actually a long int. I thought that ANSI function prototypes were supposed to guard
against argument type mismatches like this.
When a function accepts a variable number of arguments, its prototype does not (and cannot) provide any
information about the number and types of those variable arguments. Therefore, the usual protections do
not apply in the variable-length part of variable-length argument lists: the compiler cannot perform
implicit conversions or (in general) warn about mismatches.
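Since the compiler will not convert variable arguments for you, the conversion specifier and the argument have to be made to agree by hand. The sketch below shows the two usual remedies: match the format to the type, or cast the argument to the type the format expects. The helper names are hypothetical, and sprintf is used in place of printf only so that the results can be examined.

```c
#include <stdio.h>

/* Remedy 1: use the specifier that matches the argument's type.
 * buf must be large enough for the digits, a sign, and the \0. */
int format_long(char *buf, long n)
{
    return sprintf(buf, "%ld", n);
}

/* Remedy 2: keep %d, but cast the argument explicitly to int.
 * This is only appropriate when the value is known to fit. */
int format_long_as_int(char *buf, long n)
{
    return sprintf(buf, "%d", (int)n);
}
```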
See also questions 5.2, 11.3, 12.9, and 15.2.
Question 11.3
My ANSI compiler complains about a mismatch when it sees
extern int func(float);
int func(x)
float x;
{ ...
You have mixed the new-style prototype declaration ``extern int func(float);'' with the old-style definition ``int func(x) float x;''. It is usually safe to mix the two styles (see question
11.4), but not in this case.
Old C (and ANSI C, in the absence of prototypes, and in variable-length argument lists; see question
15.2) ``widens'' certain arguments when they are passed to functions. floats are promoted to double,
and characters and short integers are promoted to int. (For old-style function definitions, the values are
automatically converted back to the corresponding narrower types within the body of the called function,
if they are declared that way there.)
This problem can be fixed either by using new-style syntax consistently in the definition:
int func(float x) { ... }
or by changing the new-style prototype declaration to match the old-style definition:
extern int func(double);
(In this case, it would be clearest to change the old-style definition to use double as well, as long as the
address of that parameter is not taken.)
It may also be safer to avoid ``narrow'' (char, short int, and float) function arguments and return
types altogether.
See also question 1.25.
References: K&R1 Sec. A7.1 p. 186
K&R2 Sec. A7.3.2 p. 202
ANSI Sec. 3.3.2.2, Sec. 3.5.4.3
Question 11.4
Can you mix old-style and new-style function syntax?
Doing so is perfectly legal, as long as you're careful (see especially question 11.3). Note however that old-style syntax is marked as obsolescent, so official support for it may be removed some day.
References: ANSI Sec. 3.7.1, Sec. 3.9.5
ISO Sec. 6.7.1, Sec. 6.9.5
H&S Sec. 9.2.2 pp. 265-7, Sec. 9.2.5 pp. 269-70
Question 11.5
Why does the declaration
extern f(struct x *p);
give me an obscure warning message about ``struct x introduced in prototype scope''?
In a quirk of C's normal block scoping rules, a structure declared (or even mentioned) for the first time
within a prototype cannot be compatible with other structures declared in the same source file (it goes out
of scope at the end of the prototype).
To resolve the problem, precede the prototype with the vacuous-looking declaration
struct x;
which places an (incomplete) declaration of struct x at file scope, so that all following declarations
involving struct x can at least be sure they're referring to the same struct x.
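A short sketch of the arrangement follows. The explicit int return type and the member named value are assumptions added so that the example is complete; the question's declaration has neither.

```c
struct x;                       /* incomplete declaration at file scope */

extern int f(struct x *p);      /* refers to the file-scope struct x */

struct x {                      /* completes that same type later on */
    int value;
};

int f(struct x *p)
{
    return p->value;
}
```

Because the tag is already visible at file scope when the prototype is seen, the prototype, the later structure definition, and the function definition all agree on a single struct x.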
References: ANSI Sec. 3.1.2.1, Sec. 3.1.2.6, Sec. 3.5.2.3
ISO Sec. 6.1.2.1, Sec. 6.1.2.6, Sec. 6.5.2.3
Question 11.8
I don't understand why I can't use const values in initializers and array dimensions, as in
const int n = 5;
int a[n];
The const qualifier really means ``read-only;'' an object so qualified is a run-time object which cannot
(normally) be assigned to. The value of a const-qualified object is therefore not a constant expression
in the full sense of the term. (C is unlike C++ in this regard.) When you need a true compile-time
constant, use a preprocessor #define.
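For instance, the array in the question can be declared as in the sketch below. The enumeration constant M is an alternative not mentioned in the answer above, included as an assumption worth knowing about: enumeration constants are also true constant expressions in C. The helper functions exist only so that the sizes can be checked.

```c
#include <stddef.h>

#define N 5             /* a true compile-time constant */
enum { M = 3 };         /* an enumeration constant also qualifies */

int a[N];               /* the preprocessor has replaced N with 5 */
int b[M];

/* Element counts, computed by the compiler from the array types. */
size_t a_length(void) { return sizeof a / sizeof a[0]; }
size_t b_length(void) { return sizeof b / sizeof b[0]; }
```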
References: ANSI Sec. 3.4
ISO Sec. 6.4
H&S Secs. 7.11.2,7.11.3 pp. 226-7
Question 11.9
What's the difference between const char *p and char * const p?
const char *p declares a pointer to a constant character (you can't change the character); char *
const p declares a constant pointer to a (variable) character (i.e. you can't change the pointer).
Read these ``inside out'' to understand them; see also question 1.21.
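The distinction is perhaps easiest to see by trying both kinds of modification on both kinds of pointer. In this sketch (the function and variable names are hypothetical), the two lines a compiler would reject are left commented out.

```c
/* Returns 1 if both of the permitted modifications took effect. */
int const_demo(void)
{
    char first[]  = "abc";
    char second[] = "xyz";

    const char *p = first;      /* pointer to constant char */
    char * const q = first;     /* constant pointer to char */

    p = second;                 /* okay: p itself may change */
    /* *p = 'X'; */             /* error: can't change the char via p */

    *q = 'A';                   /* okay: the char may change */
    /* q = second; */           /* error: can't change q itself */

    return *p == 'x' && *q == 'A';
}
```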
References: ANSI Sec. 3.5.4.1 examples
ISO Sec. 6.5.4.1
Rationale Sec. 3.5.4.1
H&S Sec. 4.4.4 p. 81
Question 1.21
How do I declare an array of N pointers to functions returning pointers to functions returning pointers to
characters?
The first part of this question can be answered in at least three ways:
1. char *(*(*a[N])())();
2. Build the declaration up incrementally, using typedefs:
typedef char *pc;        /* pointer to char */
typedef pc fpc();        /* function returning pointer to char */
typedef fpc *pfpc;       /* pointer to above */
typedef pfpc fpfpc();    /* function returning... */
typedef fpfpc *pfpfpc;   /* pointer to... */
pfpfpc a[N];             /* array of... */
3. Use the cdecl program, which turns English into C and vice versa:
cdecl> declare a as array of pointer to function returning
pointer to function returning pointer to char
char *(*(*a[])())()
cdecl can also explain complicated declarations, help with casts, and indicate which set of parentheses
the arguments go in (for complicated function definitions, like the one above). Versions of cdecl are in
volume 14 of comp.sources.unix (see question 18.16) and K&R2.
Any good book on C should explain how to read these complicated C declarations ``inside out'' to understand them
(``declaration mimics use'').
The pointer-to-function declarations in the examples above have not included parameter type information. When
the parameters have complicated types, declarations can really get messy. (Modern versions of cdecl can help
here, too.)
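To see the typedef approach at work with parameter type information included, here is a sketch in which every function takes no arguments, written (void). The functions hello, world, get_hello, and get_world, and the value chosen for N, are all hypothetical and exist only to give the array something to hold.

```c
#define N 2

typedef char *pc;               /* pointer to char */
typedef pc fpc(void);           /* function returning pointer to char */
typedef fpc *pfpc;              /* pointer to above */
typedef pfpc fpfpc(void);       /* function returning... */
typedef fpfpc *pfpfpc;          /* pointer to... */

static char *hello(void) { return "hello"; }
static char *world(void) { return "world"; }

/* Each of these returns a pointer to one of the functions above. */
static pfpc get_hello(void) { return hello; }
static pfpc get_world(void) { return world; }

/* Same type as:  char *(*(*a[N])(void))(void);  */
pfpfpc a[N] = { get_hello, get_world };
```

Reading ``inside out,'' the expression a[0]()() indexes the array, calls the function pointed to, and then calls the function which that call returned, yielding a pointer to char.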
References: K&R2 Sec. 5.12 p. 122
ANSI Sec. 3.5ff (esp. Sec. 3.5.4)
ISO Sec. 6.5ff (esp. Sec. 6.5.4)
H&S Sec. 4.5 pp. 85-92, Sec. 5.10.1 pp. 149-50
Question 1.14
I can't seem to define a linked list successfully. I tried
typedef struct {
char *item;
NODEPTR next;
} *NODEPTR;
but the compiler gave me error messages. Can't a structure in C contain a pointer to itself?
Structures in C can certainly contain pointers to themselves; the discussion and example in section 6.5 of
K&R make this clear. The problem with the NODEPTR example is that the typedef has not been defined
at the point where the next field is declared. To fix this code, first give the structure a tag (``struct
node''). Then, declare the next field as a simple struct node *, or disentangle the typedef
declaration from the structure definition, or both. One corrected version would be
struct node {
char *item;
struct node *next;
};
typedef struct node *NODEPTR;
and there are at least three other equivalently correct ways of arranging it.
A similar problem, with a similar solution, can arise when attempting to declare a pair of typedef'ed
mutually referential structures.
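One possible arrangement for the mutually referential case is sketched here. The tags, typedef names, and members are all hypothetical. The trick is the same as for the linked list: introduce the structure tags first, so that the typedefs exist before either structure body needs them.

```c
struct a;                       /* incomplete declarations come first... */
struct b;

typedef struct a *APTR;         /* ...so the typedefs can be defined early */
typedef struct b *BPTR;

struct a {
    int afield;
    BPTR bp;                    /* refers to a struct b */
};

struct b {
    int bfield;
    APTR ap;                    /* refers back to a struct a */
};
```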
See also question 2.1.
References: K&R1 Sec. 6.5 p. 101
K&R2 Sec. 6.5 p. 139
ANSI Sec. 3.5.2, Sec. 3.5.2.3, esp. examples
ISO Sec. 6.5.2, Sec. 6.5.2.3
H&S Sec. 5.6.1 pp. 132-3
Question 2.1
What's the difference between these two declarations?
struct x1 { ... };
typedef struct { ... } x2;
The first form declares a structure tag; the second declares a typedef. The main difference is that the
second declaration is of a slightly more abstract type--its users don't necessarily know that it is a
structure, and the keyword struct is not used when declaring instances of it.
Question 1.34
I finally figured out the syntax for declaring pointers to functions, but now how do I initialize one?
Question 4.12
I've seen different methods used for calling functions via pointers. What's the story?
Originally, a pointer to a function had to be ``turned into'' a ``real'' function, with the * operator (and an
extra pair of parentheses, to keep the precedence straight), before calling:
int r, func(), (*fp)() = func;
r = (*fp)();
It can also be argued that functions are always called via pointers, and that ``real'' function names always
decay implicitly into pointers (in expressions, as they do in initializations; see question 1.34). This
reasoning, made widespread through pcc and adopted in the ANSI standard, means that
r = fp();
is legal and works correctly, whether fp is the name of a function or a pointer to one. (The usage has
always been unambiguous; there is nothing you ever could have done with a function pointer followed by
an argument list except call the function pointed to.) An explicit * is still allowed (and recommended, if
portability to older compilers is important).
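A quick sketch comparing the call styles side by side; the wrapper call_styles_agree and the body of func are hypothetical, and prototypes have been added.

```c
static int func(void)
{
    return 42;
}

/* Calls func three ways; returns 1 if all three results agree. */
int call_styles_agree(void)
{
    int (*fp)(void) = func;     /* the function name decays to a pointer */
    int r1 = (*fp)();           /* original style, with explicit * */
    int r2 = fp();              /* ANSI style, directly via the pointer */
    int r3 = func();            /* ordinary call by name */

    return r1 == r2 && r2 == r3;
}
```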
See also question 1.34.
References: K&R1 Sec. 5.12 p. 116
K&R2 Sec. 5.11 p. 120
ANSI Sec. 3.3.2.2
ISO Sec. 6.3.2.2
Rationale Sec. 3.3.2.2
H&S Sec. 5.8 p. 147, Sec. 7.4.3 p. 190
Question 4.11
Does C even have ``pass by reference''?
Not really. Strictly speaking, C always uses pass by value. You can simulate pass by reference yourself,
by defining functions which accept pointers and then using the & operator when calling, and the compiler
will essentially simulate it for you when you pass an array to a function (by passing a pointer instead, see
question 6.4 et al.), but C has nothing truly equivalent to formal pass by reference or C++ reference
parameters. (However, function-like preprocessor macros do provide a form of ``call by name''.)
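The classic illustration of simulating pass by reference is a swap function; this is a sketch, and the name and parameter names are arbitrary.

```c
/* Exchanges the two ints which ap and bp point to.  The pointers
 * themselves are still passed by value. */
void swap(int *ap, int *bp)
{
    int tmp = *ap;
    *ap = *bp;
    *bp = tmp;
}
```

The caller supplies the other half of the simulation by writing swap(&x, &y); a call written swap(x, y) could not affect x and y at all.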
See also questions 4.8 and 20.1.
References: K&R1 Sec. 1.8 pp. 24-5, Sec. 5.2 pp. 91-3
K&R2 Sec. 1.8 pp. 27-8, Sec. 5.2 pp. 91-3
ANSI Sec. 3.3.2.2, esp. footnote 39
ISO Sec. 6.3.2.2
H&S Sec. 9.5 pp. 273-4
Question 6.4
Then why are array and pointer declarations interchangeable as function formal parameters?
Question 6.21
Why doesn't sizeof properly report the size of an array when the array is a parameter to a function?
The compiler pretends that the array parameter was declared as a pointer (see question 6.4), and sizeof
reports the size of the pointer.
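The usual workaround, sketched here with hypothetical names, is to compute the element count in the caller, where the array is still an array, and to pass it along as a separate argument.

```c
#include <stddef.h>

/* Number of elements in a true array.  This gives a meaningless
 * answer if applied to a pointer, including an array parameter. */
#define NELEM(arr) (sizeof(arr) / sizeof((arr)[0]))

/* The parameter is declared honestly as a pointer; n carries the
 * element count which sizeof cannot recover inside the function. */
long sum(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

A caller holding int v[] = {1, 2, 3, 4}; would write sum(v, NELEM(v)).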
References: H&S Sec. 7.5.2 p. 195
Question 6.20
How can I use statically- and dynamically-allocated multidimensional arrays interchangeably when
passing them to functions?
int array[NROWS][NCOLUMNS];
int **array1;            /* ragged */
int **array2;            /* contiguous */
int *array3;             /* "flattened" */
int (*array4)[NCOLUMNS];
with the pointers initialized as in the code fragments in question 6.16, and functions declared as
f1(int a[][NCOLUMNS], int nrows, int ncolumns);
f2(int *aryp, int nrows, int ncolumns);
f3(int **pp, int nrows, int ncolumns);
where f1 accepts a conventional two-dimensional array, f2 accepts a ``flattened'' two-dimensional
array, and f3 accepts a pointer-to-pointer, simulated array (see also questions 6.18 and 6.19), the
following calls should work as expected:
f1(array, NROWS, NCOLUMNS);
f1(array4, nrows, NCOLUMNS);
f2(&array[0][0], NROWS, NCOLUMNS);
f2(*array, NROWS, NCOLUMNS);
f2(*array2, nrows, ncolumns);
f2(array3, nrows, ncolumns);
f2(*array4, nrows, NCOLUMNS);
f3(array1, nrows, ncolumns);
f3(array2, nrows, ncolumns);
The following two calls would probably work on most systems, but involve questionable casts, and work
only if the dynamic ncolumns matches the static NCOLUMNS:
f1((int (*)[NCOLUMNS])(*array2), nrows, ncolumns);
f1((int (*)[NCOLUMNS])array3, nrows, ncolumns);
It must again be noted that passing &array[0][0] (or, equivalently, *array) to f2 is not strictly
conforming; see question 6.19.
If you can understand why all of the above calls work and are written as they are, and if you understand
why the combinations that are not listed would not work, then you have a very good understanding of
arrays and pointers in C.
Rather than worrying about all of this, one approach to using multidimensional arrays of various sizes is
to make them all dynamic, as in question 6.16. If there are no static multidimensional arrays--if all arrays
are allocated like array1 or array2 in question 6.16--then all functions can be written like f3.
Question 6.16
How can I dynamically allocate a multidimensional array?
It is usually best to allocate an array of pointers, and then initialize each pointer to a dynamically-allocated ``row.'' Here is a two-dimensional example:
#include <stdlib.h>
int **array1 = (int **)malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
array1[i] = (int *)malloc(ncolumns * sizeof(int));
(In real code, of course, all of malloc's return values would be checked.)
You can keep the array's contents contiguous, while making later reallocation of individual rows
difficult, with a bit of explicit pointer arithmetic:
int **array2 = (int **)malloc(nrows * sizeof(int *));
array2[0] = (int *)malloc(nrows * ncolumns * sizeof(int));
for(i = 1; i < nrows; i++)
array2[i] = array2[0] + i * ncolumns;
In either case, the elements of the dynamic array can be accessed with normal-looking array subscripts:
arrayx[i][j] (for 0 <= i < NROWS and 0 <= j < NCOLUMNS).
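Here is a sketch of what the first (array-of-rows) scheme might look like with malloc's return values checked and a matching routine to free the array. The function names alloc_rows and free_rows are hypothetical, and both dimensions are assumed to be positive.

```c
#include <stdlib.h>

/* Allocates an nrows by ncolumns array of int as an array of
 * pointers to separately allocated rows.  Returns NULL on failure,
 * after releasing anything already allocated. */
int **alloc_rows(int nrows, int ncolumns)
{
    int i;
    int **array1 = malloc((size_t)nrows * sizeof(int *));

    if (array1 == NULL)
        return NULL;

    for (i = 0; i < nrows; i++) {
        array1[i] = malloc((size_t)ncolumns * sizeof(int));
        if (array1[i] == NULL) {
            while (i-- > 0)         /* undo rows 0 through i-1 */
                free(array1[i]);
            free(array1);
            return NULL;
        }
    }
    return array1;
}

/* Frees an array obtained from alloc_rows. */
void free_rows(int **array1, int nrows)
{
    int i;

    if (array1 == NULL)
        return;
    for (i = 0; i < nrows; i++)
        free(array1[i]);
    free(array1);
}
```

Each row must be freed individually before the array of row pointers is freed; with the contiguous scheme, only array2[0] and array2 itself would need freeing.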
If the double indirection implied by the above schemes is for some reason unacceptable, you can
simulate a two-dimensional array with a single, dynamically-allocated one-dimensional array:
int *array3 = (int *)malloc(nrows * ncolumns * sizeof(int));
However, you must now perform subscript calculations manually, accessing the i,jth element with
array3[i * ncolumns + j]. (A macro could hide the explicit calculation, but invoking it would
require parentheses and commas which wouldn't look exactly like multidimensional array syntax, and the
macro would need access to at least one of the dimensions, as well. See also question 6.19.)
Finally, you could use pointers to arrays:
int (*array4)[NCOLUMNS] =
Question 6.19
How do I write functions which accept two-dimensional arrays when the ``width'' is not known at
compile time?
It's not easy. One way is to pass in a pointer to the [0][0] element, along with the two dimensions, and
simulate array subscripting ``by hand:''
f2(aryp, nrows, ncolumns)
int *aryp;
int nrows, ncolumns;
{ ... array[i][j] is accessed as aryp[i * ncolumns + j] ... }
This function could be called with the array from question 6.18 as
f2(&array[0][0], NROWS, NCOLUMNS);
It must be noted, however, that a program which performs multidimensional array subscripting ``by
hand'' in this way is not in strict conformance with the ANSI C Standard; according to an official
interpretation, the behavior of accessing (&array[0][0])[x] is not defined for x >= NCOLUMNS.
gcc allows local arrays to be declared having sizes which are specified by a function's arguments, but
this is a nonstandard extension.
When you want to be able to use a function on multidimensional arrays of various sizes, one solution is
to simulate all the arrays dynamically, as in question 6.16.
See also questions 6.18, 6.20, and 6.15.
References: ANSI Sec. 3.3.6
ISO Sec. 6.3.6
Question 6.18
My compiler complained when I passed a two-dimensional array to a function expecting a pointer to a
pointer.
The rule (see question 6.3) by which arrays decay into pointers is not applied recursively. An array of
arrays (i.e. a two-dimensional array in C) decays into a pointer to an array, not a pointer to a pointer.
Pointers to arrays can be confusing, and must be treated carefully; see also question 6.13. (The confusion
is heightened by the existence of incorrect compilers, including some old versions of pcc and pcc-derived lints, which improperly accept assignments of multi-dimensional arrays to multi-level
pointers.)
If you are passing a two-dimensional array to a function:
int array[NROWS][NCOLUMNS];
f(array);
the function's declaration must match:
f(int a[][NCOLUMNS])
{ ... }
or
f(int (*ap)[NCOLUMNS])    /* ap is a pointer to an array */
{ ... }
In the first declaration, the compiler performs the usual implicit parameter rewriting of ``array of array''
to ``pointer to array'' (see questions 6.3 and 6.4); in the second form the pointer declaration is explicit.
Since the called function does not allocate space for the array, it does not need to know the overall size,
so the number of rows, NROWS, can be omitted. The ``shape'' of the array is still important, so the column
dimension NCOLUMNS (and, for three- or more dimensional arrays, the intervening ones) must be
retained.
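A filled-in sketch of the second form follows. The function name sum2d, its nrows parameter, and the particular dimensions are assumptions made for illustration.

```c
#define NROWS    2
#define NCOLUMNS 3

/* ap points to the first of nrows arrays of NCOLUMNS ints.  The
 * column dimension is part of ap's type; the row count is not, so
 * it is passed separately. */
int sum2d(int (*ap)[NCOLUMNS], int nrows)
{
    int i, j, total = 0;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < NCOLUMNS; j++)
            total += ap[i][j];
    return total;
}
```

Given int array[NROWS][NCOLUMNS], the call is simply sum2d(array, NROWS): the array decays into a pointer to its first row, which is exactly the type the function expects.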
If a function is already declared as accepting a pointer to a pointer, it is probably meaningless to pass a
two-dimensional array directly to it.
See also questions 6.12 and 6.15.
Question 6.3
So what is meant by the ``equivalence of pointers and arrays'' in C?
Much of the confusion surrounding arrays and pointers in C can be traced to a misunderstanding of this
statement. Saying that arrays and pointers are ``equivalent'' means neither that they are identical nor even
interchangeable.
``Equivalence'' refers to the following key definition:
An lvalue of type array-of-T which appears in an expression decays (with three
exceptions) into a pointer to its first element; the type of the resultant pointer is pointer-to-T.
(The exceptions are when the array is the operand of a sizeof or & operator, or is a string literal
initializer for a character array.)
As a consequence of this definition, the compiler doesn't apply the array subscripting operator [] that
differently to arrays and pointers, after all. In an expression of the form a[i], the array decays into a
pointer, following the rule above, and is then subscripted just as would be a pointer variable in the
expression p[i] (although the eventual memory accesses will be different, as explained in question 6.2).
If you were to assign the array's address to the pointer:
p = a;
then p[3] and a[3] would access the same element.
See also question 6.8.
References: K&R1 Sec. 5.3 pp. 93-6
K&R2 Sec. 5.3 p. 99
ANSI Sec. 3.2.2.1, Sec. 3.3.2.1, Sec. 3.3.6
ISO Sec. 6.2.2.1, Sec. 6.3.2.1, Sec. 6.3.6
H&S Sec. 5.4.1 p. 124
Question 6.2
But I heard that char a[] was identical to char *a.
Not at all. (What you heard has to do with formal parameters to functions; see question 6.4.) Arrays are
not pointers. The array declaration char a[6] requests that space for six characters be set aside, to be
known by the name ``a.'' That is, there is a location named ``a'' at which six characters can sit. The
pointer declaration char *p, on the other hand, requests a place which holds a pointer, to be known by
the name ``p.'' This pointer can point almost anywhere: to any char, or to any contiguous array of
chars, or nowhere (see also questions 5.1 and 1.30).
As usual, a picture is worth a thousand words. The declarations
char a[] = "hello";
char *p = "world";
would initialize data structures which could be represented like this:
   +---+---+---+---+---+---+
a: | h | e | l | l | o |\0 |
   +---+---+---+---+---+---+
   +-----+     +---+---+---+---+---+---+
p: |  *======> | w | o | r | l | d |\0 |
   +-----+     +---+---+---+---+---+---+
It is important to realize that a reference like x[3] generates different code depending on whether x is an
array or a pointer. Given the declarations above, when the compiler sees the expression a[3], it emits
code to start at the location ``a,'' move three past it, and fetch the character there. When it sees the
expression p[3], it emits code to start at the location ``p,'' fetch the pointer value there, add three to the
pointer, and finally fetch the character pointed to. In other words, a[3] is three places past (the start of)
the object named a, while p[3] is three places past the object pointed to by p. In the example above,
both a[3] and p[3] happen to be the character 'l', but the compiler gets there differently.
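The same two declarations can be probed from C itself. In this sketch (the helper function names are hypothetical), the subscripts agree, but sizeof reveals that a is an array of six characters while p is merely a pointer.

```c
#include <stddef.h>

char a[] = "hello";
char *p = "world";

/* Returns 1 if a[3] and p[3] are both 'l', as in the picture. */
int both_are_l(void)
{
    return a[3] == 'l' && p[3] == 'l';
}

/* sizeof a counts the six chars (including the \0); sizeof p is the
 * size of a pointer, no matter what p points to. */
size_t size_of_a(void) { return sizeof a; }
size_t size_of_p(void) { return sizeof p; }
```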
References: K&R2 Sec. 5.5 p. 104
CT&P Sec. 4.5 pp. 64-5
Question 5.1
What is this infamous null pointer, anyway?
The language definition states that for each pointer type, there is a special value--the ``null pointer''--which is distinguishable from all other pointer values and which is ``guaranteed to compare unequal to a
pointer to any object or function.'' That is, the address-of operator & will never yield a null pointer, nor
will a successful call to malloc. (malloc does return a null pointer when it fails, and this is a typical
use of null pointers: as a ``special'' pointer value with some other meaning, usually ``not allocated'' or
``not pointing anywhere yet.'')
A null pointer is conceptually different from an uninitialized pointer. A null pointer is known not to point
to any object or function; an uninitialized pointer might point anywhere. See also questions 1.30, 7.1, and
7.31.
As mentioned above, there is a null pointer for each pointer type, and the internal values of null pointers
for different types may be different. Although programmers need not know the internal values, the
compiler must always be informed which type of null pointer is required, so that it can make the
distinction if necessary (see questions 5.2, 5.5, and 5.6).
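Here is a small sketch of the two guarantees just described. The function names try_alloc and address_is_null are made up for the example.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the allocation succeeded, 0 if malloc
   reported failure by returning a null pointer. */
int try_alloc(size_t n)
{
    char *buf = malloc(n);
    if (buf == NULL)
        return 0;          /* the ``not allocated'' meaning of a null pointer */
    free(buf);
    return 1;
}

/* The address of an object never compares equal to a null pointer. */
int address_is_null(void)
{
    int x = 0;
    int *ip = &x;
    return ip == NULL;     /* always 0 */
}
```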
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102
ANSI Sec. 3.2.2.3
ISO Sec. 6.2.2.3
Rationale Sec. 3.2.2.3
H&S Sec. 5.3.2 pp. 121-3
Question 1.30
What can I safely assume about the initial values of variables which are not explicitly initialized? If
global variables start out as ``zero,'' is that good enough for null pointers and floating-point zeroes?
Variables with static duration (that is, those declared outside of functions, and those declared with the
storage class static) are guaranteed to be initialized (just once, at program startup) to zero, as if the
programmer had typed ``= 0''. Therefore, such variables are initialized to the null pointer (of the correct
type; see also section 5) if they are pointers, and to 0.0 if they are floating-point.
Variables with automatic duration (i.e. local variables without the static storage class) start out
containing garbage, unless they are explicitly initialized. (Nothing useful can be predicted about the
garbage.)
Dynamically-allocated memory obtained with malloc and realloc is also likely to contain garbage,
and must be initialized by the calling program, as appropriate. Memory obtained with calloc is
all-bits-0, but this is not necessarily useful for pointer or floating-point values (see question 7.31,
and section 5).
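As a rough sketch of the static-duration guarantee (the variable and function names here are illustrative, not from any particular program):

```c
#include <stddef.h>

static int    *sp;      /* static duration: starts out as a null pointer */
static double  sd;      /* static duration: starts out as 0.0 */
static int     si;      /* static duration: starts out as 0 */

int statics_are_zero(void)
{
    static char *local_static;   /* static storage class inside a function */
    return sp == NULL && sd == 0.0 && si == 0 && local_static == NULL;
}

int automatic_example(void)
{
    int n = 0;          /* automatic: must be initialized before it is used */
    return n;
}
```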
References: K&R1 Sec. 4.9 pp. 82-4
K&R2 Sec. 4.9 pp. 85-86
ANSI Sec. 3.5.7, Sec. 4.10.3.1, Sec. 4.10.5.3
ISO Sec. 6.5.7, Sec. 7.10.3.1, Sec. 7.10.5.3
H&S Sec. 4.2.8 pp. 72-3, Sec. 4.6 pp. 92-3, Sec. 4.6.2 pp. 94-5, Sec. 4.6.3 p. 96, Sec. 16.1 p. 386
5. Null Pointers
5.1 What is this infamous null pointer, anyway?
5.2 How do I get a null pointer in my programs?
5.3 Is the abbreviated pointer comparison ``if(p)'' to test for non-null pointers valid?
5.4 What is NULL and how is it #defined?
5.5 How should NULL be defined on a machine which uses a nonzero bit pattern as the internal
representation of a null pointer?
5.6 If NULL were defined as ``((char *)0),'' wouldn't that make function calls which pass an uncast
NULL work?
5.9 If NULL and 0 are equivalent as null pointer constants, which should I use?
5.10 But wouldn't it be better to use NULL, in case the value of NULL changes?
5.12 I use the preprocessor macro "#define Nullptr(type) (type *)0" to help me build null
pointers of the correct type.
5.13 This is strange. NULL is guaranteed to be 0, but the null pointer is not?
5.14 Why is there so much confusion surrounding null pointers?
5.15 I'm confused. I just can't understand all this null pointer stuff.
5.16 Given all the confusion surrounding null pointers, wouldn't it be easier simply to require them to be
represented internally by zeroes?
5.17 Seriously, have any actual machines really used nonzero null pointers?
5.20 What does a run-time ``null pointer assignment'' error mean?
Question 5.4
What is NULL and how is it #defined?
As a matter of style, many programmers prefer not to have unadorned 0's scattered through their
programs. Therefore, the preprocessor macro NULL is #defined (by <stdio.h> or <stddef.h>)
with the value 0, possibly cast to (void *) (see also question 5.6). A programmer who wishes to make
explicit the distinction between 0 the integer and 0 the null pointer constant can then use NULL
whenever a null pointer is required.
Using NULL is a stylistic convention only; the preprocessor turns NULL back into 0 which is then
recognized by the compiler, in pointer contexts, as before. In particular, a cast may still be necessary
before NULL (as before 0) in a function call argument. The table under question 5.2 above applies for
NULL as well as 0 (an unadorned NULL is equivalent to an unadorned 0).
NULL should only be used for pointers; see question 5.9.
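One place where the cast matters is the variable part of a variable-length argument list, where no prototype tells the compiler that a pointer is wanted. A sketch, in which count_strings is a hypothetical function written for this example:

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic function: counts its char * arguments up to a
   terminating null pointer. */
int count_strings(const char *first, ...)
{
    va_list ap;
    int n = 0;
    const char *s = first;

    va_start(ap, first);
    while (s != NULL) {
        n++;
        s = va_arg(ap, char *);
    }
    va_end(ap);
    return n;
}

/* The terminator is cast, so that a null pointer of type char * is passed
   no matter how NULL happens to be defined. */
int count_example(void)
{
    return count_strings("ls", "-l", (char *)NULL);
}
```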
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102
ANSI Sec. 4.1.5, Sec. 3.2.2.3
ISO Sec. 7.1.6, Sec. 6.2.2.3
Rationale Sec. 4.1.5
H&S Sec. 5.3.2 p. 122, Sec. 11.1 p. 292
Question 5.6
If NULL were defined as follows:
#define NULL ((char *)0)
wouldn't that make function calls which pass an uncast NULL work?
Not in general. The problem is that there are machines which use different internal representations for
pointers to different types of data. The suggested definition would make uncast NULL arguments to
functions expecting pointers to characters work correctly, but pointer arguments of other types would
still be problematical, and legal constructions such as
FILE *fp = NULL;
could fail.
Nevertheless, ANSI C allows the alternate definition
#define NULL ((void *)0)
for NULL. Besides potentially helping incorrect programs to work (but only on machines with
homogeneous pointers, thus questionably valid assistance), this definition may catch programs which use
NULL incorrectly (e.g. when the ASCII NUL character was really intended; see question 5.9).
References: Rationale Sec. 4.1.5
Question 5.9
If NULL and 0 are equivalent as null pointer constants, which should I use?
Many programmers believe that NULL should be used in all pointer contexts, as a reminder that the value
is to be thought of as a pointer. Others feel that the confusion surrounding NULL and 0 is only
compounded by hiding 0 behind a macro, and prefer to use unadorned 0 instead. There is no one right
answer. (See also questions 9.2 and 17.10.) C programmers must understand that NULL and 0 are
interchangeable in pointer contexts, and that an uncast 0 is perfectly acceptable. Any usage of NULL (as
opposed to 0) should be considered a gentle reminder that a pointer is involved; programmers should not
depend on it (either for their own understanding or the compiler's) for distinguishing pointer 0's from
integer 0's.
NULL should not be used when another kind of 0 is required, even though it might work, because doing
so sends the wrong stylistic message. (Furthermore, ANSI allows the definition of NULL to be ((void
*)0), which will not work at all in non-pointer contexts.) In particular, do not use NULL when the
ASCII null character (NUL) is desired. Provide your own definition
#define NUL '\0'
if you must.
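A brief sketch of the distinction, using that definition; the two helper functions are invented for the illustration:

```c
#include <stddef.h>

#define NUL '\0'

/* Hypothetical helper: truncates a string in place at position n
   (the caller guarantees that n is within the string). */
void truncate_at(char *s, size_t n)
{
    s[n] = NUL;         /* a character zero, so not NULL */
}

char *no_string(void)
{
    return NULL;        /* a pointer zero, where NULL is appropriate */
}
```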
References: K&R1 Sec. 5.4 pp. 97-8
K&R2 Sec. 5.4 p. 102
Question 5.17
Seriously, have any actual machines really used nonzero null pointers, or different representations for
pointers to different types?
The Prime 50 series used segment 07777, offset 0 for the null pointer, at least for PL/I. Later models used
segment 0, offset 0 for null pointers in C, necessitating new instructions such as TCNP (Test C Null
Pointer), evidently as a sop to all the extant poorly-written C code which made incorrect assumptions.
Older, word-addressed Prime machines were also notorious for requiring larger byte pointers (char
*'s) than word pointers (int *'s).
The Eclipse MV series from Data General has three architecturally supported pointer formats (word,
byte, and bit pointers), two of which are used by C compilers: byte pointers for char * and void *,
and word pointers for everything else.
Some Honeywell-Bull mainframes use the bit pattern 06000 for (internal) null pointers.
The CDC Cyber 180 Series has 48-bit pointers consisting of a ring, segment, and offset. Most users (in
ring 11) have null pointers of 0xB00000000000. It was common on old CDC ones-complement machines
to use an all-one-bits word as a special flag for all kinds of data, including invalid addresses.
The old HP 3000 series uses a different addressing scheme for byte addresses than for word addresses;
like several of the machines above it therefore uses different representations for char * and void *
pointers than for other pointers.
The Symbolics Lisp Machine, a tagged architecture, does not even have conventional numeric pointers; it
uses the pair <NIL, 0> (basically a nonexistent <object, offset> handle) as a C null pointer.
Depending on the ``memory model'' in use, 8086-family processors (PC compatibles) may use 16-bit
data pointers and 32-bit function pointers, or vice versa.
Some 64-bit Cray machines represent int * in the lower 48 bits of a word; char * additionally uses
the upper 16 bits to indicate a byte address within a word.
References: K&R1 Sec. A14.4 p. 211
Question 5.16
Given all the confusion surrounding null pointers, wouldn't it be easier simply to require them to be
represented internally by zeroes?
If for no other reason, doing so would be ill-advised because it would unnecessarily constrain
implementations which would otherwise naturally represent null pointers by special, nonzero bit patterns,
particularly when those values would trigger automatic hardware traps for invalid accesses.
Besides, what would such a requirement really accomplish? Proper understanding of null pointers does
not require knowledge of the internal representation, whether zero or nonzero. Assuming that null
pointers are internally zero does not make any code easier to write (except for a certain ill-advised usage
of calloc; see question 7.31). Known-zero internal pointers would not obviate casts in function calls,
because the size of the pointer might still be different from that of an int. (If ``nil'' were used to request
null pointers, as mentioned in question 5.14, the urge to assume an internal zero representation would not
even arise.)
Question 7.31
What's the difference between calloc and malloc? Is it safe to take advantage of calloc's
zero-filling? Does free work on memory allocated with calloc, or do you need a cfree?
Question 7.30
Is it legal to pass a null pointer as the first argument to realloc? Why would you want to?
ANSI C sanctions this usage (and the related realloc(..., 0), which frees), although several earlier
implementations do not support it, so it may not be fully portable. Passing an initially-null pointer to
realloc can make it easier to write a self-starting incremental allocation algorithm.
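A self-starting incremental allocation might look something like this. It is only a sketch: struct intvec and intvec_push are hypothetical names, and the doubling growth policy is just one reasonable choice.

```c
#include <stdlib.h>

/* Hypothetical growable array of ints, for illustration only. */
struct intvec {
    int *data;      /* starts out null: nothing allocated yet */
    size_t len;     /* elements in use */
    size_t cap;     /* elements allocated */
};

/* Appends value, growing the buffer as needed. Returns 1 on success and
   0 if realloc fails (in which case the old contents are left intact). */
int intvec_push(struct intvec *v, int value)
{
    if (v->len == v->cap) {
        size_t newcap = v->cap ? v->cap * 2 : 4;
        /* On the first call v->data is null, so realloc acts like malloc. */
        int *newdata = realloc(v->data, newcap * sizeof *newdata);
        if (newdata == NULL)
            return 0;
        v->data = newdata;
        v->cap = newcap;
    }
    v->data[v->len++] = value;
    return 1;
}
```

A caller would start with struct intvec v = {NULL, 0, 0}; and eventually call free(v.data). No special case is needed for the first element.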
References: ANSI Sec. 4.10.3.4
ISO Sec. 7.10.3.4
H&S Sec. 16.3 p. 388
Question 7.27
So can I query the malloc package to find out how big an allocated block is?
Not portably.
Question 7.26
How does free know how many bytes to free?
The malloc/free implementation remembers the size of each block allocated and returned, so it is not
necessary to remind it of the size when freeing.
Question 7.25
I have a program which mallocs and later frees a lot of memory, but memory usage (as reported by
ps) doesn't seem to go back down.
Most implementations of malloc/free do not return freed memory to the operating system (if there
is one), but merely make it available for future malloc calls within the same program.
Question 7.24
Must I free allocated memory before the program exits?
You shouldn't have to. A real operating system definitively reclaims all memory when a program exits.
Nevertheless, some personal computers are said not to reliably recover memory, and all that can be
inferred from the ANSI/ISO C Standard is that this is a ``quality of implementation issue.''
References: ANSI Sec. 4.10.3.2
ISO Sec. 7.10.3.2
Question 7.23
I'm allocating structures which contain pointers to other dynamically-allocated objects. When I free a
structure, do I have to free each subsidiary pointer first?
Yes. In general, you must arrange that each pointer returned from malloc be individually passed to
free, exactly once (if it is freed at all).
A good rule of thumb is that for each call to malloc in a program, you should be able to point at the call
to free which frees the memory allocated by that malloc call.
See also question 7.24.
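One common way to keep the malloc and free calls paired up is to write matching ``constructor'' and ``destructor'' functions for the structure. A sketch, with a hypothetical struct person:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type which owns a separately allocated string. */
struct person {
    char *name;
    int age;
};

struct person *person_new(const char *name, int age)
{
    struct person *pp = malloc(sizeof *pp);
    if (pp == NULL)
        return NULL;
    pp->name = malloc(strlen(name) + 1);
    if (pp->name == NULL) {
        free(pp);            /* undo the first allocation on failure */
        return NULL;
    }
    strcpy(pp->name, name);
    pp->age = age;
    return pp;
}

void person_free(struct person *pp)
{
    if (pp == NULL)
        return;
    free(pp->name);          /* subsidiary pointer first ... */
    free(pp);                /* ... then the structure itself */
}
```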
Question 7.22
When I call malloc to allocate memory for a local pointer, do I have to explicitly free it?
Yes. Remember that a pointer is different from what it points to. Local variables are deallocated when
the function returns, but in the case of a pointer variable, this means that the pointer is deallocated, not
what it points to. Memory allocated with malloc always persists until you explicitly free it. In general,
for every call to malloc, there should be a corresponding call to free.
Question 7.21
Why isn't a pointer null after calling free?
How unsafe is it to use (assign, compare) a pointer value after it's been freed?
When you call free, the memory pointed to by the passed pointer is freed, but the value of the pointer
in the caller remains unchanged, because C's pass-by-value semantics mean that called functions never
permanently change the values of their arguments. (See also question 4.8.)
A pointer value which has been freed is, strictly speaking, invalid, and any use of it, even if it is not
dereferenced, can theoretically lead to trouble, though as a quality of implementation issue, most
implementations will probably not go out of their way to generate exceptions for innocuous uses of
invalid pointers.
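If you want the caller's pointer to become null once the memory is gone, you have to arrange it yourself. One possible sketch (free_ints is a made-up name) passes the address of the pointer:

```c
#include <stdlib.h>

/* Hypothetical helper: frees *pp and nulls the caller's pointer. It takes the
   address of the pointer, since free itself only ever sees a copy. */
void free_ints(int **pp)
{
    free(*pp);
    *pp = NULL;
}
```

Calling it twice on the same pointer is harmless, because the second call hands a null pointer to free, which does nothing.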
References: ANSI Sec. 4.10.3
ISO Sec. 7.10.3
Rationale Sec. 3.2.2.3
Question 4.8
I have a function which accepts, and is supposed to initialize, a pointer:
void f(ip)
int *ip;
{
    static int dummy = 5;
    ip = &dummy;
}
But when I call it like this:
int *ip;
f(ip);
the pointer in the caller remains unchanged.
Are you sure the function initialized what you thought it did? Remember that arguments in C are passed
by value. The called function altered only the passed copy of the pointer. You'll either want to pass the
address of the pointer (the function will end up accepting a pointer-to-a-pointer), or have the function
return the pointer.
See also questions 4.9 and 4.11.
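The two suggested fixes might be sketched as follows; the names f_by_address and f_by_return are invented here to keep the variants apart.

```c
#include <stddef.h>

static int dummy = 5;

/* Fix 1: accept the address of the caller's pointer. */
void f_by_address(int **ipp)
{
    *ipp = &dummy;
}

/* Fix 2: return the pointer and let the caller assign it. */
int *f_by_return(void)
{
    return &dummy;
}
```

The calls then become f_by_address(&ip); or ip = f_by_return(); respectively.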
Question 4.9
Can I use a void ** pointer to pass a generic pointer to a function by reference?
Not portably. There is no generic pointer-to-pointer type in C. void * acts as a generic pointer only
because conversions are applied automatically when other pointer types are assigned to and from void
*'s; these conversions cannot be performed (the correct underlying pointer type is not known) if an
attempt is made to indirect upon a void ** value which points at something other than a void *.
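A portable way around the problem is to hand the function the address of a genuine void * variable, and convert afterwards by ordinary assignment. A sketch, with hypothetical function names:

```c
#include <stddef.h>

static int value = 42;

/* Hypothetical function which stores a generic pointer through a void **. */
void get_generic(void **vpp)
{
    *vpp = &value;
}

int *get_int_pointer(void)
{
    void *tmp;          /* a real void * for the function to fill in */
    int *ip;

    get_generic(&tmp);  /* &tmp really is a void **, so this is well defined */
    ip = tmp;           /* ordinary void * to int * conversion on assignment */
    return ip;
}
```

What would not be portable is calling get_generic((void **)&ip) with ip declared as int *.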
Question 4.10
I have a function
extern int f(int *);
which accepts a pointer to an int. How can I pass a constant by reference? A call like
f(&5);
doesn't seem to work.
You can't do this directly. You will have to declare a temporary variable, and then pass its address to the
function:
int five = 5;
f(&five);
See also questions 2.10, 4.8, and 20.1.
Question 2.10
How can I pass constant values to functions which accept structure arguments?
C has no way of generating anonymous structure values. You will have to use a temporary structure
variable or a little structure-building function; see question 14.11 for an example. (gcc provides
structure constants as an extension, and the mechanism will probably be added to a future revision of the
C Standard.) See also question 4.10.
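Both workarounds can be sketched with a small, hypothetical struct point and a function manhattan which accepts one by value:

```c
/* Hypothetical structure type and consumer, for illustration only. */
struct point {
    int x, y;
};

int manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

/* A little structure-building function. */
struct point makepoint(int x, int y)
{
    struct point p;
    p.x = x;
    p.y = y;
    return p;
}

int example_with_builder(void)
{
    return manhattan(makepoint(3, -4));
}

int example_with_temporary(void)
{
    struct point tmp;
    tmp.x = 3;
    tmp.y = -4;
    return manhattan(tmp);
}
```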
Question 2.6
I came across some code that declared a structure like this:
struct name {
    int namelen;
    char namestr[1];
};
and then did some tricky allocation to make the namestr array act like it had several elements. Is this
legal or portable?
This technique is popular, although Dennis Ritchie has called it ``unwarranted chumminess with the C
implementation.'' An official interpretation has deemed that it is not strictly conforming with the C
Standard. (A thorough treatment of the arguments surrounding the legality of the technique is beyond the
scope of this list.) It does seem to be portable to all known implementations. (Compilers which check
array bounds carefully might issue warnings.)
Another possibility is to declare the variable-size element very large, rather than very small; in the case
of the above example:
...
char namestr[MAXSIZE];
...
where MAXSIZE is larger than any name which will be stored. However, it looks like this technique is
disallowed by a strict interpretation of the Standard as well.
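For comparison, here is a sketch of the same idea written with a ``flexible array member.'' This assumes a compiler supporting the 1999 revision of the C Standard, which is later than the Standard discussed in this answer; makename is a hypothetical function name.

```c
#include <stdlib.h>
#include <string.h>

struct name {
    int namelen;
    char namestr[];     /* C99 flexible array member: must be last */
};

struct name *makename(const char *newname)
{
    size_t len = strlen(newname);
    /* Room for the fixed part plus the string and its terminating '\0'. */
    struct name *ret = malloc(sizeof *ret + len + 1);
    if (ret != NULL) {
        ret->namelen = (int)len;
        strcpy(ret->namestr, newname);
    }
    return ret;
}
```

The whole object comes from a single malloc call, so a single free releases it.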
References: Rationale Sec. 3.5.4.2
Question 2.4
What's the best way of implementing opaque (abstract) data types in C?
One good way is for clients to use structure pointers (perhaps additionally hidden behind typedefs)
which point to structure types which are not publicly defined.
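A sketch of the arrangement, squeezed into one listing. The Counter type and its functions are hypothetical; in practice the first part would live in a header file and the second in a single implementation file.

```c
#include <stdlib.h>

/* What a client would see (normally in a header): the structure tag is
   declared, but its members are not. */
typedef struct counter Counter;
Counter *counter_new(void);
void counter_bump(Counter *c);
int counter_value(const Counter *c);
void counter_free(Counter *c);

/* What only the implementation file would contain. */
struct counter {
    int count;
};

Counter *counter_new(void)
{
    Counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->count = 0;
    return c;
}

void counter_bump(Counter *c) { c->count++; }
int counter_value(const Counter *c) { return c->count; }
void counter_free(Counter *c) { free(c); }
```

Client code can pass Counter pointers around freely, but cannot look inside the structure or declare a Counter object of its own.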
Question 2.3
Can a structure contain a pointer to itself?
Question 2.2
Why doesn't
struct x { ... };
x thestruct;
work?
C is not C++. Typedef names are not automatically generated for structure tags. See also question 2.1.
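Either of the following declarations works in C. The member and the typedef name x_type are placeholders chosen for this sketch.

```c
struct x {
    int member;             /* placeholder member, for illustration */
};

struct x first;             /* spell out the keyword struct ... */

typedef struct x x_type;    /* ... or declare a typedef name explicitly */
x_type second;
```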
2.22 What is the difference between an enumeration and a set of preprocessor #defines?
2.24 Is there an easy way to print enumeration values symbolically?
Question 20.1
How can I return multiple values from a function?
Either pass pointers to several locations which the function can fill in, or have the function return a
structure containing the desired values, or (in a pinch) consider global variables. See also questions 2.7,
4.8, and 7.5.
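The first two techniques might be sketched like this, for a hypothetical function which computes both a quotient and a remainder:

```c
/* Technique 1: the caller passes pointers to locations to fill in. */
void divide(int num, int den, int *quot, int *rem)
{
    *quot = num / den;
    *rem = num % den;
}

/* Technique 2: return a structure containing the values. */
struct divresult {
    int quot;
    int rem;
};

struct divresult divide2(int num, int den)
{
    struct divresult r;
    r.quot = num / den;
    r.rem = num % den;
    return r;
}
```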
Question 7.5
I have a function that is supposed to return a string, but when it returns to its caller, the returned string is
garbage.
Make sure that the pointed-to memory is properly allocated. The returned pointer should be to a
statically-allocated buffer, or to a buffer passed in by the caller, or to memory obtained with malloc, but not to a
local (automatic) array. In other words, never do something like
char *itoa(int n)
{
    char retbuf[20];            /* WRONG */
    sprintf(retbuf, "%d", n);
    return retbuf;              /* WRONG */
}
One fix (which is imperfect, especially if the function in question is called recursively, or if several of its
return values are needed simultaneously) would be to declare the return buffer as
static char retbuf[20];
See also questions 12.21 and 20.1.
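The other two alternatives, a caller-supplied buffer and malloc'ed memory, might look like this. The names itoa_buf and itoa_alloc are invented for the sketch, and the scratch buffer is sized with the conservative estimate from question 12.21.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Alternative 1: the caller supplies the buffer (and its size).
   Returns buf, or a null pointer if the text would not fit. */
char *itoa_buf(int n, char *buf, size_t bufsize)
{
    char tmp[(sizeof(int) * CHAR_BIT + 2) / 3 + 1 + 1];

    sprintf(tmp, "%d", n);
    if (strlen(tmp) + 1 > bufsize)
        return NULL;
    strcpy(buf, tmp);
    return buf;
}

/* Alternative 2: return malloc'ed memory, which the caller must free. */
char *itoa_alloc(int n)
{
    char tmp[(sizeof(int) * CHAR_BIT + 2) / 3 + 1 + 1];
    char *ret;

    sprintf(tmp, "%d", n);
    ret = malloc(strlen(tmp) + 1);
    if (ret != NULL)
        strcpy(ret, tmp);
    return ret;
}
```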
References: ANSI Sec. 3.1.2.4
ISO Sec. 6.1.2.4
Question 12.21
How can I tell how much destination buffer space I'll need for an arbitrary sprintf call? How can I
avoid overflowing the destination buffer with sprintf?
There are not (yet) any good answers to either of these excellent questions, and this represents perhaps
the biggest deficiency in the traditional stdio library.
When the format string being used with sprintf is known and relatively simple, you can usually
predict a buffer size in an ad-hoc way. If the format consists of one or two %s's, you can count the fixed
characters in the format string yourself (or let sizeof count them for you) and add in the result of
calling strlen on the string(s) to be inserted. You can conservatively estimate the size that %d will
expand to with code like:
#include <limits.h>
char buf[(sizeof(int) * CHAR_BIT + 2) / 3 + 1 + 1];
sprintf(buf, "%d", n);
(This code computes the number of characters required for a base-8 representation of a number; a
base-10 expansion is guaranteed to take as much room or less.)
When the format string is more complicated, or is not even known until run time, predicting the buffer
size becomes as difficult as reimplementing sprintf, and correspondingly error-prone (and
inadvisable). A last-ditch technique which is sometimes suggested is to use fprintf to print the same
text to a bit bucket or temporary file, and then to look at fprintf's return value or the size of the file
(but see question 19.12).
If there's any chance that the buffer might not be big enough, you won't want to call sprintf without
some guarantee that the buffer will not overflow and overwrite some other part of memory. Several
stdio's (including GNU and 4.4bsd) provide the obvious snprintf function, which can be used like
this:
snprintf(buf, bufsize, "You typed \"%s\"", answer);
and we can hope that a future revision of the ANSI/ISO C Standard will include this function.
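Where snprintf is available with the semantics later standardized in C99 (an assumption; some older implementations return something else on truncation), it can both guard against overflow and report how much room is needed. A sketch, with hypothetical function names:

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the formatted text fit in buf, 0 if it was truncated. */
int format_fits(char *buf, size_t bufsize, const char *answer)
{
    int n = snprintf(buf, bufsize, "You typed \"%s\"", answer);
    return n >= 0 && (size_t)n < bufsize;
}

/* Formats the same text into freshly allocated memory of exactly the
   right size. Assumes C99 semantics: the return value is the length the
   full output would have had, even when the buffer is too small. */
char *format_answer(const char *answer)
{
    int need = snprintf(NULL, 0, "You typed \"%s\"", answer);
    char *buf;

    if (need < 0)
        return NULL;                    /* output error */
    buf = malloc((size_t)need + 1);
    if (buf != NULL)
        snprintf(buf, (size_t)need + 1, "You typed \"%s\"", answer);
    return buf;
}
```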
Question 19.12
How can I find out the size of a file, prior to reading it in?
If the ``size of a file'' is the number of characters you'll be able to read from it in C, it is difficult or
impossible to determine this number exactly.
Under Unix, the stat call will give you an exact answer. Several other systems supply a Unix-like
stat which will give an approximate answer. You can fseek to the end and then use ftell, or try
fstat, but these tend to have the same problems: fstat is not portable, and generally tells you the
same thing stat tells you; ftell is not guaranteed to return a byte count except for binary files. Some
systems provide routines called filesize or filelength, but these are not portable, either.
Are you sure you have to determine the file's size in advance? Since the most accurate way of
determining the size of a file as a C program will see it is to open the file and read it, perhaps you can
rearrange the code to learn the size as it reads.
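The ``open the file and read it'' approach is at least easy to write portably. A minimal sketch (count_chars is a made-up name):

```c
#include <stdio.h>

/* Counts the characters readable from fp, from its current position to
   end-of-file. Returns -1 on a read error. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        n++;
    return ferror(fp) ? -1L : n;
}
```

A program which needs the data anyway can do its real work inside the same loop, growing its buffer as it goes, instead of making a separate counting pass.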
References: ANSI Sec. 4.9.9.4
ISO Sec. 7.9.9.4
H&S Sec. 15.5.1
PCS Sec. 12 p. 213
POSIX Sec. 5.6.2
Question 19.11
How can I check whether a file exists? I want to warn the user if a requested input file is missing.
It's surprisingly difficult to make this determination reliably and portably. Any test you make can be
invalidated if the file is created or deleted (i.e. by some other process) between the time you make the
test and the time you try to open the file.
Three possible test routines are stat, access, and fopen. (To make an approximate test for file
existence with fopen, just open for reading and close immediately.) Of these, only fopen is widely
portable, and access, where it exists, must be used carefully if the program uses the Unix set-UID
feature.
Rather than trying to predict in advance whether an operation such as opening a file will succeed, it's
often better to try it, check the return value, and complain if it fails. (Obviously, this approach won't work
if you're trying to avoid overwriting an existing file, unless you've got something like the O_EXCL file
opening option available, which does just what you want in this case.)
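Both the approximate fopen test and the ``try it and complain'' approach can be sketched in a few lines; the function names are hypothetical.

```c
#include <stdio.h>

/* Approximate existence test: returns 1 if the file could be opened for
   reading just now, 0 otherwise. */
int can_open(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Usually better: just try the operation, and complain if it fails. */
FILE *open_or_complain(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        fprintf(stderr, "can't open %s\n", filename);
    return fp;
}
```

Note that can_open really answers ``can I read it?'' rather than ``does it exist?''; a file which exists but is unreadable looks just like a missing one.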
References: PCS Sec. 12 pp. 189,213
POSIX Sec. 5.3.1, Sec. 5.6.2, Sec. 5.6.3
Question 19.10
How can I do graphics?
Once upon a time, Unix had a fairly nice little set of device-independent plot routines described in plot(3)
and plot(5), but they've largely fallen into disuse.
If you're programming for MS-DOS, you'll probably want to use libraries conforming to the VESA or
BGI standards.
If you're trying to talk to a particular plotter, making it draw is usually a matter of sending it the
appropriate escape sequences; see also question 19.9. The vendor may supply a C-callable library, or you
may be able to find one on the net.
If you're programming for a particular window system (Macintosh, X windows, Microsoft Windows),
you will use its facilities; see the relevant documentation or newsgroup or FAQ list.
References: PCS Sec. 5.4 pp. 75-77
Question 19.9
How do I send escape sequences to control a terminal or other device?
If you can figure out how to send characters to the device at all (see question 19.8), it's easy enough to
send escape sequences. In ASCII, the ESC code is 033 (27 decimal), so code like
fprintf(ofd, "\033[J");
sends the sequence ESC [ J .
Question 19.8
How can I direct output to the printer?
Under Unix, either use popen (see question 19.30) to write to the lp or lpr program, or perhaps open
a special file like /dev/lp. Under MS-DOS, write to the (nonstandard) predefined stdio stream
stdprn, or open the special files PRN or LPT1.
References: PCS Sec. 5.3 pp. 72-74
Question 19.30
How can I invoke another program or command and trap its output?
Unix and some other systems provide a popen routine, which sets up a stdio stream on a pipe connected
to the process running a command, so that the output can be read (or the input supplied). (Also,
remember to call pclose.)
If you can't use popen, you may be able to use system, with the output going to a file which you then
open and read.
If you're using Unix and popen isn't sufficient, you can learn about pipe, dup, fork, and exec.
(One thing that probably would not work, by the way, would be to use freopen.)
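On a system which has popen, reading a command's output might look like the following sketch. It assumes a POSIX-style environment; first_line_of is a hypothetical helper which keeps only the first line of output.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for popen/pclose; assumes a POSIX system */
#include <stdio.h>

/* Runs command and copies the first line of its output into buf.
   Returns 1 on success, 0 on failure. */
int first_line_of(const char *command, char *buf, int bufsize)
{
    FILE *pp = popen(command, "r");
    int ok;

    if (pp == NULL)
        return 0;
    ok = fgets(buf, bufsize, pp) != NULL;
    pclose(pp);                   /* not fclose */
    return ok;
}
```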
References: PCS Sec. 11 p. 169
Question 19.27
How can I invoke another program (a standalone executable, or an operating system command) from
within a C program?
Use the library function system, which does exactly that. Note that system's return value is the
command's exit status, and usually has nothing to do with the output of the command. Note also that
system accepts a single string representing the command to be invoked; if you need to build up a
complex command line, you can use sprintf. See also question 19.30.
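Building a command line with sprintf and handing it to system might be sketched like this. run_with_arg is a hypothetical helper; it makes no attempt to quote the argument, so it should not be given untrusted input.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "<program> <argument>" and runs it. Returns whatever system
   returns, or -1 if the command would not fit in the buffer. */
int run_with_arg(const char *program, const char *argument)
{
    char cmd[256];

    if (strlen(program) + 1 + strlen(argument) + 1 > sizeof cmd)
        return -1;
    sprintf(cmd, "%s %s", program, argument);
    return system(cmd);
}
```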
References: K&R1 Sec. 7.9 p. 157
K&R2 Sec. 7.8.4 p. 167, Sec. B6 p. 253
ANSI Sec. 4.10.4.5
ISO Sec. 7.10.4.5
H&S Sec. 19.2 p. 407
PCS Sec. 11 p. 179
Question 19.25
How can I access memory (a memory-mapped device, or graphics memory) located at a certain address?
Set a pointer, of the appropriate type, to the right number (using an explicit cast to assure the compiler
that you really do intend this nonportable conversion):
unsigned int *magicloc = (unsigned int *)0x12345678;
Then, *magicloc refers to the location you want. (Under MS-DOS, you may find a macro like
MK_FP() handy for working with segments and offsets.)
References: K&R1 Sec. A14.4 p. 210
K&R2 Sec. A6.6 p. 199
ANSI Sec. 3.3.4
ISO Sec. 6.3.4
Rationale Sec. 3.3.4
H&S Sec. 6.2.7 pp. 171-2
Question 19.24
What does the error message ``DGROUP data allocation exceeds 64K'' mean, and what can I do about it?
I thought that using large model meant that I could use more than 64K of data!
Even in large memory models, MS-DOS compilers apparently toss certain data (strings, some initialized
global or static variables) into a default data segment, and it's this segment that is overflowing. Either
use less global data, or, if you're already limiting yourself to reasonable amounts (and if the problem is
due to something like the number of strings), you may be able to coax the compiler into not using the
default data segment for so much. Some compilers place only ``small'' data objects in the default data
segment, and give you a way (e.g. the /Gt option under Microsoft compilers) to configure the threshold
for ``small.''
Question 19.23
How can I allocate arrays or structures bigger than 64K?
A reasonable computer ought to give you transparent access to all available memory. If you're not so
lucky, you'll either have to rethink your program's use of memory, or use various system-specific
techniques.
64K is (still) a pretty big chunk of memory. No matter how much memory your computer has available,
it's asking a lot to be able to allocate huge amounts of it contiguously. (The C Standard does not
guarantee that a single object can be larger than 32K.) Often it's a good idea to use data structures which
don't require that all memory be contiguous. For dynamically-allocated multidimensional arrays, you can
use pointers to pointers, as illustrated in question 6.16. Instead of a large array of structures, you can use
a linked list, or an array of pointers to structures.
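As a sketch of the pointers-to-pointers idea, here is a two-dimensional array of ints built from separately allocated rows, so that no single allocation is larger than one row. The function names are invented for the example.

```c
#include <stdlib.h>

/* Allocates an nrows x ncols array of ints as an array of row pointers.
   Returns a null pointer on failure. */
int **alloc_rows(size_t nrows, size_t ncols)
{
    size_t i;
    int **rows = malloc(nrows * sizeof *rows);

    if (rows == NULL)
        return NULL;
    for (i = 0; i < nrows; i++) {
        rows[i] = malloc(ncols * sizeof *rows[i]);
        if (rows[i] == NULL) {
            while (i > 0)
                free(rows[--i]);    /* release the rows already obtained */
            free(rows);
            return NULL;
        }
    }
    return rows;
}

void free_rows(int **rows, size_t nrows)
{
    size_t i;

    for (i = 0; i < nrows; i++)
        free(rows[i]);
    free(rows);
}
```

Elements are still accessed with the familiar rows[i][j] notation.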
If you're using a PC-compatible (8086-based) system, and running up against a 640K limit, consider
using ``huge'' memory model, or expanded or extended memory, or malloc variants such as halloc or
farmalloc, or a 32-bit ``flat'' compiler (e.g. djgpp, see question 18.3), or some kind of a DOS
extender, or another operating system.
References: ANSI Sec. 2.2.4.1
ISO Sec. 5.2.4.1
Question 6.1
I had the definition char a[6] in one source file, and in another I declared extern char *a. Why
didn't it work?
The declaration extern char *a simply does not match the actual definition. The type
pointer-to-type-T is not the same as array-of-type-T. Use extern char a[].
References: ANSI Sec. 3.5.4.2
ISO Sec. 6.5.4.2
CT&P Sec. 3.3 pp. 33-4, Sec. 4.5 pp. 64-5
Question 5.20
What does a run-time ``null pointer assignment'' error mean? How do I track it down?
This message, which typically occurs with MS-DOS compilers (see, therefore, section 19), means that
you've written, via a null (perhaps because uninitialized) pointer, to location 0. (See also question 16.8.)
A debugger may let you set a data breakpoint or watchpoint or something on location 0. Alternatively,
you could write a bit of code to stash away a copy of 20 or so bytes from location 0, and periodically
check that the memory at location 0 hasn't changed.
System Dependencies
19.18 How can I increase the allowable number of simultaneously open files?
19.20 How can I read a directory in a C program?
19.22 How can I find out how much memory is available?
19.23 How can I allocate arrays or structures bigger than 64K?
19.24 What does the error message ``DGROUP exceeds 64K'' mean?
19.25 How can I access memory located at a certain address?
19.27 How can I invoke another program from within a C program?
19.30 How can I invoke another program and trap its output?
19.31 How can my program discover the complete pathname to the executable from which it was
invoked?
19.32 How can I automatically locate a program's configuration files in the same directory as the
executable?
19.33 How can a process change an environment variable in its caller?
19.36 How can I read in an object file and jump to routines in it?
19.37 How can I implement a delay, or time a user's response, with sub-second resolution?
19.38 How can I trap or ignore keyboard interrupts like control-C?
19.39 How can I handle floating-point exceptions gracefully?
19.40 How do I... Use sockets? Do networking? Write client/server applications?
19.40b How do I use BIOS calls? How can I write ISR's? How can I create TSR's?
19.41 But I can't use all these nonstandard, system-dependent functions, because my program has to be
ANSI compatible!
Question 19.1
How can I read a single character from the keyboard without waiting for the RETURN key? How can I
stop characters from being echoed on the screen as they're typed?
Alas, there is no standard or portable way to do these things in C. Concepts such as screens and
keyboards are not even mentioned in the Standard, which deals only with simple I/O ``streams'' of
characters.
At some level, interactive keyboard input is usually collected and presented to the requesting program a
line at a time. This gives the operating system a chance to support input line editing
(backspace/delete/rubout, etc.) in a consistent way, without requiring that it be built into every program.
Only when the user is satisfied and presses the RETURN key (or equivalent) is the line made available to
the calling program. Even if the calling program appears to be reading input a character at a time (with
getchar or the like), the first call blocks until the user has typed an entire line, at which point
potentially many characters become available and many character requests (e.g. getchar calls) are
satisfied in quick succession.
When a program wants to read each character immediately as it arrives, its course of action will depend
on where in the input stream the line collection is happening and how it can be disabled. Under some
systems (e.g. MS-DOS, VMS in some modes), a program can use a different or modified set of OS-level
input calls to bypass line-at-a-time input processing. Under other systems (e.g. Unix, VMS in other
modes), the part of the operating system responsible for serial input (often called the ``terminal driver'')
must be placed in a mode which turns off line-at-a-time processing, after which all calls to the usual
input routines (e.g. read, getchar, etc.) will return characters immediately. Finally, a few systems
(particularly older, batch-oriented mainframes) perform input processing in peripheral processors which
cannot be told to do anything other than line-at-a-time input.
Therefore, when you need to do character-at-a-time input (or disable keyboard echo, which is an
analogous problem), you will have to use a technique specific to the system you're using, assuming it
provides one. Since comp.lang.c is oriented towards topics that C does deal with, you will usually get
better answers to these questions by referring to a system-specific newsgroup such as
comp.unix.questions or comp.os.msdos.programmer, and to the FAQ lists for these groups. Note that the
answers are often not unique even across different variants of a system; bear in mind when answering
system-specific questions that the answer that applies to your system may not apply to everyone else's.
However, since these questions are frequently asked here, here are brief answers for some common
situations.
Some versions of curses have functions called cbreak, noecho, and getch which do what you want.
If you're specifically trying to read a short password without echo, you might try getpass. Under Unix,
you can use ioctl to play with the terminal driver modes (CBREAK or RAW under ``classic'' versions;
ICANON, c_cc[VMIN] and c_cc[VTIME] under System V or POSIX systems; ECHO under all
versions), or in a pinch, system and the stty command. (For more information, see <sgtty.h> and
tty(4) under classic versions, <termio.h> and termio(4) under System V, or <termios.h> and
termios(4) under POSIX.) Under MS-DOS, use getch or getche, or the corresponding BIOS
interrupts. Under VMS, try the Screen Management (SMG$) routines, or curses, or issue low-level
$QIO's with the IO$_READVBLK function code (and perhaps IO$M_NOECHO, and others) to ask for
one character at a time. (It's also possible to set character-at-a-time or ``pass through'' modes in the VMS
terminal driver.) Under other operating systems, you're on your own.
(As an aside, note that simply using setbuf or setvbuf to set stdin to unbuffered will not
generally serve to allow character-at-a-time input.)
If you're trying to write a portable program, a good approach is to define your own suite of three
functions to (1) set the terminal driver or input system into character-at-a-time mode (if necessary), (2)
get characters, and (3) return the terminal driver to its initial state when the program is finished. (Ideally,
such a set of functions might be part of the C Standard, some day.) The extended versions of this FAQ
list (see question 20.40) contain examples of such functions for several popular systems.
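As a rough illustration of that three-function approach, here is a minimal sketch for POSIX systems only, using <termios.h>. The function names tty_break, tty_getchar, and tty_fix are chosen for this example, and the code assumes standard input is the terminal of interest; other systems will need entirely different implementations behind the same interface.

```c
#include <termios.h>
#include <unistd.h>

static struct termios saved_tty;    /* terminal state to restore later */
static int tty_saved = 0;

/* (1) Put stdin's terminal into character-at-a-time, no-echo mode.
   Returns 0 on success, -1 on failure (e.g. stdin is not a terminal). */
int tty_break(void)
{
    struct termios t;

    if (tcgetattr(STDIN_FILENO, &saved_tty) < 0)
        return -1;
    tty_saved = 1;
    t = saved_tty;
    t.c_lflag &= ~(ICANON | ECHO);  /* no line collection, no echo */
    t.c_cc[VMIN] = 1;               /* read returns after one character */
    t.c_cc[VTIME] = 0;              /* with no timeout */
    return tcsetattr(STDIN_FILENO, TCSANOW, &t);
}

/* (2) Get one character; returns -1 on end-of-file or error. */
int tty_getchar(void)
{
    unsigned char c;

    return read(STDIN_FILENO, &c, 1) == 1 ? c : -1;
}

/* (3) Return the terminal to the state saved by tty_break. */
int tty_fix(void)
{
    if (!tty_saved)
        return 0;                   /* nothing was changed */
    return tcsetattr(STDIN_FILENO, TCSANOW, &saved_tty);
}
```

A caller would invoke tty_break once at startup, tty_getchar as needed, and tty_fix before exiting (including on abnormal exits, or the user's shell is left in no-echo mode).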
See also question 19.2.
References: PCS Sec. 10 pp. 128-9, Sec. 10.1 pp. 130-1
POSIX Sec. 7
Question 19.2
How can I find out if there are characters available for reading (and if so, how many)? Alternatively, how
can I do a read that will not block if there are no characters available?
These, too, are entirely operating-system-specific. Some versions of curses have a nodelay function.
Depending on your system, you may also be able to use ``nonblocking I/O'', or a system call named
select or poll, or the FIONREAD ioctl, c_cc[VTIME], or kbhit, or rdchk, or the O_NDELAY
option to open or fcntl. See also question 19.1.
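For systems that do provide select, a zero-timeout call is one way to ask ``is anything there?'' without blocking. This is only a sketch: the function name input_ready is made up for the example, it reports whether a read would block rather than how many characters are waiting, and it says nothing about characters still held by a terminal driver in line-at-a-time mode.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Returns 1 if a read on fd would not block, 0 if nothing is
   available yet, and -1 on error. */
int input_ready(int fd)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = 0;              /* zero timeout: poll and return at once */
    tv.tv_usec = 0;
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}
```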
Question 19.3
How can I display a percentage-done indication that updates itself in place, or show one of those
``twirling baton'' progress indicators?
These simple things, at least, you can do fairly portably. Printing the character '\r' will usually give
you a carriage return without a line feed, so that you can overwrite the current line. The character '\b'
is a backspace, and will usually move the cursor one position to the left.
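Here is a small sketch combining both ideas. The names format_progress and show_progress are invented for this example; the status line begins with '\r' so that each call overwrites the previous one, and the output is flushed explicitly because no newline is ever sent.

```c
#include <stdio.h>

/* Build a status line such as "\r 42% /" in buf, which should hold
   at least 16 characters.  Returns the count sprintf reports. */
int format_progress(char *buf, int percent, unsigned tick)
{
    static const char baton[] = "|/-\\";    /* the four baton positions */

    return sprintf(buf, "\r%3d%% %c", percent, baton[tick % 4]);
}

/* Redraw the status line in place on stdout. */
void show_progress(int percent, unsigned tick)
{
    char buf[32];

    format_progress(buf, percent, tick);
    fputs(buf, stdout);
    fflush(stdout);     /* no '\n' is written, so force the output out */
}
```

Calling show_progress(i, i) from inside a loop that counts i from 0 to 100 gives both the percentage and the twirling baton.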
References: ANSI Sec. 2.2.2
ISO Sec. 5.2.2
Question 19.4
How can I clear the screen?
How can I print things in inverse video?
How can I move the cursor to a specific x, y position?
Such things depend on the terminal type (or display) you're using. You will have to use a library such as
termcap, terminfo, or curses, or some system-specific routines, to perform these operations.
For clearing the screen, a halfway portable solution is to print a form-feed character ('\f'), which will
cause some displays to clear. Even more portable would be to print enough newlines to scroll everything
away. As a last resort, you could use system (see question 19.27) to invoke an operating system clear-screen command.
References: PCS Sec. 5.1.4 pp. 54-60, Sec. 5.1.5 pp. 60-62
Question 19.5
How do I read the arrow keys? What about function keys?
Terminfo, some versions of termcap, and some versions of curses have support for these non-ASCII
keys. Typically, a special key sends a multicharacter sequence (usually beginning with ESC, '\033');
parsing these can be tricky. (curses will do the parsing for you, if you call keypad first.)
Under MS-DOS, if you receive a character with value 0 (not '0'!) while reading the keyboard, it's a flag
indicating that the next character read will be a code indicating a special key. See any DOS programming
guide for lists of keyboard codes. (Very briefly: the up, left, right, and down arrow keys are 72, 75, 77,
and 80, and the function keys are 59 through 68.)
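The two-character MS-DOS convention can be folded into a single key value along the following lines. This is a sketch only: the KEY_ names and the offset of 256 are arbitrary choices for the example, and the actual reading of characters (with getch or the like) is left to the caller because it is system-specific.

```c
#define KEY_SPECIAL 256                 /* arbitrary offset marking special keys */
#define KEY_UP      (KEY_SPECIAL + 72)
#define KEY_LEFT    (KEY_SPECIAL + 75)
#define KEY_RIGHT   (KEY_SPECIAL + 77)
#define KEY_DOWN    (KEY_SPECIAL + 80)
#define KEY_F1      (KEY_SPECIAL + 59)

/* Combine the one or two values read for a keystroke: if the first
   is 0, the second is the special-key code; otherwise the first is
   an ordinary character and the second is ignored. */
int dos_key(int first, int second)
{
    return first == 0 ? KEY_SPECIAL + second : first;
}
```

With a DOS compiler's getch, a caller might write c = getch(); followed by key = dos_key(c, c == 0 ? getch() : 0); and then compare key against KEY_UP and friends.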
References: PCS Sec. 5.1.4 pp. 56-7
Question 19.6
How do I read the mouse?
Consult your system documentation, or ask on an appropriate system-specific newsgroup (but check its
FAQ list first). Mouse handling is completely different under the X window system, MS-DOS, the
Macintosh, and probably every other system.
References: PCS Sec. 5.5 pp. 78-80
Question 19.7
How can I do serial (``comm'') port I/O?
It's system-dependent. Under Unix, you typically open, read, and write a device file in /dev, and use the
facilities of the terminal driver to adjust its characteristics. (See also questions 19.1 and 19.2.) Under
MS-DOS, you can use the predefined stream stdaux, or a special file like COM1, or some primitive BIOS
interrupts, or (if you require decent performance) any number of interrupt-driven serial I/O packages.
Several netters recommend the book C Programmer's Guide to Serial Communications, by Joe
Campbell.
Question 19.13
How can a file be shortened in-place without completely clearing or rewriting it?
BSD systems provide ftruncate, several others supply chsize, and a few may provide a (possibly
undocumented) fcntl option F_FREESP. Under MS-DOS, you can sometimes use write(fd, "",
0). However, there is no portable solution, nor a way to delete blocks at the beginning. See also question
19.14.
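Where ftruncate is available (it is specified by POSIX, in <unistd.h>), shortening a file might look like the sketch below. The wrapper name shorten_file is invented for the example, and the code assumes the file already exists and is writable.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Cut the named file down to newlen bytes, keeping its beginning.
   Returns 0 on success, -1 on failure. */
int shorten_file(const char *path, off_t newlen)
{
    int fd = open(path, O_WRONLY);
    int r;

    if (fd < 0)
        return -1;
    r = ftruncate(fd, newlen);
    if (close(fd) < 0)
        return -1;
    return r;
}
```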
Question 19.14
How can I insert or delete a line (or record) in the middle of a file?
Short of rewriting the file, you probably can't. The usual solution is simply to rewrite the file. (Instead of
deleting records, you might consider simply marking them as deleted, to avoid rewriting.) See also
questions 12.30 and 19.13.
Question 12.30
I'm trying to update a file in place, by using fopen mode "r+", reading a certain string, and writing
back a modified string, but it's not working.
Be sure to call fseek before you write, both to seek back to the beginning of the string you're trying to
overwrite, and because an fseek or fflush is always required between reading and writing in the
read/write "+" modes. Also, remember that you can only overwrite characters with the same number of
replacement characters; see also question 19.14.
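To make the read/seek/write sequence concrete, here is a sketch that overwrites a single character in place. The function name replace_first is hypothetical; the stream is assumed to have been opened in one of the "+" modes. Note the fseek between the read and the write, and the fflush before any later read.

```c
#include <stdio.h>

/* Replace the first occurrence of character `from` with `to`.
   Returns 1 if a replacement was made, 0 if `from` was not found,
   and -1 on error. */
int replace_first(FILE *fp, int from, int to)
{
    int c;
    long pos;

    for (;;) {
        pos = ftell(fp);        /* where the next character starts */
        if (pos < 0)
            return -1;
        c = getc(fp);
        if (c == EOF)
            return ferror(fp) ? -1 : 0;
        if (c == from)
            break;
    }
    if (fseek(fp, pos, SEEK_SET) != 0)  /* seek back; required before writing */
        return -1;
    if (putc(to, fp) == EOF)
        return -1;
    if (fflush(fp) != 0)                /* required before reading again */
        return -1;
    return 1;
}
```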
References: ANSI Sec. 4.9.5.3
ISO Sec. 7.9.5.3
Question 12.26
How can I flush pending input so that a user's typeahead isn't read at the next prompt? Will
fflush(stdin) work?
fflush is defined only for output streams. Since its definition of ``flush'' is to complete the writing of
buffered characters (not to discard them), discarding unread input would not be an analogous meaning
for fflush on input streams.
There is no standard way to discard unread characters from a stdio input stream, nor would such a way be
sufficient, since unread characters can also accumulate in other, OS-level input buffers.
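What you can do portably is read and throw away the rest of the current input line, which is often all that's wanted after a botched entry. This is a sketch of that more modest technique, not a typeahead flush; the name discard_line is invented for the example.

```c
#include <stdio.h>

/* Consume characters up to and including the next newline
   (or up to end-of-file, whichever comes first). */
void discard_line(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != '\n' && c != EOF)
        ;   /* nothing to do but keep reading */
}
```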
References: ANSI Sec. 4.9.5.2
ISO Sec. 7.9.5.2
H&S Sec. 15.2
Question 12.25
What's the difference between fgetpos/fsetpos and ftell/fseek?
What are fgetpos and fsetpos good for?
fgetpos and fsetpos use a special typedef, fpos_t, for representing offsets (positions) in a file.
The type behind this typedef, if chosen appropriately, can represent arbitrarily large offsets, allowing
fgetpos and fsetpos to be used with arbitrarily huge files. ftell and fseek, on the other hand,
use long int, and are therefore limited to offsets which can be represented in a long int. See also
question 1.4.
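As a small usage sketch, here is fgetpos/fsetpos being used to look at the next line of a file and then put the read position back. The function name peek_line is invented for the example. An fpos_t can only be obtained from fgetpos and handed back to fsetpos; unlike the long used by ftell, it is not something you can do arithmetic on.

```c
#include <stdio.h>

/* Read the next line into buf without consuming it.
   Returns 0 on success, -1 on failure or end-of-file. */
int peek_line(FILE *fp, char *buf, int size)
{
    fpos_t pos;

    if (fgetpos(fp, &pos) != 0)     /* remember where we are */
        return -1;
    if (fgets(buf, size, fp) == NULL)
        return -1;
    return fsetpos(fp, &pos) == 0 ? 0 : -1;     /* and go back there */
}
```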
References: K&R2 Sec. B1.6 p. 248
ANSI Sec. 4.9.1, Secs. 4.9.9.1,4.9.9.3
ISO Sec. 7.9.1, Secs. 7.9.9.1,7.9.9.3
H&S Sec. 15.5 p. 252
Question 1.4
What should the 64-bit type on new, 64-bit machines be?
Some vendors of C products for 64-bit machines support 64-bit long ints. Others fear that too much
existing code is written to assume that ints and longs are the same size, or that one or the other of
them is exactly 32 bits, and introduce a new, nonstandard, 64-bit long long (or __longlong) type
instead.
Programmers interested in writing portable code should therefore insulate their 64-bit type needs behind
appropriate typedefs. Vendors who feel compelled to introduce a new, longer integral type should
advertise it as being ``at least 64 bits'' (which is truly new, a type traditional C does not have), and not
``exactly 64 bits.''
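One way to do that insulating is sketched below. The typedef name big_int is made up for the example, and the #else branch assumes the compiler offers long long, which the C99 standard later adopted. (C99 also supplies int_least64_t in <stdint.h>, which is exactly an ``at least 64 bits'' type.)

```c
#include <limits.h>

/* Pick a type of at least 64 bits, and hide the choice behind one name. */
#if LONG_MAX >= 9223372036854775807
typedef long big_int;           /* long is already wide enough */
#else
typedef long long big_int;      /* assumes the compiler supports long long */
#endif
```

The rest of the program then declares its large quantities as big_int and never mentions the underlying type.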
References: ANSI Sec. F.5.6
ISO Sec. G.5.6
Question 1.1
How do you decide which integer type to use?
If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very
important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If
well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of
sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned
types. (Beware when mixing signed and unsigned values in expressions, though.)
Although character types (especially unsigned char) can be used as ``tiny'' integers, doing so is
sometimes more trouble than it's worth, due to unpredictable sign extension and increased code size.
(Using unsigned char can help; see question 12.1 for a related problem.)
A similar space/time tradeoff applies when deciding between float and double. None of the above
rules apply if the address of a variable is taken and must have a particular type.
If for some reason you need to declare something with an exact size (usually the only good reason for
doing so is when attempting to conform to some externally-imposed storage layout, but see question
20.5), be sure to encapsulate the choice behind an appropriate typedef.
References: K&R1 Sec. 2.2 p. 34
K&R2 Sec. 2.2 p. 36, Sec. A4.2 pp. 195-6, Sec. B11 p. 257
ANSI Sec. 2.2.4.2.1, Sec. 3.1.2.5
ISO Sec. 5.2.4.2.1, Sec. 6.1.2.5
H&S Secs. 5.1,5.2 pp. 110-114
Question 12.1
What's wrong with this code?
char c;
while((c = getchar()) != EOF) ...
For one thing, the variable to hold getchar's return value must be an int. getchar can return all
possible character values, as well as EOF. By passing getchar's return value through a char, either a
normal character might be misinterpreted as EOF, or the EOF might be altered (particularly if type char
is unsigned) and so never seen.
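A corrected loop, wrapped in a function so it can be tried on any stream, might look like this. The name count_chars is just for illustration. Because c is an int, a data byte with value 0xFF cannot be mistaken for EOF.

```c
#include <stdio.h>

/* Count the characters remaining on a stream. */
long count_chars(FILE *fp)
{
    int c;          /* int, not char: must hold every character AND EOF */
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```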
References: K&R1 Sec. 1.5 p. 14
K&R2 Sec. 1.5.1 p. 16
ANSI Sec. 3.1.2.5, Sec. 4.9.1, Sec. 4.9.7.5
ISO Sec. 6.1.2.5, Sec. 7.9.1, Sec. 7.9.7.5
H&S Sec. 5.1.3 p. 116, Sec. 15.1, Sec. 15.6
CT&P Sec. 5.1 p. 70
PCS Sec. 11 p. 157
Question 11.35
People keep saying that the behavior of i = i++ is undefined, but I just tried it on an
ANSI-conforming compiler, and got the results I expected.
A compiler may do anything it likes when faced with undefined behavior (and, within limits, with
implementation-defined and unspecified behavior), including doing what you expect. It's unwise to
depend on it, though. See also questions 11.32, 11.33, and 11.34.
Question 11.32
Why won't the Frobozz Magic C Compiler, which claims to be ANSI compliant, accept this code? I
know that the code is ANSI, because gcc accepts it.
Many compilers support a few non-Standard extensions, gcc more so than most. Are you sure that the
code being rejected doesn't rely on such an extension? It is usually a bad idea to perform experiments
with a particular compiler to determine properties of a language; the applicable standard may permit
variations, or the compiler may be wrong. See also question 11.35.
Question 11.31
Does anyone have a tool for converting old-style C programs to ANSI C, or vice versa, or for
automatically generating prototypes?
Two programs, protoize and unprotoize, convert back and forth between prototyped and ``old style''
function definitions and declarations. (These programs do not handle full-blown translation between
``Classic'' C and ANSI C.) These programs are part of the FSF's GNU C compiler distribution; see
question 18.3.
The unproto program (/pub/unix/unproto5.shar.Z on ftp.win.tue.nl) is a filter which sits between the
preprocessor and the next compiler pass, converting most of ANSI C to traditional C on-the-fly.
The GNU GhostScript package comes with a little program called ansi2knr.
Before converting ANSI C back to old-style, beware that such a conversion cannot always be made both
safely and automatically. ANSI C introduces new features and complexities not found in K&R C. You'll
especially need to be careful of prototyped function calls; you'll probably need to insert explicit casts.
See also questions 11.3 and 11.29.
Several prototype generators exist, many as modifications to lint. A program called CPROTO was
posted to comp.sources.misc in March, 1992. There is another program called ``cextract.'' Many vendors
supply simple utilities like these with their compilers. See also question 18.16. (But be careful when
generating prototypes for old functions with ``narrow'' parameters; see question 11.3.)
Finally, are you sure you really need to convert lots of old code to ANSI C? The old-style function
syntax is still acceptable, and a hasty conversion can easily introduce bugs. (See question 11.3.)
Question 18.3
What's a free or cheap C compiler I can use?
A popular and high-quality free C compiler is the FSF's GNU C compiler, or gcc. It is available by
anonymous ftp from prep.ai.mit.edu in directory pub/gnu, or at several other FSF archive sites. An
MS-DOS port, djgpp, is also available; it can be found in the Simtel and Oakland archives and probably many
others, usually in a directory like pub/msdos/djgpp/ or simtel/msdos/djgpp/.
There is a shareware compiler called PCC, available as PCC12C.ZIP .
A very inexpensive MS-DOS compiler is Power C from Mix Software, 1132 Commerce Drive,
Richardson, TX 75801, USA, 214-783-6001.
Another recently-developed compiler is lcc, available for anonymous ftp from ftp.cs.princeton.edu in
pub/lcc.
Archives associated with comp.compilers contain a great deal of information about available compilers,
interpreters, grammars, etc. (for many languages). The comp.compilers archives (including an FAQ list),
maintained by the moderator, John R. Levine, are at iecc.com . A list of available compilers and related
resources, maintained by Mark Hopkins, Steven Robenalt, and David Muir Sharnoff, is at ftp.idiom.com
in pub/compilers-list/. (See also the comp.compilers directory in the news.answers archives at
rtfm.mit.edu and ftp.uu.net; see question 20.40.)
See also question 18.16.
Question 18.2
How can I track down these pesky malloc problems?
A number of debugging packages exist to help track down malloc problems; one popular one is Conor
P. Cahill's ``dbmalloc,'' posted to comp.sources.misc in 1992, volume 32. Others are ``leak,'' available in
volume 27 of the comp.sources.unix archives; JMalloc.c and JMalloc.h in the ``Snippets'' collection; and
MEMDEBUG from ftp.crpht.lu in pub/sources/memdebug . See also question 18.16.
A number of commercial debugging tools exist, and can be invaluable in tracking down malloc-related
and other stubborn problems:
Bounds-Checker for DOS, from Nu-Mega Technologies, P.O. Box 7780, Nashua, NH 03060-7780, USA, 603-889-2386.
CodeCenter (formerly Saber-C) from Centerline Software (formerly Saber), 10 Fawcett Street,
Cambridge, MA 02138-1110, USA, 617-498-3000.
Insight, from ParaSoft Corporation, 2500 E. Foothill Blvd., Pasadena, CA 91107, USA, 818-792-9941, insight@parasoft.com .
Purify, from Pure Software, 1309 S. Mary Ave., Sunnyvale, CA 94087, USA, 800-224-7873, info-home@pure.com .
SENTINEL, from AIB Software, 46030 Manekin Plaza, Dulles, VA 20166, USA, 703-430-9247,
800-296-3000, info@aib.com .
Question 18.1
I need some C development tools.
and comp.software-eng .
See also questions 18.16 and 18.3.
Question 10.18
I inherited some code which contains far too many #ifdef's for my taste. How can I preprocess the
code to leave only one conditional compilation set, without running it through the preprocessor and
expanding all of the #include's and #define's as well?
There are programs floating around called unifdef, rmifdef, and scpp (``selective C
preprocessor'') which do exactly this. See question 18.16.
Question 10.16
How can I use a preprocessor #if expression to tell if a machine is big-endian or little-endian?
You probably can't. (Preprocessor arithmetic uses only long integers, and there is no concept of
addressing.) Are you sure you need to know the machine's endianness explicitly? Usually it's better to
write code which doesn't care. See also question 20.9.
References: ANSI Sec. 3.8.1
ISO Sec. 6.8.1
H&S Sec. 7.11.1 p. 225
Question 20.9
How can I determine whether a machine's byte order is big-endian or little-endian?
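One common run-time technique is to store a known value in an int and inspect its first byte through a character pointer. The following is a sketch under the assumption that int is wider than char; the function name is_little_endian is chosen for the example.

```c
/* Returns 1 on a little-endian machine, 0 otherwise.
   Assumes sizeof(int) > 1. */
int is_little_endian(void)
{
    int x = 1;

    /* On a little-endian machine the least significant byte,
       which holds the 1, is stored at the lowest address. */
    return *(unsigned char *)&x == 1;
}
```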
Question 20.8
How can I implement sets or arrays of bits?
Use arrays of char or int, with a few macros to access the desired bit at the proper index. Here are
some simple macros to use with arrays of char:
#include <limits.h>		/* for CHAR_BIT */
#define
#define
#define
#define
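Since the macro bodies are missing above, here is a sketch of what such a set of macros could look like. The names (BIT_MASK, BIT_SLOT, and so on) are illustrative and are not necessarily the ones originally given; the array is assumed to have a character type, preferably unsigned char.

```c
#include <limits.h>     /* for CHAR_BIT */

#define BIT_MASK(b)     (1 << ((b) % CHAR_BIT))     /* bit within its byte */
#define BIT_SLOT(b)     ((b) / CHAR_BIT)            /* which byte holds bit b */
#define BIT_SET(a, b)   ((a)[BIT_SLOT(b)] |= BIT_MASK(b))
#define BIT_CLEAR(a, b) ((a)[BIT_SLOT(b)] &= ~BIT_MASK(b))
#define BIT_TEST(a, b)  ((a)[BIT_SLOT(b)] & BIT_MASK(b))
#define BIT_NSLOTS(nb)  (((nb) + CHAR_BIT - 1) / CHAR_BIT)  /* bytes for nb bits */
```

To hold 47 bits, for instance, you would declare unsigned char bits[BIT_NSLOTS(47)]; clear it to zero, and then use BIT_SET(bits, 23) and BIT_TEST(bits, 23).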
Question 20.6
If I have a char * variable pointing to the name of a function, how can I call that function?
The most straightforward thing to do is to maintain a correspondence table of names and function
pointers:
int func(), anotherfunc();

struct { char *name; int (*funcptr)(); } symtab[] = {
	"func",		func,
	"anotherfunc",	anotherfunc,
};
Then, search the table for the name, and call via the associated function pointer. See also questions 2.15
and 19.36.
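The search step might be sketched as follows. The function bodies are stand-ins so the example is complete, and the names funcptr_t and find_func are invented here; prototypes are used instead of the old-style declarations above.

```c
#include <string.h>

int func(void)        { return 1; }     /* stand-in bodies for the example */
int anotherfunc(void) { return 2; }

typedef int (*funcptr_t)(void);

static struct { const char *name; funcptr_t funcptr; } symtab[] = {
    { "func",        func },
    { "anotherfunc", anotherfunc },
};

/* Look a function up by name; returns NULL if it's not in the table. */
funcptr_t find_func(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof symtab / sizeof symtab[0]; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].funcptr;
    return NULL;
}
```

A caller holding a name in a char * variable s would write fp = find_func(s); and, if fp is not null, call the function with (*fp)() or simply fp().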
References: PCS Sec. 11 p. 168
Question 2.15
How can I access structure fields by name at run time?
Build a table of names and offsets, using the offsetof() macro. The offset of field b in struct a
is
offsetb = offsetof(struct a, b)
If structp is a pointer to an instance of this structure, and field b is an int (with offset as computed
above), b's value can be set indirectly with
*(int *)((char *)structp + offsetb) = value;
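Putting the pieces together, a table of names and offsets might be sketched like this. The second field c, the table name fields, and the function set_int_field are all made up for the example, which handles only int fields; a fuller version would record each field's type in the table as well.

```c
#include <stddef.h>     /* for offsetof */
#include <string.h>

struct a { int b; int c; };

static struct { const char *name; size_t offset; } fields[] = {
    { "b", offsetof(struct a, b) },
    { "c", offsetof(struct a, c) },
};

/* Set the int field called `name`; returns 0 on success,
   -1 if there is no such field. */
int set_int_field(struct a *structp, const char *name, int value)
{
    size_t i;

    for (i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        if (strcmp(fields[i].name, name) == 0) {
            *(int *)((char *)structp + fields[i].offset) = value;
            return 0;
        }
    }
    return -1;
}
```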
Question 2.14
How can I determine the byte offset of a field within a structure?
ANSI C defines the offsetof() macro, which should be used if available; see <stddef.h>. If you
don't have it, one possible implementation is
#define offsetof(type, mem) ((size_t) \
((char *)&((type *)0)->mem - (char *)(type *)0))
This implementation is not 100% portable; some compilers may legitimately refuse to accept it.
See question 2.15 for a usage hint.
References: ANSI Sec. 4.1.5
ISO Sec. 7.1.6
Rationale Sec. 3.5.4.2
H&S Sec. 11.1 pp. 292-3
Question 2.13
Why does sizeof report a larger size than I expect for a structure type, as if there were padding at the
end?
Structures may have this padding (as well as internal padding), if necessary, to ensure that alignment
properties will be preserved when an array of contiguous structures is allocated. Even when the structure
is not part of an array, the end padding remains, so that sizeof can always return a consistent size. See
question 2.12.
References: H&S Sec. 5.6.7 pp. 139-40
``I think it's safe to say that no person today can hope to achieve basic life competence
without consulting my work on a regular basis.''
--- Cecil Adams
A major revision and expansion of the comp.lang.c FAQ list was published in late 1995 by
Addison-Wesley, to wit:
Author: Steve Summit
Title: C Programming FAQs: Frequently Asked Questions
Publisher: Addison-Wesley
Copyright: 1996
ISBN: 0-201-84519-9
Most technical bookstores, as well as several of the large chains, are carrying this book. If you can't find
it, you can order it (in the U.S., at least) direct from Addison-Wesley by calling 800-282-0693. I'm sure
you could also order it on-line from Amazon.com or other on-line booksellers.
Addison-Wesley has web pages describing this book as well as many others. You can also browse the
content corresponding to the on-line version of the FAQ list (but note that this corresponds to only about
half of the actual book!).
As the Preface mentions, ``it can be as hard to eradicate the last error from a large manuscript as it is to
stamp out the last bug in a program.'' If you've already obtained a copy of the book, you'll want to skim
this errata list. (Updated with C99 changes.)
I'd like to publicly thank Addison-Wesley for their support of the FAQ list, for giving me the opportunity
to put it out in book form, and for being very easy to work with. All first-time authors should have it so
lucky.
C Programming Notes
Introductory C Programming Class Notes, Chapter 1
Steve Summit
These notes are part of the UW Experimental College course on Introductory C Programming. They are
based on notes prepared (beginning in Spring, 1995) to supplement the book The C Programming
Language, by Brian Kernighan and Dennis Ritchie, or K&R as the book and its authors are affectionately
known. (The second edition was published in 1988 by Prentice-Hall, ISBN 0-13-110362-8.) These notes
are now (as of Winter, 1995-6) intended to be stand-alone, although the sections are still cross-referenced
to those of K&R, for the reader who wants to pursue a more in-depth exposition.
Chapter 1: Introduction
Chapter 2: Basic Data Types and Operators
Chapter 3: Statements and Control Flow
Chapter 4: More about Declarations (and Initialization)
Chapter 5: Functions and Program Structure
Chapter 6: Basic I/O
Chapter 7: More Operators
Chapter 8: Strings
Chapter 9: The C Preprocessor
Chapter 10: Pointers
Chapter 11: Memory Allocation
Chapter 12: Input and Output
Chapter 1: Introduction
C is (as K&R admit) a relatively small language, but one which (to its admirers, anyway) wears well. C's
small, unambitious feature set is a real advantage: there's less to learn; there isn't excess baggage in the
way when you don't need it. It can also be a disadvantage: since it doesn't do everything for you, there's a
lot you have to do yourself. (Actually, this is viewed by many as an additional advantage: anything the
language doesn't do for you, it doesn't dictate to you, either, so you're free to do that something however
you want.)
C is sometimes referred to as a ``high-level assembly language.'' Some people think that's an insult, but
it's actually a deliberate and significant aspect of the language. If you have programmed in assembly
language, you'll probably find C very natural and comfortable (although if you continue to focus too
heavily on machine-level details, you'll probably end up with unnecessarily nonportable programs). If
you haven't programmed in assembly language, you may be frustrated by C's lack of certain higher-level
features. In either case, you should understand why C was designed this way: so that seemingly-simple
constructions expressed in C would not expand to arbitrarily expensive (in time or space) machine
language constructions when compiled. If you write a C program simply and succinctly, it is likely to
result in a succinct, efficient machine language executable. If you find that the executable program
resulting from a C program is not efficient, it's probably because of something silly you did, not because
of something the compiler did behind your back which you have no control over. In any case, there's no
point in complaining about C's low-level flavor: C is what it is.
A programming language is a tool, and no tool can perform every task unaided. If you're building a
house, and I'm teaching you how to use a hammer, and you ask how to assemble rafters and trusses into
gables, that's a legitimate question, but the answer has fallen out of the realm of ``How do I use a
hammer?'' and into ``How do I build a house?''. In the same way, we'll see that C does not have built-in
features to perform every function that we might ever need to do while programming.
As mentioned above, C imposes relatively few built-in ways of doing things on the programmer. Some
common tasks, such as manipulating strings, allocating memory, and doing input/output (I/O), are
performed by calling on library functions. Other tasks which you might want to do, such as creating or
listing directories, or interacting with a mouse, or displaying windows or other user-interface elements,
or doing color graphics, are not defined by the C language at all. You can do these things from a C
program, of course, but you will be calling on services which are peculiar to your programming
environment (compiler, processor, and operating system) and which are not defined by the C standard.
Since this course is about portable C programming, it will also be steering clear of facilities not provided
in all C environments.
Another aspect of C that's worth mentioning here is that it is, to put it bluntly, a bit dangerous. C does
not, in general, try hard to protect a programmer from mistakes. If you write a piece of code which will
(through some oversight of yours) do something wildly different from what you intended it to do, up to
and including deleting your data or trashing your disk, and if it is possible for the compiler to compile it,
it generally will. You won't get warnings of the form ``Do you really mean to...?'' or ``Are you sure you
really want to...?''. C is often compared to a sharp knife: it can do a surgically precise job on some
exacting task you have in mind, but it can also do a surgically precise job of cutting off your finger. It's
up to you to use it carefully.
This aspect of C is very widely criticized; it is also used (justifiably) to argue that C is not a good
teaching language. C aficionados love this aspect of C because it means that C does not try to protect
them from themselves: when they know what they're doing, even if it's risky or obscure, they can do it.
Students of C hate this aspect of C because it often seems as if the language is some kind of a conspiracy
specifically designed to lead them into booby traps and ``gotcha!''s.
This is another aspect of the language which it's fairly pointless to complain about. If you take care and
pay attention, you can avoid many of the pitfalls. These notes will point out many of the obvious (and not
so obvious) trouble spots.
1.1 A First Example
1.2 Second Example
1.3 Program Structure
If you have a C compiler, the first thing to do is figure out how to type this program in and compile it and
run it and see where its output went. (If you don't have a C compiler yet, the first thing to do is to find
one.)
The first line is practically boilerplate; it will appear in almost all programs we write. It asks that some
definitions having to do with the ``Standard I/O Library'' be included in our program; these definitions
are needed if we are to call the library function printf correctly.
The second line says that we are defining a function named main. Most of the time, we can name our
functions anything we want, but the function name main is special: it is the function that will be
``called'' first when our program starts running. The empty pair of parentheses indicates that our main
function accepts no arguments, that is, there isn't any information which needs to be passed in when the
function is called.
The braces { and } surround a list of statements in C. Here, they surround the list of statements making
up the function main.
The line
printf("Hello, world!\n");
is the first statement in the program. It asks that the function printf be called; printf is a library
function which prints formatted output. The parentheses surround printf's argument list: the
information which is handed to it which it should act on. The semicolon at the end of the line terminates
the statement.
(printf's name reflects the fact that C was first developed when Teletypes and other printing terminals
were still in widespread use. Today, of course, video displays are far more common. printf ``prints''
to the standard output, that is, to the default location for program output to go. Nowadays, that's almost
always a video screen or a window on that screen. If you do have a printer, you'll typically have to do
something extra to get a program to print to it.)
printf's first (and, in this case, only) argument is the string which it should print. The string, enclosed
in double quotes "", consists of the words ``Hello, world!'' followed by a special sequence: \n. In
strings, any two-character sequence beginning with the backslash \ represents a single special character.
The sequence \n represents the ``new line'' character, which prints a carriage return or line feed or
whatever it takes to end one line of output and move down to the next. (This program only prints one line
of output, but it's still important to terminate it.)
The second line in the main function is
return 0;
In general, a function may return a value to its caller, and main is no exception. When main returns
(that is, reaches its end and stops functioning), the program is at its end, and the return value from main
tells the operating system (or whatever invoked the program that main is the main function of) whether
it succeeded or not. By convention, a return value of 0 indicates success.
This program may look so absolutely trivial that it seems as if it's not even worth typing it in and trying
to run it, but doing so may be a big (and is certainly a vital) first hurdle. On an unfamiliar computer, it
can be arbitrarily difficult to figure out how to enter a text file containing program source, or how to
compile and link it, or how to invoke it, or what happened after (if?) it ran. The most experienced C
programmers immediately go back to this one, simple program whenever they're trying out a new system
or a new way of entering or building programs or a new way of printing output from within programs. As
Kernighan and Ritchie say, everything else is comparatively easy.
How you compile and run this (or any) program is a function of the compiler and operating system you're
using. The first step is to type it in, exactly as shown; this may involve using a text editor to create a file
containing the program text. You'll have to give the file a name, and all C compilers (that I've ever heard
of) require that files containing C source end with the extension .c. So you might place the program text
in a file called hello.c.
The second step is to compile the program. (Strictly speaking, compilation consists of two steps,
compilation proper followed by linking, but we can overlook this distinction at first, especially because
the compiler often takes care of initiating the linking step automatically.) On many Unix systems, the
command to compile a C program from a source file hello.c is
cc -o hello hello.c
You would type this command at the Unix shell prompt, and it requests that the cc (C compiler) program
be run, placing its output (i.e. the new executable program it creates) in the file hello, and taking its
input (i.e. the source code to be compiled) from the file hello.c.
The third step is to run (execute, invoke) the newly-built hello program. Again on a Unix system, this
is done simply by typing the program's name:
hello
Depending on how your system is set up (in particular, on whether the current directory is searched for
executables, based on the PATH variable), you may have to type
./hello
to indicate that the hello program is in the current directory (as opposed to some ``bin'' directory full
of executable programs, elsewhere).
You may also have your choice of C compilers. On many Unix machines, the cc command is an older
compiler which does not recognize modern, ANSI Standard C syntax. An old compiler will accept the
simple programs we'll be starting with, but it will not accept most of our later programs. If you find
yourself getting baffling compilation errors on programs which you've typed in exactly as they're shown,
it probably indicates that you're using an older compiler. On many machines, another compiler called
acc or gcc is available, and you'll want to use it, instead. (Both acc and gcc are typically invoked the
same as cc; that is, the above cc command would instead be typed, say, gcc -o hello hello.c .)
(One final caveat about Unix systems: don't name your test programs test, because there's already a
standard command called test, and you and the command interpreter will get badly confused if you try
to replace the system's test command with your own, not least because your own almost certainly does
something completely different.)
Under MS-DOS, the compilation procedure is quite similar. The name of the command you type will
depend on your compiler (e.g. cl for the Microsoft C compiler, tc or bcc for Borland's Turbo C, etc.).
You may have to manually perform the second, linking step, perhaps with a command named link or
tlink. The executable file which the compiler/linker creates will have a name ending in .exe (or
perhaps .com), but you can still invoke it by typing the base name (e.g. hello). See your compiler
documentation for complete details; one of the manuals should contain a demonstration of how to enter,
compile, and run a small program that prints some simple output, just as we're trying to describe here.
In an integrated or ``visual'' programming environment, such as those on the Macintosh or under various
versions of Microsoft Windows, the steps you take to enter, compile, and run a program are somewhat
different (and, theoretically, simpler). Typically, there is a way to open a new source window, type
source code into it, give it a file name, and add it to the program (or ``project'') you're building. If
necessary, there will be a way to specify what other source files (or ``modules'') make up the program.
Then, there's a button or menu selection which compiles and runs the program, all from within the
programming environment. (There will also be a way to create a standalone executable file which you
can run from outside the environment.) In a PC-compatible environment, you may have to choose
between creating DOS programs or Windows programs. (If you have troubles pertaining to the printf
function, try specifying a target environment of MS-DOS. Supposedly, some compilers which are
targeted at Windows environments won't let you call printf, because until you call some fancier
functions to request that a window be created, there's no window for printf to print to.) Again, check
the introductory or tutorial manual that came with the programming package; it should walk you through
the steps necessary to get your first program running.
The keyword for indicates that we are setting up a ``for loop.'' A for loop is controlled by three
expressions, enclosed in parentheses and separated by semicolons. These expressions say that, in this
case, the loop starts by setting i to 0, that it continues as long as i is less than 10, and that after each
iteration of the loop, i should be incremented by 1 (that is, have 1 added to its value).
Finally, we have a call to the printf function, as before, but with several differences. First, the call to
printf is within the body of the for loop. This means that control flow does not pass once through the
printf call, but instead that the call is performed as many times as are dictated by the for loop. In this
case, printf will be called several times: once when i is 0, once when i is 1, once when i is 2, and so
on until i is 9, for a total of 10 times.
A second difference in the printf call is that the string to be printed, "i is %d", contains a percent
sign. Whenever printf sees a percent sign, it indicates that printf is not supposed to print the exact
text of the string, but is instead supposed to read another one of its arguments to decide what to print. The
letter after the percent sign tells it what type of argument to expect and how to print it. In this case, the
letter d indicates that printf is to expect an int, and to print it in decimal. Finally, we see that
printf is in fact being called with another argument, for a total of two, separated by commas. The
second argument is the variable i, which is in fact an int, as required by %d. The effect of all of this is
that each time it is called, printf will print a line containing the current value of the variable i:
i is 0
i is 1
i is 2
...
After several trips through the loop, i will eventually equal 9. After that trip through the loop, the third
control expression i = i + 1 will increment its value to 10. The condition i < 10 is no longer true,
so no more trips through the loop are taken. Instead, control flow jumps down to the statement following
the for loop, which is the return statement. The main function returns, and the program is finished.
incorrectly or in the face of inadvertent mistakes. The compiler will decide what ``the body of the loop''
is based on its own rules, not the indentation, so if the indentation does not match the compiler's
interpretation, confusion is inevitable.)
To drive home the point that the compiler doesn't care about indentation, line breaks, or other
whitespace, here are a few (extreme) examples: The fragments
for(i = 0; i < 10; i = i + 1)
    printf("%d\n", i);
and
for(i = 0; i < 10; i = i + 1) printf("%d\n", i);
and
for(i=0;i<10;i=i+1)printf("%d\n",i);
and
for(i = 0; i < 10; i = i + 1)
printf("%d\n", i);
and
for
(
i
=
0
;
i
<
10
;
i
=
i
+
1
)
printf
(
"%d\n"
,
i
)
;
and
for
(i=0;
i<10;i=
i+1)printf
("%d\n", i);
2.1 Types
[This section corresponds to K&R Sec. 2.2]
There are only a few basic data types in C. The first ones we'll be encountering and using are:
char        a character
int         an integer, in the range -32,767 to 32,767
long int    a larger integer (up to +-2,147,483,647)
float       a floating-point number
double      a floating-point number, with more precision and perhaps greater range than float
If you can look at this list of basic types and say to yourself, ``Oh, how simple, there are only a few
types, I won't have to worry much about choosing among them,'' you'll have an easy time with
declarations. (Some masochists wish that the type system were more complicated so that they could
specify more things about each variable, but those of us who would rather not have to specify these extra
things each time are glad that we don't have to.)
The ranges listed above for types int and long int are the guaranteed minimum ranges. On some
systems, either of these types (or, indeed, any C type) may be able to hold larger values, but a program
that depends on extended ranges will not be as portable. Some programmers become obsessed with
knowing exactly what the sizes of data objects will be in various situations, and go on to write programs
which depend on these exact sizes. Determining or controlling the size of an object is occasionally
important, but most of the time we can sidestep size issues and let the compiler do most of the worrying.
(From the ranges listed above, we can determine that type int must be at least 16 bits, and that type
long int must be at least 32 bits. But neither of these sizes is exact; many systems have 32-bit ints,
and some systems have 64-bit long ints.)
You might wonder how the computer stores characters. The answer involves a character set, which is
simply a mapping between some set of characters and some set of small numeric codes. Most machines
today use the ASCII character set, in which the letter A is represented by the code 65, the ampersand & is
represented by the code 38, the digit 1 is represented by the code 49, the space character is represented
by the code 32, etc. (Most of the time, of course, you have no need to know or even worry about these
particular code values; they're automatically translated into the right shapes on the screen or printer when
characters are printed out, and they're automatically generated when you type characters on the keyboard.
Eventually, though, we'll appreciate, and even take some control over, exactly when these translations--from characters to their numeric codes--are performed.) Character codes are usually small--the largest
code value for a printable character in ASCII is 126, which is the ~ (tilde) character. Characters usually fit in a byte,
which is usually 8 bits. In C, type char is defined as occupying one byte, so it is usually 8 bits.
Most of the simple variables in most programs are of types int, long int, or double. Typically,
we'll use int and double for most purposes, and long int any time we need to hold integer values
greater than 32,767. As we'll see, even when we're manipulating individual characters, we'll usually use
an int variable, for reasons to be discussed later. Therefore, we'll rarely use individual variables of type
char, although we'll use plenty of arrays of char.
2.2 Constants
[This section corresponds to K&R Sec. 2.3]
A constant is just an immediate, absolute value found in an expression. The simplest constants are
decimal integers, e.g. 0, 1, 2, 123 . Occasionally it is useful to specify constants in base 8 or base 16
(octal or hexadecimal); this is done by prefixing an extra 0 (zero) for octal, or 0x for hexadecimal: the
constants 100, 0144, and 0x64 all represent the same number. (If you're not using these non-decimal
constants, just remember not to use any leading zeroes. If you accidentally write 0123 intending to get
one hundred and twenty three, you'll get 83 instead, which is 123 base 8.)
We write constants in decimal, octal, or hexadecimal for our convenience, not the compiler's. The
compiler doesn't care; it always converts everything into binary internally, anyway. (There is, however,
no good way to specify constants in source code in binary.)
A constant can be forced to be of type long int by suffixing it with the letter L (in upper or lower
case, although upper case is strongly recommended, because a lower case l looks too much like the digit
1).
A constant that contains a decimal point or the letter e (or both) is a floating-point constant: 3.14, 10.,
.01, 123e4, 123.456e7 . The e indicates multiplication by a power of 10; 123.456e7 is 123.456
times 10 to the 7th, or 1,234,560,000. (Floating-point constants are of type double by default.)
We also have constants for specifying characters and strings. (Make sure you understand the difference
between a character and a string: a character is exactly one character; a string is a set of zero or more
characters; a string containing one character is distinct from a lone character.) A character constant is
simply a single character between single quotes: 'A', '.', '%'. The numeric value of a character
constant is, naturally enough, that character's value in the machine's character set. (In ASCII, for
example, 'A' has the value 65.)
A string is represented in C as a sequence or array of characters. (We'll have more to say about arrays in
general, and strings in particular, later.) A string constant is a sequence of zero or more characters
enclosed in double quotes: "apple", "hello, world", "this is a test".
Within character and string constants, the backslash character \ is special, and is used to represent
characters not easily typed on the keyboard or for various reasons not easily typed in constants. The most
common of these ``character escapes'' are:
\n    a ``newline'' character
\b    a backspace
\r    a carriage return
\'    a single quote
\"    a double quote
\\    a backslash
For example, "he said \"hi\"" is a string constant which contains two double quotes, and '\'' is
a character constant consisting of a (single) single quote. Notice once again that the character constant
'A' is very different from the string constant "A".
2.3 Declarations
[This section corresponds to K&R Sec. 2.4]
Informally, a variable (also called an object) is a place you can store a value. So that you can refer to it
unambiguously, a variable needs a name. You can think of the variables in your program as a set of
boxes or cubbyholes, each with a label giving its name; you might imagine that storing a value ``in'' a
variable consists of writing the value on a slip of paper and placing it in the cubbyhole.
A declaration tells the compiler the name and type of a variable you'll be using in your program. In its
simplest form, a declaration consists of the type, the name of the variable, and a terminating semicolon:
char c;
int i;
float f;
You can also declare several variables of the same type in one declaration, separating them with
commas:
int i1, i2;
Later we'll see that declarations may also contain initializers, qualifiers and storage classes, and that we
can declare arrays, functions, pointers, and other kinds of data structures.
The placement of declarations is significant. In the ANSI C (C89) dialect described in these notes, you can't place them just anywhere (i.e. they cannot be
interspersed with the other statements in your program). They must either be placed at the beginning of a
function, or at the beginning of a brace-enclosed block of statements (which we'll learn about in the next
chapter), or outside of any function. (C99 and later compilers relax this rule and accept declarations mixed in with statements.) Furthermore, the placement of a declaration, as well as its storage
class, controls several things about its visibility and lifetime, as we'll see later.
You may wonder why variables must be declared before use. There are two reasons:
1. It makes things somewhat easier on the compiler; it knows right away what kind of storage to
allocate and what code to emit to store and manipulate each variable; it doesn't have to try to intuit
the programmer's intentions.
2. It forces a bit of useful discipline on the programmer: you cannot introduce variables willy-nilly;
you must think about them enough to pick appropriate types for them. (The compiler's error
messages to you, telling you that you apparently forgot to declare a variable, are as often helpful
as they are a nuisance: they're helpful when they tell you that you misspelled a variable, or forgot
to think about exactly how you were going to use it.)
Although there are a few places where declarations can be omitted (in which case the compiler will
assume an implicit declaration), making use of these removes the advantages of reason 2 above, so I
recommend always declaring everything explicitly.
Most of the time, I recommend writing one declaration per line. For the most part, the compiler doesn't
care what order declarations are in. You can order the declarations alphabetically, or in the order that
they're used, or to put related declarations next to each other. Collecting all variables of the same type
together on one line essentially orders declarations by type, which isn't a very useful order (it's only
slightly more useful than random order).
A declaration for a variable can also contain an initial value. This initializer consists of an equals sign
and an expression, which is usually a single constant:
int i = 1;
int i1 = 10, i2 = 20;
+    addition
-    subtraction
*    multiplication
/    division
%    modulus (remainder)
The - operator can be used in two ways: to subtract two numbers (as in a - b), or to negate one
number (as in -a + b or a + -b).
When applied to integers, the division operator / discards any remainder, so 1 / 2 is 0 and 7 / 4 is
1. But when either operand is a floating-point quantity (type float or double), the division operator
yields a floating-point result, with a potentially nonzero fractional part. So 1 / 2.0 is 0.5, and 7.0 /
4.0 is 1.75.
The modulus operator % gives you the remainder when two integers are divided: 1 % 2 is 1; 7 % 4 is
3. (The modulus operator can only be applied to integers.)
An additional arithmetic operation you might be wondering about is exponentiation. Some languages
have an exponentiation operator (typically ^ or **), but C doesn't. (To square or cube a number, just
multiply it by itself.)
Multiplication, division, and modulus all have higher precedence than addition and subtraction. The term
``precedence'' refers to how ``tightly'' operators bind to their operands (that is, to the things they operate
on). In mathematics, multiplication has higher precedence than addition, so 1 + 2 * 3 is 7, not 9. In
other words, 1 + 2 * 3 is equivalent to 1 + (2 * 3). C is the same way.
All of these operators ``group'' from left to right, which means that when two or more of them have the
same precedence and participate next to each other in an expression, the evaluation conceptually
proceeds from left to right. For example, 1 - 2 - 3 is equivalent to (1 - 2) - 3 and gives -4, not
+2. (``Grouping'' is sometimes called associativity, although the term is used somewhat differently in
programming than it is in mathematics. Not all C operators group from left to right; a few group from
right to left.)
Whenever the default precedence or associativity doesn't give you the grouping you want, you can
always use explicit parentheses. For example, if you wanted to add 1 to 2 and then multiply the result by
3, you could write (1 + 2) * 3.
By the way, the word ``arithmetic'' as used in the title of this section is an adjective, not a noun, and it's
pronounced differently than the noun: the accent is on the third syllable.
and
int a;
/* later... */
a = 10;
0;
i;
or
i + 1;
are syntactically valid statements--they consist of an expression followed by a semicolon--but in each
case, they compute a value without doing anything with it, so the computed value is discarded, and the
statement is useless. But if the ``degenerate'' statements in this paragraph don't make much sense to you,
don't worry; it's because they, frankly, don't make much sense.)
It's also possible for a single expression to have multiple side effects, but it's easy for such an expression
to be (a) confusing or (b) undefined. For now, we'll only be looking at expressions (and, therefore,
statements) which do one well-defined thing at a time.
3.2 if Statements
[This section corresponds to K&R Sec. 3.2]
The simplest way to modify the control flow of a program is with an if statement, which in its simplest
form looks like this:
if(x > max)
    max = x;
Even if you didn't know any C, it would probably be pretty obvious that what happens here is that if x is
greater than max, x gets assigned to max. (We'd use code like this to keep track of the maximum value
of x we'd seen--for each new x, we'd compare it to the old maximum value max, and if the new value
was greater, we'd update max.)
More generally, we can say that the syntax of an if statement is:
if( expression )
    statement
where expression is any expression and statement is any statement.
What if you have a series of statements, all of which should be executed together or not at all depending
on whether some condition is true? The answer is that you enclose them in braces:
if( expression )
{
    statement1
    statement2
    statement3
}
As a general rule, anywhere the syntax of C calls for a statement, you may write a series of statements
enclosed by braces. (You do not need to, and should not, put a semicolon after the closing brace, because
the series of statements enclosed by braces is not itself a simple expression statement.)
An if statement may also optionally contain a second statement, the ``else clause,'' which is to be
executed if the condition is not met. Here is an example:
if(n > 0)
    average = sum / n;
else
{
    printf("can't compute average\n");
    average = 0;
}
The first statement or block of statements is executed if the condition is true, and the second statement or
block of statements (following the keyword else) is executed if the condition is not true. In this
example, we can compute a meaningful average only if n is greater than 0; otherwise, we print a message
saying that we cannot compute the average. The general syntax of an if statement is therefore
if( expression )
    statement1
else
    statement2
(where both statement1 and statement2 may be lists of statements
enclosed in braces).
It's also possible to nest one if statement inside another. (For that matter, it's in general possible to nest
any kind of statement or control flow construct within another.) For example, here is a little piece of code
which decides roughly which quadrant of the compass you're walking into, based on an x value which is
positive if you're walking east, and a y value which is positive if you're walking north:
if(x > 0)
{
    if(y > 0)
        printf("Northeast.\n");
    else
        printf("Southeast.\n");
}
else
{
    if(y > 0)
        printf("Northwest.\n");
    else
        printf("Southwest.\n");
}
When you have one if statement (or loop) nested inside another, it's a very good idea to use explicit
braces {}, as shown, to make it clear (both to you and to the compiler) how they're nested and which
else goes with which if. It's also a good idea to indent the various levels, also as shown, to make the
code more readable to humans. Why do both? You use indentation to make the code visually more
readable to yourself and other humans, but the compiler doesn't pay attention to the indentation (since all
whitespace is essentially equivalent and is essentially ignored). Therefore, you also have to make sure
that the punctuation is right.
Here is an example of another common arrangement of if and else. Suppose we have a variable
grade containing a student's numeric grade, and we want to print out the corresponding letter grade.
Here is code that would do the job:
if(grade >= 90)
    printf("A");
else if(grade >= 80)
    printf("B");
else if(grade >= 70)
    printf("C");
else if(grade >= 60)
    printf("D");
else
    printf("F");
What happens here is that exactly one of the five printf calls is executed, depending on which of the
conditions is true. Each condition is tested in turn, and if one is true, the corresponding statement is
executed, and the rest are skipped. If none of the conditions is true, we fall through to the last one,
printing ``F''.
In the cascaded if/else/if/else/... chain, each else clause is another if statement. This may be
more obvious at first if we reformat the example, including every set of braces and indenting each if
statement relative to the previous one:
if(grade >= 90)
{
    printf("A");
}
else
{
    if(grade >= 80)
    {
        printf("B");
    }
    else
    {
        if(grade >= 70)
        {
            printf("C");
        }
        else
        {
            if(grade >= 60)
            {
                printf("D");
            }
            else
            {
                printf("F");
            }
        }
    }
}
By examining the code this way, it should be obvious that exactly one of the printf calls is executed,
and that whenever one of the conditions is found true, the remaining conditions do not need to be
checked and none of the later statements within the chain will be executed. But once you've convinced
yourself of this and learned to recognize the idiom, it's generally preferable to arrange the statements as
in the first example, without trying to indent each successive if statement one tabstop further out.
(Obviously, you'd run into the right margin very quickly if the chain had just a few more cases!)
<     less than
<=    less than or equal
>     greater than
>=    greater than or equal
==    equal
!=    not equal
statement if a is nonzero. But a will have just been assigned the value 0, so the ``true'' branch will never
be taken! (This could drive you crazy while debugging--you wanted to do something if a was 0, and after
the test, a is 0, whether it was supposed to be or not, but the ``true'' branch is nevertheless not taken.)
The relational operators work with arbitrary numbers and generate true/false values. You can also
combine true/false values by using the Boolean operators, which take true/false values as operands and
compute new true/false values. The three Boolean operators are:
&&    and
||    or
!     not (takes one operand; ``unary'')
The && (``and'') operator takes two true/false values and produces a true (1) result if both operands are
true (that is, if the left-hand side is true and the right-hand side is true). The || (``or'') operator takes two
true/false values and produces a true (1) result if either operand is true. The ! (``not'') operator takes a
single true/false value and negates it, turning false to true and true to false (0 to 1 and nonzero to 0).
For example, to test whether the variable i lies between 1 and 10, you might use
if(1 < i && i < 10)
...
Here we're expressing the relation ``i is between 1 and 10'' as ``1 is less than i and i is less than 10.''
It's important to understand why the more obvious expression
if(1 < i < 10)        /* WRONG */
would not work. The expression 1 < i < 10 is parsed by the compiler analogously to 1 + i + 10.
The expression 1 + i + 10 is parsed as (1 + i) + 10 and means ``add 1 to i, and then add the
result to 10.'' Similarly, the expression 1 < i < 10 is parsed as (1 < i) < 10 and means ``see if 1
is less than i, and then see if the result is less than 10.'' But in this case, ``the result'' is 1 or 0, depending
on whether i is greater than 1. Since both 0 and 1 are less than 10, the expression 1 < i < 10 would
always be true in C, regardless of the value of i!
Relational and Boolean expressions are usually used in contexts such as an if statement, where
something is to be done or not done depending on some condition. In these cases what's actually checked
is whether the expression representing the condition has a zero or nonzero value. As long as the
expression is a relational or Boolean expression, the interpretation is just what we want. For example,
when we wrote
if(x > max)
the > operator produced a 1 if x was greater than max, and a 0 otherwise. The if statement interprets 0
as false and 1 (or any nonzero value) as true.
But what if the expression is not a relational or Boolean expression? As far as C is concerned, the
controlling expression (of conditional statements like if) can in fact be any expression: it doesn't have to
``look like'' a Boolean expression; it doesn't have to contain relational or logical operators. All C looks at
(when it's evaluating an if statement, or anywhere else where it needs a true/false value) is whether the
expression evaluates to 0 or nonzero. For example, if you have a variable x, and you want to do
something if x is nonzero, it's possible to write
if(x)
    statement
and the statement will be executed if x is nonzero (since nonzero means ``true'').
This possibility (that the controlling expression of an if statement doesn't have to ``look like'' a Boolean
expression) is both useful and potentially confusing. It's useful when you have a variable or a function
that is ``conceptually Boolean,'' that is, one that you consider to hold a true or false (actually nonzero or
zero) value. For example, if you have a variable verbose which contains a nonzero value when your
program should run in verbose mode and zero when it should be quiet, you can write things like
if(verbose)
    printf("Starting first pass\n");
and this code is both legal and readable, besides which it does what you want. The standard library
contains a function isupper() which tests whether a character is an upper-case letter, so if c is a
character, you might write
if(isupper(c))
...
Both of these examples (verbose and isupper()) are useful and readable.
However, you will eventually come across code like
if(n)
    average = sum / n;
where n is just a number. Here, the programmer wants to compute the average only if n is nonzero
(otherwise, of course, the code would divide by 0), and the code works, because, in the context of the if
statement, the trivial expression n is (as always) interpreted as ``true'' if it is nonzero, and ``false'' if it is
zero.
``Coding shortcuts'' like these can seem cryptic, but they're also quite common, so you'll need to be able
to recognize them even if you don't choose to write them in your own code. Whenever you see code like
if(x)
or
if(f())
where x or f() do not have obvious ``Boolean'' names, you can read them as ``if x is nonzero'' or ``if
f() returns nonzero.''
}
After the loop finishes (when control ``falls out'' of it, due to the condition being false), n will have the
value 0.
You use a while loop when you have a statement or group of statements which may have to be
executed a number of times to complete their task. The controlling expression represents the condition
``the loop is not done'' or ``there's more work to do.'' As long as the expression is true, the body of the
loop is executed; presumably, it makes at least some progress at its task. When the expression becomes
false, the task is done, and the rest of the program (beyond the loop) can proceed. When we think about a
loop in this way, we can see an additional important property: if the expression evaluates to ``false''
before the very first trip through the loop, we make zero trips through the loop. In other words, if the task
is already done (if there's no work to do) the body of the loop is not executed at all. (It's always a good
idea to think about the ``boundary conditions'' in a piece of code, and to make sure that the code will
work correctly when there is no work to do, or when there is a trivial task to do, such as sorting an array
of one number. Experience has shown that bugs at boundary conditions are quite common.)
...
expr2
statement
expr3
expr2
The first thing executed is expr1. expr3 is evaluated after every trip
through the loop. The last thing executed is always expr2, because when
expr2 evaluates false, the loop exits.
All three expressions of a for loop are optional. If you leave out expr1, there simply is
no initialization step, and the variable(s) used with the loop had better have been initialized already. If
you leave out expr2, there is no test, and the default for the for loop is that another trip
through the loop should be taken (such that unless you break out of it some other way, the loop runs
forever). If you leave out expr3, there is no increment step.
The semicolons separate the three controlling expressions of a for loop. (These semicolons, by the way,
have nothing to do with statement terminators.) If you leave out one or more of the expressions, the
semicolons remain. Therefore, one way of writing a deliberately infinite loop in C is
for(;;)
...
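As a small illustration of a deliberately infinite loop that is left ``some other way,'' here is a sketch using break. The function name first_power_of_two_above is invented for this example, and it assumes limit is small enough that doubling x never overflows an int.

```c
/* first_power_of_two_above is a hypothetical helper for this sketch.
   The for(;;) loop has no test of its own; the only way out is the
   break statement inside the body.
   Assumption: limit is well below the largest int value. */
int first_power_of_two_above(int limit)
{
    int x = 1;

    for(;;)
        {
        if(x > limit)
            break;
        x = x * 2;
        }

    return x;
}
```

For example, first_power_of_two_above(1000) returns 1024.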
It's useful to compare C's for loop to the equivalent loops in other computer languages you might know.
The C loop
for(i = x; i <= y; i = i + z)
is roughly equivalent to:
for I = X to Y step Z        (BASIC)
do 10 i=x,y,z                (FORTRAN)
for i := x to y              (Pascal)
In C (unlike FORTRAN), if the test condition is false before the first trip through the loop, the loop won't
be traversed at all. In C (unlike Pascal), a loop control variable (in this case, i) is guaranteed to retain its
final value after the loop completes, and it is also legal to modify the control variable within the loop, if
you really want to. (When the loop terminates due to the test condition turning false, the value of the
control variable after the loop will be the first value for which the condition failed, not the last value for
which it succeeded.)
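The parenthetical point about the control variable's final value is easy to check with a small experiment. The function below is a sketch only; the name value_after_loop is hypothetical, and it assumes z is positive so that the loop terminates.

```c
/* value_after_loop is a hypothetical helper for this sketch.
   It runs the loop for(i = x; i <= y; i = i + z) with an empty body
   and returns whatever value i has once the loop is finished.
   Assumption: z > 0. */
int value_after_loop(int x, int y, int z)
{
    int i;

    for(i = x; i <= y; i = i + z)
        ;       /* empty body: only the control variable matters here */

    return i;
}
```

With x = 0, y = 9, z = 1 the function returns 10, the first value for which i <= 9 failed, not 9. With x = 5 and y = 4 the loop makes zero trips and the function returns 5.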
It's also worth noting that a for loop can be used in more general ways than the simple, iterative
examples we've seen so far. The ``control variable'' of a for loop does not have to be an integer, and it
does not have to be incremented by an additive increment. It could be ``incremented'' by a multiplicative
factor (1, 2, 4, 8, ...) if that was what you needed, or it could be a floating-point variable, or it could be
another type of variable which we haven't met yet which would step, not over numeric values, but over
the elements of an array or other data structure. Strictly speaking, a for loop doesn't have to have a
``control variable'' at all; the three expressions can be anything, although the loop will make the most
sense if they are related and together form the expected initialize, test, increment sequence.
The powers-of-two example of the previous section does fit this pattern, so we could rewrite it like this:
int x;
for(x = 2; x < 1000; x = x * 2)
printf("%d\n", x);
There is no earth-shaking or fundamental difference between the while and for loops. In fact, given
the general for loop
for(expr1; expr2; expr3)
statement
you could usually rewrite it as a while loop, moving the initialize and increment expressions to
statements before and within the loop:
expr1 ;
while(expr2)
{
statement
expr3 ;
}
Similarly, given the general while loop
while(expr)
statement
you could rewrite it as a for loop:
for(; expr; )
statement
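To make the correspondence concrete, here is the same computation written both ways. This is a sketch: the function names sum_with_for and sum_with_while are made up for the illustration, and each one adds up the powers of two (starting at 2) that are less than limit.

```c
/* Both functions are hypothetical helpers for this sketch. */

int sum_with_for(int limit)
{
    int x;
    int sum = 0;

    for(x = 2; x < limit; x = x * 2)
        sum = sum + x;

    return sum;
}

int sum_with_while(int limit)
{
    int x;
    int sum = 0;

    x = 2;                  /* the initialize expression */
    while(x < limit)        /* the test expression */
        {
        sum = sum + x;      /* the loop body */
        x = x * 2;          /* the increment expression */
        }

    return sum;
}
```

Both functions return the same result for any limit; for a limit of 1000, each returns 1022 (that is, 2 + 4 + ... + 512).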
Another contrast between the for and while loops is that although the test expression
(expr2) is optional in a for loop, it is required in a while loop. If you leave out the
controlling expression of a while loop, the compiler will complain about a syntax error. (To write a
deliberately infinite while loop, you have to supply an expression which is always nonzero. The most
obvious one would simply be while(1) .)
If it's possible to rewrite a for loop as a while loop and vice versa, why do they both exist? Which one
should you choose? In general, when you choose a for loop, its three expressions should all manipulate
the same variable or data structure, using the initialize, test, increment pattern. If they don't manipulate
the same variable or don't follow that pattern, wedging them into a for loop buys nothing and a while
loop would probably be clearer. (The reason that one loop or the other can be clearer is simply that, when
you see a for loop, you expect to see an idiomatic initialize/test/increment of a single variable, and if the
for loop you're looking at doesn't end up matching that pattern, you've been momentarily misled.)
i if the remainder of i divided by j is 0, so the code uses C's ``remainder'' or ``modulus'' operator % to
make this test. (Remember that i % j gives the remainder when i is divided by j.)
If the program finds a divisor, it uses break to break out of the inner loop, without printing anything.
But if it notices that j has risen higher than the square root of i, without its having found any divisors,
then i must not have any divisors, so i is prime, and its value is printed. (Once we've determined that i
is prime by noticing that j > sqrt(i), there's no need to try the other trial divisors, so we use a
second break statement to break out of the loop in that case, too.)
The simple algorithm and implementation we used here (like many simple prime number algorithms)
does not work for 2, the only even prime number, so the program ``cheats'' and prints out 2 no matter
what, before going on to test the numbers from 3 to 100.
Many improvements to this simple program are of course possible; you might experiment with it. (Did
you notice that the ``test'' expression of the inner loop for(j = 2; j < i; j = j + 1) is in a
sense unnecessary, because the loop always terminates early due to one of the two break statements?)
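A minimal sketch of the approach described above might look like the following. The function name print_primes, its limit parameter, and the count it returns are assumptions made for this sketch. It also compares j * j with i rather than calling sqrt, which is an equivalent test that avoids needing the math library.

```c
#include <stdio.h>

/* print_primes is a hypothetical helper for this sketch.
   It prints the primes up to limit by trial division and
   returns how many it printed. */
int print_primes(int limit)
{
    int i, j;
    int count = 0;

    /* 2 is handled specially, as described in the text */
    if(limit >= 2)
        {
        printf("%d\n", 2);
        count = count + 1;
        }

    for(i = 3; i <= limit; i = i + 1)
        {
        for(j = 2; j < i; j = j + 1)
            {
            if(i % j == 0)
                break;          /* found a divisor: i is not prime */
            if(j * j > i)       /* same test as j > sqrt(i) */
                {
                printf("%d\n", i);
                count = count + 1;
                break;          /* no need to try more divisors */
                }
            }
        }

    return count;
}
```

Calling print_primes(100) prints the 25 primes from 2 to 97.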
4.1 Arrays
So far, we've been declaring simple variables: the declaration
int i;
declares a single variable, named i, of type int. It is also possible to declare an array of several
elements. The declaration
int a[10];
declares an array, named a, consisting of ten elements, each of type int. Simply speaking, an array is a
variable that can hold more than one value. You specify which of the several values you're referring to at
any given time by using a numeric subscript. (Arrays in programming are similar to vectors or matrices
in mathematics.) We can represent the array a above with a picture like this:
In C, arrays are zero-based: the ten elements of a 10-element array are numbered from 0 to 9. The
subscript which specifies a single element of an array is simply an integer expression in square brackets.
The first element of the array is a[0], the second element is a[1], etc. You can use these ``array
subscript expressions'' anywhere you can use the name of a simple variable, for example:
a[0] = 10;
a[1] = 20;
a[2] = a[0] + a[1];
Notice that the subscripted array references (i.e. expressions such as a[0] and a[1]) can appear on
either side of the assignment operator.
The subscript does not have to be a constant like 0 or 1; it can be any integral expression. For example,
it's common to loop over all elements of an array:
int i;
for(i = 0; i < 10; i = i + 1)
a[i] = 0;
This loop sets all ten elements of the array a to 0.
Arrays are a real convenience for many problems, but there is not a lot that C will do with them for you
automatically. In particular, you can neither set all elements of an array at once nor assign one array to
another; both of the assignments
a = 0;          /* WRONG */
and
int b[10];
b = a;          /* WRONG */
are illegal.
To set all of the elements of an array to some value, you must do so one by one, as in the loop example
above. To copy the contents of one array to another, you must again do so one by one:
int b[10];
for(i = 0; i < 10; i = i + 1)
b[i] = a[i];
Remember that for an array declared
int a[10];
there is no element a[10]; the topmost element is a[9]. This is one reason that zero-based loops are
also common in C. Note that the for loop
for(i = 0; i < 10; i = i + 1)
...
does just what you want in this case: it starts at 0, the number 10 suggests (correctly) that it goes through
10 iterations, but the less-than comparison means that the last trip through the loop has i set to 9. (The
comparison i <= 9 would also work, but it would be less clear and therefore poorer style.)
In the little examples so far, we've always looped over all 10 elements of the sample array a. It's
common, however, to use an array that's bigger than necessarily needed, and to use a second variable to
keep track of how many elements of the array are currently in use. For example, we might have an
integer variable
int na;
Then, when we wanted to do something with a (such as print it out), the loop would run from 0 to na,
not 10 (or whatever a's size was):
for(i = 0; i < na; i = i + 1)
printf("%d\n", a[i]);
Naturally, we would have to ensure that na's value was always less than or equal to the number of
elements actually declared in a.
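Here is a sketch of the idea, with the check on na included. The function name collect_multiples_of_three and the choice of what to store are invented for this illustration: the function saves the multiples of 3 between 1 and limit in a 10-element array, stops storing when the array is full, and then prints only the elements that are in use.

```c
#include <stdio.h>

/* collect_multiples_of_three is a hypothetical helper for this sketch.
   It returns na, the number of elements of a that ended up in use. */
int collect_multiples_of_three(int limit)
{
    int a[10];
    int na = 0;         /* how many elements of a are currently in use */
    int i;

    for(i = 1; i <= limit; i = i + 1)
        {
        /* store i only if there is still room in the array */
        if(i % 3 == 0 && na < 10)
            {
            a[na] = i;
            na = na + 1;
            }
        }

    /* loop over the elements in use, not over all 10 */
    for(i = 0; i < na; i = i + 1)
        printf("%d\n", a[i]);

    return na;
}
```

With a limit of 10 the function prints 3, 6, and 9 and returns 3. With a limit of 100 there are more multiples than the array can hold, so only the first 10 are kept and the function returns 10.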
Arrays are not limited to type int; you can have arrays of char or double or any other type.
Here is a slightly larger example of the use of arrays. Suppose we want to investigate the behavior of
rolling a pair of dice. The total roll can be anywhere from 2 to 12, and we want to count how often each
roll comes up. We will use an array to keep track of the counts: a[2] will count how many times we've
rolled 2, etc.
We'll simulate the roll of a die by calling C's random number generation function, rand(). Each time
you call rand(), it returns a different, pseudo-random integer. The values that rand() returns
typically span a large range, so we'll use C's modulus (or ``remainder'') operator % to produce random
numbers in the range we want. The expression rand() % 6 produces random numbers in the range 0
to 5, and rand() % 6 + 1 produces random numbers in the range 1 to 6.
Here is the program:
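#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i;
    int d1, d2;
    int a[13];      /* uses [2..12] */

    /* start every count at zero */
    for(i = 2; i <= 12; i = i + 1)
        a[i] = 0;

    /* roll the pair of dice 100 times, counting each total */
    for(i = 0; i < 100; i = i + 1)
        {
        d1 = rand() % 6 + 1;
        d2 = rand() % 6 + 1;
        a[d1 + d2] = a[d1 + d2] + 1;
        }

    /* print how often each total came up */
    for(i = 2; i <= 12; i = i + 1)
        printf("%d: %d\n", i, a[i]);
    return 0;
}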
loops:
for(i = 0; i < 5; i = i + 1)
{
    for(j = 0; j < 7; j = j + 1)
        printf("%d\t", a2[i][j]);
    printf("\n");
}
(The character \t in the printf string is the tab character.)
Just to see more clearly what's going on, we could make the ``row'' and ``column'' subscripts explicit by
printing them, too:
for(j = 0; j < 7; j = j + 1)
    printf("\t%d:", j);
printf("\n");
for(i = 0; i < 5; i = i + 1)
{
    printf("%d:", i);
    for(j = 0; j < 7; j = j + 1)
        printf("\t%d", a2[i][j]);
    printf("\n");
}
This last fragment would print
        0:      1:      2:      3:      4:      5:      6:
0:      0       1       2       3       4       5       6
1:      10      11      12      13      14      15      16
2:      20      21      22      23      24      25      26
3:      30      31      32      33      34      35      36
4:      40      41      42      43      44      45      46
Finally, there's no reason we have to loop over the ``rows'' first and the ``columns'' second; depending on
what we wanted to do, we could interchange the two loops, like this:
for(j = 0; j < 7; j = j + 1)
{
    for(i = 0; i < 5; i = i + 1)
        printf("%d\t", a2[i][j]);
    printf("\n");
}
Notice that i is still the first subscript and it still runs from 0 to 4, and j is still the second subscript and
it still runs from 0 to 6.
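To try the interchanged loops on their own, here is a self-contained sketch. The function name print_by_columns and the count it returns are assumptions made for this illustration; the array is filled so that a2[i][j] holds 10 * i + j, which matches the values in the output shown above.

```c
#include <stdio.h>

/* print_by_columns is a hypothetical helper for this sketch.
   It fills a 5-by-7 array, prints it with the loops interchanged
   (j outside, i inside), and returns how many values it printed. */
int print_by_columns(void)
{
    int a2[5][7];
    int i, j;
    int count = 0;

    /* assumption: a2[i][j] is set to 10 * i + j */
    for(i = 0; i < 5; i = i + 1)
        for(j = 0; j < 7; j = j + 1)
            a2[i][j] = 10 * i + j;

    for(j = 0; j < 7; j = j + 1)
        {
        for(i = 0; i < 5; i = i + 1)
            {
            printf("%d\t", a2[i][j]);
            count = count + 1;
            }
        printf("\n");
        }

    return count;
}
```

This prints seven lines of five values each; the first line contains 0, 10, 20, 30, and 40, which was the first column of the earlier output.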
duration: they spring into existence when the function is called, and they (and their values) disappear
when the function returns. Global variables, on the other hand, have static duration: they last, and the
values stored in them persist, for as long as the program does. (Of course, the values can in general still
be overwritten, so they don't necessarily persist forever.)
Finally, it is possible to split a program up into several source files, for easier maintenance. When several
source files are combined into one program (we'll be seeing how in the next chapter) the compiler must
have a way of correlating the global variables which might be used to communicate between the several
source files. Furthermore, if a global variable is going to be useful for communication, there must be
exactly one of it: you wouldn't want one function in one source file to store a value in one global variable
named globalvar, and then have another function in another source file read from a different global
variable named globalvar. Therefore, a global variable should have exactly one defining instance, in
one place in one source file. If the same variable is to be used anywhere else (i.e. in some other source
file or files), the variable is declared in those other file(s) with an external declaration, which is not a
defining instance. The external declaration says, ``hey, compiler, here's the name and type of a global
variable I'm going to use, but don't define it here, don't allocate space for it; it's one that's defined
somewhere else, and I'm just referring to it here.'' If you accidentally have two distinct defining instances
for a variable of the same name, the compiler (or the linker) will complain that it is ``multiply defined.''
It is also possible to have a variable which is global in the sense that it is declared outside of any
function, but private to the one source file it's defined in. Such a variable is visible to the functions in that
source file but not to any functions in any other source files, even if they try to issue a matching
declaration.
You get any extra control you might need over visibility and lifetime, and you distinguish between
defining instances and external declarations, by using storage classes. A storage class is an extra
keyword at the beginning of a declaration which modifies the declaration in some way. Generally, the
storage class (if any) is the first word in the declaration, preceding the type name. (Strictly speaking, this
ordering has not traditionally been necessary, and you may see some code with the storage class, type
name, and other parts of a declaration in an unusual order.)
We said that, by default, local variables had automatic duration. To give them static duration (so that,
instead of coming and going as the function is called, they persist for as long as the program does), you
precede their declaration with the static keyword:
static int i;
By default, a declaration of a global variable (especially if it specifies an initial value) is the defining
instance. To make it an external declaration, of a variable which is defined somewhere else, you precede
it with the keyword extern:
extern int j;
Finally, to arrange that a global variable is visible only within its containing source file, you precede it
with the static keyword:
static int k;
Notice that the static keyword can do two different things: it adjusts the duration of a local variable
from automatic to static, or it adjusts the visibility of a global variable from truly global to private-to-the-file.
To summarize, we've talked about two different attributes of a variable: visibility and duration. These are
orthogonal, as shown in this table:
                          duration
visibility      automatic                 static
local           normal local variables    static local variables
global          N/A                       normal global variables
We can also distinguish between file-scope global variables and truly global variables, based on the
presence or absence of the static keyword.
We can also distinguish between external declarations and defining instances of global variables, based
on the presence or absence of the extern keyword.
4.4 Examples
Here is an example demonstrating almost everything we've seen so far:
int globalvar = 1;
extern int anotherglobalvar;
static int privatevar;

f()
{
    int localvar;
    int localvar2 = 2;
    static int persistentvar;
}
Here we have six variables, three declared outside and three declared inside of the function f().
globalvar is a global variable. The declaration we see is its defining instance (it happens also to
include an initial value). globalvar can be used anywhere in this source file, and it could be used in
other source files, too (as long as corresponding external declarations are issued in those other source
files).
anotherglobalvar is a second global variable. It is not defined here; the defining instance for it
(and its initialization) is somewhere else.
privatevar is a ``private'' global variable. It can be used anywhere within this source file, but
functions in other source files cannot access it, even if they try to issue external declarations for it. (If
other source files try to declare a global variable called ``privatevar'', they'll get their own; they
won't be sharing this one.) Since it has static duration and receives no explicit initialization,
privatevar will be initialized to 0.
localvar is a local variable within the function f(). It can be accessed only within the function f().
(If any other part of the program declares a variable named ``localvar'', that variable will be distinct
from the one we're looking at here.) localvar is conceptually ``created'' each time f() is called, and
disappears when f() returns. Any value which was stored in localvar last time f() was running
will be lost and will not be available next time f() is called. Furthermore, since it has no explicit
initializer, the value of localvar will in general be garbage each time f() is called.
localvar2 is also local, and everything that we said about localvar applies to it, except that since
its declaration includes an explicit initializer, it will be initialized to 2 each time f() is called.
Finally, persistentvar is again local to f(), but it does maintain its value between calls to f(). It
has static duration but no explicit initializer, so its initial value will be 0.
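The difference between an initialized automatic variable and a static local variable shows up as soon as a function is called more than once. The two functions below are a sketch for illustration only; their names (next_auto and next_static) are made up and do not appear elsewhere in these notes.

```c
/* next_auto and next_static are hypothetical functions for this sketch. */

int next_auto(void)
{
    int n = 2;          /* automatic: set to 2 again on every call */

    n = n + 1;
    return n;           /* always 3 */
}

int next_static(void)
{
    static int n;       /* static: starts at 0, keeps its value between calls */

    n = n + 1;
    return n;           /* 1 on the first call, then 2, then 3, ... */
}
```

Three successive calls to next_auto() all return 3, while three successive calls to next_static() return 1, 2, and 3.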
The defining instances and external declarations we've been looking at so far have all been of simple
variables. There are also defining instances and external declarations of functions, which we'll be looking
at in the next chapter.
(Also, don't worry about static variables for now if they don't make sense to you; they're a relatively
sophisticated concept, which you won't need to use at first.)
The term declaration is a general one which encompasses defining instances and external declarations;
defining instances and external declarations are two different kinds of declarations. Furthermore, either
kind of declaration suffices to inform the compiler of the name and type of a particular variable (or
function). If you have the defining instance of a global variable in a source file, the rest of that source file
can use that variable without having to issue any external declarations. It's only in source files where the
defining instance hasn't been seen that you need external declarations.
You will sometimes hear a defining instance referred to simply as a ``definition,'' and you will sometimes
hear an external declaration referred to simply as a ``declaration.'' These usages are mildly ambiguous, in
that you can't tell out of context whether a ``declaration'' is a generic declaration (that might be a defining
instance or an external declaration) or whether it's an external declaration that specifically is not a
defining instance. (Similarly, there are other constructions that can be called ``definitions'' in C, namely
the definitions of preprocessor macros, structures, and typedefs, none of which we've met.) In these
notes, we'll try to make things clear by using the unambiguous terms defining instance and external
declaration. Elsewhere, you may have to look at the context to determine how the terms ``definition'' and
``declaration'' are being used.
#include <stdio.h>

extern int multbytwo(int);

int main()
{
    int i, j;
    i = 3;
    j = multbytwo(i);
    printf("%d\n", j);
    return 0;
}
This looks much like our other test programs, with the exception of the new line
extern int multbytwo(int);
This is an external function prototype declaration. It is an external declaration, in that it declares
something which is defined somewhere else. (We've already seen the defining instance of the function
multbytwo, but maybe the compiler hasn't seen it yet.) The function prototype declaration contains the
three pieces of information about the function that a caller needs to know: the function's name, return
type, and argument type(s). Since we don't care what name the multbytwo function will use to refer to
its first argument, we don't need to mention it. (On the other hand, if a function takes several arguments,
giving them names in the prototype may make it easier to remember which is which, so names may
optionally be used in function prototype declarations.) Finally, to remind us that this is an external
declaration and not a defining instance, the prototype is preceded by the keyword extern.
The presence of the function prototype declaration lets the compiler know that we intend to call this
function, multbytwo. The information in the prototype lets the compiler generate the correct code for
calling the function, and also enables the compiler to check up on our code (by making sure, for example,
that we pass the correct number of arguments to each function we call).
Down in the body of main, the action of the function call should be obvious: the line
j = multbytwo(i);
calls multbytwo, passing it the value of i as its argument. When multbytwo returns, the return value
is assigned to the variable j. (Notice that the value of main's local variable i will become the value of
multbytwo's parameter x; this is absolutely not a problem, and is a normal sort of affair.)
This example is written out in ``longhand,'' to make each step evident. The variable i isn't really
needed, since we could just as well call
j = multbytwo(3);
And the variable j isn't really needed, either, since we could just as well call
printf("%d\n", multbytwo(3));
Here, the call to multbytwo is a subexpression which serves as the second argument to printf. The
value returned by multbytwo is passed immediately to printf. (Here, as in general, we see the
flexibility and generality of expressions in C. An argument passed to a function may be an arbitrarily
complex subexpression, and a function call is itself an expression which may be embedded as a
subexpression within arbitrarily complicated surrounding expressions.)
We should say a little more about the mechanism by which an argument is passed down from a caller
into a function. Formally, C is call by value, which means that a function receives copies of the values of
its arguments. We can illustrate this with an example. Suppose, in our implementation of multbytwo,
we had gotten rid of the unnecessary retval variable like this:
int multbytwo(int x)
{
    x = x * 2;
    return x;
}
We might wonder, if we wrote it this way, what would happen to the value of the variable i when we
called
j = multbytwo(i);
When our implementation of multbytwo changes the value of x, does that change the value of i up in
the caller? The answer is no. x receives a copy of i's value, so when we change x we don't change i.
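We can confirm this with a quick experiment. In the sketch below, multbytwo is the version that modifies its parameter; the second function, i_after_call, is a hypothetical helper added only for this illustration. It calls multbytwo and then reports the caller's value of i.

```c
#include <stdio.h>

int multbytwo(int x)
{
    x = x * 2;          /* changes only the function's own copy */
    return x;
}

/* i_after_call is a hypothetical helper for this sketch.
   It returns the value of its local variable i after the call. */
int i_after_call(void)
{
    int i = 3;
    int j;

    j = multbytwo(i);
    printf("i = %d, j = %d\n", i, j);   /* prints i = 3, j = 6 */

    return i;
}
```

The value of i is still 3 after the call, even though x was changed to 6 inside multbytwo.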
However, there is an exception to this rule. When the argument you pass to a function is not a single
variable, but is rather an array, the function does not receive a copy of the array, and it therefore can
modify the array in the caller. The reason is that it might be too expensive to copy the entire array, and
furthermore, it can be useful for the function to write into the caller's array, as a way of handing back
more data than would fit in the function's single return value. We'll see an example of an array argument
(which the function deliberately writes into) in the next chapter.
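As a brief preview of the array case, here is a minimal sketch. The function names double_all and demo_double_all are invented for this illustration. Because the function does not receive a copy of the array, the changes it makes are visible in the caller.

```c
#include <stdio.h>

/* double_all is a hypothetical function for this sketch.
   It doubles each of the first n elements of the caller's array. */
void double_all(int a[], int n)
{
    int i;

    for(i = 0; i < n; i = i + 1)
        a[i] = a[i] * 2;
}

/* demo_double_all is a hypothetical caller; it returns the sum of
   the array's elements after the call. */
int demo_double_all(void)
{
    int a[3];

    a[0] = 1;
    a[1] = 2;
    a[2] = 3;

    double_all(a, 3);
    printf("%d %d %d\n", a[0], a[1], a[2]);     /* prints 2 4 6 */

    return a[0] + a[1] + a[2];
}
```

Contrast this with the multbytwo case: there, the caller's variable was untouched, while here the caller's array has been modified.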
Some functions may be hard to write (if they have a hard job to do, or if it's hard to make them do it truly
well), but that difficulty should be compartmentalized along with the function itself. Once you've written
a ``hard'' function, you should be able to sit back and relax and watch it do that hard work on call from
the rest of your program. It should be pleasant to notice (in the ideal case) how much easier the rest of the
program is to write, now that the hard work can be deferred to this workhorse function.
(In fact, if a difficult-to-write function's interface is well-defined, you may be able to get away with
writing a quick-and-dirty version of the function first, so that you can begin testing the rest of the
program, and then go back later and rewrite the function to do the hard parts. As long as the function's
original interface anticipated the hard parts, you won't have to rewrite the rest of the program when you
fix the function.)
What I've been trying to say in the preceding few paragraphs is that functions matter for far more
important reasons than just saving typing. Sometimes, we'll write a function which we only call once,
just because breaking it out into a function makes things clearer and easier.
If you find that difficulties pervade a program, that the hard parts can't be buried inside black-box
functions and then forgotten about; if you find that there are hard parts which involve complicated
interactions among multiple functions, then the program probably needs redesigning.
For the purposes of explanation, we've been seeming to talk so far only about ``main programs'' and the
functions they call and the rationale behind moving some piece of code down out of a ``main program''
into a function. But in reality, there's obviously no need to restrict ourselves to a two-tier scheme. Any
function we find ourselves writing will often be appropriately written in terms of sub-functions, sub-sub-functions, etc. (Furthermore, the ``main program,'' main(), is itself just a function.)
The preceding has all been about command-line compilers. If you're using some kind of integrated
development environment, such as Borland's Turbo C or the Microsoft Programmer's Workbench or
Visual C or Think C or Codewarrior, most of the mechanical details are taken care of for you. (There's
also less I can say here about these environments, because they're all different.) Typically you define a
``project,'' and there's a way to specify the list of files (modules) which make up your project. The
modules might be source files which you typed in or obtained elsewhere, or they might be source files
which you created within the environment (perhaps by requesting a ``New source file,'' and typing it in).
Typically, the programming environment has a single ``build'' button which does whatever's required to
build (and perhaps even execute) your program. There may also be configuration windows in which you
can specify compiler options (such as whether you'd like it to accept C or C++). ``See your manual for
details.''
6.1 printf
printf's name comes from print formatted. It generates output under the control of a format string (its first
argument) which consists of literal characters to be printed and also special character sequences--format specifiers--which request that other arguments be fetched, formatted, and inserted into the string. Our very first program was
nothing more than a call to printf, printing a constant string:
printf("Hello, world!\n");
Our second program also featured a call to printf:
printf("i is %d\n", i);
In that case, whenever printf ``printed'' the string "i is %d", it did not print it verbatim; it replaced the two
characters %d with the value of the variable i.
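There are quite a number of format specifiers for printf. Here are the basic ones:
%d      print an int argument in decimal
%ld     print a long int argument in decimal
%c      print a character
%s      print a string
%f      print a float or double argument
%e      same as %f, but use exponential notation
%g      use %e or %f, whichever is better
%o      print an int argument in octal (base 8)
%x      print an int argument in hexadecimal (base 16)
%%      print a single %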
It is also possible to specify the width and precision of numbers and strings as they are inserted (somewhat like
FORTRAN format statements); we'll present those details in a later chapter. (Very briefly, for those who are
curious: a notation like %3d means to print an int in a field at least 3 spaces wide; a notation like %5.2f means
to print a float or double in a field at least 5 spaces wide, with two places to the right of the decimal.)
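A few calls make the ``at least'' part of those field widths visible. This is only a sketch: the function name show_widths is made up, the square brackets are there just to mark the edges of each field, and the function adds up printf's return values (printf returns the number of characters it printed) so that the result can be checked.

```c
#include <stdio.h>

/* show_widths is a hypothetical helper for this sketch.
   It returns the total number of characters printed. */
int show_widths(void)
{
    int n = 0;

    n = n + printf("[%3d]\n", 7);           /* prints [  7]    */
    n = n + printf("[%3d]\n", 12345);       /* prints [12345]  */
    n = n + printf("[%5.2f]\n", 3.14159);   /* prints [ 3.14]  */

    return n;
}
```

The second call shows that a width is a minimum, not a limit: 12345 needs five places, so it gets five even though %3d asked for only three.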
To illustrate with a few more examples: the call
printf("%c %d %f %e %s %d%%\n", '1', 2, 3.14, 56000000., "eight", 9);
would print
1 2 3.140000 5.600000e+07 eight 9%
The call
printf("%d %o %x\n", 100, 100, 100);
would print
100 144 64
Successive calls to printf just build up the output a piece at a time, so the calls
printf("Hello, ");
printf("world!\n");
would also print Hello, world! (on one line of output).
Earlier we learned that C represents characters internally as small integers corresponding to the characters' values
in the machine's character set (typically ASCII). This means that there isn't really much difference between a
character and an integer in C; most of the difference is in whether we choose to interpret an integer as an integer or
a character. printf is one place where we get to make that choice: %d prints an integer value as a string of digits
representing its decimal value, while %c prints the character corresponding to a character set value. So the lines
char c = 'A';
int i = 97;
printf("c = %c, i = %d\n", c, i);
would print c as the character A and i as the number 97. But if, on the other hand, we called
printf("c = %d, i = %c\n", c, i);
we'd see the decimal value (printed by %d) of the character 'A', followed by the character (whatever it is) which
happens to have the decimal value 97.
You have to be careful when calling printf. It has no way of knowing how many arguments you've passed it or
what their types are other than by looking for the format specifiers in the format string. If there are more format
specifiers (that is, more % signs) than there are arguments, or if the arguments have the wrong types for the format
specifiers, printf can misbehave badly, often printing nonsense numbers or (even worse) numbers which
mislead you into thinking that some other part of your program is broken.
Because of some automatic conversion rules which we haven't covered yet, you have a small amount of latitude in
the types of the expressions you pass as arguments to printf. The argument for %c may be of type char or
int, and the argument for %d may be of type char or int. The string argument for %s may be a string constant,
an array of characters, or a pointer to some characters (though we haven't really covered strings or pointers yet).
Finally, the arguments corresponding to %e, %f, and %g may be of types float or double. But other
combinations do not work reliably: %d will not print a long int or a float or a double; %ld will not print
an int; %e, %f, and %g will not print an int.
We said that a char variable could hold integers corresponding to character set values, and that an int
could hold integers of more arbitrary values (up to +-32767). Since most character sets contain a few
hundred characters (nowhere near 32767), an int variable can in general comfortably hold all char
values, and then some. Therefore, there's nothing wrong with declaring c as an int. But in fact, it's
important to do so, because getchar can return every character value, plus that special, non-character
value EOF, indicating that there are no more characters. Type char is only guaranteed to be able to hold
all the character values; it is not guaranteed to be able to hold this ``no more characters'' value without
possibly mixing it up with some actual character value. (It's like trying to cram five pounds of books into
a four-pound box, or 13 eggs into a carton that holds a dozen.) Therefore, you should always remember to
use an int for anything you assign getchar's return value to.
When you run the character copying program, and it begins copying its input (your typing) to its output
(your screen), you may find yourself wondering how to stop it. It stops when it receives end-of-file
(EOF), but how do you send EOF? The answer depends on what kind of computer you're using. On Unix
and Unix-related systems, it's almost always control-D. On MS-DOS machines, it's control-Z followed by
the RETURN key. Under Think C on the Macintosh, it's control-D, just like Unix. On other systems, you
may have to do some research to learn how to send EOF.
(Note, too, that the character you type to generate an end-of-file condition from the keyboard is not the
same as the special EOF value returned by getchar. The EOF value returned by getchar is a code
indicating that the input system has detected an end-of-file condition, whether it's reading the keyboard or
a file or a magnetic tape or a network connection or anything else. In a disk file, at least, there is not likely
to be any character in the file corresponding to EOF; as far as your program is concerned, EOF indicates
the absence of any more characters to read.)
Another excellent thing to know when doing any kind of programming is how to terminate a runaway
program. If a program is running forever waiting for input, you can usually stop it by sending it an end-of-file, as above, but if it's running forever not waiting for something, you'll have to take more drastic
measures. Under Unix, control-C (or, occasionally, the DELETE key) will terminate the current program,
almost no matter what. Under MS-DOS, control-C or control-BREAK will sometimes terminate the
current program, but by default MS-DOS only checks for control-C when it's looking for input, so an
infinite loop can be unkillable. There's a DOS command,
break on
which tells DOS to look for control-C more often, and I recommend using this command if you're doing
any programming. (If a program is in a really tight infinite loop under MS-DOS, there can be no way of
killing it short of rebooting.) On the Mac, try command-period or command-option-ESCAPE.
Finally, don't be disappointed (as I was) the first time you run the character copying program. You'll type
a character, and see it on the screen right away, and assume it's your program working, but it's only your
computer echoing every key you type, as it always does. When you hit RETURN, a full line of characters
is made available to your program. It then zips several times through its loop, reading and printing all the
characters in the line in quick succession. In other words, when you run this program, it will probably
seem to copy the input a line at a time, rather than a character at a time. You may wonder how a program
could instead read a character right away, without waiting for the user to hit RETURN. That's an excellent
question, but unfortunately the answer is rather complicated, and beyond the scope of our discussion here.
(Among other things, how to read a character right away is one of the things that's not defined by the C
language, and it's not defined by any of the standard library functions, either. How to do it depends on
which operating system you're using.)
Stylistically, the character-copying program above can be said to have one minor flaw: it contains two
calls to getchar, one which reads the first character and one which reads (by virtue of the fact that it's
in the body of the loop) all the other characters. This seems inelegant and perhaps unnecessary, and it can
also be risky: if there were more things going on within the loop, and if we ever changed the way we read
characters, it would be easy to change one of the getchar calls but forget to change the other one. Is
there a way to rewrite the loop so that there is only one call to getchar, responsible for reading all the
characters? Is there a way to read a character, test it for EOF, and assign it to the variable c, all at the
same time?
There is. It relies on the fact that the assignment operator, =, is just another operator in C. An assignment
is not (necessarily) a standalone statement; it is an expression, and it has a value (the value that's assigned
to the variable on the left-hand side), and it can therefore participate in a larger, surrounding expression.
Therefore, most C programmers would write the character-copying loop like this:
while((c = getchar()) != EOF)
    putchar(c);
What does this mean? The function getchar is called, as before, and its return value is assigned to the
variable c. Then the value is immediately compared against the value EOF. Finally, the true/false value of
the comparison controls the while loop: as long as the value is not EOF, the loop continues executing,
but as soon as an EOF is received, no more trips through the loop are taken, and it exits. The net result is
that the call to getchar happens inside the test at the top of the while loop, and doesn't have to be
repeated before the loop and within the loop (more on this in a bit).
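The same read-assign-test idiom works on any input stream. Here is a minimal sketch; copy_stream is a hypothetical name, and it uses getc and putc on streams supplied by the caller in place of getchar and putchar:

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from in to out and
   returns the number of characters copied. */
long copy_stream(FILE *in, FILE *out)
{
    int c;
    long n = 0;

    /* read a character, store it in c, and compare it to EOF,
       all in the controlling expression of the loop */
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

Calling copy_stream(stdin, stdout) would behave like the character-copying loop above. Note the extra parentheses around the assignment: != has higher precedence than =, so without them c would receive the result of the comparison rather than the character.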
Stated another way, the syntax of a while loop is always
while( expression ) ...
A comparison (using the != operator) is of course an expression; the syntax is
expression != expression
/* WRONG */
The main body of the function is a getchar loop, much as we used in the character-copying program. In
the body of this loop, however, we're storing the characters in an array (rather than immediately printing
them out). Also, we're only reading one line of characters, then stopping and returning.
There are several new things to notice here.
First of all, the getline function accepts an array as a parameter. As we've said, array parameters are an
exception to the rule that functions receive copies of their arguments--in the case of arrays, the function
does have access to the actual array passed by the caller, and can modify it. Since the function is
accessing the caller's array, not creating a new one to hold a copy, the function does not have to declare
the argument array's size; it's set by the caller. (Thus, the brackets in ``char line[]'' are empty.)
However, so that we won't overflow the caller's array by reading too long a line into it, we allow the caller
to pass along the size of the array, which we promise not to exceed.
Second, we see an example of the break statement. The top of the loop looks like our earlier character-copying loop--it stops when it reaches EOF--but we only want this loop to read one line, so we also stop
(that is, break out of the loop) when we see the \n character signifying end-of-line. An equivalent loop,
without the break statement, would be
while((c = getchar()) != EOF && c != '\n')
{
    if(nch < max)
    {
        line[nch] = c;
        nch = nch + 1;
    }
}
We haven't learned about the internal representation of strings yet, but it turns out that strings in C are
simply arrays of characters, which is why we are reading the line into an array of characters. The end of a
string is marked by the special character, '\0'. To make sure that there's always room for that character,
on our way in we subtract 1 from max, the argument that tells us how many characters we may place in
the line array. When we're done reading the line, we store the end-of-string character '\0' at the end
of the string we've just built in the line array.
Finally, there's one subtlety in the code which isn't too important for our purposes now but which you
may wonder about: it's arranged to handle the possibility that a few characters (i.e. the apparent beginning
of a line) are read, followed immediately by an EOF, without the usual \n end-of-line character. (That's
why we return EOF only if we received EOF and we hadn't read any characters first.)
In any case, the function returns the length (number of characters) of the line it read, not including the \n.
(Therefore, it returns 0 for an empty line.) Like getchar, it returns EOF when there are no more lines to
read. (It happens that EOF is a negative number, so it will never match the length of a line that getline
has read.)
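Putting the pieces described above together, a line reader along these lines might look like the following sketch. It is named read_line here (a hypothetical name, chosen because many current systems already declare a different function called getline in <stdio.h>), and it takes the input stream as an extra parameter so that it can be tried on any input:

```c
#include <stdio.h>

/* Hypothetical sketch: reads one line from fp into line[], storing
   at most max-1 characters plus a terminating '\0'.  Returns the
   number of characters stored (not counting the '\n', which is
   discarded), or EOF if there was nothing left to read. */
int read_line(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;

    max = max - 1;              /* leave room for '\0' */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;              /* end of this line */
        if (nch < max) {
            line[nch] = c;
            nch = nch + 1;
        }
    }

    /* EOF counts as "no more lines" only if no characters came first */
    if (c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
}
```

Characters beyond the caller's limit are read and silently dropped, so an overlong line is truncated rather than overflowing the array.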
Here is an example of a test program which calls getline, reading the input a line at a time and then
printing each line back out:
#include <stdio.h>

extern int getline(char [], int);

int main()
{
    char line[256];

    while(getline(line, 256) != EOF)
        printf("you typed \"%s\"\n", line);

    return 0;
}
The notation char [] in the function prototype for getline says that getline accepts as its first
argument an array of char. When the program calls getline, it is careful to pass along the actual size
of the array. (You might notice a potential problem: since the number 256 appears in two places, if we
ever decide that 256 is too small, and that we want to be able to read longer lines, we could easily change
one of the instances of 256, and forget to change the other one. Later we'll learn ways of solving--that is,
avoiding--this sort of problem.)
1
10
n + 1
/= b
k *= n + 1
is interpreted as
k = k * (n + 1)
i++    add 1 to i
j--    subtract 1 from j
These correspond to the slightly longer i += 1 and j -= 1, respectively, and also to the fully
``longhand'' forms i = i + 1 and j = j - 1.
The ++ and -- operators apply to one operand (they're unary operators). The expression ++i adds 1 to
i, and stores the incremented result back in i. This means that these operators don't just compute new
values; they also modify the value of some variable. (They share this property--modifying some variable--with the assignment operators; we can say that these operators all have side effects. That is, they have
some effect, on the side, other than just computing a new value.)
The incremented (or decremented) result is also made available to the rest of the expression, so an
expression like
k = 2 * ++i
means ``add one to i, store the result back in i, multiply it by 2, and store that result in k.'' (This is a
pretty meaningless expression; our actual uses of ++ later will make more sense.)
Both the ++ and -- operators have an unusual property: they can be used in two ways, depending on
whether they are written to the left or the right of the variable they're operating on. In either case, they
increment or decrement the variable they're operating on; the difference concerns whether it's the old or
the new value that's ``returned'' to the surrounding expression. The prefix form ++i increments i and
returns the incremented value. The postfix form i++ increments i, but returns the prior, non-incremented
value. Rewriting our previous example slightly, the expression
k = 2 * i++
means ``take i's old value and multiply it by 2, increment i, store the result of the multiplication in k.''
The distinction between the prefix and postfix forms of ++ and -- will probably seem strained at first,
but it will make more sense once we begin using these operators in more realistic situations.
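To make the difference concrete, here is a small sketch. The two function names are hypothetical; each starts i at the given value and returns the k computed by one of the two expressions discussed above:

```c
/* Hypothetical helper: k gets twice the NEW value of i. */
int twice_prefix(int start)
{
    int i = start;
    int k;

    k = 2 * ++i;    /* with start == 5: i becomes 6, k becomes 12 */
    return k;
}

/* Hypothetical helper: k gets twice the OLD value of i. */
int twice_postfix(int start)
{
    int i = start;
    int k;

    k = 2 * i++;    /* with start == 5: k becomes 10, i becomes 6 */
    return k;
}
```

In both functions i ends up one greater than it started; only the value handed on to the multiplication differs.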
For example, our getline function of the previous chapter used the statements
line[nch] = c;
nch = nch + 1;
as the body of its inner loop. Using the ++ operator, we could simplify this to
line[nch++] = c;
We wanted to increment nch after deciding which element of the line array to store into, so the postfix
form nch++ is appropriate.
Notice that it only makes sense to apply the ++ and -- operators to variables (or to other ``containers,''
such as a[i]). It would be meaningless to say something like
1++
or
(2+3)++
The ++ operator doesn't just mean ``add one''; it means ``add one to a variable'' or ``make a variable's
value one more than it was before.'' But (2+3) is not a variable, it's an expression; so there's no place for
++ to store the incremented result.
Another unfortunate example is
i = i++;
which some confused programmers sometimes write, presumably because they want to be extra sure that
i is incremented by 1. But i++ all by itself is sufficient to increment i by 1; the extra (explicit)
assignment to i is unnecessary and in fact counterproductive, meaningless, and incorrect. If you want to
increment i (that is, add one to it, and store the result back in i), either use
i = i + 1;
or
i += 1;
or
++i;
or
i++;
Don't try to use some bizarre combination.
Did it matter whether we used ++i or i++ in this last example? Remember, the difference between the
two forms is what value (either the old or the new) is passed on to the surrounding expression. If there is
no surrounding expression, if the ++i or i++ appears all by itself, to increment i and do nothing else,
you can use either form; it makes no difference. (Two ways that an expression can appear ``all by itself,''
with ``no surrounding expression,'' are when it is an expression statement terminated by a semicolon, as
above, or when it is one of the controlling expressions of a for loop.) For example, both the loops
for(i = 0; i < 10; ++i)
    printf("%d\n", i);
and
for(i = 0; i < 10; i++)
    printf("%d\n", i);
will behave exactly the same way and produce exactly the same results. (In real code, postfix increment is
probably more common, though prefix definitely has its uses, too.)
In the preceding section, we simplified the expression
a[d1 + d2] = a[d1 + d2] + 1;
from a previous chapter down to
a[d1 + d2] += 1;
Using ++, we could simplify it still further to
a[d1 + d2]++;
or
++a[d1 + d2];
(Again, in this case, both are equivalent.)
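Here is a sketch of that increment at work. It assumes, purely for illustration, that d1 and d2 are the values of two six-sided dice and that a is used to tally how many ways each total can come up; the function name count_sums is hypothetical:

```c
/* Hypothetical helper: for every combination of two dice, adds one
   to the tally for that total.  The caller supplies an array with
   at least 13 elements (so that a[12] exists), already set to 0. */
void count_sums(int a[])
{
    int d1, d2;

    for (d1 = 1; d1 <= 6; d1++)
        for (d2 = 1; d2 <= 6; d2++)
            a[d1 + d2]++;       /* same as a[d1 + d2] += 1 */
}
```

Afterwards a[7] holds 6, since six of the 36 combinations total 7, while a[2] and a[12] each hold 1.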
We'll see more examples of these operators in the next section and in the next chapter.
/* WRONG */
should do? I can't, and here's the important part: neither can the compiler. We know that the definition of
postfix ++ is that the former value, before the increment, is what goes on to participate in the rest of the
expression, but the expression a[i++] = b[i++] contains two ++ operators. Which of them happens
first? Does this expression assign the old ith element of b to the new ith element of a, or vice versa? No
one knows.
When the order of evaluation matters but is not well-defined (that is, when we can't say for sure which
order the compiler will evaluate the various dependent parts in) we say that the meaning of the
expression is undefined, and if we're smart we won't write the expression in the first place. (Why would
anyone ever write an ``undefined'' expression? Because sometimes, the compiler happens to evaluate it in
the order a programmer wanted, and the programmer assumes that since it works, it must be okay.)
For example, suppose we carelessly wrote this loop:
int i, a[10];
i = 0;
while(i < 10)
    a[i] = i++;        /* WRONG */
It looks like we're trying to set a[0] to 0, a[1] to 1, etc. But what if the increment i++ happens before
the compiler decides which cell of the array a to store the (unincremented) result in? We might end up
setting a[1] to 0, a[2] to 1, etc., instead. Since, in this case, we can't be sure which order things would
happen in, we simply shouldn't write code like this. In this case, what we're doing matches the pattern of
a for loop, anyway, which would be a better choice:
for(i = 0; i < 10; i++)
    a[i] = i;
Now that the increment i++ isn't crammed into the same expression that's setting a[i], the code is
perfectly well-defined, and is guaranteed to do what we want.
In general, you should be wary of ever trying to second-guess the order an expression will be evaluated
in, with two exceptions:
1. You can obviously assume that precedence will dictate the order in which binary operators are
applied. Precedence typically determines not just what order things happen in, but also what the
expression actually means. (In other words, the precedence of * over + says more than that the
multiplication ``happens first'' in 1 + 2 * 3; it says that the answer is 7, not 9.)
2. Although we haven't mentioned it yet, it is guaranteed that the logical operators && and || are
evaluated left-to-right, and that the right-hand side is not evaluated at all if the left-hand side
determines the outcome.
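The second guarantee is one that code can safely rely on. As a sketch (find_index is a hypothetical name), here is an array search whose correctness depends on && evaluating its left-hand side first and skipping the right-hand side when the left is false:

```c
/* Hypothetical helper: returns the index of the first element of
   a[0..n-1] equal to target, or -1 if there is none. */
int find_index(const int a[], int n, int target)
{
    int i = 0;

    /* a[i] is examined only when i < n is true, so the loop
       never reads past the end of the array */
    while (i < n && a[i] != target)
        i++;

    if (i < n)
        return i;
    return -1;
}
```

If the two tests were written in the other order, a[i] would be fetched before i had been checked against n.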
To look at one more example, it might seem that the code
int i = 7;
printf("%d\n", i++ * i++);
would have to print 56, because no matter which order the increments happen in, 7*8 is 8*7 is 56. But
++ just says that the increment happens later, not that it happens immediately, so this code could print 49
(if the compiler chose to perform the multiplication first, and both increments later). And, it turns out that
ambiguous expressions like this are such a bad idea that the ANSI C Standard does not require compilers
to do anything reasonable with them at all. Theoretically, the above code could end up printing 42, or
8923409342, or 0, or crashing your computer.
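If what we wanted was the product of two successive values of i, one well-defined way to get it is to give each increment a statement of its own, so that each one is complete before the next statement begins. A minimal sketch, with a hypothetical function name and helper variables:

```c
/* Hypothetical helper: multiplies two successive values of i,
   starting from start, with each increment in its own statement. */
int product_of_two_reads(int start)
{
    int i = start;
    int first, second;

    first = i++;        /* with start == 7: first = 7, i = 8 */
    second = i++;       /* second = 8, i = 9 */

    return first * second;
}
```

Written this way, a starting value of 7 yields 56 on every conforming compiler.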
Programmers sometimes mistakenly imagine that they can write an expression which tries to do too
much at once and then predict exactly how it will behave based on ``order of evaluation.'' For example,
we know that multiplication has higher precedence than addition, which means that in the expression
i + j * k
j will be multiplied by k, and then i will be added to the result. Informally, we often say that the
multiplication happens ``before'' the addition. That's true in this case, but it doesn't say as much as we
might think about a more complicated expression, such as
i++ + j++ * k++
In this case, besides the addition and multiplication, i, j, and k are all being incremented. We can not
say which of them will be incremented first; it's the compiler's choice. (In particular, it is not necessarily
the case that j++ or k++ will happen first; the compiler might choose to save i's value somewhere and
increment i first, even though it will have to keep the old value around until after it has done the
multiplication.)
In the preceding example, it probably doesn't matter which variable is incremented first. It's not too hard,
though, to write an expression where it does matter. In fact, we've seen one already: the ambiguous
assignment a[i++] = b[i++]. We still don't know which i++ happens first. (We can not assume,
based on the right-to-left behavior of the = operator, that the right-hand i++ will happen first.) But if we
had to know what a[i++] = b[i++] really did, we'd have to know which i++ happened first.
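The way out is to say what we mean in separate steps. Assuming the intent was to copy element i of b to element i of a and then move on to the next element (only a guess at what the ambiguous expression was meant to do), a sketch might be:

```c
/* Hypothetical helper: copies the first n elements of b to a,
   keeping the assignment and the increment in separate statements. */
void copy_ints(int a[], const int b[], int n)
{
    int i = 0;

    while (i < n) {
        a[i] = b[i];    /* i is only read here */
        i++;            /* i is only modified here */
    }
}
```

Each statement now either uses i or changes it, never both in a way that depends on order.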
Finally, note that parentheses don't dictate overall evaluation order any more than precedence does.
Parentheses override precedence and say which operands go with which operators, and they therefore
affect the overall meaning of an expression, but they don't say anything about the order of subexpressions
or side effects. We could not ``fix'' the evaluation order of any of the expressions we've been discussing
by adding parentheses. If we wrote
i++ + (j++ * k++)
we still wouldn't know which of the increments would happen first. (The parentheses would force the
multiplication to happen before the addition, but precedence already would have forced that, anyway.) If
we wrote
(i++) * (i++)
the parentheses wouldn't force the increments to happen before the multiplication or in any well-defined
order; this parenthesized version would be just as undefined as i++ * i++ was.
There's a line from Kernighan & Ritchie, which I am fond of quoting when discussing these issues [Sec.
2.12, p. 54]:
The moral is that writing code that depends on order of evaluation is a bad programming
practice in any language. Naturally, it is necessary to know what things to avoid, but if you
don't know how they are done on various machines, you won't be tempted to take
advantage of a particular implementation.
The first edition of K&R said
...if you don't know how they are done on various machines, that innocence may help to
protect you.
I actually prefer the first edition wording. Many textbooks encourage you to write small programs to find
out how your compiler implements some of these ambiguous expressions, but it's just one step from
writing a small program to find out, to writing a real program which makes use of what you've just
learned. But you don't want to write programs that work only under one particular compiler, that take
advantage of the way that one compiler (but perhaps no other) happens to implement the undefined
expressions. It's fine to be curious about what goes on ``under the hood,'' and many of you will be curious
enough about what's going on with these ``forbidden'' expressions that you'll want to investigate them,
but please keep very firmly in mind that, for real programs, the very easiest way of dealing with
ambiguous, undefined expressions (which one compiler interprets one way and another interprets another
way and a third crashes on) is not to write them in the first place.
Chapter 8: Strings
Strings in C are represented by arrays of characters. The end of the string is marked with a special
character, the null character, which is simply the character with the value 0. (The null character has no
relation except in name to the null pointer. In the ASCII character set, the null character is named NUL.)
The null or string-terminating character is represented by another character escape sequence, \0. (We've
seen it once already, in the getline function of chapter 6.)
Because C has no built-in facilities for manipulating entire arrays (copying them, comparing them, etc.),
it also has very few built-in facilities for manipulating strings.
In fact, C's only truly built-in string-handling is that it allows us to use string constants (also called string
literals) in our code. Whenever we write a string, enclosed in double quotes, C automatically creates an
array of characters for us, containing that string, terminated by the \0 character. For example, we can
declare and define an array of characters, and initialize it with a string constant:
char string[] = "Hello, world!";
In this case, we can leave out the dimension of the array, since the compiler can compute it for us based
on the size of the initializer (14, including the terminating \0). This is the only case where the compiler
sizes a string array for us, however; in other cases, it will be necessary that we decide how big the arrays
and other data structures we use to hold strings are.
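We can check the compiler's arithmetic with sizeof, which gives the size of an array in bytes (and a char is always one byte). In this sketch the array is named greeting and the two function names are hypothetical; strlen, from <string.h>, counts the characters before the \0:

```c
#include <string.h>

char greeting[] = "Hello, world!";

/* Number of elements the compiler allocated, including the '\0'. */
int array_size(void)
{
    return (int)sizeof(greeting);       /* 14 */
}

/* Number of characters in the string, not including the '\0'. */
int string_length(void)
{
    return (int)strlen(greeting);       /* 13 */
}
```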
To do anything else with strings, we must typically call functions. The C library contains a few basic
string manipulation functions, and to learn more about strings, we'll be looking at how these functions
might be implemented.
Since C never lets us assign entire arrays, we use the strcpy function to copy one string to another:
#include <string.h>
char string1[] = "Hello, world!";
char string2[20];
strcpy(string2, string1);
The destination string is strcpy's first argument, so that a call to strcpy mimics an assignment
expression (with the destination on the left-hand side). Notice that we had to allocate string2 big
enough to hold the string that would be copied to it. Also, at the top of any source file where we're using
the standard library's string-handling functions (such as strcpy) we must include the line
#include <string.h>
which contains external declarations for these functions.
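Since strcpy itself never checks whether the destination is big enough, one cautious pattern is to check before copying. Here is a sketch; copy_if_fits is a hypothetical helper, not part of the standard library:

```c
#include <string.h>

/* Hypothetical helper: copies src to dest only if dest, which has
   room for destsize characters, can hold src plus its '\0'.
   Returns 1 if the copy was made, 0 if src was too long. */
int copy_if_fits(char dest[], size_t destsize, const char src[])
{
    if (strlen(src) + 1 > destsize)
        return 0;

    strcpy(dest, src);
    return 1;
}
```

A call might look like copy_if_fits(string2, sizeof(string2), string1), where sizeof gives the size of the destination array.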
Since C won't let us compare entire arrays, either, we must call a function to do that, too. The standard
library's strcmp function compares two strings, and returns 0 if they are identical, or a negative number
if the first string is alphabetically ``less than'' the second string, or a positive number if the first string is
``greater.'' (Roughly speaking, what it means for one string to be ``less than'' another is that it would
come first in a dictionary or telephone book, although there are a few anomalies.) Here is an example:
char string3[] = "this is";
char string4[] = "a test";
if(strcmp(string3, string4) == 0)
    printf("strings are equal\n");
else
    printf("strings are different\n");
This code fragment will print ``strings are different''. Notice that strcmp does not return a Boolean,
true/false, zero/nonzero answer, so it's not a good idea to write something like
if(strcmp(string3, string4))
    ...
because it will behave backwards from what you might reasonably expect. (Nevertheless, if you start
reading other people's code, you're likely to come across conditionals like if(strcmp(a, b)) or
even if(!strcmp(a, b)). The first does something if the strings are unequal; the second does
something if they're equal. You can read these more easily if you pretend for a moment that strcmp's
name were strdiff, instead.)
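If the zero-means-equal convention seems error-prone, one option is to wrap it in small functions whose names say what they test. These two wrappers are hypothetical, not part of the standard library:

```c
#include <string.h>

/* Hypothetical helper: nonzero (true) if the two strings are equal. */
int streq(const char a[], const char b[])
{
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: nonzero (true) if a sorts before b. */
int comes_before(const char a[], const char b[])
{
    return strcmp(a, b) < 0;
}
```

With these, if(streq(string3, string4)) reads the way it behaves.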
Another standard library function is strcat, which concatenates strings. It does not concatenate two
strings together and give you a third, new string; what it really does is append one string onto the end of
another. (If it gave you a new string, it would have to allocate memory for it somewhere, and the
standard library string functions generally never do that for you automatically.) Here's an example:
char string5[20] = "Hello, ";
char string6[] = "world!";
printf("%s\n", string5);
strcat(string5, string6);
printf("%s\n", string5);
The first call to printf prints ``Hello, '', and the second one prints ``Hello, world!'', indicating that the
contents of string6 have been tacked on to the end of string5. Notice that we declared string5
with extra space, to make room for the appended characters.
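As with strcpy, it is up to the caller to make sure the result fits. A sketch of a cautious append follows; append_if_fits is a hypothetical helper, not part of the standard library:

```c
#include <string.h>

/* Hypothetical helper: appends src to the string in dest only if
   the combined string, plus its '\0', fits in destsize characters.
   Returns 1 if the append was done, 0 if there was not enough room. */
int append_if_fits(char dest[], size_t destsize, const char src[])
{
    if (strlen(dest) + strlen(src) + 1 > destsize)
        return 0;

    strcat(dest, src);
    return 1;
}
```

For string5 above, 7 characters plus 6 more plus the \0 comes to 14, which fits comfortably in the 20 we declared.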
If you have a string and you want to know its length (perhaps so that you can check whether it will fit in
some other array you've allocated for it), you can call strlen, which returns the length of the string (i.e.
the number of characters in it), not including the \0:
char string7[] = "abc";
int len = strlen(string7);
printf("%d\n", len);
Finally, you can print strings out with printf using the %s format specifier, as we've been doing in
these examples already (e.g. printf("%s\n", string5);).
Since a string is just an array of characters, all of the string-handling functions we've just seen can be
written quite simply, using no techniques more complicated than the ones we already know. In fact, it's
quite instructive to look at how these functions might be implemented. Here is a version of strcpy:
void mystrcpy(char dest[], char src[])
{
    int i = 0;

    while(src[i] != '\0')
    {
        dest[i] = src[i];
        i++;
    }

    dest[i] = '\0';
}
We've called it mystrcpy instead of strcpy so that it won't clash with the version that's already in the
standard library. Its operation is simple: it looks at characters in the src string one at a time, and as long
as they're not \0, assigns them, one by one, to the corresponding positions in the dest string. When it's
done, it terminates the dest string by appending a \0. (After exiting the while loop, i is guaranteed to
have a value one greater than the subscript of the last character in src.) For comparison, here's a way of
writing the same code, using a for loop:
for(i = 0; src[i] != '\0'; i++)
    dest[i] = src[i];

dest[i] = '\0';
Yet a third possibility is to move the test for the terminating \0 character out of the for loop header and
into the body of the loop, using an explicit if and break statement, so that we can perform the test
after the assignment and therefore use the assignment inside the loop to copy the \0 to dest, too:
for(i = 0; ; i++)
{
    dest[i] = src[i];
    if(src[i] == '\0')
        break;
}
(There are in fact many, many ways to write strcpy. Many programmers like to combine the
assignment and test, using an expression like (dest[i] = src[i]) != '\0'. This is actually the
same sort of combined operation as we used in our getchar loop in chapter 6.)
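Spelled out in full, the combined form might look like this sketch (named mystrcpy2 here, a hypothetical name, to keep it distinct from the earlier version):

```c
/* Hypothetical variant: the assignment and the test for '\0'
   are combined in the controlling expression of the loop. */
void mystrcpy2(char dest[], const char src[])
{
    int i = 0;

    /* copy one character, then compare the character just copied
       against '\0'; the '\0' itself is copied before the loop stops */
    while ((dest[i] = src[i]) != '\0')
        i++;
}
```

Because the terminating \0 is copied by the same assignment that detects it, no separate dest[i] = '\0' is needed after the loop.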
Here is a version of strcmp:
int mystrcmp(char str1[], char str2[])
{
    int i = 0;

    while(1)
    {
        if(str1[i] != str2[i])
            return str1[i] - str2[i];
        if(str1[i] == '\0' || str2[i] == '\0')
            return 0;
        i++;
    }
}
Characters are compared one at a time. If two characters in one position differ, the strings are different,
and we are supposed to return a value less than zero if the first string (str1) is alphabetically less than
the second string. Since characters in C are represented by their numeric character set values, and since
most reasonable character sets assign values to characters in alphabetical order, we can simply subtract
the two differing characters from each other: the expression str1[i] - str2[i] will yield a
negative result if the i'th character of str1 is less than the corresponding character in str2. (As it
turns out, this will behave a bit strangely when comparing upper- and lower-case letters, but it's the
traditional approach, which the standard versions of strcmp tend to use.) If the characters are the same,
we continue around the loop, unless the characters we just compared were (both) \0, in which case
we've reached the end of both strings, and they were both equal. Notice that we used what may at first
appear to be an infinite loop--the controlling expression is the constant 1, which is always true. What
actually happens is that the loop runs until one of the two return statements breaks out of it (and the
entire function). Note also that when one string is longer than the other, the first test will notice this
(because one string will contain a real character at the [i] location, while the other will contain \0, and
these are not equal) and the return value will be computed by subtracting the real character's value from
0, or vice versa. (Thus the shorter string will be treated as ``less than'' the longer.)
Finally, here is a version of strlen:
int mystrlen(char str[])
{
    int i;

    for(i = 0; str[i] != '\0'; i++)
        {}

    return i;
}
In this case, all we have to do is find the \0 that terminates the string, and it turns out that the three
control expressions of the for loop do all the work; there's nothing left to do in the body. Therefore, we
use an empty pair of braces {} as the loop body. Equivalently, we could use a null statement, which is
simply a semicolon:
for(i = 0; str[i] != '\0'; i++)
    ;
Empty loop bodies can be a bit startling at first, but they're not unheard of.
Everything we've looked at so far has come out of C's standard libraries. As one last example, let's write
a substr function, for extracting a substring out of a larger string. We might call it like this:
char string8[] = "this is a test";
char string9[10];
substr(string9, string8, 5, 4);
printf("%s\n", string9);
The idea is that we'll extract a substring of length 4, starting at character 5 (0-based) of string8, and
copy the substring to string9. Just as with strcpy, it's our responsibility to declare the destination
string (string9) big enough. Here is an implementation of substr. Not surprisingly, it's quite similar
to strcpy:
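One way substr might be written, as a sketch that follows the calling convention used above (destination, source, 0-based starting offset, length), is shown here. It assumes that offset does not point past the end of src:

```c
/* Sketch of substr: copies up to len characters of src, starting
   at position offset (0-based), into dest, and terminates dest.
   Assumes offset is no greater than the length of src, and that
   dest has room for len characters plus the '\0'. */
void substr(char dest[], const char src[], int offset, int len)
{
    int i;

    /* stop after len characters, or sooner if src runs out */
    for (i = 0; i < len && src[offset + i] != '\0'; i++)
        dest[i] = src[offset + i];

    dest[i] = '\0';
}
```

With the call shown earlier, string9 would end up containing ``is a''.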
about the modifiability of strings in a later chapter on pointers.) Since you're replacing a character, you
want a character constant, 'H'. It would not be right to write
string[0] = "H";        /* WRONG */
because "H" is a string (an array of characters), not a single character. (The destination of the
assignment, string[0], is a char, but the right-hand side is a string; these types don't match.)
On the other hand, when you need a string, you must use a string. To print a single newline, you could
call
printf("\n");
It would not be correct to call
printf('\n');        /* WRONG */
printf always wants a string as its first argument. (As one final example, putchar wants a single
character, so putchar('\n') would be correct, and putchar("\n") would be incorrect.)
We must also remember the difference between strings and integers. If we treat the character '1' as an
integer, perhaps by saying
int i = '1';
we will probably not get the value 1 in i; we'll get the value of the character '1' in the machine's
character set. (In ASCII, it's 49.) When we do need to find the numeric value of a digit character (or to go
the other way, to get the digit character with a particular value) we can make use of the fact that, in any
character set used by C, the values for the digit characters, whatever they are, are contiguous. In other
words, no matter what values '0' and '1' have, '1' - '0' will be 1 (and, obviously, '0' - '0'
will be 0). So, for a variable c holding some digit character, the expression
c - '0'
gives us its value. (Similarly, for an integer value i, i + '0' gives us the corresponding digit
character, as long as 0 <= i <= 9.)
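These two conversions can be wrapped up as tiny functions. The names digit_value and digit_char are hypothetical, and neither function checks that its argument is in range:

```c
/* Hypothetical helper: numeric value of a digit character.
   Assumes c is one of '0' through '9'. */
int digit_value(char c)
{
    return c - '0';
}

/* Hypothetical helper: digit character for a small integer.
   Assumes 0 <= i <= 9. */
char digit_char(int i)
{
    return i + '0';
}
```

So digit_value('7') is 7 and digit_char(3) is '3', whatever character set the machine uses.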
Just as the character '1' is not the integer 1, the string "123" is not the integer 123. When we have a
string of digits, we can convert it to the corresponding integer by calling the standard function atoi:
char string[] = "123";
int i = atoi(string);
int j = atoi("456");
Later we'll learn how to go in the other direction, to convert an integer into a string. (One way, as long as
what you want to do is print the number out, is to call printf, using %d in the format string.)
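To see how the c - '0' trick and a digit string fit together, here is a rough sketch of the kind of loop a function like atoi might use. This is not the library's actual implementation: the name digits_to_int is hypothetical, and the sketch ignores leading whitespace, signs, and overflow, all of which a real conversion function has to worry about.

```c
/* Convert a leading run of decimal digits to an int.
 * Stops at the first non-digit; no sign or overflow handling. */
int digits_to_int(const char s[])
{
    int n = 0;
    int i;

    for (i = 0; s[i] >= '0' && s[i] <= '9'; i++)
        n = 10 * n + (s[i] - '0');  /* shift previous digits left, add new one */

    return n;
}
```

Called on "123" this loop computes 1, then 12, then 123.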
File inclusion: inserting the contents of another file into your source file, as if you had typed it all
in there.
Macro substitution: replacing instances of one piece of text with another.
Conditional compilation: Arranging that, depending on various circumstances, certain parts of
your source code are seen or not seen by the compiler at all.
The next three sections will introduce these three preprocessing functions.
The syntax of the preprocessor is different from the syntax of the rest of C in several respects. First of all,
the preprocessor is ``line based.'' Each of the preprocessor directives we're going to learn about (all of
which begin with the # character) must begin at the beginning of a line, and each ends at the end of the
line. (The rest of C treats line ends as just another whitespace character, and doesn't care how your
program text is arranged into lines.) Secondly, the preprocessor does not know about the structure of C--about functions, statements, or expressions. It is possible to play strange tricks with the preprocessor to
turn something which does not look like C into C (or vice versa). It's also possible to run into problems
when a preprocessor substitution does not do what you expected it to, because the preprocessor does not
respect the structure of C statements and expressions (but you expected it to). For the simple uses of the
preprocessor we'll be discussing, you shouldn't have any of these problems, but you'll want to be careful
before doing anything tricky or outrageous with the preprocessor. (As it happens, playing tricky and
outrageous games with the preprocessor is considered sporting in some circles, but it rapidly gets out of
hand, and can lead to bewilderingly impenetrable programs.)
9.1 File Inclusion
9.2 Macro Definition and Substitution
9.3 Conditional Compilation
External declarations of global variables and functions. We said that a global variable must have
exactly one defining instance, but that it can have external declarations in many places. We said
that it was a grave error to issue an external declaration in one place saying that a variable or
function has one type, when the defining instance in some other place actually defines it with
another type. (If the two places are two source files, separately compiled, the compiler will
probably not even catch the discrepancy.) If you put the external declarations in a header file,
however, and include the header wherever it's needed, the declarations are virtually guaranteed to
be consistent. It's a good idea to include the header in the source file where the defining instance
appears, too, so that the compiler can check that the declaration and definition match. (That is, if
you ever change the type, you do still have to change it in two places: in the source file where the
defining instance occurs, and in the header file where the external declaration appears. But at least
you don't have to change it in an arbitrary number of places, and, if you've set things up correctly,
the compiler can catch any remaining mistakes.)
Preprocessor macro definitions (which we'll meet in the next section).
Structure definitions (which we haven't seen yet).
Typedef declarations (which we haven't seen yet).
Defining instances of global variables. If you put these in a header file, and include the header file
in more than one source file, the variable will end up multiply defined.
Function bodies (which are also defining instances). You don't want to put these in headers for the
same reason--it's likely that you'll end up with multiple copies of the function and hence
``multiply defined'' errors. People sometimes put commonly-used functions in header files and
then use #include to bring them (once) into each program where they use that function, or use
#include to bring together the several source files making up a program, but both of these are
poor ideas. It's much better to learn how to use your compiler or linker to combine together
separately-compiled object files.
Since header files typically contain only external declarations, and should not contain function bodies,
you have to understand just what does and doesn't happen when you #include a header file. The
header file may provide the declarations for some functions, so that the compiler can generate correct
code when you call them (and so that it can make sure that you're calling them correctly), but the header
file does not give the compiler the functions themselves. The actual functions will be combined into your
program at the end of compilation, by the part of the compiler called the linker. The linker may have to
get the functions out of libraries, or you may have to tell the compiler/linker where to find them. In
particular, if you are trying to use a third-party library containing some useful functions, the library will
often come with a header file describing those functions. Using the library is therefore a two-step
process: you must #include the header in the files where you call the library functions, and you must
tell the linker to read in the functions from the library itself.
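The division of labor between a header and a source file can be sketched in a single listing. Everything here is hypothetical: the file names counter.h and counter.c, the variable ticket_count, and the function next_ticket are made up for illustration. In a real program the two halves would live in separate files, and the source file would #include the header so the compiler could check that they agree.

```c
/* ---- what would go in the header, say counter.h ---- */
extern int ticket_count;        /* external declaration: reserves no storage */
int next_ticket(void);          /* function prototype: no body */

/* ---- what would go in exactly one source file, say counter.c ---- */
int ticket_count = 0;           /* the single defining instance */

int next_ticket(void)           /* the function body (also a defining instance) */
{
    ticket_count = ticket_count + 1;
    return ticket_count;
}
```

Because the declarations and the definitions are seen together here, a compiler would complain if, say, the header declared ticket_count as long while the source file defined it as int.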
#define A 2
#define B 3
#define C A + B
(this is a pretty meaningless example, but the situation does come up in practice). Then, later, suppose
that you write
int x = C * 2;
If A, B, and C were ordinary variables, you'd expect x to end up with the value 10. But let's see what
happens.
The preprocessor always substitutes text for macros exactly as you have written it. So it first substitutes
the replacement text for the macro C, resulting in
int x = A + B * 2;
Then it substitutes the macros A and B, resulting in
int x = 2 + 3 * 2;
Only when the preprocessor is done doing all this substituting does the compiler get into the act. But
when it evaluates that expression (using the normal precedence of multiplication over addition), it ends
up initializing x with the value 8!
To guard against this sort of problem, it is always a good idea to include explicit parentheses in the
definitions of macros which contain expressions. If we were to define the macro C as
#define C (A + B)
then the declaration of x would ultimately expand to
int x = (2 + 3) * 2;
and x would be initialized to 10, as we probably expected.
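The two definitions can be compared side by side. In this sketch the macro names C_UNSAFE and C_SAFE and the two small functions are our own additions, so that both versions can coexist in one file.

```c
#define A 2
#define B 3
#define C_UNSAFE A + B          /* expands as bare text, with no grouping */
#define C_SAFE   (A + B)        /* parentheses keep the sum together */

/* Expands to 2 + 3 * 2, so the multiplication happens first. */
int unsafe_times_two(void)
{
    return C_UNSAFE * 2;
}

/* Expands to (2 + 3) * 2, which is what we meant. */
int safe_times_two(void)
{
    return C_SAFE * 2;
}
```

The first function returns 8 and the second returns 10, matching the expansions worked out above.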
Notice that there does not have to be (and in fact there usually is not) a semicolon at the end of a
#define line. (This is just one of the ways that the syntax of the preprocessor is different from the rest
of C.) If you accidentally type
#define MAXLINE 100;
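/* WRONG */
then later, when you declare
char line[MAXLINE];
the preprocessor will expand it to
char line[100;];
/* WRONG */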
which is a syntax error. This is what we mean when we say that the preprocessor doesn't know much of
anything about the syntax of C--in this last example, the value or replacement text for the macro
MAXLINE was the 4 characters 1 0 0 ; , and that's exactly what the preprocessor substituted (even
though it didn't make any sense).
Simple macros like MAXLINE act sort of like little variables, whose values are constant (or constant
expressions). It's also possible to have macros which look like little functions (that is, you invoke them
with what looks like function call syntax, and they expand to replacement text which is a function of the
actual arguments they are invoked with) but we won't be looking at these yet.
You are trying to write a portable program, but the way you do something is different depending
on what compiler, operating system, or computer you're using. You place different versions of
your code, one for each situation, between suitable #ifdef directives, and when you compile the
program in a particular environment, you arrange to have the macro names defined which select
the variants you need in that environment. (For this reason, compilers usually have ways of letting
you define macros from the invocation command line or in a configuration file, and many also
predefine certain macro names related to the operating system, processor, or compiler in use. That
way, you don't have to change the code to change the #define lines each time you compile it in
a different environment.)
For example, in ANSI C, the function to delete a file is remove. On older Unix systems,
however, the function was called unlink. So if filename is a variable containing the name of
a file you want to delete, and if you want to be able to compile the program under these older
Unix systems, you might write
#ifdef unix
unlink(filename);
#else
remove(filename);
#endif
Then, you could place the line
#define unix
at the top of the file when compiling under an old Unix system. (Since all you're using the macro
unix for is to control the #ifdef, you don't need to give it any replacement text at all. Any
definition for a macro, even if the replacement text is empty, causes an #ifdef to succeed.)
(In fact, in this example, you wouldn't even need to define the macro unix at all, because C
compilers on old Unix systems tend to predefine it for you, precisely so you can make tests like
these.)
You want to compile several different versions of your program, with different features present in
the different versions. You bracket the code for each feature with #ifdef directives, and (as for
the previous case) arrange to have the right macros defined or not to build the version you want to
build at any given time. This way, you can build the several different versions from the same
source code. (One common example is whether you turn debugging statements on or off. You can
bracket each debugging printout with #ifdef DEBUG and #endif, and then turn on
debugging only when you need it.)
For example, you might use lines like this:
#ifdef DEBUG
printf("x is %d\n", x);
#endif
to print out the value of the variable x at some point in your program to see if it's what you
expect. To enable debugging printouts, you insert the line
#define DEBUG
at the top of the file, and to turn them off, you delete that line, but the debugging printouts quietly
remain in your code, temporarily deactivated, but ready to reactivate if you find yourself needing
them again later. (Also, instead of inserting and deleting the #define line, you might use a
compiler flag such as -DDEBUG to define the macro DEBUG from the compiler invocation line.)
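Here is a small, complete sketch of the debugging idiom. The function square_logged is hypothetical, and sending the debugging output to stderr rather than stdout is our own choice, not something the idiom requires.

```c
#include <stdio.h>

#define DEBUG   /* delete this line, or define DEBUG with -DDEBUG instead */

/* Compute x squared, printing a trace line only in debugging builds. */
int square_logged(int x)
{
    int result = x * x;
#ifdef DEBUG
    fprintf(stderr, "square_logged: x is %d, result is %d\n", x, result);
#endif
    return result;
}
```

With the #define line removed, the fprintf call is never seen by the compiler at all, so it costs nothing in the finished program; the function's return value is the same either way.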
Conditional compilation can be very handy, but it can also get out of hand. When large chunks of the
program are completely different depending on, say, what operating system the program is being
compiled for, it's often better to place the different versions in separate source files, and then only use
one of the files (corresponding to one of the versions) to build the program on any given system. Also, if
you are using an ANSI Standard compiler and you are writing ANSI-compatible code, you usually won't
need so much conditional compilation, because the Standard specifies exactly how the compiler must do
certain things, and exactly which library functions it must provide, so you don't have to work so hard to
accommodate the old variations among compilers and libraries.
We could say that the markers in the mailboxes for the president, secretary, and treasurer were pointers
to other mailboxes. In an analogous way, pointer variables in C contain pointers to other variables or
memory locations.
10.1 Basic Pointer Operations
10.2 Pointers and Arrays; Pointer Arithmetic
10.3 Pointer Subtraction and Comparison
10.4 Null Pointers
10.5 ``Equivalence'' between Pointers and Arrays
10.6 Arrays and Pointers as Function Arguments
10.7 Strings
10.8 Example: Breaking a Line into ``Words''
i is a variable of type int, so the value in its box is a number, 5. ip is a variable of type pointer-to-int,
so the ``value'' in its box is an arrow pointing at another box. Referring once again back to the ``two-step
process'' for setting a pointer variable: the & operator draws us the arrowhead pointing at i's box, and the
assignment operator =, with the pointer variable ip on its left, anchors the other end of the arrow in ip's
box.
We discover the value pointed to by a pointer using the ``contents-of'' operator, *. Placed in front of a
pointer, the * operator accesses the value pointed to by that pointer. In other words, if ip is a pointer,
then the expression *ip gives us whatever it is that's in the variable or location pointed to by ip. For
example, we could write something like
printf("%d\n", *ip);
which would print 5, since ip points to i, and i is (at the moment) 5.
(You may wonder how the asterisk * can be the pointer contents-of operator when it is also the
multiplication operator. There is no ambiguity here: it is the multiplication operator when it sits between
two variables, and it is the contents-of operator when it sits in front of a single variable. The situation is
analogous to the minus sign: between two variables or expressions it's the subtraction operator, but in
front of a single variable or expression it's the negation operator. Technical terms you may hear for these
distinct roles are unary and binary: a binary operator applies to two operands, usually on either side of it,
while a unary operator applies to a single operand.)
The contents-of operator * does not merely fetch values through pointers; it can also set values through
pointers. We can write something like
*ip = 7;
which means ``set whatever ip points to to 7.'' Again, the * tells us to go to the location pointed to by ip,
but this time, the location isn't the one to fetch from--we're on the left-hand side of an assignment
operator, so *ip tells us the location to store to. (The situation is no different from array subscripting
expressions such as a[3] which we've already seen appearing on both sides of assignments.)
The result of the assignment *ip = 7 is that i's value is changed to 7, and the picture changes to:
If we declare another variable j, with the value 3:
int j = 3;
and write
ip = &j;
we've changed ip itself. The picture now looks like this:
We have to be careful when we say that a pointer assignment changes ``what the pointer points to.'' Our
earlier assignment
*ip = 7;
changed the value pointed to by ip, but this more recent assignment
ip = &j;
has changed what variable ip points to. It's true that ``what ip points to'' has changed, but this time, it
has changed for a different reason. Neither i (which is still 7) nor j (which is still 3) has changed. (What
has changed is ip's value.) If we again call
printf("%d\n", *ip);
this time it will print 3.
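The whole sequence can be collected into one small function, as a sketch. The function name pointer_walkthrough and its two output parameters are our own invention; they exist only so that a caller can inspect the final values of i and j.

```c
/* Replay the assignments from the text.  Returns the value finally seen
 * through ip, and reports the final values of i and j to the caller. */
int pointer_walkthrough(int *i_after, int *j_after)
{
    int i = 5, j = 3;
    int *ip;

    ip = &i;        /* ip points to i */
    *ip = 7;        /* changes i (to 7); ip itself is unchanged */
    ip = &j;        /* changes ip; neither i nor j is touched */

    *i_after = i;
    *j_after = j;
    return *ip;     /* fetches j's value, since ip now points to j */
}
```

The function returns 3, and reports that i is 7 and j is still 3.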
We can also assign pointer values to other pointer variables. If we declare a second pointer variable:
int *ip2;
then we can say
ip2 = ip;
Now ip2 points where ip does; we've essentially made a ``copy'' of the arrow:
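Pointers and integers are different kinds of values, and you can't mix them. It would not be right to write
ip = 5;
/* WRONG */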
5 is an integer, but ip is a pointer. You probably wanted to ``set the value pointed to by ip to 5,'' which
you express by writing
*ip = 5;
Similarly, you can't ``see what ip is'' by writing
printf("%d\n", ip);
/* WRONG */
Again, ip is a pointer-to-int, but %d expects an int. To print what ip points to, use
printf("%d\n", *ip);
Finally, a few more notes about pointer declarations. The * in a pointer declaration is related to, but
different from, the contents-of operator *. After we declare a pointer variable
int *ip;
the expression
ip = &i
sets what ip points to (that is, which location it points to), while the expression
*ip = 5
sets the value of the location pointed to by ip. On the other hand, if we declare a pointer variable and
include an initializer:
int *ip3 = &i;
we're setting the initial value for ip3, which is where ip3 will point, so that initial value is a pointer. (In
other words, the * in the declaration int *ip3 = &i; is not the contents-of operator, it's the indicator
that ip3 is a pointer.)
If you have a pointer declaration containing an initialization, and you ever have occasion to break it up
into a simple declaration and a conventional assignment, do it like this:
int *ip3;
ip3 = &i;
Don't write
int *ip3;
*ip3 = &i;
or you'll be trying to mix oil and water again.
Also, when we write
int *ip;
although the asterisk affects ip's type, it goes with the identifier name ip, not with the type int on the
left. To declare two pointers at once, the declaration looks like
int *ip1, *ip2;
Some people write pointer declarations like this:
int* ip;
This works for one pointer, because C essentially ignores whitespace. But if you ever write
int* ip1, ip2;
/* PROBABLY WRONG */
it will declare one pointer-to-int ip1 and one plain int ip2, which is probably not what you meant.
What is all of this good for? If it was just for changing variables like i from 5 to 7, it would not be good
for much. What it's good for, among other things, is when for various reasons we don't know exactly
which variable we want to change, just like the bank didn't know exactly which club member it wanted to
send the statement to.
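As one possible illustration of not knowing in advance which variable we want to change, here is a sketch in which a pointer is aimed at whichever of two variables is larger. The functions larger_of and clear_larger are hypothetical examples of ours. They also pass pointers to functions, a topic the notes take up properly later in this chapter.

```c
/* Return a pointer to whichever of the two ints holds the larger value
 * (the first one, if they are equal). */
int *larger_of(int *a, int *b)
{
    return (*a >= *b) ? a : b;
}

/* Set the larger of the two variables to zero, whichever one it is. */
void clear_larger(int *a, int *b)
{
    int *ip = larger_of(a, b);  /* decided at run time */
    *ip = 0;
}
```

If x is 4 and y is 9, then clear_larger(&x, &y) leaves x alone and sets y to 0; the code that does the assignment never mentions either variable by name.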
We'd use this ip just like the one in the previous section: *ip gives us what ip points to, which in this
case will be the value in a[3].
Once we have a pointer pointing into an array, we can start doing pointer arithmetic. Given that ip is a
pointer to a[3], we can add 1 to ip:
ip + 1
What does it mean to add one to a pointer? In C, it gives a pointer to the cell one farther on, which in this
case is a[4]. To make this clear, let's assign this new pointer to another pointer variable:
ip2 = ip + 1;
Now the picture looks like this:
If we now do
*ip2 = 4;
we've set a[4] to 4. But it's not necessary to assign a new pointer value to a pointer variable in order to
use it; we could also compute a new pointer value and use it immediately:
*(ip + 1) = 5;
In this last example, we've changed a[4] again, setting it to 5. The parentheses are needed because the
unary ``contents of'' operator * has higher precedence than (i.e., binds more tightly than) the addition
operator. If we wrote *ip + 1, without the parentheses, we'd be fetching the value pointed to by ip,
and adding 1 to that value. The expression *(ip + 1), on the other hand, accesses the value one past
the one pointed to by ip.
Given that we can add 1 to a pointer, it's not surprising that we can add and subtract other numbers as
well. If ip still points to a[3], then
*(ip + 3) = 7;
sets a[6] to 7, and
*(ip - 2) = 4;
sets a[1] to 4.
Up above, we added 1 to ip and assigned the new pointer to ip2, but there's no reason we can't add one
to a pointer, and change the same pointer:
ip = ip + 1;
Now ip points one past where it used to (to a[4], if we hadn't changed it in the meantime). The
shortcuts we learned in a previous chapter all work for pointers, too: we could also increment a pointer
using
ip += 1;
or
ip++;
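The following sketch exercises the same pointer arithmetic on a real array. Both function names are made up for illustration, and the loop in sum_with_pointer relies on comparing two pointers into the same array, which a later section of the notes covers.

```c
/* Repeat the assignments from the text on a caller-supplied array
 * of at least 10 elements. */
void pointer_arith_demo(int a[])
{
    int *ip = &a[3];

    *(ip + 1) = 5;      /* sets a[4] */
    *(ip + 3) = 7;      /* sets a[6] */
    *(ip - 2) = 4;      /* sets a[1] */
}

/* Sum n ints by stepping a pointer through the array with ip++. */
int sum_with_pointer(int a[], int n)
{
    int *ip;
    int sum = 0;

    for (ip = &a[0]; ip < &a[n]; ip++)
        sum += *ip;

    return sum;
}
```

Starting from an array of ten zeros, pointer_arith_demo leaves a[1], a[4], and a[6] holding 4, 5, and 7, so sum_with_pointer then returns 16.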
Of course, pointers are not limited to ints. It's quite common to use pointers to other types, especially
char. Here are the innards of the mystrcmp function we saw in a previous chapter, rewritten to use
pointers. (mystrcmp, you may recall, compares two strings, character by character.)
within another, returning a pointer to the string if it can, or a null pointer if it cannot. Here is the function, using the
obvious brute-force algorithm: at every character of the input string, the code checks for a match there of the pattern
string:
#include <stddef.h>

char *mystrstr(char input[], char pat[])
{
    char *start, *p1, *p2;
    for(start = &input[0]; *start != '\0'; start++)
    {                           /* for each position in input string... */
        p1 = pat;               /* prepare to check for pattern string there */
        p2 = start;
        while(*p1 != '\0')
        {
            if(*p1 != *p2)      /* characters differ */
                break;
            p1++;
            p2++;
        }
        if(*p1 == '\0')         /* found match */
            return start;
    }
    return NULL;
}
The start pointer steps over each character position in the input string. At each character, the inner loop checks
for a match there, by using p1 to step over the pattern string (pat), and p2 to step over the input string (starting at
start). We compare successive characters until either (a) we reach the end of the pattern string (*p1 == '\0'),
or (b) we find two characters which differ. When we're done with the inner loop, if we reached the end of the pattern
string (*p1 == '\0'), it means that all preceding characters matched, and we found a complete match for the
pattern starting at start, so we return start. Otherwise, we go around the outer loop again, to try another starting
position. If we run out of those (if *start == '\0'), without finding a match, we return a null pointer.
Notice that the function is declared as returning (and does in fact return) a pointer-to-char.
We can use mystrstr (or its standard library counterpart strstr) to determine whether one string contains
another:
if(mystrstr("Hello, world!", "lo") == NULL)
printf("no\n");
else
printf("yes\n");
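Because the returned pointer tells us where the match was found, and not merely whether there was one, we can resume searching just past it. Here is a sketch that counts matches using the standard strstr; the function count_occurrences is our own, and the decision to count overlapping matches and to treat an empty pattern as matching nothing is an arbitrary choice.

```c
#include <string.h>

/* Count how many times pat occurs in input, overlapping matches included.
 * An empty pattern is treated as occurring zero times. */
int count_occurrences(const char *input, const char *pat)
{
    int count = 0;
    const char *p = input;

    if (*pat == '\0')
        return 0;

    while ((p = strstr(p, pat)) != NULL) {
        count++;
        p++;            /* resume one character past the start of this match */
    }
    return count;
}
```

The loop ends exactly when strstr returns a null pointer, which is the same test the if statement above makes.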
In general, C does not initialize pointers to null for you, and it never tests pointers to see if they are null before using
them. If one of the pointers in your programs points somewhere some of the time but not all of the time, an excellent
convention to use is to set it to a null pointer when it doesn't point anywhere valid, and to test to see if it's a null
pointer before using it. But you must use explicit code to set it to NULL, and to test it against NULL. (In other words,
just setting an unused pointer variable to NULL doesn't guarantee safety; you also have to check for the null value
before using the pointer.) On the other hand, if you know that a particular pointer variable is always valid, you don't
have to insert a paranoid test against NULL before using it.
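The convention might look like this in practice. Both functions below are hypothetical examples of ours: file_extension sets its result pointer to NULL explicitly until it finds something to point at, and has_extension tests the pointer before using it.

```c
#include <stddef.h>

/* Return a pointer to the last '.' in filename, or a null pointer
 * if the name contains no '.' at all. */
const char *file_extension(const char *filename)
{
    const char *dot = NULL;     /* explicitly "points nowhere" so far */
    const char *p;

    for (p = filename; *p != '\0'; p++)
        if (*p == '.')
            dot = p;

    return dot;
}

/* Return 1 if filename has a nonempty extension, 0 otherwise. */
int has_extension(const char *filename)
{
    const char *ext = file_extension(filename);

    if (ext == NULL)            /* test before using the pointer */
        return 0;

    return ext[1] != '\0';
}
```

Without the test against NULL, has_extension("notes") would try to fetch ext[1] through a null pointer, and the program would most likely crash.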
/* WRONG */
is illegal. As we've seen, though, you can assign two pointer variables:
int *ip1, *ip2;
ip1 = &a[0];
ip2 = ip1;
Pointer assignment is straightforward; the pointer on the left is simply made to point wherever the pointer
on the right does. We haven't copied the data pointed to (there's still just one copy, in the same place);
we've just made two pointers point to that one place.
The similarities between arrays and pointers end up being quite useful, and in fact C builds on the
similarities, leading to what is called ``the equivalence of arrays and pointers in C.'' When we speak of
this ``equivalence'' we do not mean that arrays and pointers are the same thing (they are in fact quite
different), but rather that they can be used in related ways, and that certain operations may be used
between them.
The first such operation is that it is possible to (apparently) assign an array to a pointer:
int a[10];
int *ip;
ip = a;
What can this mean? In that last assignment ip = a, aren't we mixing apples and oranges again? It
turns out that we are not; C defines the result of this assignment to be that ip receives a pointer to the
first element of a. In other words, it is as if you had written
ip = &a[0];
The second facet of the equivalence is that you can use the ``array subscripting'' notation [i] on
pointers, too. If you write
ip[3]
it is just as if you had written
*(ip + 3)
So when you have a pointer that points to a block of memory, such as an array or a part of an array, you
can treat that pointer ``as if'' it were an array, using the convenient [i] notation. In other words, at the
beginning of this section when we talked about *ip, *(ip+1), *(ip+2), and *(ip+i), we could
have written ip[0], ip[1], ip[2], and ip[i]. As we'll see, this can be quite useful (or at least
convenient).
The third facet of the equivalence (which is actually a more general version of the first one we
mentioned) is that whenever you mention the name of an array in a context where the ``value'' of the
array would be needed, C automatically generates a pointer to the first element of the array, as if you had
written &array[0]. When you write something like
int a[10];
int *ip;
ip = a + 3;
it is as if you had written
ip = &a[0] + 3;
which (and you might like to convince yourself of this) gives the same result as if you had written
ip = &a[3];
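If you would rather let the compiler do the convincing, the three facets can be checked mechanically. This sketch uses a made-up function, equivalence_holds, which returns 1 only if every comparison comes out the way the text claims.

```c
/* Return 1 if the array/pointer equivalences described in the text hold. */
int equivalence_holds(void)
{
    int a[10];
    int *ip;
    int i;

    for (i = 0; i < 10; i++)
        a[i] = i * i;           /* give each cell a distinct value */

    ip = a;                     /* facet 1: same as ip = &a[0] */
    if (ip != &a[0])
        return 0;

    if (ip[3] != *(ip + 3))     /* facet 2: subscripting a pointer */
        return 0;

    if (a + 3 != &a[3])         /* facet 3: a used as a value is &a[0] */
        return 0;

    ip = a + 3;
    if (ip[2] != a[5])          /* ip[2] is *(ip + 2), i.e. a[5] */
        return 0;

    return 1;
}
```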
For example, if the character array
char string[100];
Therefore, it knows that a function can never actually receive an array as a parameter. Therefore,
whenever it sees you defining a function that seems to accept an array as a parameter, the compiler
quietly pretends that you had declared it as accepting a pointer, instead. The definition of getline
above is compiled exactly as if it had been written
int getline(char *line, int max)
{
...
}
Let's look at how getline might be written if we thought of its first parameter (argument) as a pointer,
instead:
int getline(char *line, int max)
{
    int nch = 0;
    int c;
    max = max - 1;              /* leave room for '\0' */
#ifndef FGETLINE
    while((c = getchar()) != EOF)
#else
    while((c = getc(fp)) != EOF)
#endif
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            *(line + nch) = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    *(line + nch) = '\0';
    return nch;
}
But, as we've learned, we can also use ``array subscript'' notation with pointers, so we could rewrite the
pointer version of getline like this:
int getline(char *line, int max)
{
    int nch = 0;
    int c;
    max = max - 1;              /* leave room for '\0' */
#ifndef FGETLINE
    while((c = getchar()) != EOF)
#else
    while((c = getc(fp)) != EOF)
#endif
    {
        if(c == '\n')
            break;
        if(nch < max)
        {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if(c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}
But this is exactly what we'd written before (see chapter 6, Sec. 6.3), except that the declaration of the
line parameter is different. In other words, within the body of the function, it hardly matters whether
we thought line was an array or a pointer, since we can use array subscripting notation with both arrays
and pointers.
These games that the compiler is playing with arrays and pointers may seem bewildering at first, and it
may seem faintly miraculous that everything comes out in the wash when you declare a function like
getline that seems to accept an array. The equivalence in C between arrays and pointers can be
confusing, but it does work and is one of the central features of C. If the games which the compiler plays
(pretending that you declared a parameter as a pointer when you thought you declared it as an array)
bother you, you can do two things:
1. Continue to pretend that functions can receive arrays as parameters; declare and use them that
way, but remember that unlike other arguments, a function can modify the copy in its caller of an
argument that (seems to be) an array.
2. Realize that arrays are always passed to functions as pointers, and always declare your functions
as accepting pointers.
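The two attitudes can be put side by side in a short sketch. The functions fill_array_style and fill_pointer_style are hypothetical; they are declared differently but are compiled the same way, and either one modifies the caller's array.

```c
/* Attitude 1: declare the parameter as if it were an array. */
void fill_array_style(int a[], int n, int value)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = value;           /* writes into the caller's array */
}

/* Attitude 2: declare the parameter as the pointer it really is. */
void fill_pointer_style(int *a, int n, int value)
{
    int i;
    for (i = 0; i < n; i++)
        *(a + i) = value;       /* same effect as a[i] = value */
}
```

A caller with int x[4] can pass x to either function in exactly the same way; in both cases what the function actually receives is a pointer to x[0].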
10.7 Strings
Because of the ``equivalence'' of arrays and pointers, it is extremely common to refer to and manipulate
strings as character pointers, or char *'s. It is so common, in fact, that it is easy to forget that strings
are arrays, and to imagine that they're represented by pointers. (Actually, in the case of strings, it may not
even matter that much if the distinction gets a little blurred; there's certainly nothing wrong with referring
to a character pointer, suitably initialized, as a ``string.'') Let's look at a few of the implications:
1. Any function that manipulates a string will actually accept it as a char * argument. The caller
may pass an array containing a string, but the function will receive a pointer to the array's
(string's) first element (character).
2. The %s format in printf expects a character pointer.
3. Although you have to use strcpy to copy a string from one array to another, you can use simple
pointer assignment to assign a string to a pointer. The string being assigned might either be in an
array or pointed to by another pointer. In other words, given
char string[] = "Hello, world!";
char *p1, *p2;
both
p1 = string
and
p2 = p1
are legal. (Remember, though, that when you assign a pointer, you're making a copy of the pointer
but not of the data it points to. In the first example, p1 ends up pointing to the string in string.
In the second example, p2 ends up pointing to the same string as p1. In any case, after a pointer
assignment, if you ever change the string (or other data) pointed to, the change is ``visible'' to both
pointers.)
4. Many programs manipulate strings exclusively using character pointers, never explicitly declaring
any actual arrays. As long as these programs are careful to allocate appropriate memory for the
strings, they're perfectly valid and correct.
When you start working heavily with strings, however, you have to be aware of one subtle fact.
When you initialize a character array with a string constant:
char string[] = "Hello, world!";
you end up with an array containing the string, and you can modify the array's contents to your heart's
content:
string[0] = 'J';
However, it's possible to use string constants (the formal term is string literals) at other places in your
code. Since they're arrays, the compiler generates pointers to their first elements when they're used in
expressions, as usual. That is, if you say
char *p1 = "Hello";
int len = strlen("world");
it's almost as if you'd said
char internal_string_1[] = "Hello";
char internal_string_2[] = "world";
char *p1 = &internal_string_1[0];
int len = strlen(&internal_string_2[0]);
Here, the arrays named internal_string_1 and internal_string_2 are supposed to suggest
the fact that the compiler is actually generating little temporary arrays every time you use a string
constant in your code. However, the subtle fact is that the arrays which are ``behind'' the string constants
are not necessarily modifiable. In particular, the compiler may store them in read-only memory.
Therefore, if you write
char *p3 = "Hello, world!";
p3[0] = 'J';
your program may crash, because it may try to store a value (in this case, the character 'J') into
nonwritable memory.
The moral is that whenever you're building or modifying strings, you have to make sure that the memory
you're building or modifying them in is writable. That memory should either be an array you've
allocated, or some memory which you've dynamically allocated by the techniques which we'll see in the
next chapter. Make sure that no part of your program will ever try to modify a string which is actually
one of the unnamed, unwritable arrays which the compiler generated for you in response to one of your
string constants. (The only exception is array initialization, because if you write to such an array, you're
writing to the array, not to the string literal which you used to initialize the array.)
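One way to stay on the safe side is sketched below: copy the string constant into memory you know to be writable, and modify the copy. The function make_jello and its size check are our own illustration. Declaring the pointer to the literal as const char * is a common precaution (not something the language forces on you) that lets the compiler object if you try to write through it.

```c
#include <string.h>

/* Copy a string constant into the caller's writable buffer and modify
 * the copy.  Returns 1 on success, 0 if the buffer is too small. */
int make_jello(char *buf, size_t bufsize)
{
    const char *greeting = "Hello, world!";     /* never written through */

    if (strlen(greeting) + 1 > bufsize)         /* + 1 for the '\0' */
        return 0;

    strcpy(buf, greeting);      /* buf now holds a writable copy */
    buf[0] = 'J';               /* safe: modifies the copy, not the literal */
    return 1;
}
```

Given char buf[20], the call make_jello(buf, sizeof buf) leaves "Jello, world!" in buf, and the string constant itself is never touched.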
#include <ctype.h>

int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;
    while(1)
    {
        while(isspace((unsigned char)*p))   /* skip whitespace */
            p++;
        if(*p == '\0')
            return nwords;
        words[nwords++] = p;                /* record start of word */
        while(!isspace((unsigned char)*p) && *p != '\0')
            p++;
        if(*p == '\0')
            return nwords;
        *p++ = '\0';                        /* terminate the word */
        if(nwords >= maxwords)
            return nwords;
    }
}
Each time through the outer while loop, the function tries to find another word. First it skips over
whitespace (which might be leading spaces on the line, or the space(s) separating this word from the
previous one). The isspace function is new: it's in the standard library, declared in the header file
<ctype.h>, and it returns nonzero (``true'') if the character you hand it is a space character (a space or
a tab, or any other whitespace character there might happen to be).
When the function finds a non-whitespace character, it has found the beginning of another word, so it
places the pointer to that character in the next cell of the words array. Then it steps though the word,
looking at non-whitespace characters, until it finds another whitespace character, or the \0 at the end of
the line. If it finds the \0, it's done with the entire line; otherwise, it changes the whitespace character to
a \0, to terminate the word it's just found, and continues. (If it's found as many words as will fit in the
words array, it returns prematurely.)
Each time it finds a word, the function increments the number of words (nwords) it has found. Since
arrays in C start at [0], the number of words the function has found so far is also the index of the cell in
the words array where the next word should be stored. The function actually assigns the next word and
increments nwords in one expression:
words[nwords++] = p;
You should convince yourself that this arrangement works, and that (in this case) the preincrement form
words[++nwords] = p;    /* WRONG */
The 100 bytes of memory (not all of which are shown) pointed to by line are those allocated by
malloc. (They are brand-new memory, conceptually a bit different from the memory which the compiler
arranges to have allocated automatically for our conventional variables. The 100 boxes in the figure don't
have a name next to them, because they're not storage for a variable we've declared.)
As a second example, we might have occasion to allocate a piece of memory, and to copy a string into it
with strcpy:
char *p = malloc(15);
immediately check for a null pointer, and if it receives one, it should at the very least print an error
message and exit, or perhaps figure out some way of proceeding without the memory it asked for. But it
cannot go on to use the null pointer it got back from malloc in any way, because that null pointer by
definition points nowhere. (``It cannot use a null pointer in any way'' means that the program cannot use
the * or [] operators on such a pointer value, or pass it to any function that expects a valid pointer.)
A call to malloc, with an error check, typically looks something like this:
int *ip = malloc(100 * sizeof(int));
if(ip == NULL)
{
    printf("out of memory\n");
    exit(1);        /* or return to the caller */
}
After printing the error message, this code should return to its caller, or exit from the program entirely; it
cannot proceed with the code that would have used ip.
Of course, in our examples so far, we've still limited ourselves to ``fixed size'' regions of memory,
because we've been calling malloc with fixed arguments like 10 or 100. (Our call to getline is still
limited to 100-character lines, or whatever number we set the linelen variable to; our ip variable still
points at only 100 ints.) However, since the sizes are now values which can in principle be determined
at run-time, we've at least moved beyond having to recompile the program (with a bigger array) to
accommodate longer lines, and with a little more work, we could arrange that the ``arrays'' automatically
grew to be as large as required. (For example, we could write something like getline which could read
the longest input line actually seen.) We'll begin to explore this possibility in a later section.
When thinking about malloc, free, and dynamically-allocated memory in general, remember again
the distinction between a pointer and what it points to. If you call malloc to allocate some memory, and
store the pointer which malloc gives you in a local pointer variable, what happens when the function
containing the local pointer variable returns? If the local pointer variable has automatic duration (which
is the default, unless the variable is declared static), it will disappear when the function returns. But
for the pointer variable to disappear says nothing about the memory pointed to! That memory still exists
and, as far as malloc and free are concerned, is still allocated. The only thing that has disappeared is
the pointer variable you had which pointed at the allocated memory. (Furthermore, if it contained the
only copy of the pointer you had, once it disappears, you'll have no way of freeing the memory, and no
way of using it, either. Using memory and freeing memory both require that you have at least one pointer
to the memory!)
Putting this all together, here is a piece of code which reads lines of text from the user, treats each line as an
integer by calling atoi, and stores each integer in a dynamically-allocated ``array'':
#define MAXLINE 100
char line[MAXLINE];
int *ip;
int nalloc, nitems;

nalloc = 100;
ip = malloc(nalloc * sizeof(int));      /* initial allocation */
if(ip == NULL)
{
    printf("out of memory\n");
    exit(1);
}

nitems = 0;
while(getline(line, MAXLINE) != EOF)
{
    if(nitems >= nalloc)
    {
        /* increase allocation */
        int *newp;
        nalloc += 100;
        newp = realloc(ip, nalloc * sizeof(int));
        if(newp == NULL)
        {
            printf("out of memory\n");
            exit(1);
        }
        ip = newp;
    }
    ip[nitems++] = atoi(line);
}
We use two different variables to keep track of the ``array'' pointed to by ip. nalloc is how many
elements we've allocated, and nitems is how many of them are in use. Whenever we're about to store
another item in the ``array,'' if nitems >= nalloc, the old ``array'' is full, and it's time to call realloc
to make it bigger.
Finally, we might ask what the return type of malloc and realloc is, if they are able to return pointers to
char or pointers to int or (though we haven't seen it yet) pointers to any other type. The answer is that
both of these functions are declared (in <stdlib.h>) as returning a type we haven't seen, void * (that is,
pointer to void). We haven't really seen type void, either, but what's going on here is that void * is
specially defined as a ``generic'' pointer type, which may be used with (strictly speaking, assigned to or
from) any pointer type.
{
line[nch] = c;
nch = nch + 1;
}
}
if(c == EOF && nch == 0)
return EOF;
line[nch] = '\0';
return nch;
}
Now we could read one line from ifp by calling
char line[MAXLINE];
...
fgetline(ifp, line, MAXLINE);
2
6
10
34
78
112
Suppose you wanted to read these numbers into an array. (Actually, the array will be an array of arrays,
or a ``multidimensional'' array; see section 4.1.2.) We can write code to do this by putting together
several pieces: the fgetline function we just showed, and the getwords function from chapter 10.
Assuming that the data file is named input.dat, the code would look like this:
#define MAXLINE 100
#define MAXROWS 10
#define MAXCOLS 10

int array[MAXROWS][MAXCOLS];
char *filename = "input.dat";
FILE *ifp;
char line[MAXLINE];
char *words[MAXCOLS];
int nrows = 0;
int n;
int i;

ifp = fopen(filename, "r");
if(ifp == NULL)
{
    fprintf(stderr, "can't open %s\n", filename);
    exit(EXIT_FAILURE);
}

while(fgetline(ifp, line, MAXLINE) != EOF)
{
    if(nrows >= MAXROWS)
    {
        fprintf(stderr, "too many rows\n");
        exit(EXIT_FAILURE);
    }
    n = getwords(line, words, MAXCOLS);
We have not said much about pointers to pointers, or arrays of arrays (i.e. multidimensional arrays), or
the ramifications of array/pointer equivalence on multidimensional arrays. (In particular, a reference to
an array of arrays does not generate a pointer to a pointer; it generates a pointer to an array. You cannot
pass a multidimensional array to a function which accepts pointers to pointers.)
Variables can be declared with a hint that they be placed in high-speed CPU registers, for efficiency.
(These hints are rarely needed or used today, because modern compilers do a good job of register
allocation by themselves, without hints.)
A mechanism called typedef allows you to define user-defined aliases (i.e. new and perhaps
more-convenient names) for other types.
Operators
The bitwise operators &, |, ^, and ~ operate on integers thought of as binary numbers or strings of bits.
The & operator is bitwise AND, the | operator is bitwise OR, the ^ operator is bitwise exclusive-OR
(XOR), and the ~ operator is a bitwise negation or complement. (&, |, and ^ are ``binary'' in that they
take two operands; ~ is unary.) These operators let you work with the individual bits of a variable; one
common use is to treat an integer as a set of single-bit flags. You might define the 3rd (2**2) bit as the
``verbose'' flag bit by defining
#define VERBOSE 4
Then you can ``turn the verbose bit on'' in an integer variable flags by executing
flags = flags | VERBOSE;
or
flags |= VERBOSE;
and turn it off with
flags = flags & ~VERBOSE;
or
flags &= ~VERBOSE;
and test whether it's set with
if(flags & VERBOSE)
The left-shift and right-shift operators << and >> let you shift an integer left or right by some number of
bit positions; for example, value << 2 shifts value left by two bits.
The ?: or conditional operator (also called the ``ternary operator'') essentially lets you embed an
if/then statement in an expression. The assignment
a = expr ? b : c;
is roughly equivalent to
if(expr)
    a = b;
else
    a = c;
Since you can use ?: anywhere in an expression, it can do things that if/then can't, or that would be
cumbersome with if/then. For example, the function call
f(a, b, c ? d : e);
is roughly equivalent to
if(c)
    f(a, b, d);
else
    f(a, b, e);
d = (double)i / j;
will set d to 0.5. You can also ``cast to void'' to explicitly indicate that you're ignoring a function's
return value, as in
(void)fclose(fp);
or
(void)printf("Hello, world!\n");
(Usually, it's a bad idea to ignore return values, but in some cases it's essentially inevitable, and the
(void) cast keeps some compilers from issuing warnings every time you ignore a value.)
There's a precise, mildly elaborate set of rules which C uses for converting values automatically, in the
absence of explicit casts.
The . and -> operators let you access the members (components) of structures and unions.
Statements
The switch statement allows you to jump to one of a number of numeric case labels depending on the
value of an expression; it's more convenient than a long if/else chain. (However, you can use
switch only when the expression is integral and all of the case labels are compile-time constants.)
The do/while loop is a loop that tests its controlling expression at the bottom of the loop, so that the
body of the loop always executes once even if the condition is initially false. (C's do/while loop is
therefore like Pascal's repeat/until loop, while C's while loop is like Pascal's while/do loop.)
Finally, when you really need to write ``spaghetti code,'' C does have the all-purpose goto statement,
and labels to go to.
Functions
Functions can't return arrays, and it's tricky to write a function as if it returns an array (perhaps by
simulating the array with a pointer) because you have to be careful about allocating the memory that the
returned pointer points to.
The functions we've written have all accepted a well-defined, fixed number of arguments. printf
accepts a variable number of arguments (depending on how many % signs there are in the format string)
but we haven't seen how to declare and write functions that do this.
C Preprocessor
If you're careful, it's possible (and can be useful) to use #include within a header file, so that you end
up with ``nested header files.''
It's possible to use #define to define ``function-like'' macros that accept arguments; the expansion of
the macro can therefore depend on the arguments it's ``invoked'' with.
Two special preprocessing operators # and ## let you control the expansion of macro arguments in
fancier ways.
The preprocessor directive #if lets you conditionally include (or, with #else, conditionally not
include) a section of code depending on some arbitrary compile-time expression. (#if can also do the
same macro-definedness tests as #ifdef and #ifndef, because the expression can use a defined()
operator.)
Other preprocessing directives are #elif, #error, #line, and #pragma.
There are a few predefined preprocessor macros, some required by the C standard, others perhaps
defined by particular compilation environments. These are useful for conditional compilation (#ifdef,
#ifndef).
strncat, and strncmp all accept a third argument telling them to stop after n characters if they
haven't found the \0 marking the end of the string. A third set of ``mem'' functions, including memcpy
and memcmp, operate on blocks of memory which aren't necessarily strings and where \0 is not treated
as a terminator. The strchr and strrchr functions find characters in strings. There is a motley
collection of ``span'' and ``scan'' functions, strspn, strcspn, and strpbrk, for searching out or
skipping over sequences of characters all drawn from a specified set of characters. The strtok function
aids in breaking up a string into words or ``tokens,'' much like our own getwords function.
The header file <ctype.h> contains several functions which let you classify and manipulate
characters: check for letters or digits, convert between upper- and lower-case, etc.
A host of mathematical functions are defined in the header file <math.h>. (As we've mentioned,
besides including <math.h>, you may on some Unix systems have to ask for a special library
containing the math functions while compiling/linking.)
There's a random-number generator, rand, and a way to ``seed'' it, srand. rand returns integers from
0 up to RAND_MAX (where RAND_MAX is a constant #defined in <stdlib.h>). One way of getting
random integers from 1 to n is to call
(int)(rand() / (RAND_MAX + 1.0) * n) + 1
Another way is
rand() / (RAND_MAX / n + 1) + 1
It seems like it would be simpler to just say
rand() % n + 1
but this method is imperfect (or rather, it's imperfect if n is a power of two and your system's
implementation of rand() is imperfect, as all too many of them are).
Several functions let you interact with the operating system under which your program is running. The
exit function returns control to the operating system immediately, terminating your program and
returning an ``exit status.'' The getenv function allows you to read your operating system's or process's
``environment variables'' (if any). The system function allows you to invoke an operating-system
command (i.e. another program) from within your program.
The qsort function allows you to sort an array (of any type); you supply a comparison function (via a
function pointer) which knows how to compare two array elements, and qsort does the rest. The
bsearch function allows you to search for elements in sorted arrays; it, too, operates in terms of a
caller-supplied comparison function.
Copyright
This collection of hypertext pages is Copyright 1995-1997 by Steve Summit. This material may be freely
redistributed and used but may not be republished or sold without permission.
C Programming Notes
Intermediate C Programming Class Notes, Chapter 15
Steve Summit
15.1: Structures
[This section corresponds to K&R Sec. 6.1]
The basic user-defined data type in C is the structure, or struct. (C structures are analogous to the
records found in some other languages.) Defining structures is a two-step process: first you define a
``template'' which describes the new type, then you declare variables having the new type (or functions
returning the new type, etc.).
As a simple example, suppose we wanted to define our own type for representing complex numbers. (If
you're blissfully ignorant of these beasts, a complex number consists of a ``real'' and ``imaginary'' part,
where the imaginary part is some multiple of the square root of negative 1. You don't have to understand
complex numbers to understand this example; you can think of the real and imaginary parts as the x and y
coordinates of a point on a plane.) FORTRAN has a built-in complex type, but C does not. How might
we add one? Since a complex number consists of a real and imaginary part, we need a way of holding
both these quantities in one data type, and a structure will do just the trick. Here is how we might declare
our complex type:
struct complex
{
double real;
double imag;
};
A structure declaration consists of up to four parts, of which we can see three in the example above. The
first part is the keyword struct which indicates that we are talking about a structure. The second part
is a name or tag by which this structure (that is, this new data type) will be known. The third part is a list
of the structure's members (also called components or fields). This list is enclosed in braces {}, and
contains what look like the declarations of ordinary variables. Each member has a name and a type, just
like ordinary variables, but here we are not declaring variables; we are setting up the structure of the
structure by defining the collection of data types which will make up the structure. Here we see that the
complex structure will be made up of two members, both of type double, one named real and one
named imag.
It's important to understand that what we've defined here is just the new data type; we have not yet
declared any variables of this new type! The name complex (the second part of the structure
declaration) is not the name of a variable; it's the name of the structure type. The names real and imag
are not the names of variables; they're identifiers for the two components of the structure.
We declare variables of our new complex type with declarations like these:
Actually, these pictures are a bit misleading; the outer box indicating each composite structure suggests
that there might be more inside them than just the two members, real and imag (that is, more than the
two values of type double). A simpler but more representative picture would be:
The only memory allocated is for two values of type double (the two boxes); all the names are just for
our convenience and the compiler's reference; none are typically stored in the program's memory at run
time.
Notice that when we define structures in this way we have not quite defined a new type on a par with
int or double. We cannot say
complex c1;    /* WRONG */
The name complex does not become a full-fledged type name like int or double; it's just the name
of a particular structure, and we must use the keyword struct and the name of a particular structure
(e.g. complex) to talk about that structure type. (There is a way to define new full-fledged type names
like int and double, and in C++ a new structure does automatically become a full-fledged type, but
we won't worry about these wrinkles for now.)
I said that a structure definition consisted of up to four parts. We saw the first three of them in the first
example; the fourth part of a full structure declaration is simply a list of variables, which are to be
declared as having the structure type at the same time as the structure itself is defined. For example, if we
had written
struct complex
{
double real;
double imag;
} c1, c2, c3;
we would have defined the type struct complex, and right away declared three variables c1, c2,
and c3 all of type struct complex.
In fact, three of the four parts of a structure declaration (all but the keyword struct) are optional. If a
declaration contains the keyword struct, a structure tag, and a brace-enclosed list of members (as in
the first structure definition we saw), it's a definition of the structure itself (that is, just the template). If a
declaration contains the keyword struct, a structure tag, and a list of variable names (as in the first
declarations of c1, c2, and c3 we saw), it's a declaration of those variables having that structure type
(the structure type itself must of course typically be declared elsewhere). If a declaration contains all four
elements (as in the second declaration of c1, c2, and c3 we saw), it's a definition of the structure type
and a declaration of some variables. It's also possible to use the first, third, and fourth parts:
struct
{
double real;
double imag;
} c1, c2, c3;
Here we declare c1, c2, and c3 as having a structure type with no tag name, which is not usually very
useful, because without a tag name we won't be able to declare any other variables or functions of this
type for c1, c2, and c3 to play with. (Finally, it's also possible to declare just the tag name, leaving both
the list of members and any declarations of variables for later, but this is only needed in certain fairly rare
and obscure situations.)
Because a structure definition can also declare variables, it's important not to forget the semicolon at the
end of a structure definition. If you accidentally write
struct complex
{
double real;
double imag;
}
without a semicolon, the compiler will keep looking for something later in the file and try to declare it as
being of type struct complex, which will either result in a confusing error message or (if the
compiler succeeds) a confusing misdeclaration.
p->m
selects that member of the pointed-to structure. The expression p->m is therefore exactly equivalent to
(*p).m
The list has two nodes, allocated by the compiler in response to the first two declarations we gave. We've
also allocated a pointer variable, head, which points at the ``head'' of the list.
Once we've built a list, we'll naturally want to do things with it. One of the simplest operations is to print
the list back out. The code to do so is very simple. We declare another list pointer lp and then cause it to
step over each node in the list, in turn:
in at the head of the list, by making the new node's next pointer point at the old head of the list, and
then resetting the head of the list to be the new node. (Another word for a list where we always add new
items at the beginning is a stack.)
Naturally, we'd like to encapsulate this operation of prepending an item to a list as a function. Doing so is
just a little bit tricky, because the list's head pointer is modified every time. There are several ways to
achieve this modification; the way we'll do it is to have our list-prepending function return a pointer to
the new head of the list. Here is the function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
looking at first:
struct listnode *newnode;
if((newnode = malloc(sizeof(struct listnode))) == NULL ||
        (newnode->item = malloc(strlen(newword) + 1)) == NULL)
{
    fprintf(stderr, "out of memory\n");
    exit(1);
}
How does this work? First it calls malloc(sizeof(struct listnode)), and assigns the result
to newnode. Then it calls malloc(strlen(newword) + 1), and assigns the result to
newnode->item. This code relies on two special guarantees of C's || operator, namely that it always
evaluates its left-hand side first, and that if the left-hand side evaluates as true, it doesn't bother to
evaluate the right-hand side, because once the left-hand side is true, the final result is definitely going to
be true. Therefore, we're guaranteed that we'll allocate space for newnode to point to before we try to fill
in newnode->item, but if the first call to malloc fails, we won't call malloc a second time or try to fill in
newnode->item at all. These guarantees--of left-to-right evaluation, and of skipping the evaluation of
the right-hand side if the left-hand side determines the result--are special properties of the || and &&
operators. (The ?: and comma operators also fix their order of evaluation, but almost no others do.) It's
perfectly acceptable to rely on these guarantees when using || and &&, but don't assume that other
operators will operate deterministically from left to right, because most of them don't.)
Now that we have our prepend function, we can build a list by calling it several times in succession:
struct listnode *head = NULL;
head = prepend("world", head);
head = prepend("hello", head);
This code builds essentially the same list as our first, static example. Notice how we initialize the list
head pointer with a null pointer, which is synonymous with an empty list. (Notice also that the code we
wrote up above, for printing a list, would also deal correctly with an empty list.)
Using static calls to prepend is hardly more interesting than building a static list by hand. To make
things truly interesting, let's read words or strings typed by the user (or redirected from a file), and
prepend those to a list. The code is not hard:
#define MAXLINE 200
char line[MAXLINE];
struct listnode *head = NULL;
struct listnode *lp;
You can also tack two optional modifier characters onto the mode string:
Modes "r+" and "w+" let you read and write to the file. You can't read and write at the same time;
between stints of reading and stints of writing you must explicitly reposition the read/write indicator (see
section 16.9 below).
``Binary'' or "b" mode means that no translations are done by the stdio library when reading or
writing the file. Normally, the newline character \n is translated to and from some operating system
dependent end-of-line representation (LF on Unix, CR on the Macintosh, CRLF on MS-DOS, or an
end-of-record mark on record-oriented operating systems). On MS-DOS systems, without binary mode, a
control-Z character in a file is treated as an end-of-file indication; neither the control-Z nor any
characters following it will be seen by a program reading the file. In binary mode, on the other hand,
characters are read and written verbatim; newline characters (and, in MS-DOS, control-Z characters) are
not treated specially. You need to use binary mode when what you're reading and writing is arbitrary byte
values which are not to be interpreted as text characters.
Of course, it's possible to use both optional modes: "r+b", "w+b", etc. (For maximum portability, it's
preferable to put + before b in these cases.)
If, for any reason, fopen can't open the requested file in the requested mode, it returns a null pointer
(NULL). Whenever you call fopen, you must check that the returned file pointer is not null before
attempting to use the pointer for any I/O.
Most operating systems let you keep only a limited number of files open at a time. Also, many versions
of the stdio library allocate only a limited number of FILE structures for fopen to return pointers to.
Therefore, if a program opens many files in sequence, it's important for it to close them as it finishes with
them. Closing a file fp simply requires calling fclose(fp). (Any open streams are automatically
closed when the program exits normally.)
The standard I/O library normally buffers characters--that is, when you're writing, it saves up a chunk of
characters and then writes them to the actual file all at once; and when you're reading, it reads a chunk of
characters from the file all at once and then parcels them out to the program one at a time (or as many
characters at a time as the program asks for). The reasons for buffering have to do with efficiency--the
calls to the underlying operating system which request it to read and write files may be inefficient if
called once for each character or each few characters, and may be much more efficient if they're always
called for large blocks of characters. Normally, buffering is transparent to your program, but occasionally
it's necessary to ensure that some characters have actually been written. (One example is when you've
printed a prompt to the standard output, and you want to be sure that it's actually been written to the
screen.) In these cases, you can call fflush(fp), which flushes a stream's buffered output to the
underlying file (or screen, or other device). Naturally, the library automatically flushes output when you
call fclose on a stream, and also when your program exits.
(fflush is only defined for output streams. There is no standard way to discard unread, buffered input.)
The code may seem to work at first, but some day it will get confused when it reads a real character with a value which
seems to equal that which results when the non-char value EOF is crammed into a char.
One more reminder about getchar: although it returns and therefore seems to read one character at a time, it typically
delivers characters from internal buffers which may hold more characters which will be delivered later. For example,
most command-line-based operating systems let you type an entire line of input, and wait for you to type the RETURN
or ENTER key before making any of those characters available to a program (even if the program thought it was doing
character-at-a-time input with calls to getchar). There are, of course, ways to read characters immediately (without
waiting for the RETURN key), but they differ from operating system to operating system.
Writing single characters is just as easy as reading: putchar(c) writes the character c to standard output; putc(c,
fp) writes the character c to the stream fp. (The character c must be a real character. If you want to ``send'' an
end-of-file condition to a stream, that is, cause the program reading the stream to ``get'' end-of-file, you do that by
closing the stream, not by trying to write EOF to it.)
Occasionally, when reading characters, you find that you've read a bit too far. For example, if one part of
your code is supposed to read a number--a string of digits--from a file, leaving the characters after the digits on the input
stream for some other part of the program to read, the digit-reading part of the program won't know that it has read all
the digits until it has read a non-digit, at which point it's too late. (The situation recalls Dave Barry's recipe for ``food
heated up'': ``Put the food in a pot on the stove on medium heat until just before the kitchen fills with black smoke.'')
When reading characters with the standard I/O library, at least, we have an escape: the ungetc function ``un-reads'' one
character, pushing it back on the input stream for a later call to getc (or some other input function) to read. The
prototype for ungetc is
int ungetc(int c, FILE *fp)
where c is the character which is to be pushed back onto the stream fp. For example, here is a code scrap that reads
digits from a stream (and converts them to the corresponding integer), stopping at the first non-digit character and
leaving it on the input stream:
#include <ctype.h>

int c, n = 0;
while((c = getchar()) != EOF && isdigit(c))
    n = 10 * n + (c - '0');
if(c != EOF)
    ungetc(c, stdin);
It's only guaranteed that you can push one character back, but that's usually all you need.
The number of digits printed after the decimal point, for the floating-point formats %e, %f, and
%g; or
The maximum number of characters to be printed, for %s; or
The minimum number of digits to be printed, for the integer formats %d, %o, %x, and %u.
For example, printf("%.3s", "Hello, world!") prints Hel, and printf("%.5d", 12)
prints 00012.
Either the width or the precision (or both) can be specified as *, which indicates that the next int
argument from the argument list should be used as the field width or precision. For example,
printf("%.*f", 2, 76.54321) prints 76.54.
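Here is a small sketch of how * might be used when the width and precision are only known at run time. The function name format_fixed is an invention for this example, and the caller is assumed to supply a buffer that is big enough.

```c
#include <stdio.h>

/* Format value into buf with `places` digits after the decimal point,
 * right-adjusted in a field at least `width` characters wide.
 * Both numbers are passed as int arguments and picked up by the *'s.
 * Returns the number of characters written (not counting the '\0'). */
int format_fixed(char *buf, double value, int width, int places)
{
    return sprintf(buf, "%*.*f", width, places, value);
}
```

Calling format_fixed(buf, 76.54321, 8, 2) stores three spaces followed by 76.54 in buf.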
The flags are a few optional characters which modify the conversion in some way. They are:
- Force left adjustment, by padding (out to the field width) on the right.
0 Use 0 as the padding character, instead of a space.
space For numeric formats, if the converted number is positive, leave an extra space before it (so
that it will line up with negative numbers if printed in columns).
+ Print positive numbers with a leading + sign.
# Use an ``alternate form'' of the conversion. (The details of the ``alternate forms'' are described
below, under the individual format characters.)
The modifier specifies the size of the corresponding argument: l for long int, h for short int, L
for long double.
Finally, the format character controls the overall appearance of the conversion (and, along with the
modifier, specifies the type of the corresponding argument). We've seen many of these already. The
complete list of format characters is:
c Print a single character. The corresponding argument is an int (or, by the default argument
promotions, a char or short int).
d Print a decimal integer. The corresponding argument is an int, or a long int if the l
modifier appears, or a short int if the h modifier appears. If the number is negative, it is
preceded by a -. If the space flag appears and the number is positive, it is preceded by a space.
If the + flag appears and the number is positive, it is preceded by a +.
e Print a floating-point number in scientific notation: [-]m.nnnnnne±nn (the exponent is always printed with a sign). The
corresponding argument is either a float or a double or, if the L modifier appears, a long
double. The precision gives the number of places after the decimal point; the default is 6. If the
# flag appears, a decimal point will be printed even if the precision is 0.
E Like e, but use a capital E to set off the exponent.
f Print a floating-point decimal number (mmm.nnnnnn). The corresponding argument is either a
float or a double or, if the L modifier appears, a long double. The precision gives the
number of places after the decimal point; the default is 6. If the # flag appears, a decimal point
will be printed even if the precision is 0.
g Use either e or f, whichever works best given the range and precision of the number. (Roughly
speaking, ``works best'' means to display the most precision in the least space.) If the # flag
appears, don't strip trailing 0's.
G Like g, but use E instead of e.
i Just like d.
n The corresponding argument is an int *. Rather than printing anything, %n stores the number
of characters printed so far (by this call to printf) into the integer pointed to by the corresponding argument.
When you want to print to an arbitrary stream, instead of standard output, just use fprintf, which
takes a leading FILE * argument:
fprintf(stderr, "Syntax error on line %d\n", lineno);
Sometimes, it's useful to do some printf-like formatting, but not output the string right away. The
sprintf function is a printf variant which ``prints'' to an in-memory string rather than to a FILE *
stream. For example, one way to convert an integer to a string (the opposite of atoi) is:
int i = 123;
char string[20];
sprintf(string, "%d", i);
One thing to be careful of with sprintf, though, is that it's up to you to make sure that the destination
string is big enough.
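If your library has snprintf (it was added in C99, so this is an assumption about your compiler), you can let the library enforce the size limit for you. A minimal sketch, with the hypothetical name int_to_string:

```c
#include <stdio.h>

/* Convert i to decimal text in buf, which holds `size` bytes.
 * Returns 1 on success, 0 if the text (plus '\0') would not fit.
 * snprintf never writes more than `size` bytes, and it returns the
 * length the full result would have had. */
int int_to_string(char *buf, size_t size, int i)
{
    int n = snprintf(buf, size, "%d", i);

    return n >= 0 && (size_t)n < size;
}
```

With a 20-character buffer, int_to_string(string, sizeof(string), 123) succeeds; with a 3-character buffer and the value 12345 it returns 0 and leaves a truncated (but properly terminated) string behind.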
The modifier characters are more significant. An h indicates that the corresponding integer pointer
argument (for %d, %u, %o, or %x) is a short int * or unsigned short int *. An l indicates
that the corresponding integer pointer argument (for %d, %u, %o, or %x) is a long int * or
unsigned long int *, or that the floating-point pointer argument (for %e, %f, or %g) is a
double * rather than a float *. (Similarly, an L indicates a long double *.)
The %c format will read more than one character if an explicit width greater than 1 is specified. The
corresponding argument must be a pointer to enough space to hold all the characters read.
The %e, %f, and %g formats all read strings in either scientific notation or conventional decimal fraction
m.n notation. (In other words, the three formats act just the same.) However, they assume a float *
argument unless the l modifier appears, in which case they expect a double *. (This is in contrast to
printf, which accepts either float or double arguments for %e, %f, and %g, due to the default
argument promotions.)
The %i format will read a number in decimal, octal, or hexadecimal, taking a leading 0 to indicate octal
and a leading 0x (or 0X) to indicate hexadecimal, i.e. the same rules as used by C constants.
The %n format causes the number of characters read so far (by this call to scanf) to be stored in the
integer pointed to by the corresponding argument.
The %s format will read a string, up to the next whitespace character, and copy the string, terminated by
a \0, to the corresponding argument, which must be a char *. The caller must ensure (perhaps by
using an explicit width) that there is enough space to hold the received characters.
scanf has a special format specifier %[...], which matches any string composed of characters specified
in the []. For example, %[abc] would match any string composed of a's, b's, and c's. The
corresponding argument is a char *; the matched string is written to the location pointed to, followed
by a \0. The caller must ensure (perhaps by using an explicit width) that there is enough space to hold
the received characters. A second form, %[^...], matches a string of characters not found in the set. For
example, scanf("(%[^)])", s) reads, into the string s, a string of characters (possibly including
whitespace) from an input in which the string appears enclosed in parentheses. It may also be possible to
specify ranges of characters (e.g. %[a-z], %[0-9], etc.), but these are not as portable.
With the exception of %c, %n, and %[, all of the conversion specifiers skip any leading whitespace
(spaces, tabs, or newlines) which might precede the value or string converted. Also, any whitespace
character in the format string matches any number of whitespace characters in the input. Therefore, the
format "%d %d" would match the input "12 34" or "12     34" or "12\t34". However, the format
"%d%d" would match all of these inputs as well, since the second %d first scans past any whitespace
preceding the 34.
scanf returns the number of items it successfully converts and stores. It will return a number less than
expected (less than the number of format specifiers not containing *, or less than the number of
corresponding pointer arguments) if the conversion fails at any point, and it will leave any unrecognized
characters (i.e. the ones that caused the last match to fail) waiting in the input for next time. scanf
returns EOF if it encounters end-of-file before converting anything.
If you want to read characters from an arbitrary stream, you can use fscanf, which takes an initial
FILE * argument.
You can scan and convert characters from a string (rather than from a stream) using sscanf. For
example,
int x, y;
sscanf("12 34", "%d %d", &x, &y);
would place 12 in x and 34 in y.
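Since sscanf returns the number of items converted, just as scanf does, it's worth checking. A sketch (the wrapper name parse_pair is made up for this example):

```c
#include <stdio.h>

/* Parse two whitespace-separated integers from text.
 * Returns 1 only if both conversions succeeded. */
int parse_pair(const char *text, int *x, int *y)
{
    return sscanf(text, "%d %d", x, y) == 2;
}
```

parse_pair("12 34", &x, &y) returns 1, while parse_pair("12 abc", &x, &y) returns 0 because only the first conversion succeeds.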
scanf and fscanf are seductively useful, but they have a number of drawbacks in practice. They
seem to make it very easy to, say, prompt the user for a number:
int x;
printf("Type a number:\n");
scanf("%d", &x);
But what happens if the user fumbles, and types something other than a number? Even if the code checks
scanf's return value, and prompts the user again if scanf returns 0, the non-numeric input remains on
the input, and will be encountered by the next call to scanf unless some other steps are taken. (That is,
scanf will rediscover the user's old, bad input before it gets to any new input.) It's also easy to write
things like
scanf("%d\n", &x);
but this code does not work as intended. The \n in the format string is a whitespace character, which asks
scanf to discard any number of whitespace characters, so it will keep reading characters until it finds
something that is not a whitespace character. It won't consume that eventual non-whitespace character once
it finds it, but in the process of looking for it, it will seem to jam your program, since the call to scanf
won't return right after the user types a number.
Therefore, it's much better to read interactive user input a line at a time, and then use functions like atoi
(or perhaps sscanf) to interpret the line that the user typed.
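Here is one way the line-at-a-time approach might look. This sketch uses strtol rather than atoi, because strtol can report that no digits were found; the name read_int_line and the MAXLINE limit are assumptions of the example.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXLINE 200

/* Read one line from fp and try to interpret it as an integer.
 * Returns 1 and stores the value on success, 0 if the line does not
 * begin with a number, and EOF at end of input. A bad line (if it
 * fits in MAXLINE) is consumed entirely, so it cannot be
 * rediscovered by the next call. */
int read_int_line(FILE *fp, int *value)
{
    char line[MAXLINE];
    char *end;
    long n;

    if (fgets(line, MAXLINE, fp) == NULL)
        return EOF;
    n = strtol(line, &end, 10);
    if (end == line)
        return 0;           /* no digits at all */
    *value = (int)n;        /* range and trailing junk are not checked */
    return 1;
}
```

A prompting loop can simply call read_int_line again whenever it returns 0; the user's bad input is already gone.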
of int might have the same value as the \n character, you would want to make sure that you had
opened the stream in binary or "wb" mode when calling fopen.
Later, you could try to read the integers in by calling
fread(array, sizeof(int), N, fp);
Similarly, if you had a variable of some structure type:
struct somestruct x;
you could write it out all at once by calling
fwrite(&x, sizeof(struct somestruct), 1, fp);
and read it in by calling
fread(&x, sizeof(struct somestruct), 1, fp);
Although this ``binary'' I/O using fwrite and fread looks easy and convenient, it has a number of
drawbacks, some of which we'll discuss in the next chapter.
#include <stdio.h> /* for fopen */
#include <errno.h> /* for errno */
#include <string.h> /* for strerror */
fp = fopen(filename, "r");
if(fp == NULL)
{
fprintf(stderr, "can't open %s for reading: %s\n",
filename, strerror(errno));
return;
}
errno is a global variable, declared in <errno.h>, which may contain a numeric code indicating the
reason for a recent system-related error such as inability to open a file. The strerror function takes an
errno code and returns a human-readable string such as ``No such file'' or ``Permission denied''.
An even more useful error message, especially for a ``toolkit'' program intended to be used in
conjunction with other programs, would include in the message text the name of the program reporting
the error.
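A sketch of what that might look like follows. The function name open_for_reading is hypothetical; the program name would typically be argv[0], handed down from main.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open filename for reading. On failure, print a message prefixed
 * with the program name and return NULL. */
FILE *open_for_reading(const char *progname, const char *filename)
{
    FILE *fp = fopen(filename, "r");

    if (fp == NULL)
        fprintf(stderr, "%s: can't open %s for reading: %s\n",
                progname, filename, strerror(errno));
    return fp;
}
```

When several programs are strung together, a message beginning with the program's name makes it obvious which one of them is complaining.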
width, the code would fail if a file ever had more than 9999 lines in it.)
Three other file-positioning functions are rewind, which rewinds a file to its beginning, and fgetpos
and fsetpos, which are like ftell and fseek except that they record positions in a special type,
fpos_t, which may be able to record positions in huge files for which even a long int might not be
sufficient.
If you're ever using one of the ``read/write'' modes ("r+" or "w+"), you must use a call to a file-positioning function (fseek, rewind, or fsetpos) before switching from reading to writing or vice
versa. (You can also call fflush while writing and then switch to reading, or reach end-of-file while
reading and then switch back to writing.)
In binary ("b") mode, the file positions returned by ftell and used by fseek are byte offsets, such
that it's possible to compute an fseek target without having to have it returned by an earlier call to
ftell. On many systems (including Unix, the Macintosh, and to some extent MS-DOS), file
positioning works this way in text mode as well. Code that relies on this isn't as portable, though, so it's
not a good idea to treat ftell/fseek positions in text files as byte offsets unless you really have to.
either the format with all the numbers on one line, or the format with one number per line. Notice that we
check fscanf's return value, to make sure that it successfully read in all the numbers we expected it to.
Since data files come in from the outside world, it's possible for them to be corrupted, and programs should
not blindly read them assuming that they're perfect. A program that crashes when it attempts to read a
damaged data file is terribly frustrating; a program that diagnoses the problem is much more polite.
We could also read the data file a line at a time, converting the text to integers via other means. If the
integers were stored one per line, we could use code like this:
#define MAXLINE 200
char line[MAXLINE];
for(i = 0; i < 10; i++)
{
if(fgets(line, MAXLINE, ifp) == NULL)
{
fprintf(stderr, "error in data file\n");
break;
}
a[i] = atoi(line);
}
(We could also use our own getline or fgetline function instead of fgets.) If the integers were
stored all on one line, we could use the getwords function from chapter 10 to separate the numbers at the
whitespace boundaries:
char *av[10];
if(fgets(line, MAXLINE, ifp) == NULL)
fprintf(stderr, "error in data file\n");
else if(getwords(line, av, 10) != 10)
fprintf(stderr, "error in data file\n");
else
{
for(i = 0; i < 10; i++)
a[i] = atoi(av[i]);
}
Suppose, now, that there were not always 10 elements in the array a; suppose we had a separate integer
variable na to record how many elements the array a currently contains. When writing the data out, we
would certainly then use a loop; we might also want to precede the data by the count, in case that will make
it easier for the reading program:
fprintf(ofp, "%d\n", na);
for(i = 0; i < na; i++)
}
na = atoi(line);
if(na > 10)
{
fprintf(stderr, "too many items in data file\n");
return;
}
for(i = 0; i < na; i++)
{
if(fgets(line, MAXLINE, ifp) == NULL)
{
fprintf(stderr, "error in data file\n");
return;
}
a[i] = atoi(line);
}
Or, if the data were all on one line, like this:
int ac;
char *av[11];
if(fgets(line, MAXLINE, ifp) == NULL)
{
fprintf(stderr, "error in data file\n");
return;
}
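ac = getwords(line, av, 11); /* the count, plus up to 10 values */
if(ac < 1)
{
fprintf(stderr, "error in data file\n");
return;
}
na = atoi(av[0]); /* the count is the first word on the line */
if(na > 10)
{
fprintf(stderr, "too many items in data file\n");
return;
}
if(na != ac - 1)
{
fprintf(stderr, "error in data file\n");
return;
}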
for(i = 0; i < na; i++)
a[i] = atoi(av[i+1]);
But sometimes, you don't need to save the count (na) explicitly; the reading program can deduce the number
of items from the number of items in the file. If the file contains only the integers in this array, then we can
simply read integers until we reach end-of-file. For example, using fscanf:
na = 0;
while(na < 10 && fscanf(ifp, "%d", &a[na]) == 1)
na++;
(This code is deceptively simple; we haven't carefully dealt with appropriate error messages for a data file
with more than 10 values, or a data file with a non-numeric ``value'' for which fscanf returns 0.)
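For comparison, here is a sketch of what a more careful version might look like. The function name read_ints, its return convention, and the wording of the messages are all assumptions of this example.

```c
#include <stdio.h>

/* Read up to max integers from fp into a[], stopping at end-of-file.
 * Returns the number read, or -1 after printing a message if the file
 * holds a non-numeric item or more than max values. */
int read_ints(FILE *fp, int a[], int max)
{
    int n = 0, r, extra;

    while (n < max && (r = fscanf(fp, "%d", &a[n])) == 1)
        n++;
    if (n == max)
        r = fscanf(fp, "%d", &extra);   /* is anything left over? */
    if (r == 1) {
        fprintf(stderr, "too many items in data file\n");
        return -1;
    }
    if (r == 0) {
        fprintf(stderr, "non-numeric item in data file\n");
        return -1;
    }
    return n;       /* r == EOF: nothing more to read */
}
```

The trick is to remember fscanf's last return value: 1 means there was still more data, 0 means something in the file wasn't a number, and EOF means the data ended cleanly (or, less happily, that a read error occurred, which this sketch does not distinguish).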
Again, we could also use fgets. If the data is on separate lines:
na = 0;
while(na < 10 && fgets(line, MAXLINE, ifp) != NULL)
a[na++] = atoi(line);
If the data is all on one line:
if(fgets(line, MAXLINE, ifp) == NULL)
{
fprintf(stderr, "error in data file\n");
return;
}
na = getwords(line, av, 10);
if(na > 10)
{
fprintf(stderr, "too many items in data file\n");
return;
}
for(i = 0; i < na; i++)
a[i] = atoi(av[i]);
Notice that this last implementation does not require that the file consist of only data for the array a. One line
of the file consists of data for the array a, but other lines of the file could contain other data.
We could also scatter a's data on multiple lines, without using an explicit count, and with the ability for the
file to contain other data as well, if we marked the end of the array data with an explicit marker in the file,
rather than assuming that the array's data continued until end-of-file. For example, we could write the data
To write an instance of this structure out, we could simply print its fields on one line:
struct s x;
...
fprintf(ofp, "%d %g %s\n", x.i, x.f, x.s);
or on several lines:
fprintf(ofp, "%d\n", x.i);
fprintf(ofp, "%g\n", x.f);
fprintf(ofp, "%s\n", x.s);
or simply
fprintf(ofp, "%d\n%g\n%s\n", x.i, x.f, x.s);
(We use %g format for the float field because %g tends to print the most accurate representation in the
smallest space, e.g. 1.23e6 instead of 1230000 and 1.23e-6 instead of 0.00000123 or 0.000001.)
To read this structure back in, we could again either use fscanf, or fgets and some other functions. As
before, fscanf seems easier:
if(fscanf(ifp, "%d %g %s", &x.i, &x.f, x.s) != 3)
{
fprintf(stderr, "error in data file\n");
return;
}
Here we have a problem, though: what if the third, string field contains a space? In the scanf family, the
%s format stops reading at whitespace, so if x.s had contained the string "Hello, world!", it would be
read back in as "Hello,". As it happens, we could fix it by using the less-obvious format string "%d %g
%[^\n]", where %[^\n] means ``match any string of characters not including \n''. But we also have
another problem: what if the string is longer than the 20-character array we allocated for the s field can hold? We could fix
this by using %19s or %19[^\n] (19 rather than 20, because scanf also stores a terminating \0), although we'd have to
remember to change the scanf format string if we ever changed the size of the array.
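One way to keep the width in the format string and the size of the array from drifting apart is to build both from a single preprocessor constant. This is only a sketch: the macro names SLEN, STR, and XSTR, the function parse_record, and the exact layout of struct s are assumptions of the example.

```c
#include <stdio.h>

#define SLEN 19                 /* most characters s can hold */
#define STR(x)  #x
#define XSTR(x) STR(x)          /* expands SLEN before stringizing it */

struct s {                      /* assumed layout of the structure */
    int i;
    float f;
    char s[SLEN + 1];           /* +1 for the terminating '\0' */
};

/* Parse "int float rest-of-line" from text into *x.
 * Returns 1 if all three fields were converted. */
int parse_record(const char *text, struct s *x)
{
    /* The adjacent string literals join to form "%d %g %19[^\n]". */
    return sscanf(text, "%d %g %" XSTR(SLEN) "[^\n]",
                  &x->i, &x->f, x->s) == 3;
}
```

Now changing SLEN changes the array size and the format string together. A string field longer than SLEN characters is silently truncated rather than overflowing the array.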
Let's leave fscanf for a moment and look at our other alternatives. If we'd printed the data all on one line,
we could use
#include <stdlib.h> /* for atof() */
char *av[3];
message along the lines of ``this is a new data file which I am too old to read'', rather than printing the
(misleading, in this case) ``error in data file'' (or crashing).
Another technique which can be immensely useful and which we'll explore next is to define a data file
format in such a way that the overall format doesn't change even if new data is added to it.
It's easy to see why the simple data file fragments we've been looking at so far are not resilient in the face of
newly-introduced data fields. In the case of struct s, the reader always assumed that the first field in the
data file was i, the second field was f, and the third field was s. If we ever add any new fields, unless we're
careful to add them at the end of the file (and lucky on top of that), the simpleminded reader will get
confused.
One powerful way of getting around this problem is to tag each piece of data in the file, so that the reader
knows unambiguously what it is. For example, suppose that we wrote instances of our struct s out like
this:
fprintf(ofp, "i %d\n", x.i);
fprintf(ofp, "f %g\n", x.f);
fprintf(ofp, "s %s\n", x.s);
Now, each line begins with a little code which identifies it. (The code in the data file happens to match the
name of the corresponding structure member, but that's not necessary, nor is there any way of getting the
compiler to make any correspondence automatically.)
If we simply modified one of our previous file-reading code fragments to read this new, tagged format, we
might quickly end up with a mess. We'd be continually checking the tag on the line we just read against the
tag we expected to read, and constantly printing error messages or trying to resynchronize. But in fact, there's
no reason to expect the lines to come in a certain order, and it turns out that it's easier to read such a file a
line at a time, without that assumption, taking each line as it comes and not worrying what order the lines
come in. Here is how we might do it:
x.i = 0; x.f = 0.0; x.s[0] = '\0';
while(fgets(line, MAXLINE, ifp) != NULL)
{
if(*line == '#')
continue;
ac = getwords(line, av, 2);
if(ac == 0)
continue;
if(strcmp(av[0], "i") == 0)
x.i = atoi(av[1]);
else if(strcmp(av[0], "f") == 0)
x.f = atof(av[1]);
Instead, we quietly ignore lines we don't recognize! This strategy is admittedly on the simpleminded side,
and it would not be adequate under all circumstances, but it means that an old program can read a new data
file containing fields it's never heard of. The old program will still be able to pluck out the data it does
recognize and can use, while (deliberately) ignoring the (new) data it doesn't know about.
This code is not perfect. We still have the same sorts of problems with that string field, s: it might contain
spaces, which we get around (this time) by calling getwords with a third argument of 2, so that all but
the first word on the line end up ``in'' av[1]. Also, the code does not check to see that there actually was a
second word on the line before using it to set x.i, x.f, or x.s. (In this case, we could fix that by
complaining if getwords did not return 2.)
Finally, we still have the potential for overflow, and we might as well grit our teeth now and figure out how
to fix it. Since we already initialized x.s to the empty string with the assignment x.s[0] = '\0', one
way around the problem is to replace the call to strcpy with a call to strncat:
...
else if(strcmp(av[0], "s") == 0)
strncat(x.s, av[1], 19);
(or, again, perhaps strncat(x.s, av[1], sizeof(x.s)-1)). The strcat and strncat
functions are slightly misleadingly named: what they actually do is append the second string you hand them
(i.e. the second argument) to the first, in place. In the case of strncat, it never copies more than n
characters, where n is its third argument, although it does always append a \0, which is why we tell it to
copy at most 19 characters, not 20. (Since x.s starts out empty, there's definitely room for 19, although we
would still have to worry about the possibility of a corrupted data file which contained two s lines. You
might wonder why we couldn't simply use strncpy, but it turns out that, for obscure historical reasons,
strncpy does not always append the \0.)
Although it has a few imperfections (which are easily remedied, and are left as exercises) this last example
(using fgets, getwords, and an if/strcmp/else... chain) is an excellent basis for a flexible, robust
data file reader.
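To make the pieces concrete, here is a self-contained sketch of the per-line step. It does not use getwords; instead it splits off the tag with the standard strspn and strcspn functions. The function name parse_tagged_line, its return codes, and the layout of struct s are assumptions of this example, not the one true way to do it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct s {                  /* assumed layout of the structure */
    int i;
    float f;
    char s[20];
};

/* Interpret one "tag value" line, updating the matching field of *x.
 * Returns 1 if a field was set, 0 if the line was a comment, blank,
 * or carried an unknown tag, and -1 if a known tag had no value. */
int parse_tagged_line(char *line, struct s *x)
{
    char *tag, *val;

    line[strcspn(line, "\n")] = '\0';       /* drop the newline */
    tag = line + strspn(line, " \t");       /* skip leading blanks */
    if (*tag == '#' || *tag == '\0')
        return 0;
    val = tag + strcspn(tag, " \t");        /* end of the tag word */
    if (*val != '\0')
        *val++ = '\0';
    val += strspn(val, " \t");              /* start of the value */

    if (strcmp(tag, "i") != 0 && strcmp(tag, "f") != 0 &&
        strcmp(tag, "s") != 0)
        return 0;                           /* unknown tag: ignore */
    if (*val == '\0')
        return -1;                          /* known tag, no value */

    if (strcmp(tag, "i") == 0)
        x->i = atoi(val);
    else if (strcmp(tag, "f") == 0)
        x->f = (float)atof(val);
    else {
        x->s[0] = '\0';                     /* a repeated s line replaces */
        strncat(x->s, val, sizeof(x->s) - 1);
    }
    return 1;
}
```

A reader would call this once for each line returned by fgets, complaining only when it returns -1. Resetting x->s before the strncat is one possible answer to the ``two s lines'' worry: the later line wins.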
One footnote about the troublesome string field, s: to get around the problem of fixed-size arrays, you might
one day decide to declare the s field of struct s as a pointer rather than a fixed-size array. You would
have to be careful while reading, however. It might seem that you could just write, for example,
x.s = av[1];
but this would not work; remember that whenever you use pointers you have to worry about memory
allocation. If you assigned x.s in that way, where would be the memory that it points to? It would be
wherever av[1] points, which is back into the line array. Not only is that (probably) a local array, valid
only while the file-reading functions are active, but it's also overwritten with each new line in the data file.
You'll obviously want x.s to retain a useful pointer value pointing to the text read from the file, which
means that you'll still have to make a copy, after allocating some memory. In this case, you might do
x.s = malloc(strlen(av[1]) + 1);
if(x.s == NULL)
{ fprintf(stderr, "out of memory\n"); return; }
strcpy(x.s, av[1]);
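Since this allocate-and-copy step comes up all the time, it's convenient to wrap it in a little function. A sketch (the name copy_string is made up; many systems supply a similar, nonstandard function called strdup):

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of str, or NULL if malloc fails.
 * The caller owns the copy and should free() it when done. */
char *copy_string(const char *str)
{
    char *copy = malloc(strlen(str) + 1);   /* +1 for the '\0' */

    if (copy != NULL)
        strcpy(copy, str);
    return copy;
}
```

The copy lives in its own memory, so it is unaffected when the line array is overwritten by the next call to fgets.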
To some extent, the problems we've been having with field s are fundamental. In particular, any time you
use text formats which are based on whitespace-separated ``words,'' string fields which might contain spaces
are always tricky to handle.
these files have is that if the data is for any reason sensitive, it will certainly be a bit better concealed
from prying eyes.
We can get around these disadvantages of binary data files, but in so doing we'll lose many of the
advantages, such as blinding efficiency or programmer convenience. If we care about data file portability
or backwards or forwards compatibility, we will have to write structures one field at a time, not in one
fell swoop. Furthermore, if we have an int to write, we may choose not to write it using fwrite:
fwrite(&i, sizeof(int), 1, fp);
but rather a byte at a time, using putc:
putc(i / 256, fp);
putc(i % 256, fp);
In this way, we'd have precise control over the order in which the two halves of the int are being
written. (We're assuming here that there's no more than two bytes' worth of data in the int, which is a
safe assumption if we're portably assuming that ints can only hold up to +-32767.) When it came time
to read the int back in, we might do that a byte at a time, too:
i = getc(fp);
i = 256 * i + getc(fp);
(We could not collapse this to i = 256 * getc(fp) + getc(fp), because we wouldn't know
which order the two calls to getc would occur in.)
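Here is the same byte-at-a-time idea as a pair of functions, with error checking added. This is a sketch under a few assumptions: the names put16 and get16 are invented, the value is treated as unsigned (negative numbers would need some agreed-upon encoding), and the high-order byte is written first.

```c
#include <stdio.h>

/* Write the low 16 bits of value, high-order byte first.
 * Returns 0 on success, EOF on a write error. */
int put16(unsigned int value, FILE *fp)
{
    if (putc((value >> 8) & 0xff, fp) == EOF)
        return EOF;
    if (putc(value & 0xff, fp) == EOF)
        return EOF;
    return 0;
}

/* Read two bytes written by put16. Returns the value (0..65535),
 * or -1L if the file ends early. */
long get16(FILE *fp)
{
    int hi, lo;

    if ((hi = getc(fp)) == EOF)     /* separate statements fix the order */
        return -1L;
    if ((lo = getc(fp)) == EOF)
        return -1L;
    return 256L * hi + lo;
}
```

Because the byte order is spelled out in the code, a file written this way reads back the same on any machine, whatever order that machine uses for the bytes of an int in memory.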
We might also choose to use tags to mark the various ``fields'' within our binary data file; the fields
would be more likely to be byte codes such as 0x00, 0x01, and 0x02 than the character or string codes
we used in the tagged text data file of the previous section.
If you do choose to use binary data files, you must open them for writing with fopen mode "wb" and
for reading with "rb" (or perhaps one of the + modes; the point is that you do need the b). Remember
that, in the default mode, the standard I/O functions all assume text files, and translate between \n and
the operating system's end-of-line representation. If you try to read or write a binary data file in text
mode, whenever your internal data happens to contain a byte which matches the code for \n, or your
external data happens to contain bytes which match the operating system's end-of-line representation,
they may be translated out from under you, screwing up your data.
18.1: Types
So far, we've seen the basic types char, int, long int, float, and double. This section
introduces the last few basic types: void, short int, long double, and the unsigned types.
Also, we'll meet storage classes, typedef, and the type qualifiers const and volatile.
18.1.1: void
18.1.2: short int
18.1.3: unsigned integers
18.1.4: long double
18.1.5: Storage Classes
18.1.6: Type Definitions (typedef)
18.1.7: Type Qualifiers
18.1.1: void
From time to time we have seen the keyword void lurking in various programs and code samples.
void is sort of a ``placeholder'' type, used in circumstances where you need a type name but there's not
really any one right type to use. Formally, we can say that void is a type with no values.
There are three main uses of the void type:
1. As the return type of a function which does not return a value. A function declared as
void f()
is declared as ``void'' or ``returning void'', which actually means that it returns no value. The
compiler will not complain if you ``fall off the end'' of a void function without executing a
return statement; the compiler will complain if you execute a return statement that specifies a
value to be returned. (As far as the low-level syntax of the return statement is concerned, the
expression is optional; but the expression is required in functions that return values and
disallowed in void functions.)
2. As the argument list in the prototype of a function that accepts no parameters. In a function
definition such as
int f()
{
...
}
the empty parentheses indicate that the function accepts no parameters. But (for historical
reasons) in an external function prototype declaration such as
extern int f();
the empty parentheses indicate that we don't know how many (or what type of) parameters the
function accepts. In either case, we can make the fact that the function accepts zero parameters
explicit by using the keyword void in the parameter list:
extern int f(void);
int f(void)
{
...
}
For obvious reasons, if void appears in a parameter list, it must be the first and only parameter,
and it must not declare an argument name. (That is, prototypes like int f(int, void) and
int f(void x) are meaningless and illegal.)
3. As a pointer type, to indicate a ``generic pointer'' which might in fact point to any type. We need
``generic pointers'' when we're using functions like malloc. malloc returns a pointer to n bytes
of memory, which we may use as any type of pointer we wish. Normally, it is an error to use a
value of one pointer type where another pointer type is required. For example, the fragments
int i;
double *dp = &i; /* WRONG */
and
double d;
int *ip = &d; /* WRONG */
would both generate errors, because you can't assign back and forth between int pointers and
double pointers. However, the type void * (``pointer to void'') is special: it is legal to assign
a value of type pointer-to-void to a variable of some other pointer type, and vice versa. (In case
the pointer types have different sizes or representations, the compiler will automatically perform
conversions. We'll say more about type conversions, including pointer type conversions, in a later
section.) So the lines
#include <stdlib.h>
char *cp = malloc(10);
int *ip = malloc(sizeof(int));
double *dp = malloc(sizeof(double));
are all legal, since <stdlib.h> declares malloc as returning void *, indicating that an
assignment of malloc's return value to any pointer type is permissible.
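Generic pointers are just as useful for parameters as for return values. As a sketch (the function swap_objects is an invention for this example), here is a routine that can exchange two objects of any type, as long as it is told their size:

```c
#include <stddef.h>

/* Exchange the contents of two objects of the same size.
 * Because the parameters are void *, pointers to any object type
 * can be passed without a cast. */
void swap_objects(void *a, void *b, size_t size)
{
    unsigned char *pa = a;      /* void * converts implicitly */
    unsigned char *pb = b;
    size_t k;

    for (k = 0; k < size; k++) {
        unsigned char tmp = pa[k];
        pa[k] = pb[k];
        pb[k] = tmp;
    }
}
```

The same function serves for swap_objects(&i1, &i2, sizeof(int)) and swap_objects(&d1, &d2, sizeof(double)). Notice that the function can't dereference a void * directly; it first assigns the pointers to unsigned char * variables so that it can work a byte at a time.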
unsigned char        0 - 255
unsigned short int   0 - 65535
unsigned int         0 - 65535
unsigned long int    0 - 4294967295
These multiword type names can also be abbreviated. Instead of writing long int, you can write
long. Instead of writing short int, you can write short. Instead of writing unsigned int, you
can write unsigned. Instead of writing unsigned long int, you can write unsigned long.
Instead of writing unsigned short int, you can write unsigned short.
In the absence of the unsigned keyword, types short int, int, and long int are all signed.
However, depending on the particular compiler you're using, plain type char might be signed or
unsigned. If you need an explicitly signed character type, you can use signed char.
Using typedef, however, we can introduce a single-word complex type, after all:
typedef struct complex complextype;
complextype c1, c2;
It's also possible to define a structure and a typedef tag for it at the same time:
typedef struct complex
{
double real;
double imag;
} complextype;
Furthermore, when using typedef names, you may not need the structure tag at all; you can also write
typedef struct
{
double real;
double imag;
} complextype;
(At this point, of course, you could use the cleaner name ``complex'' for the typedef, instead of
``complextype''. Actually, it turns out that you could have done this all along. Structure tags and
typedef names occupy separate namespaces, so the declaration
typedef struct complex
{
double real;
double imag;
} complex;
is legal, though possibly confusing.)
Defining new type names is done mostly for convenience, or to make the code more self-documenting, or
to make it possible to change the actual base type used for a lot of variables without rewriting the
declarations of all those variables.
A typedef declaration is a little bit like a preprocessor #define directive. We could imagine writing
#define count int
#define string char *
in an attempt to accomplish the same thing. This won't work nearly as well, however: given the macro
definition, the line
string string1, string2;
would expand to
char * string1, string2;
which would declare string1 as a char * but string2 as a plain char. The typedef declaration,
however, would work correctly.
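You can confirm the difference with sizeof. In this sketch the names STRING_MACRO and string_type are made up so that the two approaches can sit side by side in one file:

```c
#include <stddef.h>

#define STRING_MACRO char *         /* textual substitution only */
typedef char *string_type;          /* a genuine type name */

STRING_MACRO m1, m2;                /* expands to: char * m1, m2; */
string_type t1, t2;                 /* both are char * */
```

Here sizeof(m1), sizeof(t1), and sizeof(t2) are all equal to sizeof(char *), but sizeof(m2) is 1, because m2 ended up as a plain char.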
Some programmers capitalize typedef names to make them stand out a little better, and others use the
convention of ending all typedef names with the characters ``_t''.
different locations), the locations pointed to (that is, *ci1 and *ci2) can not be modified. The
declaration
int * const cp;
on the other hand, declares a pointer which cannot be modified (it cannot be set to point anywhere else),
although the value it points to (*cp) can be modified.
Pointers to constants (such as ci1 and ci2 above) have a particularly important use: they can be used to
document (and enforce) pointer parameters which a function promises not to use to modify locations in
the caller.
Normally, C uses pass-by-value. A function receives copies of its arguments, which means that it cannot
modify any variables in the caller (since copies of those variables were passed). If a function receives a
pointer, however (including the pointer that results when the caller seems to ``pass'' an array), it can use
that pointer (more precisely, it can use its copy of the pointer) to modify locations in the caller.
Sometimes, this is just what is desired: when the caller ``passes'' an array which it wishes the function to
fill in, or when the function wants to return one or more values via pointers rather than as the
conventional return value, the function's modification of locations in the caller is deliberate and
understood by the caller. However, when a function receives a pointer argument for some other reason,
under circumstances in which the caller might not want the function to use the pointer to modify
anything in the caller, the caller might appreciate a guarantee that the pointer (within the function) won't
be used to modify anything. To make that guarantee, the function can declare the pointer as pointer-to-const.
For example, our old friend printf never scribbles on the string it's given as its format argument; it
merely uses it to decide what to print. Therefore, the prototype for printf is
int printf(const char *fmt, ...)
where the ... represents printf's optional arguments. If a caller writes something like
char mystring[] = "Hello, world!\n";
printf(mystring);
it knows, from printf's prototype, that printf won't be scribbling on mystring. Furthermore, with
that prototype for printf in scope, the actual author of the printf code couldn't accidentally write a
(buggy) version which inadvertently modified the format argument--since it's declared as const char
*, the compiler will complain if any attempt is made to write to the location(s) it points to.
const and volatile can also be used in combination. Theoretically, it's possible to have a single
variable which is both:
~ 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0
  -------------------------------
  1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 1
The << operator shifts its first operand left by a number of bits given by its second operand, filling in
new 0 bits at the right. Similarly, the >> operator shifts its first operand right. If the first operand is
unsigned, >> fills in 0 bits from the left, but if the first operand is signed, >> might fill in 1 bits if the
high-order bit was already 1. (Uncertainty like this is one reason why it's usually a good idea to use all
unsigned operands when working with the bitwise operators.) For example, 0x56 << 2 is 0x158:
0 1 0 1 0 1 1 0 << 2
-------------------
0 1 0 1 0 1 1 0 0 0
And 0x56 >> 1 is 0x2b:
0 1 0 1 0 1 1 0 >> 1
---------------
  0 1 0 1 0 1 1
For both of the shift operators, bits that scroll ``off the end'' are discarded; they don't wrap around.
(Therefore, 0x56 >> 3 is 0x0a.)
The bitwise operators will make more sense if we take a look at some of the ways they're typically used.
We can use & to test if a certain bit is 1 or not. For example, 0x56 & 0x40 is 0x40, but 0x32 &
0x40 is 0x00:
  0 1 0 1 0 1 1 0
& 0 1 0 0 0 0 0 0
  ---------------
  0 1 0 0 0 0 0 0

  0 0 1 1 0 0 1 0
& 0 1 0 0 0 0 0 0
  ---------------
  0 0 0 0 0 0 0 0
Since any nonzero result is considered ``true'' in C, we can use an expression involving & directly to test
some condition, for example:
if(x & 0x04)
do something ;
(If we didn't like testing against the bitwise result, we could equivalently say if((x & 0x04) != 0).
The extra parentheses are important, as we'll explain below.)
Notice that the value 0x40 has exactly one 1 bit in its binary representation, which makes it useful for
testing for the presence of a certain bit. Such a value is often called a bit mask. Often, we'll define a series
of bit masks, all targeting different bits, and then treat a single integer value as a set of flags. A ``flag'' is
an on-off, yes-no condition, so we only need one bit to record it, not the 16 or 32 bits (or more) of an
int. Storing a set of flags in a single int does more than just save space, it also makes it convenient to
assign a set of flags all at once from one flag variable to another, using the conventional assignment
operator =. For example, if we made these definitions:
#define DIRTY 0x01
#define OPEN 0x02
#define VERBOSE 0x04
#define RED 0x08
#define SEASICK 0x10
we would have set up 5 different bits as keeping track of those 5 different conditions (``dirty,'' ``open,''
etc.). If we had a variable
unsigned int flags;
which contained a set of these flags, we could write tests like
if(flags & DIRTY)
{ /* code for dirty case */ }
if(!(flags & OPEN))
{ /* code for closed case */ }
if(flags & VERBOSE)
{ /* code for verbose case */ }
else
{ /* code for quiet case */ }
A condition like if(flags & DIRTY) can be read as ``if the DIRTY bit is on''.
These bitmasks would also be useful for setting the flags. To ``turn on the DIRTY bit,'' we'd say
flags = flags | DIRTY;
How would we ``turn off'' a bit? The way to do it is to leave on every bit but the one we're turning off, if
they were on already. We do this with the & and ~ operators:
flags = flags & ~DIRTY;
This may be easier to see if we look at it in binary. If the DIRTY, RED, and SEASICK bits were already
on, flags would be 0x19, and we'd have
  0 0 0 1 1 0 0 1
& 1 1 1 1 1 1 1 0
  ---------------
  0 0 0 1 1 0 0 0
As you can see, both the | operator when turning bits on and the & (plus ~) operator when turning bits
off have no effect if the targeted bit were already on or off, respectively.
The definition of the exclusive-OR operator means that you can use it to toggle a bit, that is, to turn it to
1 if it was 0 and to 0 if it was one:
flags = flags ^ VERBOSE;
It's common to use the ``op='' shorthand forms when doing all of these operations:
flags |= DIRTY;
flags &= ~OPEN;
flags ^= VERBOSE;
We can also use the bitwise operators to extract subsets of bits from the middle of an integer. For
example, to extract the second-to-last hexadecimal ``digit,'' we could use
(i & 0xf0) >> 4
If i was 0x56, we have:
  i        0 1 0 1 0 1 1 0
& 0xf0   & 1 1 1 1 0 0 0 0
           ---------------
           0 1 0 1 0 0 0 0
and shifting this result right by 4 bits gives us 0 1 0 1, or 5, as we wished. Replacing (or overwriting)
a subset of bits is a bit more complicated; we must first use & and ~ to clear all of the destination bits,
then use << and | to ``OR in'' the new bits. For example, to replace that second-to-last hexadecimal digit
with some new bits, we might use:
(i & ~0xf0) | (newbits << 4)
With i still 0x56 and newbits equal to 6, this works out as:
  i                  0 1 0 1 0 1 1 0
& ~0xf0            & 0 0 0 0 1 1 1 1
                     ---------------
                     0 0 0 0 0 1 1 0
| (newbits << 4)   | 0 1 1 0 0 0 0 0
                     ---------------
                     0 1 1 0 0 1 1 0
giving 0x66.
/* WRONG */
/* WRONG */
and
anyway (that is, if you write x * 4, the compiler might generate a left shift instruction all by itself),
they're not always as readable, and they're not always correct for negative numbers.
The issue of negative numbers, by the way, explains why the right-shift operator >> is not precisely
defined when the high-order bit of the value being shifted is 1. For signed values, if the high-order bit is a
1, the number is negative. (This is true for 1's complement, 2's complement, and sign-magnitude
representations.) If you were using a right shift to implement division, you'd want a negative number to
stay negative, so on some computers, under some compilers, when you shift a signed value right and the
high-order bit is 1, new 1 bits are shifted in at the left instead of 0s. However, you can't depend on this,
because not all computers and compilers implement right shift this way. In any case, shifting negative
numbers to the right (even if the high-order 1 bit propagates) gives you an incorrect answer if there's a
remainder involved: in 2's complement, 16-bit arithmetic, -15 is 0xfff1, so -15 >> 1 might give you
0xfff8, which is -8. But integer division is supposed to discard the remainder, so -15 / 2
would have given you -7. (If you're having trouble seeing the way the shift worked, 0xfff1 is
1111111111110001 in binary and 0xfff8 is 1111111111111000 in binary. The low-order 1 bit got
shifted off to the right, but because the high-order bit was 1, a 1 got shifted in at the left.)
or
f = (float)i / (float)j;
It's sufficient to use a cast on one of the operands, but it certainly doesn't hurt to cast both.
A similar situation is
int i = 32000, j = 32000;
long int li;
li = i + j;
An int is only guaranteed to hold values up to 32,767. Here, the result i + j is 64,000, which is not
guaranteed to fit into an int. Even though the eventual destination is a long int, the compiler does not
look ahead to see this. The addition is performed using int arithmetic, and it may overflow. Again, the
solution is to use a cast to explicitly convert one of the operands to a long int:
li = (long int)i + j;
Now, since one of the operands is a long int, the addition is performed using long int arithmetic,
and does not overflow.
Cast operators do not have to involve simple types; they can also involve pointer or structure or more
complicated types. Once upon a time, before the void * type had been invented, malloc returned a
char *, which had to be converted to the type you were using. For example, one used to write things
like
int *iarray = (int *)malloc(100 * sizeof(int));
and
struct list *lp = (struct list *)malloc(sizeof(struct list));
These casts are not necessary under an ANSI C compiler (because malloc returns void * which the
compiler converts automatically), but you may still see them in older code.
also contains a comma operator, and again, performs first i++ and then j--.
Precisely stated, the meaning of the comma operator in the general expression
e1 , e2
is ``evaluate the subexpression e1, then evaluate e2; the value of the expression is the value of e2.''
Therefore, e1 had better involve an assignment or an increment ++ or decrement -- or function call or
some other kind of side effect, because otherwise it would calculate a value which would be discarded.
There's hardly any reason to use a comma operator anywhere other than in the first and third controlling
expressions of a for loop, and in fact most of the commas you see in C programs are not comma
operators. In particular, the commas between the arguments in a function call are not comma operators;
they are just punctuation which separate several argument expressions. It's pretty easy to see that they
cannot be comma operators, otherwise in a call like
printf("Hello, %s!\n", "world");
the action would be ``evaluate the string "Hello, %s!\n", discard it, and pass only the string
"world" to printf.'' This is of course not what we want; we expect both strings to be passed to
printf as two separate arguments (which is, of course, what happens).
func(a, b + 1, c + d, f, (g + h + i) / 2);
func(a, b + 1, c + d, 0, (g + h + i) / 2);
We could write this more compactly, more readably, and more safely (it's easier both to see and to
guarantee that the other arguments are always the same) by writing
func(a, b + 1, c + d, e ? f : 0, (g + h + i) / 2);
(The obscure name ``ternary,'' by the way, comes from the fact that the conditional operator is neither
unary nor binary; it takes three operands.)
18.3.1: switch
[This section corresponds to K&R Sec. 3.4]
A frequent sort of pattern is exemplified by the sequence
if(x == e1)
/* some code */
else if(x == e2)
/* other code */
else if(x == e3)
/* some more code */
else if(x == e4)
/* yet more code */
else
/* default code */
Depending on the value of x, we have one of several chunks of code to execute, which we select with a long
if/else/if/else... chain. When the value we're selecting on is an integer, and when the values we're selecting
among are all constant, we can use a switch statement, instead. The switch statement evaluates an expression
and matches the result against a series of ``case labels''. The code beginning with the matching case label (if
any) is executed. A switch statement can also have a default case which is executed if none of the explicit
cases match.
A switch statement looks like this:
switch( expr )
{
case c1 :
... code ...
break;
case c2 :
... code ...
break;
case c3 :
... code ...
break;
...
default:
... code ...
break;
}
The expression expr is evaluated. If one of the case labels (c1, c2, c3, etc., which must all be integral constants)
matches, execution jumps to there, and continues until the next break statement. Otherwise, if there is a
default label, execution jumps to there (and continues to the next break statement). Otherwise, none of the
code in the switch statement is executed. (Yes, the break statement is also used to break out of loops. It breaks
out of the nearest enclosing loop or switch statement it finds itself in.)
The switch statement only works on integral arguments and expressions (char, the various sizes of int, and
enums, though we haven't met enums yet). There is no direct way to switch on strings, or on floating-point values.
The target case labels must be specified explicitly; there is no general way to specify a case which corresponds to
a range of values.
One peculiarity of the switch statement is that the break at the end of one case's block of code is optional. If
you leave it out, control will ``fall through'' from one case to the next. Occasionally, this is what you want, but
usually not, so remember to put a break statement after most cases. (Since falling through is so rare, many
programmers highlight it, when they do mean to use it, with a comment like /* FALL THROUGH */, to indicate
that it's not a mistake.) One way to make use of ``fallthrough'' is when you have a small set or range of cases which
should all map to the same code. Since the case labels are just labels, and since there doesn't have to be a
statement immediately following a case label, you can associate several case labels with one block of code:
switch(x)
{
case 1:
... code ...
break;
case 2:
... code ...
break;
case 3:
case 4:
case 5:
... code ...
break;
default:
... code ...
break;
}
Here, the same chunk of code is executed when x is 3, 4, or 5.
The case labels do not have to be in any particular order; the compiler is smart enough to find the matching one if
it's there. The default case doesn't have to go at the end, either.
It's common to switch on characters:
switch(c)
{
case '+':
/* code for + */
break;
case '-':
/* code for - */
break;
case '\n':
/* code for newline */
/* FALL THROUGH */
case ' ':
case '\t':
/* code for other whitespace */
break;
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
/* code for digits */
break;
default:
/* code for all other characters */
break;
}
It's also common to have a set of #defined values, and to switch on those:
#define APPLE 1
#define ORANGE 2
#define CHERRY 3
#define BROCCOLI 4
...
switch(fruit)
{
case APPLE:
printf("turnover"); break;
case ORANGE:
printf("marmalade"); break;
case CHERRY:
printf("pie"); break;
case BROCCOLI:
printf("wait a minute... that's not a fruit"); break;
}
18.3.2: do/while
[This section corresponds to K&R Sec. 3.6]
Briefly stated, a do/while loop is like a while loop, except that the body of the loop is always
executed at least once, even if the condition is initially false. We'll motivate the usefulness of this loop
with a slightly long example.
We know that the digit character '1' is not the same as the int value 1, and that the string "123" is
not the same as the int value 123. We've learned that the atoi function will convert a string
(containing digits) to the corresponding integer, and that we can use the sprintf function to generate a
string of digits corresponding to an integer. Now let's see how we could convert an integer to a string of
digits by hand, if for some reason we couldn't use sprintf but had to do it ourselves.
If the number were less than 10 and not negative, it would be easy. Since we know that the digit
characters '0' to '9' have consecutive character set values, the expression i + '0' gives the
character corresponding to i's value if i is an integer between 0 and 9, inclusive. So our very first stab at
an integer-to-string routine, which would only work for one-digit numbers, might look like this:
char string[2];
string[0] = i + '0';
string[1] = '\0';
(Remember, the null character \0 is required to terminate strings in C.)
The limitation to single-digit numbers is obviously not acceptable. Suppose we went a little further, and
arranged to handle numbers less than 100, by using an if statement to choose between the 1-digit case
and the 2-digit case:
char string[3];
if(i < 10)
{
string[0] = i + '0';
string[1] = '\0';
}
else
{
string[0] = (i / 10) + '0';
string[1] = (i % 10) + '0';
string[2] = '\0';
}
In the two-digit case, the subexpression i % 10 gives us the value of the low-order (1's) digit of the
while( expression );
The statement is almost always a brace-enclosed block of statements, because a do/while loop without
braces looks odd even if there's only one statement in the body. Notice that there is a semicolon after the
close parenthesis after the controlling expression.
Using a do/while loop, we can write our final version of the integer-to-string converter:
char string[25];
int j = 24;
string[j] = '\0';
do
{
string[--j] = (i % 10) + '0';
i /= 10;
} while(i > 0);
This version is now almost perfect; its only deficiency is that it has no provision for negative numbers.
C's do/while loop is analogous to the repeat/until loop in Pascal. (C's while loop, on the other
hand, is like Pascal's while/do loop.)
18.3.3: goto
[This section corresponds to K&R Sec. 3.8]
Finally, C has a goto statement, and labels to go to. You will hear many people disparage goto
statements, and without taking a stand on whether they are inherently evil, I will say that most code can
be written, and written pretty cleanly, without them.
You can attach a label to any statement:
statement1;
label1: statement2;
statement3;
label2: statement4;
statement5;
A label is simply an identifier followed by a colon. The names for labels follow the same rules as for
variables and other identifiers (they consist of letters, digit characters, and underscores, generally
beginning with a letter, in any case not beginning with a digit). A label can in principle have the same
name as a variable; variables and labels are quite distinct, so the compiler can keep them separate. Label
names must obviously be unique within a function.
Anywhere you want, you can say
goto label ;
where label is one of the labels in the current function, and control flow will jump to that point. You can
only jump around within the same function; you can't go to a label in some other function. (If goto isn't
enough for you, if for some reason you must jump back to another function, there's a library function,
longjmp, which will do so under certain circumstances.) If you jump into a brace-enclosed block of
statements, and if that block has local, automatic variables which have initializers (that is, attached to
their declarations), the initializers will not take effect. (In other words, initializers for block-local
variables take effect only when the block is entered normally, from the top.)
For the vast majority of functions, the more ``structured'' if/else, switch, and loop statements,
perhaps augmented with break and continue statements, will accomplish the required control flow,
in a clean and obvious way. (Without practice, it may not always be immediately obvious how to
structure a piece of code as, say, a clean loop, but the more important point is that once it's structured as a
clean loop, it will be more obvious to a later reader what it's doing.) Remember, too, that function calls
and return statements also accomplish control flow (analogous to the GOSUB in BASIC) in a clean
and structured way. The complaint about goto is that when it's used without restraint, a tangled mess of
unwieldy ``spaghetti code'' often results. There are really only two times you ever need to use a goto
char *itoa(int n)
{
static char retbuf[25];
sprintf(retbuf, "%d", n);
return retbuf;
}
Now, the retbuf array does not disappear when itoa returns, so the pointer is still valid by the time
the caller uses it.
Returning a pointer to a static array is a practical and popular solution to the problem of ``returning''
an array, but it has one drawback. Each time you call the function, it re-uses the same array and returns
the same pointer. Therefore, when you call the function a second time, whatever information it
``returned'' to you last time will be overwritten. (More precisely, the information, that the function
returned a pointer to, will be overwritten.) For example, suppose we had occasion to save the pointer
returned by itoa for a little while, with the intention of using it later, after calling itoa again in the
meantime:
int i = 23;
char *p1, *p2;
p1 = itoa(i);
i = i + 10;
p2 = itoa(i);
printf("old i = %s, new i = %s\n", p1, p2);
But this won't work as we expect--the second call to itoa will overwrite the string (stored in itoa's
static retbuf array) which was stored by the first call. Instead of printing i's old and new value, the
last line will print the new value, twice. Both p1 and p2 will point to the same place, to the retbuf
array down inside itoa, because each call to itoa always returns the same pointer to that same array.
We can see the same problem in an even simpler example. Suppose we had never heard of the %d format
specifier in printf. We might try to call something like this:
printf("i = %s, j = %s\n", itoa(i), itoa(j));
where i and j are two different int variables. What will happen? Either the compiler will make the first
call to itoa first, or the second. (It turns out that it's not specified which order the compiler will use;
different compilers behave differently in this respect.) Whichever call to itoa happens second will be
the one that gets to keep its return value in retbuf. The printf call will either print i's value twice,
or j's value twice, but it won't be able to print two distinct values.
The moral is that although the static return array technique will work, the caller has to be a little bit
careful, and must never expect the return pointer from one call to the function to be usable after a later
call to the function. Sometimes this restriction is a real problem; other times it's perfectly acceptable.
(Some of the functions in the standard C library use this technique; one example is ctime, which
converts timestamp values to printable strings. When you see a cryptic sentence like ``The returned
pointer is to static data which is overwritten with each call'' in the documentation for a library function, it
means that the function is using this technique.) When this restriction would be too onerous on the caller,
we should use one of the other two techniques, described next.
If the function can't use a local or local static array to hold the return value, the next option is to have
the caller allocate an array, and use that. In this case, the function accepts at least one additional
argument (in addition to any data to be operated on): a pointer to the location to write the result back to.
Our familiar getline function has worked this way all along. If we rewrote itoa along these lines, it
might look like this:
char *itoa(int n, char buf[])
{
sprintf(buf, "%d", n);
return buf;
}
Now the caller must pass an int value to be converted and an array to hold the converted result:
int i = 23;
char buf[25];
char *str = itoa(i, buf);
There are two differences between this version of itoa and our old getline function. (Well, three,
really; of course the two functions do totally different things.) One difference is that getline accepted
another extra argument which was the size of the array in the caller, so that getline could promise not
to overflow that array. Our latest version of itoa does not accept such an argument, which is a
deficiency. If the caller ever passes an array which is too small to hold all the digits of the converted
integer, itoa (actually, sprintf) will sail off the end of the array and scribble on some other part of
memory. (Needless to say, this can be a disaster.)
Another difference is that the return value of this latest version of itoa isn't terribly useful. The pointer
which this version of itoa returns is always the same as the pointer you handed it. Even if this version
of itoa didn't return anything as its formal return value, you could still get your hands on the string it
created, since it would be sitting right there in your own array (the one that you passed to itoa). In the
case of getline, we had a second thing to return as the formal return value, namely the length of the
line we'd just read.
However, this second strategy is also popular and workable. Besides our own getline function, the
standard library functions fgets and fread both use this technique.
When the limit of a single static return array within the function would be unacceptable, and when it
would be a nuisance for the caller to have to declare or otherwise allocate return arrays, a third option is
for the function to dynamically allocate some memory for the returned array by calling malloc. Here is
our last version of itoa, demonstrating this technique:
char *itoa(int n)
{
char *retbuf = malloc(25);
if(retbuf == NULL)
return NULL;
sprintf(retbuf, "%d", n);
return retbuf;
}
Now the caller can go back to saying simple things like
char *p = itoa(i);
and it no longer has to worry about the possibility that a later call to itoa will overwrite the results of
the first. However, the caller now has two new things to worry about:
1. This version of itoa returns a null pointer if malloc fails to return the memory that itoa
needs. The caller should really be checking for this null pointer return each time it calls itoa,
before using the pointer.
2. If the caller calls itoa 10,000 times, we'll have allocated 25 * 10,000 = 250,000 bytes of
memory, or a quarter of a meg. Unless someone is careful to call free to deallocate all of that
memory, it will be wasted. Few programs can afford to waste that much memory. (Once upon a
time, few programs could get that much memory, period.) The ``someone'' who is going to have
to call free isn't itoa; it has no idea when the caller is done with the memory returned by a
previous call to itoa, and in fact itoa might never get called again. So it will be the caller's
responsibility to keep track of each pointer returned by itoa, and to free it when it's no longer
needed, or else memory will gradually leak away.
We can work around the first problem--if we expect that there will usually be enough memory, such that
the call to malloc will rarely if ever fail, and if all the caller would do in an out-of-memory situation is
print an error message and abort, we can move the test down into the function:
char *retbuf = malloc(25);
if(retbuf == NULL)
{
write
SETBIT(flags, DIRTY);
and
CLEARBIT(flags, DIRTY);
and
if(TESTBIT(flags, DIRTY))
The definition of the SETBIT() macro might look like this:
#define SETBIT(x, b) x |= b
When a function-like macro is expanded, the preprocessor keeps track of the ``arguments'' it was ``called'' with.
When we write
SETBIT(flags, DIRTY);
we're invoking the SETBIT() macro with a first argument of flags and a second argument of DIRTY. Within
the definition of the macro, those arguments were known as x and b. So in the replacement text of the macro, x
|= b, everywhere that x appears it will be further replaced by (in this case) flags, and everywhere that b
appears it will be replaced by DIRTY. So the invocation
SETBIT(flags, DIRTY);
will result in the expansion
flags |= DIRTY;
Notice that the semicolon had nothing to do with macro expansion; it appeared following the close parenthesis of
the invocation, and so shows up following the final expansion.
Similarly, we can define the CLEARBIT() and TESTBIT() macros like this:
#define CLEARBIT(x, b) x &= ~b
#define TESTBIT(x, b) x & b
Convince yourself that the invocations
CLEARBIT(flags, DIRTY);
and
if(TESTBIT(flags, DIRTY))
will result in the expansions
flags &= ~DIRTY;
and
if(flags & DIRTY)
as desired.
Just as for a regular function, parameter names such as x and b in a function-like macro definition are arbitrary;
they're just used to indicate where in the replacement text the actual argument ``values'' should be plugged in. Also,
those parameter names are not looked for within character or string constants. If you had a macro like
#define XX(a, b) printf("%s is a %s\n", a, b)
then the invocation
XX("John", "pumpkin-head");
would result in
printf("%s is a %s\n", "John", "pumpkin-head");
It would not result in
printf("%s is "John" %s\n", "John", "pumpkin-head");
which (in this case, anyway) would not have been at all what you wanted.
If we remember that (other than being careful not to expand macro arguments inside of string and character
constants) the preprocessor is otherwise pretty dumb and literal-minded, we can see why there must not be a space
between the macro name and the open parenthesis in a function-like macro definition. If we wrote
#define SETBIT (x, b) x |= b
the preprocessor would think we were defining a simple macro, named SETBIT, with the (rather meaningless)
replacement text (x, b) x |= b , and every time it saw SETBIT, it would replace it with (x, b) x |= b .
(It would ignore any parentheses and arguments that the invocation of SETBIT happened to be followed with; that
is, after the incorrect definition, the invocation
SETBIT(flags, DIRTY);
would expand to
(x, b) x |= b(flags, DIRTY);
where the (flags, DIRTY) part passed through without modification, along with the trailing semicolon.)
There are a few potential pitfalls associated with preprocessor macros, and with function-like ones in particular. To
illustrate these, let's look at another example. C has no built-in exponentiation operator; if you want to square
something, the easiest way is usually to multiply it by itself. Suppose that you got tired of writing
x * x
and
a * a + b * b
and
(x + 1) * (x + 1)
Knowing about function-like preprocessor macros, you might be inspired to define a SQUARE() macro:
#define SQUARE(z) z * z
Now you can write things like SQUARE(x) and SQUARE(a) + SQUARE(b), and this seems like it will be
workable and convenient. But wait: what about that third example? If you write
y = SQUARE(x + 1);
the simpleminded preprocessor will expand it to
y = x + 1 * x + 1;
Remember, the preprocessor doesn't evaluate arguments the same way a function call would, it just performs
textual substitutions. So in this last example, the ``value'' of the macro parameter z is x + 1, and everywhere that
a z had appeared in the replacement text, the preprocessor fills in x + 1. But when the rest of the compiler sees
the result, it will give multiplication higher precedence, as usual, and it will interpret the result as if you had
written
y = x + (1 * x) + 1;
which will not usually give you the result you wanted!
How can we fix this problem? We could forbid ourselves to ever ``call'' the SQUARE() macro on an argument that
wasn't a single constant or variable name, but this seems like a harsh restriction. A better solution is to play with
the definition of the macro itself: since the expansion we want is
(x + 1) * (x + 1)
we can achieve that by defining the macro like this:
#define SQUARE(z) (z) * (z)
Now
y = SQUARE(x + 1);
expands to
y = (x + 1) * (x + 1);
as we wished.
There's another problem, though: what if we write
q = 1 / SQUARE(r);
Now we get
q = 1 / (r) * (r);
and the rest of the compiler interprets this as
q = (1 / (r)) * (r);
(Multiplication and division have the same precedence, and by default they go from left to right.) What can we do
this time? We could enclose the invocation of the SQUARE() macro in extra parentheses, like this:
q = 1 / (SQUARE(r));
but that seems like a real nuisance to remember. A better solution is to build those extra parentheses into the
definition of the macro, too:
#define SQUARE(z) ((z) * (z))
Now the code 1 / SQUARE(r) expands to 1 / ((r) * (r)) and we have a macro that's safe against all
of the troublesome invocations we've tried so far.
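The three stages of the SQUARE() macro can be compared side by side. This is only a sketch: the names SQUARE_V1 through SQUARE_V3 and the little wrapper functions are invented here so that all three definitions can coexist in one file.

```c
/* Three versions of the macro, renamed (hypothetically) so they can coexist. */
#define SQUARE_V1(z) z * z          /* no parentheses at all */
#define SQUARE_V2(z) (z) * (z)      /* parameters parenthesized */
#define SQUARE_V3(z) ((z) * (z))    /* whole expansion parenthesized, too */

/* Expands to x + 1 * x + 1, that is, x + (1 * x) + 1. */
int v1_square_of_sum(int x)
{
    return SQUARE_V1(x + 1);
}

/* Expands to (x + 1) * (x + 1), as intended. */
int v2_square_of_sum(int x)
{
    return SQUARE_V2(x + 1);
}

/* Expands to 1 / (r) * (r), that is, (1 / r) * r. */
double v2_reciprocal(double r)
{
    return 1 / SQUARE_V2(r);
}

/* Expands to 1 / ((r) * (r)), as intended. */
double v3_reciprocal(double r)
{
    return 1 / SQUARE_V3(r);
}
```

With x equal to 3, v1_square_of_sum yields 7 rather than 16; with r equal to 2.0, v2_reciprocal yields 1.0 rather than 0.25.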
right there at the top of funcs.h! Is that legal? Can the preprocessor handle seeing an #include
directive when it's already in the middle of processing another #include directive? The answer is that
yes, it can; header files (that is, #include directives) may be nested. (They may be nested up to a depth
of at least 8, although many compilers probably allow more.) Once funcs.h takes care of its own
needs, by including <stdio.h> itself, the eventual top-level file (that is, the one you compile, the one
that includes "funcs.h") won't get error messages about FILE being undefined, and won't have to
worry about whether it includes <stdio.h> or not.
Or will it? What if the top-level source file does include <stdio.h>? Now <stdio.h> will end up
being processed twice, once when the top-level source file asks for it, and once when funcs.h asks for
it. Will everything work correctly if <stdio.h> is included twice? Again, the answer is yes; the
Standard requires that the standard header files protect themselves against multiple inclusion.
It's good that the standard header files are protected in this way. But how do they protect themselves?
Suppose that we'd like to protect our own header files (such as funcs.h) in the same sort of way. How
would we do it?
Here's the usual trick. We rewrite funcs.h like this:
#ifndef FUNCS_H
#define FUNCS_H
#include <stdio.h>
extern int f1(int);
extern double f2(int, double);
extern int f3(int, FILE *);
#endif
All we've done is add the #ifndef and #define lines at the top, and the #endif line at the
bottom. (The macro name FUNCS_H doesn't really mean anything, it's just one we don't and won't use
anywhere else, so we use the convention of having its name mimic the name of the header file we're
protecting.) Now, here's what happens: the first time the compiler processes funcs.h, it comes across
the line
#ifndef FUNCS_H
and FUNCS_H is not defined, so it proceeds. The very next thing it does is #defines the macro
FUNCS_H (with a replacement text of nothing, but that's okay, because we're never going to expand
FUNCS_H, just test whether it's defined or not). Then it processes the rest of funcs.h, as usual. But, if
that same run of the compiler ever comes across funcs.h for a second time, when it comes to the first
#ifndef FUNCS_H line again, FUNCS_H will at that point be defined, so the preprocessor will skip
down to the #endif line, which will skip the whole header file. Nothing in the file will be processed a
second time.
(You might wonder what would tend to go wrong if a header file were processed multiple times. It's okay
to issue multiple external declarations for the same function or global variable, as long as they're all
consistent, so those wouldn't cause any problems. And the preprocessor also isn't supposed to complain if
you #define a macro which is already defined, as long as it has the same value, that is, the same
replacement text. But the compiler will complain if you try to define a structure type you've already
defined, or a typedef you've already defined (see section 18.1.6), etc. So the protection against multiple
inclusion is important in the general case.)
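To see the guard at work without juggling several files, we can paste the contents of a header into one source file twice, which is what the preprocessor effectively does on a double inclusion. The header name point.h, the macro POINT_H, and the function point_sum are all hypothetical, chosen just for this sketch.

```c
/* --- first copy of the (hypothetical) header point.h --- */
#ifndef POINT_H
#define POINT_H
struct point
{
    int x, y;
};
#endif

/* --- second copy, as if point.h had been #included again --- */
#ifndef POINT_H
#define POINT_H
struct point            /* skipped: POINT_H is already defined */
{
    int x, y;
};
#endif

/* Uses the one and only definition of struct point. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

If the #ifndef, #define, and #endif lines were deleted, the compiler would see struct point defined twice and would complain about the redefinition.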
When header files are protected against multiple inclusion by the #ifndef trick, then header files can
include other files to get the declarations and definitions they need, and no errors will arise because one
file forgot to (or didn't know that it had to) include one header before another, and no multiple-definition
errors will arise because of multiple inclusion. I recommend this technique.
In closing, though, I might mention that this technique is somewhat controversial. When header files
include other header files, it can be hard to track down the chain of who includes what and who defines
what, if for some reason you need to know. Therefore, some style guides disallow nested header files. (I
don't know how these style guides recommend that you address the issue of having to require that certain
files be included before others.)
if(p != NULL) can only be used to mean ``is p valid?'' if you have taken care to make sure that all non-valid
pointers have been set to null.
(There is one condition under which C does guarantee that a pointer variable will be initialized to a null pointer, and
that is when the pointer variable is a global variable or a member of a global structure, or more precisely, when it is
part of a variable, array, or structure which has static duration.)
Remember, too, that the shorthand form
if(p)
is precisely equivalent to if(p != NULL). So you may be able to read if(p) as ``if p is valid'', but again, only if
you've ensured that whenever p is not valid, it is set to null.
The degree of care with which you have to implement a pointer management strategy may be different for different
pointer variables you use. If a pointer variable is immediately set to a valid pointer value, and if nothing ever happens
which could make it become invalid, then there's no need to check it before each time you use it. Similarly, if a
pointer is set to point to different locations from time to time, but it can be shown that it will always be valid, there's
again no reason to test it all the time. However, if a particular pointer is valid some of the time and invalid other of the
time, or in particular, if it records some optional data which might or might not be present, then you'll want to be very
careful to set the pointer to NULL whenever it's not valid (or whenever the optional data is not present), and to test the
pointer before using it (that is, before fetching or writing to the location that it points to).
Everything we've just said about ``pointer variables'' is equally true, and perhaps more important, for pointer fields
within structures. When you define a structure, you will typically be allocating many instances of that structure, so
you will have many instances of that pointer. You will typically have central pieces of code which operate on
instances of that structure, meaning that each time the piece of code runs, it may be operating on a different instance
of the structure, so if the pointer field is one that isn't always valid (that is, isn't valid in all instances of the structure),
the code had better test it before using it. Similarly, the code had better set the pointer field to NULL if it ever
invalidates it.
For example, one of the first features we added to the adventure game was a long description for objects and rooms.
But the long description is optional; not all objects and rooms have one. Suppose we chose to use a char * within
struct object and struct room to point at a dynamically-allocated string containing the long description.
(This choice would be preferable to a fixed-size array of char because it may be the case that some long descriptions
will be elaborately long, and we'd neither want to limit the potential length of descriptions by having a too-small array
nor waste space for objects with short or empty descriptions by always using a too-large array.) For each instance of
an object or room structure, we'd initialize the description field to contain a null pointer. For each room or object with
a long description, we'd set the description field to contain a pointer to the appropriate (and appropriately-allocated)
string. Finally, when it came time to print the description, we'd use code like
if(objp->desc != NULL)
printf("%s\n", objp->desc);
else
printf("You see nothing special about the %s.\n", objp->name);
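Here is a self-contained sketch of the same idea. The cut-down struct object and the function describe are assumptions made for illustration (the game's real structure has more fields), and the text is written into a caller-supplied buffer rather than printed, so that the result is easy to check.

```c
#include <stdio.h>

/* Hypothetical, cut-down object structure; the real one has more fields. */
struct object
{
    const char *name;
    char *desc;         /* long description, or NULL if there is none */
};

/* Write the text to show for objp into buf, which holds bufsize chars. */
void describe(const struct object *objp, char *buf, size_t bufsize)
{
    if(objp->desc != NULL)
        snprintf(buf, bufsize, "%s", objp->desc);
    else
        snprintf(buf, bufsize, "You see nothing special about the %s.",
                 objp->name);
}
```

The test of objp->desc is only trustworthy because every object without a long description is assumed to have had its desc field set to NULL.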
Particular care is needed when pointers point to dynamically-allocated memory, managed with the standard library
functions malloc, free, and realloc. Somehow, it's easier to make mistakes here, and their consequences tend
/* Beware... */
NULL does not mean that it never fails, but just that if/when it does fail, it signifies this by calling exit instead of
returning NULL.) Aborting the entire program when a call to malloc fails may seem draconian, and there are
programs (e.g. text editors) for which it would be a completely unacceptable strategy, but it's fine for our purposes,
especially if it doesn't happen very often. (In any case, aborting the program cleanly with a message like ``Out of
memory'' is still vastly preferable to crashing horribly and mysteriously, which is what programs that don't check
malloc's return value eventually do.)
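A wrapper of the kind just described might look like the following minimal sketch. The name xmalloc and the exact wording of the message are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: either returns usable memory or does not return.
 * The caller is assumed to ask for a nonzero size, since malloc(0) is
 * allowed to return a null pointer even when memory is plentiful.
 */
void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if(p == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}
```

Callers of such a wrapper can use the returned pointer without testing it, because the failure case never comes back to them.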
Another area of concern is that when you're calling free and realloc, there are more ways for pointers to become
invalid. For example, consider the code
/* p is known to have come from malloc() */
free(p);
After calling free, is p valid or invalid? C uses pass-by-value, so p's value hasn't changed. (The free function
couldn't change it if it tried.) But p is most definitely now invalid; it no longer points to memory which the program
can use. However, it does still point just where it used to, so if the program accidentally uses it, there will still seem to
be data there, except that the data will be sitting in memory which may now have been allocated to ``someone else''!
Therefore, if the variable p persists (that is, if it's something other than a local variable that's about to disappear when
its function returns, or a pointer field within a structure which is all about to disappear), it would probably be a good
idea to set p to NULL:
free(p);
p = NULL;
(Of course, setting p to NULL only accomplishes something if later uses of p check it before using it.)
Finally, let's think about realloc. realloc, remember, attempts to enlarge a chunk of memory which we
originally obtained from malloc. (It lets us change our mind about how much memory we had asked for.) But
realloc is not always able to enlarge a chunk of memory in-place; sometimes it must go elsewhere in memory to
find a contiguous piece of memory big enough to satisfy the enlargement request. So what about this code?
newp = realloc(oldp, newsize);
Is oldp valid or invalid after this call? It depends on whether realloc returned the old pointer value or not (that is,
on whether it was able to enlarge the memory block in-place or had to go elsewhere). Most of the time, you will use
realloc something like this:
newp = realloc(p, newsize);
if(newp != NULL)
{
    /* success; got newsize */
    p = newp;
}
else
{
    /* failure; p still points to block of old size */
}
With a setup like this, p remains valid, and newp is a temporary variable which we don't use further after testing it
and perhaps assigning it to p.
A final issue concerns pointer aliases. If several pointers point into the same block of memory, and if that block of
memory moves or disappears, all the old pointers become invalid. If you have a sequence of code which amounts to
p2 = p;
...
free(p);
p = NULL;
then setting p to NULL may not have been sufficient, because p2 just became invalid, too, and may also need setting
to NULL. The situation is particularly tricky with realloc: suppose that you have a pointer to a chunk of memory:
char *p = malloc(10);
and another pointer which points within that chunk:
char *p2 = p + 5;
Now, if you reallocate p, and if realloc has to go elsewhere and so returns a different pointer value which you
assign to p, you've also got to fix up p2, because it just had the rug yanked out from under it, and is now invalid. To
keep p2 up-to-date, you might use code like this:
int p2offset = p2 - p;
newp = realloc(p, newsize);
if(newp != NULL)
{
    /* success; got newsize */
    p = newp;
    p2 = p + p2offset;
}
else
{
    /* failure; p and p2 still point to block of old size */
}
Before calling realloc, we record (in the int variable p2offset) how far beyond p the secondary pointer p2
used to point, so that we can generate a corresponding new value of p2 if p moves.
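The same fix-up can be packaged so that the base pointer and its alias always travel together. This is a sketch under stated assumptions: struct buffer and buffer_resize are hypothetical names, the offset is kept in a ptrdiff_t (the type the Standard gives to a pointer difference), and the new size is assumed to be nonzero and larger than the cursor's offset.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer with a cursor pointing somewhere inside it. */
struct buffer
{
    char *base;     /* start of the malloc'ed block */
    char *cursor;   /* alias: points within the block */
    size_t size;    /* current size of the block */
};

/* Try to resize b; returns 1 on success, 0 on failure (b unchanged). */
int buffer_resize(struct buffer *b, size_t newsize)
{
    ptrdiff_t offset = b->cursor - b->base;  /* record before realloc */
    char *newp = realloc(b->base, newsize);
    if(newp == NULL)
        return 0;               /* base and cursor are both still valid */
    b->base = newp;
    b->cursor = newp + offset;  /* regenerate the alias */
    b->size = newsize;
    return 1;
}
```

Keeping the alias in the same structure as the base pointer makes it harder to forget that the two must be updated together.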
If we say
*ipp = ip2;
we've changed the pointer pointed to by ipp (that is, ip1) to contain a copy of ip2, so that it (ip1)
now points at j:
If we say
*ipp = &k;
we've changed the pointer pointed to by ipp (that is, ip1 again) to point to k:
What are pointers to pointers good for, in practice? One use is returning pointers from functions, via
pointer arguments rather than as the formal return value. To explain this, let's first step back and consider
the case of returning a simple type, such as int, from a function via a pointer argument. If we write the
function
void f(int *ip)
{
    *ip = 5;
}
and then call it like this:
int i;
f(&i);
then f will ``return'' the value 5 by writing it to the location specified by the pointer passed by the caller;
in this case, to the caller's variable i. A function might ``return'' values in this way if it had multiple
things to return, since a function can only have one formal return value (that is, it can only return one
value via the return statement.) The important thing to notice is that for the function to return a value
of type int, it used a parameter of type pointer-to-int.
Now, suppose that a function wants to return a pointer in this way. The corresponding parameter will
then have to be a pointer to a pointer. For example, here is a little function which tries to allocate
memory for a string of length len, and which returns zero (``false'') if it fails and 1 (nonzero, or ``true'') if it
succeeds, returning the actual pointer to the allocated memory via a pointer:
#include <stdlib.h>
int allocstr(int len, char **retptr)
{
    char *p = malloc(len + 1); /* +1 for \0 */
    if(p == NULL)
        return 0;
    *retptr = p;
    return 1;
}
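To show how a caller supplies the address of its own pointer, here is a small sketch that repeats allocstr and adds a hypothetical caller, copystr, which duplicates a string into freshly allocated memory.

```c
#include <stdlib.h>
#include <string.h>

int allocstr(int len, char **retptr)
{
    char *p = malloc(len + 1);  /* +1 for \0 */
    if(p == NULL)
        return 0;
    *retptr = p;
    return 1;
}

/* Hypothetical caller: returns a malloc'ed copy of s, or NULL on failure. */
char *copystr(const char *s)
{
    char *copy;
    if(!allocstr((int)strlen(s), &copy))    /* pass the address of copy */
        return NULL;
    strcpy(copy, s);
    return copy;
}
```

Because copy is a char *, the expression &copy is a char **, which is exactly what allocstr's second parameter calls for.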
when trying to insert and delete items in linked lists. For simplicity, we'll consider lists of integers, built
using this structure:
struct list
{
int item;
struct list *next;
};
Suppose we're trying to write some code to delete a given integer from a list. The straightforward
solution looks like this:
/* delete node containing i from list pointed to by list */
struct list *lp, *prevlp;
for(lp = list; lp != NULL; lp = lp->next)
{
    if(lp->item == i)
    {
        if(lp == list)
            list = lp->next;
        else
            prevlp->next = lp->next;
        break;
    }
    prevlp = lp;
}
This code works, but it has two blemishes. One is that it has to use an extra variable to keep track of the
node one behind the one it's looking at, and the other is that it has to use an extra test to special-case the
situation in which the node being deleted is at the head of the list. Both of these problems arise because
the deletion of a node from the list involves modifying the previous pointer to point to the next node (that
is, the node before the deleted node to point to the one following). But, depending on whether the node
being deleted is the first node in the list or not, the pointer that needs modifying is either the pointer that
points to the head of the list, or the next pointer in the previous node.
To illustrate this, suppose that we have the list (1, 2, 3) and we're trying to delete the element 1. After
we've found the element 1, lp points to its node, which just happens to be the same node that the main
list pointer points to, as illustrated in (a) below:
To remove element 1 from the list, then, we must adjust the main list pointer so that it points to 2's
node, the new head of the list (as shown in (b)). If we were trying to delete node 2, on the other hand (as
illustrated in (c) above), we'd have to adjust node 1's next pointer to point to 3. The prevlp pointer
keeps track of the previous node we were looking at, since (at other than the first node in the list) that's
the node whose next pointer will need adjusting. (Notice that if we were to delete node 3, we would
copy its next pointer over to 2, but since 3's next pointer is the null pointer, copying it to node 2
would make node 2 the end of the list, as desired.)
We can write another version of the list-deletion code, which is (in some ways, at least) much cleaner, by
using a pointer to a pointer to a struct list. This pointer will point at the pointer which points at
the node we're looking at; it will either point at the head pointer or at the next pointer of the node we
looked at last time. Since this pointer points at the pointer that points at the node we're looking at (got
that?), it points at the pointer which we need to modify if the node we're looking at is the node we're
deleting. Let's see how the code looks:
struct list **lpp;
for(lpp = &list; *lpp != NULL; lpp = &(*lpp)->next)
{
    if((*lpp)->item == i)
    {
        *lpp = (*lpp)->next;
        break;
    }
}
That single line
*lpp = (*lpp)->next;
updates the correct pointer, to splice the node it refers to out of the list, regardless of whether the pointer
being updated is the head pointer or one of the next pointers. (Of course, the payoff is not absolute,
because the use of a pointer to a pointer to a struct list leads to an algorithm which might not be
In both cases, lpp points at a struct list pointer which points at the node to be deleted. In both
cases, the pointer pointed to by lpp (that is, the pointer *lpp) is the pointer that needs to be updated. In
both cases, the new pointer (the pointer that *lpp is to be updated to) is the next pointer of the node
being deleted, which is always (*lpp)->next.
One other aspect of the code deserves mention. The expression
(*lpp)->next
describes the next pointer of the struct list which is pointed to by *lpp, that is, which is pointed
to by the pointer which is pointed to by lpp. The expression
lpp = &(*lpp)->next
sets lpp to point to the next field of the struct list pointed to by *lpp. In both cases, the
parentheses around *lpp are needed because the precedence of * is lower than ->.
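The deletion loop can be wrapped up as a function which takes the address of the caller's head pointer. This is a sketch, not the only way to do it; the name list_remove and the choice to hand the unlinked node back to the caller (so that the caller decides whether to free it) are assumptions.

```c
#include <stddef.h>

struct list
{
    int item;
    struct list *next;
};

/* Unlink the first node containing i; return it, or NULL if not found. */
struct list *list_remove(struct list **headp, int i)
{
    struct list **lpp;
    for(lpp = headp; *lpp != NULL; lpp = &(*lpp)->next)
    {
        if((*lpp)->item == i)
        {
            struct list *removed = *lpp;
            *lpp = removed->next;   /* head or a next field: same code */
            removed->next = NULL;
            return removed;
        }
    }
    return NULL;
}
```

A call looks like list_remove(&list, 2); the & is what lets the function update the head pointer when the first node is the one being removed.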
As a second, related example, here is a piece of code for inserting a new node into a list, in its proper
order. This code uses a pointer-to-pointer-to-struct list for the same reason, namely, so that it
doesn't have to worry about treating the beginning of the list specially.
/* insert node newlp into list */
struct list **lpp;
for(lpp = &list; *lpp != NULL; lpp = &(*lpp)->next)
{
    struct list *lp = *lpp;
    if(newlp->item < lp->item)
    {
        newlp->next = lp;
        *lpp = newlp;
        break;
    }
}
if(*lpp == NULL)
{
    /* reached the end (or the list was empty): append */
    newlp->next = NULL;
    *lpp = newlp;
}
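One way to package the insertion as a function, and to cover the case where the new node belongs at the very end of the list (or the list is empty), is to let the loop merely find the right spot and do the splicing afterwards. The function name list_insert is hypothetical.

```c
#include <stddef.h>

struct list
{
    int item;
    struct list *next;
};

/* Insert newlp into the sorted list whose head pointer is *headp. */
void list_insert(struct list **headp, struct list *newlp)
{
    struct list **lpp;
    for(lpp = headp; *lpp != NULL; lpp = &(*lpp)->next)
    {
        if(newlp->item < (*lpp)->item)
            break;      /* found the first node that must follow newlp */
    }
    /* *lpp is now either that node or the null pointer ending the list */
    newlp->next = *lpp;
    *lpp = newlp;
}
```

Since the splice happens after the loop, insertion at the head, in the middle, and at the tail all go through the same two assignments.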
/* XXX WRONG */
nor
a2[i, j] = 0; /* XXX WRONG */
nor
a2[j][i] = 0; /* XXX WRONG */
What about multidimensional arrays? What kind of pointer is passed down to the function?
The answer is, a pointer to the array's first element. And, since the first element of a multidimensional
array is another array, what gets passed to the function is a pointer to an array. If you want to declare the
function func in a way that explicitly shows the type which it receives, the declaration would be
func(int (*a)[7])
{
...
}
The declaration int (*a)[7] says that a is a pointer to an array of 7 ints. Since declarations like
this are hard to write and hard to understand, and since pointers to arrays are generally confusing, I
recommend that when you write functions which accept multidimensional arrays, you declare the
parameters using array notation, not pointer notation.
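The two notations declare exactly the same parameter type, as this sketch suggests. The function names sum7 and sum7_ptr are invented for the example, and the second dimension, 7, has to be a compile-time constant in both.

```c
/* Array notation: a is really a pointer to an array of 7 ints. */
int sum7(int a[][7], int nrows)
{
    int i, j, sum = 0;
    for(i = 0; i < nrows; i++)
        for(j = 0; j < 7; j++)
            sum += a[i][j];
    return sum;
}

/* Pointer notation: the same parameter type, spelled out explicitly. */
int sum7_ptr(int (*a)[7], int nrows)
{
    int i, j, sum = 0;
    for(i = 0; i < nrows; i++)
        for(j = 0; j < 7; j++)
            sum += a[i][j];
    return sum;
}
```

Either function can be called as sum7(a2, 5) on an array declared int a2[5][7]; the number of rows is not part of the parameter's type, so it is passed separately.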
What if you don't know what the dimensions of the array will be? What if you want to be able to call a
function with arrays of different sizes and shapes? Can you say something like
func(int x, int y, int a[x][y])
{
...
}
where the array dimensions are specified by other parameters to the function? Unfortunately, in C as
originally standardized (C89/C90), you cannot. (You can do so in FORTRAN, you can do so in the extended
language implemented by gcc, and you can do so in C99, which added variable-length arrays for this
purpose, although C11 later made them an optional feature. You cannot do so in code that must remain
portable to C89 compilers.)
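For readers whose compiler supports C99 variable-length arrays, a function of this shape can be sketched as follows. The function name fill is hypothetical; the important detail is that the dimension parameters are declared before the array parameter that uses them. This will not compile under a strict C89 compiler, nor under a compiler that omits the feature.

```c
/* Requires C99 variable-length array support; dimensions come first. */
void fill(int nrows, int ncols, int a[nrows][ncols], int value)
{
    int i, j;
    for(i = 0; i < nrows; i++)
        for(j = 0; j < ncols; j++)
            a[i][j] = value;
}
```

The same function can then be called on int a[2][3] as fill(2, 3, a, 9) and on int b[4][5] as fill(4, 5, b, 1).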
Finally, we might explicitly note that if we pass a multidimensional array to a function:
int a2[5][7];
func(a2);
we can not declare that function as accepting a pointer-to-pointer:
func(int **a) /* WRONG */
{
    ...
}
As we said above, the function ends up receiving a pointer to an array, not a pointer to a pointer.
Once we've done this, we can (just as for the one-dimensional case) use array-like syntax to access our
simulated multidimensional array. If we write
array[i][j]
we're asking for the i'th pointer pointed to by array, and then for the j'th int pointed to by that inner
pointer. (This is a pretty nice result: although some completely different machinery, involving two levels
of pointer dereferencing, is going on behind the scenes, the simulated, dynamically-allocated two-dimensional ``array'' can still be accessed just as if it were an array of arrays, i.e. with the same pair of
bracketed subscripts.)
If a program uses simulated, dynamically allocated multidimensional arrays, it becomes possible to write
``heterogeneous'' functions which don't have to know (at compile time) how big the ``arrays'' are. In
other words, one function can operate on ``arrays'' of various sizes and shapes. The function will look
something like
func2(int **array, int nrows, int ncolumns)
{
}
This function does accept a pointer-to-pointer-to-int, on the assumption that we'll only be calling it with
simulated, dynamically allocated multidimensional arrays. (We must not call this function on arrays like
the ``true'' multidimensional array a2 of the previous sections). The function also accepts the dimensions
of the arrays as parameters, so that it will know how many ``rows'' and ``columns'' there are, so that it can
iterate over them correctly. Here is a function which zeros out a pointer-to-pointer, two-dimensional
``array'':
void zeroit(int **array, int nrows, int ncolumns)
{
    int i, j;
    for(i = 0; i < nrows; i++)
    {
        for(j = 0; j < ncolumns; j++)
            array[i][j] = 0;
    }
}
Finally, when it comes time to free one of these dynamically allocated multidimensional ``arrays,'' we
must remember to free each of the chunks of memory that we've allocated. (Just freeing the top-level
pointer, array, wouldn't cut it; if we did, all the second-level pointers would be lost but not freed, and
would waste memory.) Here's what the code might look like:
for(i = 0; i < nrows; i++)
free(array[i]);
free(array);
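Allocation and deallocation are natural mirror images, so it can be convenient to write them as a pair of functions. The names alloc2d and free2d are hypothetical, and the sketch assumes that nrows and ncolumns are both greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate nrows pointers, each to ncolumns ints. */
int **alloc2d(int nrows, int ncolumns)
{
    int i;
    int **array = malloc(nrows * sizeof(int *));
    if(array == NULL)
        return NULL;
    for(i = 0; i < nrows; i++)
    {
        array[i] = malloc(ncolumns * sizeof(int));
        if(array[i] == NULL)
        {
            /* undo the rows allocated so far, then give up */
            while(i > 0)
                free(array[--i]);
            free(array);
            return NULL;
        }
    }
    return array;
}

/* Free every row first, then the top-level array of row pointers. */
void free2d(int **array, int nrows)
{
    int i;
    for(i = 0; i < nrows; i++)
        free(array[i]);
    free(array);
}
```

Note that free2d needs to be told nrows; nothing stored in the pointers themselves records how many rows there were.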
and this would declare a function returning a pointer to int. With the explicit parentheses, however,
int (*pfi)() tells us that pfi is a pointer first, and that what it's a pointer to is a function, and
what that function returns is an int.
It's common to use typedefs (see section 18.1.6) with complicated types such as function pointers. For
example, after defining
typedef int (*funcptr)();
the identifier funcptr is now a synonym for the type ``pointer to function returning int''. This typedef
would make declaring pointers such as pfi considerably easier:
funcptr pfi;
(In section 18.1.6, we mentioned that typedefs were a little bit like preprocessor define directives, but
better. Here we see another reason: there's no way we could define a preprocessor macro which would
expand to a correct function pointer declaration, but the funcptr type we just defined using typedef
will work just fine.)
Once declared, a function pointer can of course be set to point to some function. If we declare some
functions:
pfi
This is a pointer to a function. Then, we put the * operator in front, to ``take the contents of the pointer'':
*pfi
Now we have a function. Finally, we append an argument list in parentheses, along with an extra set of
parentheses to get the precedence right, and we have a function call:
(*pfi)(arg1, arg2)
The extra parentheses are needed here for almost exactly the same reason as they were in the declaration
of pfi. Without them, we'd have
*pfi(arg1, arg2)
and this would say, ``call the function pfi (which had better return a pointer), passing it the arguments
arg1 and arg2, and take the contents of the pointer it returns.'' However, what we want to do is take
the contents of pfi (which is a pointer to a function) and call the pointed-to function, passing it the
arguments arg1 and arg2. Again, the explicit parentheses override the default precedence, arranging
that we apply the * operator to pfi and then do the function call.
Just to confuse things, though, parts of the syntax are optional here as well. There's nothing you can do
with a function pointer except assign it to another function pointer, compare it to another function
pointer, or call the function that it points to. If you write
pfi(arg1, arg2)
it's obvious, based on the parenthesized argument list, that you're trying to call a function, so the
compiler goes ahead and calls the pointed to function, just as if you'd written
(*pfi)(arg1, arg2)
When calling the function pointed to by a function pointer, the * operator (and hence the extra set of
parentheses) is optional. I prefer to use the explicit *, because that's the way I learned it and it makes a
bit more sense to me that way, but you'll also see code which leaves it out.
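The two call syntaxes can be tried side by side. In this sketch the functions add and mul and the two apply functions are invented for illustration, and the typedef spells out the parameter types (unlike the empty parentheses used earlier) so that the compiler can check the calls.

```c
/* Pointer to function taking two ints and returning int. */
typedef int (*funcptr)(int, int);

int add(int a, int b)
{
    return a + b;
}

int mul(int a, int b)
{
    return a * b;
}

/* Call through the pointer with the explicit * and parentheses. */
int apply_explicit(funcptr pfi, int a, int b)
{
    return (*pfi)(a, b);
}

/* Call through the pointer with the shorthand syntax. */
int apply_short(funcptr pfi, int a, int b)
{
    return pfi(a, b);
}
```

Both apply functions compile to the same call, so apply_explicit(add, 2, 3) and apply_short(add, 2, 3) each yield 5.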
will depend on the strcmp function's definition of what it means for one string to be ``less than'' or
``greater than'' another. How strcmp compares strings (as we saw in chapter 8) is to look at them a
character at a time, based on their values in the machine's character set (which is how C always treats
characters). In ASCII, the character set that most computers use, the codes representing the letters are in
order, so strcmp gives you something pretty close to alphabetical order, with the significant difference
that all the upper-case letters come before the lower-case letters. So a string-sorting function built around
strcmp would sort the words ``Zeppelin,'' ``able,'' ``baker,'' and ``Charlie'' into the order
Charlie
Zeppelin
able
baker
strcmp is quite useless for sorting strings which consist entirely of digits, because it goes from left to
right, stopping when it finds the first difference, without regard to whether it was comparing the tens digit
of one number to the hundreds digit of another. For example, a strcmp-based sort would sort the strings "1",
"2", "3", "4", "12", "24", and "234" into the order
1
12
2
234
24
3
4
Depending on circumstances, we might want our string sorting function to sort into the order that strcmp
uses, or into ``dictionary'' order (that is, with all the a's together, both upper-case and lower-case), or into
numeric order. We could pass our stringsort function a flag or code telling it which comparison
strategy to use, although that would mean that whenever we invented a new comparison strategy, we
would have to define a new code or flag value and rewrite stringsort. But, if we observe that the final
ordering depends entirely on the behavior of the comparison function, a neater implementation is if we
write our stringsort function to accept a pointer to the function which we want it to use to compare
each pair of strings. It will go through its usual routine of comparing and exchanging, but whenever it
makes the comparisons, the actual function it calls will be the function pointed to by the function pointer
we hand it. Making it sort in a different order, according to a different comparison strategy, will then not
require rewriting stringsort at all, but instead will just involve passing it a pointer to a different
comparison function (after perhaps writing that function).
Here is a complete implementation of stringsort, which also accepts a pointer to the string
comparison function:
void stringsort(char *strings[], int nstrings, int (*cmpfunc)())
{
    int i, j;
    int didswap;
    do
    {
        didswap = 0;
        for(i = 0; i < nstrings - 1; i++)
        {
            j = i + 1;
            if((*cmpfunc)(strings[i], strings[j]) > 0)
            {
                char *tmp = strings[i];
                strings[i] = strings[j];
                strings[j] = tmp;
                didswap = 1;
            }
        }
    } while(didswap);
}
(This code uses a fairly simpleminded sorting algorithm. It repeatedly compares adjacent pairs of
elements, keeping track of whether it had to exchange any. If it makes it all the way through the array
without exchanging any, it's done; otherwise, it has at least made progress, and it goes back for another
pass. This is not the world's best algorithm; in fact it's not far from the infamous ``bubble sort.'' Although
our focus here is on function pointers, not sorting, in a minute we'll take time out and look at a simple
improvement to this algorithm which does make it a decent one.)
Now, if we have an array of strings
char *array1[] = {"Zeppelin", "able", "baker", "Charlie"};
we can sort it into strcmp order by calling
stringsort(array1, 4, strcmp);
Notice that in this call, we are not calling strcmp immediately; we are generating a pointer to strcmp,
and passing that pointer as the third argument in our call to stringsort.
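The same stringsort can be pointed at any comparison function with a compatible signature. As a sketch, here it is again with a prototyped function pointer parameter (a small departure from the empty parentheses above), together with a hypothetical numcmp which compares strings of digits by their numeric values.

```c
#include <stdlib.h>
#include <string.h>

void stringsort(char *strings[], int nstrings,
                int (*cmpfunc)(const char *, const char *))
{
    int i, j;
    int didswap;
    do
    {
        didswap = 0;
        for(i = 0; i < nstrings - 1; i++)
        {
            j = i + 1;
            if((*cmpfunc)(strings[i], strings[j]) > 0)
            {
                char *tmp = strings[i];
                strings[i] = strings[j];
                strings[j] = tmp;
                didswap = 1;
            }
        }
    } while(didswap);
}

/* Hypothetical comparison function: compare as decimal integers. */
int numcmp(const char *s1, const char *s2)
{
    long n1 = strtol(s1, NULL, 10);
    long n2 = strtol(s2, NULL, 10);
    if(n1 < n2)
        return -1;
    if(n1 > n2)
        return 1;
    return 0;
}
```

Calling stringsort(nums, 7, numcmp) on the strings "1", "2", "3", "4", "12", "24", and "234" leaves them in numeric order, without any change to stringsort itself.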
If we wanted to sort these words in ``dictionary'' order, we could write a version of strcmp which
ignores case when comparing letters:
#include <ctype.h>
}
} while(didswap);
}
}
The inner loops are the same as before, except that where we had before always been computing j = i
+ 1, now we compute j = i + gap, where gap is a new variable (controlled by a third, outer loop)
which starts out large (nstrings / 2), and then diminishes until it's 1. Although this code contains
three nested loops instead of two, it will end up making far fewer trips through the inner loop, and so will
execute faster.
The Standard C library contains a general-purpose sort function, qsort (declared in <stdlib.h>),
which can sort any type of data (not just strings). It's a bit trickier to use (due to this generality), and you
almost always have to write an auxiliary comparison function. For example, due to the generic way in
which qsort calls your comparison function, you can't use strcmp directly even when you're sorting
strings and would be satisfied with strcmp's ordering. Here is an auxiliary comparison function and the
corresponding call to qsort which would behave like our earlier call to stringsort(array1, 4,
strcmp) :
/* compare strings via pointers */
int pstrcmp(const void *p1, const void *p2)
{
return strcmp(*(char * const *)p1, *(char * const *)p2);
}
...
qsort(array1, 4, sizeof(char *), pstrcmp);
When you call qsort, it calls your comparison function with pointers to the elements of your array.
Since array1 is an array of pointers, the comparison function ends up receiving pointers to pointers. But
qsort doesn't know that the array is an array of pointers; it's written so that it can sort arrays of anything.
(That's why qsort's third argument is the size of the array element.) Since qsort doesn't know what the
type of the elements being sorted is, it uses void pointers to those elements when it calls the comparison
function. (The use of void pointers here recalls their use with malloc, where the situation is similar:
malloc returns pointers to void because it doesn't know what type we'll use the pointers to point to.) In
the ``wrapper'' function pstrcmp, we use the explicit cast (char * const *) to convert the void
pointers which the function receives into pointers to (pointers to char) so that when we use one *
operator to find out what they point to, we get one of the character pointers (char *) which we know our
array1 array actually contains. We pass the resulting two char *'s to strcmp to do most of the work,
but we have to do a bit of work first to recover the correct pointers. (The extra const in the cast (char
* const *) keeps the compiler from complaining, since the pointers being passed in to the
comparison function are actually const void *, meaning that although it may not be clear what they
point to, it's guaranteed that we won't be using them to modify whatever it is they point to. We need to
keep a level of const-ness in the converted pointers so that the compiler doesn't worry that we're going
to accidentally use the converted pointers to modify what we shouldn't.)
That was a rather elaborate first example of what function pointers can be used for! Let's move on to some
others.
Suppose you wanted a program to plot equations. You would give it a range of x and y values (perhaps -10
to 10 along each axis), and ask it to plot y = f(x) over that range (e.g. for -10 <= x <= 10). How would you
tell it what function to plot? By a function pointer, of course! The plot function could then step its x
variable over the range you specified, calling the function pointed to by your pointer each time it needed
to compute y. You might call
plot(-10, 10, -10, 10, sin)
to plot a sine wave, and
plot(-10, 10, -10, 10, sqrt)
to plot the square root function.
(If you were to try to write this plot function, your first question would be how to draw lines or
otherwise do graphics in C at all, and unfortunately this is one of the questions which the C language
doesn't answer. There are potentially different ways of doing graphics, with different system functions to
call, for every different kind of computer, operating system, and graphics device.)
One of the simplest (and allegedly least user-friendly) styles of user interfaces is the command line
interface, or CLI. The user types a command, hits the RETURN key, and the system interprets the
command. Often, the first ``word'' on the line is the command name, and any remaining words are
``arguments'' or ``option flags.'' The various shells under Unix, COMMAND.COM under MS-DOS, and
the adventure game we've been writing are all examples of CLI's. If you sit down to write some code
implementing a CLI, it's simple enough to read a line of text typed by the user, and simple enough to
break it up into words. But how do you map the first word on the line to the code which implements that
command? A straightforward, rather simpleminded way is to use a giant if/else chain:
if(strcmp(command, "agitate") == 0)
{ code for ``agitate'' command }
else if(strcmp(command, "blend") == 0)
{ code for ``blend'' command }
else if(strcmp(command, "churn") == 0)
{ code for ``churn'' command }
...
else
This works, but can become unwieldy. Another technique is to write several separate functions, each
implementing one command, and then to build a table (typically an array of structures) associating the
command name as typed by the user with the function implementing that command. In the table, the
command name is a string and the function is represented by a function pointer. With this table in hand,
processing the user's command becomes a relatively simple matter of searching the table for the matching
command string and then calling the corresponding pointed-to function. (We'll see an example of this
technique in this week's assignment.)
Our last example concerns larger, more elaborate systems which manipulate many types of data. (Here, by
``types of data,'' I am referring to data structures used by the program, presumably implemented as
structs, but in any case more elaborate than C's basic data types.) It is often the case that similar
operations must be performed on dissimilar data types. The conventional way of implementing such
operations is to have the code for each operation look at the data type it's operating on and adjust its
behavior accordingly; in the worst case, each piece of code (for each operation) will have a long switch
statement or if/else chain switching among n separate, different ways of performing the operation on n
different data types. If a new data type is ever added, new cases must be added to all segments of the code
implementing all of the operations.
Another way of organizing such code is to place one or more function pointers right there in the data
structures describing each data type. These pointers point to functions which are streamlined and
optimized for performing the operations on just one data type. (Each piece of data would obviously have
its pointer(s) set to point to the function(s) for its own data type.) This idea is one of the cornerstones of
Object-Oriented Programming. We could use a version of it in our adventure game, too: rather than
writing new, global pieces of code implementing each new command verb we want to let the user type,
and then making each of those pieces of code check all sorts of attributes to ensure that the command can't
be used on inappropriate objects (``break water with cup'', ``light candle with bucket,'' etc.) we could
attach special-purpose pieces of code to the individual objects themselves (via function pointers, of
course) and arrange that the code would only fire up if the player were trying to use the relevant object.
putchar(*p);
continue;
}
switch(*++p)
{
case 'c':
fetch and print a character
break;
case 'd':
fetch and print an integer
break;
case 's':
fetch and print a string
break;
case 'x':
print an integer, in hexadecimal
break;
case '%':
print a single %
break;
}
}
}
(For clarity, we've rearranged the former if/else statement slightly. If the character we're looking at is
not %, we print it out and continue immediately with the next iteration of the for loop. This is a good
example of the use of the continue statement. Everything else in the body of the loop then takes care
of the case where we are looking at a %.)
Printing these various argument types out will be relatively straightforward. The $64,000 question, of
course, is how to fetch the actual arguments. The answer involves some specialized macros defined for
us by the standard header <stdarg.h>. The macros we will use are va_list, va_start(),
va_arg(), and va_end(). va_list is a special ``pointer'' type which allows us to manipulate a
variable-length argument list. va_start() begins the processing of an argument list, va_arg()
fetches arguments from it, and va_end() finishes processing. (Therefore, va_list is a little bit like
the stdio FILE * type, and va_start is a bit like fopen.)
Here is the final version of our myprintf function, illustrating the fetching, formatting, and printing of
the various argument types. (For simplicity--of presentation, if nothing else--the formatting step is
i = va_arg(argp, int);
s = itoa(i, fmtbuf, 16);
fputs(s, stdout);
break;
case '%':
putchar('%');
break;
}
}
va_end(argp);
}
Looking at the new lines, we have:
#include <stdarg.h>
This header file is required in any file which uses the variable argument list (va_) macros.
va_list argp;
This line declares a variable, argp, which we use while manipulating the variable-length argument list.
The type of the variable is va_list, a special type defined for us by <stdarg.h>.
va_start(argp, fmt);
This line initializes argp and initiates the processing of the argument list. The second argument to
va_start() is simply the name of the function's last fixed argument. va_start() uses this to
figure out where the variable arguments begin.
i = va_arg(argp, int);
And here's the heart of the matter. va_arg() fetches the next argument from the argument list. The
second argument to va_arg() is the type of the argument we expect. Notice carefully that we must
supply this argument, which implies that we must somehow know what type of argument to expect next.
The variable-length argument list machinery does not know. In this case, we know what the type of the
next argument should be because it's supposed to match the format character we're processing. We can
see, then, why such havoc results when printf's arguments do not match its format string: printf
tells the va_arg machinery to grab an argument of one type, with the type determined by one of the
format specifiers, but since the va_arg machinery doesn't know what the actual argument type is,
there's no way for it to do any automatic conversion. If the actual argument has the right type for the
va_arg call which grabs it (as of course it's supposed to), it works, otherwise it doesn't.
(You may have noticed that we fetched the character to print for %c as an int, not a char. That's
deliberate, and is explained in the next section.)
s = va_arg(argp, char *);
Here's another invocation of va_arg(), this time fetching a string, represented as a character pointer,
or char *.
va_end(argp);
Finally, when we're all finished processing the argument list, we call va_end(), which performs any
necessary cleanup.
The tricky line is the second call to fprintf. We have the string, msg, we want to print, possibly
containing % characters. How do we pass down to fprintf the extra arguments which our caller passed to
us?
The answer is that we don't; there's no way to say ``call this function with the same arguments I got, however
many of them there are.'' (The run-time system simply doesn't have enough information to do this sort of
thing, which is why it can't tell you how many arguments you got called with, either.) However, there's a
variant version of fprintf which is designed for just this sort of situation. The variant is called
vfprintf (where the v stands for ``varargs''), and a call to it looks something like this:
void error(char *msg, ...)
{
va_list argp;
fprintf(stderr, "%s, line %d: error:", filename, lineno);
va_start(argp, msg);
vfprintf(stderr, msg, argp);
va_end(argp);
fprintf(stderr, "\n");
}
We declare a local variable of type va_list and call va_start(), as before. However, all we do with
our argp variable is pass it to vfprintf as its third argument. vfprintf then does all the work--if we
could look inside it, it would look a lot like our version of printf above, except that vfprintf does not
call va_start(), because its caller already has. Notice that vfprintf does not accept a variable number
of arguments; it accepts exactly three arguments, the third of which is essentially a ``pointer'' to the extra
arguments it will need.
There are also ``varargs'' versions of printf and sprintf, namely vprintf and vsprintf. These
follow the same pattern, accepting a single last argument of type va_list in lieu of an actual variable-length argument list.
(Notice that the error function above also called va_end(). This makes sense, since error was the one
who called va_start(). The above pattern works, but more complicated ones may not. For example, it's
not guaranteed that you can pick a few arguments off of a va_list, pass the va_list to a subfunction to
pick a few more off, and then pick the last ones off yourself. Also, there's no direct way to ``rewind'' a
va_list, although it's permissible to call va_end() and then va_start() again, to start over again.)
Copyright
This collection of hypertext pages is Copyright 1996 by Steve Summit. This material may be freely
redistributed and used but may not be republished or sold without permission.
C Programming Notes
A Short Introduction to Programming
Steve Summit
At its most basic level, programming a computer simply means telling it what to do, and this vapid-sounding definition is not even a joke. There are no other truly fundamental aspects of computer
programming; everything else we talk about will simply be the details of a particular, usually artificial,
mechanism for telling a computer what to do. Sometimes these mechanisms are chosen because they
have been found to be convenient for programmers (people) to use; other times they have been chosen
because they're easy for the computer to understand. The first hard thing about programming is to learn,
become comfortable with, and accept these artificial mechanisms, whether they make ``sense'' to you or
not.
In fact, you shouldn't worry if some (or even many) of the mechanisms used for programming a
computer don't make sense. It doesn't make sense that the cold water faucet has to be on the right side
and the hot one has to be on the left; that's just the convention we've settled on. Similarly, many
computer programming mechanisms are quite arbitrary, and were chosen not because of any theoretical
motivation but simply because we needed an unambiguous way to say something to a computer.
In this introduction to programming, we'll talk about several things: skills needed in programming, a
simplified programming model, elements of real programming languages, computer representation of
numbers, characters and strings, and compiler terminology.
[A German translation of this essay is also available.]
optional: if you don't set the darkness control, it'll do the best it can. You don't have to tell it how
many slices of bread you're toasting, or what kind. (``Modern'' toasters have begun to reverse this
trend...) Compare this user interface to most microwave ovens: they won't even let you enter the
cooking time until you've entered the power level.
When you're programming, it helps to be able to ``think'' as stupidly as the computer does, so that
you're in the right frame of mind for specifying everything in minute detail, and not assuming that
the right thing will happen unless you tell it to.
(This is not to say that you have to specify everything; the whole point of a high-level
programming language like C is to take some of the busywork burden off the programmer. A C
compiler is willing to intuit a few things--for example, if you assign an integer variable to a
floating-point variable, it will supply a conversion automatically. But you have to know the rules
for what the compiler will assume and what things you must specify explicitly.)
3. good memory
There are a lot of things to remember while programming: the syntax of the language, the set of
prewritten functions that are available for you to call and what parameters they take, what
variables and functions you've defined in your program and how you're using them, techniques
you've used or seen in the past which you can apply to new problems, bugs you've had in the past
which you can either try to avoid or at least recognize by their symptoms. The more of these
details you can keep in your head at one time (as opposed to looking them up all the time), the
more successful you'll be at programming.
4. ability to abstract, think on several levels
This is probably the most important skill in programming. Computers are some of the most
complex systems we've ever built, and if while programming you had to keep in mind every
aspect of the functioning of the computer at all levels, it would be a Herculean task to write even a
simple program.
One of the most powerful techniques for managing the complexity of a software system (or any
complex system) is to compartmentalize it into little ``black box'' processes which perform useful
tasks but which hide some details so you don't have to think about them all the time.
We compartmentalize tasks all the time, without even thinking about it. If I tell you to go to the
store and pick up some milk, I don't tell you to walk to the door, open the door, go outside, open
the car door, get in the car, drive to the store, get out of the car, walk into the store, etc. I
especially don't tell you, and you don't even think about, lifting each leg as you walk, grasping
door handles as you open them, etc. You never (unless perhaps if you're gravely ill) have to worry
about breathing and pumping your blood to enable you to perform all of these tasks and subtasks.
We can carry this little example in the other direction, as well. If I ask you to make some ice
cream, you might realize that we're out of milk and go and get some without my asking you to. If
I ask you to help put on a party for our friends, you might decide to make ice cream as part of that
3. The instructions are pretty vague about the details of reading the next number in the input list and
detecting the end of the list.
4. The ``program'' contains several bugs! It uses registers 1, 2, and 3, but we never say what to store in
them in the first place. Unless they all happen to start out containing zero, the average or maximum value
computed by the ``program'' will be incorrect. (Actually, if all of the numbers in the list are negative,
having register 3 start out as 0 won't work, either.)
Here is a somewhat more detailed version of the ``program,'' which removes some of the extraneous
information and ambiguity, makes the input list handling a bit more precise, and fixes at least some of the
bugs. (To make the concept of ``the number just read from the list'' unambiguous, this ``program'' stores
it in register 4, rather than referring to it by ``it.'' Also, for now, we're going to assume that the numbers
in the input list are non-negative.)
``Store 0 in registers 1, 2, and 3. Read the next number from the list. If you're at the end of
the list, you're done. Otherwise, store the number in register 4. Add register 4 to register 1.
Add 1 to register 2. If register 4 is greater than register 3, store register 4 in register 3.
When you're done, divide register 1 by register 2 and print the result, and print the contents
of register 3.''
When we add the initialization step (storing 0 in the registers), we realize that it's not quite obvious
which steps happen once only and which steps happen once for each number in the input list (that is,
each time through the processing loop). Also, we've assumed that the calculator can do arithmetic
operations directly into memory registers. To make the loop boundaries explicit, and the calculations
even simpler (assuming that all the calculator can do is store or recall memory registers from or to the
display, and do calculations in the display), the instructions would get more elaborate still:
``Store 0 in register 1. Store 0 in register 2. Store 0 in register 3. Here is the start of the
loop: read the next number from the list. If you're at the end of the list, you're done.
Otherwise, store the number in register 4. Recall from register 1, recall from register 4, add
them, store in register 1. Recall from register 2, add 1, store in register 2. Recall from
register 3, recall from register 4, if greater store in register 3. Go back to the beginning of
the loop. When you're done: recall from register 1, recall from register 2, divide them,
print; recall from register 3, print.''
We could continue to ``dumb down'' this list of instructions even further, but hopefully you're getting the
point: the instructions we use when programming computers have to be very precise, and at a level of
pickiness and detail which we don't usually use with each other. (Actually, things aren't quite as bad as
these examples might suggest. The ``dumbing down'' we've been doing has been somewhat in the
direction of assembly language, which wise programmers don't use much any more. In a higher-level
language such as C, you don't have to worry so much about register assignment and individual arithmetic
operators.)
Real computers can do quite a bit more than 4-function pocket calculators can; for one thing, they can
manipulate strings of text and other kinds of data besides numbers. Let's leave pocket calculators behind,
and start looking at what real computers (at least under the control of programming languages like C) can
do.
that few character sets have left arrows in them, and the left arrow key on the keyboard usually moves
the cursor rather than typing a left arrow.)
If assignment seems natural and unconfusing so far, consider the line
i = i + 1
What can this mean? In algebra, we'd subtract i from both sides and end up with
0 = 1
which doesn't make much sense. In programming, however, lines like
i = i + 1
are extremely common, and as long as we remember how assignment works, they're not too hard to
understand: the variable i receives (its new value is), as always, what we get when we evaluate the
expression on the right-hand side. The expression says to fetch i's (old) value, and add 1 to it, and this
new value is what will get stored into i. So i = i + 1 adds 1 to i; we say that it increments i.
(We'll eventually see that, in C, assignments are just another kind of expression.)
4. There are conditionals which can be used to determine whether some condition is true, such as
whether one number is greater than another. (In some languages, including C, conditionals are actually
expressions which compare two values and compute a ``true'' or ``false'' value.)
5. Variables and expressions may have types, indicating the nature of the expected values. For instance,
you might declare that one variable is expected to hold a number, and that another is expected to hold a
piece of text. In many languages (including C), your declarations of the names of the variables you plan
to use and what types you expect them to hold must be explicit.
There are all sorts of data types handled by various computer languages. There are single characters,
integers, and ``real'' (floating point) numbers. There are text strings (i.e. strings of several characters),
and there are arrays of integers, reals, or other types. There are types which reference (point at) values of
other types. Finally, there may be user-defined data types, such as structures or records, which allow the
programmer to build a more complicated data structure, describing a more complicated object, by
accreting together several simpler types (or even other user-defined types).
6. There are statements which contain instructions describing what a program actually does. Statements
may compute expressions, perform assignments, or call functions (see below).
7. There are control flow constructs which determine what order statements are performed in. A certain
statement might be performed only if a condition is true. A sequence of several statements might be
repeated over and over, until some condition is met; this is called a loop.
8. An entire set of statements, declarations, and control flow constructs can be lumped together into a
function (also called routine, subroutine, or procedure) which another piece of code can then call as a
unit.
When you call a function, you transfer control to it and wait for it to do its job, after which it returns to
you; it may also return a value as a result of what it has done. You may also pass values to the function
on which it will operate or which otherwise direct its work.
Placing code into functions not only avoids repetition if the same sequence of actions must be performed
at several places within a program, but it also makes programs easy to understand, because you can see
that some function is being called, and performing some (presumably) well-defined subtask, without
always concerning yourself with the details of how that function does its job. (If you've ever done any
knitting, you know that knitting instructions are often written with little sub-instructions or patterns
which describe a sequence of stitches which is to be performed multiple times during the course of the
main piece. These sub-instructions are very much like function calls in programming.)
9. A set of functions, global variables, and other elements makes up a program. An additional wrinkle is
that the source code for a program may be distributed among one or more source files. (In the other
direction, it is also common for a suite of related programs to work closely together to perform some
larger task, but we'll not worry about that ``large scale integration'' for now.)
10. In the process of specifying a program in a form suitable for a compiler, there are usually a few
logistical details to keep track of. These details may involve the specification of compiler parameters or
interdependencies between different functions and other parts of the program. Specifying these details
often involves miscellaneous syntax which doesn't fall into any of the other categories listed here, and
which we might lump together as ``boilerplate.''
Many of these elements exist in a hierarchy. A program typically consists of functions and global
variables; a function is made up of statements; statements usually contain expressions; expressions
operate on objects. (It is also possible to extend the hierarchy in the other direction; for instance,
sometimes several interrelated but distinct programs are assembled into a suite, and used in concert to
perform complex tasks. The various ``office'' packages--integrated word processor, spreadsheet, etc.--are
an example.)
As we mentioned, many of the concepts in programming are somewhat arbitrary. This is particularly so
for the terms expression, statement, and function. All of these could be defined as ``an element of a
program that actually does something.'' The differences are mainly in the level at which the ``something''
is done, and it's not necessary at this point to define those ``levels.'' We'll come to understand them as we
Compiler Terminology
C is a compiled language. This means that the programs you write are translated, by a program called a
compiler, into executable machine-language programs which you can actually run. Executable machine-language programs are self-contained and run very quickly. Since they are self-contained, you don't need
copies of the source code (the original programming-language text you composed) or the compiler in
order to run them; you can distribute copies of just the executable and that's all someone else needs to run
it. Since they run relatively quickly, they are appropriate for programs which will be written once and run
many times.
A compiler is a special kind of program: it is a program that builds other programs. What happens is that
you invoke the compiler (as a program), and it reads the programming language statements that you have
written and turns them into a new, executable program. When the compiler has finished its work, you
then invoke your program (the one the compiler just built) to see if it works.
The main alternative to a compiled computer language or program is an interpreted one, such as BASIC.
An interpreted language is interpreted (by, not surprisingly, a program called an interpreter) and its
actions performed immediately. If you gave a copy of an interpreted program to someone else, they
would also need a copy of the interpreter to run it. No standalone executable machine-language binary
program is produced.
In other words, for each statement that you write, a compiler translates it into a sequence of
machine-language instructions which does the same thing, while an interpreter simply does it (where
``it'' is whatever the statement that you wrote is supposed to do).
The big advantage of an interpreted language is that your program runs right away; you don't have to
perform--and wait for--the separate tasks of compiling and then running your program. (Actually, on a
modern computer, neither compiling nor interpreting takes much time, so some of these distinctions
become less important.)
Actually, whether a language is compiled or interpreted is not always an inherent part of the language.
There are interpreters for C, and there are compilers for BASIC. However, most languages were designed
with one or the other mechanism in mind, and there are usually a few difficulties when trying to compile
a language which is traditionally interpreted, or vice versa.
The distinction between compilation and interpretation, while it is very significant and can make a big
difference, is not one to get worked up over. Most of the time, once you get used to the details of how
you get your programs to run, you don't need to worry about the distinction too much. But it is a useful
distinction to have a basic understanding of, and to keep in the back of your mind, because it will help
you understand why certain aspects of computer programming (and particular languages) work the way
they do.
When you're working with a compiled language, there are several mechanical details which you'll want
to be aware of. You create one or more source files which are simple text files containing your program,
written in whatever language you're using. You typically use a text editor to work with source files
(you generally don't want to use a full-fledged word processor, since the compiler won't understand its
formatting codes). You supply each source file (you may have one, or more than one) to the compiler,
which creates an object file containing machine-language instructions corresponding to your program.
Your program is not ready to run yet, however: if you called any functions which you didn't write (such
as the standard library functions provided as part of a programming language environment), you must
arrange for them to be inserted into your program, too. The task of combining object files together, while
also locating and inserting any library functions, is the job of the linker. The linker puts together the
object files you give it, noticing if you call any functions which you haven't supplied and which must
therefore be library functions. It then searches one or more library files (a library file is simply a
collection of object files) looking for definitions of the still-unresolved functions, and pulls in any that it
finds. When it's done, it either builds the final, executable file, or, if there were any errors (such as a
function called but not defined anywhere) complains.
If you're using some kind of an integrated programming environment, many of these steps may be taken
care of for you so automatically and seamlessly that you're hardly aware of them.
C Programming Notes
A Brief Refresher on Some Math Often Used in Computing
Steve Summit
There are a few mathematical concepts which figure prominently in programming. None of these involve
higher math (or even algebra, and as we'll see, knowing algebra too well can make one particular
programming idiom rather confusing). You don't have to understand these with deep mathematical
sophistication, but keeping them in mind will help some things make more sense.
Exponential Notation
Exponential or Scientific Notation is simply a method of writing a number as a base number times some
power of ten. For example, we could write the number 2,000,000 as 2 x 10^6, the number
0.00023 as 2.3 x 10^-4, and the number 123.456 as 1.23456 x 10^2.
Binary Numbers
Our familiar decimal number system is based on powers of 10. The number 123 is actually 100 + 20 + 3
or 1 x 10^2 + 2 x 10^1 + 3 x 10^0.
The binary number system is based on powers of 2. The number 100101_2 (that is,
``100101 base two'') is 1 x 2^5 + 0 x 2^4 + 0 x 2^3 + 1 x 2^2 + 0 x 2^1 + 1 x 2^0 or 32 + 4 + 1 or 37.
We usually speak of the individual numerals in a decimal number as digits, while the ``digits'' of a binary
number are usually called ``bits.''
Besides decimal and binary, we also occasionally speak of octal (base 8) and hexadecimal (base 16)
numbers. These work similarly: The number 45_8 is 4 x 8^1 + 5 x 8^0 or 32 + 5 or 37. The number
25_16 is 2 x 16^1 + 5 x 16^0 or 32 + 5 or 37. (So 37_10, 100101_2, 45_8, and 25_16 are all the
same number.)
Boolean Algebra
Boolean algebra is a system of algebra (named after the mathematician who studied it, George Boole)
based on only two numbers, 0 and 1, commonly thought of as ``false'' and ``true.'' Binary numbers and
Boolean algebra are natural to use with modern digital computers, which deal with switches and
electrical currents which are either on or off. (In fact, binary numbers and Boolean algebra aren't just
natural to use with modern digital computers, they are the fundamental basis of modern digital
computers.)
There are four arithmetic operators in Boolean algebra: NOT, AND, OR, and EXCLUSIVE OR.
NOT takes one operand (that is, applies to a single value) and negates it: NOT 0 is 1, and NOT 1 is 0.
AND takes two operands, and yields a true value if both of its operands are true: 1 AND 1 is 1, but 0
AND 1 is 0, and 0 AND 0 is 0.
OR takes two operands, and yields a true value if either of its operands (or both) are true: 0 OR 0 is 0, but
0 OR 1 is 1, and 1 OR 1 is 1.
EXCLUSIVE OR, or XOR, takes two operands, and yields a true value if one of its operands, but not
both, is true: 0 XOR 0 is 0, 0 XOR 1 is 1, and 1 XOR 1 is 0.
It is also possible to take strings of 0/1 values and apply Boolean operators to all of them in parallel;
these are sometimes called ``bitwise'' operations. For example, the bitwise OR of 0011 and 0101 is 0111.
(If it isn't obvious, what happens here is that each bit in the answer is the result of applying the
corresponding operation to the two corresponding bits in the input numbers.)
Handouts
In order to learn C you must write and run C programs, and in order to run C programs you will generally
need a ``C compiler.'' As explained elsewhere, a compiler is a special program which builds other
programs. Specifically, a compiler takes a program which you have written in a higher-level language
such as C, and translates or compiles that program into an equivalent machine-language program which
is suitable for execution on a computer's CPU. There are many different kinds of compilers, depending
on the input language to be compiled and the type of CPU for which code is to be generated.
The easiest compiler to use is one that is already set up and running somewhere. If you have access to a
timesharing system (such as an ISP with a ``shell prompt,'' or a multiuser system at a university), there is
a good chance that that system already has a C compiler. Similarly, if you have access (at work or in a
lab) to an engineering workstation, there is a good chance that it has a C compiler. Most Unix systems
have C compilers, and the freeware Linux operating system (a Unix clone which runs on, among other
machines, PC compatibles) comes with an excellent free C compiler, gcc (which we'll have more to say
about later).
Assuming that you don't have access to an existing C compiler but that you do have access to a personal
computer under your control, you can obtain and install your own C compiler for that machine. You have
several options, ranging from expensive to cheap to free, from simple to sophisticated, and from easy to
difficult. (Of course, since nothing in life is truly free, you probably won't be able to find an option which
is simultaneously free, easy to use, and perfectly matched to your own preferred level of simplicity or
sophistication.)
Commercial Compilers
Inexpensive Compilers
Shareware and Freeware Compilers
Important Notes
Commercial Compilers
The most ``popular'' (whatever that means) compilers for MS-DOS and Windows systems are
manufactured by Microsoft and Inprise (formerly Borland). Both of these companies have several
compiler products to offer, including some cheap ones (under $100) targeted at individuals, and some
expensive ones ($600 to $1000 or more) targeted at professional software developers. There are
compilers for MS-DOS or Windows (though the former are becoming obsolete) and for C, C++, or both.
An inexpensive ($20 or $30) commercial option is Power C from Mix Software, 1132 Commerce Drive,
Richardson, TX 75081, 214-783-6001.
For the Macintosh, the two commercial compilers I know of are Think C by Symantec, and CodeWarrior
by Metrowerks.
Inexpensive Compilers
If you don't need the latest-and-greatest compiler (and, as we'll see, you don't), there are a number of
``budget'' alternatives. One is to find a used compiler--several computer stores and bookstores (including
Half-Price Books in the Seattle area) carry used software at a fraction of its original price. The software
is always a few years old, but it's fully functional and comes with complete documentation at a price
that's hard to beat. (Whenever I visit Half-Price Books, I just about always see copies of Microsoft,
Borland/Inprise, and Think C for $30-50, which when new would have cost $100-600.)
Another option (also to be found in bookstores) is to obtain a book with a title like ``Teach Yourself C in
N days--disk included!''. I've seen two or three titles along these lines, for between 30 and 50 dollars. The
disk contains (among other things) a stripped-down version of some popular, commercial compiler.
Since the book sells for less than the full-fledged compiler would by itself, the version of the compiler
included with the book is typically crippled in some way, designed to be useful for students learning C
but not for professionals doing large-scale software development. But since we're learning C, a
``learning'' compiler is fine, and even if the book is no good, the cover price may be worth it for the
compiler alone.
Important Notes
Whatever compiler you choose, make sure it's for the operating system you have on your computer. An
older MS-DOS compiler probably won't run well under Windows (though it may be possible to run it in
a ``DOS box''). A newer Windows compiler certainly won't run under plain MS-DOS.
Also, it's preferable to have a C compiler, or a C++ compiler that also compiles C, as opposed to a
compiler that only compiles C++. As far as I know, most C++ compilers do have an optional mode in
which they act just like C compilers, and when you're writing C programs, this is obviously the mode
you want to use. (C++ compilers that also compile C are often called ``C/C++ compilers.'') If you find
yourself stuck with a compiler that compiles only C++, you may be able to use it (since C++ is mostly a
superset of C, and most C programs--especially simple ones--are also valid C++ programs), but it won't
be ideal.
Finally, as mentioned above, you don't need the latest-and-greatest of the brand-new compilers out there,
and in fact you may not even want a brand-new compiler. C is a stable language, and especially for the
simple programs we'll be writing, a compiler from a few years ago will be perfectly adequate. Some of
the fancier features that newer or more powerful compilers contain would only get in our way. Do make
sure that any compiler you get is new enough to run under your computer's operating system (or old
enough, if you're using an older operating system such as MS-DOS or Windows 3.1). Do make sure that
the compiler is ``ANSI compatible'' (although you'd be hard-pressed to find one that was not, since the
Standard was ratified almost 10 years ago). Beyond that, it hardly matters which compiler you get, so
stick with something simple and inexpensive, which will be bound to serve you well while you're
learning.
Copyright
This collection of hypertext pages is Copyright 1998-1999 by Steve Summit. This material may be freely
redistributed and used but may not be republished or sold without permission.
This handout contains reasonably detailed, step-by-step instructions for entering, compiling, and running
C programs, in a number of environments. Much of this information appears in other handouts as well, so
there is a certain amount of overlap here. If you've got a good handle on the compilation process, there's
no need to read this handout, but if you've been having difficulty, or prefer explicit instructions, read on.
Since everyone's situation is different (different environments, different compilers, different amounts of
experience, different learning styles), this handout will run through the process several times, with
increasing levels of detail. After the first two overviews, the discussion splits into threads for several
environments: Unix, DOS/Windows, and the Macintosh.
1. First Overview
2. Second Overview
3. Third Pass: Details
1. First Overview
In broad outline, there are three steps:
1. Type in or edit the C source code. It will go into an ordinary text file on your computer, with a
name of your own choosing, although the name must end in .c to indicate that it is a file full of C
source code.
2. Compile it. (Remember that ``compiling'' is the process performed by the compiler, the translation
of your program from the high-level C language which we're supposed to be able to understand,
into the low-level machine language which the processor can understand.)
3. Run it.
If the compiler gives you error messages during step 2, you'll have to return to step 1, discover and
correct the errors, and then try the compilation (step 2) again. If the program doesn't work as you expect
(or doesn't work at all), you'll have to return to steps 1 and 2 (in that order, of course).
2. Second Overview
The edit, compile, and run steps take different forms depending on whether you are working in a
traditional command line environment (such as Unix or MS-DOS) or a Graphical User Interface
(Microsoft Windows or the Macintosh).
2.1. Command Line Environment
2.2. Graphical User Interface
3.1. Unix
Since C is the programming language of Unix, it used to be the case that virtually all Unix machines had
C compilers. That's less true today (alas!), but C compilers are still very common. One extremely popular
Unix compiler, which happens to be of extremely high quality, and also happens to be free, is the Free
Software Foundation's gcc, or GNU C Compiler.
Step 1: entering or editing your source code
The most popular Unix text editors are vi and emacs. I'm not going to provide tutorials on them here; if
you've been using Unix for a while, there's a good chance that you've become familiar with one or the
other of them. As with most text editors, you can invoke them in one of two slightly different ways: either
with no command-line arguments, in which case you are given a blank screen in which to begin typing
(following which you must save your work to a file, after selecting a filename); or with a filename as a
command-line argument, in which case that file is read in so that you can make incremental changes to it
(following which you must of course again save your work, which by definition will go back to the same
file as you initially read in). Another popular text editor is pico, which is easy for beginners to use,
particularly if you use the PINE e-mail handling program. (pico is PINE's ``composer'', made available
as a standalone program.) See the ``University of Washington'' section below for more information about
pico.
For example, for the first ``Hello, world!'' program, you might type one of the following four commands:
$ vi
$ emacs
$ vi hello.c
$ emacs hello.c
(In these and the later examples, I'm using $ as a generic Unix shell prompt.) In the first two cases, you'd
be given a blank slate, and you'd save your work (with :w in vi or ^X^W in emacs), specifying a
filename of hello.c. In the second two cases, you'd be presented with hello.c (if it existed), and
when you were finished, you'd save it back to that file (with :w again under vi, or ^X^S under emacs).
In any case, whichever text editor you're using and however you choose to invoke it, make sure to type in
the program (if it's an initial example from the class notes or from a book) exactly as shown, including all
punctuation, and with the lines laid out just as they are in the code you're typing in. (Later, we'll learn
what does and doesn't matter in terms of program layout: where your degrees of freedom are, what the
compiler does and doesn't care so much about, which syntax you have to get exactly right and which
parts allow you to express some creativity. For now, though, just type everything verbatim.)
Step 2: compiling the program
Most Unix C compilers use about the same command-line syntax. To compile our ``Hello, world!''
program, you'd type
$ cc -o hello hello.c
The -o hello part says that the output, the executable program which the compiler will build, should
be named hello (that is, placed in a file of that name in the current directory). The C source file to be
read and compiled is obviously hello.c.
If you're using gcc, the command line is almost identical:
$ gcc -o hello hello.c
If cc gives you strange syntax errors, it may be that it's an old-fashioned, pre-ANSI compiler, in which
case it won't be able to compile most of the sample programs we'll be working with (unless you were to
edit them slightly). For many systems where plain cc is obsolete, there is an ANSI C compiler available,
often called acc. Again, the invocation is probably the same:
$ acc -o hello hello.c
For any of these invocations, if you leave out the -o hello part (i.e. if you just type cc hello.c),
the default is usually to leave your executable program in a file named a.out, which is cryptic and
somewhat less than perfectly useful.
If you're compiling a program which uses any of the math functions declared in the header file
<math.h>, such as sqrt, sin, cos, or pow, you'll typically have to request explicitly that the
compiler include the math library:
$ cc -o myprogram myprogram.c -lm
Notice that, unlike most command-line options, the -lm option which requests the math library must be
placed at the end of the command line.
It's also possible to use cc (or gcc, or acc, or any of the others) to compile multiple C source files,
either at once or separately, and to link them all together into one program, but we'll cover those details
later, when we start talking about separate compilation.
Step 3: running the program
Assuming you did use the -o option to specify the output filename hello when compiling, you can
now attempt to run your program by just typing its name:
$ hello
If you get a message like ``not found'' or ``no such file or directory'', it probably indicates that your shell
is not searching the current directory for commands that you type. You can either adjust your search path
(the PATH environment variable), or else invoke
$ ./hello
to explicitly indicate that the hello you're trying to run is in the current directory, known as dot (.).
If the program prints its output to the screen (as most programs do), and if you'd like to save it (for
posterity, or to print out or mail to me for review), you can type something like
$ hello > hello.out
On the Unix command line, a greater-than symbol > requests that the program's output be redirected to
the named file, instead of going to the screen. (Later, when you begin writing programs that read input
from the keyboard, you may be interested to know that it's possible to redirect a program's input as well,
using the less-than symbol, <. An invocation like program < inputfile requests that the program
read its input from the file named inputfile instead of from the keyboard.)
Finally, one last word on those text editors. I glossed over one detail, which may be troubling you: If you
invoke a text editor with a filename as a command line argument, but the file does not exist, most editors
assume that you're about to create it. You start out with a blank screen, almost as if you'd invoked the
editor with no filename argument at all, but when it comes time to save or quit, the editor remembers the
filename, so you don't have to type it in again. That's all. (In other words, you get to do a ``Save'' instead
of ``Save As''.)
to get exactly right and which parts allow you to express some creativity. For now, though, just type
everything verbatim.)
When you're done typing, type control-O to write the source code that you've just typed in out to a text
file. pico will prompt you for a file name; use hello.c (assuming you're typing in the first ``Hello,
world!'' example). Exit pico by typing control-X.
If you get compilation errors in step 2, or if the program doesn't work in step 3, such that you have to
come back here to step 1, you will edit (modify) your program by typing
% pico hello.c
(or whatever the filename is), making your changes, writing the file out again with control-O, and then
quitting again with control-X.
Step 2: compiling the program
Type
% gcc -o hello hello.c
(again, assuming you're working with a source file named hello.c). The -o hello part says that the
output, the executable program which the compiler will build, should be named hello (that is, placed in
a file of that name in the current directory). The C source file to be read and compiled is obviously
hello.c.
(If you leave out the -o hello part, the default is usually to leave your executable program in a file
named a.out, which is cryptic and somewhat less than perfectly useful.)
Later in the class, when it comes time to compile a program which uses any of the math functions
declared in the header file <math.h>, such as sqrt, sin, cos, or pow, you'll have to request
explicitly that gcc include the math library:
% gcc -o myprogram myprogram.c -lm
Notice that, unlike most command-line options, the -lm option which requests the math library must be
placed at the end of the command line.
It's also possible to compile multiple C source files, either at once or separately, and to link them all
together into one program, but we'll cover those details later, when we start talking about separate
compilation.
When I tested gcc on saul during the process of writing this handout, it gave me an unusual warning
message of the form ``as0: Warning: extraneous values on version stamp ignored''. I have a pretty good
idea of what this message means, and it indicates a slight problem in the installation of one of the parts of
the compiler, not a problem with my (or your) C program. You can ignore this message; the programs
produced by the compiler seem to be just fine in spite of it.
Step 3: running the program
Assuming you did use the -o option to specify the output filename hello when compiling, you can
now attempt to run your program by just typing its name:
% hello
If you get a message like ``not found'' or ``no such file or directory'', it probably indicates that the shell is
not searching the current directory for commands that you type. You can either adjust your search path
(the PATH environment variable), or else invoke
% ./hello
to explicitly indicate that the hello you're trying to run is in the current directory, known as dot (.).
As mentioned in the Unix section above, you can capture your program's output in a file (instead of
having it go to the screen) by typing
% hello > hello.out
The output appears in the file hello.out, which will be a plain text file.
Finally, one footnote about pico: If you invoke it with the name of a file that does not exist, e.g. by
typing pico hello.c the very first time, pico assumes that you're about to create that file (as, in
fact, you are). It gives you the same initially empty editing screen as if you'd invoked it with no filename
argument at all, but when you type ^O to write the file out, it remembers the filename from the command
line, so you don't have to type it in again. That's all. (In other words, you get to do a ``Save'' instead of
``Save As''.)
3.3. MS-DOS
If you are using an older MS-DOS compiler, it may be invoked from the command line, in which case it
will act somewhat similarly to the Unix compilers discussed above. On the other hand, there are several
MS-DOS compilers which ``roll their own'' graphical user interfaces, and these act more like the GUI
compilation environments described below.
The MS-DOS compilers I am familiar with are Microsoft's, Borland's, and DJ Delorie's (aka djgpp).
Other MS-DOS compilers are Watcom's and Mix Software's. Microsoft and Borland, at least, make both
command line compilers and ``visual'' compilers with a more graphical user interface. The rest of this
section assumes that you're not using much of a graphical interface, that you're instead simply typing
commands at the DOS (C:\>) prompt. (Theoretically, it's even possible to use these command-line
compilers in a ``DOS box'' under Microsoft Windows, but they tend not to work too well there.)
Step 1: entering or editing your source code
Most MS-DOS compilers probably come with some kind of a source code editor for typing in your
programs, but if not, you'll have to use an ordinary text editor. (If you have a text editor you prefer, you
can probably use it instead of the compiler's editor in any case.)
If at all possible, do not try to use a ``word processor'' to enter or edit C source code. The compiler
expects the source files you'll be giving it to be plain text files, not the document files (incorporating font
and size changes, pagination, and all the other formatting goodies) that word processors create by default.
Even if you remember to ask your word processor to save files as ``plain text'' or ``MS-DOS text files'',
the word processor will still be eager to rearrange your source code into what it thinks of as
nicely-formatted ``paragraphs'', which is not what you want your C code to look like.
In any case, whichever editor you use, make sure to type in the program (if it's an initial example from
the class notes or from a book) exactly as shown, including all punctuation, and with the lines laid out
just as they are in the code you're typing in. (Later, we'll learn what does and doesn't matter in terms of
program layout: where your degrees of freedom are, what the compiler does and doesn't care so much
about, which syntax you have to get exactly right and which parts allow you to express some creativity.
For now, though, just type everything verbatim.) When you're finished, be sure to save your work in a
filename ending in .c . (Also, once again, make sure it's a plain, MS-DOS text file, if you're trying to
use a word processor to do the entering.)
Step 2: compiling the program
The older versions of the Microsoft compilers (e.g. MSC V6.00), which are the only ones I'm familiar
with, use a command-line syntax like
C:\> cl hello.c
(``cl'' stands for ``compile and link''. If you're using a really ancient version, such as 3.00, the command
may be msc, but I doubt there are too many of those in circulation any more.)
Borland's syntax (at least the versions I've used) is similar:
C:\> bcc hello.c
(``bcc'' obviously stands for ``Borland C compiler''.)
Both of these commands request that the file named HELLO.C be compiled. Ideally (i.e. if there are no
errors), an executable program will be created, named HELLO.EXE.
It's also possible to do a two-step compilation, requesting the compiler to build only an ``object'' (.OBJ)
file corresponding to the .C file, and then separately invoking the linker to actually create the executable.
We'll defer discussion of the linker to the later topic of separate compilation, but I mention it here
because I can't remember if bcc invokes the linker automatically at all. If, when you attempt to compile
HELLO.C, the bcc command (or any MS-DOS compiler) creates HELLO.OBJ but not HELLO.EXE, it
means that you'll have to invoke the linker yourself. For simple, one-source-file programs like the ones
we'll be writing at first, the linker invocations will be equally simple:
C:\> link hello;
or
C:\> tlink hello;
(The Microsoft linker is usually named link; the Borland linker is usually named tlink.) The linker
assumes a .obj extension on the input file names you give it (although you can type link
hello.obj; if you prefer), and creates an executable named HELLO.EXE (or whatever.EXE,
depending on the name of the first object file you give it). The semicolon at the end of the line keeps the
linker from prompting you for additional information which would be unnecessary for these simple
examples. (The extra information, which you can also supply in advance on the command line if you
wish, includes such things as an explicitly-selected name for the executable file and a list of extra
libraries to load.)
I haven't said much about djgpp, which is DJ Delorie's monumental port of the entire GNU C
compilation suite to MS-DOS. That topic deserves a section of its own, which I haven't written yet.
djgpp is a very mixed bag: it's an extremely high-quality compiler, better in many ways than the
commercial alternatives, but it is not easy to install or set up. Its opening documentation says something
like ``if you're just learning C, this is not the compiler you want.'' Nevertheless, the price is right, and I
Whether or not the compiler explicitly prompts you to, you will eventually want to do a Save or Save As,
to save your work to disk. Again, you will always use a name ending in .c for source files. If you've just
invoked the compiler for the first time, the default directory (or folder) where you'll first end up trying to
store your own source files may end up being the compiler's installation directory. This isn't ideal, and
although it would probably work, you'll be much better off creating your own directory or folder,
somewhere else, where you'll keep all your own source files and programs that you've written or typed
in, and where they can stay separate from all the compiler's supporting files.
The compiler may automatically add new source files (once you've typed them in, and given them a
name) to the current project or program, or you may have to explicitly use an Add selection (perhaps
from a source or project menu) to add them to the current project. Somewhere, there should be a window
or other display which lists the ``modules'' that make up the current program. Make sure that, eventually,
a name incorporating the word ``hello'' shows up on this list (assuming you've named your source file
hello.c).
Later, when you have occasion to come back to an existing source file to edit it, you'll be able to open it
using Open (instead of New) from a File menu, or perhaps by selecting its name from within the
compiler's list of files or modules which make up the current project.
Step 2: compiling the program
Modern compilers typically include several different compilation modes, and although some of them
might not end up mattering very much for the simple programs we'll be writing, it's worth paying a bit of
attention to their selection, for best results. Somewhere (perhaps in the Project menu) there will probably
be a menu or dialog of ``Compiler Settings''. One choice may be the ``Input Language'', and it may have
options like ``ANSI C'', ``ANSI C with Extensions'', ``C++'', and ``pre-ANSI'' or ``K&R'' or ``Classic'' or
``Traditional C''. For this class, you definitely want the ``ANSI C'' option, or as close as you can get to it.
(Any extensions, no matter how seductive they sound, are unnecessary and would only get in the way,
besides intruding on our goal of learning portable C.)
Another selection may concern the type of program or executable or application to create--there may be
selections with names like ``Windows Application'' or ``Console Application'' or ``stand-alone
executable'' or ``run within compiler''. Definitely don't request a Windows Application; that would
require that you learn how to open windows (from the program side) and learn the rest of the Windows
API, and it would prohibit you from using our most basic library function, printf. A ``console
application'', if you see a choice like that, is much more like what we want: this arranges that a window
be opened automatically, in which our printf output will appear, and into which we can type any input
which our program expects, all without our having to worry about the gory details of Windows. The
result is a ``console'' or ``virtual terminal'' environment just like the old-fashioned ones which the C
textbooks and tutorials (mine included) all still assume you're using.
If you have a choice between running your program within the compilation environment versus creating
a standalone executable, you may do either, but you should understand the distinction. If you run within
the compilation environment, it means that you'll ask the compilation environment to run your program,
and it will open a new window to do so. Creating a standalone MS-DOS executable, on the other hand,
means that the compiler will write an executable file to disk (perhaps automatically picking a name like
HELLO.EXE, or perhaps prompting you for one). You'll then invoke your program yourself, by typing
its name at a DOS prompt. (If you were creating a standalone Windows application, you'd invoke it by
double-clicking on it, but as mentioned, we're probably not ready to create standalone Windows
applications.)
Once you've made your selections of input language and output format (and any other choices which
look interesting, such as how verbose you want the compiler to be about warning you of possible errors,
and how hard it should work at ``optimizing'' your program--you'll probably want to choose ``very'' and
``not very'', respectively), you're ready to actually do the compiling. There may be a Compile or Build or
Make button or menu selection somewhere (again, perhaps on the Project menu). Or, especially if you've
selected ``run within compiler'', there may simply be a Run button (or a top-level Run menu) which first
automatically compiles and builds your program and then, if everything is successful, takes you directly
to step 3.
As helpful and integrated and hand-holding as these visual compilation environments try to be, they tend
to let you down and leave you out in the cold in one respect, namely the automatic selection and
incorporation of the standard C libraries. If, when you first try to build or run your program, you get error
messages telling you that functions (or ``external references'') with names like printf are ``undefined'',
this means that the compiler wants you to explicitly tell it that yes, you do want to make use of those
standard library functions which your textbook and instructor assured you would be provided by the
compiler vendor. (It may even want you to tell it where they are.) If so, you will have to go back to the
Add Module (or perhaps Add Library or even Add Source File) menu, and tell the compiler that your
project is made up of more than just the one source file you gave it. You will have to add (as a module or
something) the ``Standard'' or ``ANSI'' C libraries. To do so, you may end up going through a standard
file-opening dialog, and you may have to wander over to the compiler's installation directory to find the
library (it may be in a subdirectory or subfolder called ``LIB''). If you find several versions of the
libraries, having to do with different ``memory models'' (or beginning with different cryptic letters like S,
M, C, or L), then you'll have to pick the right one somehow, although I hope you don't have to, because
picking the right one depends on which memory model the compiler is compiling for, which would be
another compiler setting which I didn't talk about.
(I realize that this subsection, in fact all of this section, has been pretty generic, and not very ``step by
step''. Eventually, I may be able to learn enough about the popular Windows C compilation environments
to provide specific instructions for each of them. Until then, if you're having trouble, you may be able to
glean some hints from reading through the detailed instructions in the Macintosh section below, although
the discussion there is of course of a compiler other than the one you're using.)
Step 3: running the program
Depending on the style of program or executable you've built, there will be a couple of different ways of
running your program. If you're running it within the compilation environment, it will be particularly
easy; you'll just select something like Run Program from the Project or Run menu. (In fact, it may be so
easy that steps 2 and 3 meld together.) If you've created a standalone executable, you'll have to return to
the DOS prompt (i.e. switch to a ``DOS box'') and type the program's name there, perhaps after changing
to the correct directory. If you're invoking a standalone executable from the DOS prompt, you can read
some notes about input and output redirection at the end of the MS-DOS section above.
3.5. Macintosh
I've saved the best for last! :-) Although not the total programmer's environment that Unix is, there are
some excellent C compilers available for the Mac. In fact, all of the examples and problem set solutions
for this class have been written and tested on my Macintosh, and all of the software I use to format and
typeset these course notes (including the handout you're reading now) was written by me, much of it on
the Mac. The compiler I use is Think C version 5.1, by Symantec. The rest of this section discusses that
compiler, but as far as I know, other Mac compilers (including Codewarrior) are quite similar.
In Think C, at any one time, you're always working with exactly one project, which is Think C's
metaphor for the program you're writing. The project is built from one or more sources, including the C
source files you'll create and type in. Once you've got things set up correctly, you simply select Run from
the Project menu to have all aspects of compiling and running your program taken care of automatically.
(You can also ``make'' your program without running it.)
When you first start up Think C, you're presented with a New Project dialog. You can name your project
anything you like, but don't add a .c to the end of the name, because as we'll see, we'll be creating
individual .c files later which are distinct from the ``project file''. (Later, once you've created a project,
you can start up the compiler ``on'' that project by double-clicking on the project file's icon, or you can
select Open Project from the Project menu.)
If you've just invoked the compiler for the first time, the default folder where you'll first end up trying to
store your own project and source files will probably end up being the compiler's installation directory.
This is not ideal, and Symantec recommends against it. Instead, you'll be much better off creating your
own folder somewhere else, where you'll keep all your own source files and projects that you've written
or typed in, and where they can stay separate from the compiler and all its supporting files.
Step 1: entering or editing your source code
Once you've opened a project, select New from the File menu. This opens a new source file editing
window, in which you can begin typing your source code. The source code editing window works like
any text editing window, except that it is streamlined and tailored for the typing and editing of C source
code. (For example, you may notice that the cursor automatically indents itself on each new line, as the
editor guesses the syntax and structure of the C program you're typing.)
Once you've successfully opened a source code editing window, type in the program (if it's an initial
example from the class notes or from a book) exactly as shown, including all punctuation, and with the
lines laid out just as they are in the code you're typing in. (Later, we'll learn what does and doesn't matter
in terms of program layout: where your degrees of freedom are, what the compiler does and doesn't care
so much about, which syntax you have to get exactly right and which parts allow you to express some
creativity. For now, though, just type everything verbatim.)
At some point, but definitely by the time you're finished typing, you'll need to select a name for the
source file, which will be something like hello.c. Select Save As from the File menu, and save the file
as normal. You'll want to place it in the same folder as the project file, which ought to be where it ends
up by default, but due to the way the Mac's standard open file dialog always seems to remember the last
folder you were in for any reason, you'll want to double check that you're still where you want to be
before saving files in Think C.
Later, when you have occasion to come back to an existing source file to edit it, you can either open it as
usual using Open from the File menu, or else double-click on its name in Think C's ``project window'',
which is a list of all the sources that make up the current project. (Think C always displays this window
whenever a project is open; it's usually pretty small and narrow, and located in the top center or left of
the screen.)
Step 2: compiling the program
First, pull down the Edit menu and go to the Options dialog, and run through the various choices. For the
simple programs we'll be writing, most of the defaults will be adequate. But do turn off any extensions,
``Think Object'' or C++ or otherwise. (Compiler extensions, no matter how seductive they sound, are
unnecessary and would only get in the way, besides intruding on our goal of learning portable C.)
Once you've got a project set up, the easiest way of compiling it is by simply selecting Run from the
Project menu, which compiles the program (if necessary) and then runs it, taking you directly to step 3.
But if you try that right away, you'll get an error saying that main is undefined, because even though
you've typed in a .c file containing your main function, Think C hasn't realized it's an official part of
your project yet.
There are at least three ways of adding a source file to the current project:
1. Pull down Add... from the Source menu. Double-click on the desired file in the upper half of the
dialog (or select it and click Add) to place it in the lower list of files to be added. Then click
Done.
2. If the desired source file is the topmost (current) window, select Add (the one without the three
dots) from the Source menu.
3. If the desired source file is the topmost (current) window, select Compile from the Source menu.
If the compilation is successful, the source will be automatically added to the project.
You can use any one of these methods to add hello.c to the current project. (Generally, I find number
3 most convenient, because it's got a command-key shortcut.)
You're now ready to hit an annoying little feature of Think C, which is that it does not assume any
default, initial set of function libraries, ANSI standard or otherwise. This means that the second time you
attempt to build or run your program (after fixing the ``main undefined'' problem from when you tried to
run a project with no sources at all), you're likely to get error messages telling you that functions with
names like printf are undefined, in spite of the fact that your textbook and instructor assured you they
would be provided by the compiler vendor. You have to pull down Add... from the Source menu, go to
the folder (wherever it is) where you installed Think C, go to the C Libraries subfolder, and select the
ANSI library. (In other words, you add it as a second Source to your project. Here, ``source'' just means
``source of information for the project''; it doesn't mean ``C source code''.) (Also, once you've gone over
to the C Libraries folder to get the ANSI library, remember to go back to your own C programming
folder next time you have occasion to save a file, otherwise it might get lost over in the Think C folder.)
Step 3: running the program
When you select Run from the Project menu, Think C (obviously) runs your program. If your program
calls printf to generate any output (as all of our programs will do), a ``console window'' will
automatically be opened to display the output. If the program tries to read any input, you can type that
input into the same console window.
You can also pull down Build Application from the Project menu. This allows you to turn your program
into an actual Mac application. You'll be asked to pick a file name for the application, which will have to
be different from both the project file name and any source file (.c) names. (For this reason, it's common
to name project files ending in .prj or project or, to be cute, the pi symbol.)
When you create a standalone Mac application, you'll be able to invoke it by double-clicking on its icon,
as usual. (However, it probably won't look much like a Mac application once it's running, because
assuming it just calls printf, it will still open up the same console window, and it won't have any
fancy menus or any dialogs at all.)
Later, when it comes time to write some sample programs which try to read their command line
arguments, you will obviously have a problem typing these arguments, because the Mac of course has no
command line. Think C does incorporate a mechanism for passing command lines to a program (because
even though doing so is entirely un-Mac-like, many C programs, especially those written during
introductory classes, still expect them). To get an ``application'' built by Think C to prompt you for a
command line, you have to do two things:
1. Place the line
#include <console.h>
at the top of the source file containing your main function, along with the other #include
lines.
2. Define your main function like this:
main(int argc, char *argv[])
{
... any local variable declarations go here ...
argc = ccommand(&argv);
... rest of program ...
}
In other words, the first executable statement of your main function must be the assignment argc =
ccommand(&argv); . The call to the ccommand function causes a dialog box to pop up in which
you can type a command line, the contents of which will then be available to the rest of your code
through the argc and argv variables, as usual. Also, the dialog gives you a way to redirect the input
and output of your program (see the Unix and MS-DOS sections above for a bit more information on
input/output redirection). But don't worry about any of this until we get to the topic of command line
arguments, towards the end of the class.
http://www.eskimo.com/~scs/cclass/handouts/errmsgs/index.html
Since C was originally a language designed for programmers who considered themselves experts, to this
day a C compiler can be a somewhat cryptic beast to install and configure correctly, and to understand
when it begins to complain about problems (it thinks) it has detected in your programs. The error
messages that a C compiler prints usually have a direct relationship to some distinct problem in your
program, but until you learn what they mean, it's not always immediately obvious what the messages are
trying to tell you. This handout summarizes some of the error and warning messages commonly printed
by C compilers, and gives you a slightly more comprehensive explanation of what they're really saying.
(As we'll see, sometimes what they're saying is not that there's a problem with your program, but rather
with the compiler itself.)
If your compiler prints some cryptic message which you can't find in this list, please let me know, so that
I can add it.
One side note: sometimes, a compiler gets so badly confused by something weird which you've done that
it goes off the deep end, spewing messages which have nothing to do with actual problems in your
program (or which may have nothing to do with your program at all). If your compiler has printed scores
of error messages, and you've dealt with the first few (that is, corrected the problems) but can't figure out
what the rest are referring to, try compiling again anyway. It may be that the first problem(s) you fixed
will make all (or most) of the other messages go away.
``syntax error''
This is the granddaddy of all compiler error messages, although it can be less than useful. It
means that there's something wrong with the syntax of your program (yes, I know, you figured
that much out already). Roughly speaking, ``syntax'' refers to the typographical characters you've
used to express your program and the order you've arranged them in. If parts of your program are
in the wrong order, or if some required punctuation (such as a comma, semicolon, brace, or
parenthesis) is missing, you're liable to get this error. The actual error is not always on the line the
compiler indicates; sometimes, a problem on one line isn't noticed right away but percolates down
and triggers an error message a few lines later.
You may also get this error if you're trying to use an old, pre-ANSI compiler (such as cc on many
Unix systems) to compile a newer, ANSI-compatible program. (Try using acc or gcc instead.)
``undefined symbol x''
``undeclared variable x''
``x undefined''
All of these mean that you have apparently tried to use a variable named x (or, of course,
whatever name the compiler is complaining about) without declaring it. C requires that all
variables be declared before use; the declaration informs the compiler in advance of the name and
type of each variable that you will be using. (If the compiler is complaining about a variable
which you did think you'd declared, it may be that you misspelled it either where you declared it
or where you used it, or that you declared it in a different part of your program than the part where
you're trying to use it.)
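As a small illustration of the declare-before-use rule, here is a sketch using a made-up function, sum_to (the name and the misspelled totl are purely hypothetical). Both variables are declared at the top of the function; the commented-out line shows the kind of misspelling that typically draws one of these messages.

```c
/* Hypothetical helper: adds up the integers from 1 to limit. */
int sum_to(int limit)
{
    int i;           /* declared before use, so the compiler knows its type */
    int total = 0;   /* likewise declared (and initialized) before use */

    for (i = 1; i <= limit; i++)
        total += i;

    /* totl += i;       misspelled name: would be reported as undeclared */

    return total;
}
```

If you uncomment the misspelled line, the compiler has never heard of totl, and it will say so in one of the forms listed above.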
You might wonder why it's illegal to have a newline in a string constant (and why it is that you
have to use the \n sequence instead). The reason is precisely so that the compiler can give you
this error message when you accidentally leave out a closing quote. Otherwise, the compiler
might try to interpret the whole rest of your source file as one very long string constant.
``bad lvalue'' (or ``missing lvalue'')
lvalue is a piece of compiler writer's jargon which refers to an object with a location. (Actually, its
original derivation had to do with the fact that lvalues are what appear on the left hand side of an
assignment operator.) What this message means to you and me is that the compiler was trying to
assign to or modify some variable, but there was no variable there for it to modify (or, perhaps,
for some reason the particular variable could not be modified). For example, if you accidentally
wrote
1 = a + b
(that's a number 1 there, not a letter l), the compiler would complain, because the number 1 is not
a variable which it's possible for you to assign a new value to.
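To make the distinction concrete, here is a minimal sketch (the function name add_values is made up for the example). The assignments that would provoke the message are left in comments, next to one that is fine.

```c
/* Hypothetical helper: returns a + b, going through a variable on the way. */
int add_values(int a, int b)
{
    int sum;         /* sum names an object with a location: an lvalue */

    /* 1 = a + b;       rejected: the constant 1 is not an lvalue */
    /* a + b = sum;     rejected: the result of a + b has no location */

    sum = a + b;     /* fine: the left-hand side is a variable */
    return sum;
}
```

In both commented-out lines, the thing on the left of the = is a value, not a place where a value can be stored.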
``can't find header file xxx.h''
This message means that one of your source files contains a line like
#include <xxx.h>
or
#include "xxx.h"
but the compiler (specifically, the preprocessor) could not find the header file xxx.h.
If the header file is one of your own (usually one that you've used the #include "" form with),
make sure that you have actually composed (written) the header file, and that it is in the correct
directory--usually, the same directory as the rest of your source files.
If the header file is a Standard one (usually one that you've used the #include <> form with),
such as stdio.h, then there's something wrong with the way the compiler has been installed or
configured. You may have to go into a configuration screen and set the directory name where the
standard header files are kept. (Of course, to do this, you will have to know where that directory
is! It usually has a name like ``include'', and is located somewhere within the compiler's
installation tree.)
(Finally, if the preprocessor is complaining about a misspelled header name, simply fix the
misspelling.)
``undefined external: xxx''
This message means that during the linking stage, a function (or, occasionally, a global variable)
by the name of xxx could not be found. There can be several causes, depending on whether the
function is one of your own, or was one which you had expected would be provided for you.
If the function name is one of your own, you'll have to figure out why the linker couldn't find it.
Did you provide a defining instance (not just an external declaration) for the function? Did you
enter its source code into the same file(s) as the rest of the program? If not, did you tell the linker
where to find it? You'll either have to write the function if you forgot to, or adjust the program's
build procedure (perhaps by editing the ``Makefile'' or adding a ``module'' to the ``project list'').
If the function name is one from the Standard C library, such as printf, which was supposed to
be provided for you, you'll have to figure out why the linker couldn't find it. Occasionally, the
problem is that the compiler doesn't know where the Standard library is. You may have to go into
a configuration screen and set the path or directory name of the library. (Of course, to do this, you
will have to know where the library is! It usually has a name containing the string ``libc'', such as
libc.a or SLIBC.LIB. The libraries--there may be several--are often located in a subdirectory
named ``lib'', somewhere within the compiler's installation tree.)
If the undefined function's name is close but not exactly the same as one of your own functions or
one of the Standard library functions, it obviously means that you've mistyped its name in a
function call somewhere. All you have to do is find that typo in whichever source file it's in, fix it,
and recompile.
If the function is a math-related one, such as sin, cos, or sqrt, and you're using a Unix system,
make sure that you've used the -lm option, at the end of the command line, when
compiling/linking.
If the function name printed in the error message includes a leading underscore, don't worry about
the underscore. (Under some compilers, the names manipulated by the linker have this underscore
added by the compiler, although you continue to refer to the function by the name you thought
you had given it.) In most cases, if the linker complains about ``_myfunc'' being undefined, it's
``myfunc'' you've got to worry about.
If the name is ``_end'', you'll rarely have to worry about it; this is a symbol used internally by the
traditional Unix linkers, and shows up as ``undefined'' only when there are also other symbols
undefined. If you fix all of the others, the mysterious undefined ``_end'' will also disappear.
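The difference between an external declaration and a defining instance can be sketched as follows. The names call_count and next_count are invented for the illustration. If only the first two lines were present anywhere in the program, the compiler would be satisfied but the linker would report both names as undefined.

```c
extern int call_count;    /* external declaration only: no storage is set aside */
int next_count(void);     /* declaration (prototype) only: no code is generated */

int call_count = 0;       /* defining instance of the variable */

int next_count(void)      /* defining instance of the function */
{
    return ++call_count;
}
```

The two defining instances may live in a different source file from the code that uses them, as long as that file is part of the build.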
#include <stdio.h>
near the top of the file, because <stdio.h> contains the prototype for printf (and many other
functions). For your own functions, you will have to supply your own prototypes. (You may wish
to place them in your own header files, and use #include "" to bring them in wherever they're
needed. Using header files in this way is generally more reliable than manually typing copies of
the prototype into each source file where a particular function of yours is called.)
If, for whatever reason, you'd rather that the compiler not check function calls for you (that is, if
you think that you can get them right yourself), you can usually configure a compiler not to
complain about function calls with no prototype in scope.
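Here is a minimal sketch of supplying a prototype for one of your own functions. The names square and sum_of_squares are hypothetical; in a larger program the prototype line would more likely sit in a header file of your own and be brought in with #include "".

```c
/* Prototype for our own function; it tells the compiler the argument
   and return types before the first call appears. */
double square(double x);

/* Hypothetical caller: the prototype above is in scope here, so the
   compiler can check the calls and knows that square returns double. */
double sum_of_squares(double a, double b)
{
    return square(a) + square(b);
}

/* The definition itself, which must agree with the prototype. */
double square(double x)
{
    return x * x;
}
```

Without the prototype, a pre-ANSI-style compiler would assume that square returned int, and the result would be garbage.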
We won't be doing much floating-point work, but it doesn't hurt to say ``yes''. (The worst that can
happen is that if your machine already has a floating-point coprocessor, the programs that you
build will be slightly bigger--by a few K--if they include a software floating-point emulation
library which you don't really need. But it certainly won't hurt.)
``Which memory models?''
Under MS-DOS, for arcane reasons which I'll avoid the temptation to editorialize about, you have
your choice among programming in at least four different ``memory models'': ``small,''
``compact,'' ``medium,'' and ``large'' (and perhaps also ``tiny'' and ``huge''). For our purposes, the
programs we'll be writing are so small and simple that it absolutely won't make a difference which
model we use. If disk space is not a problem and there's a chance you'll be working on larger
programs later, pick both ``small'' and ``large'' models, and ignore the rest. If all you'll be writing
in the foreseeable future is the small programs for this class, just pick ``small'' model and be done
with it. (You can always go back and re-install the support for the other memory models later, if
for some reason you ever need them.)
``Include xxx library?''
(``xxx'' might be ``graphics,'' ``OWL,'' ``networking,'' or some other keyword which you do or
don't recognize.) It's probably safe to say ``no'' to any of these special libraries; I'm not aware of
any special libraries which would be needed by any compilers to deal with the simple programs
we'll be writing.
``Delete component libraries after build?''
It's safe to say ``yes,'' or just use the default. (Saying ``yes'' conserves some disk space; you're not
going to need the ``component libraries'' later if the installation procedure has built you some
``combined libraries.'')
http://www.eskimo.com/~scs/cclass/handouts/sciprog.html
C was not used very much for floating-point work at first (i.e. back when it was still being developed and refined),
so its support for some aspects of floating-point and scientific work is a little weak. (On the other hand, C's support
for string processing and other data structures can make up the difference in a program that does some of each.)
There are two significant areas of concern. One is specific to C, and concerns the handling of multidimensional
arrays. The other is endemic in floating-point work done on computers, and concerns accuracy, precision, and
round-off error.
Functions and Multi-dimensional Arrays
We've seen that when a function receives a one-dimensional array as a parameter:
f(double a[])
{
...
}
the size of the array need not be specified, because the function is not allocating the array and does not need to
know how big it is. The array is actually allocated in the caller (and in fact, only a pointer to the beginning of the
array is passed to a function like f). If, in the function, we need to know the number of elements in the array, we
have to pass it as a second parameter:
f(double a[], int n)
{ ... }
If we wish to emphasize that it is really a pointer which is being passed, we can write it like this:
f(double *a, int n)
{ ... }
(In fact, if we declare the parameter as an array, the compiler pretends we declared it as a pointer, as in this last
form.)
In the one-dimensional case, it is actually a bit of an advantage that we do not need to specify the size of the array.
If f does something generic like ``find the average of the numbers in the array,'' it will clearly be useful to be able
to call it with arrays of any size. (If we had to write things like
f(double a[10])
{ ... }
then we might find it a real nuisance that f could only operate on arrays of size 10; we'd hate to have to write
several different versions of f, one for each size.)
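The ``find the average'' idea might be sketched like this. The function name average, and the decision to return 0 for an empty array, are assumptions made for the example, not anything the language requires.

```c
/* Returns the average of the n numbers in a.  The array's size is not
   part of the parameter declaration, so any length may be passed. */
double average(double a[], int n)
{
    double sum = 0.0;
    int i;

    if (n <= 0)
        return 0.0;    /* assumed convention for an empty array */

    for (i = 0; i < n; i++)
        sum += a[i];

    return sum / n;
}
```

The same function can be called as average(small, 3) on a three-element array and as average(big, 5) on a five-element one; only the second argument changes.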
Unfortunately, the situation is not nearly so straightforward for multi-dimensional arrays. If you had an array
double a2[10][20];
which you wanted to pass to a function g, what would g receive? We could declare g as
g(double a[10][20])
{ ... }
and let the compiler worry about it--that is, let it ignore the dimensions it doesn't need, and pretend that g really
receives a pointer. But what pointer does it receive? It's easy to imagine that g could also be declared as
g(double **a) /* WRONG */
{ ... }
but this is incorrect. Notice that the array a2 that we're trying to pass to g is not really a multidimensional array; it
is an array of arrays (more precisely, a one-dimensional array of one-dimensional arrays). So in fact, what g
receives is a pointer to the first element of the array, but the first element of the array is itself an array! Therefore,
g receives a pointer to an array. Its actual declaration looks like this:
g(double (*a)[20])
{ ... }
(and this is what the compiler pretends we'd written if we instead write a[10][20] or a[][20]). This says that
the parameter a is a pointer to an array of 20 doubles. We can still use normal-looking subscripts: a[i][j] is
the j'th element in the i'th array pointed to by a. The problem is that no matter how we declare g and its array
parameter a, we have to specify the second dimension. If we write
g(double a[][])
the compiler complains that a needed dimension is missing. If we write
g(double (*a)[])
the compiler complains that a needed dimension is missing. We have to specify the ``width'' of the array, and if
we've got various sizes of arrays running around in our program, we're back with the problem of potentially
needing different functions to deal with different sizes of arrays.
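Here is a sketch of a function which receives a pointer to an array, in the style just described. The names NCOLS and sum_rows are made up for the example. Notice that the number of rows can be passed as an ordinary parameter, but the width is fixed when the function is compiled.

```c
#define NCOLS 20    /* the "width": fixed at compile time */

/* Adds up every element of an array of nrows rows, each row being an
   array of NCOLS doubles.  a is a pointer to the first such row. */
double sum_rows(double (*a)[NCOLS], int nrows)
{
    double total = 0.0;
    int i, j;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < NCOLS; j++)
            total += a[i][j];    /* ordinary-looking subscripts still work */

    return total;
}
```

An array declared as double a2[10][20] can be passed as sum_rows(a2, 10), but an array with 30 columns would need a different function.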
Why wouldn't
g(double **a) /* WRONG */
work? That says that a is a pointer to a pointer, but since we're passing an array of arrays, the function will receive
a pointer to an array. A pointer to an array is not compatible with a pointer to a pointer. (More on double **a
below.)
A second problem arises if you need any temporary arrays inside a function. Often, a temporary array needs to be
the same size as the passed-in arrays being worked on. Unfortunately, there's no way to declare an array and say
``just make it as big as the passed-in array.''
It's easy to imagine a solution to all of the problems we've been discussing. To write a function accepting a
multidimensional array with the size specified by the caller, you might want to write
g(double a[m][n], int m, int n) /* won't work */
where the array dimensions are taken from other function parameters. (This is, of course, just how you do it in
FORTRAN.) To declare a temporary array, you might try things like
f(double a[n], int n) /* won't work */
{
double tmparray[n]; /* won't work */
...
}
or
But you can't do any of these things in ANSI C (C89), the dialect these notes assume. Every array dimension must be a compile-time constant. Array
parameters can't take their dimensions from other parameters, and local arrays can't take their dimensions from
parameters.
(It is worth noting that at least one popular compiler, namely gcc, does let you do these things as an extension: parameter and local
arrays can take their dimensions from parameters. The later C99 standard added ``variable-length arrays'' along the same lines, but not every compiler supports them, so if you rely on this feature, just be aware that your
program may not work under other compilers.)
How can we work around these limitations? The most powerful and flexible solution involves abandoning
multidimensional arrays and using pointers to pointers instead. This approach has the advantage that all
dimensions may be specified at run time, though it also has a few disadvantages. (Namely: you may have to
remember to free the arrays, especially if they're local; the intermediate pointers take up more space, and accessing
``array'' elements via pointers-to-pointers may be slightly less efficient than true subscript references. On the other
hand, on some machines, pointer accesses are more efficient. Another disadvantage is that our treatise on scientific
programming in C is about to digress into a treatise on memory allocation.)
To illustrate, here is a pair of functions for allocating simulated one- and two-dimensional arrays of double. (By
``simulated'' I mean that these are not true arrays as the compiler knows them, but because of the way arrays and
pointers work in C, we are going to be able to treat these pointers as if they were arrays, by using [] array
subscript notation.)
#include <stdio.h>
#include <stdlib.h>
double *alloc1d(int n)
{
double *dp = (double *)malloc(n * sizeof(double));
if(dp == NULL) {
fprintf(stderr, "out of memory\n");
exit(1);
}
return dp;
}
double **alloc2d(int m, int n)
{
int i;
double **dpp = (double **)malloc(m * sizeof(double *));
if(dpp == NULL) {
fprintf(stderr, "out of memory\n");
exit(1);
}
for(i = 0; i < m; i++) {
if((dpp[i] = (double *)malloc(n * sizeof(double))) == NULL) {
fprintf(stderr, "out of memory\n");
exit(1);
}
}
return dpp;
}
These routines print an error message and exit if they can't allocate the requested memory. Another approach
would be to have them return NULL, and have the caller check the return value and take appropriate action on
failure. (In this case, alloc2d would want to ``back out'' by freeing any intermediate pointers it had successfully
allocated.)
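The NULL-returning approach might look something like the following sketch. The name try_alloc2d is invented to keep it distinct from alloc2d above, and the code assumes m and n are both positive. The ``backing out'' happens in the inner while loop.

```c
#include <stdlib.h>

/* Like alloc2d, but returns NULL on failure instead of exiting.
   Assumes m > 0 and n > 0. */
double **try_alloc2d(int m, int n)
{
    int i;
    double **dpp = malloc(m * sizeof(double *));

    if (dpp == NULL)
        return NULL;

    for (i = 0; i < m; i++) {
        dpp[i] = malloc(n * sizeof(double));
        if (dpp[i] == NULL) {
            /* back out: free the rows already allocated, then the
               array of row pointers itself */
            while (--i >= 0)
                free(dpp[i]);
            free(dpp);
            return NULL;
        }
    }

    return dpp;
}
```

The caller is then responsible for checking the return value before using the array, and for deciding what to do if the allocation failed.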
Note that these routines allocate arrays of type double. They could be trivially rewritten to allocate arrays of
other types. (Note, however, that it is impossible in portable, standard C to write a routine which generically
allocates a multidimensional array of arbitrary type, or with an arbitrary number of dimensions.)
When a one-dimensional array allocated by alloc1d is no longer needed, it can be freed by passing the pointer to
the standard free routine. Freeing two-dimensional arrays allocated by alloc2d is a bit trickier, because all the
intermediate pointers need to be freed, too. Here is a function to do it:
void free2d(double **dpp, int m)
{
    int i;
    for(i = 0; i < m; i++)
        free((void *)dpp[i]);
    free((void *)dpp);
}
(The casts in the malloc and free calls in these routines are not necessary under ANSI C, but are included to
make it easier to port this code to a pre-ANSI compiler, if necessary.)
With these routines in place, we can dynamically allocate one- and two-dimensional arrays of double to our
heart's content. Here is an (artificial) example:
double *a1 = alloc1d(10);
double **a2 = alloc2d(10, 20);
a1[0] = 0;
a1[9] = 9;
a2[0][0] = 0.0;
a2[0][19] = 0.19;
a2[9][0] = 9.0;
a2[9][19] = 9.19;
Now, when we pass a two-dimensional array allocated by alloc2d to a function, that function will be declared as
accepting a pointer-to-a-pointer:
g(double **a)
When we need a temporary array within a function, we can allocate one with alloc1d or alloc2d; however,
we must remember to free it before the function returns. (Remember that malloc'ed memory persists until
explicitly freed, or until the program exits. Local variables disappear when the function they're local to returns,
but in the case of pointers to malloc'ed memory, that just means that we lose the pointer, and not that the pointed-to memory has been freed. We have to explicitly free it, before we lose the pointer.)
One last note: since
g1(double a[10][20])
is not compatible with
g2(double **a)
we will have to maintain a segregation between arrays-of-arrays and pointers-to-pointers. We can't write a single
function that operates on either a true multidimensional array or one of our pointer-to-pointer, simulated
multidimensional arrays. (Actually, there is one way, but it's more elaborate, and not strictly portable.)
That's about enough (for now, at least) about multidimensional arrays. We now turn to the subject of...
Floating-Point Precision
It's the essence of real numbers that there are infinitely many of them: not only infinitely large (in both the positive
and negative direction), but infinitely small and infinitely fine. Between any two real numbers like 1.1 and 1.2
there are infinitely many more numbers: 1.11, 1.12, etc., and also 1.101, 1.102, etc., and also 1.1001, 1.1002, etc.;
and etc. There is no way in a finite amount of space to represent the infinite granularity of real numbers. Most
computers use an approximation called floating point. In a floating point representation, we keep track of a base
number or ``mantissa'' with some degree of precision, and a magnitude or exponent. The parallel here with
scientific notation is close: The number actually represented is something like mantissa x 10^exponent.
(Actually, most computers use mantissa x 2^exponent, and we'll soon
see how this difference can be important.)
The advantage of a floating-point representation is that it lets us devote a finite amount of space to each value but
still do a decent job of approximating the real numbers we're interested in. If we have six decimal digits of
precision, we can represent numbers like 123.456 or 123456 or 0.123456 or 0.000000123456 or 123456000000.
However, we can not accurately represent numbers which require more than six significant digits: we can't talk
about numbers like 123456.123456 or 0.1234567 or even 1234567. If we try to represent numbers like these (i.e. if
they occur in our calculations), they'll be ``rounded off'' somehow. (``Rounded off'' is in quotes because we don't
always know whether the unrepresentable digits will be properly rounded up or down, or randomly rounded up or
down, or simply discarded. High-quality floating-point processors and subroutine libraries will try to do a good job
of rounding when a result cannot be exactly represented, but unfortunately there are some low-quality ones out
there.)
In simple circumstances, the finite precision of floating-point numbers is not a problem. Real-world inputs (i.e.
anything you measure) never have infinite precision; in fact they often have only two or three significant digits of
precision. Almost everyone has a vague understanding that floating-point representations have finite precision, so
it's easy to shrug off slight inaccuracies as ``roundoff error.'' If these errors always stayed confined to the least-significant digit, they wouldn't be much of a problem.
The problem is that error can accumulate. In the worst case, you can lose one (or more than one) significant digit in
each step of a calculation. If a calculation involves six steps, and you only have six significant digits to play with,
by the end of the calculation, you've got nothing of significance left. As a wise programmer once said, ``Floating
point numbers are like piles of sand. Every time you move them around, you lose a little sand.''
How can an error grow to involve more than the least-significant digit? Here is one simple example. Suppose we
are computing the sum 1+1+1+1+1+1+1+1+1+1+1000000, and we have only six significant digits to play with.
Clearly the answer is 1000010, which has six significant digits, so we're okay. But suppose we compute it in the
other order, as 1000000+1+1+1+1+1+1+1+1+1+1. In the first step, 1000000+1, the intermediate result is 1000001,
which has seven significant digits and so cannot be represented. It will probably be rounded back down to
1000000. So 1000000+1+1+1+1+1+1+1+1+1+1 = 1000000, and 1+1+1+1+1+1+1+1+1+1+1000000 !=
1000000+1+1+1+1+1+1+1+1+1+1. In other words, the associative law does not hold for floating-point addition.
Elementary school students often wonder whether it makes a difference which order you add the column of
numbers up in (and sometimes notice that it does, i.e. they sometimes make mistakes when they try it). But here
we see that, on modern digital computers, the folklore and superstition is true: it really can make a difference
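The ordering effect in the 1+1+...+1000000 example is easy to reproduce. Here is a small sketch using type float, which has only about seven decimal digits, so I have scaled the large value up to 100000000 to get the same effect. I am assuming the usual IEEE single-precision float; the function names are invented for this illustration.

```c
/* Adds ten 1's to a large value, in two different orders, using float.
   volatile forces each intermediate result to be rounded to float. */
float sum_big_first(void)
{
    volatile float sum = 100000000.0f;  /* 1e8: needs 9 digits, float has about 7 */
    int i;
    for(i = 0; i < 10; i++)
        sum = sum + 1.0f;               /* each 1 is too small to register */
    return sum;
}

float sum_small_first(void)
{
    volatile float sum = 0.0f;
    int i;
    for(i = 0; i < 10; i++)
        sum = sum + 1.0f;               /* the ten 1's add up exactly */
    sum = sum + 100000000.0f;           /* now the 10 is big enough to matter */
    return sum;
}
```

On such a machine the first function returns the large value unchanged, while the second returns something larger (though still not exactly 100000010, which float cannot represent either).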
Another thing to know about floating point numbers on computers is that, as mentioned above, they're usually
represented internally in base 2. It's well known that in base 10, fractions like 1/3 have infinitely-repeating decimal
representations. (In fact, if you truncate them, you can easily find anomalies like the one we discussed in the
previous paragraphs, such as that 0.333+0.333+0.333 is 0.999, not 1.) What's not so well known is that in base 2,
the fraction one tenth is an infinitely-repeating binary number: it's 0.00011001100110011... in base 2, where
the 0011 part keeps repeating. This means that almost any number that you thought was a ``clean'' decimal fraction
like 0.1 or 2.3 is probably represented internally as a messy infinite binary fraction which has been truncated or
rounded and hence is not quite the number you thought it was. (In fact, depending on how carefully routines like
atof and printf have been written, it's possible to have something like printf("%f", atof("2.3"))
print an ugly number like 2.299999.) Where you have to watch out for this is that if you multiply a number by 0.1,
it is not quite the same as dividing it by 10. Furthermore, if you multiply a number by 0.1 three times, it may be
different than if you'd multiplied it by 0.001 and different than if you'd divided it by 1000.
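If you'd like to see the multiply-by-0.1 anomaly for yourself, a test along these lines will do. It is only a sketch, assuming IEEE double-precision arithmetic; the function name is made up.

```c
/* Returns 1 if x * 0.1 and x / 10 produce the same double, 0 otherwise.
   volatile keeps the compiler from carrying extra precision in registers. */
int tenth_matches_division(double x)
{
    volatile double product = x * 0.1;
    volatile double quotient = x / 10.0;
    return product == quotient;
}
```

With IEEE doubles, the two agree for x = 1.0 but not for x = 3.0, where the product comes out one unit in the last place too high.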
My purpose in bringing up these anomalies is not to scare you off of doing floating point work forever, but to drive
home the point that floating point inaccuracies can be significant, and cannot be brushed off as simple uncertainty
about the least-significant digit. With care, the inaccuracies can be confined to the least-significant digit, and the
care involves making sure that errors do not proliferate and compound themselves. (Also, I should again point out
that none of these problems are specific to C; they arise any time floating-point work is done with finite precision.)
In practice, anomalies crop up when both large and small numbers participate in the same calculation (as in the
1+1+1+1+1+1+1+1+1+1+1000000 example), and when large numbers are subtracted, and when calculations are
iterated many times. One thing to beware of is that many common algorithms magnify the effects of these
problems. For example, the definition of the standard deviation is the square root of the variance (the mean squared deviation), namely
sqrt(sum(sqr(x[i] - mean)) / n)
where sum() and sqr() are summing and squaring operators (neither of which are standard C library functions,
of course) and where mean is
sum(x[i]) / n
(The statisticians out there know that the denominator in the first expression is not necessarily n but should usually
be n-1.)
Computing standard deviations from the definition can be a bit of a nuisance because you have to walk over the
numbers twice: once to compute the mean, and a second time to subtract it from each input number, square the
difference, and sum the squares. So it's quite popular to rearrange the expression to
sqrt((sum(sqr(x[i])) - sqr(sum(x[i]))/n) / n)
which involves only the sum of the x's and the sum of the squares of the x's, and hence can be completed in one
pass. However, it turns out that this method can in practice be significantly less accurate. By squaring each
individual x and accumulating sum(sqr(x[i])), the big numbers get bigger and the small numbers get
smaller, so the small numbers are more likely to get lost in the underflow. When you use the definition, it's only
the differences that get squared, so less of significance gets lost. (With the rearranged formula there is a further
hazard: if the true variance is near zero, the two large terms being subtracted can, due to accumulated error, leave
a slightly negative result, which is a problem when you try to take the square root.)
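The two approaches can be sketched as follows. These functions return the variance; take sqrt of the result to get the standard deviation. The names are invented for this illustration, n is assumed to be greater than zero, and clamping a negative result to zero in the one-pass version is just one possible way of coping with it.

```c
/* Two-pass variance, straight from the definition. */
double variance_two_pass(const double *x, int n)
{
    double sum = 0.0, mean, sumsq = 0.0;
    int i;
    for(i = 0; i < n; i++)
        sum += x[i];
    mean = sum / n;
    for(i = 0; i < n; i++)
        sumsq += (x[i] - mean) * (x[i] - mean);
    return sumsq / n;
}

/* One-pass variance, from the rearranged formula. */
double variance_one_pass(const double *x, int n)
{
    double sum = 0.0, sumsq = 0.0, v;
    int i;
    for(i = 0; i < n; i++) {
        sum += x[i];
        sumsq += x[i] * x[i];
    }
    v = (sumsq - sum * sum / n) / n;
    return v < 0.0 ? 0.0 : v;   /* clamp: roundoff can push this below zero */
}
```

For small, well-behaved inputs the two agree. Try them on values like 1000000001, 1000000002, 1000000003 to see the one-pass version go astray.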
Other standard mathematical manipulations to watch out for are:
The quadratic formula:
x = (-b +/- sqrt(b^2 - 4ac)) / (2a)
If b^2 and 4ac are both large, there may be very few significant digits left in the discriminant
b^2 - 4ac.
Solving systems of equations, inverting matrices, computing the determinant of matrices:
The standard mathematical techniques involve large numbers of compounded operations which can lose a lot of
accuracy.
Iterated simulations:
Obviously, any iteration is prone to compounded errors.
Trigonometric reductions:
The only sane ways of computing trig functions numerically (i.e. the ways likely to be used by library functions
such as sin, cos, and tan) work only for the one period of the function near 0. These functions will of course
work on arguments outside the interval (-pi, pi), but the function must first reduce the arguments by subtracting
some multiple of 2*pi, and as we've seen, taking the difference of two large numbers (whether you do it or the
library function does it) can lose all significance.
Compounded interest:
The standard formula is something like
principal x (1 + rate)^n
but raising a number near 1 to a large power can be inaccurate.
Converting strings to numbers:
[okay, so this is a computer science problem, not a mathematical problem] If you handle digits to the right of the
decimal by repeatedly multiplying by 0.1, you'll lose accuracy.
For some of these problems, using double instead of single precision makes the difference, but for others, great
care and rearrangement of the algorithms may be required for best accuracy.
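As an example of such a rearrangement, here is one way to handle the string-conversion case from the list above: accumulate the digits after the decimal point as a whole number, and divide once at the end, rather than multiplying by 0.1 at each step. This is only a sketch with a made-up function name, and it stays exact only for a modest number of digits (while the accumulated values still fit exactly in a double).

```c
#include <ctype.h>

/* Converts the digits after a decimal point (e.g. "125" for .125)
   by accumulating a whole number and dividing once at the end. */
double fraction_from_digits(const char *digits)
{
    double value = 0.0;     /* the digits, read as an integer */
    double scale = 1.0;     /* 10 to the number of digits seen */

    while(isdigit((unsigned char)*digits)) {
        value = value * 10.0 + (*digits - '0');
        scale *= 10.0;
        digits++;
    }
    return value / scale;   /* a single rounding step */
}
```

Since both value and scale are exact here, the only rounding error is the one introduced by the final division.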
Returning to C, you might like to know that type float is guaranteed to give you the equivalent of at least 6
decimal digits of significance, and double (which is essentially C's version of the ``double precision'' mentioned
in the preceding paragraph) is guaranteed to give you at least 10. Both float and double are guaranteed to
handle exponents in at least the range -37 to +37.
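Those are only the guaranteed minimums. The values your compiler actually provides are recorded in the standard header <float.h>, and a sketch like this one will report them. (The function names are mine; the FLT_ and DBL_ macros are standard.)

```c
#include <stdio.h>
#include <float.h>

/* Decimal digits of precision the implementation actually provides. */
int float_digits(void)  { return FLT_DIG; }
int double_digits(void) { return DBL_DIG; }

/* Prints the precision and largest decimal exponent of both types. */
void print_float_limits(void)
{
    printf("float:  %d digits, exponents up to %d\n", FLT_DIG, FLT_MAX_10_EXP);
    printf("double: %d digits, exponents up to %d\n", DBL_DIG, DBL_MAX_10_EXP);
}
```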
One other point relating to numerical work and specific to C is that C does not have a built-in exponentiation
operator. It does have the library function pow, which is the right thing to use for general exponents, but don't use
it for simple squares or cubes. (Use simple multiplication instead.) pow is often implemented using the identity
Copyright
This collection of hypertext pages is Copyright 1995-1996 by Steve Summit. This material may be freely
redistributed and used but may not be republished or sold without permission.
When the contents of a web page are not static, but rather are generated on-the-fly as the page is fetched, a program runs
to generate the contents. When that program requires input from the client who is actually fetching the page (input such as
the selections made when filling out a form) that input is propagated to the program via the Common Gateway Interface,
or CGI.
Programs which generate web pages on-the-fly (commonly called ``CGI programs'') can be written in any language. Perl,
an interpreted or scripting language, is currently the language of choice for writing CGI programs (also called ``CGI
scripts'' if they're written in a scripting language), but it's eminently feasible, not difficult, and possibly advantageous to
write them in C, which the rest of this handout will describe how to do.
The basic operation of a CGI program is quite simple. Since its job is to generate a web page on the fly, that's just what it
must do, by ``printing'' text to its standard output. When we're writing a CGI program in C, therefore, we'll be calling
printf a lot, to generate the text we want the virtual page (that is, the one we're building) to contain. Some of the
printf calls will print constant or ``boilerplate'' text; for example, the first few lines of almost any C-written CGI
program will be along the lines of
printf("Content-Type: text/html\n\n");
printf("<html>\n");
printf("<head>\n");
The <html> and <head> tags are the ones that appear at the beginning of any HTML page. The first line informs the
receiving browser that the page is encoded in HTML. For static pages, that line is automatically added by the web server
(often as a function of the file name, i.e. if it ends in .html or perhaps .htm), which is why you never see it in static
HTML files. But since CGI programs are generating pages on the fly, they must include that line explicitly. (The content
type is almost invariably text/html, unless you're getting really fancy and generating images on the fly. Notice that the
first, Content-Type: line is followed by two newlines, creating a blank line which separates this ``header'' from the
rest of the HTML document.)
Of course, not all of the printf calls will print constant text (or else the resulting page will essentially be static, and
needn't have been generated by a CGI program at all). Therefore, many of the printf calls will typically insert some
variable text into the output. For example, if title is a string variable containing a title which has been generated for the
generated page, one of the next lines in the CGI program might be
printf("<title>%s</title>\n", title);
Many of the decisions controlling the output of a particular run of a CGI program will of course be made based on
selections made by the requesting user (perhaps by having filled out a form). The most important thing to understand
about CGI programming (in fact, the very aspect of CGI programming which gives it its name, that is, the aspect which
the CGI specification specifies) is the set of mechanisms by which the user's choices and other information are made
available to a CGI program.
One mechanism is by means of the environment. The environment is a set of variables maintained, not by any one
program, but by the operating system itself, on behalf of a user or process. For example, on MS-DOS and Unix systems,
the PATH variable contains a list of directories to be searched for executable programs. On Unix systems, other common
environment variables are HOME (the user's home directory), TERM (the user's terminal type) and MAIL (the user's
electronic mailbox).
Environment variables are completely separate from the variables used in programs written in typical programming
languages, such as C. When a C program wishes to access an environment variable, it does not simply declare a C variable
of the same name, and hope that the value will somehow be magically linked into the program. Instead, a C program must
call the library function getenv to fetch environment variables. The getenv function accepts a string which is the name
of an environment variable to be fetched, and returns a string which is the contents of the variable. (Environment variables
are always strings.) If the requested environment variable does not exist, getenv returns a null pointer, instead.
For example, here is a scrap of code which would print out a user's terminal type (presumably on a Unix system):
char *termtype;
termtype = getenv("TERM");
if(termtype != NULL)
    printf("your terminal type is \"%s\"\n", termtype);
else
    printf("no terminal type defined\n");
Two things to be careful of when calling getenv are
1. As already mentioned, it returns a null pointer if the requested variable doesn't exist, and this null pointer return
must always be checked for; and
2. You shouldn't modify (scribble on) the string it returns, that is, you should treat it as a constant string.
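If you do need to modify the value of an environment variable, make a copy first and modify the copy. Here is a minimal sketch of a helper which takes care of both points; the name env_copy is made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed private copy of an environment variable,
   or NULL if it is unset or memory runs out.  Caller frees the result. */
char *env_copy(const char *name)
{
    const char *value = getenv(name);
    char *copy;

    if(value == NULL)
        return NULL;            /* variable doesn't exist */
    copy = malloc(strlen(value) + 1);
    if(copy != NULL)
        strcpy(copy, value);
    return copy;
}
```

The caller still has to check for a null pointer, but is then free to scribble on the returned string.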
A web daemon (a.k.a. HTTP server or httpd) which supports CGI passes a host of environment variables to a CGI
program. Many of them are somewhat esoteric, and this handout will not describe them all. (See the References at the
end.) The ones which are most likely to be of use to ordinary CGI programs are:
REQUEST_METHOD The specific request that the client used to fetch the page (that is, to invoke the CGI
program): usually GET or POST.
PATH_INFO Any extra path information that appeared after the program name in the URL that invoked this CGI
program.
PATH_TRANSLATED The path information from PATH_INFO, with any ``virtual-to-physical mapping''
performed on it.
SCRIPT_NAME The actual name of the CGI program that is running, as a URL, in case the generated page needs
to reference itself.
QUERY_STRING The query being performed by the user. In the case of form processing, the QUERY_STRING
variable contains encoded information about each field filled in by the user. (We'll have much more to say about
QUERY_STRING in a bit.)
REMOTE_HOST The hostname of the client (if known).
CONTENT_TYPE The HTTP/MIME content type of the attached information, for queries (such as POST) with
attached information.
CONTENT_LENGTH The size of the attached information, if any.
HTTP_USER_AGENT An indication of the name and version of the client browser in use.
HTTP_ACCEPT The specification by the client browser of which document types it will accept.
For simple uses of the above-listed variables, it suffices to call getenv and then inspect the returned string (if any). For
example, you could generate different output depending on which browser a user is using with code like this:
char *browser = getenv("HTTP_USER_AGENT");
if(browser != NULL && strstr(browser, "Mozilla") != NULL)
    printf("I see you're using Netscape, just like everyone else.\n");
else
    printf("Congratulations on your daring choice of browser!\n");
(However, this is not a terribly good example. Building knowledge of specific browsers into web pages and servers is a
risky proposition, as there will always be more browsers out there than you've heard of. Doing so also flies in the face of
the notion of interoperability, which is one of the very foundations of the Internet. If the protocols are well designed, and
if all clients and servers implement them properly, no specific server should ever need to know what kind of client it's
talking to, or vice versa.)
Environment variables are not the only input mechanism available in CGI. It's also possible for input to arrive via the
standard input, to be read with getchar or the like. In fact, in production CGI programming, the standard input is
heavily used, because it's where the user's query information arrives when the retrieval method (that is,
REQUEST_METHOD) is POST. Because it allows essentially unlimited-size requests, POST is strongly recommended as
the request method for HTML forms. It will be a bit easier, however, to describe and demonstrate the processing of query
information if we restrict ourselves to the environment variables (specifically, QUERY_STRING), so our examples will
use the request method of GET (which causes request information to be sent as a query string rather than on standard
input).
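For comparison, here is roughly what the POST side might look like: the CONTENT_LENGTH variable says how many bytes to expect, and that many bytes are then read from the standard input. This is only a sketch; both function names are invented here, and a production program would also want to impose some upper limit on the length it is willing to accept.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parses a CONTENT_LENGTH string; returns -1 if it is missing or malformed. */
long parse_content_length(const char *s)
{
    char *end;
    long n;

    if(s == NULL || *s == '\0')
        return -1;
    n = strtol(s, &end, 10);
    if(*end != '\0' || n < 0)
        return -1;
    return n;
}

/* Reads exactly len bytes from fp into a malloc'ed, 0-terminated buffer.
   Returns NULL on a short read or allocation failure.  Caller frees. */
char *read_body(FILE *fp, long len)
{
    char *buf;

    if(len < 0)
        return NULL;
    buf = malloc((size_t)len + 1);
    if(buf == NULL)
        return NULL;
    if(fread(buf, 1, (size_t)len, fp) != (size_t)len) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A CGI program would call these as read_body(stdin, parse_content_length(getenv("CONTENT_LENGTH"))). The resulting string has the same name=value&name=value syntax as QUERY_STRING.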
For simple queries (such as <ISINDEX> searches), the QUERY_STRING environment variable simply contains the query
string (although it will have been encoded; see below). For more complicated queries, however, and in particular for
queries resulting from filled-out and submitted HTML forms, the value of the QUERY_STRING variable is a complicated
string, with substructure, containing potentially many pieces of information. The basic syntax of the QUERY_STRING
string in this case is
name=value&name=value&name=value
The string is a series of name=value pairs, separated by ampersand characters. Each name is the name attribute from
one of the input elements in the form; the value is either the text typed by the user for an element with a type attribute
of text, password, or textarea, or one of the <option> values from a <select> element, or the value
attribute of a selected element of type checkbox or radio (or ``on'' for selected checkbox elements without value
attributes).
What if some text typed by the user (that is, one of the values) happened to contain an = or & character? That would
certainly screw up the syntax of the QUERY_STRING string. To guard against this problem, and to keep the query string a
single string, it is encoded in two ways:
1. All spaces are replaced by + signs.
2. All = and & characters, and any other special characters (such as + and % characters) are replaced by a percent sign
(%) followed by a two-digit hexadecimal number representing the character's value in ASCII. For example, = is
represented by %3d, & is represented by %26, + is represented by %2b, % is represented by %25, and ~ is often
represented by %7e.
A CGI program must arrange to decode (that is, undo) these encodings before attempting to make further use of the
information in the QUERY_STRING variable.
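A CGI program normally only has to decode, since the browser does the encoding, but it may help to see the two encoding rules written out as code. This is an illustrative sketch, not any particular browser's algorithm: the function name is made up, and the choice to pass letters, digits, and the characters - _ . through unchanged (and to encode everything else) is my assumption.

```c
#include <ctype.h>
#include <string.h>

/* Encodes src into dest (capacity destsize, including the '\0').
   Returns dest, or NULL if dest is too small. */
char *encode_value(const char *src, char *dest, size_t destsize)
{
    static const char hex[] = "0123456789abcdef";
    size_t used = 0;

    for(; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if(isalnum(c) || strchr("-_.", c) != NULL) {
            if(used + 1 >= destsize)
                return NULL;
            dest[used++] = (char)c;         /* safe character: copy as-is */
        } else if(c == ' ') {
            if(used + 1 >= destsize)
                return NULL;
            dest[used++] = '+';             /* rule 1: space becomes + */
        } else {
            if(used + 3 >= destsize)
                return NULL;
            dest[used++] = '%';             /* rule 2: % and two hex digits */
            dest[used++] = hex[c >> 4];
            dest[used++] = hex[c & 0x0f];
        }
    }
    if(destsize == 0)
        return NULL;
    dest[used] = '\0';
    return dest;
}
```

Given the text "a b=c&d", this produces "a+b%3dc%26d", which can safely be placed after a name= in a query string.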
We're going to write a function to centralize the parsing of the QUERY_STRING variable, for three reasons:
1. Splitting the string into name/value pairs is a bit of a nuisance, and is best isolated from the calling code;
2. Decoding the + and % encodings is a bit of a nuisance, and is best isolated from the calling code; and
3. If we do it right, we can arrange that improving our programs later to use POST instead of GET (that is, to retrieve
query information from the standard input rather than the QUERY_STRING variable) will require rewriting only
this one function, not the rest of the program.
<input type="submit">
<p>
(Or press this button to clear this form:
<input type="reset">)
</form>
</body>
The bulk of this page is a form, with one text input area, one set of three radio buttons, a submit button, and a clear button.
The user is supposed to type some text, select a radio button representing one of three transformations to be applied to the
text, and finally press the ``Submit'' button. Notice that the four <input> tags all have name attributes; these are the
names we'll retrieve the values by. (The names of the three radio button elements are the same; this is how the browser
knows to implement the one-of-n functionality on the group of three.)
The form header is
<form action="/cgi-bin/sillycgi" method="get">
If you try this form out, you may have to modify the URL in the action attribute, depending on how CGI programs are
accessed under your server. The method attribute is specified as ``get'', which means that our CGI program will receive
the user's form inputs in the QUERY_STRING variable.
Here is the CGI program itself. It's rather small and simple, because cgigetval does most of the hard work, which is as
it should be.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern char *cgigetval(char *);

main()
{
    char *browser;
    char *edittype;
    char *text;

    printf("Content-Type: text/html\n\n");
    printf("<html>\n");
    printf("<head>\n");
    printf("<title>CGI test result</title>\n");
    printf("</head>\n");
    printf("<body>\n");

    browser = getenv("HTTP_USER_AGENT");
    printf("<p>\n");
    printf("Hello, ");
    if(browser != NULL && strstr(browser, "Lynx") != NULL)
        printf("Lynx");
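#include <ctype.h>

void upperstring(char *str)
{
    char *p;
    for(p = str; *p != '\0'; p++)
    {
        /* the <ctype.h> functions expect unsigned char values */
        if(islower((unsigned char)*p))
            *p = toupper((unsigned char)*p);
    }
}

void lowerstring(char *str)
{
    char *p;
    for(p = str; *p != '\0'; p++)
    {
        if(isupper((unsigned char)*p))
            *p = tolower((unsigned char)*p);
    }
}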
The isupper, islower, toupper, and tolower functions are all in the standard C library, and are declared in the
header file <ctype.h>. They operate on characters, with the obvious results. (Actually, on a modern, ANSI-compatible
system, the calls to isupper and islower are not required, but they don't hurt much. Older systems required them.)
Finally, here is the code for cgigetval. It is somewhat long and involved, partly because it has a significant amount of
work to do, partly because of some stubborn details of the sort that sometimes crop up in real-world programs.
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *unescstring(char *, int, char *, int);

char *cgigetval(char *fieldname)
{
    int fnamelen;
    char *p, *p2, *p3;
    int len1, len2;
    static char *querystring = NULL;

    if(querystring == NULL)
    {
        querystring = getenv("QUERY_STRING");
        if(querystring == NULL)
            return NULL;
    }
    if(fieldname == NULL)
        return NULL;
    fnamelen = strlen(fieldname);

    for(p = querystring; *p != '\0';)
    {
        p2 = strchr(p, '=');
        p3 = strchr(p, '&');
        if(p3 != NULL)
            len2 = p3 - p;
        else
            len2 = strlen(p);

        if(p2 == NULL || (p3 != NULL && p2 > p3))
        {
            /* no = present in this field; skip it (and its trailing &) */
            p += len2;
            if(*p == '&')
                p++;
            continue;
        }

        len1 = p2 - p;
        if(len1 == fnamelen && strncmp(fieldname, p, len1) == 0)
        {
            /* found it */
            int retlen = len2 - len1 - 1;
            char *retbuf = malloc(retlen + 1);
            if(retbuf == NULL)
                return NULL;
            unescstring(p2 + 1, retlen, retbuf, retlen+1);
            return retbuf;
        }

        p += len2;
        if(*p == '&')
            p++;
    }

    /* never found it */
    return NULL;
}
cgigetval presents a perfect example of a topic which we mentioned in passing but never gave any realistic examples
of: local static variables. cgigetval's job is to parse the value of the QUERY_STRING environment variable, but the
value of that variable will remain constant over one run of a program calling cgigetval. Therefore, we only need to
call getenv once, the first time cgigetval is called. Thereafter, we can continue to use the previous value.
cgigetval maintains the value of QUERY_STRING in a local static variable, querystring. This variable is
initialized to NULL, so the first time cgigetval is called, it notices that querystring is null, and calls getenv to
properly initialize it. The whole point of a local static variable is that it retains its value from call to call, so on subsequent
calls to cgigetval, querystring will not be null, and there will be no need to call getenv again.
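Stripped of everything else, the behavior of a local static variable can be seen in a tiny function like this one (invented purely for illustration):

```c
/* Counts how many times it has been called.  The static variable is
   initialized once, before the first call, and keeps its value afterwards. */
int call_count(void)
{
    static int count = 0;
    count++;
    return count;
}
```

The first call returns 1, the second 2, and so on. Without the keyword static, count would be re-initialized on every call, and the function would always return 1.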
The main loop in cgigetval loops over each name/value pair in the query string, one pair at a time. The standard
http://www.eskimo.com/~scs/cclass/handouts/cgi.html (8 of 10) [22/07/2003 5:37:08 PM]
library strchr function (declared in <string.h>) is used to locate the next = and & characters in the string. It would
be easier to overwrite the & and/or = characters with '\0', to isolate the name and value substrings as true 0-terminated
strings, but we can't modify querystring (which was obtained from getenv, and which we'll be using again next
time we're called). Instead, we compute the lengths of the entire name/value pair we're looking at and the name within that
pair (in len2 and len1 respectively), and use strncmp to compare each name against the name we're looking for.
(strncmp is a standard library function like strcmp, except that it only compares the first n characters.)
When (if) we find a matching name, it's time to decode and return the value. First, we call malloc to allocate the return
buffer, remembering to add 1 to the length, to leave room for the terminating \0. We conservatively estimate the size of
the buffer we'll need by using the length of the encoded string. If there are any % encodings, this means we'll allocate more
than we need (the decoded string will end up shorter than the encoded string), but it's too hard to figure out in advance
exactly how big a buffer we'd need, and in this case, the slight amount of wasted memory should be inconsequential.
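For the curious, computing the exact size takes one extra pass over the encoded text, along the lines of the sketch below. Whether that is worth it to save a few bytes is a judgment call. The function name is made up, and treating a % with fewer than two characters after it as an ordinary character is an assumption on my part.

```c
/* Returns the number of characters the decoded form of an encoded
   substring will occupy (not counting the terminating '\0'). */
int decoded_length(const char *src, int srclen)
{
    int i = 0;
    int n = 0;

    while(i < srclen) {
        if(src[i] == '%' && i + 2 < srclen)
            i += 3;     /* %xx collapses to one character */
        else
            i++;        /* + and ordinary characters stay one character */
        n++;
    }
    return n;
}
```

For the encoded text a%20b+c (7 characters) this returns 5, so a 6-byte buffer would do.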
If we don't find a matching string, we take another trip through the main loop. Each trip through the loop, p points at the
remainder of the string we're to examine. After discarding a name/value pair, therefore, we add len2 to p, which either
moves it to the & or to the \0 at the very end of querystring. If we're now pointing at an &, we increment p by one
more so that it points to the first character of the next name/value pair.
Here is the unescstring function. It accepts a pointer to the encoded string, the length of the encoded string, a pointer
to the location to store the decoded result, and the maximum size of the decoded result (so that it can double-check it's not
overflowing anything). We pass the encoded string in as a pointer plus a length, rather than as a 0-terminated string,
because in most cases it will actually be an unterminated substring, and the length we receive will be such that we stop
decoding just before the & that separates one name/value pair from the next.
static int xctod(int);

char *unescstring(char *src, int srclen, char *dest, int destsize)
{
    char *endp = src + srclen;
    char *srcp;
    char *destp = dest;
    int nwrote = 0;

    for(srcp = src; srcp < endp; srcp++)
    {
        /* leave room for this character and the terminating \0 */
        if(nwrote >= destsize - 1)
            return NULL;

        if(*srcp == '+')
            *destp++ = ' ';
        else if(*srcp == '%' && srcp + 2 < endp)
        {
            /* decode only when both hex digits lie inside the substring */
            *destp++ = 16 * xctod((unsigned char)srcp[1])
                        + xctod((unsigned char)srcp[2]);
            srcp += 2;
        }
        else
            *destp++ = *srcp;
        nwrote++;
    }

    *destp = '\0';
    return dest;
}

static int xctod(int c)
{
    if(isdigit(c))
        return c - '0';
    else if(isupper(c))
        return c - 'A' + 10;
    else if(islower(c))
        return c - 'a' + 10;
    else
        return 0;
}
The basic operation is simple: copy characters one by one from the source to the destination, converting + signs to spaces,
converting % sequences, and passing other characters through unchanged. It's mildly tricky to convert a % sequence to the
equivalent character value. Here, we use an auxiliary function xctod which converts one hexadecimal digit to its decimal
equivalent. (xctod is declared static because it's private to unescstring and to this source file.) xctod converts
the digit characters 0-9 to the values 0-9, and a-f or A-F to the values 10-15. (It does so by subtracting character
constants which provide the correct offsets, without our having to know the particular character set values.)
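As an aside, here is a sketch of a stricter variant of the digit conversion. It is not part of the program above: the name hexval is hypothetical, and it assumes we would rather get -1 back for a character that is not a hexadecimal digit than have it quietly treated as some value. Like xctod, it relies on the letters a through f being adjacent in the character set.

```c
#include <ctype.h>

/* hypothetical helper, not part of the original program */
int hexval(int c)
{
    if(!isxdigit(c))
        return -1;              /* not a hexadecimal digit */
    if(isdigit(c))
        return c - '0';
    return tolower(c) - 'a' + 10;
}
```

A caller could then notice a malformed % sequence by checking for a negative return value before combining the two digits.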
It would also be possible to convert the hexadecimal digits by calling sscanf, and thus dispense with xctod. The code
would look like this:
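else if(*srcp == '%')
{
    unsigned int x = 0;
    sscanf(srcp+1, "%2x", &x);
    *destp++ = x;
    srcp += 2;
}
The %2x in the sscanf format string means to convert a hexadecimal integer that's at most two characters long. (The %x format stores its result through a pointer to unsigned int, which is why x is declared that way.)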
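To see how the field width behaves, here is a small sketch. The helper name hexpair is hypothetical; it assumes we want -1 back when no hexadecimal digits can be converted at all, which we detect by checking sscanf's return value.

```c
#include <stdio.h>

/* hypothetical helper: value of up to two hex digits at s, or -1 */
int hexpair(const char *s)
{
    unsigned int x;

    if(sscanf(s, "%2x", &x) != 1)
        return -1;              /* no hexadecimal digits found */
    return (int)x;
}
```

Handed "41BC", this stops after the first two digits and yields 0x41, or 65. Handed "7&", it converts the single digit and yields 7, so the width is a maximum, not a requirement.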
And that's our introduction to writing CGI programs in C! I encourage you to type in the example and play with it
(assuming you have access to a web server where you can install CGI programs at all, of course.) If you would like more
information on CGI programming, you can read the page http://hoohoo.ncsa.uiuc.edu/cgi/overview.html (which is where I
learned everything I've presented here).
Week 1
Topics: basics -- simple example programs
Assignment
Answers
Week 2
Topics: data types and constants; variables, identifiers, and declarations; operators and expressions
Assignment
Answers
Week 3
Topics: arrays, more about declarations (incl. initialization), functions
Assignment
Answers
Week 4
Topics: assignment, increment, and decrement operators; character I/O; strings
Assignment
Answers
Week 5
Topics: C preprocessor
Assignment
Answers
Week 6
Topics: pointers, strings
Assignment
Answers
Week 7
Topics: array/pointer equivalence, memory allocation, more about strings
Assignment
Answers
Week 8
Topics: command line, file I/O
One last handout: notes, Chapter 14
Assignment #1
Introductory C Programming
UW Experimental College
Assignment #1
Handouts:
Syllabus
Assignment #1
A Short Introduction to Programming
A Brief Refresher on Some Math Often Used in Computing
Class Notes, Chapter 1
Class Notes, Chapter 2
Available Compiler Options
Common Compiler Error Messages and Other Problems
Compiling C Programs: Step by Step
Reading Assignment:
A Short Introduction to Programming
Class Notes, Chapter 1
Class Notes, Chapter 2 (optional)
Compiling C Programs: Step by Step
Review Questions:
1. Approximately what is the line #include <stdio.h> at the top of a C source file for?
2. What are some uses for comments?
3. Why is indentation important? How carefully does the compiler pay attention to it?
4. What are the largest and smallest values that can be reliably stored in a variable of type int?
5. What is the difference between the constants 7, '7', and "7"?
6. What is the difference between the constants 123 and "123"?
7. What is the function of the semicolon in a C statement?
instructions in the notes should get you started. If you're using a commercial compiler on a home
computer, the compiler's instruction manuals should (really, must) tell you how to enter, compile, and run
a program. (In either case, the ``Compiling C Programs'' handout should help, too.)
Figure out how to print both the program's source code and its output, or how to send me both the source
code and the output by e-mail.
2. What do these loops print?
for(i = 0; i < 10; i = i + 2)
printf("%d\n", i);
for(i = 100; i >= 0; i = i - 7)
printf("%d\n", i);
for(i = 1; i <= 10; i = i + 1)
printf("%d\n", i);
for(i = 2; i < 100; i = i * 2)
printf("%d\n", i);
3. Write a program to print the numbers from 1 to 10 and their squares:
1    1
2    4
3    9
...
10   100
Don't use ten printf statements; use two nested loops instead. You'll have to use braces around the
body of the outer loop if it contains multiple statements:
for(i = 1; i <= 10; i = i + 1)
{
/* multiple statements */
/* can go in here */
}
(Hint: a string you hand to printf does not have to contain the newline character \n.)
Copyright
This collection of hypertext pages is Copyright 1995-9 by Steve Summit. This material may be freely
redistributed and used but may not be republished or sold without permission.
Assignment #1 Answers
Introductory C Programming
UW Experimental College
Assignment #1 ANSWERS
Question 1. Approximately what is the line #include <stdio.h> at the top of a C source file for?
In the case of our first few programs, it lets the compiler know some important information about the
library function, printf. It also lets the compiler know similar information about other functions in the
``Standard I/O library,'' some of which we'll be learning about later. (It also provides a few I/O-related
definitions which we'll be using later.) We'll learn more about the #include directive when we cover
the C Preprocessor in a few weeks.
Question 2. What are some uses for comments?
Describing what a particular variable is for, describing what a function is for and how it works,
documenting the name, version number, purpose, and programmer of an entire program, explaining any
tricky or hard-to-understand part about a program, ...
Question 3. Why is indentation important? How carefully does the compiler pay attention to it?
Indentation is important to show the logical structure of source code, in particular, to show which parts--the ``bodies'' of loops, if/else statements, and other control flow constructs--are subsidiary to which
other parts.
Indentation is not specially observed by the compiler; it treats all ``white space'' the same. Code which is
improperly indented will give a human reader a mistaken impression of the structure of the code, an
impression potentially completely different from the compiler's. Therefore, it's important that the
punctuation which describes the block structure of code to the compiler matches the indentation.
Question 4. What are the largest and smallest values that can be reliably stored in a variable of type
int?
32,767 and -32,767.
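Those are only the limits you can count on everywhere; many compilers give int a wider range. As a sketch, assuming the standard header <limits.h> (which we haven't covered yet), a function like this reports the actual range on your machine. The name print_int_range is made up for illustration.

```c
#include <stdio.h>
#include <limits.h>

/* hypothetical helper: report the range of int on this compiler */
void print_int_range(void)
{
    printf("int ranges from %d to %d\n", INT_MIN, INT_MAX);
}
```

Whatever numbers it prints, a portable program should still assume no more than the range given in the answer.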
Question 5. What is the difference between the constants 7, '7', and "7"?
The constant 7 is the integer 7 (the number you get when you add 3 and 4). The constant '7' is a
character constant consisting of the character `7' (the key between `6' and `8' on the keyboard, which also
has a `&' on it on mine). The constant "7" is a string constant consisting of one character, the character
`7'.
Question 6. What is the difference between the constants 123 and "123"?
The constant 123 is the integer 123. The constant "123" is a string constant containing the characters
1, 2, and 3.
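If you'd like to see these differences for yourself, here is a rough sketch. It borrows strlen and sizeof, which we won't meet until later, and the function name show_sevens is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* hypothetical helper: show how 7, '7', and "7" differ */
void show_sevens(void)
{
    printf("7 + 1 is %d\n", 7 + 1);                  /* integer arithmetic */
    printf("'7' - '0' is %d\n", '7' - '0');          /* digit character to its value */
    printf("\"7\" has length %d\n", (int)strlen("7"));
    printf("\"7\" occupies %d chars\n", (int)sizeof("7"));   /* counts the \0 */
}
```

The string "7" holds one visible character but occupies two, because every string constant ends with a hidden \0 character.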
Question 7. What is the function of the semicolon in a C statement?
It is the statement terminator.
Exercise 3. Write a program to print the numbers from 1 to 10 and their squares.
#include <stdio.h>
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
printf("%d %d\n", i, i * i);
return 0;
}
Exercise 4. Write a program to print a simple triangle of asterisks.
#include <stdio.h>
int main()
{
int i, j;
for(i = 1; i <= 10; i = i + 1)
{
for(j = 1; j <= i; j = j + 1)
printf("*");
printf("\n");
}
return 0;
}
Assignment #2
Introductory C Programming
UW Experimental College
Assignment #2
Handouts:
Assignment #2
Assignment #1 Answers
Class Notes, Chapter 3
Reading Assignment:
Class Notes, Chapter 2
Class Notes, Chapter 3, Secs. 3.1-3.2
Class Notes, Chapter 3, Secs. 3.3-3.6 (optional)
Review Questions:
1. What are the two different kinds of division that the / operator can do? Under what circumstances does
it perform each?
2. What are the definitions of the ``Boolean'' values true and false in C?
3. Name three uses for the semicolon in C.
4. What would the equivalent code, using a while loop, be for the example
for(i = 0; i < 10; i = i + 1)
printf("i is %d\n", i);
?
5. What is the numeric value of the expression 3 < 4 ?
6. Under what conditions will this code print ``water''?
if(T < 32)
printf("ice\n");
else if(T < 212)
printf("water\n");
else
printf("steam\n");
7. What would this code print?
int x = 3;
if(x)
printf("yes\n");
else
printf("no\n");
8. (trick question) What would this code print?
int i;
for(i = 0; i < 3; i = i + 1)
printf("a\n");
printf("b\n");
printf("c\n");
Tutorial Section
(This section presents a few small but complete programs and asks you to work with them. If you're
comfortable doing at least some of the exercises later in this assignment, there's no reason for you to work
through this ``tutorial section.'' But if you're not ready to do the exercises yet, this section should give you some
practice and get you started.)
1. Type in and run this program, and compare its output to that of the original ``Hello, world!'' program
(exercise 1 in assignment 1).
#include <stdio.h>
int main()
{
printf("Hello, ");
printf("world!\n");
return 0;
}
You should notice that the output is identical to that of the original ``Hello, world!'' program. This shows
that you can build up output using multiple calls to printf, if you like. You mark the end of a line of
output (that is, you arrange that further output will begin on a new line) by printing the ``newline''
character, \n.
2. Type in and run this program:
#include <stdio.h>
int main()
{
int i;
printf("statement 1\n");
printf("statement 2\n");
for(i = 0; i < 10; i = i + 1)
{
printf("statement 3\n");
printf("statement 4\n");
}
printf("statement 5\n");
return 0;
}
This program doesn't do anything useful; it's just supposed to show you how control flow works--how
statements are executed one after the other, except when a construction such as the for loop alters the
flow by arranging that certain statements get executed over and over. In this program, each simple
statement is just a call to the printf function.
Now delete the braces {} around statements 3 and 4, and re-run the program. How does the output
change? (See also question 8 above.)
3. Type in and run this program:
#include <stdio.h>
int main()
{
int i, j;
printf("start of program\n");
for(i = 0; i < 3; i = i + 1)
{
printf("i is %d\n", i);
for(j = 0; j < 5; j = j + 1)
printf("i is %d, j is %d\n", i, j);
printf("end of i = %d loop\n", i);
}
printf("end of program\n");
return 0;
}
This program doesn't do much useful, either; it's just supposed to show you how loops work, and how
loops can be nested. The outer loop runs i through the values 0, 1, and 2, and for each of these three
values of i (that is, during each of the three trips through the outer loop) the inner loop runs, stepping
the variable j through 5 values, 0 to 4. Experiment with changing the limits (initially 3 and 5) on the
two loops. Experiment with interchanging the two loops, that is, by having the outer loop manipulate the
variable j and the inner loop manipulate the variable i. Finally, compare this program to the triangle-printing program of Assignment 1 (exercise 4) and its answer as handed out this week.
4. Type in and run this program:
#include <stdio.h>
int main()
{
int day, i;
for(day = 1; day <= 3; day = day + 1)
{
printf("On the %d day of Christmas, ", day);
printf("my true love gave to me\n");
for(i = day; i > 0; i = i - 1)
{
if(i == 1)
{
if(day == 1) printf("A ");
else
printf("And a ");
printf("partridge in a pear tree.\n");
}
else if(i == 2)
printf("Two turtledoves,\n");
else if(i == 3)
printf("Three French hens,\n");
}
printf("\n");
}
return 0;
}
The result (as you might guess) should be an approximation of the first three verses of a popular (if
occasionally tiresome) song.
a. Add a few more verses (calling birds, golden rings, geese a-laying, etc.).
b. Here is a scrap of code to print words for the days instead of digits:
if(day == 1)
printf("first");
else if(day == 2)
printf("second");
else if(day == 3)
printf("third");
Incorporate this code into the program.
c. Here is another way of writing the inner part of the program:
printf("On the %d day of Christmas, ", day);
printf("my true love gave to me\n");
if(day >= 3)
printf("Three French hens,\n");
if(day >= 2)
printf("Two turtledoves,\n");
if(day >= 1)
{
if(day == 1) printf("A ");
else
printf("And a ");
printf("partridge in a pear tree.\n");
}
Study this alternate method and figure out how it works. Notice that there are no else's between the
if's.
Exercises:
(As before, these range from easy to harder. Do as many as you like, but at least two or three.)
1. Write a program to find out (the hard way, by counting them) how many of the numbers from 1 to 10 are
greater than 3. (The answer, of course, should be 7.) Your program should have a loop which steps a
variable (probably named i) over the 10 numbers. Inside the loop, if i is greater than 3, add one to a
second variable which keeps track of the count. At the end of the program, after the loop has finished,
print out the count.
2. Write a program to compute the average of the ten numbers 1, 4, 9, ..., 81, 100, that is, the average of the
squares of the numbers from 1 to 10. (This will be a simple modification of Exercise 3 from last week:
instead of printing each square as it is computed, add it in to a variable sum which keeps track of the
sum of all the squares, and then at the end, divide the sum variable by the number of numbers summed.)
If you keep track of the sum in a variable of type int, and divide by the integer 10, you'll get an integer-only approximation of the average (the answer should be 38). If you keep track of the sum in a variable
of type float or double, on the other hand, you'll get the answer as a floating-point number, which
you should print out using %f in the printf format string, not %d. (In a printf format string, %d
prints only integers, and %f is one way to print floating-point numbers. In this case, the answer should
be 38.5.)
3. Write a program to print the numbers between 1 and 10, along with an indication of whether each is
even or odd, like this:
1 is odd
2 is even
3 is odd
...
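4. Print out the 8 points of the compass, based on the x and y components of the direction of travel.
5. Write a program to print the first 7 positive integers and their factorials.
6. Write a program to print the first 10 Fibonacci numbers.
7. Make some improvements to the prime number printing program.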
Assignment #2 Answers
Introductory C Programming
UW Experimental College
Assignment #2 ANSWERS
Question 1. What are the two different kinds of division that the / operator can do? Under what
circumstances does it perform each?
The two kinds of division are integer division, which discards any remainder, and floating-point division,
which yields a floating-point result (with a fractional part, if necessary). Floating-point division is
performed if either operand (or both) is floating-point (that is, if either number being operated on is
floating-point); integer division is performed if both operands are integers.
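Here is a small sketch of the two kinds of division side by side. The function names are made up, and the (double) cast is something we haven't formally covered yet; it is used here only to make one operand floating-point.

```c
/* hypothetical helpers, names chosen for illustration */
int int_average(int sum, int n)
{
    return sum / n;             /* both operands int: remainder discarded */
}

double float_average(int sum, int n)
{
    return (double)sum / n;     /* one operand double: floating-point division */
}
```

With the sum of the first ten squares, 385, and n equal to 10, the first function gives 38 and the second gives 38.5.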
Question 2. What are the definitions of the ``Boolean'' values true and false in C?
False is always represented by a zero value. Any nonzero value is considered true. The built-in operators
<, <=, >, >=, ==, !=, &&, ||, and ! always generate 0 for false and 1 for true.
Question 3. Name three uses for the semicolon in C.
Terminating declarations, terminating statements, and separating the three control expressions in a for
loop.
Question 4. What would the equivalent code, using a while loop, be for the example
for(i = 0; i < 10; i = i + 1)
printf("i is %d\n", i);
?
Just move the first (initialization) expression to before the loop, and the third (increment) expression to
the body of the loop:
i = 0;
while(i < 10)
{
printf("i is %d\n", i);
i = i + 1;
}
Question 5. What is the numeric value of the expression 3 < 4 ?
1 (or ``true''), because 3 is in fact less than 4.
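Because a comparison really is just the number 0 or 1, you can even do arithmetic with it. The following is only a sketch of that idea (the name count_above is hypothetical), and an explicit if statement is usually the clearer way to write it.

```c
/* hypothetical helper: how many of 1..n are greater than limit */
int count_above(int limit, int n)
{
    int i;
    int count = 0;

    for(i = 1; i <= n; i = i + 1)
        count = count + (i > limit);    /* adds 1 when true, 0 when false */
    return count;
}
```

Calling count_above(3, 10) counts the numbers from 1 to 10 which are greater than 3, giving 7.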
Question 6. Under what conditions will this code print ``water''?
if(T < 32)
printf("ice\n");
else if(T < 212)
printf("water\n");
else
printf("steam\n");
If T is greater than or equal to 32 and less than 212. (If you said ``greater than 32 and less than 212'', you
weren't quite right, and this kind of distinction--paying attention to the difference between ``greater than''
and ``greater than or equal''--is often extremely important in programming.)
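One way to convince yourself about the boundaries is to wrap the same chain of tests in a function and try the values on either side of each limit. This is a sketch only; the name water_state is hypothetical, and it returns a string instead of printing so that the result is easy to check.

```c
/* hypothetical helper wrapping the if/else chain from the question */
const char *water_state(int T)
{
    if(T < 32)
        return "ice";
    else if(T < 212)
        return "water";
    else
        return "steam";
}
```

Trying 31, 32, 211, and 212 in turn gives ice, water, water, and steam.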
Question 7. What would this code print?
int x = 3;
if(x)
printf("yes\n");
else
printf("no\n");
It would print ``yes'', since x is nonzero.
Question 8. What would this code print?
int i;
for(i = 0; i < 3; i = i + 1)
printf("a\n");
printf("b\n");
printf("c\n");
It would print
a
a
a
b
c
The indentation of the statement printf("b\n"); is (deliberately, for the sake of the question)
misleading. It looks like it's part of the body of the for statement, but the body of the for statement is
always a single statement, or a list of statements enclosed in braces {}. In this case, the body of the loop
is the call printf("a\n");. The two statements printf("b\n"); and printf("c\n"); are
normal statements following the loop.
The code would be much clearer if the printf("b\n"); line were indented to line up with the
printf("c\n"); line. If the intent was that ``b'' be printed 3 times, along with ``a'', it would be
necessary to put braces {} around the pair of lines printf("a\n"); and printf("b\n"); .
Exercise 1. Write a program to find out how many of the numbers from 1 to 10 are greater than 3.
#include <stdio.h>
int main()
{
int i;
int count = 0;
for(i = 1; i <= 10; i = i + 1)
{
if(i > 3)
count = count + 1;
}
printf("%d numbers were greater than 3\n", count);
return 0;
}
Tutorial 4b. Print ``On the first day'' instead of ``On the 1 day'' in the days-of-Christmas program.
Simply replace the lines
printf("On the %d day of Christmas, ", day);
printf("my true love gave to me\n");
with
printf("On the ");
if(day == 1) printf("first");
else if(day == 2) printf("second");
else if(day == 3) printf("third");
else if(day == 4) printf("fourth");
else if(day == 5) printf("fifth");
else if(day == 6) printf("sixth");
else
printf("%d", day);
printf(" day of Christmas, ");
printf("my true love gave to me\n");
The final
else printf("%d", day);
is an example of ``defensive programming.'' If the variable day ever ends up having a value outside the
range 1-6 (perhaps because we later added more verses to the song, but forgot to update the list of
words), the program will at least print something.
Exercise 2. Write a program to compute the average of the ten numbers 1, 4, 9, ..., 81, 100.
#include <stdio.h>
int main()
{
int i;
float sum = 0;
int n = 0;
for(i = 1; i <= 10; i = i + 1)
{
sum = sum + i * i;
n = n + 1;
}
printf("the average is %f\n", sum / n);
return 0;
}
Exercise 3. Write a program to print the numbers between 1 and 10, along with an indication of whether
each is even or odd.
#include <stdio.h>
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
{
if(i % 2 == 0)
printf("%d is even\n", i);
else
printf("%d is odd\n", i);
}
return 0;
}
Exercise 4. Print out the 8 points of the compass, based on the x and y components of the direction of
travel.
if(x > 0)
{
if(y > 0)
printf("Northeast.\n");
else if(y < 0)
printf("Southeast.\n");
else
printf("East.\n");
}
else if(x < 0)
{
if(y > 0)
printf("Northwest.\n");
else if(y < 0)
printf("Southwest.\n");
else
printf("West.\n");
}
else
{
if(y > 0)
printf("North.\n");
else if(y < 0)
printf("South.\n");
else
printf("Nowhere.\n");
}
(You will notice that this code also prints ``Nowhere'' in the ninth case, namely when x == 0 and y
== 0.)
Exercise 5. Write a program to print the first 7 positive integers and their factorials.
#include <stdio.h>
int main()
{
int i;
int factorial = 1;
for(i = 1; i <= 7; i = i + 1)
{
factorial = factorial * i;
printf("%d  %d\n", i, factorial);
}
return 0;
}
The answer to the extra credit question is that 8!, which is how mathematicians write ``eight factorial,'' is
40320, which is not guaranteed to fit into a variable of type int. To write a portable program to compute
factorials higher than 7, we would have to use long int.
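As a sketch of what that might look like, here is a factorial function which does its arithmetic in a long int. The name factorial_long is hypothetical. A long int is guaranteed to hold values up to at least 2,147,483,647, so results through 12! (479,001,600) are safe; 13! is not guaranteed to fit.

```c
/* hypothetical helper: n! computed in a long int */
long factorial_long(int n)
{
    int i;
    long fact = 1;

    for(i = 2; i <= n; i = i + 1)
        fact = fact * i;
    return fact;
}
```

To print the result, use %ld in the printf format string, not %d.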
Exercise 6. Write a program to print the first 10 Fibonacci numbers.
#include <stdio.h>
int main()
{
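    int i;
    int fibonacci = 1;
    int prevfib = 0;
    int tmp;
    for(i = 1; i <= 10; i = i + 1)
    {
        printf("%d\n", fibonacci);
        tmp = fibonacci;
        fibonacci = fibonacci + prevfib;
        prevfib = tmp;
    }
    return 0;
}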
(There are many ways of writing this program. As long as the code you wrote printed 1, 1, 2, 3, 5, 8, 13,
21, 34, 55, it's probably correct. If the code you wrote is clearer than my solution above, so much the
better!)
Exercise 7. Make some improvements to the prime number printing program.
Here is a version incorporating both of the suggested improvements:
#include <stdio.h>
#include <math.h>
int main()
{
    int i, j;
    double sqrti;
    printf("%d\n", 2);
    for(i = 3; i <= 100; i = i + 2)
    {
        sqrti = sqrt(i);
        for(j = 2; j < i; j = j + 1)
        {
            if(i % j == 0)
                break;              /* not prime */
            if(j > sqrti)
            {                       /* prime */
                printf("%d\n", i);
                break;
            }
        }
    }
    return 0;
}
Assignment #3
Introductory C Programming
UW Experimental College
Assignment #3
Handouts:
Assignment #3
Answers to Assignment #2
Class Notes, Chapter 4
Class Notes, Chapter 5
Reading Assignment:
Class Notes, Chapter 3, Secs. 3.3-3.5
Class Notes, Chapter 4, Secs. 4.1-4.1.1, 4.3
Class Notes, Chapter 5, Secs. 5.1-5.3
(optional) Class Notes, Secs. 3.6, 4.1.2-4.2, 4.4, 5.4
Review Questions:
1. How many elements does the array
int a[5];
contain? Which is the first element? The last?
2. What's wrong with this scrap of code?
int a[5];
for(i = 1; i <= 5; i = i + 1)
a[i] = 0;
3. How might you rewrite the dice-rolling program (from the notes, chapter 4, p. 2) without arrays?
4. What is the difference between a defining instance and an external declaration?
5. What are the four important parts of a function? Which three does a caller need to know?
Tutorial Section
1. Here is another nested-loop example, similar to exercise 4 of assignment 1, and to tutorial 3 of
assignment 2. This one prints an addition table for sums from 1+1 to 10+10.
/* print an addition table for 1+1 up to 10+10 */
#include <stdio.h>
int main()
{
int i, j;
/* print header line: */
printf(" ");
for(j = 1; j <= 10; j = j + 1)
printf(" %3d", j);
printf("\n");
/* print table: */
for(i = 1; i <= 10; i = i + 1)
{
printf("%2d", i);
for(j = 1; j <= 10; j = j + 1)
printf(" %3d", i + j);
printf("\n");
}
return 0;
}
The first j loop prints the top, header row of the table. (The initial printf(" "); is to make it
line up with the rows beneath, which will all begin with a value of i.) Then, the i loop prints the
rest of the table, one row per value of i. For each value of i, we print that value (on the left edge
of the table), and then print the sums resulting from adding that value of i to ten different values
of j (using a second, inner loop on j).
Make a simple modification to the program to print a multiplication table, or a subtraction table.
2. Here is yet another nested-loop example. It is very similar to the one above, except that rather
than printing the sum i+j, it determines whether the sum is even or odd, using the expression
(i+j) % 2 . The % operator, remember, gives the remainder when dividing, and an even
number gives a remainder of 0 when dividing by 2. Depending on whether the sum is even or odd,
the program prints an asterisk or a space. Type in and run the program, and look at the pattern that
results.
#include <stdio.h>
int main()
{
    int i, j;
    for(i = 1; i <= 10; i = i + 1)
    {
        for(j = 1; j <= 10; j = j + 1)
        {
            if((i + j) % 2 == 0)
                printf("* ");
            else
                printf("  ");
        }
        printf("\n");
    }
    return 0;
}
return 0;
}
There's one slight trick in the declaration of the squares array. Remember that arrays in C are
based at 0. So if we wanted an array to hold the squares of 10 numbers, and if we declared it as
int squares[10]; the array's 10 elements would range from squares[0] to
squares[9]. This program wants to use elements from squares[1] to squares[10], so
it simply declares the array as having size 11, and wastes the 0th element.
The program also uses the tab character, \t, in its printouts, to make the columns line up.
Modify the program so that it also declares and fills in a cubes array containing the cubes (third
powers) of the numbers 1-10, and prints them out in a third column.
4. We're going to write a simple (very simple!) graphics program. We'll write a function
printsquare which prints a square (made out of asterisks) of a certain size. Then we'll use our
favorite for-i-equals-1-to-10 loop to call the function 10 times, printing squares of size 1 to 10.
#include <stdio.h>
int printsquare(int);
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
{
printsquare(i);
printf("\n");
}
return 0;
}
int printsquare(int n)
{
int i, j;
for(i = 0; i < n; i = i + 1)
{
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
}
return 0;
}
Type the program in and run it. Then see if you can modify it to print ``open'' squares, like this:
****
*  *
*  *
****
instead of filled squares. You'll have to print the box in three parts: first the top row, then a
number of ``* *'' rows, and finally the bottom row. There's obviously no way to print ``open''
boxes of size 1 or 2, so don't worry about those cases. (That is, you can change the loop in the top-level main function to for(i = 3; i <= 10; i = i + 1) .)
x's. Use this function to rewrite the triangle-printing program of assignment 1 (exercise 4).
5. Write a function to compute the factorial of a number, and use it to print the factorials of the
numbers 1-7. (Extra credit: print the factorials of the numbers 1-10.)
6. (Kernighan and Ritchie) Write a function celsius() to convert degrees Fahrenheit to degrees
Celsius. (The conversion formula is C = 5/9 * (F - 32).) Use it to print a Fahrenheit-to-Centigrade table for -40 to 220 degrees Fahrenheit, in increments of 10 degrees. (Remember that
%f is the printf format to use for printing floating-point numbers. Also, remember that the
integer expression 5/9 gives 0, so you won't want to use integer division.)
7. Modify the dice-rolling program (Sec. 4.1) so that it computes the average of all the rolls of the
pair of dice. Remember that integer division truncates, so you'll have to declare some of your
variables as float or double.
For extra credit, also compute the standard deviation, which can be expressed as
sqrt((sum(x*x) - sum(x)*sum(x)/n) / (n - 1))
(where the notation sum(x) indicates summing all values of x, as usually expressed with the
Greek letter sigma; there is no such sum() function in C!) If you put the line
#include <math.h>
at the top of the file, you'll be able to call sqrt().
(See the note in Assignment #2, Exercise 5, about compiling with the math library under Unix.
There are better ways of computing the standard deviation, but the above expression will suffice
for our purposes.)
8. Write either the function
int randrange(int n)
which returns random integers from 1 to n, or the function
int randrange2(int m, int n)
which returns random integers in the range m to n. Use your function to streamline the dice-rolling program a bit.
The header file <stdlib.h> defines a constant, RAND_MAX, which is the maximum number
returned by the rand() function. A better way of reducing the range of the rand() function is
like this:
rand() / (RAND_MAX / N + 1)
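To show the expression in action, here is a sketch that packages it as a function. The name scaled_rand is hypothetical, and the function assumes that n is between 1 and RAND_MAX. It yields values from 0 to n-1, so shifting the result into the range an exercise asks for is still up to you.

```c
#include <stdlib.h>

/* hypothetical helper: random integer from 0 to n-1, for 1 <= n <= RAND_MAX */
int scaled_rand(int n)
{
    if(n <= 1)
        return 0;               /* only one possible value; also avoids RAND_MAX + 1 */
    return rand() / (RAND_MAX / n + 1);
}
```

Dividing by RAND_MAX / n + 1 maps equal-sized stretches of rand()'s range onto each result, so the result depends on the high-order part of the number rand() returns and not on its low-order bits.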
**
*****
****
**********
***************
****************************
************
*********
*******
*****
***
Assignment #3 Answers
Introductory C Programming
UW Experimental College
Assignment #3 ANSWERS
Question 1. How many elements does the array int a[5] contain? Which is the first element? The
last?
The array has 5 elements. The first is a[0]; the last is a[4].
Question 2. What's wrong with the scrap of code in the question?
The array is of size 5, but the loop is from 1 to 5, so an attempt will be made to access the nonexistent
element a[5]. A correct loop over this array would run from 0 to 4.
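A corrected loop might look like the following sketch. The function name clear_array and the idea of passing the size in as n are assumptions made for illustration; the point is that the loop starts at 0 and uses < instead of <=.

```c
/* hypothetical helper: set all n elements of a to 0 */
void clear_array(int a[], int n)
{
    int i;

    for(i = 0; i < n; i = i + 1)    /* visits a[0] through a[n-1] */
        a[i] = 0;
}
```

For the array in the question, the call would be clear_array(a, 5), which touches a[0] through a[4] and nothing else.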
Question 3. How might you rewrite the dice-rolling program without arrays?
About all I can think of would be to declare 11 different variables:
int roll2, roll3, roll4, roll5, roll6, roll7, roll8, roll9,
    roll10, roll11, roll12;
sum = d1 + d2;
if(sum == 2)
    roll2 = roll2 + 1;
else if(sum == 3)
    roll3 = roll3 + 1;
else if(sum == 4)
    roll4 = roll4 + 1;
else if(sum == 5)
    roll5 = roll5 + 1;
...
What a nuisance! (Fortunately, we never have to write code like this; we just use arrays, since this sort of
situation is exactly what arrays are for.)
Question 4. What is the difference between a defining instance and an external declaration?
A defining instance is a declaration of a variable or function that actually defines and allocates space for
that variable or function. In the case of a variable, the defining instance may also supply an initial value,
using an initializer in the declaration. In the case of a function, the defining instance supplies the body of
the function.
An external declaration is a declaration which mentions the name and type of a variable or function
which is defined elsewhere. An external declaration does not allocate space; it cannot supply the initial
value of a variable; it does not need to supply the size of an array; it does not supply the body of a
function. (In the case of functions, however, an external declaration may include argument type
information; in this case it is an external prototype declaration.)
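Here is a compact sketch showing both kinds of declaration for one variable and one function. The names counter and bump are hypothetical. Normally the external declarations would sit in a different source file (or a header) from the defining instances; they are placed in one file here only to keep the example short.

```c
extern int counter;     /* external declaration: names the type, allocates nothing */
int bump(void);         /* external prototype declaration of a function */

int counter = 10;       /* defining instance: allocates space, supplies initial value */

int bump(void)          /* defining instance: supplies the body */
{
    counter = counter + 1;
    return counter;
}
```

The first call to bump() returns 11, since the defining instance started counter at 10.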
Question 5. What are the four important parts of a function? Which three does a caller need to know?
The name, the number and type of the arguments, the return type, and the body. The caller needs to know
the first three.
Tutorial 3. Modify the array-of-squares program to also print cubes.
#include <stdio.h>
int main()
{
int i;
int squares[11];    /* [0..10]; [0] ignored */
int cubes[11];
/* fill arrays: */
for(i = 1; i <= 10; i = i + 1)
{
squares[i] = i * i;
cubes[i] = i * i * i;
}
/* print table: */
printf("n\tsquare\tcube\n");
for(i = 1; i <= 10; i = i + 1)
printf("%d\t%d\t%d\n", i, squares[i], cubes[i]);
return 0;
}
Tutorial 4. Rewrite the simple graphics program to print ``open'' boxes.
I made a new version of the original printsquare function called printbox, like this:
int printbox(int n)
{
int i, j;
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
for(i = 0; i < n-2; i = i + 1)
{
printf("*");
for(j = 0; j < n-2; j = j + 1)
printf(" ");
printf("*\n");
}
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
return 0;
}
A box of size 1 or 2 doesn't have all three parts. If you want the function to handle those sizes more
appropriately, here are the necessary modifications:
int printbox(int n)
{
int i, j;
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
for(i = 0; i < n-2; i = i + 1)
{
printf("*");
for(j = 0; j < n-2; j = j + 1)
printf(" ");
if(n > 1)
printf("*\n");
}
if(n > 1)
{
for(j = 0; j < n; j = j + 1)
printf("*");
printf("\n");
}
return 0;
}
Exercise 1. Write code to sum the elements of an array of int.
Here is a little array-summing function:
int sumnum(int a[], int n)
{
int i;
int sum = 0;
for(i = 0; i < n; i = i + 1)
sum = sum + a[i];
return sum;
}
Here is a test program to call it:
#include <stdio.h>
int a[] = {1, 2, 3, 4, 5, 6};
int sumnum(int [], int);
int main()
{
printf("%d\n", sumnum(a, 6));
return 0;
}
Exercise 2. Write a loop to call the multbytwo function on the numbers 1-10.
#include <stdio.h>
int multbytwo(int);
int main()
{
int i;
for(i = 1; i <= 10; i++)
printf("%d %d\n", i, multbytwo(i));
return 0;
}
Exercise 3. Write a square() function and use it to print the squares of the numbers 1-10.
The squaring function is quite simple:
int square(int x)
{
return x * x;
}
Here is a loop and main program to call it:
#include <stdio.h>
int square(int);
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
printf("%d %d\n", i, square(i));
return 0;
}
Exercise 4. Write a printnchars function, and use it to rewrite the triangle-printing program.
Here is the function:
void printnchars(int ch, int n)
{
int i;
for(i = 0; i < n; i++)
printf("%c", ch);
}
Here is the rewritten triangle-printing program:
#include <stdio.h>
void printnchars(int, int);
int main()
{
int i;
for(i = 1; i <= 10; i = i + 1)
{
printnchars('*', i);
printf("\n");
}
return 0;
}
Exercise 5. Write a function to compute the factorial of a number, and use it to print the factorials of the
numbers 1-7.
Here is the function:
int factorial(int x)
{
int i;
int fact = 1;
for(i = 2; i <= x; i = i + 1)
fact = fact * i;
return fact;
}
Here is a driver program:
#include <stdio.h>
int factorial(int);
int main()
{
int i;
for(i = 1; i <= 7; i = i + 1)
printf("%d %d\n", i, factorial(i));
return 0;
}
The answer to the ``extra credit'' problem is that to portably compute factorials beyond
factorial(7), it would be necessary to declare the factorial() function as returning long
int (and to declare its local fact variable as long int as well, and to use %ld in the call to
printf). 8! (``eight factorial'') is 40320, but remember, type int is only guaranteed to hold integers up
to 32767.
(Some machines, but not all, have ints that can hold more than 32767, so computing larger factorials on
those machines would happen to work, but not portably. Some textbooks would tell you to ``use long
int if your machine has 16-bit ints,'' but why write code two different ways depending on what kind
of machine you happen to be using today? I prefer to say, ``Use long int if you would like results
greater than 32767.'')
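As a sketch of the change described above, here is the factorial function rewritten to return long int. (The name lfactorial is made up for this illustration so that it doesn't collide with the int version; it is not part of the original assignment.)

```c
/* Variant of factorial() that returns long int, so that results above
   32767 (such as 8! = 40320) can be computed portably.
   The name lfactorial is hypothetical. */
long int lfactorial(int x)
{
    int i;
    long int fact = 1;
    for(i = 2; i <= x; i = i + 1)
        fact = fact * i;
    return fact;
}
```

The driver would then print the result with %ld, as in printf("%d %ld\n", i, lfactorial(i)). Type long int is only guaranteed to hold values up to 2147483647, so this version is portable up to 12! (479001600) and no further.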
Exercise 6. Write a function celsius() to convert degrees Fahrenheit to degrees Celsius. Use it to
print a Fahrenheit-to-Centigrade table for -40 to 220 degrees Fahrenheit, in increments of 10 degrees.
Here is the function:
double celsius(double f)
{
return 5. / 9. * (f - 32);
}
Here is the driver program:
#include <stdio.h>
double celsius(double);
int main()
{
double f;
for(f = -40; f <= 220; f = f + 10)
printf("%f\t%f\n", f, celsius(f));
return 0;
}
Exercise 7. Modify the dice-rolling program so that it computes the average (and, optionally, the
standard deviation) of all the rolls of the pair of dice.
The new code involves declaring new variables sum, n, and mean (and, for the extra credit problem,
sumsq and stdev), adding code in the main dice-rolling loop to update sum and n (and maybe also
sumsq), and finally adding code at the end to compute the mean (and standard deviation) and print them
out.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
int i;
int d1, d2;
int a[13]; /* uses [2..12] */
double sum = 0;
double sumsq = 0;
int n = 0;
double mean;
double stdev;
for(i = 2; i <= 12; i = i + 1)
a[i] = 0;
for(i = 0; i < 100; i = i + 1)
{
d1 = rand() % 6 + 1;
d2 = rand() % 6 + 1;
a[d1 + d2] = a[d1 + d2] + 1;
sum = sum + d1 + d2;
sumsq = sumsq + (d1 + d2) * (d1 + d2);
n = n + 1;
}
for(i = 2; i <= 12; i = i + 1)
printf("%d: %d\n", i, a[i]);
printf("\n");
mean = sum / n;
stdev = sqrt((sumsq - sum * sum / n) / (n - 1));
printf("average: %f\n", mean);
printf("std. dev.: %f\n", stdev);
return 0;
}
Exercise 8. Write a randrange function.
The various ``fudge factors'' in these expressions deserve some explanation. The first one is
straightforward: The + 1 in (n - m + 1) simply gives us the number of numbers in the range m to n,
including m and n. (Leaving out the + 1 in this case is the classic example of a fencepost error, named
after the old puzzle, ``How many pickets are there in a picket fence ten feet long, with the pickets one
foot apart?'')
The other +1 is a bit trickier. First let's consider the second implementation of randrange. We want to
divide rand's output by some number so that the results will come out in the range 0 to n - 1. (Then
we'll add in 1 to get numbers in the range 1 through n.) Left to its own devices, rand will return
numbers in the range 0 to RAND_MAX (where RAND_MAX is a constant defined for us in <stdlib.h>).
The division, remember, is going to be integer division, which will truncate. So numbers which would
have come out in the range 0.0 to 0.99999... (if the division were exact) will all truncate to 0, numbers
which would have come out in the range 1.0 to 1.99999... will all truncate to 1, 2.0 to 2.99999... will all
truncate to 2, etc. If we were to divide rand's output by the quantity
RAND_MAX / n
that is, if we were to write
rand() / (RAND_MAX / n)
then when rand returned RAND_MAX, the division could yield exactly n, which is one too many. (This
wouldn't happen too often--only when rand returned that one value, its maximum value--but it would be
a bug, and a hard one to find, because it wouldn't show up very often.) So if we add one to the
denominator, that is, divide by the quantity
RAND_MAX / n + 1
then when rand returns RAND_MAX, the division will yield a number just shy of n, which will then be
truncated to n - 1, which is just what we want. We add in 1, and we're done.
In the case of the more general randrange2, everything we've said applies, with n replaced by n - m
+ 1. Dividing by
RAND_MAX / (n - m + 1)
would occasionally give us a number one too big (n + 1, after adding in m), so we divide by
RAND_MAX / (n - m + 1) + 1
instead.
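Putting the pieces of this discussion together, the two functions might look something like the following. This is a sketch assembled from the expressions described above. It assumes the range contains at least two numbers and no more than RAND_MAX of them.

```c
#include <stdlib.h>

/* Return a random integer in the range 1 to n, inclusive.
   Assumes 2 <= n <= RAND_MAX.  (With n equal to 1, the "+ 1" in the
   denominator could overflow on a machine where RAND_MAX is as large
   as an int can get.) */
int randrange(int n)
{
    return rand() / (RAND_MAX / n + 1) + 1;
}

/* Return a random integer in the range m to n, inclusive.
   Assumes m < n, and that n - m + 1 is no larger than RAND_MAX. */
int randrange2(int m, int n)
{
    return rand() / (RAND_MAX / (n - m + 1) + 1) + m;
}
```

Because the division truncates, the quotient never exceeds n - 1 (or n - m, in the second function), so adding 1 (or m) can never push the result past the top of the range.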
Finally, just two lines in the dice-rolling program would need to be changed to make use of the new
function:
d1 = randrange(6);
d2 = randrange(6);
or
d1 = randrange2(1, 6);
d2 = randrange2(1, 6);
The answer to the extra-credit portion of the exercise is that under some compilers, the output of the
rand function is not quite as random as you might wish. In particular, it's not uncommon for the rand
function to produce alternately even and odd numbers, such that if you repeatedly compute rand % 2,
you'll get the decidedly non-random sequence 0, 1, 0, 1, 0, 1, 0, 1... . It's for this reason that the slightly
more elaborate range-reduction techniques involving the constant RAND_MAX are recommended.
Exercise 9. Rewrite the dice-rolling program to also print a histogram.
#include <stdio.h>
#include <stdlib.h>
int main()
{
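int i, j;
int d1, d2;
int a[13]; /* uses [2..12] */
for(i = 2; i <= 12; i = i + 1)
a[i] = 0;
for(i = 0; i < 100; i = i + 1)
{
d1 = rand() % 6 + 1;
d2 = rand() % 6 + 1;
a[d1 + d2] = a[d1 + d2] + 1;
}
for(i = 2; i <= 12; i = i + 1)
{
printf("%d: %d\t", i, a[i]);
for(j = 0; j < a[i]; j = j + 1)
printf("*");
printf("\n");
}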
return 0;
}
The \t in the second-to-last call to printf prints a tab character, so that the histogram bars will all be
lined up, regardless of the number of digits in the particular values of i or a[i]. Another possibility
would be to use printf's width specifier (which we haven't really covered yet) to keep the digits lined
up. That approach might look like this:
printf("%2d: %3d ", i, a[i]);
Assignment #4
Introductory C Programming
UW Experimental College
Assignment #4
Handouts:
Assignment #4
Answers to Assignment #3
Class Notes, Chapters 6, 7, and 8
``Expressions and Statements'' handout
Reading Assignment:
Class Notes, Chapter 6
Class Notes, Chapter 7 (Sec. 7.3 is a bit advanced, but do at least skim it)
Class Notes, Chapter 8
Review Questions:
1. What would the expression
c = getchar() != EOF
do?
2. Why must the variable used to hold getchar's return value be type int?
3. What is the difference between the prefix and postfix forms of the ++ operator?
4. (trick question) What would the expression
i = i++
do?
5. What is the definition of a string in C?
6. (Advanced) What will the getline function from section 6.3 of the notes do if successive calls
to getchar return the four values 'a', 'b', 'c', and EOF? Is getline's behavior
reasonable? Is it possible for this situation to occur, that is, for a ``line'' somehow not to be
terminated by \n?
Tutorial Section:
1. Here is a toy program for computing your car's gas mileage. It asks you for the distance driven on
the last tank of gas, and the amount of gas used (i.e. the amount of the most recent fill-up). Then it
simply divides the two numbers.
#include <stdio.h>
#include <stdlib.h> /* for atoi() */
break;
if(nch < max)
{
line[nch] = c;
nch = nch + 1;
}
}
if(c == EOF && nch == 0)
return EOF;
line[nch] = '\0';
return nch;
}
For each of the two pieces of information requested (mileage and gallons) the code uses a little
three-step procedure: (1) print a prompt, (2) read the user's response as a string, and (3) convert
that string to a number. The same character array variable, inputline, can be used to hold the string
both times, because we don't care about keeping the string around once we've converted it to
a number. The function for converting a string to an integer is atoi. (The compiler then
automatically converts the integer returned by atoi into a floating-point number to be stored in
miles or gallons. There's another function we could have used, atof, which converts a
string--possibly including a decimal point--directly into a floating-point number.)
You might wonder why we're reading the user's response as a string, only to turn around and
convert it to a number. Isn't there a way to read a numeric response directly? There is (one way is
with a function called scanf), but we're going to do it this way because it will give us more
flexibility later. (Also, it's much harder to react gracefully to any errors by the user if you're using
scanf for input.)
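To make the remark about atof a bit more concrete, here is a sketch of the conversion step using atof instead of atoi, so that an entry such as 12.5 gallons keeps its fractional part. The function name mpg_from_strings and its parameter names are invented for this illustration. The two strings stand for what getline would have placed in inputline.

```c
#include <stdlib.h>    /* for atof() */

/* Compute miles per gallon from two strings such as the user might
   have typed.  (Hypothetical helper, not part of the program above.)
   The caller is assumed to have checked that gallonstr is not zero. */
double mpg_from_strings(char milestr[], char gallonstr[])
{
    double miles = atof(milestr);
    double gallons = atof(gallonstr);
    return miles / gallons;
}
```

With atoi, the string "12.5" would convert to 12, because atoi stops at the first character that can't be part of an integer. With atof it converts to 12.5.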
Obviously, this program would be a nuisance to use if you had several pairs of mileages and
gallonages to compute. You'd probably want the program to prompt you repetitively for additional
pairs, so that you wouldn't have to re-invoke the program each time. Here is a revised version
which has that functionality. It keeps asking you for more distances and more gallon amounts,
until you enter 0 for the mileage.
#include <stdio.h>
#include <stdlib.h> /* for atoi() */
extern int getline(char [], int);
int main()
{
char inputline[100];
float miles;
float gallons;
float mpg;
for(;;)
{
printf("enter miles driven (0 to end):\n");
getline(inputline, 100);
miles = atoi(inputline);
if(miles == 0)
break;
printf("enter gallons used:\n");
getline(inputline, 100);
gallons = atoi(inputline);
mpg = miles / gallons;
printf("You got %.2f mpg\n\n", mpg);
}
return 0;
}
The (slightly) tricky part about writing a program like this is: what kind of a loop should you use?
Conceptually, it seems you want some sort of a while loop along the lines of ``while(miles
!= 0)''. But that won't work, because we don't have the value of miles until after we've
prompted the user for it and read it in and converted it.
Therefore, the code above uses what at first looks like an infinite loop: ``for(;;)''. A for loop
with no condition tries to run forever. But, down in the middle of the loop, if we've obtained a
value of 0 for miles, we execute the break statement which forces the loop to terminate. This
sort of situation (when we need the body of the loop to run part way through before we can decide
whether we really wanted to make that trip or not) is precisely what the break statement is for.
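The same loop shape can be seen in miniature in the following sketch, which adds up the values in an array until it comes across a 0. The function name sum_until_zero and the use of an array (rather than input from the user) are choices made just for this illustration.

```c
/* Add up the values in a[] until a 0 is seen, or until all n values
   have been used.  The decision to leave the loop is made part way
   through the body, after the next value has been fetched. */
int sum_until_zero(int a[], int n)
{
    int i = 0;
    int value;
    int sum = 0;
    for(;;)
    {
        if(i >= n)
            break;              /* ran out of values */
        value = a[i];           /* step 1: obtain the next value */
        i = i + 1;
        if(value == 0)
            break;              /* step 2: only now can we decide */
        sum = sum + value;      /* step 3: use the value */
    }
    return sum;
}
```

As in the mileage program, the test can't be moved up into the loop's condition, because the value being tested doesn't exist until the first part of the body has run.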
2. Next, we're going to write our own version of the atoi function, so we can see how it works (or
might work) inside. (I say ``might work'' because there's no guarantee that your compiler's
particular implementation of atoi will work just like this one, but it's likely to be similar.)
int myatoi(char str[])
{
int i;
int digit;
int retval = 0;
}
(You may notice that I've deleted the digit variable in this second example, because we didn't
really need it.)
There are now two loops. The first loop starts at 0 and looks for space characters; it stops (using a
break statement) when it finds the first non-space character. (There may not be any space
characters, so it may stop right away, after making zero full trips through the loop. And the first
loop doesn't do anything except look for non-space characters.) The second loop starts where the
first loop left off--that's why it's written as
for(; str[i] != '\0'; i = i + 1)
The second loop looks at digits; it stops early if it finds a non-digit character.
The remaining problem with the myatoi function is that it doesn't handle negative numbers. See
if you can add this functionality. The easiest way is to look for a '-' character up front,
remember whether you saw one, and then at the end, after reaching the end of the digits and just
before returning retval, negate retval if you saw the '-' character earlier.
3. Here is a silly little program which asks you to type a word, then does something unusual with it.
#include <stdio.h>
extern int getline(char [], int);
int main()
{
char word[20];
int len;
int i, j;
printf("type something: ");
len = getline(word, 20);
for(i = 0; i < 80 - len; i++)
{
for(j = 0; j < i; j++)
printf(" ");
printf("%s\r", word);
}
printf("\n");
return 0;
}
(To understand how it works, you need to know that \r prints a carriage return without a
linefeed.) Type it in and see what it does. (You'll also need a copy of the getline function.) See
if you can modify the program to move the word from right to left instead of left to right.
Exercises:
1. Type in the character-copying program from section 6.2 of the notes, and run it to see how it
works. (When you run it, it will wait for you to type some input. Type a few characters, hit
RETURN; type a few more characters, hit RETURN again. Hit control-D or control-Z when
you're finished.)
2. Type in the getline() function and its test program from section 6.3 of the notes, and run it to
see how it works. You can either place getline() and its test program (main()) in one source
file or, for extra credit, place getline() in a file getline.c and the test program in a file
getlinetest.c (or perhaps gtlntst.c), for practice in compiling a program from two
separate source files.
3. Rewrite the getline test program from Exercise 2 to use the loop
while(getline(line, 256) != EOF)
printf("%s\n", line);
That is, have it simply copy a line at a time, without printing ``You typed''. Compare the behavior
of the character-copying and line-copying programs. Do they behave differently?
Now rewrite the character-copying program from Exercise 1 to use the loop
while((c = getchar()) != EOF)
printf("you typed '%c'\n", c);
and try running it. Now do things make more sense?
4. The standard library contains a function, atoi, which takes a string (presumably a string of
digits) and converts it to an integer. For example, atoi("123") would return the integer 123.
Write a program which reads lines (using getline), converts each line to an integer using
atoi, and computes the average of all the numbers read. (Like the example programs in the
notes, it should determine the end of ``all the numbers read'' by checking for EOF.) See how much
of the code from assignment 3, exercise 7 you can reuse (if you did that exercise). Remember that
integer division truncates, so you'll have to declare some of your variables as float or double.
For extra credit, also compute the standard deviation (see assignment 3, exercise 7).
5. Write a rudimentary checkbook balancing program. It will use getline to read a line, which
will contain either the word "check" or "deposit". The next line will contain the amount of
the check or deposit. After reading each pair of lines, the program should compute and print the
new balance. You can declare the variable to hold the running balance to be type float, and you
can use the function atof (also in the standard library) to convert an amount string read by
getline into a floating-point number. When the program reaches end-of-file while reading
"check" or "deposit", it should exit. (In outline, the program will be somewhat similar to
the average-finding program.)
For example, given the input
deposit
100
check
12.34
check
49.00
deposit
7.01
the program should print something like
balance: 100.00
balance: 87.66
balance: 38.66
balance: 45.67
Extra credit: Think about how you might have the program take the word "check" or
"deposit", and the amount, from a single line (separated by whitespace).
6. Rewrite the ``compass'' code from Assignment 2 (exercise 4) to use strcpy and strcat to
build the "northeast", "southwest", etc. strings. (Don't worry about capitalizing them
carefully.) You should be able to write it in a cleaner way, without so many if's and else's.
Remember to declare the array in which you build the string big enough to hold the largest string
you build (including the trailing \0).
7. Write a program to read its input, one character at a time, and print each character and its decimal
value.
8. Write a program to read its input, one line at a time, and print each line backwards. To do the
reversing, write a function
int reverse(char line[], int len)
{
...
}
to reverse a string, in place. (It doesn't have to return anything useful.)
Assignment #4 Answers
Introductory C Programming
UW Experimental College
Assignment #4 ANSWERS
Question 4. What would the expression
i = i++
do?
Nothing, or at least, nothing useful. Since it tries to modify i twice, it's undefined.
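By contrast, each of the statements in the following sketch modifies i only once, so each is well defined. (The function name incr_demo and the out array are made up for this illustration. They just give us a way to look at the results.)

```c
/* Record the results of the two well-defined forms of ++ in out[],
   which must have room for at least three ints. */
void incr_demo(int out[])
{
    int i = 5;
    out[0] = i++;    /* postfix: yields the old value, 5; i is now 6 */
    out[1] = ++i;    /* prefix: i becomes 7, and 7 is the value yielded */
    out[2] = i;      /* i is still 7 */
}
```

If all you want is to add 1 to i, write i++ or ++i or i = i + 1, each as a statement by itself.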
Question 5. What is the definition of a string in C?
An array of characters, terminated with the null character \0.
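Because the \0 marks the end, code that works with a string doesn't need to be told its length; it can simply look for the terminator. Here is a minimal sketch of a length-counting function along those lines. The name mystrlen is invented here; the standard library's version is called strlen.

```c
/* Count the characters in a string, not including the terminating \0. */
int mystrlen(char str[])
{
    int i;
    for(i = 0; str[i] != '\0'; i++)
        ;                       /* empty body: the loop header does the work */
    return i;
}
```

Notice that the length is determined by where the \0 is, not by how big the array is: a ten-element array holding 'h', 'i', '\0' contains a string of length 2.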
Question 6. What will the getline function do if successive calls to getchar return the four values 'a', 'b',
'c', and EOF?
The first three characters are placed in the line array, as usual, and when the EOF indicator is read, getline breaks
out of its loop, also as usual. Although c is now EOF, nch is 3, so the condition c == EOF && nch == 0 is false.
getline therefore does not return EOF, but rather terminates the line with \0 and returns its length, just as it does
for a normal line that ends in \n.
Can this situation ever occur? One way to answer a question like this is not to try too hard to answer it, to err on the
side of conservatism, to assume that if we can't prove that the unusual situation won't arise, we might as well be safe
and arrange that our code can handle it if it somehow comes up. (There's obviously little or no harm in writing code to
handle a situation that never comes up, while the reverse--neglecting to write code to handle a situation that does come
up--can of course be very harmful.)
In any case, under Unix at least, this situation can in fact come up. Unix does not enforce any notion of a ``text file'';
the sequence a, b, c, EOF at the end of a file is no more or less favored (by the operating system, that is) than the
sequence a, b, c, \n, EOF. Furthermore, there are some programs (e.g. full-screen text editors such as EMACS) which
make it easy to create a text file without a final newline (if only by accident). (But there are also examples of programs
which inadvertently ignore the last line of a file whose last line does not end in \n, and this is a bug which can cause
data loss.) So writing getline to accept a ``line'' ending in EOF (but no \n) is not only reasonable, but useful.
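The same principle applies to any code that deals in lines. As a small sketch, here is a line-counting function that works on text already held in a string, where the terminating \0 plays the part of EOF. The function name countlines and the variable pending are invented for this illustration.

```c
/* Count the lines in text[].  A final run of characters which is not
   followed by \n still counts as a line. */
int countlines(char text[])
{
    int i;
    int nlines = 0;
    int pending = 0;        /* characters seen since the last \n */
    for(i = 0; text[i] != '\0'; i++)
    {
        if(text[i] == '\n')
        {
            nlines++;
            pending = 0;
        }
        else
            pending++;
    }
    if(pending > 0)
        nlines++;           /* last line had no \n, but don't lose it */
    return nlines;
}
```

A version without the final if statement would report two lines for the text a, \n, b, \n, a, b, c, silently dropping the third. That is exactly the kind of bug described above.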
Tutorial 2. Improve the myatoi function so that it can handle negative numbers.
[You've seen the answer already, because I accidentally left the sign-handling code--within #ifdef SIGN--turned on
in the copy of the code printed in the assignment. Oops.]
#include <ctype.h>
int myatoi(char str[])
{
int i;
int retval = 0;
int negflag = 0;
for(i = 0; str[i] != '\0'; i = i + 1)
{
if(!isspace(str[i]))
break;
}
if(str[i] == '-')
{
negflag = 1;
i = i + 1;
}
for(; str[i] != '\0'; i = i + 1)
{
if(!isdigit(str[i]))
break;
retval = 10 * retval + (str[i] - '0');
}
if(negflag)
retval = -retval;
return retval;
}
Tutorial 3. Modify the ``word zipping'' program to move the word from right to left instead of left to right.
#include <stdio.h>
extern int getline(char [], int);
int main()
{
char word[20];
int len;
int i, j;
printf("type something: ");
len = getline(word, 20);
for(i = 80 - len - 1; i >= 0; i--)
{
for(j = 0; j < i; j++)
printf(" ");
printf("%s \r", word);
}
printf("\n");
return 0;
}
Exercise 4. Write a program which computes the average (and standard deviation) of a series of numbers.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
extern int getline(char [], int);
int main()
{
char line[100];
int x;
double sum, sumsq;
int n;
double mean, stdev;
sum = sumsq = 0.0;
n = 0;
while(getline(line, 100) != EOF)
{
x = atoi(line);
sum = sum + x;
sumsq = sumsq + x * x;
n = n + 1;
}
mean = sum / n;
stdev = sqrt((sumsq - sum * sum / n) / (n - 1));
printf("mean: %f\n", mean);
printf("std. dev.: %f\n", stdev);
return 0;
}
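One thing the program above does not guard against is input containing fewer than two numbers. In that case the divisions by n and by n - 1 become divisions by zero, and the printed results are not meaningful. Here is a sketch of one way to isolate the computation and add a guard. The function name sample_variance is invented, and returning 0.0 for the degenerate cases is simply one possible choice.

```c
/* Compute the sample variance from the running totals kept by the
   loop above: sum is the sum of the values, sumsq is the sum of
   their squares, and n is how many values there were.
   Returns 0.0 if there are too few values for the formula to apply. */
double sample_variance(double sum, double sumsq, int n)
{
    if(n < 2)
        return 0.0;
    return (sumsq - sum * sum / n) / (n - 1);
}
```

The standard deviation is then sqrt(sample_variance(sum, sumsq, n)). The mean needs a similar check that n is greater than 0 before dividing.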
Remember that characters are represented by small integers representing their values in the machine's character set. We
shouldn't have to know what these values are, but if we assume that the characters '0', '1', '2', ... '9' have
consecutive values (which, as it happens, is a perfectly valid assumption) then subtracting the value of the character
'0' from any digit character will give us that digit's value. For example, '1' - '0' is 1, '2' - '0' is 2, and of
course '0' - '0' is 0. If str[i] is a digit character, then str[i] - '0' is its value. (In all of these
subtractions, we are using the constant '0' to mean ``the value of the character '0','' which is, in fact, exactly what it
means.)
The operation of the function is simple: it moves through the string from left to right, converting each digit character to
a digit value and building up the return value. Each time we find another digit character, it (obviously) indicates that the
number we're converting has another digit, so we multiply the number we've converted so far by 10, and add in the new
digit. (Walk through this algorithm to convince yourself that it works. For example, if the string to be converted is
"123", we'll make three trips through the loop, and retval will successively take on the values 1, 12, and finally
123.)
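The loop just described can be sketched in a few lines. This is a minimal illustration which assumes that the string contains digits and nothing else. It is named simple_atoi here, a made-up name, to keep it apart from the versions of myatoi shown elsewhere in these pages.

```c
/* Convert a string of digits to an integer by walking through it from
   left to right.  Assumes str contains digits and only digits. */
int simple_atoi(char str[])
{
    int i;
    int retval = 0;
    for(i = 0; str[i] != '\0'; i = i + 1)
        retval = 10 * retval + (str[i] - '0');   /* shift left, add new digit */
    return retval;
}
```

For the string "123", the three trips through the loop compute 10 * 0 + 1, then 10 * 1 + 2, then 10 * 12 + 3.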
Here is a little test program to read strings from the user, convert them to numbers using myatoi, and print out the
converted numbers:
#include <stdio.h>
int main()
{
char line[100];
int n;
while(1)
{
printf("type a number:\n");
if(getline(line, 100) == EOF)
break;
n = myatoi(line);
printf("you typed %d\n", n);
}
return 0;
}
This first implementation of myatoi works, but only if the string contains digits and only digits. If the string contains
any character other than a digit, the expression str[i] - '0' will result in a meaningless digit value. Therefore, a
better implementation of myatoi would ensure that it only attempted to convert digits.
There are in fact several situations under which the string might contain characters other than digits. It might contain
some spaces before the number to be converted (e.g. " 123"), or the number to be converted might begin with a
minus sign. In any case, the caller might carelessly pass a string which contained some non-digit characters (perhaps at
the end, following some digit characters). We will next look at an improved version of myatoi which allows for these
possibilities.
The standard library contains several functions which test for various classes of characters. The isspace function
returns nonzero (i.e. ``true'') for a whitespace character (e.g. space or tab). Similarly, the isdigit function returns
nonzero (i.e. ``true'') for a digit character. These functions are declared in the header <ctype.h>. Here is an improved
version of myatoi which uses isspace to skip leading whitespace and isdigit to ensure that it attempts to
convert only digits. It also checks for a leading minus sign (but after checking for leading whitespace).
#include <ctype.h>
int myatoi(char str[])
{
int retval = 0;
int i = 0;
int negflag = 0;
while(isspace(str[i]))
i++;
if(str[i] == '-')
{
negflag = 1;
i++;
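}
for(; isdigit(str[i]); i++)
retval = 10 * retval + (str[i] - '0');
if(negflag)
retval = -retval;
return retval;
}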
#include <stdio.h>
#include <stdlib.h> /* for atof() */
char word[20];
if(y > 0)
strcpy(word, "north");
else if(y < 0)
strcpy(word, "south");
else
strcpy(word, ""); /* empty string */
if(x > 0)
strcat(word, "east");
else if(x < 0)
strcat(word, "west");
else
strcat(word, ""); /* empty string */
printf("%s\n", word);
Exercise 8. Write a program to read its input, one character at a time, and print each character and its decimal value.
#include <stdio.h>
int main()
{
int c;
while((c = getchar()) != EOF)
printf("character %c has value %d\n", c, c);
return 0;
}
You will notice that this program prints a funny line or two for each new line ('\n') in the input, because when the %c
in the printf call finds itself printing a \n character that we've just read, it naturally prints a newline at that point.
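If the funny-looking output bothers you, one possible refinement is to treat the newline character specially and print it in the same notation we'd use in a C program. Here is a sketch of that idea. The function name describe_char is invented, and it builds its line of output in a caller-supplied array (using sprintf, which works like printf but writes to a string) so that the result is easy to inspect.

```c
#include <stdio.h>

/* Build one line of output describing character c in buf, which should
   have room for at least 40 characters.  A newline is spelled out as the
   two characters \ and n rather than being printed literally. */
void describe_char(char buf[], int c)
{
    if(c == '\n')
        sprintf(buf, "character \\n has value %d", c);
    else
        sprintf(buf, "character %c has value %d", c, c);
}
```

The main loop would then call describe_char(buf, c) and print buf with printf("%s\n", buf).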
Exercise 9. Write a program to read its input, one line at a time, and print each line backwards.
Here is one way of doing it, using only what we've seen so far:
#include <stdio.h>
extern int getline(char [], int);
extern int reverse(char [], int);
int main()
{
char line[100];
int len;
while((len = getline(line, 100)) != EOF)
{
reverse(line, len);
printf("%s\n", line);
}
return 0;
}
int reverse(char string[], int len)
{
int i;
char tmp;
for(i = 0; i < len / 2; i = i + 1)
{
tmp = string[i];
string[i] = string[len - i - 1];
string[len - i - 1] = tmp;
}
return 0;
}
Assignment #5
Introductory C Programming
UW Experimental College
Assignment #5
Handouts:
Assignment #5
Answers to Assignment #4
Class Notes, Chapter 9
Class Notes, Chapter 10
Reading Assignment:
Class Notes, Chapter 9
Review Questions:
1. What's wrong with this #define line?
#define N 10;
2. Suppose you defined the macro
#define SIX 2*3
Then, suppose you used it in another expression:
int x = 12 / SIX;
What value would x be set to?
3. If the header file x.h contains an external prototype declaration for a function q(), where should
x.h be included?
4. (harder) How many differences can you think of between i and J as defined by these two lines?
int i = 10;
#define J 10
Exercises:
1. Pick one of the programs we've worked with so far that uses the getline function, such as the
getline test program (assignment 4, exercise 2) or the ``word zipping'' program (assignment 4,
tutorial 3) or the checkbook balancing program (assignment 4, exercise 6). Arrange the program
in two source files, one containing main(), and one containing getline(). If possible, set
your compiler to warn you when functions are called or defined without a prototype in scope.
Create a header file getline.h containing the external function prototype for getline(),
and #include it as you see fit. Also use a preprocessor macro such as MAXLINE for the
maximum line length (i.e. in declarations of line arrays, and calls to getline).
2. Write a program to read its input and write it out, double-spaced (that is, with an extra blank line
between each line). Process the input a character at a time; do not read entire lines at a time using
getline.
3. Write the function
int countchars(char string[], int ch);
which returns the number of times the character ch appears in the string. For example, the call
countchars("Hello, world!", 'o')
would return 2.
4. Write a short program to read two lines of text, and concatenate them using strcat. Since
strcat concatenates in-place, you'll have to make sure you have enough memory to hold the
concatenated copy. For now, use a char array which is twice as big as either of the arrays you
use for reading the two lines. Use strcpy to copy the first string to the destination array, and
strcat to append the second one.
5. Write the function
void replace(char string[], char from[], char to[])
which finds the string from in the string string and replaces it with the string to. You may
assume that from and to are the same length. For example, the code
char string[] = "recieve";
replace(string, "ie", "ei");
Assignment #5 Answers
Introductory C Programming
UW Experimental College
Assignment #5 ANSWERS
It should be included in each source file where q() is called, so that the compiler will see the prototype
declaration and be able to generate correct code. It should also be included in the source file where q()
is defined, so that the compiler will be able to notice, and complain about, any mismatches between the
prototype declaration and the actual definition. (It's vital that the prototype which will be used where a
function is called be accurate; an incorrect prototype is worse than useless.)
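Here is a sketch of the arrangement. A single listing can't really show two files, so the comments mark where each part would live. What q() actually does is invented for the example.

```c
/* --- what x.h would contain: the external prototype declaration --- */
extern int q(int);

/* --- what the defining source file would contain, after its
       #include "x.h" line has brought in the prototype above --- */
int q(int n)
{
    return 2 * n;       /* hypothetical body */
}
```

Because the compiler sees both the prototype and the definition while compiling the defining file, it can compare them. If the definition were changed to, say, double q(int n) without updating x.h, the compiler would complain about the conflict instead of quietly generating incorrect calls elsewhere.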
Question 4. How many differences can you think of between i and J?
The line
int i = 10;
declares a conventional run-time variable, named i, initially containing the value 10. It will be possible
to change i's value at run time. i may appear in expressions (i.e., its value may be fetched), but since it
is not constant, it could not be used where C requires a constant, such as in the dimension of an array
declaration.
The line
#define J 10
on the other hand, defines a preprocessor macro named J having the value 10. For the rest of the current
source file, anywhere you write a single J, the preprocessor will replace it with 10. An array declaration
such as
int a[J];
will be fine; it will be just as if you had written
int a[10];
However, J exists only at compile time. It is not a run-time variable; if you tried to ``change its value'' at
run time by writing
J = 20;
it would be just as if you had written
10 = 20;
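As a minimal sketch of the contrast between i and J (the function name elements_in_a is made up for illustration and is not part of the assignment), the following compiles as written, while uncommenting the assignment to J would not:

```c
#include <stddef.h>

#define J 10                /* macro: every later J is replaced by 10 */

/* Returns the number of elements in an array dimensioned with J. */
size_t elements_in_a(void)
{
    int i = 10;             /* run-time variable */
    int a[J];               /* same as writing int a[10]; */

    i = 20;                 /* fine: i can be changed at run time */
    /* J = 20; */           /* would not compile: it expands to 10 = 20; */
    (void)i;                /* silence "set but not used" warnings */

    return sizeof a / sizeof a[0];
}
```

Calling elements_in_a() yields 10, because by the time the compiler proper sees the declaration, J has already been replaced by the constant.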
int countnchars(char string[], int ch)
{
    int i;
    int count = 0;
    for(i = 0; string[i] != '\0'; i++)
    {
        if(string[i] == ch)
            count++;
    }
    return count;
}
Here is a tiny little main program, to test it out:
#include <stdio.h>
extern int countnchars(char string[], int ch);
int main()
{
    char string[] = "Hello, world!";
    char c = 'o';
    printf("The letter %c appears in \"%s\" %d times.\n",
        c, string, countnchars(string, c));
    return 0;
}
Exercise 4. Write a short program to read two lines of text, and concatenate them using strcat.
#include <stdio.h>
#include <string.h>
(Since strstr's job is to find one string within another, it's a natural for the first half of replace.)
Extra credit: Think about what replace() should do if the from string appears multiple times in the
input string.
Our first implementation replaced only the first occurrence (if any) of the from string. It happens,
though, that it's easy to rewrite our first version to make it replace all occurrences: omit the
return after the first string has been replaced, and copy through a separate index so that the
search resumes just past the replaced text:
void replace(char string[], char from[], char to[])
{
    int start, i1, i2;
    for(start = 0; string[start] != '\0'; start++)
    {
        i1 = 0;
        i2 = start;
        while(from[i1] != '\0')
        {
            if(from[i1] != string[i2])
                break;
            i1++;
            i2++;
        }
        if(from[i1] == '\0')
        {
            i2 = start;
            for(i1 = 0; to[i1] != '\0'; i1++)
                string[i2++] = to[i1];
            if(i2 > start)
                start = i2 - 1; /* the loop's start++ then moves just past the replacement */
        }
    }
}
Assignment #6
Introductory C Programming
UW Experimental College
Assignment #6
Handouts:
Assignment #6
Assignment #5 Answers
Class Notes, Chapter 11
Class Notes, Chapters 12 and 13
Reading Assignment:
Class Notes, Chapters 10 and 11
Review Questions:
1. If we say
int i = 5;
int *ip = &i;
then what is ip? What is its value?
2. If ip is a pointer to an integer, what does ip++ mean? What does
*ip++ = 0;
do?
3. How much memory does the call malloc(10) allocate? What if you want enough memory for
10 ints?
4. The assignment in
char c;
int *ip = &c; /* WRONG */
is in error; you can't mix char pointers and int pointers like this. How, then, is it possible to
write
char *cp = malloc(10);
int *ip = malloc(sizeof(int));
without error on either line?
Exercises:
1. Write a program to read lines and print only those containing a certain word. (For now, the word
can be a constant string in the program.) The basic pattern (which I have to confess I have
parroted exactly from K&R Sec. 4.1) is
while(there's another line)
{
if(line contains word)
print the line;
}
Use the strstr function (mentioned in the notes) to look for the word. Be sure to include the
line
#include <string.h>
2.
3.
4.
5.
6.
7.
Extra credit: remove the restriction imposed by the fixed-size array into which each line is
originally read; allow the program to accept arbitrarily many arbitrarily-long lines. (You'll have to
replace getline with a dynamically-allocating line-getting function which calls malloc and
realloc.)
Assignment #6 Answers
Introductory C Programming
UW Experimental College
Assignment #6 ANSWERS
Question 1. If we say int i = 5; int *ip = &i; then what is ip? What is its value?
ip is a variable which can point to an int (that is, its value will be a pointer to an int; or informally,
we say that ip is ``a pointer to an int''). Its value is a pointer which points to the variable i.
Question 2. If ip is a pointer to an int, what does ip++ mean? What does *ip++ = 0 do?
ip++ means about the same as it does for any other variable: increment it by 1, that is, as if we had
written ip = ip + 1. In the case of a pointer, this means to make it point to the object (the int) one
past the one it used to. *ip++ = 0 sets the int variable pointed to by ip to 0, and then increments ip
to point to the next int.
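The idiom is easiest to see in a loop. Here is a small sketch (zero_ints is a hypothetical name, not a function from the notes) which uses *ip++ = 0 to clear the first n elements of an array of ints:

```c
/* Set the first n ints, starting where ip points, to 0. */
void zero_ints(int *ip, int n)
{
    while(n-- > 0)
        *ip++ = 0;      /* store 0 where ip points, then advance ip to the next int */
}
```

After int a[3] = {1, 2, 3}; zero_ints(a, 2); the first two elements are 0 and a[2] is still 3.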
Question 3. How much memory does the call malloc(10) allocate? What if you want enough memory
for 10 ints?
malloc(10) allocates 10 bytes, which is enough space for 10 chars. To allocate space for 10 ints,
you could call malloc(10 * sizeof(int)).
Question 4. If char and int pointers are different, how is it possible to write
char *cp = malloc(10);
int *ip = malloc(sizeof(int));
without error on either line?
malloc is declared as returning the special, ``generic'' pointer type void *, which can be (and is)
automatically converted to different pointer types, as needed.
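The last two answers can be seen together in a short sketch. The helper below (make_ints is a hypothetical name, not a library function) asks malloc for n * sizeof(int) bytes and assigns the void * result directly to an int *, with no cast:

```c
#include <stdlib.h>

/* Allocate room for n ints and set each one to 0.
   Returns NULL if malloc fails; the caller frees the result. */
int *make_ints(size_t n)
{
    int *ip = malloc(n * sizeof(int));  /* void * is converted to int * automatically */
    size_t i;
    if(ip == NULL)
        return NULL;
    for(i = 0; i < n; i++)
        ip[i] = 0;
    return ip;
}
```

A caller would write int *p = make_ints(10); check p against NULL, use p[0] through p[9], and eventually call free(p).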
Exercise 1. Write a program to read lines and print only those containing a certain word.
#include <stdio.h>
#include <string.h>
extern int getline(char [], int);
int main()
{
    char line[100];
    char *pat = "hello";
    while(getline(line, 100) != EOF)
    {
        if(strstr(line, pat) != NULL)
            printf("%s\n", line);
    }
    return 0;
}
Exercise 2. Rewrite the checkbook-balancing program to use the getwords function to make it easy to
take the word ``check'' or ``deposit'', and the amount, from a single line.
#include <stdio.h>
#include <stdlib.h> /* for atof() */
if(strcmp(words[0], "deposit") == 0)
{
if(nwords < 2)
{
printf("missing amount\n");
continue;
}
balance += atof(words[1]);
}
else if(strcmp(words[0], "check") == 0)
{
if(nwords < 2)
{
printf("missing amount\n");
continue;
}
balance -= atof(words[1]);
}
else
{
printf("bad data line: \"%s\"\n", words[0]);
continue;
}
printf("balance: %.2f\n", balance);
}
return 0;
}
Exercise 3. Rewrite the line-reversing function to use pointers.
Here is one way:
int reverse(char *string)
{
    char *lp = string; /* left pointer */
    char *rp; /* right pointer */
    char tmp;
    if(*string == '\0')
        return 0; /* empty string: nothing to swap, and no last character to point at */
    rp = &string[strlen(string)-1];
    while(lp < rp)
    {
        tmp = *lp;
        *lp = *rp;
        *rp = tmp;
        lp++;
        rp--;
    }
    return 0;
}
Exercise 4. Rewrite the character-counting function to use pointers.
/* for malloc */
/* for strcpy and strcat */
exit(1);
newstring = malloc(len1 + len2 + 1); /* +1 for \0 */
if(newstring == NULL)
{
printf("out of memory\n");
exit(1);
}
strcpy(newstring, string1);
strcat(newstring, string2);
printf("%s\n", newstring);
return 0;
}
Exercise 6. Rewrite the string-replacing function to use pointers.
Here is one way:
void replace(char string[], char *from, char *to)
{
    char *start, *p1, *p2;
    for(start = string; *start != '\0'; start++)
    {
        p1 = from;
        p2 = start;
        while(*p1 != '\0')
        {
            if(*p1 != *p2)
                break;
            p1++;
            p2++;
        }
        if(*p1 == '\0')
        {
            for(p1 = to; *p1 != '\0'; p1++)
                *start++ = *p1;
            return;
        }
    }
}
The bulk of this code is a copy of the mystrstr function in the notes, chapter 10, section 10.4, p. 8.
(Since strstr's job is to find one string within another, it's a natural for the first half of replace.) We
could also call strstr directly, simplifying replace:
#include <string.h>
void replace(char string[], char *from, char *to)
{
    char *p1;
    char *start = strstr(string, from);
    if(start != NULL)
    {
        for(p1 = to; *p1 != '\0'; p1++)
            *start++ = *p1;
        return;
    }
}
Again, we might wonder about the case when the string to be edited contains multiple occurrences of the
from string, and ask whether replace should replace the first one, or all of them. The problem
statement didn't make this detail clear. Our first two implementations have replaced only the first
occurrence. It happens, though, that it's easy to rewrite our first version to make it replace all
occurrences: omit the return after the first string has been replaced, and copy through a second
pointer so that the search resumes just past the replaced text:
void replace(char string[], char *from, char *to)
{
    char *start, *p1, *p2;
    for(start = string; *start != '\0'; start++)
    {
        p1 = from;
        p2 = start;
        while(*p1 != '\0')
        {
            if(*p1 != *p2)
                break;
            p1++;
            p2++;
        }
        if(*p1 == '\0')
        {
            p2 = start;
            for(p1 = to; *p1 != '\0'; p1++)
                *p2++ = *p1;
            if(p2 > start)
                start = p2 - 1; /* the loop's start++ then moves just past the replacement */
        }
    }
}
Rewriting the second version wouldn't be much harder.
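Here is a sketch of what that rewrite might look like, under the exercise's assumption that from and to are the same length. (The name replaceall, the use of memcpy, and the early return when from is empty or the lengths differ are additions for illustration; they are not part of the original solution.)

```c
#include <string.h>

/* Replace every occurrence of from in string with to.
   Assumes from and to have the same length; does nothing otherwise. */
void replaceall(char string[], const char *from, const char *to)
{
    size_t len = strlen(from);
    char *start = string;

    if(len == 0 || strlen(to) != len)
        return;

    while((start = strstr(start, from)) != NULL)
    {
        memcpy(start, to, len);     /* overwrite the match in place */
        start += len;               /* resume the search just past it */
    }
}
```

Because the search resumes after each replacement, adjacent occurrences are all replaced and the replaced text itself is never rescanned.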
Exercise 7. Write a program to read lines of text up to EOF, and then print them out in reverse order.
One of the reasons this program is harder is that it uses pointers to pointers. When we used pointers to
simulate an array of integers, we used pointers to integers. In this program, we want to simulate an array
of strings, or an array of pointers to characters. Therefore, we use pointers to pointers to characters, or
char **.
#include <stdio.h>
#include <stdlib.h>
extern int getline(char [], int);
#define MAXLINE 100
int main()
{
int i;
char line[MAXLINE];
char **lines;
int nalloc, nitems;
nalloc = 10;
lines = malloc(nalloc * sizeof(char *));
if(lines == NULL)
{
printf("out of memory\n");
exit(1);
}
nitems = 0;
while(getline(line, MAXLINE) != EOF)
{
if(nitems >= nalloc)
{
char **newp;
nalloc += 10;
char *line;
int nalloc = 10;
int nch = 0;
int c;
line = malloc(nalloc + 1);
if(line == NULL)
{
printf("out of memory\n");
exit(1);
}
while((c = getchar()) != EOF)
{
if(c == '\n')
break;
if(nch >= nalloc)
{
char *newp;
nalloc += 10;
newp = realloc(line, nalloc + 1);
if(newp == NULL)
{
printf("out of memory\n");
exit(1);
}
line = newp;
}
line[nch++] = c;
}
if(c == EOF && nch == 0)
{
free(line);
return NULL;
}
line[nch] = '\0';
return line;
}
Now we can rewrite the line-reversing program to call mgetline. Note that we no longer need the local
line array. Instead, we use a pointer, linep, which holds the return value from mgetline. Since
mgetline returns a new pointer to a new block of memory each time we call it, in this program (unlike
the previous one) we can set
lines[nitems++] = linep;
without overwriting anything.
#include <stdio.h>
#include <stdlib.h>
extern char *mgetline();
int main()
{
int i;
char *linep;
char **lines;
int nalloc, nitems;
nalloc = 10;
lines = malloc(nalloc * sizeof(char *));
if(lines == NULL)
{
printf("out of memory\n");
exit(1);
}
nitems = 0;
while((linep = mgetline()) != NULL)
{
if(nitems >= nalloc)
{
char **newp;
nalloc += 10;
newp = realloc(lines, nalloc * sizeof(char *));
if(newp == NULL)
{
printf("out of memory\n");
exit(1);
}
lines = newp;
}
lines[nitems++] = linep;
}
for(i = nitems - 1; i >= 0; i--)
printf("%s\n", lines[i]);
return 0;
}
Assignment #7
Introductory C Programming
UW Experimental College
Assignment #7
Handouts:
Assignment #7
Assignment #6 Answers
Class Notes, Chapter 14
Reading Assignment:
Class Notes, Chapters 12 and 13
Review Questions:
1. What do we mean by the ``equivalence between arrays and pointers'' in C?
2. If p is a pointer, what does p[i] mean?
3. Can you think of a few reasons why an I/O scheme which did not make you open a file and keep
track of a ``file pointer,'' but instead let you just mention the name of the file as you were reading
or writing it, might not work as well?
4. If argc is 2, what is argv[1]? What if argc is 1?
Exercises:
1. Rewrite the replace function from last week to remove the restriction that the replacement
substring have the same size. Be sure to test it on replacement strings which are larger, smaller,
and the same size. Should it work if the replacement string is the empty string?
2. Rewrite the pattern matching program (Assignment 6, Exercise 1) to prompt the user for the name
of the file to search in and a pattern to search for.
3. Rewrite the pattern matching program (Assignment 6, Exercise 1) to accept the pattern and file
name from the command line (like the Unix grep or MS-DOS find command).
Assignment #7 Answers
Introductory C Programming
UW Experimental College
Assignment #7 ANSWERS
Exercise 1. Rewrite the replace function from last week to remove the restriction that the replacement substring
have the same size.
If the replacement substring is a different size, we'll have to move the rest of the string right or left, either to make
room or to close up the gap. Here's code to do that, by shifting the characters, one at a time. (Notice that when
we're shifting to the right, we have to work our way through the characters we're shifting from right to left, to
avoid stepping on characters we haven't shifted yet. Notice that we also have to worry about the new position of
the '\0' string terminator in each case.)
void replace(char string[], char *from, char *to)
{
int fromlen = strlen(from);
int tolen = strlen(to);
char *start, *p1, *p2;
for(start = string; *start != '\0'; start++)
{
p1 = from;
p2 = start;
while(*p1 != '\0')
{
if(*p1 != *p2)
break;
p1++;
p2++;
}
if(*p1 == '\0')
{
if(fromlen > tolen)
{
/* move rest of string left */
p2 = start + tolen;
for(p1 = start + fromlen; *p1 != '\0'; p1++)
*p2++ = *p1;
*p2 = '\0';
}
else if(fromlen < tolen)
{
/* move rest of string right */
int leftover = strlen(start);
p2 = start + leftover + (tolen - fromlen);
*p2-- = '\0';
for(p1 = start + leftover - 1;
p1 >= start + fromlen; p1--)
*p2-- = *p1;
}
for(p1 = to; *p1 != '\0'; p1++)
*start++ = *p1;
return;
}
}
}
As it happens, there's a function in the standard library, memmove, whose job it is to move characters (which
might be strings) from point a to point b, taking care to make the copy in the right order if the source and
destination strings overlap. memmove is declared in <string.h>, and its prototype is
void *memmove(void *dest, const void *src, size_t n);
memmove is therefore a lot like strcpy (the dest and src arguments are in that order so that a call mimics an
assignment). We can simplify our new version of replace by calling memmove instead:
#include <string.h>
void replace(char string[], char *from, char *to)
{
int fromlen = strlen(from);
int tolen = strlen(to);
char *start, *p1, *p2;
for(start = string; *start != '\0'; start++)
{
p1 = from;
p2 = start;
while(*p1 != '\0')
{
if(*p1 != *p2)
break;
p1++;
p2++;
}
if(*p1 == '\0')
{
if(fromlen != tolen)
{
memmove(start + tolen, start + fromlen,
strlen(start + fromlen) + 1);
/* + 1 for \0 */
}
for(p1 = to; *p1 != '\0'; p1++)
*start++ = *p1;
return;
}
}
}
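To see the overlapping move in isolation, here is a small sketch (opengap is a hypothetical helper, and it assumes the caller's array has room for the longer string). It slides the tail of a string to the right, terminator included, which is the case where a simple left-to-right copy would overwrite characters it had not yet moved:

```c
#include <string.h>

/* Shift the part of s beginning at index pos to the right by n characters.
   The caller must guarantee that the array holding s is large enough. */
void opengap(char *s, size_t pos, size_t n)
{
    /* + 1 so that the '\0' terminator moves along with the tail */
    memmove(s + pos + n, s + pos, strlen(s + pos) + 1);
}
```

Starting from char buf[16] = "abcdef"; the call opengap(buf, 2, 3) leaves "abcdecdef" in buf: the tail "cdef" now begins at index 5, and indices 2 through 4 still hold their old contents, ready to be overwritten by the replacement text.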
Exercise 2. Rewrite the pattern matching program to prompt the user.
We'll need to use fgetline (see the notes) instead of getline, so that we can read lines from the file pointer
connected to the file we open. With fgetline in hand, we rewrite the pattern-matching program to prompt for
the file name and pattern, open the file, and finally call fgetline instead of getline:
#include <stdio.h>
#include <string.h>
#define MAXLINE 100
extern int getline(char [], int);
extern int fgetline(FILE *, char [], int);
int main()
{
char line[MAXLINE];
char filename[MAXLINE];
char pat[MAXLINE];
FILE *ifp;
This program is imperfect in that it does not check the return value from the two getline calls. If the user
doesn't type a file name or pattern, the program will misbehave.
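fgetline itself comes from the class notes and is not reproduced here. As a rough sketch of a function with the same calling convention (inferred from the way it is called above: it stores one line without its newline, and returns the line's length, or EOF at end of file), something like the following would do. The name fgetline_sketch is hypothetical, and the code assumes max is at least 1.

```c
#include <stdio.h>

/* Read one line from fp into line[], which has room for max characters.
   The newline is not stored; characters that do not fit are discarded.
   Returns the number of characters stored, or EOF at end of file. */
int fgetline_sketch(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;

    max = max - 1;              /* leave room for '\0' */

    while((c = getc(fp)) != EOF)
    {
        if(c == '\n')
            break;
        if(nch < max)
            line[nch++] = c;
    }

    if(c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
}
```

Note that an empty line returns 0, not EOF, so a loop of the form while(fgetline_sketch(ifp, line, MAXLINE) != EOF) does not stop early at a blank line.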
Exercise 3. Rewrite the pattern matching program to accept arguments from the command line.
#include <stdio.h>
#include <stdlib.h> /* for exit() */
#include <string.h>
#define MAXLINE 100
extern int fgetline(FILE *, char [], int);
int main(int argc, char *argv[])
{
    char line[MAXLINE];
    char *pat;
    char *filename;
    FILE *ifp;
    if(argc != 3)
    {
        fprintf(stderr, "missing search pattern or file name\n");
        exit(1);
    }
    pat = argv[1];
    filename = argv[2];
    ifp = fopen(filename, "r");
    if(ifp == NULL)
    {
        fprintf(stderr, "can't open %s\n", filename);
        exit(1);
    }
    while(fgetline(ifp, line, MAXLINE) != EOF)
    {
        if(strstr(line, pat) != NULL)
            printf("%s\n", line);
    }
    fclose(ifp);
    return 0;
}
It is useful to allow the program to search for the pattern in multiple files, or, if no file names are specified on the
command line, to read from standard input. Here's how we might implement that. We break the read-lines-and-search loop out into a separate function, fsearchpat. We either call fsearchpat once, handing it stdin as
a file pointer, or else we loop over all of the file names on the command line, opening each one and calling
if(argc < 2)
{
fprintf(stderr, "missing search pattern\n");
exit(1);
}
pat = argv[1];
if(argc == 2)
fsearchpat(stdin, pat);
else
{
int i;
for(i = 2; i < argc; i++)
{
filename = argv[i];
ifp = fopen(filename, "r");
if(ifp == NULL)
{
fprintf(stderr, "can't open %s\n", filename);
exit(1);
}
fsearchpat(ifp, pat);
fclose(ifp);
}
}
return 0;
}
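The definition of fsearchpat does not appear in this listing. A minimal sketch of what such a function might look like follows; it is an assumption, not the original code. It is named fsearchpat_sketch to keep the two apart, it uses the standard fgets in place of fgetline so that it stands alone, and it returns a count of matching lines, which the call above does not require.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 100

/* Print each line of ifp which contains pat; return how many were printed.
   A line longer than MAXLINE-1 characters is examined in pieces. */
int fsearchpat_sketch(FILE *ifp, const char *pat)
{
    char line[MAXLINE];
    int nfound = 0;

    while(fgets(line, MAXLINE, ifp) != NULL)
    {
        line[strcspn(line, "\n")] = '\0';   /* strip the newline, if any */
        if(strstr(line, pat) != NULL)
        {
            printf("%s\n", line);
            nfound++;
        }
    }

    return nfound;
}
```

Since the function takes a FILE * and not a file name, the same code serves for stdin and for each file opened in the loop over argv.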
Week 1
Topics: program and data structure design; structures; linked lists
Assignment
Answers
Week 2
Topics: input/output; data files; stdio
Assignment
Answers
Week 3
Topics: miscellaneous C features (void and unsigned types; type qualifiers; storage classes and
typedef; ?:, cast, and comma operators; default promotions; switch, do/while, and goto
statements); returning aggregates
Assignment
Answers
Week 4
Topics: bitwise operators; function-like preprocessor macros
Assignment
Answers
Week 5
Week 6
Topics: pointers to pointers; pointers to functions; multidimensional arrays
Assignment
Answers
Week 7
Topics: Object-Oriented Programming methodology, C++ preview; variable-length argument lists
Assignment
Answers
Week 8
One last handout
Assignment #1
Intermediate C Programming
UW Experimental College
Assignment #1
Handouts:
Syllabus
Assignment #1
Class Notes, Chapter 15
Game source code
Notes about ``The Game''
Review Questions:
1. What are the four parts of a structure definition?
2. What is p->m equivalent to?
3. What's wrong with this code?
struct x *p = malloc(sizeof(struct x *));
4. The last code example in chapter 15 of the notes reads lines from stdin and prints them back
out in reverse order. What would happen if the prepend function did not call malloc and
strcpy to allocate space for a new copy of each listnode's item, but instead simply set
newnode->item = newword?
Exercises:
1. Get the initial version of the game (as handed out in class) to compile and run. It should print a ``?''
prompt, and you should be able to type the ``n'', ``s'', ``e'', and ``w'' commands to attempt to move
around, and the ``take'' and ``drop'' commands to pick up and drop objects. (There are one or two other
commands, which you can learn by studying the file commands.c.)
Study and understand as much of the source code of the game as you can.
2. Add a ``help'' command which prints a message listing the available verbs (or at least the most
common ones).
3. Add a ``long description'' field to the object and room structures. (It can either be an array of char
or a pointer to char.) Modify the initializations of the objects and rooms arrays in main.c to
initialize the description field for some or all objects and rooms. Arrange that the ``look'' command print
the long description of the current room (if it has one), and that the description is also printed when the
actor enters a new room. Add an ``examine'' command which prints the long description of an object, if it
has one. (Otherwise, have it print ``You see nothing special about the %s.'')
You have four options in downloading the source code package for the game:
Our ongoing example during this entire class will be a relatively simple but fully functional text-based
adventure game, patterned roughly after the classic games ADVENT and Zork. If you haven't played one
of these games before, the basic idea is that you are an ``actor'' in a playing field (often referred to as a
``dungeon'') consisting of a number of interconnected rooms. You can move from room to room, pick up
objects you find lying around (such as keys or swords or C compilers), and use these objects to perform
various tasks, such as unlocking doors or slaying dragons or writing programs.
You interact with the game by typing commands which are simple English sentences. (In our first
attempts, the sentences will be very simple indeed, but in sophisticated games of this sort the parsers are
quite elaborate and can ``understand'' rather complicated sentences.) The game responds to each
command by telling you what you see--a new room you've entered, or the result of some task you've
managed to perform.
Part of the ``fun'' (or, at least, the challenge) of these games is just figuring out what commands are
acceptable. To get you started, you move from room to room with the commands ``north'', ``east'',
``west'', and ``south'' (which can be abbreviated to ``n'', ``e'', ``w'', or ``s''). You can pick up objects with
the ``take'' command and put them down by saying ``drop''. You can request a description of the room
you're in by typing ``look'', and request a list of your possessions by typing ``inventory'' (or just ``i'').
(Of course, since you have the source code of this particular game, there's an easy way out of the
challenge of discovering which commands it accepts, once you discover the C function which is in
charge of matching the commands you type against the list of verbs which the game understands.)
It might seem as if such a game would be difficult to write, but as we'll see, it can actually be quite easy.
Study the code, starting with main.c, followed by commands.c, followed by object.c and
room.c. (All along you'll want to refer to game.h.) At first it will seem that the high-level
functions are all shirking their responsibilities--it will seem as if they're doing no real work, but rather
doing only the easy part and then calling some other function to do the rest. At first, it will seem as if
these sub-functions are going to get stuck with all the hard work, but when we get to them, we'll find that
they're nicely simple, too. In fact, the complexity which we might have imagined the program would
contain may seem to melt away before our eyes, as we discover that the program is composed of many
functions each of which is quite easy to understand. If this result surprises you, study it well, because
attacking a problem in this way--making a hard problem tractable by breaking it up into pieces each of
which seems simple but which in concert perform a task which seems harder than the sum of the parts--this process is the essence of good programming.
The game revolves around three data structures, one describing the player (or ``actor''), one describing a
room, and one describing an object. Since the dungeon will typically consist of many rooms, and since
there will typically be many objects scattered among the rooms, we'll typically have many instances of
the room and object structures. The object structure will contain a ``next'' field, capable of pointing at
another object, so that several objects can be strung together as a singly-linked list. We'll use these lists
of objects to represent the set of objects sitting in a room, or in an actor's possession.
Rooms will be linked together, too, but in a more complex way: each room will have up to four pointers
to other rooms. These pointers can be thought of as doorways or corridors: the pointed-to room is the
room that will be gone to if an actor travels in one of the four compass directions out of the original
room. (A room's exits will be stored in an array of four pointers, where element [0] is the north exit,
[1] is south, [2] is east, and [3] is west. Also, since pointers are inherently directional, it will be
necessary for the linked-to room to have its own pointer back to the original room if the actor is to be
able to return. This means that we can create deliberately confusing situations if we wish, such as one-way corridors, or rooms with exits that don't take you back to where you just came from.)
When linking objects and rooms together, we'll make heavy use of null pointers. The end of a linked list
is indicated by a null pointer--that is, the last object in the list has a next pointer of null. A room with
no objects, or an actor with no possessions, will have a null pointer in its contents field. When a room
has a wall instead of a door in one of the four compass directions (that is, when it's impossible to leave a
room in a particular direction) that will be indicated by a null pointer in the corresponding cell of the
exits array.
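The linkage just described can be sketched in a few lines of C. The real declarations live in game.h, which is not reproduced here, so everything below is an assumption made for illustration: the 20-character name arrays, the direction constants, and the helper functions linkrooms and countobjects are all hypothetical (the field names lnext, contents, and exits, and the constant NEXITS, follow the code used elsewhere in these assignments).

```c
#include <stddef.h>

#define NEXITS 4
enum { NORTH, SOUTH, EAST, WEST };      /* indices into exits[] */

struct object
{
    char name[20];
    struct object *lnext;               /* next object on the list, or NULL at the end */
};

struct room
{
    char name[20];
    struct object *contents;            /* list of objects here, or NULL if none */
    struct room *exits[NEXITS];         /* NULL means a wall in that direction */
};

/* Make a one-way passage from one room to another.
   Call it twice, with opposite directions, for an ordinary two-way door. */
void linkrooms(struct room *from, int dir, struct room *to)
{
    from->exits[dir] = to;
}

/* Count the objects on a list by following lnext until the null pointer. */
int countobjects(struct object *list)
{
    int n = 0;
    struct object *objp;
    for(objp = list; objp != NULL; objp = objp->lnext)
        n++;
    return n;
}
```

Calling linkrooms(&hall, NORTH, &cave) without a matching linkrooms(&cave, SOUTH, &hall) creates exactly the kind of one-way corridor mentioned above: cave.exits[SOUTH] stays null, so there is no way back.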
Copyright
This collection of hypertext pages is Copyright 1995-1999 by Steve Summit. This material may be freely
redistributed and used but may not be republished or sold without permission.
Assignment #1 Answers
Intermediate C Programming
UW Experimental College
Assignment #1 ANSWERS
Assignment #2
Intermediate C Programming
UW Experimental College
Assignment #2
Handouts:
Assignment #2
Assignment #1 Answers
Class Notes, Chapter 16
Class Notes, Chapter 17
Review Questions:
1. List some differences between text and binary data files.
Exercises:
1. We're going to fix one of the more glaring deficiencies of the initial implementation of the game, namely that the
description of the dungeon takes the form of hard-compiled structure arrays in main.c, such that the source code
must be edited (and in a tedious way, at that) and the game recompiled whenever we want to modify the dungeon.
(We're used to recompiling when we make changes to our programs, but we shouldn't have to recompile when we
make expected changes to our data. Imagine if a spreadsheet program required you to enter your worksheet
expressions or data with a C compiler and recompile each time!)
We're going to add code which can read the dungeon description dynamically, at run time, from a simple, textual data
file. The file will describe the rooms, the objects and which rooms they're initially in, and the exits which interconnect
the rooms. As the file is read, we'll dynamically assign new instances of struct room and struct object to
define the rooms and objects which the data file describes. We won't know how many room and object structures
we'll need, and we certainly won't know what descriptions they will contain, so we'll abandon the rooms and
objects arrays from main.c, along with their cumbersome initializations. We'll add (in object.c) an
allocation function, newobject(), which will return a pointer to a brand-new object structure which we can use to
contain some new object, usually one we've just read from the data file. Similarly, we'll add a newroom() function
to rooms.c which will return a pointer to a brand-new room structure. (We're not quite ready to do full-blown
dynamic memory allocation yet, so for now, the room and object allocation functions will dole structures out of fixed-size arrays which we'll arbitrarily declare as having size 100. Naturally we'll keep track of how many structures are
actually in use, and complain if an attempt is made to use more than 100. Since the allocation code is isolated down
within rooms.c and object.c, only those modules will need rewriting when we move to a more flexible
allocation scheme; the code which calls newroom() and newobject() won't need to change.)
Appended to this assignment is a listing of a new file, io.c, for doing the data file reading. (This code is also on the
disk or at the ftp site, in the week2 subdirectory.)
Rip out the definitions and initializations of the objects and rooms arrays at the top of main.c; also remove the
initialization of actor (but leave the definition). At the top of main(), before the initial call to listroom, add the
lines
if(!readdatafile())
exit(1);
gotoroom(&actor, getentryroom());
Add the following code to object.c (it can go near the top):
#define MAXOBJECTS 100
static struct object objects[MAXOBJECTS];
static int nobjects = 0;
struct object *
newobject(char *name)
{
struct object *objp;
if(nobjects >= MAXOBJECTS)
{
fprintf(stderr, "too many objects\n");
exit(1);
}
objp = &objects[nobjects++];
strcpy(objp->name, name);
objp->lnext = NULL;
return objp;
}
(This code is in the file object.xc in the week2 subdirectory.)
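To illustrate why callers need not change when the allocation scheme does, here is a sketch of how the same interface might later be implemented with malloc. It is not the course's eventual code: the struct layout, the MAXNAME limit, and the name newobject_dyn are assumptions made so that the example stands alone.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXNAME 20                      /* hypothetical size of the name field */

struct object
{
    char name[MAXNAME];
    struct object *lnext;
};

/* Same calling convention as newobject(), but each structure comes from malloc. */
struct object *
newobject_dyn(char *name)
{
    struct object *objp = malloc(sizeof(struct object));
    if(objp == NULL)
    {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    strncpy(objp->name, name, MAXNAME - 1);
    objp->name[MAXNAME - 1] = '\0';     /* strncpy may not terminate a long name */
    objp->lnext = NULL;
    return objp;
}
```

The bounded copy is a further small change from the fixed-array version, which trusts the data file not to contain a name longer than the name field.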
Add the following code to rooms.c (it can go near the top):
#define MAXROOMS 100
static struct room rooms[MAXROOMS];
static int nrooms = 0;
struct room *
newroom(char *name)
{
struct room *roomp;
int i;
if(nrooms >= MAXROOMS)
{
fprintf(stderr, "too many rooms\n");
exit(1);
}
roomp = &rooms[nrooms++];
strcpy(roomp->name, name);
roomp->contents = NULL;
for(i = 0; i < NEXITS; i++)
roomp->exits[i] = NULL;
return roomp;
}
struct room *
findroom(char *name)
{
int i;
for(i = 0; i < nrooms; i++)
{
if(strcmp(rooms[i].name, name) == 0)
return &rooms[i];
}
return NULL;
}
struct room *
getentryroom(void)
{
if(nrooms == 0)
return NULL;
return &rooms[0]; /* temporary */
}
Assignment #2 Answers
Intermediate C Programming
UW Experimental College
Assignment #2 ANSWERS
Question 1. List some differences between text and binary data files.
Text files store lines of characters, intended to be human-readable if printed, displayed to the screen, read into a text editor, etc.
Binary files contain arbitrary byte values which make little or no sense if interpreted as characters.
Binary files may be smaller, and if read and written in a simpleminded way, they may be more efficient to read and write, and
the code to do so may be easier to write. Text files tend to be larger, to take longer (than binary files) to read and write, and to
require more elaborate code. (However, if binary files are read in safer ways, or written in more portable ways, the code to do
so is as complicated as for text files, and will probably be intermediate in efficiency between text files and simpleminded
binary files.)
Text files are quite portable. Binary files will not necessarily work if they are moved between machines having different word
sizes or data formats, or between compilers which lay out data structures differently in memory.
It's easy to create, inspect, and modify text files using standard tools. Binary file maintenance generally requires specially-written tools.
Binary files must be written using fopen mode "wb" and read using mode "rb".
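The two styles can be compared side by side in a small sketch. The function names are hypothetical, and tmpfile() is used to avoid inventing a file name; it opens its file in binary update mode ("wb+"), which is harmless for the text version on systems that do not translate newlines.

```c
#include <stdio.h>

/* Write n as text, read it back into *result.  Returns 1 on success, 0 on failure. */
int roundtrip_text(int n, int *result)
{
    FILE *fp = tmpfile();
    int ok;
    if(fp == NULL)
        return 0;
    fprintf(fp, "%d\n", n);             /* human-readable digits */
    rewind(fp);
    ok = (fscanf(fp, "%d", result) == 1);
    fclose(fp);
    return ok;
}

/* Write n as raw bytes, read it back into *result.  Returns 1 on success, 0 on failure. */
int roundtrip_binary(int n, int *result)
{
    FILE *fp = tmpfile();
    int ok;
    if(fp == NULL)
        return 0;
    ok = (fwrite(&n, sizeof n, 1, fp) == 1);    /* the int's bytes, exactly as in memory */
    rewind(fp);
    ok = ok && (fread(result, sizeof *result, 1, fp) == 1);
    fclose(fp);
    return ok;
}
```

Both functions recover the value on the machine that wrote it. The difference shows up when the file travels: the text form reads back anywhere, while the binary form depends on the size and byte order of int.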
Exercise 2. Devise and implement a way for long descriptions to be read from the data file.
There are many ways of doing this. All are a bit tricky, given certain constraints imposed by the existing code.
The most obvious way is to use lines of the form
desc You are in a dark and gloomy cave.
in the data file. (Such lines could of course be used for objects as well.) The only problem with this approach is that in the
existing data file reading code in io.c, each line read from the data file is immediately broken up into words by calling
getwords, and this break-up happens before it's decided what keyword began the line and what's to be done with the rest of
the information on the line. In the case of long descriptions, we want the rest of the text on the line (after the keyword desc)
to be treated as a single string, not a series of individual words.
One way around this problem is to make a copy of the line read from the data file before breaking it up with getwords. I
declared a second array
char line2[MAXLINE];
at the top of parsedatafile in io.c, and changed the old line
multiple desc lines, and arranging that they all get concatenated together to form the complete description. Here's how I
implemented that:
else if(strcmp(av[0], "desc") == 0)
{
char *p;
char *newstr;
for(p = line; *p != '\0' && !isspace((unsigned char)*p); p++) /* skip "desc" */
;
for(; isspace((unsigned char)*p); p++) /* skip whitespace */
;
if(currentobject != NULL)
{
if(currentobject->desc == NULL)
currentobject->desc = chkstrdup(p);
else
{
newstr = chkmalloc(
strlen(currentobject->desc) + 1
+ strlen(p) + 1);
strcpy(newstr, currentobject->desc);
strcat(newstr, "\n");
strcat(newstr, p);
free(currentobject->desc);
currentobject->desc = newstr;
}
}
else if(currentroom != NULL)
{
if(currentroom->desc == NULL)
currentroom->desc = chkstrdup(p);
else
{
newstr = chkmalloc(
strlen(currentroom->desc) + 1
+ strlen(p) + 1);
strcpy(newstr, currentroom->desc);
strcat(newstr, "\n");
strcat(newstr, p);
free(currentroom->desc);
currentroom->desc = newstr;
}
}
else
fprintf(stderr, "desc not in object or room\n");
}
Now the memory allocation is more complicated, because when we already have a (partial) long description string for a given
room or object, we have to allocate a new, longer one and free the old one. (There's an easier way to do this, involving another
standard library memory allocation function called realloc. We'll see it later.)
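For the curious, here is a rough sketch of what the realloc-based alternative might look like. The helper name appenddesc is invented for this example, and it does its own out-of-memory check rather than relying on any of the game's wrapper functions; treat it as one possible arrangement, not as the code we'll actually use.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Append `more' to the malloc'ed string `old', separated by a newline.
 * If `old' is NULL, the result is just a copy of `more'.
 * Returns the (possibly moved) string; exits if memory runs out.
 */
char *
appenddesc(char *old, const char *more)
{
    int first = (old == NULL);
    size_t oldlen = first ? 0 : strlen(old);
    /* room for the old text, a '\n', the new text, and the '\0' */
    char *newstr = realloc(old, oldlen + 1 + strlen(more) + 1);
    if(newstr == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    if(first)
        strcpy(newstr, more);
    else
    {
        /* realloc has preserved the old text; add on to the end of it */
        newstr[oldlen] = '\n';
        strcpy(newstr + oldlen + 1, more);
    }
    return newstr;
}
```

With a helper like this, both branches of the desc code could shrink to a single assignment of the form desc = appenddesc(desc, p), since realloc takes care of the allocate, copy, and free steps.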
Another option would be to decide that long descriptions span many lines on the data file, followed by some terminator,
without having each descriptive line preceded by any keyword. For example, we might have lines in the data file like this:
desc
/* long description */
#define NORTH     0
#define SOUTH     1
#define EAST      2
#define WEST      3
#define NORTHEAST 4
#define SOUTHEAST 5
#define NORTHWEST 6
#define SOUTHWEST 7
#define UP        8
#define DOWN      9
(By adding the new directions at the end of the array, rather than interspersing them with the original four, any array
initializations such as we used to have in main.c would still work, since it's legal to supply fewer initializers for an array than
the maximum number of elements in the array.)
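Here is a small illustration of that rule. The array and its size macro are hypothetical (they are not part of the game); the point is only that the elements without initializers are implicitly set to zero, which for pointers means null pointers.

```c
#include <stddef.h>

#define NDIRS 10    /* assumed: one slot per direction, NORTH through DOWN */

/* Only the original four names are supplied; elements 4 through 9 */
/* are implicitly initialized to null pointers. */
const char *dirnames[NDIRS] = { "north", "south", "east", "west" };

/* count the elements that were explicitly given a name */
int
countnamed(void)
{
    int i, n = 0;
    for(i = 0; i < NDIRS; i++)
    {
        if(dirnames[i] != NULL)
            n++;
    }
    return n;
}
```

Had the new directions been interspersed with the old ones, the four existing initializers would have landed in the wrong slots.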
I added this code to docommand in commands.c:
else if(strcmp(verb, "ne") == 0 || strcmp(verb, "northeast") == 0)
    dircommand(player, NORTHEAST);
else if(strcmp(verb, "se") == 0 || strcmp(verb, "southeast") == 0)
    dircommand(player, SOUTHEAST);
else if(strcmp(verb, "nw") == 0 || strcmp(verb, "northwest") == 0)
    dircommand(player, NORTHWEST);
else if(strcmp(verb, "sw") == 0 || strcmp(verb, "southwest") == 0)
    dircommand(player, SOUTHWEST);
else if(strcmp(verb, "up") == 0)
    dircommand(player, UP);
else if(strcmp(verb, "down") == 0)
    dircommand(player, DOWN);
Assignment #3
Intermediate C Programming
UW Experimental College
Assignment #3
Handouts:
Assignment #3
Assignment #2 Answers
Class Notes, Chapter 18
Class Notes, Chapter 19
Exercises:
1. Appended to this assignment are listings of new versions of commands.c and parser.c. (I've only printed the
first page of commands.c, because the rest doesn't change.) These versions implement a slightly more sophisticated
scheme for parsing and representing commands; they will handle sentences like
hit nail with hammer
The new scheme involves this structure, in game.h:
struct sentence
{
char *verb;
struct object *object;
char *preposition;
struct object *xobject; /* object of preposition */
};
This structure can contain the verb, object, and prepositional phrase of one of these sentences. (Also, the object of the
verb and the object of the preposition are represented by object pointers, filled in by the command parser, rather than
simple object names.)
Replace your copies of parser.c and commands.c with these new ones, which can be found in the week3
subdirectory. Merge any of the commands you added (or other changes you made) from your old commands.c into
the new one. In main.c, replace the declarations of the variables verb and object with the new variable
struct sentence cmd;
which can contain the whole command, and change the calls to parseline and docommand to
if(!parseline(&player, line, &cmd))
continue;
docommand(&player, &cmd);
You'll also need to change the prototype declarations for parseline and docommand in game.h.
Although we can now parse these more-complicated commands, we need a bit more machinery before we can
implement any of them. For now, here is a little ``hit'' command (to be added to commands.c) which will begin to
give you a feel for what we'll be able to do:
else if(strcmp(verb, "hit") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to hit.\n");
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to hit with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You don't have the %s.\n", cmd->xobject->name);
return FALSE;
}
printf("The %s says, \"Ouch!\"\n", objp->name);
}
Add this command, too.
2. Add some other commands, such as ``break'' or ``cut'' or ``read''. (They won't really do anything yet, other than print
little messages along the lines of the ``hit'' command in exercise 1.)
3. Write a function plural with the prototype
char *plural(char *);
which, given a string, returns a new string with an ``s'' tacked on to the end. Use this new function to rewrite some of
the messages printed by the game from
printf("I see no %s here.\n", word);
to
printf("I don't see any %s.\n", plural(word));
Make an appropriate choice from among the techniques discussed in class for the allocation of the string returned by
plural. (Extra credit: try to make plural a little smarter, by adding ``es'' where appropriate.)
Assignment #3 Answers
Intermediate C Programming
UW Experimental College
Assignment #3 ANSWERS
}
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}
printf("The %s is now cut in two.\n", objp->name);
}
else if(strcmp(verb, "read") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to read.\n");
return FALSE;
}
printf("There isn't much to read on the %s.\n", objp->name);
}
Exercise 3. Write a function plural so that the game can print slightly more interesting messages.
Here is my first plural function:
#include <string.h>
#include "game.h"
#define MAXRETBUF 20
char *
plural(char *word)
{
static char retbuf[MAXRETBUF];
strcpy(retbuf, word);
strcat(retbuf, "s");
return retbuf;
}
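Notice that the retbuf array is declared static, so that it persists after plural returns and the pointer
that plural returns remains valid. (Since strcpy and strcat do no bounds checking, this version also assumes that word is at most MAXRETBUF - 2 characters long.)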
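For the extra credit, something along the following lines might do. This is only a sketch: the name smartplural and the particular list of endings are my own choices for illustration, and plenty of English words (``mouse'', ``knife'') will still come out wrong. It also guards the static buffer by truncating overly long words instead of overflowing.

```c
#include <stdio.h>
#include <string.h>

#define MAXRETBUF 40

/* does `word' end with `suffix'? */
static int
endswith(const char *word, const char *suffix)
{
    size_t wl = strlen(word), sl = strlen(suffix);
    return wl >= sl && strcmp(word + wl - sl, suffix) == 0;
}

char *
smartplural(const char *word)
{
    static char retbuf[MAXRETBUF];
    const char *suffix = "s";
    if(endswith(word, "s") || endswith(word, "x") || endswith(word, "z") ||
            endswith(word, "ch") || endswith(word, "sh"))
        suffix = "es";
    /* the precision limits how much of word is copied, so that */
    /* the suffix and the terminating '\0' are certain to fit */
    sprintf(retbuf, "%.*s%s",
        (int)(sizeof retbuf - 1 - strlen(suffix)), word, suffix);
    return retbuf;
}
```

As with the simpler version, the result lives in a single static buffer, so each call overwrites the string returned by the previous one; a caller who needs two plurals at once must copy the first before asking for the second.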
Here is the change to (the newer version of) parser.c:
if(ac < 2)
cmd->object = NULL;
else if((cmd->object = findobject(actor, av[1])) == NULL)
{
Assignment #4
Intermediate C Programming
UW Experimental College
Assignment #4
Handouts:
Assignment #4
Assignment #3 Answers
Class Notes, Chapter 20
Exercises:
1. This week we're going to add containers (so that objects can be put inside of other objects) and attributes, so that we can
remember things about objects (such as whether they're containers, whether they're open, etc.).
The code to implement these changes is scattered, and not all of it is in the distribution directories. You'll have to type some of
this new code in yourself.
Here is the modified object structure, along with a few definitions:
struct object
{
char name[MAXNAME];
unsigned int attrs;
struct object *contents;
struct object *lnext;
char *desc;
};
#define CONTAINER   0x0001
#define CLOSABLE    0x0002
#define OPEN        0x0004
#define HEAVY       0x0008
#define BROKEN      0x0010
#define TOOL        0x0020
#define SOFT        0x0040
#define SHARP       0x0080
#define LOCK        0x0100
#define KEY         0x0200
#define LOCKED      0x0400
#define TRANSPARENT 0x0800
#define IMMOBILE    0x1000
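If bit masks are new to you, the following sketch shows the three basic operations (set, clear, test) on a cut-down object. The helper names setbits, clearbits, and testbits are invented for this example. The Isopen and Iscontainer macros are used by the code in this assignment, but their definitions are not reproduced here, so the ones below are only a plausible guess at what they look like.

```c
#define CONTAINER 0x0001
#define CLOSABLE  0x0002
#define OPEN      0x0004

/* assumed definitions; the real ones belong in game.h */
#define Iscontainer(o) (((o)->attrs & CONTAINER) != 0)
#define Isopen(o)      (((o)->attrs & OPEN) != 0)

struct object    /* cut down to the one field this sketch needs */
{
    unsigned int attrs;
};

/* turn on every bit in mask */
void
setbits(struct object *objp, unsigned int mask)
{
    objp->attrs |= mask;
}

/* turn off every bit in mask, leaving the others alone */
void
clearbits(struct object *objp, unsigned int mask)
{
    objp->attrs &= ~mask;
}

/* is at least one of the bits in mask turned on? */
int
testbits(struct object *objp, unsigned int mask)
{
    return (objp->attrs & mask) != 0;
}
```

Because each attribute has its own bit, several can be combined with |, as in setbits(&box, CONTAINER | CLOSABLE), and each can later be tested or cleared without disturbing the rest.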
Here is new code, for commands.c, to implement new ``open,'' ``close,'' and ``put (in)'' commands:
else if(strcmp(verb, "open") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to open.\n");
return FALSE;
}
if(Isopen(objp))
{
printf("The %s is already open.\n", objp->name);
return FALSE;
}
if(!(objp->attrs & CLOSABLE))
{
printf("You can't open the %s.\n", objp->name);
return FALSE;
}
objp->attrs |= OPEN;
printf("The %s is now open.\n", objp->name);
}
else if(strcmp(verb, "close") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to close.\n");
return FALSE;
}
if(!(objp->attrs & CLOSABLE))
{
printf("You can't close the %s.\n", objp->name);
return FALSE;
}
if(!Isopen(objp))
{
printf("The %s is already closed.\n", objp->name);
return FALSE;
}
objp->attrs &= ~OPEN;
printf("The %s is now closed.\n", objp->name);
}
else if(strcmp(verb, "put") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to put.\n");
return FALSE;
}
if(!contains(player->contents, objp))
{
printf("You don't have the %s.\n", objp->name);
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "in") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me where to put the %s.\n",
objp->name);
return FALSE;
}
if(!Iscontainer(cmd->xobject))
{
printf("You can't put things in the %s.\n",
cmd->xobject->name);
return FALSE;
}
if((cmd->xobject->attrs & CLOSABLE) && !Isopen(cmd->xobject))
{
printf("The %s is closed.\n", cmd->xobject->name);
return FALSE;
}
if(!putobject(player, objp, cmd->xobject))
{
/* shouldn't happen */
printf("You can't put the %s in the %s!\n",
objp->name, cmd->xobject->name);
return FALSE;
}
printf("Now the %s is in the %s.\n",
objp->name, cmd->xobject->name);
}
(This code is in the file week4/commands.xc.)
Here is a new version of newobject, for object.c, that initializes the new object attribute and contents fields. (It's also
missing any required initialization of the new long description field; make sure you preserve yours.)
struct object *
newobject(char *name)
{
struct object *objp;
if(nobjects >= MAXOBJECTS)
{
fprintf(stderr, "too many objects\n");
exit(1);
}
objp = &objects[nobjects++];
strcpy(objp->name, name);
objp->lnext = NULL;
objp->attrs = 0;
objp->contents = NULL;
return objp;
}
Here is the new putobject function, for object.c. (It's in week4/object.xc.)
/* transfer object from actor to container */
/* (returns TRUE on success, FALSE if the actor doesn't have the object) */
int
putobject(struct actor *actp, struct object *objp, struct object *container)
{
    struct object *lp;
    struct object *prevlp = NULL;
    for(lp = actp->contents; lp != NULL; lp = lp->lnext)
    {
        if(lp == objp) /* found it */
        {
            /* splice out of actor's list */
            if(lp == actp->contents) /* head of list */
                actp->contents = lp->lnext;
            else
                prevlp->lnext = lp->lnext;
            /* splice into container's list */
            lp->lnext = container->contents;
            container->contents = lp;
            return TRUE;
        }
        prevlp = lp;
    }
    /* didn't find it (error) */
    return FALSE;
}
Here is new code for the parsedatafile function in io.c, to read attributes for objects:
else if(strcmp(av[0], "attribute") == 0)
{
if(currentobject == NULL)
continue;
if(strcmp(av[1], "container") == 0)
currentobject->attrs |= CONTAINER;
else if(strcmp(av[1], "closable") == 0)
currentobject->attrs |= CLOSABLE;
else if(strcmp(av[1], "open") == 0)
currentobject->attrs |= OPEN;
else if(strcmp(av[1], "heavy") == 0)
currentobject->attrs |= HEAVY;
else if(strcmp(av[1], "soft") == 0)
currentobject->attrs |= SOFT;
else if(strcmp(av[1], "sharp") == 0)
currentobject->attrs |= SHARP;
}
(This code is incomplete; at some point you'll have to add cases for the other attributes defined in game.h.)
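One way to keep that if/else chain from growing without bound is to drive it from a table, as in the sketch below. This is an alternative arrangement, not the code in the distribution; the attrname structure and attrmask function are invented for the example, and the data-file spellings of the attributes beyond the six handled above are assumptions.

```c
#include <string.h>

#define CONTAINER   0x0001
#define CLOSABLE    0x0002
#define OPEN        0x0004
#define HEAVY       0x0008
#define BROKEN      0x0010
#define TOOL        0x0020
#define SOFT        0x0040
#define SHARP       0x0080
#define LOCK        0x0100
#define KEY         0x0200
#define LOCKED      0x0400
#define TRANSPARENT 0x0800
#define IMMOBILE    0x1000

struct attrname
{
    const char *name;    /* as spelled in the data file (assumed) */
    unsigned int mask;
};

static const struct attrname attrnames[] =
{
    {"container", CONTAINER}, {"closable", CLOSABLE}, {"open", OPEN},
    {"heavy", HEAVY}, {"broken", BROKEN}, {"tool", TOOL},
    {"soft", SOFT}, {"sharp", SHARP}, {"lock", LOCK}, {"key", KEY},
    {"locked", LOCKED}, {"transparent", TRANSPARENT},
    {"immobile", IMMOBILE},
};

/* return the mask for a named attribute, or 0 if it isn't recognized */
unsigned int
attrmask(const char *name)
{
    size_t i;
    for(i = 0; i < sizeof attrnames / sizeof attrnames[0]; i++)
    {
        if(strcmp(attrnames[i].name, name) == 0)
            return attrnames[i].mask;
    }
    return 0;
}
```

The attribute case in parsedatafile could then reduce to something like currentobject->attrs |= attrmask(av[1]), perhaps with a warning when attrmask returns 0.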
Here are the ``break'' and ``cut'' commands from commands.c, modified to make use of a few attributes:
else if(strcmp(verb, "break") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to break.\n");
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to break with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}
if(!(cmd->xobject->attrs & HEAVY))
{
printf("I don't think the %s is heavy enough to break things with.\n",
cmd->xobject->name);
return FALSE;
}
objp->attrs |= BROKEN;
printf("Oh, dear. Now the %s is broken.\n", objp->name);
}
else if(strcmp(verb, "cut") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to cut.\n");
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to cut with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}
if(!(cmd->xobject->attrs & SHARP))
{
printf("I don't think the %s is sharp enough to cut things with.\n",
cmd->xobject->name);
return FALSE;
}
if(!(objp->attrs & SOFT))
{
printf("I don't think you can cut the %s with the %s.\n",
objp->name, cmd->xobject->name);
return FALSE;
}
printf("The %s is now cut in two.\n", objp->name);
}
Try to get all of this code integrated, compiled, and working. You will probably have to add a few prototypes to game.h.
Once it compiles, before you can play with the new features, you will have to make some changes to the data file,
dungeon.dat, too. Make the hammer heavy by adding the line
attribute heavy
just after the line
object hammer
Make the kettle a container by adding the line
attribute container
just after the line
object kettle
Add a knife object by adding the lines
object knife
attribute sharp
in one of the rooms. Add a box object by adding the lines
object box
attribute closable
Now you should be able to break things with the hammer, and put things into the kettle or box (once you open the box). The
knife won't cut things unless they're soft, so add an object with
attribute soft
and try cutting it with the knife. Once you put things into containers, they'll seem to disappear; we'll fix that in the next two
exercises.
With these changes, the game suddenly explodes in potential complexity. There are lots and lots of features you'll be able to
add, limited only by your own creativity and the amount of time you care to spend/waste. I've suggested several changes and
improvements below, although you don't have to make all of them, and you're not limited to these suggestions, either. (You
won't be able to add too many more attributes, though, without possibly running out of bits in an int: an unsigned int is only
guaranteed to have 16 bits, and the definitions above already use 13 of them. We'll start using a more
open-ended scheme for implementing attributes next time. In the meantime, you could make the attrs field an unsigned
long int, which is guaranteed to have at least 32 bits, if you were desperate to add more attributes.)
2. Modify the findobject function in object.c so that it can find objects when they're inside containers (both in the
actor's possession and sitting in the room). The original implementation of findobject has separate loops for searching
the actor's list of objects and the room's list of objects. But now, when an object (in either list) is a container, we'll need to
search through its list of (contained) objects, too. It will be easiest if you add a new function, findobj2, which searches for
an object in any list of objects. The code in findobj2 will be just like the old code in findobject for searching the
actor's and room's lists; in fact, once you write findobj2, you'll be able to replace the actor- and room-searching code with
simple calls to
findobj2(actp->contents, name);
and
findobj2(actp->location->contents, name);
Ideally, your scheme should work even to find objects within objects within objects (or deeper); this will involve recursive
calls to findobj2.
3. Modify the takeobject function in object.c so that it can also take objects which are sitting in containers (both in the
actor's possession and sitting in the room). If a container is CLOSABLE, make sure it's OPEN!
4. Rewrite the ``examine'' command (in commands.c) to mention some of the relevant attributes (OPEN, BROKEN, etc.) of the
object being examined. (Hint: this is a good opportunity to use the conditional or ?: operator.)
5. Modify the listobjects function in object.c to list the contents (if any) of objects which are containers. Should it
print the contents of closed containers? (In a more complicated game, we might worry about transparency...)
6. Implement objects which can be locked (if they have the LOCK attribute). Add a ``lock'' command to lock them (i.e. to set the
LOCKED attribute) and an ``unlock'' command to unlock them. Both ``lock'' and ``unlock'' should require that the user use a
tool which has the KEY attribute. (In other words, ``lock strongbox with key'' should work, but ``lock strongbox with broom''
should not work. But don't worry about trying to make certain keys fit only certain locks.) Don't let objects be opened if they
have the LOCK attribute and are locked. (You'll have to update the attribute-reading code in io.c to handle the LOCK, KEY,
and LOCKED attributes.)
7. Implement a ``fix'' command which will let the user fix broken objects, as long as the preposition ``with'' appears and the
implement is an object with the TOOL attribute. (You'll have to update the attribute-reading code in io.c to handle the
TOOL attribute.)
8. Implement immobile objects that can't be picked up. (You'll have to update the attribute-reading code in io.c to handle the
IMMOBILE attribute.)
Assignment #4 Answers
Intermediate C Programming
UW Experimental College
Assignment #4 ANSWERS
Exercise 2. Modify the findobject function in object.c so that it can find objects when they're inside
containers.
As suggested in the assignment, we'll need to break the old findobject function up into two functions:
findobject and findobj2. findobject accepts an actor and the name of an object to find (as before);
findobj2 accepts a pointer to a list of objects, and the name of an object to find.
static struct object *findobj2(struct object *, char *);
struct object *
findobject(struct actor *actp, char *name)
{
struct object *lp;
/* first look in actor's possessions: */
lp = findobj2(actp->contents, name);
if(lp != NULL)
return lp;
/* now look in surrounding room: */
if(actp->location != NULL)
{
lp = findobj2(actp->location->contents, name);
if(lp != NULL)
return lp;
}
/* didn't find it */
return NULL;
}
/* find a named object in an object list */
/* (return NULL if not found) */
static struct object *
findobj2(struct object *list, char *name)
{
{
printf("The %s is closed.\n", containerp->name);
return FALSE;
}
/* re-find in container's list, for splicing */
prevlp = NULL;
for(lp = containerp->contents; lp != NULL; lp = lp->lnext)
{
if(lp == objp) /* found it */
{
/* splice out of container's list */
if(lp == containerp->contents) /* head of list */
containerp->contents = lp->lnext;
else
prevlp->lnext = lp->lnext;
/* splice into actor's list */
lp->lnext = actp->contents;
actp->contents = lp;
return TRUE;
}
prevlp = lp;
}
/* didn't find it (error) */
return FALSE;
}
As the comment indicates, the test for a closed container probably belongs up in commands.c. (That's where the
other similar tests are; the functions in object.c are supposed to be low-level, and just manipulate data structures.)
But, because of the clumsy way we're currently handling actors, rooms, objects, and containers, it's hard to do it right.
(You may not yet have realized the clumsiness of the current scheme. The problem is that actors are really objects and
rooms are really containers, and since containers are objects, rooms are objects, too. We'll have more to say about this
issue later.)
Here are findcontainer and findcont2:
/* find an object's container (in actor's possession, or room) */
/* (return NULL if not found) */
static struct object *
findcontainer(struct object *objp, struct actor *actp, struct room *roomp)
{
struct object *lp, *lp2;
/* first look in possessions: */
}
There are several conditions under which this code prints something ``special'' about the object being examined: if the
object has a long description, if the object is a container, if the object has certain attributes, etc. Whenever the code
prints one of these messages (i.e. under any circumstances) it sets the Boolean variable printedsomething to
TRUE. At the very end, if we haven't found anything interesting to print (i.e. if printedsomething is still
FALSE), we fall back on the generic ``You see nothing special'' message.
In simpler cases, it's nice to arrange conditionals so that you don't need extra Boolean variables. Here, however, the
condition under which we print ``You see nothing special'' would be so complicated if we tried to express it directly
that it's much easier to just use the little printedsomething variable. (Among other things, using
printedsomething means that it will be easier to add more attribute printouts later, as long as they all set
printedsomething, too.)
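Stripped to its essentials, the pattern looks something like the sketch below. It uses a simplified object structure and only two attributes, and the function name examinesketch and the message wording are mine, so don't take it as the actual ``examine'' code.

```c
#include <stdio.h>

#define TRUE 1
#define FALSE 0

#define OPEN   0x0004
#define BROKEN 0x0010

struct object    /* simplified for this sketch */
{
    char name[20];
    unsigned int attrs;
    char *desc;
};

/* describe an object; returns TRUE if anything specific was printed */
int
examinesketch(struct object *objp)
{
    int printedsomething = FALSE;
    if(objp->desc != NULL)
    {
        printf("%s\n", objp->desc);
        printedsomething = TRUE;
    }
    if(objp->attrs & OPEN)
    {
        printf("The %s is open.\n", objp->name);
        printedsomething = TRUE;
    }
    if(objp->attrs & BROKEN)
    {
        printf("The %s is broken.\n", objp->name);
        printedsomething = TRUE;
    }
    /* the fallback needs only one simple test, however many */
    /* attribute printouts are added above */
    if(!printedsomething)
        printf("You see nothing special about the %s.\n", objp->name);
    return printedsomething;
}
```

Without the flag, the final test would have to repeat every condition above it (no description, and not open, and not broken, and so on), and it would have to be revised each time a new printout was added.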
Exercise 5. Modify the listobjects function in object.c to list the contents of objects which are containers.
Here is the simple, straightforward implementation:
void
listobjects(struct object *list)
{
struct object *lp;
for(lp = list; lp != NULL; lp = lp->lnext)
{
printf("%s\n", lp->name);
if(lp->contents != NULL)
{
printf("The %s contains:\n", lp->name);
listobjects(lp->contents);
}
}
}
However, this won't look too good, because when we list, say, the contents of a room, and the room contains a
container and some other objects, the list of objects in the container won't be demarcated from the other objects in the
room. So, a fancier solution would be to indent each container's list relative to the surrounding list. To do this, we add
a ``depth'' parameter which keeps track (with each recursive call) of how deeply nested the objects and containers we're
listing are. The depth obviously starts out as 0, and to keep all the old callers of listobjects from having to add
this new argument, we rename listobjects as listobjs2, and then have a new, stub version of
listobjects (which everyone else continues to call) which simply calls listobjs2, starting off the recursive
chain at a depth of 0. (Since listobjs2 is only called by listobjects, it's declared static.)
static void listobjs2(struct object *, int);
void
listobjects(struct object *list)
{
listobjs2(list, 0);
}
static void
listobjs2(struct object *list, int depth)
{
struct object *lp;
for(lp = list; lp != NULL; lp = lp->lnext)
{
printf("%*s%s\n", depth, "", lp->name);
if(lp->contents != NULL)
{
printf("%*sThe %s contains:\n", depth, "", lp->name);
listobjs2(lp->contents, depth + 1);
}
}
}
The indentation is done using a trick: the "%*s" format means to print a string in a field of a given width, where the
width is taken from printf's argument list. The string we ask to be printed is the empty string, because all we want
is a certain number of spaces (that is, the spaces which printf will add to pad our string out to the requested width).
But (once we know the trick, anyway) this is considerably easier than having to write little loops which print a certain
number of spaces.
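To see the trick in isolation, here is a tiny sketch that formats into a buffer rather than printing, so that the result can be inspected. The function indentname is hypothetical, and the caller is assumed to supply a buffer big enough for the indentation plus the name.

```c
#include <stdio.h>

/* format `name' into buf, preceded by `depth' spaces; */
/* returns the number of characters written (not counting the '\0') */
int
indentname(char *buf, int depth, const char *name)
{
    /* the * takes the field width from the argument list (depth), */
    /* and the empty string is padded out to that width with spaces */
    return sprintf(buf, "%*s%s", depth, "", name);
}
```

A depth of 0 produces no padding at all, which is why the top-level list comes out flush left. For more visible indentation, you could pass a multiple of the depth (say, depth * 4) as the width.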
Exercise 6. Implement objects which can be locked.
Here are the new ``lock'' and ``unlock'' commands:
else if(strcmp(verb, "lock") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to lock.\n");
return FALSE;
}
if(!(objp->attrs & LOCK))
{
printf("You can't lock the %s.\n", objp->name);
return FALSE;
}
if(Isopen(objp))
{
printf("The %s is open.\n", objp->name);
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to lock with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}
if(!(cmd->xobject->attrs & KEY))
{
printf("The %s won't lock the %s.\n",
cmd->xobject->name, objp->name);
return FALSE;
}
if(objp->attrs & LOCKED)
{
printf("The %s is already locked.\n", objp->name);
return FALSE;
}
objp->attrs |= LOCKED;
printf("The %s is now locked.\n", objp->name);
}
else if(strcmp(verb, "unlock") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to unlock.\n");
return FALSE;
}
if(!(objp->attrs & LOCK))
{
printf("You can't unlock the %s.\n", objp->name);
return FALSE;
}
if(cmd->preposition == NULL || strcmp(cmd->preposition, "with") != 0 ||
cmd->xobject == NULL)
{
printf("You must tell me what to unlock with.\n");
return FALSE;
}
if(!contains(player->contents, cmd->xobject))
{
printf("You have no %s.\n", cmd->xobject->name);
return FALSE;
}
if(!(cmd->xobject->attrs & KEY))
{
printf("The %s won't unlock the %s.\n",
cmd->xobject->name, objp->name);
return FALSE;
}
if(!(objp->attrs & LOCKED))
{
printf("The %s is already unlocked.\n", objp->name);
return FALSE;
}
objp->attrs &= ~LOCKED;
printf("The %s is now unlocked.\n", objp->name);
}
Here is the modified ``open'' command:
else if(strcmp(verb, "open") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to open.\n");
return FALSE;
}
if(Isopen(objp))
{
printf("The %s is already open.\n", objp->name);
return FALSE;
}
if(!(objp->attrs & CLOSABLE))
{
printf("You can't open the %s.\n", objp->name);
return FALSE;
}
if((objp->attrs & LOCK) && (objp->attrs & LOCKED))
{
printf("The %s is locked.\n", objp->name);
return FALSE;
}
objp->attrs |= OPEN;
printf("The %s is now open.\n", objp->name);
}
(The ``close'' command doesn't need modifying; we'll let open containers and doors swing and latch closed even if
they were already locked.)
Finally, here are a few more lines for io.c, to read the new attributes needed by this and the following two exercises:
}
Exercise 8. Implement immobile objects that can't be picked up.
Here is the modified ``take'' command:
else if(strcmp(verb, "take") == 0)
{
if(objp == NULL)
{
printf("You must tell me what to take.\n");
return FALSE;
}
if(contains(player->contents, objp))
{
printf("You already have the %s.\n", objp->name);
return FALSE;
}
if(objp->attrs & IMMOBILE)
{
printf("The %s cannot be picked up.\n", objp->name);
return FALSE;
}
if(!takeobject(player, objp))
{
/* shouldn't happen */
printf("You can't pick up the %s.\n", objp->name);
return FALSE;
}
printf("Taken.\n");
Assignment #5
Intermediate C Programming
UW Experimental College
Assignment #5
Handouts:
Assignment #5
Assignment #4 Answers
Class Notes, Chapter 21
Class Notes, Chapter 22
Exercises:
1. Bit-masks (such as the attrs field we added to the object structure last time) are a perfectly reasonable data
structure to use for keeping track of a relatively small, well-defined set of attributes. However, as we've already seen,
once we start trying to keep track of the qualities and states of various objects in the game, we quickly find ourselves
using a welter of attributes. There aren't enough bits in an int (or even a long int) to keep track of all the
attributes we might eventually want. Furthermore, it becomes a nuisance to have to #define a new mask in
game.h every time we invent a new attribute, and to add another if clause to parsedatafile in io.c to map
the attribute line in the data file (a string) to the corresponding mask constant.
Therefore, after using it for only a week or two, we're going to rip out the bitmask-based attribute scheme and replace
it with something more open-ended and flexible. Any object may have a list of attributes, with each attribute
represented by an arbitrary string. We'll write a few functions for setting, testing, and clearing these attributes, so that
they'll be convenient for the rest of the program to manipulate. While we're at it, we'll get some more practice doing
dynamic memory allocation and using linked lists.
Since, under this new scheme, an object may have arbitrarily many attributes, and since each attribute will be an
arbitrary-length string, we won't be able to use fixed-size data structures. We'll use dynamically-allocated ones,
instead, which means that we'll be calling malloc a lot. A cardinal rule of using malloc is that you must check its
return value to make sure that it did not return a null pointer. If the program ever runs out of memory, or some other
problem causes malloc to return a null pointer, and if your program accidentally uses that null pointer as if it points
to usable memory, your program can and will fail in mysterious ways. If, on the other hand, you check malloc's
return value, and print a distinct error message when it fails, you'll at least know more or less exactly what the
problem is.
If you have many places in your program where you're doing dynamic allocation, it may become a nuisance to have to
check return values at all of those places. Therefore, a popular strategy is to implement a wrapper function around
malloc. The one we'll be using looks like this:
#include <stdio.h>
#include <stdlib.h>
#include "chkmalloc.h"
void *
chkmalloc(size_t sz)
{
void *ret = malloc(sz);
if(ret == NULL)
{
fprintf(stderr, "Out of memory\n");
exit(EXIT_FAILURE);
}
return ret;
}
All chkmalloc does is call malloc, check its return value, and return it if it's not NULL. (size_t is an ANSI C
type for representing the sizes of objects, and void * is the generic pointer type which malloc and related
functions return.) But since chkmalloc never returns a null pointer, its caller never has to check, but can begin
using the pointer immediately. chkmalloc can guarantee never to return a null pointer not because it implements
some kind of a magical infinite memory space, but rather because it simply exits, after printing an error message, if
malloc fails. Thus, all tests for malloc failures (and our strategy for dealing with those failures) are centralized in
chkmalloc().
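From the caller's side, the payoff looks like this. The wrapper is repeated so that the sketch compiles on its own; the newcounts function is a made-up caller, not part of the game.

```c
#include <stdio.h>
#include <stdlib.h>

/* the wrapper from the text, repeated so this sketch stands alone */
void *
chkmalloc(size_t sz)
{
    void *ret = malloc(sz);
    if(ret == NULL)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ret;
}

/* hypothetical caller: allocate n counters, all starting at zero */
/* (n is assumed to be nonzero, since malloc(0) is allowed to */
/* return a null pointer even when memory is plentiful) */
int *
newcounts(size_t n)
{
    size_t i;
    int *counts = chkmalloc(n * sizeof *counts);
    /* no test for NULL here: chkmalloc would already have exited */
    for(i = 0; i < n; i++)
        counts[i] = 0;
    return counts;
}
```

The caller is still responsible for eventually handing the pointer to free; chkmalloc changes only what happens on failure.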
The ``strategy'' of printing an error message and exiting when malloc fails is an expedient one, but it would not be
ideal for all programs. It assumes that (a) an actual out-of-memory condition is unlikely, and (b) the program won't
have any cleanup to do and won't mind if it's summarily terminated. This strategy would not, for example, be
acceptable for a text editor: the editor might use arbitrary amounts of memory when editing a large file, and the user
would certainly not appreciate being dumped out, without having the opportunity to save any work, after trying to add
(say) the 1,000,001st character to a huge file which had just been typed in from scratch over many hours. However,
for our little adventure game, this is a perfectly adequate strategy, for now at least.
You might argue that, if out-of-memory conditions are unlikely, we might as well skip chkmalloc, call malloc
directly, and not check its return value. This would be a bad idea, because while actual out-of-memory conditions
may be rare, other kinds of malloc failures are not so rare, especially in a program under development. If you
misuse the memory which malloc gives you, perhaps by asking for 16 bytes of memory and then writing 17
characters to it (and this is all too easy to do), malloc tends to notice, or at least to be broken by your carelessness,
and may well return a null pointer next time you call it (though nothing guarantees that it will fail so politely). When malloc returns a null pointer for this reason, it can be
difficult to track down the actual error (because it typically occurred somewhere in your code before the call to
malloc that failed), but if we blindly used the null pointer which malloc returned to us, we'd only defer the
eventual crash even farther, and it might be quite mysterious, much more so than a definitive ``out of memory''
message. Therefore, using a wrapper function like chkmalloc is a definite improvement, because we get error
messages as soon as malloc fails, and we don't have to scatter tests all through our code to get them.
All we do have to do, anywhere we call chkmalloc, is #include "chkmalloc.h", which contains
chkmalloc's external function prototype declaration, as well as a second convenience function which we'll meet in
a minute:
extern void *chkmalloc(size_t);
extern char *chkstrdup(char *);
Both chkmalloc.c and chkmalloc.h are in the week5 subdirectory of the source directories.
With chkmalloc in hand, we can begin implementing the new attribute scheme. First, we define this list structure:
struct list
{
    char *item;
    struct list *next;
};
Then, we rewrite the object structure to use a list, instead of an unsigned int, for attrs:
struct object
{
    char name[MAXNAME];
    struct list *attrs;
    struct object *contents;
    struct object *lnext;
    char *desc;
};
#define Iscontainer(o) hasattr(o, "container")
#define Isopen(o) hasattr(o, "open")
Notice that we have also rewritten the Iscontainer() and Isopen() macros, to use the new hasattr
function, which we'll see in a minute. (This suggests another benefit of having used those macros: none of the code
that ``called'' Iscontainer() or Isopen() will need rewriting. As we'll see, though, the code that has been
using raw bit operations to test and set the other attributes will need rewriting.) The old attribute bits (CONTAINER,
OPEN, etc.) are no longer needed.
We rewrite the newobject function in object.c slightly, to initialize attrs to a null pointer:
struct object *
newobject(char *name)
{
    struct object *objp;
    if(nobjects >= MAXOBJECTS)
    {
        fprintf(stderr, "too many objects\n");
        exit(1);
    }
    objp = &objects[nobjects++];
    strcpy(objp->name, name);
    objp->lnext = NULL;
    objp->attrs = NULL;
    objp->contents = NULL;
    objp->desc = NULL;
    return objp;
}
(Actually, we could have left it as objp->attrs = 0, because 0 is also an acceptable null pointer constant.)
Now, here are the three new functions for testing, setting, and clearing (``unsetting'') attribute strings:
/* see if the object has the attribute */
int
hasattr(struct object *objp, char *attr)
{
    struct list *lp;
    for(lp = objp->attrs; lp != NULL; lp = lp->next)
    {
        if(strcmp(lp->item, attr) == 0)
            return TRUE;
    }
    return FALSE;
}
/* set an attribute of an object (if it's not set already) */
void
setattr(struct object *objp, char *attr)
{
    struct list *lp;
    if(hasattr(objp, attr))
        return;
    lp = chkmalloc(sizeof(struct list));
    lp->item = chkstrdup(attr);
    lp->next = objp->attrs;
    objp->attrs = lp;
}
/* clear an attribute of an object */
void
unsetattr(struct object *objp, char *attr)
{
    struct list *lp;
    struct list *prevlp;
    for(lp = objp->attrs; lp != NULL; lp = lp->next)
    {
        if(strcmp(lp->item, attr) == 0)
        {
            if(lp == objp->attrs)
                objp->attrs = lp->next;
            else
                prevlp->next = lp->next;
            free(lp->item);
            free(lp);
            return;
        }
        prevlp = lp;
    }
}
(These functions are in the file object.xc in the week5 subdirectory.)
hasattr returns TRUE if an object has a certain attribute; it simply searches through the object's attribute list
looking for a matching string. setattr sets an attribute; if the object does not already have the attribute, setattr
adds it by allocating a new list node, allocating and copying the string, and splicing the new
node onto the head of the attribute list. unsetattr clears an attribute by finding it in the list, unlinking its node, and
freeing both the list node structure and the string (if a matching attribute is found).
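The removal step can also be written without the prevlp bookkeeping, by walking a pointer to the link itself rather than a pointer to the node. The following is a sketch of that alternative, not the version in the course files; list_add, list_has, and list_remove are hypothetical names, and they use plain malloc (reporting failure to the caller) so that the fragment stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct list
{
    char *item;
    struct list *next;
};

/* Hypothetical helper: push a copy of str onto the front of *headp. */
/* Returns 1 on success, 0 if memory could not be allocated. */
int
list_add(struct list **headp, const char *str)
{
    struct list *lp;
    assert(headp != NULL && str != NULL);
    lp = malloc(sizeof(struct list));
    if(lp == NULL)
        return 0;
    lp->item = malloc(strlen(str) + 1);
    if(lp->item == NULL)
    {
        free(lp);
        return 0;
    }
    strcpy(lp->item, str);
    lp->next = *headp;
    *headp = lp;
    return 1;
}

/* Return 1 if str is in the list, 0 otherwise. */
int
list_has(const struct list *lp, const char *str)
{
    for(; lp != NULL; lp = lp->next)
    {
        if(strcmp(lp->item, str) == 0)
            return 1;
    }
    return 0;
}

/* Remove the first node matching str; return 1 if one was removed. */
/* lpp always points at the pointer that leads to the current node */
/* (the head pointer, or some node's next field), so removing the */
/* first node needs no special case. */
int
list_remove(struct list **headp, const char *str)
{
    struct list **lpp;
    assert(headp != NULL && str != NULL);
    for(lpp = headp; *lpp != NULL; lpp = &(*lpp)->next)
    {
        if(strcmp((*lpp)->item, str) == 0)
        {
            struct list *lp = *lpp;
            *lpp = lp->next;    /* unlink the node */
            free(lp->item);
            free(lp);
            return 1;
        }
    }
    return 0;
}
```

Whether this is clearer than the prevlp version is a matter of taste; it trades one extra variable for one extra level of indirection.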
To explain the call to chkstrdup in setattr, we must say and think a little more about memory allocation. It
turns out that, to be on the safe side, setattr must allocate memory for the string defining the attribute. It cannot
assume that the string which was passed to it was allocated in such a way that it would be guaranteed to persist for the
life of this attribute on this object. For example, when we're reading a data file, the passed-in attribute string will be
sitting in a buffer holding one line we've just read from the data file, and that buffer will be overwritten when we read
the next line. Even if the passed-in string were allocated in such a way that it would stick around, setattr still
couldn't use it directly, but would still have to make a copy, because unsetattr is always going to free the string,
so setattr had better have allocated it. Many times, the string passed to setattr will be a pointer to a string constant
in the source code, and if setattr simply copied the pointer rather than allocating and copying the string,
unsetattr might later try to free that pointer, an operation which is not allowed (and may well crash) since the pointer wasn't originally
obtained from malloc.
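The difference between remembering a pointer and keeping a copy can be seen in a few lines. This is only an illustration: copystring is a hypothetical stand-in for a string-duplicating function (it returns a null pointer on failure rather than exiting), and copy_outlives_buffer simulates two data-file lines being read into the same buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: returns a malloc'ed copy of str, */
/* or NULL if malloc fails. */
char *
copystring(const char *str)
{
    char *ret;
    assert(str != NULL);
    ret = malloc(strlen(str) + 1);
    if(ret != NULL)
        strcpy(ret, str);
    return ret;
}

/* Simulate reading two lines into one reused buffer.  Returns 1 if */
/* the copy still says "open" while the bare pointer now sees the */
/* second line, 0 otherwise. */
int
copy_outlives_buffer(void)
{
    char line[32];
    char *alias, *copy;
    int ok;
    strcpy(line, "open");       /* first line read */
    alias = line;               /* merely remembers the buffer */
    copy = copystring(line);    /* owns its own storage */
    if(copy == NULL)
        return 0;
    strcpy(line, "heavy");      /* next line overwrites the buffer */
    ok = strcmp(alias, "heavy") == 0 && strcmp(copy, "open") == 0;
    free(copy);                 /* legal: copy came from malloc */
    return ok;
}
```

Only copy may be handed to free; alias points into an ordinary array, which is the situation the paragraph above warns about.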
setattr could call strlen to get the length of the string, add 1 for the terminating \0, call malloc or
chkmalloc, and copy the string in, but since this is a common pattern (and since it's easy to forget to add 1), we
encapsulate it in the function chkstrdup. chkstrdup accepts a string, and returns a pointer to malloc'ed
memory holding a copy of the string. The ``chk'' in its name reflects the fact that, like chkmalloc, it never returns
a null pointer, even when malloc fails. Here is chkstrdup's definition:
#include <string.h>
char *
chkstrdup(char *str)
{
    char *ret = chkmalloc(strlen(str) + 1);
    strcpy(ret, str);
    return ret;
}
(chkstrdup is also in chkmalloc.c.)
Now that we have the hasattr, setattr, and unsetattr functions, we unfortunately have quite a few changes
to make. Everywhere we have the pattern
object->attrs & ATTRIBUTE
(where object is any struct object pointer and ATTRIBUTE is any attribute), we must replace it with
hasattr(object, "attribute")
So, exercise 1 (which all of this text so far has been an elaborate prelude to) is to add the functions and source files
mentioned so far, change all of the code that uses attrs to use the new hasattr, setattr, and unsetattr
functions (the changes should be confined to commands.c, and to one line in object.c), and make sure that the
program still works.
2. If you didn't use dynamically-allocated memory to hold long object and room descriptions, make that change now.
3. Rewrite newobject in object.c and newroom in rooms.c to dynamically allocate new structures, rather than
parceling them out of the static objects and rooms arrays. (Eliminate those arrays.)
Eliminating the rooms array has one unintended consequence. Although using arrays to hold the objects and rooms
was generally a nuisance in the long run, we did take advantage of it in one situation: during the initial setup of the
dungeon, while reading the data file, we hooked up named room exits to other rooms by calling the findroom
function. findroom had to be able to find a named room anywhere in the dungeon, which meant that it needed a
way to scan through the list of all rooms. When rooms were held in an array, scanning through the array was easy.
When we move to dynamically allocated room structures, we must preserve this ability.
The fact that what we need is the ability to ``scan through the list of all rooms'' suggests the solution: we'll keep a
linked list of all the rooms. We don't need to define and implement a new list structure or anything; we can just add a
``next'' pointer to struct room. This next pointer won't be used for any purpose which the user playing the game
will be able to see--as far as the user is concerned, the only connections (pointers) between rooms are the ones
described by the room's explicit exits. This new next field for struct room will be for our own, internal, behind-the-scenes use, so that we can implement this list of all rooms.
4. Improve the code in io.c so that room exit lists can be placed directly in the room descriptions, rather than at the
end. If an exit refers to a room you've already seen, you can hook up the exit immediately. But if the exit refers to a
room you haven't seen yet, you'll have to allocate some temporary data structure to keep track of the source room,
direction, and destination. Then, after you've read the entire data file, you can go back through that list of unresolved
exits, since all of the destinations should exist by that point. (If any don't, you can print an error message.)
Assignment #5 Answers
Intermediate C Programming
UW Experimental College
Assignment #5 ANSWERS
Exercise 2. If you didn't use dynamically-allocated memory to hold long object and room descriptions, make that
change now.
See the published answer to assignment 2, exercise 2.
Exercise 3. Rewrite newobject and newroom to dynamically allocate new structures, rather than parceling
them out of the static objects and rooms arrays.
Rewriting newobject in object.c is easy enough:
struct object *
newobject(char *name)
{
    struct object *objp;
    objp = chkmalloc(sizeof(struct object));
    strcpy(objp->name, name);
    objp->lnext = NULL;
    objp->attrs = NULL;
    objp->contents = NULL;
    objp->desc = NULL;
    return objp;
}
However, it isn't quite so simple for newroom, because we need a way of looking through all the rooms we've got,
for example in the findroom function. Therefore, we can't allocate room structures in isolation.
As suggested in the assignment, one possibility is to keep a linked list of all rooms allocated, by adding an extra
``next'' field to struct room in game.h:
struct room
{
    char name[MAXNAME];
    struct object *contents;
    struct room *exits[NEXITS];
    char *desc;             /* long description */
    struct room *next;      /* list of all rooms */
};
    strcpy(roomp->name, name);
    roomp->contents = NULL;
    for(i = 0; i < NEXITS; i++)
        roomp->exits[i] = NULL;
    roomp->desc = NULL;
    return roomp;
}
Splicing the new room into the list of all rooms is straightforward.
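As a minimal sketch of that splice (not the original listing), assume a head pointer named roomlist, which is the name findroom walks. The struct is trimmed to the two fields the sketch needs, MAXNAME is given an assumed value, and plain malloc stands in for chkmalloc so that the fragment stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXNAME 20      /* assumed value; the real one is in game.h */

struct room             /* trimmed to what this sketch needs */
{
    char name[MAXNAME];
    struct room *next;  /* list of all rooms */
};

struct room *roomlist = NULL;   /* head of the list of all rooms */

struct room *
newroom(const char *name)
{
    struct room *roomp;
    assert(name != NULL && strlen(name) < MAXNAME);
    roomp = malloc(sizeof(struct room));    /* chkmalloc in the game */
    if(roomp == NULL)
        return NULL;
    strcpy(roomp->name, name);
    roomp->next = roomlist;     /* the old head follows the new node */
    roomlist = roomp;           /* the new node becomes the head */
    return roomp;
}
```

Two assignments do all the work, and the most recently created room always ends up first in the list.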
findroom must be rewritten slightly, to walk over the new room list rather than the old rooms array:
struct room *
findroom(char *name)
{
    struct room *roomp;
    for(roomp = roomlist; roomp != NULL; roomp = roomp->next)
    {
        if(strcmp(roomp->name, name) == 0)
            return roomp;
    }
    return NULL;
}
We also have to decide what to do with the getentryroom function. (This is the function that gets called to
decide which room the actor should initially be placed in.) It used to return, rather arbitrarily, the first room in the
rooms array, which always happened to correspond to the first room in the data file, which was a reasonable
choice. If we make the obvious change to getentryroom, and have it return the ``first'' room in the new room
list:
struct room *
getentryroom(void)
{
    return roomlist;        /* temporary */
}
it will not be so reasonable a choice, because our easy implementation of the list-splicing code in newroom above
adds new rooms to the head of the list, such that the first room added, i.e. the first room in the data file, ends up at
the end of the list. Depending on what your data file looks like, the game with the modifications shown so far will
start out by dumping the player unceremoniously onto the back porch or into the basement (which might actually
end up being an advantage for the player, if the basement tunnel is where the treasure is!).
If you wish, you can rewrite newroom to place new rooms at the end of the list, or getentryroom to return the
tail of the list, to solve this problem. However, a better fix would be to allow the data file to explicitly specify what
the entry room should be, rather than making the game code assume anything. We'll leave that for another exercise.
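For the first of those options, one possible sketch keeps a second pointer to the last node, so that new rooms can be appended without walking the list. The roomtail variable is an assumption introduced here, the struct is trimmed, MAXNAME has an assumed value, and plain malloc stands in for chkmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXNAME 20      /* assumed value; the real one is in game.h */

struct room             /* trimmed to what this sketch needs */
{
    char name[MAXNAME];
    struct room *next;
};

struct room *roomlist = NULL;   /* first room created */
struct room *roomtail = NULL;   /* hypothetical: last room created */

struct room *
newroom(const char *name)
{
    struct room *roomp;
    assert(name != NULL && strlen(name) < MAXNAME);
    roomp = malloc(sizeof(struct room));    /* chkmalloc in the game */
    if(roomp == NULL)
        return NULL;
    strcpy(roomp->name, name);
    roomp->next = NULL;
    if(roomtail == NULL)
        roomlist = roomp;           /* list was empty */
    else
        roomtail->next = roomp;     /* hang it after the old tail */
    roomtail = roomp;
    return roomp;
}

struct room *
getentryroom(void)
{
    return roomlist;    /* now the first room in the data file */
}
```

With this arrangement the list preserves data-file order, so findroom and getentryroom need no changes at all.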
Exercise 4. Improve the code in io.c so that room exit lists can be placed directly in the room descriptions,
rather than at the end.
The basic change is to the parsedatafile function. Whereas it used to parse lines at the end of the data file
like
roomexits kitchen s:hall e:stairway n:porch
we'll now make it parse lines like
exit s hall
exit e stairway
exit n porch
which will occur within the description of the particular room (in this case, the kitchen) itself. (I've simplified
things by putting three exits on three separate lines and eliminating the colon syntax; there was no particular reason
for me to have done it in that more complicated way in the first place.)
Here is the new code for parsedatafile:
else if(strcmp(av[0], "exit") == 0)
{
    struct room *roomp;
    if(ac < 3)
    {
        fprintf(stderr, "missing exit or room name\n");
        continue;
    }
    if(currentroom == NULL)
    {
        fprintf(stderr, "exit not in room\n");
        continue;
    }
    roomp = findroom(av[2]);
    if(roomp != NULL)
    {
        /* already have room, so connect */
        connectexit(currentroom, av[1], roomp);
    }
    else
    {
        /* haven't seen room yet; stash for later */
        stashexit(currentroom, av[1], av[2]);
    }
}
This makes use of two auxiliary functions:
static void
connectexit(struct room *room, char *dirname, struct room *nextroom)
{
    int dir;
    if(strcmp(dirname, "n") == 0)
        dir = NORTH;
    else if(strcmp(dirname, "e") == 0)
        dir = EAST;
    else if(strcmp(dirname, "w") == 0)
        dir = WEST;
    else if(strcmp(dirname, "s") == 0)
        dir = SOUTH;
    else if(strcmp(dirname, "ne") == 0)
        dir = NORTHEAST;
    else if(strcmp(dirname, "se") == 0)
        dir = SOUTHEAST;
    else if(strcmp(dirname, "nw") == 0)
        dir = NORTHWEST;
    else if(strcmp(dirname, "sw") == 0)
        dir = SOUTHWEST;
    else if(strcmp(dirname, "u") == 0)
        dir = UP;
    else if(strcmp(dirname, "d") == 0)
        dir = DOWN;
    else
    {
        return FALSE;
    }
    parsedatafile(fp);
    resolveexits();
    fclose(fp);
    return TRUE;
}
And here is the third auxiliary function:
static void
resolveexits()
{
    struct stashedexit *ep, *nextep;
    for(ep = stashedexits; ep != NULL; ep = nextep)
    {
        struct room *roomp = findroom(ep->nextroom);
        if(roomp == NULL)
            fprintf(stderr, "still no such room \"%s\"\n", ep->nextroom);
        else
            connectexit(ep->room, ep->dirname, roomp);
        nextep = ep->next;
        free(ep->dirname);
        free(ep->nextroom);
        free(ep);
    }
    stashedexits = NULL;
}
One aspect of this last function deserves mention. Why does it not use a more usual list-traversal loop, such as
for(ep = stashedexits; ep != NULL; ep = ep->next)
What's with that temporary variable nextep? This loop is doing two things at once: traversing the linked list in
order to resolve the stashed exits, and freeing the list as it goes. But when it frees the node (the struct
stashedexit) it's working on, it also frees--and hence loses--the next pointer which that structure contains!
Therefore, it makes a copy of the next pointer before it frees the structure.
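For reference, here is a sketch of what the stashing side might look like, inferred from what resolveexits uses: a node with room, dirname, nextroom, and next fields, a head pointer named stashedexits, and strings that were allocated (since resolveexits frees them). The return value, the copystring helper, and the freestashed function are assumptions added for illustration, and plain malloc is used so that the fragment stands alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct room;    /* only pointers to rooms are needed here */

struct stashedexit
{
    struct room *room;          /* room the exit leads out of */
    char *dirname;              /* direction name, e.g. "s" */
    char *nextroom;             /* name of the not-yet-seen destination */
    struct stashedexit *next;
};

struct stashedexit *stashedexits = NULL;

/* Hypothetical helper: malloc'ed copy of str, or NULL on failure. */
static char *
copystring(const char *str)
{
    char *ret = malloc(strlen(str) + 1);
    if(ret != NULL)
        strcpy(ret, str);
    return ret;
}

/* Remember an exit whose destination hasn't been defined yet. */
/* Returns 1 on success, 0 if memory could not be allocated. */
int
stashexit(struct room *room, const char *dirname, const char *nextroom)
{
    struct stashedexit *ep;
    assert(dirname != NULL && nextroom != NULL);
    ep = malloc(sizeof(struct stashedexit));
    if(ep == NULL)
        return 0;
    ep->dirname = copystring(dirname);
    ep->nextroom = copystring(nextroom);
    if(ep->dirname == NULL || ep->nextroom == NULL)
    {
        free(ep->dirname);      /* free(NULL) is harmless */
        free(ep->nextroom);
        free(ep);
        return 0;
    }
    ep->room = room;
    ep->next = stashedexits;
    stashedexits = ep;
    return 1;
}

/* Free the whole list, saving each next pointer before its node is */
/* freed, just as resolveexits does.  Returns the number of nodes freed. */
int
freestashed(void)
{
    struct stashedexit *ep, *nextep;
    int n = 0;
    for(ep = stashedexits; ep != NULL; ep = nextep)
    {
        nextep = ep->next;
        free(ep->dirname);
        free(ep->nextroom);
        free(ep);
        n++;
    }
    stashedexits = NULL;
    return n;
}
```

The strings must be copied because the direction and room names passed in point into the line buffer, which is overwritten as soon as the next line of the data file is read.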
Assignment #6
Intermediate C Programming
UW Experimental College
Assignment #6
Handouts:
Assignment #6
Assignment #5 Answers
Class Notes, Chapter 23
Class Notes, Chapter 24
Exercises:
1. In this exercise, we'll almost completely rewrite commands.c so that instead of having a giant if/else
chain, with many calls to strcmp and the code for all the actions interspersed, there will be one function per
command, a table relating command verbs to the corresponding functions, and some much simpler code
which will look up a command verb in the table and decide which function to call. The functions in the table
will, of course, be referred to by function pointers.
In the week6 subdirectory of the distribution materials are two new files, cmdtab.c and cmdtab.h, and
a modified (though incomplete) copy of commands.c. cmdtab.h defines the command table structure,
struct cmdtab:
struct cmdtab
{
    char *name;
    int (*func)(struct actor *, struct object *, struct sentence *);
};
extern struct cmdtab *findcmd(char *, struct cmdtab [], int);
With this structure in hand, the first thing to do is to declare a bunch of functions, one for each command,
and then build the ``command table,'' which is an array of struct cmdtab. All of the command functions
are going to take as arguments the actor (struct actor *), the object being acted upon (struct
object *, if any), and the complete sentence structure (struct sentence *) describing the
command we're trying to interpret/implement. (The object being acted upon is the direct object of the
sentence and could also be pulled out of the sentence structure; we're passing it as a separate argument
because that might make things more convenient for the command functions themselves.) Here, then, is the
command table, incorporating all the commands we've ever used (your copy of the game might not include
all of the commands referenced here):
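static int dircmd(struct actor *, struct object *, struct sentence *);
static int takecmd(struct actor *, struct object *, struct sentence *);
...
static int fixcmd(struct actor *, struct object *, struct sentence *);
static int cutcmd(struct actor *, struct object *, struct sentence *);
static int readcmd(struct actor *, struct object *, struct sentence *);
static int opencmd(struct actor *, struct object *, struct sentence *);
static int closecmd(struct actor *, struct object *, struct sentence *);
static int lockcmd(struct actor *, struct object *, struct sentence *);
static int unlockcmd(struct actor *, struct object *, struct sentence *);
static int putcmd(struct actor *, struct object *, struct sentence *);
static int helpcmd(struct actor *, struct object *, struct sentence *);
static int quitcmd(struct actor *, struct object *, struct sentence *);
static struct cmdtab commands[] =
{
    ...
    "fix",      fixcmd,
    "cut",      cutcmd,
    "read",     readcmd,
    "open",     opencmd,
    "close",    closecmd,
    "lock",     lockcmd,
    "unlock",   unlockcmd,
    "put",      putcmd,
    "?",        helpcmd,
    "help",     helpcmd,
    "quit",     quitcmd,
};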
With this structure defined, the body of docommand is drastically simplified. In fact, since we'll break the
searching of the command table out to a separate function, all the new docommand has to do is call that
function, then (if the command is found) call the command function associated with the command:
int
docommand(struct actor *player, struct sentence *cmd)
{
    struct cmdtab *cmdtp;
    cmdtp = findcmd(cmd->verb, commands, Sizeofarray(commands));
    if(cmdtp != NULL)
    {
        (*cmdtp->func)(player, cmd->object, cmd);
        return TRUE;
    }
    else
    {
        printf("I don't know the word \"%s\".\n", cmd->verb);
        return FALSE;
    }
}
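The findcmd function itself lives in cmdtab.c and is not reproduced in these notes. A plausible sketch, matching the prototype in cmdtab.h, is a simple linear search; the body, the two stub command functions, and the demo_commands table are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct actor;       /* defined in game.h; only pointers are used here */
struct object;
struct sentence;

struct cmdtab
{
    char *name;
    int (*func)(struct actor *, struct object *, struct sentence *);
};

/* Linear search: return the entry whose name matches, or NULL. */
struct cmdtab *
findcmd(char *name, struct cmdtab cmdtab[], int ncmds)
{
    int i;
    assert(name != NULL);
    for(i = 0; i < ncmds; i++)
    {
        if(strcmp(cmdtab[i].name, name) == 0)
            return &cmdtab[i];
    }
    return NULL;
}

/* Two stand-in command functions, purely for illustration. */
int
stub_take(struct actor *player, struct object *objp, struct sentence *cmd)
{
    (void)player; (void)objp; (void)cmd;
    return 1;
}

int
stub_drop(struct actor *player, struct object *objp, struct sentence *cmd)
{
    (void)player; (void)objp; (void)cmd;
    return 2;
}

struct cmdtab demo_commands[] =
{
    { "take", stub_take },
    { "drop", stub_drop },
};
```

A linear search is perfectly adequate for a table of a few dozen verbs; a sorted table with a binary search would only matter for a much larger vocabulary.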
The new findcmd function wants to know (as its third argument) the number of entries in the command
table. It would be a nuisance if we had to count them, and we'd have to keep updating our count each time
we added an entry to the table. The built-in sizeof operator can tell us the size of the array, but it tells us
the size in bytes, while we want to know the number of entries. But the number of entries is the size of the
array in bytes divided by the size of one entry in bytes, or
sizeof(commands) / sizeof(commands[0])
(here we get a handle on ``the size of one entry'' by arbitrarily taking sizeof(commands[0])). Since
this is a common operation, I like to define a function-like macro to encapsulate it:
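A minimal sketch of such a macro follows. The name Sizeofarray is the one docommand uses above; the exact parenthesization, the directions table, and the ndirections function are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in the array a.  This works only on a true */
/* array, not on a pointer (so not on an array received as a */
/* function parameter, which is really a pointer). */
#define Sizeofarray(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table whose length we never want to count by hand. */
char *directions[] = { "n", "e", "w", "s", "u", "d" };

int
ndirections(void)
{
    size_t n = Sizeofarray(directions);
    assert(n > 0);
    return (int)n;
}
```

Adding or removing an entry in the table changes the result automatically, which is the whole point.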
The ``only'' thing left to do is to write all of the individual command functions. The code for all of them will
come from the corresponding if clause from the old, bulky version of docommand. For example, here is
takecmd:
static int
takecmd(struct actor *player, struct object *objp,
        struct sentence *cmd)
{
    if(objp == NULL)
    {
        printf("You must tell me what to take.\n");
        return FALSE;
    }
    if(contains(player->contents, objp))
    {
        printf("You already have the %s.\n", objp->name);
        return FALSE;
    }
    if(hasattr(objp, "immobile"))
    {
        printf("The %s cannot be picked up.\n", objp->name);
        return FALSE;
    }
    if(!takeobject(player, objp))
    {
        /* shouldn't happen */
        printf("You can't pick up the %s.\n", objp->name);
        return FALSE;
    }
    printf("Taken.\n");
    return SUCCESS;
}
With one exception, all of the rest of the command functions are similar; they just have
static int
xxxcmd(struct actor *actor, struct object *objp,
        struct sentence *cmd)
{
    ...
    return SUCCESS;
}
wrapped around the code that used to be in one of the
else if(strcmp(verb, "xxx") == 0)
{
    ...
}
blocks.
(Although the copy of commands.c in the week6 subdirectory reflects this structure, you probably won't
be able to use that file directly, because it does not reflect all of the many changes and additions we've been
making to commands.c over the past several weeks.)
The integer code SUCCESS that each of these functions is returning isn't really used yet, so you don't have to
worry about what it's for. (Right now, docommand is ignoring the return value of the called command
function.)
The one command function that doesn't quite fit the pattern is dircmd, because it's called for any of the
direction commands, and has to take another look at the command verb to see what direction it represents:
static int
dircmd(struct actor *player, struct object *objp,
        struct sentence *cmd)
{
    ...
    if(roomp == NULL)
    {
        printf("Where are you?\n");
        return ERROR;
    }
    /* If the exit does not exist, or if gotoroom() fails, */
    /* the player can't go that way. */
    ...
If your copy of game.h doesn't include ERROR and SUCCESS, you can put the lines
#define ERROR   0
#define SUCCESS 1
in it to #define these values.
2. Modify io.c to recognize a new entrypoint line in the data file, which will list the name of the room
which the actor should be placed in to start. What if the entrypoint line precedes the description of the
room it names? How should the code in io.c record the entry-point room so that getentryroom in
rooms.c can use it? (There are many answers to these questions; pick one, and implement it.)
3. Another part of the game that has the same sort of large, if/strcmp/else structure as did docommand in
commands.c is parsedatafile in io.c. Think about what it would take to use the cmdtab structure,
and the findcmd function, to streamline the code that processes lines in the data file.
(If you try it, you'll probably run into some difficulties which will make it rather hard to finish, so save a
copy of io.c before you start! In the answers handed out next week, I'll describe the difficulties, and outline
how they might be circumvented.)
Assignment #6 Answers
Intermediate C Programming
UW Experimental College
Assignment #6 ANSWERS
Exercise 2. Modify io.c to recognize a new entrypoint line in the data file.
Here is the code I added to io.c. Rather than calling findroom right away, and facing the possibility of the
room not being defined yet, I chose to just stash the name of the room away (as a string), and then try
looking it up after the data file had been completely read. I added this declaration at the beginning of
parsedatafile:
char *entryroom = NULL;
I added this case to the main if/else chain in parsedatafile:
else if(strcmp(av[0], "entrypoint") == 0)
{
    struct room *roomp;
    if(ac < 2)
    {
        fprintf(stderr, "missing entry room name\n");
        continue;
    }
    /* don't bother to look up yet; just save name */
    entryroom = chkstrdup(av[1]);
}
I added this code at the end of parsedatafile:
if(entryroom != NULL)
{
    struct room *roomp = findroom(entryroom);
    if(roomp != NULL)
        setentryroom(roomp);
    else
        fprintf(stderr, "can't find entry room %s\n", entryroom);
}
Finally, I added the new setentryroom function in rooms.c:
void
setentryroom(struct room *roomp)
{
    entryroom = roomp;
}
Exercise 3. Think about what it would take to use the cmdtab structure, and the findcmd function, to
streamline the code that processes lines in the data file.
It would initially seem straightforward to take each case in the if/else chain, break it out to a separate
function, build a table (an array of struct cmdtab) linking the first words on lines in the data file to the
new functions for parsing those lines, and finally call findcmd (after reading each line) to decide which
function to call.
The first problem you might face, though, would be breaking up the old parsedatafile function. It
contains a number of local variables (especially currentroom and currentobject) which several of
the parsing cases need access to. Once you broke those parsing cases out into separate functions, they'd need
access to the (formerly local) variables somehow. You'd probably have to make them global, although you
could restrict them to the source file io.c by declaring them static.
The next problem you'd face would be deciding which information to pass to the broken-out functions. Most
of them use the av array to inspect the various arguments and other words which appeared on the line they're
parsing, but for at least one way I wrote the long description reading code, it needed to use the original copy
of the complete data file input line (that is, the line array), the copy that hadn't been broken apart by
getwords. The obvious information to pass to each parsing function is the ac count and the av array
(which are, not coincidentally, quite similar to the argc and argv with which main is traditionally called).
Would you have to pass along the unbroken line to all functions, for the benefit of just the one or two that
needed it? (Remember, all the functions must accept the same argument list, because one function call, using
the function pointer in the func field of the matching struct cmdtab entry, will be calling all of them.)
Finally, once you decided what information to pass to each individual line parsing function, you'd discover
(if you've got strict prototype checking turned on in your compiler, and if you used cmdtab.c and
cmdtab.h as they appeared in last week's handout), that the compiler doesn't want you to create an array of
struct cmdtab with the func fields pointing at your shiny new functions, because the func field
(again, as it appeared in last week's handout), is specifically a pointer to a function accepting a struct
actor *, a struct object *, and a struct sentence *, which is almost certainly not the set of
data you chose to pass to each data file parsing function. You'd either have to revert the declaration of the
func field to
int (*func)();
(as it was on the disk) and turn off strict prototype checking, or write a second whole version of struct
cmdtab and a second whole version of findcmd that used func fields with a different prototype, or play
Assignment #7
Intermediate C Programming
UW Experimental College
Assignment #7
Handouts:
Assignment #7
Assignment #6 Answers
Class Notes, Chapter 25
Exercises:
1. Adding new command verbs to the game quickly becomes cumbersome. As long as they all go into commands.c,
they can potentially be invoked on any object and with any tool. So if a new command is (or ought to be) specific to
some tool or direct object, the code to implement it has to spend most of its time verifying that the overcreative player
isn't trying to apply the new command verb in some wildly imaginative but utterly inappropriate way (``sharpen gum
with feather''; ``pierce idea with boot''; etc.). It would be far better, in several respects, if scraps of code, for
implementing certain command verbs, could be attached directly to objects, so that they would only be invoked if
those objects were involved (and, additionally, so that the same verb might possibly perform different actions if
applied to different objects). The notion of attaching code to data structures is one of the fundamentals of Object
Oriented Programming, and it's a curious coincidence (or is it?) that the data structures which in this program we'd
like to attach code to are in fact what we call ``objects.''
We're going to add a new field, of type pointer-to-function, to our struct object. This field will contain a
pointer to a per-object function which handles one or more command verbs unique to that object. (If an object has no
unique verbs it wishes to process, its function pointer will be null.) Here is the new field for struct object in
game.h:
int (*func)(struct actor *, struct object *, struct sentence *);
Notice that func takes the same arguments as all the other command functions which we split commands.c up into
last week.
If an object contains some custom verb-handling code, it will be called from docommand in commands.c. Before
trying all of the default commands, it first sees if either of the two objects which a command might reference (the
direct object, or the object of the preposition) has its own function. If so, it calls that (those) function(s).
Now, those functions might or might not know how to handle the particular verb the user typed. If not, control should
flow through to the default, global command functions. (For example, you can still take or drop or examine just
about any object, even if it has its own special code for when you try to do other things with it, and the object's
function shouldn't have to--and couldn't--repeat all the other code for all the other verbs.) To coordinate things, then,
each function that tries to implement one of the user's commands can return various status codes.
There are four status codes (which go in game.h):
#define FAILURE  0
#define SUCCESS  1
#define CONTINUE 2
#define ERROR    3
FAILURE means that the command completed, but unsuccessfully (the player couldn't do what he tried to do).
SUCCESS means that the command completed successfully. CONTINUE means that the function which was just
called did not implement the command, and that the game system (i.e. the docommand function) should keep trying,
by calling other functions (if any). Finally, ERROR indicates an internal error of some kind.
Here is the new code for docommand. This goes at the top of the function, before it calls findcmd:
if(cmd->object != NULL && cmd->object->func != NULL)
{
    r = (*cmd->object->func)(player, cmd->object, cmd);
    if(r != CONTINUE)
        return r;
}
if(cmd->xobject != NULL && cmd->xobject->func != NULL)
{
    r = (*cmd->xobject->func)(player, cmd->xobject, cmd);
    if(r != CONTINUE)
        return r;
}
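The fall-through rule can be tried in isolation. What follows is a simplified sketch, not the game's code: struct object here holds only a name and a handler, the handler receives the verb directly rather than a struct sentence, and vasefunc, defaultcmd, and dispatch are hypothetical names. FAILURE is assumed to be 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAILURE  0      /* assumed value */
#define SUCCESS  1
#define CONTINUE 2
#define ERROR    3

struct object           /* drastically simplified for this sketch */
{
    char *name;
    int (*func)(struct object *, const char *verb);
};

/* Hypothetical per-object handler: it knows only the verb "break". */
int
vasefunc(struct object *objp, const char *verb)
{
    (void)objp;
    if(strcmp(verb, "break") != 0)
        return CONTINUE;    /* not ours; let someone else try */
    return SUCCESS;
}

/* Stand-in for the lookup in the global command table. */
int
defaultcmd(const char *verb)
{
    if(strcmp(verb, "take") == 0)
        return SUCCESS;
    return FAILURE;
}

/* Try the object's own function first; fall through to the */
/* default commands only if it answers CONTINUE (or has no function). */
int
dispatch(struct object *objp, const char *verb)
{
    int r;
    assert(verb != NULL);
    if(objp != NULL && objp->func != NULL)
    {
        r = (*objp->func)(objp, verb);
        if(r != CONTINUE)
            return r;
    }
    return defaultcmd(verb);
}
```

A vase with vasefunc attached handles ``break'' itself but still lets ``take'' reach the default code, while an object with a null func pointer goes straight to the defaults.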
    }
    setattr(cmd->object, "broken");
    printf("Oh, dear. Now the %s is broken.\n", cmd->object->name);
    return SUCCESS;
}
int toolfunc(struct actor *player, struct object *objp,
        struct sentence *cmd)
{
    if(strcmp(cmd->verb, "fix") != 0)
        return CONTINUE;
    if(objp != cmd->xobject)
        return CONTINUE;
    if(cmd->object == NULL)
    {
        printf("You must tell me what to fix.\n");
        return FAILURE;
    }
    if(!hasattr(cmd->object, "broken"))
    {
        printf("The %s is not broken.\n", cmd->object->name);
        return FAILURE;
    }
    if(!contains(player->contents, cmd->xobject))
    {
        printf("You have no %s.\n", cmd->xobject->name);
        return FAILURE;
    }
    unsetattr(cmd->object, "broken");
    printf("Somehow you manage to fix the %s.\n", cmd->object->name);
    return SUCCESS;
}
int cutfunc(struct actor *player, struct object *objp,
            struct sentence *cmd)
{
    if(strcmp(cmd->verb, "cut") != 0)
        return CONTINUE;
    if(objp != cmd->xobject)
        return CONTINUE;
    if(cmd->object == NULL)
    {
        printf("You must tell me what to cut.\n");
        return FAILURE;
    }
    if(!contains(player->contents, cmd->xobject))
    {
which appears as the direct object, the objp parameter is a copy of cmd->object, as before. When they're called
for objects which are objects of prepositions, however, objp is a copy of that object pointer (see the new code for
docommand above). So these commands must typically check that they're being used in the way they expect--for
example, if hammerfunc didn't check to make sure that it was being used as a prepositional object, it might
incorrectly attempt to handle a nonsense sentence like
break hammer
The next step is to hook these functions up to the relevant objects. To do that, we'll need to invent a new object
descriptor line for the data file, and add some new code to readdatafile in io.c to parse it. We'll rig it up so
that we can say
object hammer
attribute heavy
func hammerfunc
(We'll keep the attribute ``heavy'' on the hammer, because it might be useful elsewhere, although we won't be needing
it for the old breakcmd function any more.) The new code for io.c is pretty simple, but before we can write it, we
have to remember that there's a significant distinction between the name by which we know a function and the
internal address, or function pointer, by which the compiler knows it. When we write
func hammerfunc
in the data file, it's obvious to us that we want the hammer object's func pointer to be hooked up to hammerfunc,
but it is not at all obvious to the compiler. In particular, the compiler is not going to be looking at our data file at all!
Code in io.c is going to be looking at the data file, and it's going to be doing it as the program is running, well after
the compiler has finished its work. So we're going to have to ``play compiler'' just a bit, to match up the name of a
function with the function itself.
This will actually be rather simple to do, because matching up a name (i.e. a string) with a function is exactly what the
struct cmdtab structure and findcmd function do. So if we build a second array of cmdtab structures, holding
the names and function pointers for functions which it's appropriate to attach to objects, all we have to add to
readdatafile is this scrap of code:
else if(strcmp(av[0], "func") == 0)
{
    struct cmdtab *cmdtp = findcmd(av[1], objfuncs, nobjfuncs);
    if(cmdtp != NULL)
        currentobject->func = cmdtp->func;
    else
        fprintf(stderr, "unknown object func %s\n", av[1]);
}
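The name-to-function matching that this scrap relies on can be sketched on its own. The names below (struct nametab, findname, demofuncs, and the two demo functions) are hypothetical stand-ins chosen for this illustration; the game's real struct cmdtab and findcmd are defined elsewhere and may well differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical table entry: a name paired with a function pointer */
struct nametab
{
    const char *name;
    int (*func)(void);
};

/* two placeholder functions, just so the table has something to point at */
int hammerdemo(void) { return 1; }
int tooldemo(void)   { return 2; }

struct nametab demofuncs[] =
{
    { "hammerfunc", hammerdemo },
    { "toolfunc",   tooldemo },
};

#define NDEMOFUNCS (sizeof demofuncs / sizeof demofuncs[0])

/* Linear search by name.  Returns a pointer to the matching entry,
 * or NULL if no entry has that name. */
struct nametab *findname(const char *name, struct nametab *tab, size_t n)
{
    size_t i;
    for(i = 0; i < n; i++)
    {
        if(strcmp(tab[i].name, name) == 0)
            return &tab[i];
    }
    return NULL;
}
```

A lookup such as findname("toolfunc", demofuncs, NDEMOFUNCS) hands back the entry whose func member can then be stored in an object. A misspelled name in the data file simply yields NULL, which is the case where the code above prints its ``unknown object func'' message.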
The new array of cmdtab structures is called objfuncs. I chose to define it at the end of the new source file
objfuncs.c:
struct cmdtab objfuncs[] =
{
    "hammerfunc",   hammerfunc,
    "toolfunc",     toolfunc,
    "cutfunc",      cutfunc,
    "playfunc",     playfunc,
};
to try out the sample ``play'' function. Notice that playfunc makes you pick up the violin before playing it, but
since the piano is immobile, you can just walk up to it and start playing.
2. Rewrite the lockcmd and unlockcmd functions (from Assignment 4) as object functions.
3. Invent some more object functions and some objects for them to be attached to. Examples: weapons, other tools,
magic wands...
4. Since object functions (if present) are called before the default functions in commands.c, they can not only
supplement the commands in commands.c, they can also override (on a per-object basis) the default behavior of an
existing verb. Invent a shower object. (If your dungeon doesn't already have a bathroom, perhaps you'll need to add
that, too.) Rig it up so that if the player types
take shower
the game will print neither ``Taken'' nor ``The shower cannot be picked up'' but rather something cute like ``After
spending 20 luxurious minutes in the shower, singing three full-length show tunes, and using up all the hot water, you
emerge clean and refreshed.''
5. Since per-object functions are called before default functions, and since they can defer to those default functions by
returning CONTINUE, another kind of customization is possible. A per-object function can recognize a verb, do some
processing based upon it, but then return CONTINUE, so that the default machinery will also act. Implement a magic
sword object so that when the player picks it up, the messages ``As you pick it up, the sword briefly glows blue'' and
``Taken'' are printed. The first message is printed by the sword's custom function, and the second one by the normal
``take'' machinery in commands.c.
6. Implement a container of radioactive waste. Arrange that if the player tries to pick it up, the message ``the container is
far too dangerous to pick up'' is printed, unless the player is wearing a Hazmat suit, in which case the container can be
picked up normally. Also implement the Hazmat suit object. (You can either just have the player pick up the Hazmat
suit, or for extra credit, arrange that the Hazmat suit implement a custom ``wear'' verb.)
Assignment #7 Answers
Intermediate C Programming
UW Experimental College
Assignment #7 ANSWERS
    if(strcmp(cmd->verb, "lock") == 0)
    {
        if(hasattr(cmd->object, "locked"))
        {
            printf("The %s is already locked.\n", cmd->object->name);
            return FAILURE;
        }
        setattr(cmd->object, "locked");
        printf("The %s is now locked.\n", cmd->object->name);
        return SUCCESS;
    }
    else if(strcmp(cmd->verb, "unlock") == 0)
    {
        if(!hasattr(cmd->object, "locked"))
        {
            printf("The %s is already unlocked.\n", cmd->object->name);
            return FAILURE;
        }
        unsetattr(cmd->object, "locked");
        printf("The %s is now unlocked.\n", cmd->object->name);
        return SUCCESS;
    }
    /* shouldn't get here */
    return CONTINUE;
}
Here it is written so as to be attachable to keylike objects:
int keyfunc(struct actor *player, struct object *objp,
            struct sentence *cmd)
{
    if(strcmp(cmd->verb, "lock") != 0 && strcmp(cmd->verb, "unlock") != 0)
        return CONTINUE;
    if(objp != cmd->xobject)
        return CONTINUE;
    if(cmd->object == NULL)
    {
        printf("You must tell me what to %s.\n", cmd->verb);
        return FAILURE;
    }
    if(!hasattr(cmd->object, "lock"))
    {
        printf("You can't %s the %s.\n", cmd->verb, cmd->object->name);
        return FAILURE;
    }
    if(Isopen(cmd->object))
    {
        printf("The %s is open.\n", cmd->object->name);
        return FAILURE;
    }
    if(!contains(player->contents, cmd->xobject))
    {
        printf("You don't have the %s.\n", cmd->xobject->name);
        return FAILURE;
    }
    if(strcmp(cmd->verb, "lock") == 0)
    {
        if(hasattr(cmd->object, "locked"))
        {
            printf("The %s is already locked.\n", cmd->object->name);
            return FAILURE;
        }
        setattr(cmd->object, "locked");
        printf("The %s is now locked.\n", cmd->object->name);
        return SUCCESS;
    }
    else if(strcmp(cmd->verb, "unlock") == 0)
    {
        if(!hasattr(cmd->object, "locked"))
        {
            printf("The %s is already unlocked.\n", cmd->object->name);
            return FAILURE;
        }
        unsetattr(cmd->object, "locked");
        printf("The %s is now unlocked.\n", cmd->object->name);
        return SUCCESS;
    }
    /* shouldn't get here */
    return CONTINUE;
}
Exercise 4. Invent a (silly) shower object.
int showerfunc(struct actor *player, struct object *objp,
               struct sentence *cmd)
{
    if(strcmp(cmd->verb, "take") != 0)
        return CONTINUE;
    if(objp != cmd->object)
        return CONTINUE;
    printf("After spending 20 luxurious minutes in the shower,\n");
    printf("singing three full-length show tunes, and using up\n");
Assignment #8
Intermediate C Programming
UW Experimental College
Assignment #8
This handout presents several more potential additions and improvements to the game, not couched so much in terms of an
assignment, but more as food for thought.
1. Last week we added a func field to struct object, so that an object could contain a pointer to a function
implementing actions specific to that object. Another ability this gives us is to write new, object-specific code which
is intended to be fired up not directly, in response to a user's typed command, but rather, indirectly, at other spots in
the game where we're working with objects. We'll do this by letting objects optionally define special ``verbs'' (with
names beginning with periods, by convention) which we'll ``call'' when we need to.
Suppose we wanted the name or description of an object (printed in the listing of a room's contents or the actor's
possessions, or in response to the ``examine'' command) to vary, depending on the state of the object. (We've already
done that, in a crude way, by having the ``examine'' command look at a few of the object attributes we've defined.)
For example, we might want the player to be able to find an object which is described only as
a wad of rubberized fabric
until the player also finds the ``air pump'' object, and thinks to type
inflate wad with pump
at which point the object will be described as
an inflatable boat
To do this, we'll let an object define a pseudoverb ``.list''. Then, any time we would have printed the object's name
field in a list of objects, we'll let it print its own name, if it wants to. For example, the hypothetical rubber boat might
have object-specific code like this:
int boatfunc(struct actor *actp, struct object *objp,
             struct sentence *cmd)
{
    if(strcmp(cmd->verb, ".list") == 0)
    {
        if(hasattr(objp, "inflated"))
            printf("an inflatable boat");
        else
            printf("a wad of rubberized fabric");
        return SUCCESS;
    }
    return CONTINUE;
}
When we're printing a list of objects, we'll have to ``call'' the ``.list'' command (more specifically, call the object's
func, if it has one, passing a command verb of ``.list''). Setting up one of these special-purpose commands as if it
were a ``sentence'' the user typed will be a tiny bit of a nuisance, so we'll write an intermediate function to do it:
int objcall(struct actor *actp, struct object *objp, char *command)
{
    struct sentence cmd;
    if(objp->func == NULL)
        return ERROR;
    cmd.verb = command;
    cmd.object = objp;
    cmd.preposition = NULL;
    cmd.xobject = NULL;
    return (*objp->func)(actp, objp, &cmd);
}
Now we can rewrite the object-listing code in object.c. Where it used to say something like
printf("%s\n", lp->name);
it can now say
/* if obj has a function, try letting it list itself */
if(lp->func != NULL && objcall(actp, lp, ".list") == SUCCESS)
    printf("\n");
else
    printf("%s\n", lp->name);
If objcall returns SUCCESS, the object has printed itself, and all we have to do is append the trailing newline.
Otherwise, we print the object's name, as before.
(This modification to the object-listing code has one problem. If the listobjects function in object.c is going
to call objcall in this way, it needs a pointer to the actor, but it doesn't have it. So we're going to have to rewrite
the listobjects function, and each of the places it's called, to pass the actor pointer as an additional argument. As
it happens, two of the other improvements suggested in this assignment end up requiring exactly the same change. It
turns out that listobjects probably should have been accepting the actor pointer as an extra argument all along, if
there are so many good reasons why it ends up needing it.)
There are several other situations where we can use custom, ``internal'' functions like these. We can arrange that an
object also be able to print its long description, by interpreting a ``.describe'' verb. (We'd rewrite the ``examine''
command to print an object's desc field only if the object didn't succeed at performing the ``.describe'' verb.) We
could also implement customized ``hooks'' into the action of picking up an object. In last week's assignment (exercise
5), we implemented an object with a custom ``take'' command which printed some text but then returned CONTINUE,
so that the rest of the default ``take'' machinery would handle the actual picking up of the object. But we might want
the custom ``take'' action (and any messages it prints) to happen after the default machinery has performed most of
the picking up of the object. To do this, we could define a new special verb ``.take2'', and rewrite the ``take'' code in
commands.c to call .take2 at the very end, after the call to takeobject.
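Here is a rough sketch of how such a ``.take2'' hook might fit together. It is a stand-alone illustration under several assumptions: the structures are cut down to the few fields used here, takecmd is a hypothetical and much-simplified stand-in for the real ``take'' code in commands.c, the taken flag stands in for actually moving the object, and swordfunc and its message are invented for the example.

```c
#include <stdio.h>
#include <string.h>

/* status codes; values assumed to match game.h */
#define FAILURE     0
#define SUCCESS     1
#define CONTINUE    2
#define ERROR       3

struct actor;
struct object;

/* reduced version of the game's sentence structure */
struct sentence
{
    const char *verb;
    struct object *object;
};

/* reduced version of the game's object structure */
struct object
{
    const char *name;
    int taken;          /* stand-in for really moving the object */
    int (*func)(struct actor *, struct object *, struct sentence *);
};

struct actor
{
    int unused;         /* placeholder; the real actor has more fields */
};

/* same idea as objcall in the text, using only the reduced fields */
int objcall(struct actor *actp, struct object *objp, const char *command)
{
    struct sentence cmd;
    if(objp->func == NULL)
        return ERROR;
    cmd.verb = command;
    cmd.object = objp;
    return (*objp->func)(actp, objp, &cmd);
}

/* hypothetical object function: reacts only to the .take2 pseudoverb */
int swordfunc(struct actor *actp, struct object *objp, struct sentence *cmd)
{
    (void)actp;
    if(strcmp(cmd->verb, ".take2") != 0)
        return CONTINUE;
    printf("The %s hums quietly in your hand.\n", objp->name);
    return SUCCESS;
}

/* hypothetical, simplified "take": default work first, hook last */
int takecmd(struct actor *actp, struct object *objp)
{
    objp->taken = 1;
    printf("Taken.\n");
    if(objp->func != NULL)
        objcall(actp, objp, ".take2");  /* hook's status is ignored here */
    return SUCCESS;
}
```

Taking a sword whose func is swordfunc prints ``Taken.'' first and the sword's own message second, which is the ordering the .take2 idea is meant to achieve. Whether the hook's return value ought to be able to affect the outcome of the command is a design choice this sketch does not try to settle.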
2. When we started writing object-specific functions last week, we placed them in a file called objfuncs.c. Although
(as we've begun to see) it can be a fantastically powerful ability to allow objects to ``point at'' arbitrarily-complicated
functions written just for them, in practice, writing these functions will still be a nuisance. Suppose we've just been
playing with the data file, and we've come up with a fun new object to put in the dungeon for the player to play with,
and the object needs some new, special-purpose code to make it do its stuff. We'd have to write that code (in C, of
course), and then recompile the whole game, before we could hook the new function up to the new object. The whole
point of the data file is that the information about objects (and the rest of the dungeon) can be read out of it; we
stopped editing the source files to change the dungeon way back during the first week, when we introduced the data
file in the first place.
So, another major leap will be to put actual code (not just a pointer to a function) in objects. This code will, however,
not be written in C, for a variety of reasons. (A sufficient reason is that there's no easy way to get the C compiler to
compile scraps of source code we're reading from a data file and to attach the resultant object code to some data
structures in the already-running program.) So, we'll take the (seemingly big, but it's not that bad) step of defining our
own little miniature language for describing what objects can do, writing an interpreter (not a compiler) which reads
and implements this language, and then writing the code scraps for objects in the little language.
Therefore, we'll add another field to struct object, right next to the func field we added last week:
char *script;
For objects which contain these new, ``scripted'' functions, func will be a pointer to the script-interpreting function
(one function, the same function, for all objects which are scripted), and the script field will contain the text of the
script to be interpreted.
First, a description of the little language. It is optimized for the kinds of things that we want our object functions (at
least the simple ones) to be able to do: check attributes of the tool or direct object, set attributes of the tool or direct
object, and print messages. Here is a sample script:
entry "break"
chktoolqual "heavy"
csetobjstate "broken"
message "It is broken."
return 1
This is some code which might be suitable for the hammer object. The entry line says that this is code for the verb
``break''. (There can potentially be many entry lines, if a tool can be used in multiple ways.) The chktoolqual
line makes sure that the tool has the given attribute, and prints an appropriate message (and fails) if it does not. The
csetobjstate conditionally sets the state of the direct object, unless the object already has that state, in which
case it prints a message like ``The x is already broken'', and fails. Finally, the message line prints a message, and the
return line returns.
The code for the interpreter which will execute these little scripts is too large to print out, but it's in the week8
subdirectory of the distributed source code. It interfaces with the rest of the game in two places. We must arrange for
the interpreter to be called at the right time(s), but that's done already: if an object's func pointer points to the new
interpreter function, the code we added to docommand last week will call it. But we must also be able to read in a
script (from the data file) along with the rest of a description of an object. Here is a new case for parsedatafile:
else if(strcmp(av[0], "script") == 0)
{
    /* XXX cumbersome un-getwords required */
    int i, l = 0;
    if(currentobject == NULL)
        continue;
    for(i = 1; i < ac; i++)
        l += strlen(av[i]) + 1;
    currentobject->script = chkmalloc(l);
    *currentobject->script = '\0';
    for(i = 1; i < ac; i++)
    {
        if(i > 1)
            strcat(currentobject->script, " ");
        strcat(currentobject->script, av[i]);
    }
    currentobject->func = interp;
}
The script is read all from one line (which is a nuisance, but this is a preliminary implementation). The long
description reading code faced the same problem; this code uses a different solution, by un-doing the action of
getwords and rebuilding a single string (in malloc'ed memory) which the script field can point to.
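The ``un-getwords'' step can also be pulled out into a small helper, sketched below. The name joinwords and its first parameter are assumptions made for this illustration, and it calls plain malloc rather than the game's chkmalloc so that it stands alone.

```c
#include <stdlib.h>
#include <string.h>

/* Join av[first] .. av[ac-1] with single spaces into freshly
 * allocated memory.  Returns NULL if the allocation fails.
 * The caller is responsible for freeing the result. */
char *joinwords(int ac, char *av[], int first)
{
    size_t len = 1;             /* room for the terminating '\0' */
    char *s;
    int i;

    for(i = first; i < ac; i++)
        len += strlen(av[i]) + 1;       /* word plus one separator */

    s = malloc(len);
    if(s == NULL)
        return NULL;

    *s = '\0';
    for(i = first; i < ac; i++)
    {
        if(i > first)
            strcat(s, " ");
        strcat(s, av[i]);
    }
    return s;
}
```

Starting the length count at 1 wastes a byte in the usual case, but it means that a line with no words after the keyword still gets a valid, empty string rather than a zero-sized allocation.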
Integrating the interpreter code requires paying attention to a few other details. You'll need to add the prototype
extern int interp(struct actor *, struct object *, struct sentence *);
to game.h if it's not there already, and the line
objp->script = NULL;
to the newobject function in object.c. Also, for the moment, we'll have to pay attention to the numeric values
of the status codes (SUCCESS, FAILURE, CONTINUE, ERROR) we defined last week, because the simple interpreter
doesn't know how to handle the symbolic names. Instead, we'll have to say return 1 for SUCCESS, etc.
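To make the flow of one of these scripts concrete, here is a drastically reduced sketch of an interpreter. It is not the interpreter from the week8 directory: the names runscript and nexttoken are hypothetical, it understands only entry, message, and return, it assumes every statement takes exactly one argument, and it assumes quoted strings are still present in the script text.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CONTINUE    2   /* values assumed to match game.h */
#define ERROR       3

/* Copy the next token from *pp into tok (at most toksize-1 characters).
 * A token is a run of non-blanks, or a "quoted string" with the quotes
 * removed.  Returns 0 at the end of the script, 1 otherwise. */
static int nexttoken(const char **pp, char *tok, size_t toksize)
{
    const char *p = *pp;
    size_t n = 0;

    while(isspace((unsigned char)*p))
        p++;
    if(*p == '\0')
    {
        *pp = p;
        return 0;
    }
    if(*p == '"')
    {
        p++;
        while(*p != '\0' && *p != '"')
        {
            if(n < toksize - 1)
                tok[n++] = *p;
            p++;
        }
        if(*p == '"')
            p++;
    }
    else
    {
        while(*p != '\0' && !isspace((unsigned char)*p))
        {
            if(n < toksize - 1)
                tok[n++] = *p;
            p++;
        }
    }
    tok[n] = '\0';
    *pp = p;
    return 1;
}

/* Run the part of script whose entry line names verb.  Returns the
 * script's own return value, CONTINUE if no entry matches (or the
 * entry ends without a return), or ERROR for a malformed script. */
int runscript(const char *script, const char *verb)
{
    char op[32], arg[128];
    int active = 0;             /* nonzero inside the matching entry */

    while(nexttoken(&script, op, sizeof op))
    {
        if(!nexttoken(&script, arg, sizeof arg))
            return ERROR;       /* statement without its argument */
        if(strcmp(op, "entry") == 0)
        {
            if(active)
                break;          /* ran off the end of our entry */
            active = (strcmp(arg, verb) == 0);
        }
        else if(!active)
            continue;           /* skip statements of other entries */
        else if(strcmp(op, "message") == 0)
            printf("%s\n", arg);
        else if(strcmp(op, "return") == 0)
            return atoi(arg);
        else
            return ERROR;       /* statement this sketch doesn't know */
    }
    return CONTINUE;
}
```

Returning CONTINUE when no entry line matches lets a scripted object defer to the default commands in the same way the hand-written object functions do.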
3. This game is a prime candidate for moving from C to C++. Actors and rooms are really special cases of objects:
Rooms contain objects, just like actors and container objects do, and rooms also contain actors. Rather than having
separate structures for rooms, actors, and objects, we'd like to say things like ``a room is just like an object, but it also
has exits.'' If we rigged it up right (and cleaned up a number of messes which we've perpetrated along the way
because of the fact that we hadn't been using this structure) we could have a single piece of code which would transfer
objects between rooms and actors (``take''), between actors and rooms (``drop''), between actors and (container)
objects (``put in'') and which would even transfer actors between rooms (when the actor moved from one room to
another). This function (and much of the rest of the game) could operate on generic ``objects,'' without worrying
about whether they were simple objects, actors, or rooms. Only those portions of the game specific to operating on
actors or rooms would look at the additional fields differentiating an actor or room from an object.
The notion that some data structures are extensions of others, that an ``x'' is just like a ``y'' except for some extra stuff,
is another cornerstone (perhaps the cornerstone) of Object-Oriented Programming. The formal term for this idea is
inheritance, and the data structures are usually spoken of as being objects, which is precisely what the word ``object''
is doing in ``Object-Oriented Programming.'' (It's again more than a coincidence, but rather an indication that our
game is a perfect application of C++ or another object-oriented language, that we were already calling our
fundamental data structures ``objects.'')
The changes to the game to let it use C++ are extensive, and I'm not going to try to present them all here. (I've
completed many of the changes, however, and you can find them in subdirectories of the week8 directory associated
with the on-line web pages for this class.) The basic idea is that our old struct object is no longer just a
structure; it is a class:
class object
{
public:
    object(const char *);
    ~object(void);
    char name[MAXNAME];
    struct list *attrs;
    object *contents;
    object *container;
    object *lnext;
    char *desc;
};
(Among other things, all objects now contain pointers back to their containers, analogous to the way struct
actor used to contain a pointer back to its location.)
We write a constructor for new instances of this class, replacing our old newobject function in object.c:
object::object(const char *newname)
{
    strcpy(name, newname);
    lnext = NULL;
    attrs = NULL;
    contents = NULL;
    container = NULL;
    desc = NULL;
}
Wherever we used to write something like
objp = newobject(name);
we instead use the C++ new operator:
objp = new object(name);
(Actually, the only place we called newobject was in readdatafile in io.c, when setting
currentobject. It is entirely a coincidence, though not a surprising one, that the use of the C++ new operator
here looks so eerily similar to the old newobject call.)
Going back to game.h, we define the actor and room classes as being derived from class object:
class actor : public object
{
public:
    actor();
};

class room : public object
{
public:
    room(const char *);
    room *exits[NEXITS];
    struct room *next;
};
These say that an actor is just like an object (we don't actually have any actor-specific information at the moment),
and that a room is just like an object except that it has an exits array and an extra next pointer so that we can construct
a list of all rooms.
Here is the new, general-purpose object-transferring function, for object.c:
/* transfer object from one general container to another */
int transferobject(object *objp, object *newcontainer)
{
    object *lp;
    object *prevlp = NULL;
    if(objp->container != NULL)
    {
        object *oldc = objp->container;
        for(lp = oldc->contents; lp != NULL; lp = lp->lnext)
        {
            if(lp == objp)                      /* found it */
            {
                /* splice out of old container's list */
                if(lp == oldc->contents)        /* head of list */
                    oldc->contents = lp->lnext;
                else
                    prevlp->lnext = lp->lnext;
                break;
            }
            prevlp = lp;
        }
    }
    /* splice into new container's list */
    if(newcontainer != NULL)
    {
        objp->lnext = newcontainer->contents;
        newcontainer->contents = objp;
    }
    objp->container = newcontainer;
    return TRUE;
}
Notice that the parameters are declared as being of type ``object *''. In C++, every structure and class you declare
has its tag name implicitly defined as a typedef, so that the keyword struct or class is no longer needed in later
declarations. (Theoretically, we should seek out every instance of struct object in the game and replace them
with object or perhaps class object, and similarly for every struct actor and struct room. The
compiler I'm using, the C++ version of the GNU C Compiler, namely g++, doesn't seem to be complaining about
stray struct keywords referring to what I've actually redefined as classes, but I'm not sure what the formal rules of
C++ say.)
In any case, even though transferobject looks like it's defined only for use on objects, since actors and rooms
are now also objects, we can also use transferobject to move objects to and from the actor, and even to move
the actor between rooms. That is, we can rewrite the other transfer functions from object.c and rooms.c very
simply:
int takeobject(actor *actp, object *objp)
{
    return transferobject(objp, actp);
}

int dropobject(actor *actp, object *objp)
{
    return transferobject(objp, actp->container);
}

int putobject(actor *actp, object *objp, object *container)
{
    return transferobject(objp, container);
}

int gotoroom(actor *actor, room *room)
{
    return transferobject(actor, room);
}
4. Another improvement which you might be interested in (and another sweeping one) would be to rewrite the game so
that several people could play it at once, over the network. Instead of one instance of struct actor, representing
one player sitting at one keyboard and viewing output on one screen, we could have arbitrarily many actor
structures (just as we now have arbitrarily many objects and rooms), with each actor representing a player sitting
somewhere on the network, typing input and receiving output over a network connection.
The changes to make a multiplayer version of the game are actually rather straightforward and self-contained, with
the glaring exception of the fact that everywhere we used to call printf to print some text to the user's screen, we
must instead call some special output function of our own which knows how to send the text to the network
connection of the appropriate player.
To represent the network connection, we'll add a file descriptor to the actor structure. If we're using Unix-style
networking, the file descriptor will just be an integer:
int remotefd;
If we've gone over to the C++ style of doing things, this means that class actor now does have something to
distinguish it from a plain object:
class actor : public object
{
public:
    actor();
    int remotefd;
    struct actor *next;
};
(It turns out we're also going to need a list of all players, just like we need a list of all rooms.)
Then, using what we learned about variable-length argument lists in chapter 25 of the class notes, we can write an
output function which is like printf except that it writes the message to a particular player's network connection:
void output(struct actor *actp, char *msg, ...)
{
    char tmpbuf[200];       /* XXX */
    va_list argp;
    va_start(argp, msg);
    vsprintf(tmpbuf, msg, argp);
    va_end(argp);
    write(actp->remotefd, tmpbuf, strlen(tmpbuf));
}
We call vsprintf to ``print'' the printf-style message to a temporary string buffer, then use the low-level Unix
write function to write the string to the network connection. (This function has a significant limitation as written: if
a single message is ever more than 200 characters long, the tmpbuf array will overflow, with results which might
range from annoying to catastrophic. It's unfortunately tricky to get this sort of thing right; the comment /* XXX */ is a
reminder that the tmpbuf array with its fixed size of 200 is a weakness in the program.)
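One way of containing the overflow, sketched here under the assumption that a C99 library (and hence vsnprintf) is available, is to let the formatting routine know how big the buffer is. The function name boundedoutput is hypothetical, the actor structure is cut down to the one field used, and an over-long message is simply truncated, which may or may not be acceptable for a given game.

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* reduced stand-in for the game's actor; only the field used here */
struct actor
{
    int remotefd;
};

/* Like output() in the text, but never writes past the end of tmpbuf.
 * Over-long messages are truncated.  Returns whatever write() returns
 * (the number of bytes written, or -1), or -1 on a formatting error. */
int boundedoutput(struct actor *actp, const char *msg, ...)
{
    char tmpbuf[200];
    va_list argp;
    int n;

    va_start(argp, msg);
    n = vsnprintf(tmpbuf, sizeof tmpbuf, msg, argp);
    va_end(argp);

    if(n < 0)
        return -1;
    if((size_t)n >= sizeof tmpbuf)
        n = (int)(sizeof tmpbuf - 1);   /* message was truncated */

    return (int)write(actp->remotefd, tmpbuf, (size_t)n);
}
```

The fixed size of 200 is still arbitrary, but exceeding it now costs the tail of a message rather than the integrity of the program. Like the original, this sketch does not retry if write sends only part of the buffer.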
The rest of the code for the multiplayer version of the game is too bulky to include here, but I've placed it in
subdirectories of the week8 directory, too. See the README file there for more information.
C Programming
Essays
This is a (somewhat random) collection of some of the longer essays I've posted to Usenet over the years,
along with a few similarly long e-mail messages.
On correctness:
comp.lang.c, 1995-08-17
comp.lang.c, 1996-07-25
comp.lang.c, 1996-07-28
comp.lang.c, 2001-05-12
On undefined behavior:
comp.lang.c, 1995-03-11
comp.lang.c, 1995-03-21
comp.lang.c, 1998-11-05
On strings in C, and assigning them, and the choice between char * and char [] for holding them:
comp.lang.c, 1998-09-26
comp.lang.c, 1999-01-18
On the eternal question of whether "void main()" is acceptable, or whether main() must be
declared as returning int:
comp.lang.c, 1996-08-23
comp.lang.c, 1999-03-01
comp.unix.wizards/comp.lang.c, 1989-06-04
comp.lang.c, 1992-07-14
mail message, 1993-03-07
I probably won't always keep this index up-to-date, so if you like, you can "click here" for a guaranteed
current list of files in this directory.
Correctness
[This article was originally posted on August 17, 1995, in the midst of a thread on whether it's a good
idea to use void main() and the like.]
Newsgroups: comp.lang.c
From: scs@eskimo.com (Steve Summit)
Subject: Re: Why the warning???
Message-ID: <DDGMr2.My3@eskimo.com>
X-Scs-References: <199508171432.HAA14633@mail.eskimo.com>
References: <00001a80+00003293@msn.com> <dqua.808547526@dqua>
<40s7l2$mcr@garuda.csulb.edu>
Date: Thu, 17 Aug 1995 14:54:37 GMT
Earlier in the thread, someone expressed a popular sentiment: "I use what works." Raise your hand if
you've ever said this. (My hand's up.) Now raise your hand if you've ever spent hours going through
some code weeding out gratuitous nonportabilities so that you could get it to compile under a compiler or
operating system other than the original author used. (My hand's still up.)
There's always a tradeoff to be made between portability and convenience. Almost no code is truly
portable, even code that tries very hard to be, so it's reasonable to accept certain nonportabilities in one's
code, in the interests of expediency. But unilaterally saying "I use what works" is an extremely risky
attitude. It's one thing to make an informed decision: to weigh the costs of being strictly portable against
the risks of using something nonportable. It's quite another thing to declare that you can't even be
bothered to think about the tradeoffs, and to throw yourself and your code (and perhaps your company) at
the mercy of future compilers you'll want to use.
Ask yourself: how long do you expect your code to last? If it's a quick hack that you don't expect to use
past next week, then most of this is moot. But if it's a production program, a product, that you or someone
depends on or makes money on, you probably hope (you ought to hope) that it will be successful and
long-lived. And it's likely, even though you can't imagine it today, that some day you'll find yourself
wanting to compile it under some completely different compiler, or for some completely different
machine or operating system. And, when that day comes, the more your attitude during development has
been "I use what works," the more miserable the porting task will be.
When, for the platform and compiler you're using today, there's no alternative to a given bit of
nonportability (for instance, some operating-system-dependent interaction which C doesn't define),
there's no point spending lots of time desperately trying to make it portable. You can spend the time
porting that bit of the code when and if the time comes. (You might spend a bit of time today making
sure that the nonportable bit is cleanly isolated.)
When, on the other hand, there's a perfectly valid, equally easy, portable alternative, such as using int
main() in preference to void main(), or /* */ comments in preference to // comments, it really
seems foolish not to avail yourself of as many Standards as you can.
You can't imagine what kind of compiler you might suddenly find yourself wanting to use a year from
now. Neither can I. But one thing we do know is that if it is a C compiler, it is extremely likely to comply
as closely as it can to the ANSI/ISO C Standard. So the more closely we can make the code we write
today conform to that Standard, the fewer problems we are likely to have down the road.
void main() and // comments aren't perfect examples, because // comments (for no good reason
that I know of) are likely to be in the next major revision of the C Standard, and void main() works
under most compilers and is likely to remain so (even though it doesn't have to) if for no other reason
than that so terribly much code uses it. One could, I suppose, make an informed decision to keep using
those two, if one really thought they were preferable to the strictly-portable alternatives. But in the more
general case -- for the other errors and nonportabilities your code is all too likely to contain if all you care
about is that it works today -- someone, someday, is likely to curse you if you've made their life
miserable, or to sing your praises if you've expended a modicum of thought and effort towards giving
your code a fighting chance of being portable down the road.
Steve Summit
scs@eskimo.com
[This article was originally posted on July 26, 1996, in the midst of a discussion on precedence and order of
evaluation. I have corrected one or two errors which appeared in the original posting.]
Newsgroups: comp.lang.c
From: scs@eskimo.com (Steve Summit)
Subject: precedence vs. order of evaluation (was: HELP: Comma operator)
Message-ID: <1996Jul25.1157.scs.0001@eskimo.com>
Lines: 434
References: <BRAD.96Jul23005938@rosalyn.cs.washington.edu>
<01bb78bb$62355360$87ee6fce@timpent.airshields.com>
<TANMOY.96Jul23132821@qcd.lanl.gov>
<01bb796a$4e873a40$87ee6fce@timpent.airshields.com>
<01bb798a$11913f80$87ee6fce@timpent.airshields.com>
Date: Thu, 25 Jul 1996 18:57:00 GMT
Tim Behrendsen (tim@airshields.com) wrote:
> I think I'm going to have to disagree with you; what do you think
> side effects are? The assignment operator is just an operator with
> it's own precedence. Side effects come into play based on the order
> of evaluation, which is *purely* defined by the precedence rules.
This is a common misconception, but it's quite false. Order of evaluation is affected by much more than
"precedence rules"; precedence affects order of evaluation much less than many people think.
> There is only one thing that is relevent when you are talking about
> expression evaluation, and that is the precedence rules. Everything else
> falls out from those.
There's the misconception again. Order of evaluation does not "fall completely out of" precedence rules. It's true
that precedence seems to have something to do with expression evaluation; in the expression
a + b * c
we frequently say things like, "The multiplication happens before the addition, because * has higher precedence
than +." But it turns out this is a risky thing to say, because it gives the listener the impression that precedence
dictates order of evaluation, which as we'll see it does not. In my classes, I try desperately not to use the words
"happens before" when explaining operator precedence, although I'm afraid I rarely succeed.
Let's imagine that we had some kind of omniscient processor, which could look at an expression and instantly give
its value, without doing step-by-step evaluations. If we gave it the expression
1 + 2 * 3
it would give us the result 7. Precedence tells us why it gave us that answer and not 9. That is, even if thoughts of
"order of evaluation" never enter our heads, precedence is an important and independent concept, because it tells
http://www.eskimo.com/~scs/readings/precvsooe.960725.html (1 of 8) [22/07/2003 5:38:54 PM]
    +
   / \
  1   *
     / \
    2   3
Once it has determined the parse tree, the compiler sets about emitting instructions (in some order) which will
implement that parse tree. The shape of the tree may determine the ordering of some of the instructions, and
precedence affects the shape of the tree. But the order of some operations may not even be strictly determined by
the shape of the tree, let alone the precedence. So while precedence certainly has an influence on order of
evaluation, precedence does not "determine" order of evaluation. I'd say that precedence is about 40% related to
order of evaluation -- order of evaluation depends about 40% on precedence, and 60% on other things. (This 40%
figure is of course meaningless; it's just a number I pulled out of the air that "sounds right" [Barry, 1994]. The
point I'm trying to make is that precedence is not entirely disconnected from order of evaluation, but the
connection is nowhere near 100%.)
We've seen how precedence can affect or influence order of evaluation. Now let's start looking at all of the ways it
does not, or, stated another way, at all of the ways that order of evaluation is determined by things other than
precedence. (As we'll see, many of the "other things" turn out to be the whim of the compiler.)
Most of you have seen this example before, but I'll drag it out again. Suppose we have an expression containing
three function calls:
f() + g() * h()
Now, the result of calling g() will be multiplied by the result of calling h() "before" that product is added to the
result of calling f(), and indeed, precedence tells us that. But, unless we have multiple parallel processors, those
three functions f(), g(), and h() are going to be called in some order. What order will they be called in?
I don't know what order they'll be called in, and you don't know what order they'll be called in. Precedence doesn't
tell us, and in fact nothing in K&R, the ANSI/ISO C Standard, or Dan Streetmentioner's Boffo C Primer Triple
Plus will tell us. (C: The Complete Reference might tell us, but that's another story.) There's absolutely nothing
preventing the compiler from calling f() first, even though its result will be needed "last."
If you have any doubts about this, I encourage you to compile and run this little program:
#include <stdio.h>
main() { printf("%d\n", f() + g() * h()); return 0; }
In ANSI/ISO Standard C, when you care about the order in which things happen, you must take care to ensure that
order by using "sequence points." Sequence points have nothing to do with precedence, and only partly to do with
expressions; in my head they're up there next to statements. In general, when you want to control the order in
which two things happen, you make the two things separate statements, and you put one statement after the other
in the order you desire, perhaps using control flow constructs (e.g. if statements, loops, etc.) to control whether a
statement gets executed at all, or how many times. Indeed, "the end of an expression statement" is essentially one
of the defined sequence points in Standard C.
K&R C didn't have "sequence points"; Kernighan and Ritchie said that "explicit temporary variables can be used to
force a particular order of evaluation" and "intermediate results can be stored in temporary variables to ensure a
particular sequence" [K&R1 p. 49; K&R2 p. 53]. In fact, although Standard C gives us sequence points, it doesn't
give us a tool to get a hold on f() in f() + g() * h(); if we needed f() called first, we'd have to write
t = f(); t + g() * h()
or
t = f(), t + g() * h()
The most obvious sequence points in Standard C are the ones at the ends of full expressions in expression
statements, that is, as exemplified by the semicolon in the first line just above. Other sequence points are at the
comma operator (as in the second line just above), at the && and || operators, in the ?: operator, and just before a
function call. (I do not claim that this is an exhaustive list.) On those (hopefully rare) occasions when you care
about the ordering of operations within a single expression, you have to make sure that the expression contains one
or more sequence points, and in places which will in fact constrain the order to the one you want. Most of the time,
though, when you have a troublesome expression with too many ordering dependencies, the right response is the
same as that of the doctor in the old joke about the patient who complains that his hand hurts if he shakes it in a
certain way: "Well, don't do that, then." Break the expression up into separate statements, using temporary
variables if you have to, and you'll be much happier.
(I confess that this advice may be easier to give than to apply. In the article that started this thread, the poster
thought that the insertion of a sequence point or two, in the form of comma operators, should have constrained the
order of evaluation sufficiently, but unfortunately it did not, because there still weren't enough sequence points. It
didn't help that the poster thought that full parenthesization should have resolved any remaining evaluation-order
problems.)
If the f() + g() * h() example seemed artificial and hence unconvincing, let's look at a morphologically
identical but eminently realistic example. Suppose we're reading a two-byte integer from a binary data file in
big-endian (most significant byte first) order. We could call fread() and then swap bytes if necessary, but that's a
cheesy solution, because if (as is conventional) we implement the "if necessary" test with an #ifdef, we've
condemned everyone who ever compiles our program to choose an appropriate setting for the #ifdef macro.
Instead, let's call getc() twice, to read the first and then the second byte, and combine them like this:
i = (firstbyte << 8) | secondbyte;
(Remember, the first byte we read is the most-significant byte, and the second byte is the least-significant.) So
could we simply write
i = (getc(ifp) << 8) | getc(ifp); /* WRONG */
? No! We could not! We do not know which call to getc() will happen first, and we very definitely care, because
the order of those two calls is precisely what will determine whether we read the integer in big-endian or
little-endian order. So, instead, we should write
i = getc(ifp) << 8;     /* MSB */
i |= getc(ifp);         /* LSB */
By writing this as two separate statements, we get a sequence point, which ensures the order of evaluation we
need. If, on the other hand, we wanted to read in little-endian order, we'd write
i = getc(ifp);          /* LSB */
i |= getc(ifp) << 8;    /* MSB */
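What we should not write is either
i = (getc(ifp) << 8) | getc(ifp); /* "big-endian", WRONG */
or
i = getc(ifp) | (getc(ifp) << 8); /* "little-endian", WRONG */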
Both of these seem to depend (and, in fact, would depend, if we wrote them) on a left-to-right ordering of the
actual calls to getc(), which is not guaranteed. (And yes, I know that getc() is usually implemented as a
macro, which means that both of these last two expressions would in fact expand to big, complicated expressions
without any necessary function calls, but they would still not have any internal sequence points which would force
the two getc's to happen in a well-defined order.)
***
Unless you want to be a language lawyer, you don't have to memorize the definition of a sequence point or the list
of sequence points which Standard C guarantees, or even use the words "sequence point" in casual conversation.
(As you saw above, I can't remember the list off the top of my head with confidence, either, and that's because I'm
not usually interested in being a language lawyer.) What I recommend you do, instead, is develop a sense of
expressions that are "clean" versus expressions that are "ugly," and shy away from the ugly ones. ("Well, don't do
that, then.") Ugly expressions are those that have multiple assignments in them, or comma operators, or multiple
modifications of the same object, or which are so complicated and hard to understand that all the king's horses and
all of comp.std.c require at least a week to figure out what they're guaranteed to do (or not). The cleanest
expressions are those that calculate a single value, and perhaps assign it to a single location, without caring about
the order in which various sub-operations (such as interior function calls) take place.
In between the cleanest expressions and those that are unremittingly ugly are several classes of mildly-tricky
expressions which are well-defined and are useful and are commonly used and which I'm not trying to discourage
you from using. It's easier to present these by example, as exceptions to my qualifications above of ugly
expressions as those containing comma operators, ordering dependencies, or multiple assignments.
1. It's perfectly okay to depend on precedence, as long as you understand what it does and doesn't guarantee
you. Precedence tells you that in the expression a + b * c, the left-hand operand of the * operator is b,
not a + b. If you insist, precedence tells you that in that expression, "the multiplication happens before the
addition." But precedence does not tell you in what order the function calls in f() + g() * h() would
happen, and precedence would not give meaning to the expression
i++ + j * i++ /* XXX WRONG */
(In particular, you cannot say that since the multiplication happens "first," the second i++ must happen
before the first.) This last expression is, you guessed it, unremittingly ugly.
2. You can use the && and || operators to guarantee that thing B happens after thing A, and not at all if thing
A tells the whole story. Both
n > 0 && sum / n > 0
and
p == NULL || *p == '\0'
are perfectly valid, perfectly acceptable expressions.
3. You can use the comma operator when you have two (or more) things to do in the first or third controlling
expressions of a for() loop. There are precious few other realistic opportunities for using comma
operators; chances are, if you find yourself using comma operators anywhere else, you're doing something
tricky and are skirting the edge of ugliness. (It's no coincidence that Java, I gather, doesn't allow comma
operators anywhere but the first and third expressions of a for loop.)
4. You can use multiple ++ and -- operators in a single expression, perhaps in conjunction with an
assignment operator, as long as you're certain that the several things being modified are all distinct, and that
you aren't ever in a position of using something that you've already modified. The expressions
a[i++] = 0
and
*p++ = 0
are perfectly acceptable, because the things being modified, i and a[i] in the first expression and p and
*p in the second, are probably distinct. For the same reason, a[0] = i++ is okay, too. Expressions like
a[i++] = b[j++]
and
*p++ = *q++
are also okay, because i, j, a[i], and b[j], and p, q, *p, and *q, are probably all distinct. Finally, the
old standby
i = i + 1
is perfectly acceptable, because although it both uses and modifies i, it demonstrably uses the old value to
compute the new value.
On the other hand, the expressions
i++ * i++ /* XXX WRONG */
and
i = i++ /* XXX WRONG */
are no good, among other things because they modify i twice. And
a[i] = i++ /* XXX WRONG */
and
a[i++] = i /* XXX WRONG */
are both wrong, because there's no telling whether i is used before it's modified, or vice versa. (If you think
that either of these somehow does guarantee whether the plain i on one side means the value of i before or
after the incrementation performed by the i++ on the other, think again.)
5. Finally, it's perfectly acceptable to write something like
i = j = 0
even though it contains multiple assignments, because again, it's an unambiguous idiom and it's clear that
two different variables are being assigned to.
Now, I'm acutely aware that these guidelines (or any like them) can never be complete, because there are an
infinite number of expressions out there, and some of them are clean and legal even though my "rules"
might seem to disallow them and my exceptions don't cover them, and others of them are unremittingly
ugly even though my "rules" don't disallow them or my exceptions seem to allow them. Somehow,
successful programmers learn to calibrate their own inner aesthetic such that the expressions which aren't
"ugly" are the ones that (a) are well-defined, and (b) are sufficient for writing real programs. I encourage all
of you to nurture such an aesthetic as well, and to learn to write programs without "ugly" expressions, rather
than wasting time trying to figure out what they do (and whether they're guaranteed to do it). If you've got
an example of a well-defined expression which you feel is acceptable but which the guidelines I've
presented seem to discourage, or of an ill-defined expression which these guidelines don't seem to
discourage, I encourage you to post it here or mail it to me; I'd love to expand and refine these guidelines.
(Equally importantly, I'd love to figure out how to word them so that they're useful and meaningful to
someone who hasn't already developed a sense for good vs. ugly expressions.)
Also, if you're still with me, I'd like to repeat the main moral of this article, which is that "precedence" is
not the same thing as "order of evaluation." The one certainly has something to do with the other (sort of
like pointers and arrays), but it's a delicate relationship to describe precisely, so unless you care to figure out
the whole story (and figure out what number I should have used above instead of 40%), you might actually
be better off if you remember the statement that "Precedence has nothing to do with order of evaluation."
This statement isn't true, of course, but it's much closer to the truth than "Precedence has everything to do
with order of evaluation," /* XXX WRONG */
which is not only false, but quite misleading.
Finally, for anyone who is still unsure on any of this, I urge you to read section 3 of the FAQ list, or chapter
3 of the FAQ list book, because there's more on this in there. (Of course, it's written by the same mope
you're reading now, so if you don't like something I've written here, I can't guarantee that you'll find relief
in the FAQ list.)
Steve Summit
scs@eskimo.com
Expressions
3. Expressions
3.1 Why doesn't the code "a[i] = i++;" work?
3.2 Under my compiler, the code "int i = 7; printf("%d\n", i++ * i++);" prints 49.
Regardless of the order of evaluation, shouldn't it print 56?
3.3 How could the code "int i = 3; i = i++;" ever give 7?
3.4 Don't precedence and parentheses dictate order of evaluation?
3.5 But what about the && and || operators?
3.8 What's a ``sequence point''?
3.9 So given "a[i] = i++;" we don't know which cell of a[] gets written to, but i does get
incremented by one.
3.12 If I'm not using the value of the expression, should I use i++ or ++i to increment a variable?
3.14 Why doesn't the code "int a = 1000, b = 1000; long int c = a * b;" work?
3.16 Can I use ?: on the left-hand side of an assignment expression?
Question 3.1
Why doesn't this code:
a[i] = i++;
work?
The subexpression i++ causes a side effect--it modifies i's value--which leads to undefined behavior
since i is also referenced elsewhere in the same expression. (Note that although the language in K&R
suggests that the behavior of this expression is unspecified, the C Standard makes the stronger statement
that it is undefined--see question 11.33.)
References: K&R1 Sec. 2.12
K&R2 Sec. 2.12
ANSI Sec. 3.3
ISO Sec. 6.3
Question 11.33
People seem to make a point of distinguishing between implementation-defined, unspecified, and
undefined behavior. What's the difference?
Briefly: implementation-defined means that an implementation must choose some behavior and
document it. Unspecified means that an implementation should choose some behavior, but need not
document it. Undefined means that absolutely anything might happen. In no case does the Standard
impose requirements; in the first two cases it occasionally suggests (and may require a choice from
among) a small set of likely behaviors.
Note that since the Standard imposes no requirements on the behavior of a compiler faced with an
instance of undefined behavior, the compiler can do absolutely anything. In particular, there is no
guarantee that the rest of the program will perform normally. It's perilous to think that you can tolerate
undefined behavior in a program; see question 3.2 for a relatively simple example.
If you're interested in writing portable code, you can ignore the distinctions, as you'll want to avoid code
that depends on any of the three behaviors.
See also questions 3.9 and 11.34.
References: ANSI Sec. 1.6
ISO Sec. 3.10, Sec. 3.16, Sec. 3.17
Rationale Sec. 1.6
Question 3.2
Under my compiler, the code
int i = 7;
printf("%d\n", i++ * i++);
prints 49. Regardless of the order of evaluation, shouldn't it print 56?
Although the postincrement and postdecrement operators ++ and -- perform their operations after
yielding the former value, the implication of ``after'' is often misunderstood. It is not guaranteed that an
increment or decrement is performed immediately after giving up the previous value and before any
other part of the expression is evaluated. It is merely guaranteed that the update will be performed
sometime before the expression is considered ``finished'' (before the next ``sequence point,'' in ANSI C's
terminology; see question 3.8). In the example, the compiler chose to multiply the previous value by
itself and to perform both increments afterwards.
The behavior of code which contains multiple, ambiguous side effects has always been undefined.
(Loosely speaking, by ``multiple, ambiguous side effects'' we mean any combination of ++, --, =, +=, -=, etc.
in a single expression which causes the same object either to be modified twice or modified and
then inspected. This is a rough definition; see question 3.8 for a precise one, and question 11.33 for the
meaning of ``undefined.'') Don't even try to find out how your compiler implements such things (contrary
to the ill-advised exercises in many C textbooks); as K&R wisely point out, ``if you don't know how they
are done on various machines, that innocence may help to protect you.''
References: K&R1 Sec. 2.12 p. 50
K&R2 Sec. 2.12 p. 54
ANSI Sec. 3.3
ISO Sec. 6.3
CT&P Sec. 3.7 p. 47
PCS Sec. 9.5 pp. 120-1
Question 3.8
How can I understand these complex expressions? What's a ``sequence point''?
A sequence point is the point (at the end of a full expression, or at the ||, &&, ?:, or comma operators,
or just before a function call) at which the dust has settled and all side effects are guaranteed to be
complete. The ANSI/ISO C Standard states that
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be accessed only to determine the value to be stored.
The second sentence can be difficult to understand. It says that if an object is written to within a full
expression, any and all accesses to it within the same expression must be for the purposes of computing
the value to be written. This rule effectively constrains legal expressions to those in which the accesses
demonstrably precede the modification.
See also question 3.9.
References: ANSI Sec. 2.1.2.3, Sec. 3.3, Appendix B
ISO Sec. 5.1.2.3, Sec. 6.3, Annex C
Rationale Sec. 2.1.2.3
H&S Sec. 7.12.1 pp. 228-9
Question 3.9
So given
a[i] = i++;
we don't know which cell of a[] gets written to, but i does get incremented by one.
No. Once an expression or program becomes undefined, all aspects of it become undefined. See
questions 3.2, 3.3, 11.33, and 11.35.
Question 3.3
I've experimented with the code
int i = 3;
i = i++;
on several compilers. Some gave i the value 3, some gave 4, but one gave 7. I know the behavior is
undefined, but how could it give 7?
Undefined behavior means anything can happen. See questions 3.9 and 11.33. (Also, note that neither
i++ nor ++i is the same as i+1. If you want to increment i, use i=i+1 or i++ or ++i, not some
combination. See also question 3.12.)
Question 11.34
I'm appalled that the ANSI Standard leaves so many issues undefined. Isn't a Standard's whole job to
standardize these things?
It has always been a characteristic of C that certain constructs behaved in whatever way a particular
compiler or a particular piece of hardware chose to implement them. This deliberate imprecision often
allows compilers to generate more efficient code for common cases, without having to burden all
programs with extra code to assure well-defined behavior of cases deemed to be less reasonable.
Therefore, the Standard is simply codifying existing practice.
A programming language standard can be thought of as a treaty between the language user and the
compiler implementor. Parts of that treaty consist of features which the compiler implementor agrees to
provide, and which the user may assume will be available. Other parts, however, consist of rules which
the user agrees to follow and which the implementor may assume will be followed. As long as both sides
uphold their guarantees, programs have a fighting chance of working correctly. If either side reneges on
any of its commitments, nothing is guaranteed to work.
See also question 11.35.
References: Rationale Sec. 1.1
ANSI/ISO Standard C
Question 12.9
Someone told me it was wrong to use %lf with printf. How can printf use %f for type double, if
scanf requires %lf?
It's true that printf's %f specifier works with both float and double arguments. Due to the
``default argument promotions'' (which apply in variable-length argument lists such as printf's,
whether or not prototypes are in scope), values of type float are promoted to double, and printf
therefore sees only doubles. See also questions 12.13 and 15.2.
References: K&R1 Sec. 7.3 pp. 145-47, Sec. 7.4 pp. 147-50
K&R2 Sec. 7.2 pp. 153-55, Sec. 7.4 pp. 157-59
ANSI Sec. 4.9.6.1, Sec. 4.9.6.2
ISO Sec. 7.9.6.1, Sec. 7.9.6.2
H&S Sec. 15.8 pp. 357-64, Sec. 15.11 pp. 366-78
CT&P Sec. A.1 pp. 121-33
Question 12.13
Why doesn't this code:
double d;
scanf("%f", &d);
work?
Unlike printf, scanf uses %lf for values of type double, and %f for float. See also question
12.9.
Question 12.12
Why doesn't the call scanf("%d", i) work?
The arguments you pass to scanf must always be pointers. To fix the fragment above, change it to
scanf("%d", &i).
Question 12.11
How can I print numbers with commas separating the thousands?
What about currency formatted numbers?
The routines in <locale.h> begin to provide some support for these operations, but there is no
standard routine for doing either task. (The only thing printf does in response to a custom locale
setting is to change its decimal-point character.)
References: ANSI Sec. 4.4
ISO Sec. 7.4
H&S Sec. 11.6 pp. 301-4
Question 12.10
How can I implement a variable field width with printf? That is, instead of %8d, I want the width to
be specified at run time.
printf("%*d", width, n) will do just what you want. See also question 12.15.
References: K&R1 Sec. 7.3
K&R2 Sec. 7.2
ANSI Sec. 4.9.6.1
ISO Sec. 7.9.6.1
H&S Sec. 15.11.6
CT&P Sec. A.1
Question 12.15
How can I specify a variable width in a scanf format string?
You can't; an asterisk in a scanf format string means to suppress assignment. You may be able to use
ANSI stringizing and string concatenation to accomplish about the same thing, or to construct a scanf
format string on-the-fly.
Question 12.17
When I read numbers from the keyboard with scanf "%d\n", it seems to hang until I type one extra
line of input.
Perhaps surprisingly, \n in a scanf format string does not mean to expect a newline, but rather to read
and discard characters as long as each is a whitespace character. See also question 12.20.
References: K&R2 Sec. B1.3 pp. 245-6
ANSI Sec. 4.9.6.2
ISO Sec. 7.9.6.2
H&S Sec. 15.8 pp. 357-64
Question 12.20
Why does everyone say not to use scanf? What should I use instead?
scanf has a number of problems--see questions 12.17, 12.18, and 12.19. Also, its %s format has the
same problem that gets() has (see question 12.23)--it's hard to guarantee that the receiving buffer
won't overflow.
More generally, scanf is designed for relatively structured, formatted input (its name is in fact derived
from ``scan formatted''). If you pay attention, it will tell you whether it succeeded or failed, but it can tell
you only approximately where it failed, and not at all how or why. It's nearly impossible to do decent
error recovery with scanf; usually it's far easier to read entire lines (with fgets or the like), then
interpret them, either using sscanf or some other techniques. (Routines like strtol, strtok, and
atoi are often useful; see also question 13.6.) If you do use sscanf, don't forget to check the return
value to make sure that the expected number of items were found.
References: K&R2 Sec. 7.4 p. 159
Question 12.18
I'm reading a number with scanf %d and then a string with gets(), but the compiler seems to be
skipping the call to gets()!
scanf %d won't consume a trailing newline. If the input number is immediately followed by a newline,
that newline will immediately satisfy the gets().
As a general rule, you shouldn't try to interlace calls to scanf with calls to gets() (or any other input
routines); scanf's peculiar treatment of newlines almost always leads to trouble. Use scanf to read either
everything or nothing.
See also questions 12.20 and 12.23.
References: ANSI Sec. 4.9.6.2
ISO Sec. 7.9.6.2
H&S Sec. 15.8 pp. 357-64
Question 12.23
Why does everyone say not to use gets()?
Unlike fgets(), gets() cannot be told the size of the buffer it's to read into, so it cannot be
prevented from overflowing that buffer. As a general rule, always use fgets(). See question 7.1 for a
code fragment illustrating the replacement of gets() with fgets().
References: Rationale Sec. 4.9.7.2
H&S Sec. 15.7 p. 356
Question 7.1
Why doesn't this fragment work?
char *answer;
printf("Type something:\n");
gets(answer);
printf("You typed \"%s\"\n", answer);
The pointer variable answer, which is handed to gets() as the location into which the response
should be stored, has not been set to point to any valid storage. That is, we cannot say where the pointer
answer points. (Since local variables are not initialized, and typically contain garbage, it is not even
guaranteed that answer starts out as a null pointer. See questions 1.30 and 5.1.)
The simplest way to correct the question-asking program is to use a local array, instead of a pointer, and
let the compiler worry about allocation:
#include <stdio.h>
#include <string.h>
char answer[100], *p;
printf("Type something:\n");
fgets(answer, sizeof answer, stdin);
if((p = strchr(answer, '\n')) != NULL)
*p = '\0';
printf("You typed \"%s\"\n", answer);
This example also uses fgets() instead of gets(), so that the end of the array cannot be overwritten.
(See question 12.23. Unfortunately for this example, fgets() does not automatically delete the trailing
\n, as gets() would.) It would also be possible to use malloc() to allocate the answer buffer.
Question 7.2
I can't get strcat to work. I tried
char *s1 = "Hello, ";
char *s2 = "world!";
char *s3 = strcat(s1, s2);
but I got strange results.
As in question 7.1, the main problem here is that space for the concatenated result is not properly
allocated. C does not provide an automatically-managed string type. C compilers only allocate memory
for objects explicitly mentioned in the source code (in the case of ``strings,'' this includes character arrays
and string literals). The programmer must arrange for sufficient space for the results of run-time
operations such as string concatenation, typically by declaring arrays, or by calling malloc.
strcat performs no allocation; the second string is appended to the first one, in place. Therefore, one
fix would be to declare the first string as an array:
char s1[20] = "Hello, ";
Since strcat returns the value of its first argument (s1, in this case), the variable s3 is superfluous.
The original call to strcat in the question actually has two problems: the string literal pointed to by
s1, besides not being big enough for any concatenated text, is not necessarily writable at all. See
question 1.32.
References: CT&P Sec. 3.2 p. 32
Question 1.32
What is the difference between these initializations?
char a[] = "string literal";
char *p = "string literal";
My program crashes if I try to assign a new value to p[i].
A string literal can be used in two slightly different ways. As an array initializer (as in the declaration of
char a[]), it specifies the initial values of the characters in that array. Anywhere else, it turns into an
unnamed, static array of characters, which may be stored in read-only memory, which is why you can't
safely modify it. In an expression context, the array is converted at once to a pointer, as usual (see section
6), so the second declaration initializes p to point to the unnamed array's first element.
(For compiling old code, some compilers have a switch controlling whether strings are writable or not.)
See also questions 1.31, 6.1, 6.2, and 6.8.
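As a small illustration of the two cases (the function name literal_demo is made up, and the const qualifier on p is an addition that lets the compiler reject accidental writes):

```c
#include <string.h>

/* Copies the modified array into out and returns 1 if the array
 * and the literal now differ in their first character. */
int literal_demo(char *out, size_t outsize)
{
    char a[] = "string literal";        /* private, writable array of 15 chars */
    const char *p = "string literal";   /* points at an unnamed static array */

    a[0] = 'S';                         /* fine: modifies our own copy */
    /* p[0] = 'S';  rejected by the compiler because of const;
     * without const it would compile but might crash at run time */

    if (outsize < sizeof a)
        return 0;
    strcpy(out, a);
    return p[0] != a[0];
}
```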
References: K&R2 Sec. 5.5 p. 104
ANSI Sec. 3.1.4, Sec. 3.5.7
ISO Sec. 6.1.4, Sec. 6.5.7
Rationale Sec. 3.1.4
H&S Sec. 2.7.4 pp. 31-2
Question 16.8
What do ``Segmentation violation'' and ``Bus error'' mean?
These generally mean that your program tried to access memory it shouldn't have, invariably as a result
of improper pointer use. Likely causes are inadvertent use of null pointers (see also questions 5.2 and
5.20) or uninitialized, misaligned, or otherwise improperly allocated pointers (see questions 7.1 and 7.2);
corruption of the malloc arena (see question 7.19); and mismatched function arguments, especially
involving pointers; two possibilities are scanf (see question 12.12) and fprintf (make sure it
receives its first FILE * argument).
See also questions 16.3 and 16.4.
Question 6.7
How can an array be an lvalue, if you can't assign to it?
Question 6.8
Practically speaking, what is the difference between arrays and pointers?
Arrays automatically allocate space, but can't be relocated or resized. Pointers must be explicitly assigned
to point to allocated space (perhaps using malloc), but can be reassigned (i.e. pointed at different
objects) at will, and have many other uses besides serving as the base of blocks of memory.
Due to the so-called equivalence of arrays and pointers (see question 6.3), arrays and pointers often seem
interchangeable, and in particular a pointer to a block of memory assigned by malloc is frequently
treated (and can be referenced using []) exactly as if it were a true array. See questions 6.14 and 6.16.
(Be careful with sizeof, though.)
See also questions 1.32 and 20.14.
Question 6.14
How can I set an array's size at run time?
How can I avoid fixed-sized arrays?
The equivalence between arrays and pointers (see question 6.3) allows a pointer to malloc'ed memory
to simulate an array quite effectively. After executing
#include <stdlib.h>
int *dynarray = (int *)malloc(10 * sizeof(int));
(and if the call to malloc succeeds), you can reference dynarray[i] (for i from 0 to 9) just as if
dynarray were a conventional, statically-allocated array (int a[10]). See also question 6.16.
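A slightly fuller sketch, with the size chosen at run time and the malloc result checked, might look like this. The function name make_squares is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an "array" of n ints and fill it with
 * the squares 0, 1, 4, ...  Returns NULL if n is 0 or malloc fails;
 * the caller must free() the result. */
int *make_squares(size_t n)
{
    int *dynarray;
    size_t i;

    if (n == 0)
        return NULL;
    dynarray = malloc(n * sizeof *dynarray);
    if (dynarray == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        dynarray[i] = (int)(i * i);     /* subscripted like a true array */
    return dynarray;
}
```

Note that sizeof dynarray would give the size of the pointer, not of the allocated block, so the element count has to be remembered separately.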
Question 6.13
How do I declare a pointer to an array?
Usually, you don't want to. When people speak casually of a pointer to an array, they usually mean a
pointer to its first element.
Instead of a pointer to an array, consider using a pointer to one of the array's elements. Arrays of type T
decay into pointers to type T (see question 6.3), which is convenient; subscripting or incrementing the
resultant pointer will access the individual members of the array. True pointers to arrays, when
subscripted or incremented, step over entire arrays, and are generally useful only when operating on
arrays of arrays, if at all. (See question 6.18.)
If you really need to declare a pointer to an entire array, use something like ``int (*ap)[N];'' where
N is the size of the array. (See also question 1.21.) If the size of the array is unknown, N can in principle
be omitted, but the resulting type, ``pointer to array of unknown size,'' is useless.
See also question 6.12.
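The difference in stepping can be seen in a sketch like the following, where sum_row walks a pointer to int across one row, and sum_table walks a true pointer to an array across whole rows. The names and the NCOLS constant are invented for this example.

```c
#include <stddef.h>

#define NCOLS 3

/* Pointer to an element: subscripting steps one int at a time. */
int sum_row(const int *ip, size_t n)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += ip[i];
    return sum;
}

/* Pointer to an array: incrementing ap steps over an entire row. */
int sum_table(int (*ap)[NCOLS], size_t nrows)
{
    int sum = 0;
    size_t r;

    for (r = 0; r < nrows; r++, ap++)
        sum += sum_row(*ap, NCOLS);     /* *ap is one row, an array of NCOLS ints */
    return sum;
}
```

Given int a[2][NCOLS], the call sum_table(a, 2) works because a decays to a pointer to its first row.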
References: ANSI Sec. 3.2.2.1
ISO Sec. 6.2.2.1
Question 1.22
How can I declare a function that can return a pointer to a function of the same type? I'm building a state
machine with one function for each state, each of which returns a pointer to the function for the next
state. But I can't find a way to declare the functions.
You can't quite do it directly. Either have the function return a generic function pointer, with some
judicious casts to adjust the types as the pointers are passed around; or have it return a structure
containing only a pointer to a function returning that structure.
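The structure-based approach could be sketched like this. The state names, the steps counter, and run_machine are all hypothetical; a null function pointer is used here to mean "no next state."

```c
#include <stddef.h>

struct state {
    struct state (*next)(void);     /* function returning a struct state */
};

struct state start(void);
struct state middle(void);
struct state stop(void);

int steps;      /* counts states visited, for illustration only */

struct state start(void)
{
    struct state s;
    steps++;
    s.next = middle;
    return s;
}

struct state middle(void)
{
    struct state s;
    steps++;
    s.next = stop;
    return s;
}

struct state stop(void)
{
    struct state s;
    steps++;
    s.next = NULL;                  /* end of the machine */
    return s;
}

/* Run from the start state until some state returns a null pointer. */
int run_machine(void)
{
    struct state s;

    steps = 0;
    s.next = start;
    while (s.next != NULL)
        s = s.next();
    return steps;
}
```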
Question 1.25
My compiler is complaining about an invalid redeclaration of a function, but I only define it once and
call it once.
Functions which are called without a declaration in scope (perhaps because the first call precedes the
function's definition) are assumed to be declared as returning int (and without any argument type
information), leading to discrepancies if the function is later declared or defined otherwise. Non-int
functions must be declared before they are called.
Another possible source of this problem is that the function has the same name as another one declared in
some header file.
See also questions 11.3 and 15.1.
References: K&R1 Sec. 4.2 p. 70
K&R2 Sec. 4.2 p. 72
ANSI Sec. 3.3.2.2
ISO Sec. 6.3.2.2
H&S Sec. 4.7 p. 101
Question 15.1
I heard that you have to #include <stdio.h> before calling printf. Why?
Question 14.13
I'm having trouble with a Turbo C program which crashes and says something like ``floating point
formats not linked.''
Some compilers for small machines, including Borland's (and Ritchie's original PDP-11 compiler), leave
out certain floating point support if it looks like it will not be needed. In particular, the non-floating-point
versions of printf and scanf save space by not including code to handle %e, %f, and %g. It happens
that Borland's heuristics for determining whether the program uses floating point are insufficient, and the
programmer must sometimes insert an extra, explicit call to a floating-point library routine to force
loading of floating-point support. (See the comp.os.msdos.programmer FAQ list for more information.)
Question 14.1
When I set a float variable to, say, 3.1, why is printf printing it as 3.0999999?
Most computers use base 2 for floating-point numbers as well as for integers. In base 2, 1/1010 (that is,
1/10 decimal) is an infinitely-repeating fraction: its binary representation is 0.0001100110011... .
Depending on how carefully your compiler's binary/decimal conversion routines (such as those used by
printf) have been written, you may see discrepancies when numbers (especially low-precision
floats) not exactly representable in base 2 are assigned or read in and then printed (i.e. converted from
base 10 to base 2 and back again). See also question 14.6.
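The effect is easy to reproduce. The sketch below assumes IEEE 754 single-precision floats, as found on most current machines; the function name show_float is invented. Asking for one digit after the decimal point hides the discrepancy, while asking for seven exposes it.

```c
#include <stdio.h>

/* Format the float nearest to 3.1 with the given number of digits
 * after the decimal point.  buf must be large enough for the result. */
void show_float(char *buf, int digits)
{
    float f = 3.1f;     /* not exactly representable in base 2 */

    sprintf(buf, "%.*f", digits, f);
}
```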
Question 14.6
How do I round numbers?
Question 14.5
What's a good way to check for ``close enough'' floating-point equality?
Since the absolute accuracy of floating point values varies, by definition, with their magnitude, the best
way of comparing two floating point values is to use an accuracy threshold which is relative to the
magnitude of the numbers being compared. Rather than
double a, b;
...
if(a == b)    /* WRONG */
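A comparison using a threshold relative to the magnitude of the operands, as described above, might be sketched as follows. The name close_enough and the choice of scaling by the larger magnitude are assumptions of this example; the right epsilon depends on the application. The little absval helper does the same job as fabs from <math.h> and is written out only to keep the example free of the math library.

```c
/* Same result as fabs() from <math.h> for ordinary values. */
static double absval(double x)
{
    return x < 0 ? -x : x;
}

/* Returns nonzero if a and b differ by no more than epsilon times
 * the larger of their magnitudes. */
int close_enough(double a, double b, double epsilon)
{
    double scale = absval(a) > absval(b) ? absval(a) : absval(b);

    return absval(a - b) <= epsilon * scale;
}
```

A test such as if(close_enough(a, b, 1e-9)) then replaces the exact comparison.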
Question 14.4
My floating-point calculations are acting strangely and giving me different answers on different
machines.
Question 14.2
I'm trying to take some square roots, but I'm getting crazy numbers.
Make sure that you have #included <math.h>, and correctly declared other functions returning
double. (Another library routine to be careful with is atof, which is declared in <stdlib.h>.) See
also question 14.3.
References: CT&P Sec. 4.5 pp. 65-6
Question 14.3
I'm trying to do some simple trig, and I am #including <math.h>, but I keep getting ``undefined: sin''
compilation errors.
Make sure you're actually linking with the math library. For instance, under Unix, you usually need to
use the -lm option, at the end of the command line, when compiling/linking. See also questions 13.25
and 13.26.
Question 13.25
I keep getting errors due to library functions being undefined, but I'm #including all the right header files.
In some cases (especially if the functions are nonstandard) you may have to explicitly ask for the correct
libraries to be searched when you link the program. See also questions 11.30, 13.26, and 14.3.
Question 11.30
Why are some ANSI/ISO Standard library routines showing up as undefined, even though I've got an
ANSI compiler?
It's possible to have a compiler available which accepts ANSI syntax, but not to have ANSI-compatible
header files or run-time libraries installed. (In fact, this situation is rather common when using a
non-vendor-supplied compiler such as gcc.) See also questions 11.29, 13.25, and 13.26.
Question 11.29
My compiler is rejecting the simplest possible test programs, with all kinds of syntax errors.
Perhaps it is a pre-ANSI compiler, unable to accept function prototypes and the like.
See also questions 1.31, 10.9, and 11.30.
Question 1.31
This code, straight out of a book, isn't compiling:
f()
{
    char a[] = "Hello, world!";
}
Perhaps you have a pre-ANSI compiler, which doesn't allow initialization of ``automatic aggregates'' (i.e.
non-static local arrays, structures, and unions). As a workaround, you can make the array global or
static (if you won't need a fresh copy during any subsequent calls), or replace it with a pointer (if the
array won't be written to). (You can always initialize local char * variables to point to string literals,
but see question 1.32.) If neither of these conditions holds, you'll have to initialize the array by hand with
strcpy when f is called. See also question 11.29.
Question 12.2
Why does the code while(!feof(infp)) { fgets(buf, MAXLINE, infp);
fputs(buf, outfp); } copy the last line twice?
In C, EOF is only indicated after an input routine has tried to read, and has reached end-of-file. (In other
words, C's I/O is not like Pascal's.) Usually, you should just check the return value of the input routine
(fgets in this case); often, you don't need to use feof at all.
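A version of the copying loop which tests the return value of fgets might look like this. The wrapper function copy_lines and its line count are additions for illustration; MAXLINE is given an arbitrary value here.

```c
#include <stdio.h>

#define MAXLINE 100     /* arbitrary size for this sketch */

/* Copy lines from infp to outfp.  Returns the number of successful
 * fgets calls (lines longer than MAXLINE-1 count more than once). */
long copy_lines(FILE *infp, FILE *outfp)
{
    char buf[MAXLINE];
    long n = 0;

    /* fgets returns NULL at end-of-file, so the loop body never
     * runs on stale buffer contents */
    while (fgets(buf, MAXLINE, infp) != NULL) {
        fputs(buf, outfp);
        n++;
    }
    return n;
}
```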
References: K&R2 Sec. 7.6 p. 164
ANSI Sec. 4.9.3, Sec. 4.9.7.1, Sec. 4.9.10.2
ISO Sec. 7.9.3, Sec. 7.9.7.1, Sec. 7.9.10.2
H&S Sec. 15.14 p. 382
Question 12.4
My program's prompts and intermediate output don't always show up on the screen, especially when I
pipe the output through another program.
It's best to use an explicit fflush(stdout) whenever output should definitely be visible. Several
mechanisms attempt to perform the fflush for you, at the ``right time,'' but they tend to apply only
when stdout is an interactive terminal. (See also question 12.24.)
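For a prompt, that amounts to something like the following sketch. The function name prompt_line is hypothetical; in an ordinary program, in and out would be stdin and stdout.

```c
#include <stdio.h>

/* Write a prompt, force it out, then read the reply.
 * Returns what fgets returns: buf on success, NULL on EOF or error. */
char *prompt_line(const char *prompt, char *buf, int size,
                  FILE *in, FILE *out)
{
    fputs(prompt, out);
    fflush(out);        /* make sure the prompt is visible before we block */
    return fgets(buf, size, in);
}
```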
References: ANSI Sec. 4.9.5.2
ISO Sec. 7.9.5.2
Question 12.24
Why does errno contain ENOTTY after a call to printf?
Many implementations of the stdio package adjust their behavior slightly if stdout is a terminal. To
make the determination, these implementations perform some operation which happens to fail (with
ENOTTY) if stdout is not a terminal. Although the output operation goes on to complete successfully,
errno still contains ENOTTY. (Note that it is only meaningful for a program to inspect the contents of
errno after an error has been reported.)
References: ANSI Sec. 4.1.3, Sec. 4.9.10.3
ISO Sec. 7.1.4, Sec. 7.9.10.3
CT&P Sec. 5.4 p. 73
PCS Sec. 14 p. 254
Question 1.7
What's the best way to declare and define global variables?
First, though there can be many declarations (and in many translation units) of a single ``global'' (strictly
speaking, ``external'') variable or function, there must be exactly one definition. (The definition is the
declaration that actually allocates space, and provides an initialization value, if any.) The best
arrangement is to place each definition in some relevant .c file, with an external declaration in a header
(``.h'') file, which is #included wherever the declaration is needed. The .c file containing the definition
should also #include the same header file, so that the compiler can check that the definition matches
the declarations.
This rule promotes a high degree of portability: it is consistent with the requirements of the ANSI C
Standard, and is also consistent with most pre-ANSI compilers and linkers. (Unix compilers and linkers
typically use a ``common model'' which allows multiple definitions, as long as at most one is initialized;
this behavior is mentioned as a ``common extension'' by the ANSI Standard, no pun intended. A few very
odd systems may require an explicit initializer to distinguish a definition from an external declaration.)
It is possible to use preprocessor tricks to arrange that a line like
DEFINE(int, i);
need only be entered once in one header file, and turned into a definition or a declaration depending on
the setting of some macro, but it's not clear if this is worth the trouble.
It's especially important to put global declarations in header files if you want the compiler to catch
inconsistent declarations for you. In particular, never place a prototype for an external function in a .c
file: it wouldn't generally be checked for consistency with the definition, and an incompatible prototype
is worse than useless.
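The recommended arrangement can be sketched as below. It is shown as a single listing so that it can be compiled as is; in a real program the two parts would live in the two files named in the comments, and the names counter and next_count are invented for the example.

```c
/* ---- counter.h (hypothetical header, #included wherever needed) ---- */
extern int counter;         /* declaration: allocates no space */
int next_count(void);       /* external function declaration */

/* ---- counter.c (would begin with #include "counter.h") ---- */
int counter = 0;            /* the one definition, with its initializer */

int next_count(void)
{
    return ++counter;
}
```

Because counter.c sees the header too, the compiler can complain if the definition of counter or next_count ever drifts out of step with the declarations.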
See also questions 10.6 and 18.8.
References: K&R1 Sec. 4.5 pp. 76-7
K&R2 Sec. 4.4 pp. 80-1
ANSI Sec. 3.1.2.2, Sec. 3.7, Sec. 3.7.2, Sec. F.5.11
ISO Sec. 6.1.2.2, Sec. 6.7, Sec. 6.7.2, Sec. G.5.11
Rationale Sec. 3.1.2.2
H&S Sec. 4.8 pp. 101-104, Sec. 9.2.3 p. 267
CT&P Sec. 4.2 pp. 54-56
Question 10.6
I'm splitting up a program into multiple source files for the first time, and I'm wondering what to put in .c
files and what to put in .h files. (What does ``.h'' mean, anyway?)
As a general rule, you should put these things in header (.h) files:
macro definitions (preprocessor #defines)
structure, union, and enumeration declarations
typedef declarations
external function declarations (see also question 1.11)
global variable declarations
It's especially important to put a declaration or definition in a header file when it will be shared between
several other files. (In particular, never put external function prototypes in .c files. See also question 1.7.)
On the other hand, when a definition or declaration should remain private to one source file, it's fine to
leave it there.
See also questions 1.7 and 10.7.
References: K&R2 Sec. 4.5 pp. 81-2
H&S Sec. 9.2.3 p. 267
CT&P Sec. 4.6 pp. 66-7
Question 1.11
What does extern mean in a function declaration?
It can be used as a stylistic hint to indicate that the function's definition is probably in another source file,
but there is no formal difference between
extern int f();
and
int f();
References: ANSI Sec. 3.1.2.2, Sec. 3.5.1
ISO Sec. 6.1.2.2, Sec. 6.5.1
Rationale Sec. 3.1.2.2
H&S Secs. 4.3,4.3.1 pp. 75-6
Question 1.12
What's the auto keyword good for?
Question 20.37
What was the entry keyword mentioned in K&R1?
It was reserved to allow the possibility of having functions with multiple, differently-named entry points,
à la FORTRAN. It was not, to anyone's knowledge, ever implemented (nor does anyone remember what
sort of syntax might have been imagined for it). It has been withdrawn, and is not a keyword in ANSI C.
(See also question 1.12.)
References: K&R2 p. 259 Appendix C
Question 20.36
When will the next International Obfuscated C Code Contest (IOCCC) be held? How can I get a copy of
the current and previous winning entries?
The contest schedule is tied to the dates of the USENIX conferences at which the winners are announced.
At the time of this writing, it is expected that the yearly contest will open in October. To obtain a current
copy of the rules and guidelines, send e-mail with the Subject: line ``send rules'' to:
{apple,pyramid,sun,uunet}!hoptoad!judges or judges@toad.com
(Note that these are not the addresses for submitting entries.)
Contest winners should be announced at the winter USENIX conference in January, and are posted to the
net sometime thereafter. Winning entries from previous years (back to 1984) are archived at ftp.uu.net
(see question 18.16) under the directory pub/ioccc/.
As a last resort, previous winners may be obtained by sending e-mail to the above address, using the
Subject: ``send YEAR winners'', where YEAR is a single four-digit year, a year range, or ``all''.
Question 20.35
What is ``Duff's Device''?
It's a devastatingly deviously unrolled byte-copying loop, devised by Tom Duff while he was at
Lucasfilm. In its ``classic'' form, it looks like:
register n = (count + 7) / 8;
switch (count % 8)
{
case 0: do { *to = *from++;
case 7:      *to = *from++;
case 6:      *to = *from++;
case 5:      *to = *from++;
case 4:      *to = *from++;
case 3:      *to = *from++;
case 2:      *to = *from++;
case 1:      *to = *from++;
           } while (--n > 0);
}
where count bytes are to be copied from the array pointed to by from to the memory location pointed
to by to (which is a memory-mapped device output register, which is why to isn't incremented). It
solves the problem of handling the leftover bytes (when count isn't a multiple of 8) by interleaving a
switch statement with the loop which copies bytes 8 at a time. (Believe it or not, it is legal to have
case labels buried within blocks nested in a switch statement like this. In his announcement of the
technique to C's developers and the world, Duff noted that C's switch syntax, in particular its ``fall
through'' behavior, had long been controversial, and that ``This code forms some sort of argument in that
debate, but I'm not sure whether it's for or against.'')
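For experimenting on an ordinary machine, here is a variant (not Duff's original) which copies memory to memory, so both pointers are incremented. The function name duff_copy and the explicit check for a zero count are additions.

```c
/* Variant of Duff's Device: copy count bytes from from to to.
 * Unlike the classic form, to is incremented, since the destination
 * here is ordinary memory rather than a device register. */
void duff_copy(char *to, const char *from, unsigned count)
{
    unsigned n;

    if (count == 0)
        return;         /* the loop below would otherwise copy 8 bytes */
    n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
               } while (--n > 0);
    }
}
```

In practice memcpy is the routine to use for this job; the variant is of interest only as a way to watch the interleaved switch and loop at work.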
Question 20.34
Here's a good puzzle: how do you write a program which produces its own source code as its output?
It is actually quite difficult to write a self-reproducing program that is truly portable, due particularly to
quoting and character set difficulties.
Here is a classic example (which is normally presented on one line, although it will ``fix'' itself the first
time it's run):
char*s="char*s=%c%s%c;main(){printf(s,34,s,34);}";
main(){printf(s,34,s,34);}
(This program, like many of the genre, assumes that the double-quote character " has the value 34, as it
does in ASCII.)
Question 20.32
Will 2000 be a leap year? Is (year % 4 == 0) an accurate test for leap years?
Yes and no, respectively. The full expression for the present Gregorian calendar is
year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
See a good astronomical almanac or other reference for details. (To forestall an eternal debate: references
which claim the existence of a 4000-year rule are wrong.)
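Wrapped up as a function (the name is_leap_year is arbitrary), the expression above becomes:

```c
/* Returns nonzero if year is a leap year in the Gregorian calendar. */
int is_leap_year(int year)
{
    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}
```

So 1900 is not a leap year (divisible by 100 but not by 400), while 2000 is.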
Question 20.31
How can I find the day of the week given the date?
Use mktime or localtime (see questions 13.13 and 13.14, but beware of DST adjustments if
tm_hour is 0), or Zeller's congruence (see the sci.math FAQ list), or this elegant code by Tomohiko
Sakamoto:
dayofweek(y, m, d)    /* 0 = Sunday */
int y, m, d;          /* 1 <= m <= 12, y > 1752 or so */
{
    static int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    y -= m < 3;
    return (y + y/4 - y/100 + y/400 + t[m-1] + d) % 7;
}
See also questions 13.14 and 20.32.
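The mktime approach might be sketched as follows. The function name weekday_of is invented; setting tm_hour to noon is one way of staying clear of the DST adjustments mentioned above, and the dates that work are limited to those your system's time_t can represent.

```c
#include <time.h>

/* Returns 0 = Sunday ... 6 = Saturday, or -1 if mktime cannot
 * represent the date.  month runs from 1 to 12. */
int weekday_of(int year, int month, int day)
{
    struct tm tm = {0};

    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;
    tm.tm_mday = day;
    tm.tm_hour = 12;        /* noon, away from DST transitions */
    tm.tm_isdst = -1;       /* let the library decide about DST */
    if (mktime(&tm) == (time_t)-1)
        return -1;
    return tm.tm_wday;      /* filled in by mktime */
}
```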
References: ANSI Sec. 4.12.2.3
ISO Sec. 7.12.2.3
Question 13.13
I know that the library routine localtime will convert a time_t into a broken-down struct tm,
and that ctime will convert a time_t to a printable string. How can I perform the inverse operations
of converting a struct tm or a string into a time_t?
Question 13.12
How can I get the current date or time of day in a C program?
Just use the time, ctime, and/or localtime functions. (These routines have been around for years,
and are in the ANSI standard.) Here is a simple example:
#include <stdio.h>
#include <time.h>
int main(void)
{
    time_t now;
    time(&now);
    printf("It's %.24s.\n", ctime(&now));
    return 0;
}
References: K&R2 Sec. B10 pp. 255-7
ANSI Sec. 4.12
ISO Sec. 7.12
H&S Sec. 18
Question 13.11
How can I sort more data than will fit in memory?
You want an ``external sort,'' which you can read about in Knuth, Volume 3. The basic idea is to sort the
data in chunks (as much as will fit in memory at one time), write each sorted chunk to a temporary file,
and then merge the files. Your operating system may provide a general-purpose sort utility, and if so, you
can try invoking it from within your program: see questions 19.27 and 19.30.
References: Knuth Sec. 5.4 pp. 247-378
Sedgewick Sec. 13 pp. 177-187
Question 13.10
How can I sort a linked list?
Sometimes it's easier to keep the list in order as you build it (or perhaps to use a tree instead). Algorithms
like insertion sort and merge sort lend themselves ideally to use with linked lists. If you want to use a
standard library function, you can allocate a temporary array of pointers, fill it in with pointers to all your
list nodes, call qsort, and finally rebuild the list pointers based on the sorted array.
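As an illustration of how naturally merge sort fits a linked list, here is a minimal sketch for a list of ints. The node layout and the names merge and list_sort are assumptions of this example.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Merge two already-sorted lists into one sorted list. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node head, *tail = &head;

    head.next = NULL;
    while (a != NULL && b != NULL) {
        if (a->value <= b->value) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;   /* append whatever is left */
    return head.next;
}

/* Sort a list: split it in half, sort each half, merge the halves. */
struct node *list_sort(struct node *list)
{
    struct node *slow, *fast, *second;

    if (list == NULL || list->next == NULL)
        return list;                    /* 0 or 1 nodes: already sorted */
    slow = list;
    fast = list->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;              /* slow reaches the midpoint ... */
        fast = fast->next->next;        /* ... as fast reaches the end */
    }
    second = slow->next;
    slow->next = NULL;                  /* cut the list in two */
    return merge(list_sort(list), list_sort(second));
}
```

Only the next pointers are rearranged; no nodes are copied or allocated.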
References: Knuth Sec. 5.2.1 pp. 80-102, Sec. 5.2.4 pp. 159-168
Sedgewick Sec. 8 pp. 98-100, Sec. 12 pp. 163-175
Question 13.9
Now I'm trying to sort an array of structures with qsort. My comparison function takes pointers to
structures, but the compiler complains that the function is of the wrong type for qsort. How can I cast
the function pointer to shut off the warning?
The conversions must be in the comparison function, which must be declared as accepting ``generic
pointers'' (const void *) as discussed in question 13.8. The comparison function might look
like
int mystructcmp(const void *p1, const void *p2)
{
    const struct mystruct *sp1 = p1;
    const struct mystruct *sp2 = p2;
    /* now compare sp1->whatever and sp2-> ... */
(The conversions from generic pointers to struct mystruct pointers happen in the initializations
sp1 = p1 and sp2 = p2; the compiler performs the conversions implicitly since p1 and p2 are
void pointers. Explicit casts, and char * pointers, would be required under a pre-ANSI compiler. See
also question 7.7.)
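Filled out into something compilable, the comparison function and its use might look like the sketch below. The members of struct mystruct and the wrapper sort_structs are invented for illustration, since the real structure depends on your program.

```c
#include <stdlib.h>
#include <string.h>

struct mystruct {           /* hypothetical structure */
    int key;
    char name[16];
};

/* Order by key, breaking ties by name. */
int mystructcmp(const void *p1, const void *p2)
{
    const struct mystruct *sp1 = p1;    /* implicit conversion from void * */
    const struct mystruct *sp2 = p2;

    if (sp1->key < sp2->key)
        return -1;
    if (sp1->key > sp2->key)
        return 1;
    return strcmp(sp1->name, sp2->name);
}

/* No cast is needed on the function pointer passed to qsort. */
void sort_structs(struct mystruct *array, size_t n)
{
    qsort(array, n, sizeof array[0], mystructcmp);
}
```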
If, on the other hand, you're sorting pointers to structures, you'll need indirection, as in question 13.8:
sp1 = *(struct mystruct **)p1.
In general, it is a bad idea to insert casts just to ``shut the compiler up.'' Compiler warnings are usually
trying to tell you something, and unless you really know what you're doing, you ignore or muzzle them at
your peril. See also question 4.9.
References: ANSI Sec. 4.10.5.2
ISO Sec. 7.10.5.2
H&S Sec. 20.5 p. 419
Question 13.8
I'm trying to sort an array of strings with qsort, using strcmp as the comparison function, but it's not
working.
By ``array of strings'' you probably mean ``array of pointers to char.'' The arguments to qsort's
comparison function are pointers to the objects being sorted, in this case, pointers to pointers to char.
strcmp, however, accepts simple pointers to char. Therefore, strcmp can't be used directly. Write an
intermediate comparison function like this:
/* compare strings via pointers */
int pstrcmp(const void *p1, const void *p2)
{
    return strcmp(*(char * const *)p1, *(char * const *)p2);
}
The comparison function's arguments are expressed as ``generic pointers,'' const void *. They are
converted back to what they ``really are'' (char **) and dereferenced, yielding char *'s which can be
passed to strcmp. (Under a pre-ANSI compiler, declare the pointer parameters as char * instead of
void *, and drop the consts.)
(Don't be misled by the discussion in K&R2 Sec. 5.11 pp. 119-20, which is not discussing the Standard
library's qsort).
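For completeness, here is how the intermediate function is handed to qsort. The comparison function is repeated so that the sketch stands alone; the wrapper name sort_strings is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* compare strings via pointers */
int pstrcmp(const void *p1, const void *p2)
{
    return strcmp(*(char * const *)p1, *(char * const *)p2);
}

/* Sort an array of n pointers to char.  Only the pointers are
 * rearranged; the strings themselves stay where they are. */
void sort_strings(char **strings, size_t n)
{
    qsort(strings, n, sizeof strings[0], pstrcmp);
}
```

Note that the element size passed to qsort is the size of a pointer (sizeof strings[0]), not the length of any string.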
References: ANSI Sec. 4.10.5.2
ISO Sec. 7.10.5.2
H&S Sec. 20.5 p. 419
Question 13.7
I need some code to do regular expression and wildcard matching.
Make sure you recognize the difference between classic regular expressions (variants of which are used
in such Unix utilities as ed and grep), and filename wildcards (variants of which are used by most
operating systems).
There are a number of packages available for matching regular expressions. Most packages use a pair of
functions, one for ``compiling'' the regular expression, and one for ``executing'' it (i.e. matching strings
against it). Look for header files named <regex.h> or <regexp.h>, and functions called
regcmp/regex, regcomp/regexec, or re_comp/re_exec. (These functions may exist in a
separate regexp library.) A popular, freely-redistributable regexp package by Henry Spencer is available
from ftp.cs.toronto.edu in pub/regexp.shar.Z or in several other archives. The GNU project has a package
called rx. See also question 18.16.
Filename wildcard matching (sometimes called ``globbing'') is done in a variety of ways on different
systems. On Unix, wildcards are automatically expanded by the shell before a process is invoked, so
programs rarely have to worry about them explicitly. Under MS-DOS compilers, there is often a special
object file which can be linked in to a program to expand wildcards while argv is being built. Several
systems (including MS-DOS and VMS) provide system services for listing or opening files specified by
wildcards. Check your compiler/library documentation.
Question 13.6
How can I split up a string into whitespace-separated fields?
How can I duplicate the process by which main() is handed argc and argv?
The only Standard routine available for this kind of ``tokenizing'' is strtok, although it can be tricky to
use and it may not do everything you want it to. (For instance, it does not handle quoting.)
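A minimal strtok-based splitter might look like this. The function name split_fields and the choice of space, tab, and newline as separators are assumptions of the sketch. Remember that strtok modifies the string it is given and keeps hidden state between calls.

```c
#include <string.h>

/* Split line into at most maxargs whitespace-separated fields.
 * line is modified in place ('\0's are written over separators);
 * argv[] receives pointers into it.  Returns the number of fields. */
int split_fields(char *line, char *argv[], int maxargs)
{
    int argc = 0;
    char *p;

    for (p = strtok(line, " \t\n"); p != NULL && argc < maxargs;
         p = strtok(NULL, " \t\n"))
        argv[argc++] = p;
    return argc;
}
```

Because the string is overwritten, it must be a writable array, not a string literal (see question 1.32).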
References: K&R2 Sec. B3 p. 250
ANSI Sec. 4.11.5.8
ISO Sec. 7.11.5.8
H&S Sec. 13.7 pp. 333-4
PCS p. 178
Question 13.5
Why do some versions of toupper act strangely if given an upper-case letter?
Why does some code call islower before toupper?
Older versions of toupper and tolower did not always work correctly on arguments which did not
need converting (i.e. on digits or punctuation or letters already of the desired case). In ANSI/ISO
Standard C, these functions are guaranteed to work appropriately on all character arguments.
References: ANSI Sec. 4.3.2
ISO Sec. 7.3.2
H&S Sec. 12.9 pp. 320-1
PCS p. 182
Question 13.2
Why does strncpy not always place a '\0' terminator in the destination string?
strncpy was first designed to handle a now-obsolete data structure, the fixed-length,
not-necessarily-\0-terminated ``string.'' (A related quirk of strncpy's is that it pads short strings with
multiple \0's, out to the specified length.) strncpy is admittedly a bit cumbersome to use in other
contexts, since you must often append a '\0' to the destination string by hand. You can get around the
problem by using strncat instead of strncpy: if the destination string starts out empty, strncat does
what you probably wanted strncpy to do. Another possibility is sprintf(dest, "%.*s", n, source).
When arbitrary bytes (as opposed to strings) are being copied, memcpy is usually a more appropriate
routine to use than strncpy.
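The strncat technique can be packaged up along these lines. The function name copy_string is invented; the key detail is that strncat always appends a '\0' after the characters it copies, so the count passed to it must be one less than the size of the destination.

```c
#include <string.h>

/* Copy source to dest, truncating if necessary, and always leave
 * dest '\0'-terminated.  destsize is the full size of dest. */
char *copy_string(char *dest, size_t destsize, const char *source)
{
    if (destsize == 0)
        return dest;                /* no room even for the '\0' */
    dest[0] = '\0';                 /* start with an empty string */
    return strncat(dest, source, destsize - 1);
}
```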
Question 13.1
How can I convert numbers to strings (the opposite of atoi)? Is there an itoa function?
Just use sprintf. (Don't worry that sprintf may be overkill, potentially wasting run time or code
space; it works well in practice.) See the examples in the answer to question 7.5; see also question 12.21.
You can obviously use sprintf to convert long or floating-point numbers to strings as well (using
%ld or %f).
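For instance, a stand-in for the nonexistent ltoa might be sketched as follows. The function name long_to_string is made up, and the caller is assumed to supply a buffer large enough for the result (32 characters is ample for a long of up to 64 bits).

```c
#include <stdio.h>

/* Convert value to its decimal string form in buf and return buf.
 * buf must have room for the sign, the digits, and the '\0'. */
char *long_to_string(long value, char *buf)
{
    sprintf(buf, "%ld", value);
    return buf;
}
```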
References: K&R1 Sec. 3.6 p. 60
K&R2 Sec. 3.6 p. 64
12. Stdio
12.1 What's wrong with the code "char c; while((c = getchar()) != EOF) ..."?
12.2 Why won't the code `` while(!feof(infp)) { fgets(buf, MAXLINE, infp);
fputs(buf, outfp); } '' work?
12.4 My program's prompts and intermediate output don't always show up on the screen.
12.5 How can I read one character at a time, without waiting for the RETURN key?
12.6 How can I print a '%' character with printf?
12.9 How can printf use %f for type double, if scanf requires %lf?
12.10 How can I implement a variable field width with printf?
12.11 How can I print numbers with commas separating the thousands?
12.12 Why doesn't the call scanf("%d", i) work?
12.13 Why doesn't the code "double d; scanf("%f", &d);" work?
12.15 How can I specify a variable width in a scanf format string?
12.17 When I read numbers from the keyboard with scanf "%d\n", it seems to hang until I type one
extra line of input.
12.18 I'm reading a number with scanf %d and then a string with gets(), but the compiler seems to
be skipping the call to gets()!
12.19 I'm re-prompting the user if scanf fails, but sometimes it seems to go into an infinite loop.
12.20 Why does everyone say not to use scanf? What should I use instead?
12.21 How can I tell how much destination buffer space I'll need for an arbitrary sprintf call? How
can I avoid overflowing the destination buffer with sprintf?
Question 12.5
How can I read one character at a time, without waiting for the RETURN key?
Question 12.6
How can I print a '%' character in a printf format string? I tried \%, but it didn't work.
Question 19.17
Why can't I open a file by its explicit path? The call
fopen("c:\newdir\file.dat", "r")
is failing.
The file you actually requested--with the characters \n and \f in its name--probably doesn't exist, and
isn't what you thought you were trying to open.
In character constants and string literals, the backslash \ is an escape character, giving special meaning
to the character following it. In order for literal backslashes in a pathname to be passed through to
fopen (or any other routine) correctly, they have to be doubled, so that the first backslash in each pair
quotes the second one:
fopen("c:\\newdir\\file.dat", "r");
Alternatively, under MS-DOS, it turns out that forward slashes are also accepted as directory separators,
so you could use
fopen("c:/newdir/file.dat", "r");
(Note, by the way, that header file names mentioned in preprocessor #include directives are not string
literals, so you may not have to worry about backslashes there.)
Question 19.16
How can I delete a file?
The Standard C Library function is remove. (This is therefore one of the few questions in this section
for which the answer is not ``It's system-dependent.'') On older, pre-ANSI Unix systems, remove may
not exist, in which case you can try unlink.
References: K&R2 Sec. B1.1 p. 242
ANSI Sec. 4.9.4.1
ISO Sec. 7.9.4.1
H&S Sec. 15.15 p. 382
PCS Sec. 12 pp. 208,220-221
POSIX Sec. 5.5.1, Sec. 8.2.4
Question 19.15
How can I recover the file name given an open stream or file descriptor?
This problem is, in general, insoluble. Under Unix, for instance, a scan of the entire disk (perhaps
involving special permissions) would theoretically be required, and would fail if the descriptor were
connected to a pipe or referred to a deleted file (and could give a misleading answer for a file with
multiple links). It is best to remember the names of files yourself when you open them (perhaps with a
wrapper function around fopen).
Question 2.24
Is there an easy way to print enumeration values symbolically?
No. You can write a little function to map an enumeration constant to a string. (If all you're worried
about is debugging, a good debugger should automatically print enumeration constants symbolically.)
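Such a function is typically just a switch statement. The enumeration and the name color_name below are hypothetical.

```c
enum color { RED, GREEN, BLUE };    /* hypothetical enumeration */

/* Map an enumeration constant to its name as a string. */
const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "RED";
    case GREEN: return "GREEN";
    case BLUE:  return "BLUE";
    }
    return "(unknown)";             /* value outside the enumeration */
}
```

The drawback is that the function has to be kept in step with the enumeration by hand whenever a constant is added.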
Question 2.22
What is the difference between an enumeration and a set of preprocessor #defines?
At the present time, there is little difference. Although many people might have wished otherwise, the C
Standard says that enumerations may be freely intermixed with other integral types, without errors. (If
such intermixing were disallowed without explicit casts, judicious use of enumerations could catch
certain programming errors.)
Some advantages of enumerations are that the numeric values are automatically assigned, that a debugger
may be able to display the symbolic values when enumeration variables are examined, and that they obey
block scope. (A compiler may also generate nonfatal warnings when enumerations and integers are
indiscriminately mixed, since doing so can still be considered bad style even though it is not strictly
illegal.) A disadvantage is that the programmer has little control over those nonfatal warnings; some
programmers also resent not having control over the sizes of enumeration variables.
References: K&R2 Sec. 2.3 p. 39, Sec. A4.2 p. 196
ANSI Sec. 3.1.2.5, Sec. 3.5.2, Sec. 3.5.2.2, Appendix E
ISO Sec. 6.1.2.5, Sec. 6.5.2, Sec. 6.5.2.2, Annex F
H&S Sec. 5.5 pp. 127-9, Sec. 5.11.2 p. 153
Question 2.20
Can I initialize unions?
ANSI Standard C allows an initializer for the first member of a union. There is no standard way of
initializing any other member (nor, under a pre-ANSI compiler, is there generally any way of initializing
a union at all).
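A small illustration, using a made-up union type number: the brace-enclosed initializer sets the first member, and any other member has to be given its value by ordinary assignment at run time.

```c
union number {
    int i;          /* first member: the one an ANSI initializer sets */
    double d;
};

union number n = { 42 };    /* initializes n.i */

/* Hypothetical helper: other members are set by assignment, not initialization. */
void set_double(union number *p, double value)
{
    p->d = value;
}
```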
References: K&R2 Sec. 6.8 pp. 148-9
ANSI Sec. 3.5.7
ISO Sec. 6.5.7
H&S Sec. 4.6.7 p. 100
Question 10.8
Where are header (``#include'') files searched for?
The exact behavior is implementation-defined (which means that it is supposed to be documented; see
question 11.33). Typically, headers named with <> syntax are searched for in one or more standard
places. Header files named with "" syntax are first searched for in the ``current directory,'' then (if not
found) in the same standard places.
Traditionally (especially under Unix compilers), the current directory is taken to be the directory
containing the file containing the #include directive. Under other compilers, however, the current
directory (if any) is the directory in which the compiler was initially invoked. Check your compiler
documentation.
References: K&R2 Sec. A12.4 p. 231
ANSI Sec. 3.8.2
ISO Sec. 6.8.2
H&S Sec. 3.4 p. 55
Question 17.8
What is ``Hungarian Notation''? Is it worthwhile?
Hungarian Notation is a naming convention, invented by Charles Simonyi, which encodes things about a
variable's type (and perhaps its intended use) in its name. It is well-loved in some circles and roundly
castigated in others. Its chief advantage is that it makes a variable's type or intended use obvious from its
name; its chief disadvantage is that type information is not necessarily a worthwhile thing to carry around
in the name of a variable.
References: Simonyi and Heller, ``The Hungarian Revolution''
Question 17.5
I came across some code that puts a (void) cast before each call to printf. Why?
printf does return a value, though few programs bother to check the return values from each call.
Since some compilers (and lint) will warn about discarded return values, an explicit cast to (void) is
a way of saying ``Yes, I've decided to ignore the return value from this call, but please continue to warn
me about other (perhaps inadvertently) ignored return values.'' It's also common to use void casts on
calls to strcpy and strcat, since the return value is never surprising.
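For instance, in a sketch like the following (the function greet is hypothetical, and the caller is assumed to supply a buffer large enough for the result), each cast records that the return value is being ignored on purpose:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: build and print a greeting, discarding return values
 * deliberately. buf must be large enough to hold "hello, " plus name. */
void greet(char *buf, const char *name)
{
    (void)strcpy(buf, "hello, ");   /* strcpy just returns buf again */
    (void)strcat(buf, name);        /* likewise strcat */
    (void)printf("%s\n", buf);      /* character count deliberately ignored */
}
```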
References: K&R2 Sec. A6.7 p. 199
Rationale Sec. 3.3.4
H&S Sec. 6.2.9 p. 172, Sec. 7.13 pp. 229-30
Question 17.4
Why do some people write if(0 == x) instead of if(x == 0)?
Question 17.3
Here's a neat trick for checking whether two strings are equal:
if(!strcmp(s1, s2))
Is this good style?
It is not particularly good style, although it is a popular idiom. The test succeeds if the two strings are
equal, but the use of ! (``not'') suggests that it tests for inequality.
A better option is to use a macro:
#define Streq(s1, s2) (strcmp((s1), (s2)) == 0)
Opinions on code style, like those on religion, can be debated endlessly. Though good style is a worthy
goal, and can usually be recognized, it cannot be rigorously codified. See also question 17.10.
Question 17.1
What's the best style for code layout in C?
K&R, while providing the example most often copied, also supply a good excuse for disregarding it:
The position of braces is less important, although people hold passionate beliefs. We have
chosen one of several popular styles. Pick a style that suits you, then use it consistently.
It is more important that the layout chosen be consistent (with itself, and with nearby or common code)
than that it be ``perfect.'' If your coding environment (i.e. local custom or company policy) does not
suggest a style, and you don't feel like inventing your own, just copy K&R. (The tradeoffs between
various indenting and brace placement options can be exhaustively and minutely examined, but don't
warrant repetition here. See also the Indian Hill Style Guide.)
The elusive quality of ``good style'' involves much more than mere code layout details; don't spend time
on formatting to the exclusion of more substantive code quality issues.
See also question 10.6.
References: K&R1 Sec. 1.2 p. 10
K&R2 Sec. 1.2 p. 10
Question 7.19
My program is crashing, apparently somewhere down inside malloc, but I can't see anything wrong
with it.
It is unfortunately very easy to corrupt malloc's internal data structures, and the resulting problems can
be stubborn. The most common source of problems is writing more to a malloc'ed region than it was
allocated to hold; a particularly common bug is to malloc(strlen(s)) instead of strlen(s) +
1. Other problems may involve using pointers to freed storage, freeing pointers twice, freeing
pointers not obtained from malloc, or trying to realloc a null pointer (see question 7.30).
See also questions 7.26, 16.8, and 18.2.
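The off-by-one allocation mentioned above is easy to avoid once the pattern is written down. A minimal sketch (copy_string is a made-up name):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of s, or NULL on failure.
 * The caller is responsible for freeing the result, exactly once. */
char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);    /* + 1 leaves room for the '\0' */
    if (p != NULL)
        strcpy(p, s);
    return p;
}
```

With malloc(strlen(s)) instead, the strcpy would write one byte past the end of the region, which is just the kind of damage that shows up later as a crash inside malloc.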
7. Memory Allocation
7.1 Why doesn't the code ``char *answer; gets(answer);'' work?
7.2 I can't get strcat to work. I tried ``char *s3 = strcat(s1, s2);'' but I got strange results.
7.3 But the man page for strcat says that it takes two char *'s as arguments. How am I supposed to
know to allocate things?
7.5 I have a function that is supposed to return a string, but when it returns to its caller, the returned
string is garbage.
7.6 Why am I getting ``warning: assignment of pointer from integer lacks a cast'' for calls to malloc?
7.7 Why does some code carefully cast the values returned by malloc to the pointer type being
allocated?
7.8 Why does so much code leave out the multiplication by sizeof(char) when allocating strings?
7.14 I've heard that some operating systems don't actually allocate malloc'ed memory until the program
tries to use it. Is this legal?
7.16 I'm allocating a large array for some numeric work, but malloc is acting strangely.
7.17 I've got 8 meg of memory in my PC. Why can I only seem to malloc 640K or so?
7.19 My program is crashing, apparently somewhere down inside malloc.
7.20 You can't use dynamically-allocated memory after you free it, can you?
7.21 Why isn't a pointer null after calling free?
7.22 When I call malloc to allocate memory for a local pointer, do I have to explicitly free it?
7.23 When I free a dynamically-allocated structure containing pointers, do I have to free each subsidiary
pointer first?
Question 7.3
But the man page for strcat says that it takes two char *'s as arguments. How am I supposed to
know to allocate things?
In general, when using pointers you always have to consider memory allocation, if only to make sure that
the compiler is doing it for you. If a library function's documentation does not explicitly mention
allocation, it is usually the caller's problem.
The Synopsis section at the top of a Unix-style man page or in the ANSI C standard can be misleading.
The code fragments presented there are closer to the function definitions used by an implementor than
the invocations used by the caller. In particular, many functions which accept pointers (e.g. to structures
or strings) are usually called with the address of some object (a structure, or an array--see questions 6.3
and 6.4). Other common examples are time (see question 13.12) and stat.
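The difference between the synopsis and an actual call can be seen in a sketch like this one. The helpers join and current_time are invented for illustration; the point is that the caller supplies real storage (an array, or a time_t object) and passes its address.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: concatenate s1 and s2 into caller-supplied storage.
 * Returns dest, or NULL if the result would not fit in destsize bytes. */
char *join(char *dest, size_t destsize, const char *s1, const char *s2)
{
    if (strlen(s1) + strlen(s2) + 1 > destsize)
        return NULL;
    strcpy(dest, s1);
    strcat(dest, s2);       /* safe: the size was checked above */
    return dest;
}

/* The synopsis for time shows a time_t * parameter; the usual call passes
 * the address of an actual time_t object, not an uninitialized pointer. */
time_t current_time(void)
{
    time_t now;
    time(&now);
    return now;
}
```

A caller might declare char buf[40]; and then call join(buf, sizeof buf, "Hello, ", "world").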
Question 12.38
How can I read a binary data file properly? I'm occasionally seeing 0x0a and 0x0d values getting
garbled, and it seems to hit EOF prematurely if the data contains the value 0x1a.
When you're reading a binary data file, you should specify "rb" mode when calling fopen, to make
sure that text file translations do not occur. Similarly, when writing binary data files, use "wb".
Note that the text/binary distinction is made when you open the file: once a file is open, it doesn't matter
which I/O calls you use on it. See also question 20.5.
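A minimal sketch of the two modes in use follows. The helper names write_bytes and read_bytes are made up; the only essential parts are the "wb" and "rb" mode strings.

```c
#include <stdio.h>

/* Hypothetical helper: write n bytes to a file in binary mode.
 * Returns 0 on success, -1 on failure. */
int write_bytes(const char *name, const unsigned char *buf, size_t n)
{
    FILE *fp = fopen(name, "wb");
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(buf, 1, n, fp);
    if (fclose(fp) != 0 || written != n)
        return -1;
    return 0;
}

/* Hypothetical helper: read up to max bytes in binary mode.
 * Returns the number of bytes read (0 if the file cannot be opened). */
size_t read_bytes(const char *name, unsigned char *buf, size_t max)
{
    FILE *fp = fopen(name, "rb");
    size_t n;

    if (fp == NULL)
        return 0;
    n = fread(buf, 1, max, fp);
    fclose(fp);
    return n;
}
```

Opened this way, bytes such as 0x0a, 0x0d, and 0x1a should come back exactly as they were written, on systems where the text/binary distinction matters as well as on those where it does not.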
References: ANSI Sec. 4.9.5.3
ISO Sec. 7.9.5.3
H&S Sec. 15.2.1 p. 348
Question 20.5
How can I write data files which can be read on other machines with different word size, byte order, or
floating point formats?
The most portable solution is to use text files (usually ASCII), written with fprintf and read with
fscanf or the like. (Similar advice also applies to network protocols.) Be skeptical of arguments which
imply that text files are too big, or that reading and writing them is too slow. Not only is their efficiency
frequently acceptable in practice, but the advantages of being able to interchange them easily between
machines, and manipulate them with standard tools, can be overwhelming.
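As an illustration of the text-file approach, here is a sketch that writes and reads one record. The record type struct reading and both function names are invented, and the %.17g precision assumes doubles no wider than the usual IEEE 754 format.

```c
#include <stdio.h>

/* Hypothetical record type, used only for illustration. */
struct reading {
    long id;
    double value;
};

/* Write one record as a line of text. Returns 0 on success, -1 on failure. */
int write_reading(FILE *fp, const struct reading *r)
{
    /* 17 significant digits are enough to round-trip an IEEE 754 double */
    return fprintf(fp, "%ld %.17g\n", r->id, r->value) < 0 ? -1 : 0;
}

/* Read one record back. Returns 0 on success, -1 on failure. */
int read_reading(FILE *fp, struct reading *r)
{
    return fscanf(fp, "%ld %lf", &r->id, &r->value) == 2 ? 0 : -1;
}
```

Nothing in the file depends on word size or byte order, and the data can be inspected with any text tool.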
If you must use a binary format, you can improve portability, and perhaps take advantage of prewritten
I/O libraries, by making use of standardized formats such as Sun's XDR (RFC 1014), OSI's ASN.1
(referenced in CCITT X.409 and ISO 8825 ``Basic Encoding Rules''), CDF, netCDF, or HDF. See also
questions 2.12 and 12.38.
References: PCS Sec. 6 pp. 86,88
Question 20.3
How do I access command-line arguments?
They are pointed to by the argv array with which main() is called.
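A short sketch of stepping through the array (show_args is a hypothetical helper; a real program would call it from main as show_args(argc, argv)):

```c
#include <stdio.h>

/* Hypothetical helper: print each command-line argument and return how many
 * arguments followed the program name. */
int show_args(int argc, char *argv[])
{
    int i;

    for (i = 1; i < argc; i++)      /* argv[0] is the program name */
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    return argc > 0 ? argc - 1 : 0;
}
```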
References: K&R1 Sec. 5.11 pp. 110-114
K&R2 Sec. 5.10 pp. 114-118
ANSI Sec. 2.1.2.2.1
ISO Sec. 5.1.2.2.1
H&S Sec. 20.1 p. 416
PCS Sec. 5.6 pp. 81-2, Sec. 11 p. 159, pp. 339-40 Appendix F
Schumacher, ed., Software Solutions in C Sec. 4 pp. 75-85
Question 19.41
But I can't use all these nonstandard, system-dependent functions, because my program has to be ANSI
compatible!
You're out of luck. Either you misunderstood your requirement, or it's an impossible one to meet.
ANSI/ISO Standard C simply does not define ways of doing these things. (POSIX defines a few.) It is
possible, and desirable, for most of a program to be ANSI-compatible, deferring the system-dependent
functionality to a few routines in a few files which are rewritten for each system ported to.
Question 19.40b
How do I use BIOS calls? How can I write ISR's? How can I create TSR's?
These are very particular to specific systems (PC compatibles running MS-DOS, most likely). You'll get
much better information in a specific newsgroup such as comp.os.msdos.programmer or its FAQ list;
another excellent resource is Ralf Brown's interrupt list.
Question 19.40
How do I... Use sockets? Do networking? Write client/server applications?
All of these questions are outside of the scope of this list and have much more to do with the networking
facilities which you have available than they do with C. Good books on the subject are Douglas Comer's
three-volume Internetworking with TCP/IP and W. R. Stevens's UNIX Network Programming. (There is
also plenty of information out on the net itself.)
Question 19.39
How can I handle floating-point exceptions gracefully?
On many systems, you can define a routine matherr which will be called when there are certain
floating-point errors, such as errors in the math routines in <math.h>. You may also be able to use
signal (see question 19.38) to catch SIGFPE. See also question 14.9.
References: Rationale Sec. 4.5.1
Question 19.38
How can I trap or ignore keyboard interrupts like control-C?
Question 19.37
How can I implement a delay, or time a user's response, with sub-second resolution?
Unfortunately, there is no portable way. V7 Unix, and derived systems, provided a fairly useful ftime
routine with resolution up to a millisecond, but it has disappeared from System V and POSIX. Other
routines you might look for on your system include clock, delay, gettimeofday, msleep, nap,
napms, setitimer, sleep, times, and usleep. (A routine called wait, however, is at least
under Unix not what you want.) The select and poll calls (if available) can be pressed into service to
implement simple delays. On MS-DOS machines, it is possible to reprogram the system timer and timer
interrupts.
Of these, only clock is part of the ANSI Standard. The difference between two calls to clock gives
elapsed execution time, and if CLOCKS_PER_SEC is greater than 1, the difference will have subsecond
resolution. However, clock gives elapsed processor time used by the current program, which on a
multitasking system may differ considerably from real time.
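Timing with clock might be sketched as follows. The names cpu_seconds and spin are invented for the example, and spin is only a stand-in workload.

```c
#include <time.h>

/* Hypothetical helper: processor time, in seconds, consumed by calling work(). */
double cpu_seconds(void (*work)(void))
{
    clock_t start = clock();

    work();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* A stand-in workload so that the helper has something to time. */
void spin(void)
{
    volatile long i;    /* volatile keeps the loop from being optimized away */

    for (i = 0; i < 1000000L; i++)
        ;
}
```

The value returned by cpu_seconds(spin) is processor time, so it says nothing about how long the user actually waited.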
If you're trying to implement a delay and all you have available is a time-reporting function, you can
implement a CPU-intensive busy-wait, but this is only an option on a single-user, single-tasking machine
as it is terribly antisocial to any other processes. Under a multitasking operating system, be sure to use a
call which puts your process to sleep for the duration, such as sleep or select, or pause in
conjunction with alarm or setitimer.
For really brief delays, it's tempting to use a do-nothing loop like
long int i;
for(i = 0; i < 1000000; i++)
;
but resist this temptation if at all possible! For one thing, your carefully-calculated delay loops will stop
working next month when a faster processor comes out. Perhaps worse, a clever compiler may notice that
the loop does nothing and optimize it away completely.
References: H&S Sec. 18.1 pp. 398-9
PCS Sec. 12 pp. 197-8,215-6
POSIX Sec. 4.5.2
Question 19.36
How can I read in an object file and jump to routines in it?
You want a dynamic linker or loader. It may be possible to malloc some space and read in object files,
but you have to know an awful lot about object file formats, relocation, etc. Under BSD Unix, you could
use system and ld -A to do the linking for you. Many versions of SunOS and System V have the -ldl
library which allows object files to be dynamically loaded. Under VMS, use
LIB$FIND_IMAGE_SYMBOL. GNU has a package called ``dld''. See also question 15.13.
Question 15.13
How can I call a function with an argument list built up at run time?
There is no guaranteed or portable way to do this. If you're curious, however, this list's editor has a few
wacky ideas you could try...
Instead of an actual argument list, you might consider passing an array of generic (void *) pointers.
The called function can then step through the array, much like main() might step through argv.
(Obviously this works only if you have control over all the called functions.)
(See also question 19.36.)
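The array-of-pointers approach can be sketched like this. Both function names are hypothetical, and the example assumes that every element points to a string and that the array ends with a null pointer.

```c
#include <stdio.h>

/* Hypothetical callee: receives its "arguments" as a NULL-terminated array of
 * generic pointers. Every element is assumed to point to a string. */
int print_all(void *args[])
{
    int n = 0;

    while (args[n] != NULL) {
        printf("%s\n", (char *)args[n]);
        n++;
    }
    return n;       /* number of "arguments" processed */
}

/* The caller builds the array at run time and dispatches through a function
 * pointer, much as a table-driven command interpreter might. */
int call_with(int (*fn)(void *[]), void *args[])
{
    return fn(args);
}
```

The argument list is now ordinary data, so its length can vary at run time without any nonportable tricks.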
Somehow I've never gotten around to writing the definitive essay on those "wacky ideas". Here are
several messages I've written at various times on the subject of what I call the "inverse varargs problem";
they summarize most of the ideas, and should at least get you started. (There is some overlap, because
some of the later ones were written when I didn't have copies of the earlier ones handy.)
article posted to comp.unix.wizards and comp.lang.c 1989-06-04
article posted to comp.lang.c 1992-07-14
mail message sent 1993-03-07 to someone asking about the "wacky ideas"
more recent ideas (1997-06-28)
most recent ideas (2001-05-27)
[This article was originally posted on June 4, 1989. I have altered the presentation slightly for this web page.]
From: scs@adam.pika.mit.edu (Steve Summit)
Newsgroups: comp.unix.wizards,comp.lang.c
Subject: Re: Needed: A (Portable) way of setting up the arg stack
Keywords: 1/varargs, callg
Message-ID: <11830@bloom-beacon.MIT.EDU>
Date: 4 Jun 89 16:43:52 GMT
References: <708@mitisft.Convergent.COM> <32208@apple.Apple.COM> <10354@smoke.BRL.MIL>
In article <708@mitisft.Convergent.COM> Gregory Kemnitz writes:
>I need to know how (or if) *NIX (System V.3) has the ability to let
>a stack of arguments be set for a function before it is called. I
>have several hundred pointers to functions which are called from one
>place, and each function has different numbers of arguments.
A nice problem. Doug Gwyn's suggestion is the right one, for maximum portability, but constrains the form of the
called subroutines and also any calls that do not go through the "inverse varargs mechanism." (That is, you can't
really call the subroutines in question without building the little argument vector.)
For transparency (at some expense in portability) I use a routine I call "callg," named after the VAX instruction of
the same name. (This is equivalent to Peter Desnoyers' "va_call" routine; in retrospect, I like his name better.)
va_call can be implemented in one line of assembly language on the VAX; it typically requires ten or twenty
lines on other machines, to copy the arguments from the vector to the real stack (or wherever arguments are really
passed). I have implementations for the PDP11, NS32000, 68000, and 80x86. (This is a machine specific problem,
not an operating system specific problem.) A routine such as va_call must be written in assembly language; it is
one of the handful of functions I know of that cannot possibly be written in C.
Not all machines use a stack; some use register passing or other conventions. For maximum portability, then, the
interface to a routine like va_call should allow the type of each argument to be explicitly specified, as well as
hiding the details of the argument vector construction. I have been contemplating an interface similar to that
illustrated by the following example:
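#include "varargs2.h"
extern printf();
main()
{
    va_stack(stack, 10);    /* declare vector which holds up to 10 args */
    va_push(stack, "%d %f %s\n", char *);
    va_push(stack, 12, int);
    va_push(stack, 3.14, double);
    va_push(stack, "Hello, world!", char *);
    va_call(printf, stack);
}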
Note that this calls the standard printf; printf need take no special precautions, and indeed cannot necessarily
tell that it has not been called normally. (This is what I meant by "transparency.")
On a "conventional," stack-based machine, va_stack would declare an array of 10 ints (assuming that int is
the machine's natural word size) and va_push would copy words to it using pointer manipulations analogous to
those used by the va_arg macro in the current varargs and stdarg implementations. (Note that "declare
vector which holds up to 10 args" is therefore misleading; the vector holds up to 10 words, and it is up to the
programmer to leave enough slop for multi-word types such as long and double. The distinction between a
"word" and an "argument" is the one that always comes up when someone suggests supplying a va_nargs()
macro; let's not start that discussion again.)
For a register-passing machine, the choice of registers may depend on the types of the arguments. For this reason,
the interface must allow the type information to be retained in the argument vector for inspection by the va_call
routine. This would be easier to implement if manifest constants were used, instead of C type names:
va_push(stack, 12, VA_INT);
va_push(stack, 3.14, VA_DOUBLE);
va_push(stack, "Hello, world!", VA_POINTER);
Since it would be tricky to "switch" on these constants inside the va_push macro to decide how many words of the
vector to set aside, separate push macros might be preferable:
va_push_int(stack, 12);
va_push_double(stack, 3.14);
va_push_pointer(stack, "Hello, world!");
(This option has the additional advantage over the single va_push in that it does not require that the second macro
argument be of variable type.) There is still a major difficulty here, however, in that one cannot assume the existence
of a single kind of pointer.
For the "worst" machines, the full generality of C type names (as in the first example) would probably be required.
Unfortunately, to do everything with type names you might want to do, you have to handle them specially in the
compiler. (On the other hand, the machines that would have trouble with va_push are probably the same ones that
already have to have the varargs or stdarg mechanisms recognized by the compiler.)
Lest you think that va_call, if implementable, solves the whole problem, don't let your breath out yet: what
should the return value be? In the most general case, the routines being indirectly called might return different types.
The return value of va_call would not, so to speak, be representable in closed form.
This last wrinkle (variable return type on top of variable arguments passed) is at the heart of a C interpreter that
allows intermixing of interpreted and compiled code. I know how I solved it; I'd be curious to know how Saber C
solves it. (I solved it with two more assembly language routines, also unimplementable in C. A better solution, to
half of the problem, anyway, would be to provide a third argument, a union pointer of some kind, to va_call
http://www.eskimo.com/~scs/C-faq/varargs/pf.c
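#include "varargs2.h"
extern printf();
main()
{
    va_stack(stack, 10);    /* declare vector which holds up to 10 args */
    va_stack_init(stack);
    va_push(stack, "%d %f %s\n", char *);
    va_push(stack, 12, int);
    va_push(stack, 3.14, double);
    va_push(stack, "Hello, world!", char *);
    va_call(printf, stack);
}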
http://www.eskimo.com/~scs/C-faq/varargs/varargs2.h
#ifndef VARARGS2_H
#define VARARGS2_H
#define va_stack(name, max) int name[max+1]
#define va_stack_init(stack) stack[0] = 0
#define va_push(stack, arg, type) *(type *)(&stack[stack[0]+1]) = (arg), \
stack[0] += (sizeof(type) + sizeof(int) - 1) / sizeof(int)
#ifdef __STDC__
extern int va_call(int (*)(), int *);
#endif
#endif
[This article was originally posted on July 14, 1992. I have edited the text slightly for this web page.]
Newsgroups: comp.lang.c
From: scs@adam.mit.edu (Steve Summit)
Subject: Re: varargs fn calling varargs fn?
Message-ID: <1992Jul14.214625.15684@athena.mit.edu>
References: <1992Jul9.171606.24625@shearson.com> <1992Jul10.015616.12264@infodev.cam.ac.uk>
Date: Tue, 14 Jul 1992 21:46:25 GMT
In article <1992Jul9.171606.24625@shearson.com>, Frank Greco (fgreco@shearson.com) writes:
> Does anyone have any examples of a C function that uses varargs
> that calls another C function that uses the same args?
> ...
> The man page for varargs(3) sez:
>
> The argument list (or its remainder) can be passed to
> another function using a pointer to a variable of type
> va_list()...
>
> What do they mean "pointer to a variable of type va_list()? I've tried
> several combinations and bus-erroring away...
And in article <1992Jul10.015616.12264@infodev.cam.ac.uk>, Colin Bell (crb11@cus.cam.ac.uk)
writes:
> I've solved this one, but have a related one of my own. I have a function which handles
> message handling for a system: ie takes a list of arguments as one would send to
> fprintf prefixed by two or three telling the function what to do with the resulting
> string. What I would like to do is strip these arguments off the front and then pass
> the entire remainder to sprintf, and deal with the resulting string.
Preliminary answers to both of these questions can be found in the comp.lang.c frequently-asked
questions (FAQ) list:
6.2:
How can I write a function that takes a format string and a variable number of
arguments, like printf, and passes them to printf to do most of the work?
A:
Use vprintf, vfprintf, or vsprintf.
Here is an "error" routine which prints an error message, preceded by the string "error: "
and terminated with a newline:
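A routine along those lines can be sketched with nothing more than <stdarg.h> and vfprintf. This is an illustrative version, not necessarily word for word what the FAQ list printed:

```c
#include <stdio.h>
#include <stdarg.h>

/* Print "error: ", then the formatted message, then a newline, on stderr. */
void error(const char *fmt, ...)
{
    va_list argp;

    fprintf(stderr, "error: ");
    va_start(argp, fmt);
    vfprintf(stderr, fmt, argp);    /* vfprintf takes the va_list directly */
    va_end(argp);
    fprintf(stderr, "\n");
}
```

It is called just like printf, for example error("cannot open %s", filename).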
...one often speaks of "the variable-length part of a variable-length argument list," which is kind of a silly
mouthful, but it is often important to distinguish the variable-length part from the fixed part, which in
ANSI C must always exist. (For example, printf always takes one char * argument -- the "fixed"
part -- followed by 0 or more arguments of arbitrary types.)
[Passing a va_list pointer along to another routine] is the preferred way of doing things, but note that
a function that accepts a single va_list pointer is not at all variadic. (As far as C is concerned, it
accepts exactly one argument.)
This is the solution alluded to by the (admittedly rather opaque) FAQ list wording
You must provide a version of that other function which accepts a va_list pointer, as
does vfprintf in the example above.
That is, the answer to the question "Can I write a debug_printf() in terms of printf()?" is "No."
printf() is variadic, but vprintf et al take va_list pointers, so you can write a
debug_printf() in terms of one of them.
*
"Variadic" means (at least as I use it) very specifically that a subroutine, as the C compiler sees it,
accepts a variable number of formal parameters. (Sometimes the term "formal parameter" is used when
we must distinguish very carefully between the generic things a subroutine is set up to receive and the
"actual arguments" which are passed to it by a specific subroutine call.)
It is fairly easy to prove that one cannot call a variadic function with an argument list which varies at run
time (which is what you need to do if you're going to use the variable-length argument list that another
variadic function has just received). The only way that a C compiler will generate a subroutine call is
when you write an expression of the form
function ( argument-list )
where function is an expression which evaluates to a function (either the name of a function, or the
"contents" of a pointer-to-function), and argument-list is a list of 0 or more comma-separated
expressions. Obviously, since they appear in the source code, there must be a fixed number of arguments
(in any one call).
*
[When a variadic function tries to pass the variable arguments along to another, it] doesn't matter whether
or not the first function does any additional "processing" or not, and it doesn't matter whether the second
function uses stdarg.h or not (there are other, less portable methods of accessing variable-length
argument lists, and the second function might not even be written in C). The essential question is whether
a variadic function can call another variadic function (with the strict interpretation of "variadic" which I
use), passing to the second function the (indeterminate number of) arguments which it receives (at run
There is, however, neither a syntax by which we could declare an "object" of "type"
"variable-length argument list," nor a mechanism by which we could initialize one with a
particular value (i.e. with a particular variable-length argument list). Therefore, we cannot
accept a variable-length argument list in one variadic function and pass it (as a
variable-length argument list) to another variadic function. What we can do is to take a pointer (a
va_list) to the variable-length argument list (using va_start), and pass that pointer
(as a single argument) to another function; however, that other, va_list-accepting
function is typically not variadic.
Examples of variadic functions, which accept variable-length argument lists, are printf,
fprintf, and sprintf. Examples of non-variadic functions, which accept pointers to
variable-length argument lists as single va_lists, are vprintf, vfprintf, and
vsprintf.
As a point of general advice, whenever one writes a function which accepts a variable
number of arguments, it is a good idea to write a companion routine which accepts a single
va_list. Writing these two routines is no more work than writing just the one, because
the first, variadic function can be implemented very simply in terms of the second -- it just
has to call va_start, then pass the resultant va_list to the second function.
By implementing a pair of functions, you leave the door open for someone (you or anyone
else) later to build additional functionality "on top" of the function(s), by writing another
variadic function (or, better, a variadic and non-variadic pair) which calls the second,
va_list-accepting, lower-level function as well as doing some additional processing.
If you were to write only a single, variadic function, later programmers who tried to
implement additional functionality on top of it would find themselves in the same situation
as did people trying to write things like debug_printf() in the days before the
v*printf family had been invented: they had no portable way to "hook up" to the core
functionality provided by the *printf family; they had to use nonportable kludges to call
printf with (at least some of) the (variable) arguments which the higher-level functions
received, or they had to abandon printf entirely and reimplement some or all of its
functionality.
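The advice about writing functions in pairs can be made concrete with a sketch. All three function names and the debug_enabled flag are invented for illustration, and error returns from the printf family are ignored for brevity.

```c
#include <stdio.h>
#include <stdarg.h>

/* Lower-level, non-variadic routine: accepts a va_list, as vfprintf does.
 * Returns the number of characters written. */
int vlogmsg(FILE *fp, const char *fmt, va_list argp)
{
    int n = fprintf(fp, "log: ");

    n += vfprintf(fp, fmt, argp);
    n += fprintf(fp, "\n");
    return n;
}

/* Variadic front end: va_start, hand the va_list down, va_end. */
int logmsg(FILE *fp, const char *fmt, ...)
{
    va_list argp;
    int n;

    va_start(argp, fmt);
    n = vlogmsg(fp, fmt, argp);
    va_end(argp);
    return n;
}

/* A later layer can hook up to vlogmsg without any nonportable kludges. */
int debug_enabled = 1;

int debugmsg(FILE *fp, const char *fmt, ...)
{
    va_list argp;
    int n = 0;

    if (debug_enabled) {
        va_start(argp, fmt);
        n = vlogmsg(fp, fmt, argp);
        va_end(argp);
    }
    return n;
}
```

Had only logmsg existed, debugmsg would have had no portable way to pass its arguments along.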
Obviously, that's far too long-winded and repetetetive for the FAQ list, but it's a good first draft for me to
try to whittle down and distill the essence out of. (There's also a slight imperfection in its conceptual
model -- although it alludes to "objects of type `variable-length argument list,'" there are really no values
of type "variable-length argument list." All actual argument lists obviously have a certain number of
arguments, so an "object of type `variable-length argument list'" is just one that can hold arbitrary
fixed-length argument lists, regardless of how long they are.)
Steve Summit
scs@eskimo.com
Question 15.5
How can I write a function that takes a format string and a variable number of arguments, like printf,
and passes them to printf to do most of the work?
Question 15.7
I have a pre-ANSI compiler, without <stdarg.h>. What can I do?
There's an older header, <varargs.h>, which offers about the same functionality.
References: H&S Sec. 11.4 pp. 296-9
CT&P Sec. A.2 pp. 134-139
PCS Sec. 11 pp. 184-5, Sec. 13 p. 250
Question 15.6
How can I write a function analogous to scanf, that calls scanf to do most of the work?
Unfortunately, vscanf and the like are not part of ANSI C (C89), so under such a compiler you're on
your own. (C99 later added vscanf, vfscanf, and vsscanf to <stdio.h>.)
Question 15.2
How can %f be used for both float and double arguments in printf? Aren't they different types?
In the variable-length part of a variable-length argument list, the ``default argument promotions'' apply:
types char and short int are promoted to int, and float is promoted to double. (These are the
same promotions that apply to function calls without a prototype in scope, also known as ``old style''
function calls; see question 11.3.) Therefore, printf's %f format always sees a double. (Similarly,
%c always sees an int, as does %hd.) See also questions 12.9 and 12.13.
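The promotion can be observed in a small sketch (the function name is made up): a float and a double holding the same value produce identical output from %f, because printf receives a double in both cases.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: format a float and a double with the same %f and
 * report whether the two results are identical (1) or not (0). */
int promoted_float_matches_double(void)
{
    char fbuf[32], dbuf[32];
    float f = 1.5f;
    double d = 1.5;

    sprintf(fbuf, "%f", f);     /* f is promoted to double in the call */
    sprintf(dbuf, "%f", d);
    return strcmp(fbuf, dbuf) == 0;
}
```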
References: ANSI Sec. 3.3.2.2
ISO Sec. 6.3.2.2
H&S Sec. 6.3.5 p. 177, Sec. 9.4 pp. 272-3
Copyright
This collection of hypertext pages is Copyright 1995 by Steve Summit. Content from the book "C
Programming FAQs: Frequently Asked Questions" (Addison-Wesley, 1995, ISBN 0-201-84519-9) is
made available here by permission of the author and the publisher as a service to the community. It is
intended to complement the use of the published text and is protected by international copyright laws.
The content is made available here and may be accessed freely for personal use but may not be published
or retransmitted without written permission.
Question 15.8
How can I discover how many arguments a function was actually called with?
This information is not available to a portable program. Some old systems provided a nonstandard
nargs function, but its use was always questionable, since it typically returned the number of words
passed, not the number of arguments. (Structures, long ints, and floating point values are usually
passed as several words.)
Any function which takes a variable number of arguments must be able to determine from the arguments
themselves how many of them there are. printf-like functions do this by looking for formatting
specifiers (%d and the like) in the format string (which is why these functions fail badly if the format
string does not match the argument list). Another common technique, applicable when the arguments are
all of the same type, is to use a sentinel value (often 0, -1, or an appropriately-cast null pointer) at the end
of the list (see the execl and vstrcat examples in questions 5.2 and 15.4). Finally, if their types are
predictable, you can pass an explicit count of the number of variable arguments (although it's usually a
nuisance for the caller to generate).
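The explicit-count and sentinel techniques might look something like this. It is a minimal sketch: sum_counted and total_length are invented names, not functions from the FAQ or any library.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Explicit count: the caller says how many ints follow. */
int sum_counted(int count, ...)
{
    va_list argp;
    int total = 0;

    assert(count >= 0);
    va_start(argp, count);
    while (count-- > 0)
        total += va_arg(argp, int);
    va_end(argp);
    return total;
}

/* Sentinel: the list of strings ends with (char *)NULL. */
size_t total_length(const char *first, ...)
{
    va_list argp;
    const char *s;
    size_t len = 0;

    va_start(argp, first);
    for (s = first; s != NULL; s = va_arg(argp, char *))
        len += strlen(s);
    va_end(argp);
    return len;
}
```

Note that the sentinel must be written as (char *)NULL in the call; a bare 0 or NULL is not guaranteed to be passed as a pointer in a variable-length argument list.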
References: PCS Sec. 11 pp. 167-8
Question 15.9
My compiler isn't letting me declare a function
int f(...)
{
}
i.e. with no fixed arguments.
Standard C requires at least one fixed argument, in part so that you can hand it to va_start.
References: ANSI Sec. 3.5.4, Sec. 3.5.4.3, Sec. 4.8.1.1
ISO Sec. 6.5.4, Sec. 6.5.4.3, Sec. 7.8.1.1
H&S Sec. 9.2 p. 263
Question 15.10
I have a varargs function which accepts a float parameter. Why isn't
va_arg(argp, float)
working?
In the variable-length part of variable-length argument lists, the old ``default argument promotions''
apply: arguments of type float are always promoted (widened) to type double, and types char and
short int are promoted to int. Therefore, it is never correct to invoke va_arg(argp, float);
instead you should always use va_arg(argp, double). Similarly, use va_arg(argp, int) to
retrieve arguments which were originally char, short, or int. See also questions 11.3 and 15.2.
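For instance (a sketch with an invented function name, average), a function whose callers pass float values still has to fetch them as double:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical: averages `count` values which callers may pass as
 * float or double; either way they arrive as double. */
double average(int count, ...)
{
    va_list argp;
    double total = 0.0;
    int i;

    assert(count > 0);
    va_start(argp, count);
    for (i = 0; i < count; i++)
        total += va_arg(argp, double);   /* never va_arg(argp, float) */
    va_end(argp);
    return total / count;
}
```

A call such as average(2, 1.5f, 2.5f) works because both float arguments are widened to double on the way in.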
References: ANSI Sec. 3.3.2.2
ISO Sec. 6.3.2.2
Rationale Sec. 4.8.1.2
H&S Sec. 11.4 p. 297
Question 15.11
I can't get va_arg to pull in an argument of type pointer-to-function.
The type-rewriting games which the va_arg macro typically plays are stymied by overly-complicated
types such as pointer-to-function. If you use a typedef for the function pointer type, however, all will
be well. See also question 1.21.
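Here is roughly what the typedef approach looks like. The names intfunc, twice, negate, and apply_n are all made up for this sketch; it uses an explicit count rather than a sentinel just to keep it short.

```c
#include <assert.h>
#include <stdarg.h>

typedef int (*intfunc)(int);   /* the typedef keeps va_arg's type simple */

int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }

/* Applies `count` functions, passed as trailing arguments, to x in order. */
int apply_n(int x, int count, ...)
{
    va_list argp;
    intfunc f;

    assert(count >= 0);
    va_start(argp, count);
    while (count-- > 0) {
        f = va_arg(argp, intfunc);   /* not va_arg(argp, int (*)(int)) */
        x = (*f)(x);
    }
    va_end(argp);
    return x;
}
```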
References: ANSI Sec. 4.8.1.2
ISO Sec. 7.8.1.2
Rationale Sec. 4.8.1.2
Question 15.12
How can I write a function which takes a variable number of arguments and passes them to some other
function (which takes a variable number of arguments)?
In general, you cannot. Ideally, you should provide a version of that other function which accepts a
va_list pointer (analogous to vfprintf; see question 15.5). If the arguments must be passed
directly as actual arguments, or if you do not have the option of rewriting the second function to accept a
va_list (in other words, if the second, called function must accept a variable number of arguments,
not a va_list), no portable solution is possible. (The problem could perhaps be solved by resorting to
machine-specific assembly language; see also question 15.13.)
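When you do control the second function, the usual arrangement is a pair: a va_list version which does the work, and a thin variadic wrapper around it. A minimal sketch follows; vlog_to and log_to are hypothetical names, and the caller is assumed to provide a buffer large enough for the formatted text.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The va_list version does the real work (compare vfprintf). */
int vlog_to(char *buf, const char *fmt, va_list argp)
{
    int n = sprintf(buf, "log: ");

    return n + vsprintf(buf + n, fmt, argp);
}

/* The variadic version merely packages its arguments and hands them on. */
int log_to(char *buf, const char *fmt, ...)
{
    va_list argp;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(argp, fmt);
    n = vlog_to(buf, fmt, argp);
    va_end(argp);
    return n;
}
```

Any other variadic function which wants the same formatting can call vlog_to with its own va_list, which is exactly what cannot be done with a function that only accepts "...".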
[This message was originally sent on March 7, 1993, to someone who asked about the "wacky ideas".]
From: scs@adam.mit.edu (Steve Summit)
Subject: Re: dynamic function call
Date: Sun, 7 Mar 93 17:58:52 -0500
Message-Id: <9303072258.AA22973@adam.MIT.EDU>
You wrote:
> I was curious to find out your ideas on the below question appearing
> in the C language FAQ:
>
> 7.5: How can I call a function with an argument list built up at run
> time?
>
> A: There is no guaranteed or portable way to do this. If you're
> curious, ask this list's editor, who has a few wacky ideas you
> could try... (See also question 16.10.)
[current version of this question, now numbered 15.13]
Believe it or not, you're the first to have asked, so you get to be the first to hear me apologize for the fact
that several elaborate discussions of those "wacky ideas" which I've written at various times are in fact
stuck off in my magtape archives somewhere, not readily accessible.
[The two main ones I was probably thinking of are this one and this one.]
Here is an outline; I'll try to find one of my longer old descriptions and send it later.
The basic idea is to postulate the existence of the following routine:
#include <stdarg.h>
callg(funcp, argp)
int (*funcp)();
magic_arglist_pointer argp;
This routine calls the function pointed to by funcp, passing to it the argument list pointed to by argp.
It returns whatever the pointed-to function returns.
The second question is of course how to construct the argument list pointed to by argp. I've had a
number of ideas over the years on how to do this; perhaps the best (or at least the one I'm currently
happiest with) is to do something like this:
extern func();
va_alloc(arglist, 10);
va_list argp;
va_start2(arglist, argp);
va_arg(argp, int) = 1;
va_arg(argp, double) = 2.3;
va_arg(argp, char *) = "four";
va_finish2(arglist, argp);
callg(func, arglist);
The above is equivalent to the simple call
func(1, 2.3, "four");
Now, the interesting thing is that it's often (perhaps even "usually") possible to construct va_alloc,
va_start2, and va_finish2 macros such as I've illustrated above such that the standard va_arg
macro out of <stdarg.h> or <varargs.h> does the real work. (In other words, the traditional
implementations of va_arg would in fact work in an lvalue context, i.e. on the left hand side of an
assignment.)
I'm fairly sure I've got working versions of macros along the lines of va_alloc, va_start2, and
va_finish2 somewhere, but I can't find them at the moment, either. At the end of this
message I'll append a different set of macros, not predicated on the assumption of being able to re-use the
existing va_arg on the left hand side, which should serve as an example of the essential
implementation ideas.
A third question is what to do if the return type (not just the argument list) of the called function is not
known at compile time. (If you're still with me, we're moving in the direction of doing so much at run
time that we've practically stopped compiling and started interpreting, and in fact many of the ideas I'm
discussing in this note come out of my attempts to write a full-blown C interpreter.) We can answer the
third question by adding a third argument to our hypothetical callg() routine, involving a tag
describing the type of result we expect and a union in which any return value can be stored.
A fourth question is how to write this hypothetical callg() function. It is one of my favorite examples
of a function that essentially must be written in assembler, not for efficiency, but because it's something
you simply can't do in C. It's actually not terribly hard to write; I've got implementations of it for most of
the machines I use. I write it for a new environment by compiling a simple little program fragment such
as
int nargs;
int args[10];
int (*funcp)();
int i;
switch(nargs)
{
case 0: (*funcp)();
break;
case 1: (*funcp)(args[0]);
break;
case 2: (*funcp)(args[0], args[1]);
break;
...
for(i = 0; i < nargs; i++)
(*funcp)(args[i]);
, and massaging the assembly language output to push a variable number of arguments before performing
the (single) call. It's possible to do this without knowing very much else about the assembly/machine
language in use. (It helps a lot to have a compiler output or listing option which lets you see its
generated assembly language.)
When I say it's generally easy to write callg, I'm thinking of conventional, stack-based machines.
Some modern machines pass some or all arguments in registers, and which registers are used can depend
on the type of the arguments, which makes this sort of thing much harder.
A fifth question is where my name "callg" comes from. The VAX has a single instruction, CALLG,
which does exactly what you want here. (In other words, an assembly-language implementation of this
callg() routine is a single line of assembler on the VAX.) I've also used the name va_call instead
of callg.
If you have questions or comments prompted by any of this, or if you'd like to see more code fragments,
feel free to ask.
Steve Summit
scs@eskimo.com
[The aforementioned "set of macros" is in this file: varargs2.h .]
[This message was originally sent on June 28, 1997, when I was temporarily acting as a moderator for
comp.lang.c.moderated. It mentions, without going into too much detail, two more-recent ideas I've
received on the subject.]
Subject: Re: Interpreter -- Calling functions
Date: Sat, 28 Jun 1997 10:44:03 -0700 (PDT)
Message-Id: <199706281744.KAA07952@solutions.solon.com>
X-scs-References: <199612161422.OAA02621@goday.ac.upc.es>
<9701171704.ZM25160@collie.hpl.hp.com>
Besides the ideas I passed along in that other message, here are two I've received more recently from
other netters. Fermin Javier Reig had a variation on the idea of using an interface like
Push(arg1);
Push(arg2);
...
Call(func);
, although his idea ran into several problems of its own, and anyway it addresses the part of the problem
that you've already solved, so I won't say more about it here. Jonathan Thompson had the idea of
"creating a very large structure of chars", filling in that structure with an image of the argument list (i.e.
the stack frame) to be built, and then passing what looks like a single argument, i.e. that one structure, to
the called function, which could -- with luck -- interpret it as multiple arguments. I think this trick could
be made to work on some machines, although I haven't experimented with it yet, and it obviously
assumes a pure stack calling model, i.e. it wouldn't work at all on a register-passing machine.
Steve Summit
comp.lang.c.moderated co-moderator
scs@eskimo.com
[This message was originally sent on May 27, 2001, to someone who was asking about constructing
variable-length argument lists. I have edited the text slightly for this web page.]
From: scs@eskimo.com (Steve Summit)
Date: Sun, 27 May 2001 10:49:32 -0500
Message-Id: <2001May27.1049.scs.009@aeroroot.scs.ndip.eskimo.net>
Subject: Re: Variable Argument lists in C
You wrote:
> Hi, I was searching on Google for a way to build a variable of type
> va_list. My task is to accept a variable number of parameters with
> one function, remove a few arguments from that list, and then pass the
> va_list to another function, probably vprintf or vsprintf...
> Do you know how this could be accomplished? I either need
> to simply remove the args from the va_list that I obtain from the
> first function call, or build a new va_list and omit the unwanted args.
Well, there are two very different answers to this question. The first is that there is no portable way of
doing this sort of thing at all; the "solutions" all involve either assembly language or ghastly kludges, or
both. If at all possible, you should find some other way of accomplishing your higher task that doesn't
involve trying to build or manipulate argument lists on the fly. Getting involved with dynamic argument
lists is a lot like getting involved with absinthe: darkly exotic and stimulating at first, but destructive and
mind-rotting in the end.
That's the sober, responsible answer. The second answer is that dynamic function calls are exotic and
stimulating and tons of fun. But they're not easy; they do require either assembly language or ghastly
kludges (or both). This is what question 15.13 in the comp.lang.c FAQ list is about; I'm appending my
collection of "wacky ideas" at the end of this message. Besides the ideas there, lately I've been having
mostly good luck with a set of gcc extensions: __builtin_apply_args, __builtin_apply, and
__builtin_return. (I can't tell you how to use these, because I barely understand them myself, but
if you're using gcc, they might be an option for you. The only documentation I know of for them is the
gcc Info node "Constructing Calls".)
Steve Summit
scs@eskimo.com
Question 19.33
How can a process change an environment variable in its caller?
It may or may not be possible to do so at all. Different operating systems implement global name/value
functionality similar to the Unix environment in different ways. Whether the ``environment'' can be
usefully altered by a running program, and if so, how, is system-dependent.
Under Unix, a process can modify its own environment (some systems provide setenv or putenv
functions for the purpose), and the modified environment is generally passed on to child processes, but it
is not propagated back to the parent process.
Question 19.32
How can I automatically locate a program's configuration files in the same directory as the executable?
It's hard; see also question 19.31. Even if you can figure out a workable way to do it, you might want to
consider making the program's auxiliary (library) directory configurable, perhaps with an environment
variable. (It's especially important to allow variable placement of a program's configuration files when
the program will be used by several people, e.g. on a multiuser system.)
Question 19.31
How can my program discover the complete pathname to the executable from which it was invoked?
argv[0] may contain all or part of the pathname, or it may contain nothing. You may be able to
duplicate the command language interpreter's search path logic to locate the executable if the name in
argv[0] is present but incomplete. However, there is no guaranteed solution.
References: K&R1 Sec. 5.11 p. 111
K&R2 Sec. 5.10 p. 115
ANSI Sec. 2.1.2.2.1
ISO Sec. 5.1.2.2.1
H&S Sec. 20.1 p. 416
Question 14.9
How do I test for IEEE NaN and other special values?
Many systems with high-quality IEEE floating-point implementations provide facilities (e.g. predefined
constants, and functions like isnan(), either as nonstandard extensions in <math.h> or perhaps in
<ieee.h> or <nan.h>) to deal with these values cleanly, and work is being done to formally
standardize such facilities. A crude but usually effective test for NaN is exemplified by
#define isnan(x) ((x) != (x))
although non-IEEE-aware compilers may optimize the test away.
Another possibility is to format the value in question using sprintf: on many systems it generates
strings like "NaN" and "Inf" which you could compare for in a pinch.
See also question 19.39.
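Both tests can be sketched as functions. The names my_isnan and looks_like_nan are invented here (chosen to avoid colliding with any isnan your system already provides), and the spellings the second function checks for are only what many systems happen to print, not anything guaranteed.

```c
#include <assert.h>
#include <stdio.h>

/* On IEEE systems, x != x is true only for NaN. */
int my_isnan(double x)
{
    volatile double v = x;   /* discourages the compiler from folding v != v */

    return v != v;
}

/* Alternative: look at how sprintf spells the value. */
int looks_like_nan(double x)
{
    char buf[400];           /* %f of a huge double can run past 300 chars */
    int n = sprintf(buf, "%f", x);

    assert(n > 0 && n < (int)sizeof buf);
    /* Many systems print "nan", "-nan", or "NaN"; this is not guaranteed. */
    return buf[0] == 'n' || buf[0] == 'N' ||
           (buf[0] == '-' && (buf[1] == 'n' || buf[1] == 'N'));
}
```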
Question 14.8
The pre-#defined constant M_PI seems to be missing from my machine's copy of <math.h>.
That constant (which is apparently supposed to be the value of pi, accurate to the machine's precision), is
not standard. If you need pi, you'll have to #define it yourself.
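One common way to do so is to supply the definition only when the header has not. This is a sketch; circle_area is just a made-up function to show the constant in use.

```c
#include <assert.h>
#include <math.h>

/* Provide M_PI ourselves only if <math.h> did not. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical example of use. */
double circle_area(double radius)
{
    assert(radius >= 0.0);
    return M_PI * radius * radius;
}
```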
References: PCS Sec. 13 p. 237
Question 14.7
Why doesn't C have an exponentiation operator?
Because few processors have an exponentiation instruction. C has a pow function, declared in
<math.h>, although explicit multiplication is often better for small positive integral exponents.
References: ANSI Sec. 4.5.5.1
ISO Sec. 7.5.5.1
H&S Sec. 17.6 p. 393
Question 13.26
I'm still getting errors due to library functions being undefined, even though I'm explicitly requesting the
right libraries while linking.
Many linkers make one pass over the list of object files and libraries you specify, and extract from
libraries only those modules which satisfy references which have so far come up as undefined. Therefore,
the order in which libraries are listed with respect to object files (and each other) is significant; usually,
you want to search the libraries last. (For example, under Unix, put any -l options towards the end of
the command line.) See also question 13.28.
Question 13.28
What does it mean when the linker says that _end is undefined?
That message is a quirk of the old Unix linkers. You get an error about _end being undefined only when
other things are undefined, too--fix the others, and the error about _end will disappear. (See also
questions 13.25 and 13.26.)
Library Functions
13.24 I'm trying to port this old program. Why do I get ``undefined external'' errors for some library
functions?
13.25 I get errors due to library functions being undefined even though I #include the right header
files.
13.26 I'm still getting errors due to library functions being undefined, even though I'm requesting the
right libraries.
13.28 What does it mean when the linker says that _end is undefined?
Question 13.14
How can I add n days to a date? How can I find the difference between two dates?
The ANSI/ISO Standard C mktime and difftime functions provide some support for both problems.
mktime accepts non-normalized dates, so it is straightforward to take a filled-in struct tm, add or
subtract from the tm_mday field, and call mktime to normalize the year, month, and day fields (and
incidentally convert to a time_t value). difftime computes the difference, in seconds, between two
time_t values; mktime can be used to compute time_t values for two dates to be subtracted.
These solutions are only guaranteed to work correctly for dates in the range which can be represented as
time_t's. The tm_mday field is an int, so day offsets of more than 32,736 or so may cause overflow.
Note also that at daylight saving time changeovers, local days are not 24 hours long.
Another approach to both problems is to use ``Julian day'' numbers. Implementations of Julian day
routines can be found in the file JULCAL10.ZIP from the Simtel/Oakland archives (see question 18.16)
and the ``Date conversions'' article mentioned in the References.
See also questions 13.13, 20.31, and 20.32.
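The mktime and difftime approach might be sketched as follows. The function names add_days and days_between are invented, error checking (mktime returning -1) is omitted, and the choice of noon is just one way of staying clear of daylight saving time changeovers.

```c
#include <assert.h>
#include <time.h>

/* Fills *out with the date n days after year/month/day (month is 1-12)
 * and returns the corresponding time_t. */
time_t add_days(int year, int month, int day, int n, struct tm *out)
{
    struct tm t = {0};
    time_t when;

    assert(out != NULL);
    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = day + n;      /* deliberately non-normalized */
    t.tm_hour = 12;           /* noon keeps clear of DST changeovers */
    t.tm_isdst = -1;          /* let mktime work out daylight saving */
    when = mktime(&t);        /* normalizes year, month, and day */
    *out = t;
    return when;
}

/* Whole days from the first date to the second. */
long days_between(int y1, int m1, int d1, int y2, int m2, int d2)
{
    struct tm a, b;
    double secs = difftime(add_days(y2, m2, d2, 0, &b),
                           add_days(y1, m1, d1, 0, &a));

    /* Round to the nearest day, since a local day may be 23 or 25 hours. */
    return (long)((secs + (secs < 0 ? -43200.0 : 43200.0)) / 86400.0);
}
```

For example, adding 30 days to January 15, 2000 leaves tm_mon equal to 1 (February) and tm_mday equal to 14 in the struct tm filled in by add_days.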
References: K&R2 Sec. B10 p. 256
ANSI Secs. 4.12.2.2,4.12.2.3
ISO Secs. 7.12.2.2,7.12.2.3
H&S Secs. 18.4,18.5 pp. 401-2
David Burki, ``Date Conversions''
Miscellaneous
20. Miscellaneous
20.1 How can I return multiple values from a function?
20.3 How do I access command-line arguments?
20.5 How can I write data files which can be read on other machines with different data formats?
20.6 How can I call a function, given its name as a string?
20.8 How can I implement sets or arrays of bits?
20.9 How can I determine whether a machine's byte order is big-endian or little-endian?
20.10 How can I convert integers to binary or hexadecimal?
20.11 Can I use base-2 constants (something like 0b101010)?
Is there a printf format for binary?
20.12 What is the most efficient way to count the number of bits which are set in a value?
20.13 How can I make my code more efficient?
20.14 Are pointers really faster than arrays? How much do function calls slow things down?
20.17 Is there a way to switch on strings?
20.18 Is there a way to have non-constant case labels (i.e. ranges or arbitrary expressions)?
20.19 Are the outer parentheses in return statements really optional?
20.20 Why don't C comments nest? Are they legal inside quoted strings?
20.24 Why doesn't C have nested functions?
20.25 How can I call FORTRAN (C++, BASIC, Pascal, Ada, LISP) functions from C?
Question 20.10
How can I convert integers to binary or hexadecimal?
Make sure you really know what you're asking. Integers are stored internally in binary, although for most
purposes it is not incorrect to think of them as being in octal, decimal, or hexadecimal, whichever is
convenient. The base in which a number is expressed matters only when that number is read in from or
written out to the outside world.
In source code, a non-decimal base is indicated by a leading 0 or 0x (for octal or hexadecimal,
respectively). During I/O, the base of a formatted number is controlled in the printf and scanf
family of functions by the choice of format specifier (%d, %o, %x, etc.) and in the strtol and
strtoul functions by the third argument. During binary I/O, however, the base again becomes
immaterial.
For more information about ``binary'' I/O, see question 2.11. See also questions 8.6 and 13.1.
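To make the point concrete: the number itself has no base, and only the text going in or out does. In the sketch below, to_binary and rebase_to_hex are made-up names. There is no standard printf format for binary, so that direction is done by hand, and the caller is assumed to supply large enough buffers.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes the binary digits of n into buf and returns buf.
 * buf needs room for sizeof(unsigned long) * CHAR_BIT + 1 chars. */
char *to_binary(unsigned long n, char *buf)
{
    char tmp[sizeof(unsigned long) * CHAR_BIT + 1];
    char *p = tmp + sizeof tmp - 1;

    assert(buf != NULL);
    *p = '\0';
    do {                                  /* build the digits right to left */
        *--p = (char)('0' + (n & 1));
        n >>= 1;
    } while (n != 0);
    return strcpy(buf, p);
}

/* Reads text in the given base and writes the value back out in hex. */
int rebase_to_hex(const char *text, int base, char *buf)
{
    unsigned long n = strtoul(text, NULL, base);   /* third argument = base */

    return sprintf(buf, "%lx", n);                 /* %lx chooses base 16 */
}
```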
References: ANSI Secs. 4.10.1.5,4.10.1.6
ISO Secs. 7.10.1.5,7.10.1.6
Question 2.11
How can I read/write structures from/to data files?
Question 8.6
How can I get the numeric (character set) value corresponding to a character, or vice versa?
In C, characters are represented by small integers corresponding to their values (in the machine's
character set), so you don't need a conversion routine: if you have the character, you have its value.
Question 8.3
If I can say
char a[] = "Hello, world!";
why can't I say
char a[14];
a = "Hello, world!";
Strings are arrays, and you can't assign arrays directly. Use strcpy instead:
strcpy(a, "Hello, world!");
See also questions 1.32, 4.2, and 7.2.
Question 4.2
I'm trying to declare a pointer and allocate some space for it, but it's not working. What's wrong with this
code?
char *p;
*p = malloc(10);
The pointer you declared is p, not *p. To make a pointer point somewhere, you just use the name of the
pointer:
p = malloc(10);
It's when you're manipulating the pointed-to memory that you use * as an indirection operator:
*p = 'H';
See also questions 1.21, 7.1, and 8.3.
References: CT&P Sec. 3.1 p. 28
Question 3.16
I have a complicated expression which I have to assign to one of two variables, depending on a
condition. Can I use code like this?
((condition) ? a : b) = complicated_expression;
No. The ?: operator, like most operators, yields a value, and you can't assign to a value. (In other words,
?: does not yield an lvalue.) If you really want to, you can try something like
*((condition) ? &a : &b) = complicated_expression;
although this is admittedly not as pretty.
References: ANSI Sec. 3.3.15 esp. footnote 50
ISO Sec. 6.3.15
H&S Sec. 7.1 pp. 179-180
Question 3.14
Why doesn't the code
int a = 1000, b = 1000;
long int c = a * b;
work?
Under C's integral promotion rules, the multiplication is carried out using int arithmetic, and the result
may overflow or be truncated before being promoted and assigned to the long int left-hand side. Use
an explicit cast to force long arithmetic:
long int c = (long int)a * b;
Note that (long int)(a * b) would not have the desired effect.
A similar problem can arise when two integers are divided, with the result assigned to a floating-point
variable.
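Both cases, wrapped up as functions for illustration (product and ratio are invented names, and this is only a sketch of the cast placement):

```c
#include <assert.h>

/* The cast applies to a; b is then converted to long to match, so the
 * multiplication itself is carried out in long arithmetic. */
long product(int a, int b)
{
    return (long)a * b;
}

/* The analogous fix for division: without the cast, 1 / 2 would be 0. */
double ratio(int num, int den)
{
    assert(den != 0);
    return (double)num / den;
}
```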
References: K&R1 Sec. 2.7 p. 41
K&R2 Sec. 2.7 p. 44
ANSI Sec. 3.2.1.5
ISO Sec. 6.2.1.5
H&S Sec. 6.3.4 p. 176
CT&P Sec. 3.9 pp. 49-50
Question 3.12
If I'm not using the value of the expression, should I use i++ or ++i to increment a variable?
Since the two forms differ only in the value yielded, they are entirely equivalent when only their side
effect is needed.
See also question 3.3.
References: K&R1 Sec. 2.8 p. 43
K&R2 Sec. 2.8 p. 47
ANSI Sec. 3.3.2.4, Sec. 3.3.3.1
ISO Sec. 6.3.2.4, Sec. 6.3.3.1
H&S Sec. 7.4.4 pp. 192-3, Sec. 7.5.8 pp. 199-200
Question 3.4
Can I use explicit parentheses to force the order of evaluation I want? Even if I don't, doesn't precedence
dictate it?
Not in general.
Operator precedence and explicit parentheses impose only a partial ordering on the evaluation of an
expression. In the expression
f() + g() * h()
although we know that the multiplication will happen before the addition, there is no telling which of the
three functions will be called first.
When you need to ensure the order of subexpression evaluation, you may need to use explicit temporary
variables and separate statements.
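Here is a sketch of the temporary-variable approach. The logging machinery (call_log, note, last_call_order) is invented purely so that the order of the calls can be observed; f, g, and h stand in for the three functions in the expression.

```c
#include <assert.h>
#include <string.h>

static char call_log[16];   /* records the order of calls, for illustration */

static int note(char name, int value)
{
    size_t n = strlen(call_log);

    assert(n + 1 < sizeof call_log);
    call_log[n] = name;
    call_log[n + 1] = '\0';
    return value;
}

int f(void) { return note('f', 1); }
int g(void) { return note('g', 2); }
int h(void) { return note('h', 3); }

/* Forces the order f, g, h by using separate statements. */
int ordered(void)
{
    int a, b, c;

    call_log[0] = '\0';
    a = f();
    b = g();
    c = h();
    return a + b * c;       /* precedence still governs the arithmetic */
}

const char *last_call_order(void)
{
    return call_log;
}
```

Each statement ends at a sequence point, so the calls happen in the order written, whereas f() + g() * h() in a single expression leaves the compiler free to call them in any order.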
References: K&R1 Sec. 2.12 p. 49, Sec. A.7 p. 185
K&R2 Sec. 2.12 pp. 52-3, Sec. A.7 p. 200
Question 3.5
But what about the && and || operators?
I see code like ``while((c = getchar()) != EOF && c != '\n')'' ...
There is a special exception for those operators (as well as the ?: operator): left-to-right evaluation is
guaranteed (as is an intermediate sequence point, see question 3.8). Any book on C should make this
clear.
References: K&R1 Sec. 2.6 p. 38, Secs. A7.11-12 pp. 190-1
K&R2 Sec. 2.6 p. 41, Secs. A7.14-15 pp. 207-8
ANSI Sec. 3.3.13, Sec. 3.3.14, Sec. 3.3.15
ISO Sec. 6.3.13, Sec. 6.3.14, Sec. 6.3.15
H&S Sec. 7.7 pp. 217-8, Sec. 7.8 pp. 218-20, Sec. 7.12.1 p. 229
CT&P Sec. 3.7 pp. 46-7
Question 4.3
Does *p++ increment p, or what it points to?
Unary operators like *, ++, and -- all associate (group) from right to left. Therefore, *p++ increments
p (and returns the value pointed to by p before the increment). To increment the value pointed to by p,
use (*p)++ (or perhaps ++*p, if the order of the side effect doesn't matter).
References: K&R1 Sec. 5.1 p. 91
K&R2 Sec. 5.1 p. 95
ANSI Sec. 3.3.2, Sec. 3.3.3
ISO Sec. 6.3.2, Sec. 6.3.3
H&S Sec. 7.4.4 pp. 192-3, Sec. 7.5 p. 193, Secs. 7.5.7,7.5.8 pp. 199-200
Question 4.5
I have a char * pointer that happens to point to some ints, and I want to step it over them. Why
doesn't
((int *)p)++;
work?
In C, a cast operator does not mean ``pretend these bits have a different type, and treat them
accordingly''; it is a conversion operator, and by definition it yields an rvalue, which cannot be assigned
to, or incremented with ++. (It is an anomaly in pcc-derived compilers, and an extension in gcc, that
expressions such as the above are ever accepted.) Say what you mean: use
p = (char *)((int *)p + 1);
or (since p is a char *) simply
p += sizeof(int);
Whenever possible, you should choose appropriate pointer types in the first place, instead of trying to
treat one type as another.
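If you really are stuck with a char * that points at ints, something along these lines would do. It is a sketch: sum_ints is an invented name, and the memcpy is my own precaution against a misaligned pointer rather than anything the answer above requires.

```c
#include <assert.h>
#include <string.h>

/* Sums `count` ints stored in a byte buffer, stepping a char * over them. */
long sum_ints(const char *p, int count)
{
    long total = 0;
    int value;

    assert(p != NULL && count >= 0);
    while (count-- > 0) {
        memcpy(&value, p, sizeof value);   /* safe even if p is misaligned */
        total += value;
        p += sizeof(int);                  /* step over one int */
    }
    return total;
}
```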
References: K&R2 Sec. A7.5 p. 205
ANSI Sec. 3.3.4 (esp. footnote 14)
ISO Sec. 6.3.4
Rationale Sec. 3.3.2.4
H&S Sec. 7.1 pp. 179-80
Pointers
4. Pointers
4.2 What's wrong with "char *p; *p = malloc(10);"?
4.3 Does *p++ increment p, or what it points to?
4.5 I want to use a char * pointer to step over some ints. Why doesn't "((int *)p)++;" work?
4.8 I have a function which accepts, and is supposed to initialize, a pointer, but the pointer in the caller
remains unchanged.
4.9 Can I use a void ** pointer to pass a generic pointer to a function by reference?
4.10 I have a function which accepts a pointer to an int. How can I pass a constant like 5 to it?
4.11 Does C even have ``pass by reference''?
4.12 I've seen different methods used for calling functions via pointers.
Question 8.2
I'm checking a string to see if it matches a particular value. Why isn't this code working?
char *string;
...
if(string == "value") {
/* string matches "value" */
...
}
Strings in C are represented as arrays of characters, and C never manipulates (assigns, compares, etc.)
arrays as a whole. The == operator in the code fragment above compares two pointers--the value of the
pointer variable string and a pointer to the string literal "value"--to see if they are equal, that is, if
they point to the same place. They probably don't, so the comparison never succeeds.
To compare two strings, you generally use the library function strcmp:
if(strcmp(string, "value") == 0) {
/* string matches "value" */
...
}
Question 8.1
Why doesn't
strcat(string, '!');
work?
There is a very real difference between characters and strings, and strcat concatenates strings.
Characters in C are represented by small integers corresponding to their character set values (see also
question 8.6). Strings are represented by arrays of characters; you usually manipulate a pointer to the first
character of the array. It is never correct to use one when the other is expected. To append a ! to a string,
use
strcat(string, "!");
See also questions 1.32, 7.2, and 16.6.
References: CT&P Sec. 1.5 pp. 9-10
Question 8.9
I think something's wrong with my compiler: I just noticed that sizeof('a') is 2, not 1 (i.e. not
sizeof(char)).
Question 7.8
I see code like
char *p = malloc(strlen(s) + 1);
strcpy(p, s);
Shouldn't that be malloc((strlen(s) + 1) * sizeof(char))?
It's never necessary to multiply by sizeof(char), since sizeof(char) is, by definition, exactly 1.
(On the other hand, multiplying by sizeof(char) doesn't hurt, and may help by introducing a
size_t into the expression.) See also question 8.9.
References: ANSI Sec. 3.3.3.4
ISO Sec. 6.3.3.4
H&S Sec. 7.5.2 p. 195
Question 7.7
Why does some code carefully cast the values returned by malloc to the pointer type being allocated?
Before ANSI/ISO Standard C introduced the void * generic pointer type, these casts were typically
required to silence warnings (and perhaps induce conversions) when assigning between incompatible
pointer types. (Under ANSI/ISO Standard C, these casts are no longer necessary.)
References: H&S Sec. 16.1 pp. 386-7
Question 7.6
Why am I getting ``warning: assignment of pointer from integer lacks a cast'' for calls to malloc?
Have you #included <stdlib.h>, or otherwise arranged for malloc to be declared properly?
References: H&S Sec. 4.7 p. 101
Question 7.14
I've heard that some operating systems don't actually allocate malloc'ed memory until the program tries
to use it. Is this legal?
It's hard to say. The Standard doesn't say that systems can act this way, but it doesn't explicitly say that
they can't, either.
References: ANSI Sec. 4.10.3
ISO Sec. 7.10.3
Question 7.16
I'm allocating a large array for some numeric work, using the line
double *array = malloc(256 * 256 * sizeof(double));
malloc isn't returning null, but the program is acting strangely, as if it's overwriting memory, or
malloc isn't allocating as much as I asked for, or something.
Notice that 256 x 256 is 65,536, which will not fit in a 16-bit int, even before you multiply it by
sizeof(double). If you need to allocate this much memory, you'll have to be careful. If size_t
(the type accepted by malloc) is a 32-bit type on your machine, but int is 16 bits, you might be able
to get away with writing 256 * (256 * sizeof(double)) (see question 3.14). Otherwise, you'll
have to break your data structure up into smaller chunks, or use a 32-bit machine, or use some
nonstandard memory allocation routines. See also question 19.23.
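One way of being careful is to do all of the size arithmetic in size_t and to check for overflow before calling malloc. The function below is a sketch under that assumption; alloc_matrix is a made-up name, and returning NULL for a zero-sized request is simply a choice made here.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates rows * cols elements of elem_size bytes each, or returns NULL
 * if the total byte count would not fit in a size_t. */
void *alloc_matrix(size_t rows, size_t cols, size_t elem_size)
{
    size_t max = (size_t)-1;             /* largest value a size_t can hold */

    assert(elem_size > 0);
    if (rows == 0 || cols == 0)
        return NULL;
    if (cols > max / elem_size || rows > max / (cols * elem_size))
        return NULL;                     /* the product would overflow */
    return malloc(rows * cols * elem_size);
}
```

Because the parameters are size_t, a call like alloc_matrix(256, 256, sizeof(double)) never multiplies two ints, which is where the original line went wrong on a 16-bit machine.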
Question 19.22
How can I find out how much memory is available?
Your operating system may provide a routine which returns this information, but it's quite system-dependent.
Question 19.20
How can I read a directory in a C program?
See if you can use the opendir and readdir routines, which are part of the POSIX standard and are
available on most Unix variants. Implementations also exist for MS-DOS, VMS, and other systems. (MS-DOS also has FINDFIRST and FINDNEXT routines which do essentially the same thing.) readdir
only returns file names; if you need more information about the file, try calling stat. To match
filenames to some wildcard pattern, see question 13.7.
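On a system which has the POSIX routines, a directory-reading loop looks roughly like this. It is a sketch only: count_entries is an invented name, and skipping "." and ".." is a choice made for the example.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Counts the entries in a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dp;
    struct dirent *ent;
    int n = 0;

    assert(path != NULL);
    dp = opendir(path);
    if (dp == NULL)
        return -1;
    while ((ent = readdir(dp)) != NULL) {
        /* ent->d_name is only the file name; call stat for anything more */
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            n++;
    }
    closedir(dp);
    return n;
}
```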
References: K&R2 Sec. 8.6 pp. 179-184
PCS Sec. 13 pp. 230-1
POSIX Sec. 5.1
Schumacher, ed., Software Solutions in C Sec. 8
Question 19.18
I'm getting an error, ``Too many open files''. How can I increase the allowable number of simultaneously
open files?
There are actually at least two resource limitations on the number of simultaneously open files: the
number of low-level ``file descriptors'' or ``file handles'' available in the operating system, and the
number of FILE structures available in the stdio library. Both must be sufficient. Under MS-DOS
systems, you can control the number of operating system file handles with a line in CONFIG.SYS. Some
compilers come with instructions (and perhaps a source file or two) for increasing the number of stdio
FILE structures.
Question 7.17
I've got 8 meg of memory in my PC. Why can I only seem to malloc 640K or so?
Under the segmented architecture of PC compatibles, it can be difficult to use more than 640K with any
degree of transparency. See also question 19.23.
Question 7.20
You can't use dynamically-allocated memory after you free it, can you?
No. Some early documentation for malloc stated that the contents of freed memory were ``left
undisturbed,'' but this ill-advised guarantee was never universal and is not required by the C Standard.
Few programmers would use the contents of freed memory deliberately, but it is easy to do so
accidentally. Consider the following (correct) code for freeing a singly-linked list:
struct list *listp, *nextp;
for(listp = base; listp != NULL; listp = nextp) {
    nextp = listp->next;
    free((void *)listp);
}
and notice what would happen if the more-obvious loop iteration expression listp = listp->next
were used, without the temporary nextp pointer.
References: K&R2 Sec. 7.8.5 p. 167
ANSI Sec. 4.10.3
ISO Sec. 7.10.3
Rationale Sec. 4.10.3.2
H&S Sec. 16.2 p. 387
CT&P Sec. 7.10 p. 95
Question 7.32
What is alloca and why is its use discouraged?
alloca allocates memory which is automatically freed when the function which called alloca
returns. That is, memory allocated with alloca is local to a particular function's ``stack frame'' or
context.
alloca cannot be written portably, and is difficult to implement on machines without a conventional
stack. Its use is problematical (and the obvious implementation on a stack-based machine fails) when its
return value is passed directly to another function, as in fgets(alloca(100), 100, stdin).
For these reasons, alloca is not Standard and cannot be used in programs which must be widely
portable, no matter how useful it might be.
See also question 7.22.
References: Rationale Sec. 4.10.3
Question 16.6
Why does this code:
char *p = "hello, world!";
p[0] = 'H';
crash?
String literals are not necessarily modifiable, except (in effect) when they are used as array initializers.
Try
char a[] = "hello, world!";
See also question 1.32.
References: ANSI Sec. 3.1.4
ISO Sec. 6.1.4
H&S Sec. 2.7.4 pp. 31-2
Question 16.5
This program runs perfectly on one machine, but I get weird results on another. Stranger still, adding or
removing debugging printouts changes the symptoms...
Lots of things could be going wrong; here are a few of the more common things to check:
Proper use of function prototypes can catch several of these problems; lint would catch several more.
See also questions 16.3, 16.4, and 18.4.
Question 16.3
This program crashes before it even runs! (When single-stepping with a debugger, it dies before the first
statement in main.)
You probably have one or more very large (kilobyte or more) local arrays. Many systems have fixed-size
stacks, and those which perform dynamic stack allocation automatically (e.g. Unix) can be confused
when the stack tries to grow by a huge chunk all at once. It is often better to declare large arrays with
static duration (unless of course you need a fresh set with each recursive call, in which case you could
dynamically allocate them with malloc; see also question 1.31).
(See also questions 11.12, 16.4, 16.5, and 18.4.)
Question 11.12
Can I declare main as void, to shut off these annoying ``main returns no value'' messages?
No. main must be declared as returning an int, and as taking either zero or two arguments, of the
appropriate types. If you're calling exit() but still getting warnings, you may have to insert a
redundant return statement (or use some kind of ``not reached'' directive, if available).
Declaring a function as void does not merely shut off or rearrange warnings: it may also result in a
different function call/return sequence, incompatible with what the caller (in main's case, the C run-time
startup code) expects.
(Note that this discussion of main pertains only to ``hosted'' implementations; none of it applies to
``freestanding'' implementations, which may not even have main. However, freestanding
implementations are comparatively rare, and if you're using one, you probably know it. If you've never
heard of the distinction, you're probably using a hosted implementation, and the above rules apply.)
References: ANSI Sec. 2.1.2.2.1, Sec. F.5.1
ISO Sec. 5.1.2.2.1, Sec. G.5.1
H&S Sec. 20.1 p. 416
CT&P Sec. 3.10 pp. 50-51
Question 11.10
Why can't I pass a char ** to a function which expects a const char **?
You can use a pointer-to-T (for any type T) where a pointer-to-const-T is expected. However, the rule
(an explicit exception) which permits slight mismatches in qualified pointer types is not applied
recursively, but only at the top level.
You must use explicit casts (e.g. (const char **) in this case) when assigning (or passing) pointers
which have qualifier mismatches at other than the first level of indirection.
References: ANSI Sec. 3.1.2.6, Sec. 3.3.16.1, Sec. 3.5.3
ISO Sec. 6.1.2.6, Sec. 6.3.16.1, Sec. 6.5.3
H&S Sec. 7.9.1 pp. 221-2
Question 11.13
But what about main's third argument, envp?
It's a non-standard (though common) extension. If you really need to access the environment in ways
beyond what the standard getenv function provides, though, the global variable environ is probably a
better avenue (though it's equally non-standard).
References: ANSI Sec. F.5.1
ISO Sec. G.5.1
H&S Sec. 20.1 pp. 416-7
Question 11.14
I believe that declaring void main() can't fail, since I'm calling exit instead of returning, and
anyway my operating system ignores a program's exit/return status.
It doesn't matter whether main returns or not, or whether anyone looks at the status; the problem is that
when main is misdeclared, its caller (the runtime startup code) may not even be able to call it correctly
(due to the potential clash of calling conventions; see question 11.12). Your operating system may ignore
the exit status, and void main() may work for you, but it is not portable and not correct.
Question 11.15
The book I've been using, C Programing for the Compleat Idiot, always uses void main().
Perhaps its author counts himself among the target audience. Many books unaccountably use void
main() in examples. They're wrong.
Question 11.16
Is exit(status) truly equivalent to returning the same status from main?
Yes and no. The Standard says that they are equivalent. However, a few older, nonconforming systems
may have problems with one or the other form. Also, a return from main cannot be expected to work
if data local to main might be needed during cleanup; see also question 16.4. (Finally, the two forms are
obviously not equivalent in a recursive call to main.)
References: K&R2 Sec. 7.6 pp. 163-4
ANSI Sec. 2.1.2.2.3
ISO Sec. 5.1.2.2.3
Question 16.4
I have a program that seems to run correctly, but it crashes as it's exiting, after the last statement in
main(). What could be causing this?
Look for a misdeclared main() (see questions 2.18 and 10.9), or local buffers passed to setbuf or
setvbuf, or problems in cleanup functions registered by atexit. See also questions 7.5 and 11.16.
References: CT&P Sec. 5.3 pp. 72-3
Question 18.4
I just typed in this program, and it's acting strangely. Can you see anything wrong with it?
See if you can run lint first (perhaps with the -a, -c, -h, -p or other options). Many C compilers are
really only half-compilers, electing not to diagnose numerous source code difficulties which would not
actively preclude code generation.
See also questions 16.5 and 16.8.
References: Ian Darwin, Checking C Programs with lint
Question 18.5
How can I shut off the ``warning: possible pointer alignment problem'' message which lint gives me
for each call to malloc?
The problem is that traditional versions of lint do not know, and cannot be told, that malloc ``returns
a pointer to space suitably aligned for storage of any type of object.'' It is possible to provide a
pseudoimplementation of malloc, using a #define inside of #ifdef lint, which effectively shuts
this warning off, but a simpleminded definition will also suppress meaningful messages about truly
incorrect invocations. It may be easier simply to ignore the message, perhaps in an automated way with
grep -v. (But don't get in the habit of ignoring too many lint messages, otherwise one day you'll
overlook a significant one.)
Question 18.7
Where can I get an ANSI-compatible lint?
Products called PC-Lint and FlexeLint (in ``shrouded source form,'' for compilation on 'most any system)
are available from
Gimpel Software
3207 Hogarth Lane
Collegeville, PA 19426
(+1) 610 584 4261
gimpel@netaxs.com
USA
The Unix System V release 4 lint is ANSI-compatible, and is available separately (bundled with other
C tools) from UNIX Support Labs or from System V resellers.
Another ANSI-compatible lint (which can also perform higher-level formal verification) is LCLint,
available via anonymous ftp from larch.lcs.mit.edu in pub/Larch/lclint/.
In the absence of lint, many modern compilers do attempt to diagnose almost as many problems as
lint does.
Question 18.8
Don't ANSI function prototypes render lint obsolete?
Not really. First of all, prototypes work only if they are present and correct; an inadvertently incorrect
prototype is worse than useless. Secondly, lint checks consistency across multiple source files, and
checks data declarations as well as functions. Finally, an independent program like lint will probably
always be more scrupulous at enforcing compatible, portable coding practices than will any particular,
implementation-specific, feature- and extension-laden compiler.
If you do want to use function prototypes instead of lint for cross-file consistency checking, make sure
that you set the prototypes up correctly in header files. See questions 1.7 and 10.6.
Question 11.17
I'm trying to use the ANSI ``stringizing'' preprocessing operator `#' to insert the value of a symbolic
constant into a message, but it keeps stringizing the macro's name rather than its value.
You can use something like the following two-step procedure to force a macro to be expanded as well as
stringized:
#define Str(x) #x
#define Xstr(x) Str(x)
#define OP plus
char *opname = Xstr(OP);
This code sets opname to "plus" rather than "OP".
An equivalent circumlocution is necessary with the token-pasting operator ## when the values (rather
than the names) of two macros are to be concatenated.
References: ANSI Sec. 3.8.3.2, Sec. 3.8.3.5 example
ISO Sec. 6.8.3.2, Sec. 6.8.3.5
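The same two-step idea applied to ## might look like the sketch below. The macro and identifier names (Cat, Xcat, PREFIX, SUFFIX) are invented for the illustration.

```c
#define Cat(a, b)  a##b          /* pastes its arguments exactly as written */
#define Xcat(a, b) Cat(a, b)     /* extra level: arguments are expanded first */

#define PREFIX net
#define SUFFIX _count

/* Xcat expands PREFIX and SUFFIX before pasting: declares net_count */
int Xcat(PREFIX, SUFFIX) = 3;

/* Cat pastes the macro names themselves: declares PREFIXSUFFIX */
int Cat(PREFIX, SUFFIX) = 7;
```

The direct use of Cat pastes the names rather than the values, which is the same surprise described above for the # operator.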
Question 11.18
What does the message ``warning: macro replacement within a string literal'' mean?
Question 11.22
Is char a[3] = "abc"; legal? What does it mean?
It is legal in ANSI C (and perhaps in a few pre-ANSI systems), though useful only in rare circumstances.
It declares an array of size three, initialized with the three characters 'a', 'b', and 'c', without the
usual terminating '\0' character. The array is therefore not a true C string and cannot be used with
strcpy, printf %s, etc.
Most of the time, you should let the compiler count the initializers when initializing arrays (in the case of
the initializer "abc", of course, the computed size will be 4).
References: ANSI Sec. 3.5.7
ISO Sec. 6.5.7
H&S Sec. 4.6.4 p. 98
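A small sketch of the difference, using made-up array names; since the three-element array has no terminating '\0', it is compared here with memcmp and an explicit length rather than with the string functions:

```c
#include <string.h>

char exact[3]  = "abc";   /* 'a', 'b', 'c' only: no room for '\0' */
char counted[] = "abc";   /* compiler counts 4 elements, including '\0' */

/*
 * Compare the two arrays over the three letters they share.
 * An explicit length is required because exact is not a string.
 */
int same_letters(void)
{
    return memcmp(exact, counted, sizeof exact) == 0;
}
```

Here sizeof exact is 3 and sizeof counted is 4.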
Question 11.24
Why can't I perform arithmetic on a void * pointer?
The compiler doesn't know the size of the pointed-to objects. Before performing arithmetic, convert the
pointer either to char * or to the pointer type you're trying to manipulate (but see also question 4.5).
References: ANSI Sec. 3.1.2.5, Sec. 3.3.6
ISO Sec. 6.1.2.5, Sec. 6.3.6
H&S Sec. 7.6.2 p. 204
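For instance, a generic routine that is handed a void * and an element size can do its arithmetic on char *. This is only a sketch, and the function names are hypothetical:

```c
#include <stddef.h>

/* Return a pointer nbytes past base; the arithmetic is done on char *. */
void *byte_offset(void *base, size_t nbytes)
{
    return (char *)base + nbytes;
}

/* Return the address of element `index` in an array of elemsize-byte objects. */
void *element_at(void *array, size_t elemsize, size_t index)
{
    return (char *)array + elemsize * index;
}
```

Given int v[4], the call element_at(v, sizeof v[0], 2) yields the same address as &v[2].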
Question 11.25
What's the difference between memcpy and memmove?
memmove offers guaranteed behavior if the source and destination arguments overlap. memcpy makes
no such guarantee, and may therefore be more efficiently implementable. When in doubt, it's safer to use
memmove.
References: K&R2 Sec. B3 p. 250
ANSI Sec. 4.11.2.1, Sec. 4.11.2.2
ISO Sec. 7.11.2.1, Sec. 7.11.2.2
Rationale Sec. 4.11.2
H&S Sec. 14.3 pp. 341-2
PCS Sec. 11 pp. 165-6
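A typical overlapping case is sliding data along within a single buffer. The following sketch (the function name is invented) removes the first character of a string in place, which is safe with memmove but not guaranteed to work with memcpy:

```c
#include <string.h>

/*
 * Delete the first character of the string s by sliding the rest
 * (including the terminating '\0') down by one position.
 * Source and destination overlap, so memmove is used.
 */
void drop_first_char(char *s)
{
    if (*s != '\0')
        memmove(s, s + 1, strlen(s));   /* strlen(s) bytes: the tail plus '\0' */
}
```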
Question 11.26
What should malloc(0) do? Return a null pointer or a pointer to 0 bytes?
The ANSI/ISO Standard says that it may do either; the behavior is implementation-defined (see question
11.33).
References: ANSI Sec. 4.10.3
ISO Sec. 7.10.3
PCS Sec. 16.1 p. 386
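Code which must not care which choice an implementation made can avoid asking for zero bytes in the first place. One possible sketch follows; the wrapper name is hypothetical, and this is merely one common workaround, not something the Standard prescribes:

```c
#include <stdlib.h>

/*
 * Allocate at least one byte, so that a null return always means
 * failure and never "zero bytes were requested".
 */
void *alloc_at_least_one(size_t nbytes)
{
    return malloc(nbytes != 0 ? nbytes : 1);
}
```

The result is released with free as usual.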
Question 11.27
Why does the ANSI Standard not guarantee more than six case-insensitive characters of external
identifier significance?
The problem is older linkers which are under the control of neither the ANSI/ISO Standard nor the C
compiler developers on the systems which have them. The limitation is only that identifiers be significant
in the first six characters, not that they be restricted to six characters in length. This limitation is
annoying, but certainly not unbearable, and is marked in the Standard as ``obsolescent,'' i.e. a future
revision will likely relax it.
This concession to current, restrictive linkers really had to be made, no matter how vehemently some
people oppose it. (The Rationale notes that its retention was ``most painful.'') If you disagree, or have
thought of a trick by which a compiler burdened with a restrictive linker could present the C programmer
with the appearance of more significance in external identifiers, read the excellently-worded section 3.1.2
in the X3.159 Rationale (see question 11.1), which discusses several such schemes and explains why
they could not be mandated.
References: ANSI Sec. 3.1.2, Sec. 3.9.1
ISO Sec. 6.1.2, Sec. 6.9.1
Rationale Sec. 3.1.2
H&S Sec. 2.5 pp. 22-3
Question 18.15a
Does anyone have a C compiler test suite I can use?
Plum Hall (formerly in Cardiff, NJ; now in Hawaii) sells one; another package is Ronald Guilmette's
RoadTest(tm) Compiler Test Suites (ftp to netcom.com, pub/rfg/roadtest/announce.txt for information).
The FSF's GNU C (gcc) distribution includes a c-torture-test which checks a number of common
problems with compilers. Kahan's paranoia test, found in netlib/paranoia on netlib.att.com, strenuously
tests a C implementation's floating point capabilities.
Strange Problems
Question 9.1
What is the right type to use for Boolean values in C? Why isn't it a standard type? Should I use
#defines or enums for the true and false values?
C does not provide a standard Boolean type, in part because picking one involves a space/time tradeoff
which can best be decided by the programmer. (Using an int may be faster, while using char may
save data space. Smaller types may make the generated code bigger or slower, though, if they require lots
of conversions to and from int.)
The choice between #defines and enumeration constants for the true/false values is arbitrary and not
terribly interesting (see also questions 2.22 and 17.10). Use any of
#define TRUE 1
#define FALSE 0
#define YES 1
#define NO 0
enum bool {false, true};
or use raw 1 and 0, as long as you are consistent within one program
or project. (An enumeration may be preferable if your debugger shows
the names of enumeration constants when examining variables.)
Some people prefer variants like
#define TRUE (1==1)
#define FALSE (!TRUE)
or define ``helper'' macros such as
#define Istrue(e) ((e) != 0)
These don't buy anything (see question 9.2; see also questions 5.12 and 10.2).
Question 5.12
I use the preprocessor macro
#define Nullptr(type) (type *)0
to help me build null pointers of the correct type.
This trick, though popular and superficially attractive, does not buy much. It is not needed in assignments
and comparisons; see question 5.2. It does not even save keystrokes. Its use may suggest to the reader
that the program's author is shaky on the subject of null pointers, requiring that the #definition of the
macro, its invocations, and all other pointer usages be checked. See also questions 9.1 and 10.2.
Question 10.2
Here are some cute preprocessor macros:
#define begin {
#define end }
17. Style
17.1 What's the best style for code layout in C?
17.3 Is the code "if(!strcmp(s1, s2))" good style?
17.4 Why do some people write if(0 == x) instead of if(x == 0)?
17.5 I came across some code that puts a (void) cast before each call to printf. Why?
17.8 What is ``Hungarian Notation''? Is it worthwhile?
17.9 Where can I get the ``Indian Hill Style Guide'' and other coding standards?
17.10 Some people say that goto's are evil and that I should never use them. Isn't that a bit extreme?
Question 9.3
Is if(p), where p is a pointer, a valid conditional?
Question 10.3
How can I write a generic macro to swap two values?
There is no good answer to this question. If the values are integers, a well-known trick using exclusive-OR could perhaps be used, but it will not work for floating-point values or pointers, or if the two values
are the same variable (and the ``obvious'' supercompressed implementation for integral types
a^=b^=a^=b is illegal due to multiple side-effects; see question 3.2). If the macro is intended to be
used on values of arbitrary type (the usual goal), it cannot use a temporary, since it does not know what
type of temporary it needs (and would have a hard time naming it if it did), and standard C does not
provide a typeof operator.
The best all-around solution is probably to forget about using a macro, unless you're willing to pass in the
type as a third argument.
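If passing the type is acceptable, a macro along the following lines is one possibility. It is only a sketch: the name swap_tmp_ is arbitrary, each argument is evaluated twice, and the type must be one for which ``type name'' is a valid declaration.

```c
/*
 * Swap a and b, which must both be modifiable lvalues of the given type.
 * The do/while(0) wrapper lets the macro be used like a single statement.
 */
#define SWAP(type, a, b) do { \
        type swap_tmp_ = (a); \
        (a) = (b); \
        (b) = swap_tmp_; \
    } while (0)
```

It would be invoked as SWAP(int, x, y); or SWAP(double, d1, d2);.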
Question 10.4
What's the best way to write a multi-statement macro?
The usual goal is to write a macro that can be invoked as if it were a statement consisting of a single
function call. This means that the ``caller'' will be supplying the final semicolon, so the macro body
should not. The macro body cannot therefore be a simple brace-enclosed compound statement, because
syntax errors would result if it were invoked (apparently as a single statement, but with a resultant extra
semicolon) as the if branch of an if/else statement with an explicit else clause.
The traditional solution, therefore, is to use
#define MACRO(arg1, arg2) do { \
    /* declarations */ \
    stmt1; \
    stmt2; \
    /* ... */ \
    } while(0) /* (no trailing ; ) */
When the caller appends a semicolon, this expansion becomes a single statement regardless of context.
(An optimizing compiler will remove any ``dead'' tests or branches on the constant condition 0, although
lint may complain.)
If all of the statements in the intended macro are simple expressions, with no declarations or loops,
another technique is to write a single, parenthesized expression using one or more comma operators. (For
an example, see the first DEBUG() macro in question 10.26.) This technique also allows a value to be
``returned.''
References: H&S Sec. 3.3.2 p. 45
CT&P Sec. 6.3 pp. 82-3
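To make the if/else point concrete, here is a sketch using an invented REPORT macro. Because the expansion plus the caller's semicolon forms exactly one statement, the unbraced if/else below parses as intended.

```c
#include <stdio.h>

/* Hypothetical macro: print a message and bump a counter, as one statement. */
#define REPORT(msg, counter) do { \
        fprintf(stderr, "error: %s\n", (msg)); \
        (counter)++; \
    } while (0)   /* (no trailing ; ) */

/* Return the number of problems found with value (0 or 1). */
int check_range(int value, int lo, int hi)
{
    int errors = 0;

    if (value < lo)
        REPORT("value too small", errors);
    else if (value > hi)
        REPORT("value too large", errors);

    return errors;
}
```

Had REPORT been defined as a bare brace-enclosed block, the semicolon after the first invocation would have terminated the if statement and left the else dangling.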
10. C Preprocessor
10.2 I've got some cute preprocessor macros that let me write C code that looks more like Pascal. What
do y'all think?
10.3 How can I write a generic macro to swap two values?
10.4 What's the best way to write a multi-statement macro?
10.6 What are .h files and what should I put in them?
10.7 Is it acceptable for one header file to #include another?
10.8 Where are header (``#include'') files searched for?
10.9 I'm getting strange syntax errors on the very first declaration in a file, but it looks fine.
10.11 Where can I get a copy of a missing header file?
10.12 How can I construct preprocessor #if expressions which compare strings?
10.13 Does the sizeof operator work in preprocessor #if directives?
10.14 Can I use an #ifdef in a #define line, to define something two different ways?
10.15 Is there anything like an #ifdef for typedefs?
10.16 How can I use a preprocessor #if expression to detect endianness?
10.18 How can I preprocess some code to remove selected conditional compilations, without
preprocessing everything?
10.19 How can I list all of the pre#defined identifiers?
10.20 I have some old code that tries to construct identifiers with a macro like "#define Paste(a,
b) a/**/b", but it doesn't work any more.
10.22 What does the message ``warning: macro replacement within a string literal'' mean?
10.23 How can I use a macro argument inside a string literal in the macro expansion?
10.25 I've got this tricky preprocessing I want to do and I can't figure out a way to do it.
10.26 How can I write a macro which takes a variable number of arguments?
Question 10.13
Does the sizeof operator work in preprocessor #if directives?
No. Preprocessing happens during an earlier phase of compilation, before type names have been parsed.
Instead of sizeof, consider using the predefined constants in ANSI's <limits.h>, if applicable, or
perhaps a ``configure'' script. (Better yet, try to write code which is inherently insensitive to type sizes.)
References: ANSI Sec. 2.1.1.2, Sec. 3.8.1 footnote 83
ISO Sec. 5.1.1.2, Sec. 6.8.1
H&S Sec. 7.11.1 p. 225
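As an illustration of the <limits.h> approach, the sketch below picks a type with at least 32 bits by testing INT_MAX rather than sizeof(int). The typedef name wide_int is invented for the example.

```c
#include <limits.h>

/* The preprocessor can compare INT_MAX, though it cannot evaluate sizeof. */
#if INT_MAX >= 2147483647
typedef int wide_int;     /* int already has at least 32 bits here */
#else
typedef long wide_int;    /* long is guaranteed to have at least 32 bits */
#endif
```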
Question 10.14
Can I use an #ifdef in a #define line, to define something two different ways?
No. You can't ``run the preprocessor on itself,'' so to speak. What you can do is use one of two
completely separate #define lines, depending on the #ifdef setting.
References: ANSI Sec. 3.8.3, Sec. 3.8.3.4
ISO Sec. 6.8.3, Sec. 6.8.3.4
H&S Sec. 3.2 pp. 40-1
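In other words, the conditional goes around the definitions rather than inside one. A sketch, with hypothetical macro names:

```c
/* USE_LONG_COUNTERS is an invented configuration macro for this example. */
#ifdef USE_LONG_COUNTERS
#define COUNTER_MAX 2147483647L
#else
#define COUNTER_MAX 32767
#endif
```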
Question 10.15
Is there anything like an #ifdef for typedefs?
Question 10.19
How can I list all of the pre#defined identifiers?
There's no standard way, although it is a common need. If the compiler documentation is unhelpful, the
most expedient way is probably to extract printable strings from the compiler or preprocessor executable
with something like the Unix strings utility. Beware that many traditional system-specific
pre#defined identifiers (e.g. ``unix'') are non-Standard (because they clash with the user's
namespace) and are being removed or renamed.
Question 10.20
I have some old code that tries to construct identifiers with a macro like
#define Paste(a, b) a/**/b
but it doesn't work any more.
It was an undocumented feature of some early preprocessor implementations (notably John Reiser's) that
comments disappeared entirely and could therefore be used for token pasting. ANSI affirms (as did
K&R1) that comments are replaced with white space. However, since the need for pasting tokens was
demonstrated and real, ANSI introduced a well-defined token-pasting operator, ##, which can be used
like this:
#define Paste(a, b) a##b
See also question 11.17.
References: ANSI Sec. 3.8.3.3
ISO Sec. 6.8.3.3
Rationale Sec. 3.8.3.3
H&S Sec. 3.3.9 p. 52
Question 10.22
Why is the macro
#define TRACE(n) printf("TRACE: %d\n", n)
giving me the warning ``macro replacement within a string literal''? It seems to be expanding
TRACE(count);
as
printf("TRACE: %d\count", count);
Question 10.23
How can I use a macro argument inside a string literal in the macro expansion?
Question 10.25
I've got this tricky preprocessing I want to do and I can't figure out a way to do it.
C's preprocessor is not intended as a general-purpose tool. (Note also that it is not guaranteed to be
available as a separate program.) Rather than forcing it to do something inappropriate, consider writing
your own little special-purpose preprocessing tool, instead. You can easily get a utility like make(1) to
run it for you automatically.
If you are trying to preprocess something other than C, consider using a general-purpose preprocessor.
(One older one available on most Unix systems is m4.)
Question 5.13
This is strange. NULL is guaranteed to be 0, but the null pointer is not?
When the term ``null'' or ``NULL'' is casually used, one of several things may be meant:
1. The conceptual null pointer, the abstract language concept defined in question 5.1. It is
implemented with...
2. The internal (or run-time) representation of a null pointer, which may or may not be all-bits-0
and which may be different for different pointer types. The actual values should be of concern
only to compiler writers. Authors of C programs never see them, since they use...
3. The null pointer constant, which is a constant integer 0 (see question 5.2). It is often hidden
behind...
4. The NULL macro, which is #defined to be 0 or ((void *)0) (see question 5.4). Finally,
as red herrings, we have...
5. The ASCII null character (NUL), which does have all bits zero, but has no necessary relation to
the null pointer except in name; and...
6. The ``null string,'' which is another name for the empty string (""). Using the term ``null
string'' can be confusing in C, because an empty string involves a null ('\0') character, but not a
null pointer, which brings us full circle...
This article uses the phrase ``null pointer'' (in lower case) for sense 1, the character ``0'' or the phrase
``null pointer constant'' for sense 3, and the capitalized word ``NULL'' for sense 4.
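The difference between a null pointer and the ``null string'' (senses 1 and 6) can be seen in a pair of small test functions. The names are invented for this sketch.

```c
#include <stddef.h>

/* True if s is a null pointer: it points to no string at all. */
int is_null_pointer(const char *s)
{
    return s == NULL;
}

/* True if s points to an empty string: a real array whose first char is '\0'. */
int is_empty_string(const char *s)
{
    return s != NULL && s[0] == '\0';
}
```

The empty string "" is not a null pointer, and a null pointer is not an empty string; the second function takes care not to dereference a null pointer while checking.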
Question 5.14
Why is there so much confusion surrounding null pointers? Why do these questions come up so often?
C programmers traditionally like to know more than they need to about the underlying machine
implementation. The fact that null pointers are represented both in source code, and internally to most
machines, as zero invites unwarranted assumptions. The use of a preprocessor macro (NULL) may seem
to suggest that the value could change some day, or on some weird machine. The construct ``if(p ==
0)'' is easily misread as calling for conversion of p to an integral type, rather than 0 to a pointer type,
before the comparison. Finally, the distinction between the several uses of the term ``null'' (listed in
question 5.13) is often overlooked.
One good way to wade out of the confusion is to imagine that C used a keyword (perhaps nil, like
Pascal) as a null pointer constant. The compiler could either turn nil into the correct type of null pointer
when it could determine the type from the source code, or complain when it could not. Now in fact, in C
the keyword for a null pointer constant is not nil but 0, which works almost as well, except that an
uncast 0 in a non-pointer context generates an integer zero instead of an error message, and if that uncast
0 was supposed to be a null pointer constant, the code may not work.
Question 5.15
I'm confused. I just can't understand all this null pointer stuff.
Question 20.11
Can I use base-2 constants (something like 0b101010)?
Is there a printf format for binary?
No, on both counts. You can convert base-2 string representations to integers with strtol.
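A sketch of both directions follows: strtol with a base of 2 for input, and a small hand-written routine for output. The function names are made up, and the formatting routine is just one straightforward way to do it.

```c
#include <limits.h>
#include <stdlib.h>

/* Convert a string of binary digits such as "101010" to a long. */
long parse_binary(const char *digits)
{
    return strtol(digits, NULL, 2);
}

/*
 * Write value in base 2, without leading zeros, into buf (size bytes).
 * Returns buf, or NULL if buf is too small.
 */
char *format_binary(unsigned long value, char *buf, size_t size)
{
    char tmp[sizeof(unsigned long) * CHAR_BIT];
    size_t n = 0, i;

    /* Collect digits least-significant first. */
    do {
        tmp[n++] = (char)('0' + (value & 1));
        value >>= 1;
    } while (value != 0);

    if (n + 1 > size)
        return NULL;

    /* Copy them out in reverse, most-significant first. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

The resulting string can then be printed with an ordinary %s.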
Question 20.12
What is the most efficient way to count the number of bits which are set in a value?
Many ``bit-fiddling'' problems like this one can be sped up and streamlined using lookup tables (but see
question 20.13).
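A lookup-table bit counter might be sketched as follows. The 16-entry table (one entry per 4-bit value) and the function name are choices made for this example; a 256-entry table indexed by whole bytes is a common alternative.

```c
/* Number of 1 bits in each possible 4-bit value, 0 through 15. */
static const unsigned char bits_in_nibble[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Count the bits set in value, four bits at a time. */
unsigned count_bits(unsigned long value)
{
    unsigned count = 0;

    while (value != 0) {
        count += bits_in_nibble[value & 0xF];
        value >>= 4;
    }
    return count;
}
```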
Question 20.13
How can I make my code more efficient?
Efficiency, though a favorite comp.lang.c topic, is not important nearly as often as people tend to think it
is. Most of the code in most programs is not time-critical. When code is not time-critical, it is far more
important that it be written clearly and portably than that it be written maximally efficiently. (Remember
that computers are very, very fast, and that even ``inefficient'' code can run without apparent delay.)
It is notoriously difficult to predict what the ``hot spots'' in a program will be. When efficiency is a
concern, it is important to use profiling software to determine which parts of the program deserve
attention. Often, actual computation time is swamped by peripheral tasks such as I/O and memory
allocation, which can be sped up by using buffering and caching techniques.
Even for code that is time-critical, it is not as important to ``microoptimize'' the coding details. Many of
the ``efficient coding tricks'' which are frequently suggested (e.g. substituting shift operators for
multiplication by powers of two) are performed automatically by even simpleminded compilers.
Heavyhanded optimization attempts can make code so bulky that performance is actually degraded, and
are rarely portable (i.e. they may speed things up on one machine but slow them down on another). In
any case, tweaking the coding usually results in at best linear performance improvements; the big payoffs
are in better algorithms.
For more discussion of efficiency tradeoffs, as well as good advice on how to improve efficiency when it
is important, see chapter 7 of Kernighan and Plauger's The Elements of Programming Style, and Jon
Bentley's Writing Efficient Programs.
Question 20.14
Are pointers really faster than arrays? How much do function calls slow things down? Is ++i faster than
i = i + 1?
Precise answers to these and many similar questions depend of course on the processor and compiler in
use. If you simply must know, you'll have to time test programs carefully. (Often the differences are so
slight that hundreds of thousands of iterations are required even to see them. Check the compiler's
assembly language output, if available, to see if two purported alternatives aren't compiled identically.)
It is ``usually'' faster to march through large arrays with pointers rather than array subscripts, but for
some processors the reverse is true.
Function calls, though obviously incrementally slower than in-line code, contribute so much to
modularity and code clarity that there is rarely good reason to avoid them.
Before rearranging expressions such as i = i + 1, remember that you are dealing with a compiler,
not a keystroke-programmable calculator. Any decent compiler will generate identical code for ++i, i
+= 1, and i = i + 1. The reasons for using ++i or i += 1 over i = i + 1 have to do with
style, not efficiency. (See also question 3.12.)
Question 20.24
Why doesn't C have nested functions?
It's not trivial to implement nested functions such that they have the proper access to local variables in
the containing function(s), so they were deliberately left out of C as a simplification. (gcc does allow
them, as an extension.) For many potential uses of nested functions (e.g. qsort comparison functions),
an adequate if slightly cumbersome solution is to use an adjacent function with static declaration,
communicating if necessary via a few static variables. (A cleaner solution when such functions must
communicate is to pass around a pointer to a structure containing the necessary context.)
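For the qsort case, the ``adjacent static function plus a static variable'' arrangement might look like the sketch below. The names are hypothetical, and the sort direction stands in for whatever local context a nested function would have wanted to see.

```c
#include <stdlib.h>

/* Context shared between sort_ints and its comparison function. */
static int sort_descending;

/* Comparison function for qsort; it consults the static variable above. */
static int compare_ints(const void *ap, const void *bp)
{
    int a = *(const int *)ap;
    int b = *(const int *)bp;
    int order = (a > b) - (a < b);    /* -1, 0, or 1, with no overflow */

    return sort_descending ? -order : order;
}

/* Sort count ints in ascending order, or descending if requested. */
void sort_ints(int *values, size_t count, int descending)
{
    sort_descending = descending;
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Since the context lives in a static variable, this arrangement is not reentrant, which is part of what makes it cumbersome.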
Question 20.25
How can I call FORTRAN (C++, BASIC, Pascal, Ada, LISP) functions from C? (And vice versa?)
The answer is entirely dependent on the machine and the specific calling sequences of the various
compilers in use, and may not be possible at all. Read your compiler documentation very carefully;
sometimes there is a ``mixed-language programming guide,'' although the techniques for passing
arguments and ensuring correct run-time startup are often arcane. More information may be found in
FORT.gz by Glenn Geers, available via anonymous ftp from suphys.physics.su.oz.au in the src directory.
cfortran.h, a C header file, simplifies C/FORTRAN interfacing on many popular machines. It is available
via anonymous ftp from zebra.desy.de (131.169.2.244).
In C++, a "C" modifier in an external function declaration indicates that the function is to be called
using C calling conventions.
References: H&S Sec. 4.9.8 pp. 106-7
Question 20.26
Does anyone know of a program for converting Pascal or FORTRAN (or LISP, Ada, awk, ``Old'' C, ...)
to C?
This FAQ list's maintainer also has available a list of a few other commercial translation products, and
some for more obscure languages.
See also questions 11.31 and 18.16.
Question 20.27
Is C++ a superset of C? Can I use a C++ compiler to compile C code?
C++ was derived from C, and is largely based on it, but there are some legal C constructs which are not
legal C++. Conversely, ANSI C inherited several features from C++, including prototypes and const,
so neither language is really a subset or superset of the other. In spite of the differences, many C
programs will compile correctly in a C++ environment, and many recent compilers offer both C and C++
compilation modes.
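As an illustration only, here is a small function that is legal C but would be rejected by a C++ compiler. The function name make_counter is hypothetical; the two incompatibilities it shows (C++ keywords used as identifiers, and the implicit conversion from void *) are well-known ones.

```c
#include <assert.h>
#include <stdlib.h>

/* Legal C, but not legal C++:
   - `class` and `new` are ordinary identifiers in C, keywords in C++;
   - C converts the void * returned by malloc implicitly, C++ requires a cast. */
int *make_counter(int class)
{
    int *new = malloc(sizeof *new);

    assert(new != NULL);   /* sketch only: real code should handle failure */
    *new = class;
    return new;
}
```

The caller owns the returned pointer and should eventually pass it to free.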
References: H&S p. xviii, Sec. 1.1.5 p. 6, Sec. 2.8 pp. 36-7, Sec. 4.9 pp. 104-107
Question 20.28
I need a sort of an ``approximate'' strcmp routine, for comparing two strings for close, but not necessarily
exact, equality.
Some nice information and algorithms having to do with approximate string matching, as well as a useful
bibliography, can be found in Sun Wu and Udi Manber's paper ``AGREP--A Fast Approximate Pattern-Matching Tool.''
Another approach involves the ``soundex'' algorithm, which maps similar-sounding words to the same
codes. Soundex was designed for discovering similar-sounding names (for telephone directory
assistance, as it happens), but it can be pressed into service for processing arbitrary words.
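The following is a sketch of one common formulation of soundex (first letter kept, three digits, h and w not separating letters with the same code). Published variants differ in small details, so treat the rules encoded here as an assumption rather than a definitive version. The function names are hypothetical, and the table lookup assumes a character set in which the letters a to z are contiguous, as in ASCII.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Soundex digit for a lowercase letter, or '0' for letters that are
   not coded (vowels, h, w, y) and for anything outside a..z. */
static char soundex_digit(int c)
{
    /*                           abcdefghijklmnopqrstuvwxyz */
    static const char codes[] = "01230120022455012623010202";

    if (c < 'a' || c > 'z')
        return '0';
    return codes[c - 'a'];
}

/* Write the 4-character code for word, plus a terminating '\0', into out. */
void soundex(const char *word, char out[5])
{
    size_t n = 0;
    char prev = '0';

    assert(word != NULL && out != NULL);
    for (; *word != '\0' && n < 4; word++) {
        unsigned char ch = (unsigned char)*word;
        char digit;
        int lower;

        if (!isalpha(ch))
            continue;                     /* skip punctuation, digits, spaces */
        lower = tolower(ch);
        digit = soundex_digit(lower);
        if (n == 0)
            out[n++] = (char)toupper(ch); /* first letter is kept as is */
        else if (digit != '0' && digit != prev)
            out[n++] = digit;             /* new code, not a repeat */
        if (lower != 'h' && lower != 'w')
            prev = digit;                 /* h and w do not break a run */
    }
    while (n < 4)
        out[n++] = '0';                   /* pad short codes with zeros */
    out[4] = '\0';
}
```

Two words can then be treated as approximately equal when their codes match; with this sketch, "Robert" and "Rupert" both yield R163.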
References: Knuth Sec. 6 pp. 391-2 Volume 3
Wu and Manber, ``AGREP--A Fast Approximate Pattern-Matching Tool''
Question 20.29
What is hashing?
Hashing is the process of mapping strings to integers, usually in a relatively small range. A ``hash
function'' maps a string (or some other data structure) to a bounded number (the ``hash bucket'') which
can more easily be used as an index in an array, or for performing repeated comparisons. (Obviously, a
mapping from a potentially huge set of strings to a small set of integers will not be unique. Any
algorithm using hashing therefore has to deal with the possibility of ``collisions.'') Many hashing
functions and related algorithms have been developed; a full treatment is beyond the scope of this list.
References: K&R2 Sec. 6.6
Knuth Sec. 6.4 pp. 506-549 Volume 3
Sedgewick Sec. 16 pp. 231-244
Question 20.38
Where does the name ``C'' come from, anyway?
C was derived from Ken Thompson's experimental language B, which was inspired by Martin Richards's
BCPL (Basic Combined Programming Language), which was a simplification of CPL (Cambridge
Programming Language). For a while, there was speculation that C's successor might be named P (the
third letter in BCPL) instead of D, but of course the most visible descendant language today is C++.
Question 20.39
How do you pronounce ``char''?
You can pronounce the C keyword ``char'' in at least three ways: like the English words ``char,'' ``care,''
or ``car;'' the choice is arbitrary.
Question 13.15
I need a random number generator.
The Standard C library has one: rand. The implementation on your system may not be perfect, but
writing a better one isn't necessarily easy, either.
If you do find yourself needing to implement your own random number generator, there is plenty of
literature out there; see the References. There are also any number of packages on the net: look for r250,
RANLIB, and FSULTRA (see question 18.16).
References: K&R2 Sec. 2.7 p. 46, Sec. 7.8.7 p. 168
ANSI Sec. 4.10.2.1
ISO Sec. 7.10.2.1
H&S Sec. 17.7 p. 393
PCS Sec. 11 p. 172
Knuth Vol. 2 Chap. 3 pp. 1-177
Park and Miller, ``Random Number Generators: Good Ones are hard to Find''
Question 13.16
How can I get random integers in a certain range?
The obvious way,
rand() % N /* POOR */
(which tries to return numbers from 0 to N-1) is poor, because the low-order bits of many random
number generators are distressingly non-random. (See question 13.18.) A better method is something like
(int)((double)rand() / ((double)RAND_MAX + 1) * N)
If you're worried about using floating point, you could use
rand() / (RAND_MAX / N + 1)
Both methods obviously require knowing RAND_MAX (which ANSI #defines in <stdlib.h>), and
assume that N is much less than RAND_MAX.
(Note, by the way, that RAND_MAX is a constant telling you what the fixed range of the C library rand
function is. You cannot set RAND_MAX to some other value, and there is no way of requesting that rand
return numbers in some other range.)
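The two better expressions above can be wrapped in small helper functions, sketched here. The function names are hypothetical, and the special case for n equal to 1 is an addition of this sketch: it avoids computing RAND_MAX / 1 + 1, which overflows on systems where RAND_MAX equals INT_MAX.

```c
#include <assert.h>
#include <stdlib.h>

/* Floating-point method: scales rand() into 0..n-1 using the whole value,
   so the result depends mostly on the high-order bits. */
int random_below_fp(int n)
{
    assert(n > 0);
    return (int)((double)rand() / ((double)RAND_MAX + 1) * n);
}

/* Integer-only method: divides the range of rand() into n nearly equal parts. */
int random_below_int(int n)
{
    assert(n > 0 && n <= RAND_MAX);
    if (n == 1)
        return 0;   /* only one possible result; also sidesteps the overflow */
    return rand() / (RAND_MAX / n + 1);
}
```

For example, random_below_int(6) + 1 simulates the roll of a die, and random_below_int(2) gives a true/false value without relying on the low-order bit.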
If you're starting with a random number generator which returns floating-point values between 0 and 1
(and never exactly 1), all you have to do to get integers from 0 to N-1 is multiply the output of that
generator by N and truncate the result to an integer.
References: K&R2 Sec. 7.8.7 p. 168
PCS Sec. 11 p. 172
Question 13.18
I need a random true/false value, so I'm just taking rand() % 2, but it's alternating 0, 1, 0, 1, 0...
Poor pseudorandom number generators (such as the ones unfortunately supplied with some systems) are
not very random in the low-order bits. Try using the higher-order bits: see question 13.16.
References: Knuth Sec. 3.2.1.1 pp. 12-14
Question 13.17
Each time I run my program, I get the same sequence of numbers back from rand().
You can call srand to seed the pseudo-random number generator with a truly random initial value.
Popular seed values are the time of day, or the elapsed time before the user presses a key (although
keypress times are hard to determine portably; see question 19.37). (Note also that it's rarely useful to
call srand more than once during a run of a program; in particular, don't try calling srand before each
call to rand, in an attempt to get ``really random'' numbers.)
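A minimal sketch of seeding from the time of day, exactly once per run, might look like this. The function name seed_rand_once and its return convention are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Seed rand() from the clock the first time this is called.
   Returns 1 if this call did the seeding, 0 if it had already been done. */
int seed_rand_once(void)
{
    static int seeded = 0;
    time_t now;

    if (seeded)
        return 0;
    now = time(NULL);
    assert(now != (time_t)-1);   /* time() reports failure as (time_t)-1 */
    srand((unsigned int)now);
    seeded = 1;
    return 1;
}
```

Calling seed_rand_once near the top of main is enough; later calls are harmless and do not disturb the sequence.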
References: K&R2 Sec. 7.8.7 p. 168
ANSI Sec. 4.10.2.2
ISO Sec. 7.10.2.2
H&S Sec. 17.7 p. 393
Question 13.20
How can I generate random numbers with a normal or Gaussian distribution?
Question 13.24
I'm trying to port this old program. Why do I get ``undefined external'' errors for some library functions?
Some old or semistandard functions have been renamed or replaced over the years;
if you need:    you should instead:
index           use strchr.
rindex          use strrchr.
bcopy           use memmove, after interchanging the first and second arguments (see also question 11.25).
bcmp            use memcmp.
bzero           use memset, with a second argument of 0.
Contrariwise, if you're using an older system which is missing the functions in the second column, you
may be able to implement them in terms of, or substitute, the functions in the first.
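The substitutions can be sketched as thin wrappers. The compat_ prefix is a hypothetical naming choice, used here so that the wrappers cannot clash with versions a system may still declare.

```c
#include <assert.h>
#include <string.h>

/* bcopy takes (source, destination, length); memmove takes the
   destination first, so the first two arguments are interchanged. */
void compat_bcopy(const void *src, void *dst, size_t n)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    memmove(dst, src, n);
}

/* bzero is memset with a second argument of 0. */
void compat_bzero(void *p, size_t n)
{
    memset(p, 0, n);
}

/* bcmp returns zero when the regions match, as memcmp does. */
int compat_bcmp(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

/* index and rindex correspond directly to strchr and strrchr. */
char *compat_index(const char *s, int c)
{
    return strchr(s, c);
}

char *compat_rindex(const char *s, int c)
{
    return strrchr(s, c);
}
```

memmove is used rather than memcpy because it behaves correctly when the source and destination regions overlap.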
References: PCS Sec. 11
Question 12.34
Once I've used freopen, how can I get the original stdout (or stdin) back?
There isn't a good way. If you need to switch back, the best solution is not to have used freopen in the
first place. Try using your own explicit output (or input) stream variable, which you can reassign at will,
while leaving the original stdout (or stdin) undisturbed.
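One way to arrange this is sketched below, with all output routed through a stream variable that the program owns. The names out_stream, set_output, current_output, and report are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The program's own output stream. NULL means "use stdout"; stdout itself
   cannot be used as a file-scope initializer because it need not be a
   constant expression. */
static FILE *out_stream = NULL;

/* Redirect subsequent output to fp, or pass NULL to go back to stdout. */
void set_output(FILE *fp)
{
    out_stream = fp;
}

FILE *current_output(void)
{
    return out_stream != NULL ? out_stream : stdout;
}

/* All of the program's output goes through the stream variable. */
void report(const char *msg)
{
    assert(msg != NULL);
    fprintf(current_output(), "%s\n", msg);
}
```

A caller might open a log file with fopen, pass it to set_output, and later call set_output(NULL) to resume writing to the untouched stdout.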
Question 12.33
How can I redirect stdin or stdout to a file from within a program?
undefined behavior
[This article was originally posted on July 28, 1996, in the midst of a thread on precedence and order of evaluation. I have
edited the text slightly for this web page.]
Newsgroups: comp.lang.c
From: scs@eskimo.com (Steve Summit)
Subject: Re: precedence vs. order of evaluation
Message-ID: <Dv9v5H.EGu@eskimo.com>
X-scs-References: <199607252021.NAA15633@mail.eskimo.com>
References: <01bb798a$11913f80$87ee6fce@timpent.airshields.com>
<1996Jul25.1157.scs.0001@eskimo.com> <kluev-2607962112500001@kluev.macsimum.gamma.ru>
Date: Sun, 28 Jul 1996 21:17:40 GMT
In article <kluev-2607962112500001@kluev.macsimum.gamma.ru>, kluev@macsimum.gamma.ru (Michael Kluev) writes:
>
>
>
>
I always have a hard time with questions like these. Sometimes, they hardly make sense: after all, the language is what it is,
and we simply have to...
> Sure, I know the answer: "this is C standard" or "this is how
> language is defined", but I'm not looking for such an answer.
Oh. Then I guess I'm not allowed to use that answer, then.
> What I am looking for is the answer of the following question:
> "Why the language was defined such a way".
The usual answer is that it's to avoid unnecessarily constraining the compiler's ability to...
> If your answer is a form of: "This way compilers could do the better
> job of optimising expressions", then remember, that compiler must
> optimise not only expressions, but statements also.
Oh. So I guess I'm not allowed to use that answer, either. (Now it seems as if you're really constraining me!) So I'm afraid the
only truly factual answer I'm left with is "I don't know."
I posted a long article a year or two ago exploring some reasons why C might leave certain aspects of evaluation order
undefined. I'd re-post it now, but it would take me too long to find it in my magtape archives, and anyway the oxide is starting
to flake off (I'm not sure if it's the fault of the drive or the tapes), and I'm thinking that the next time I spin those tapes should
really be to transfer them to some new media, which I haven't selected yet. [I did eventually recover a copy.]
Instead, I'll explore a couple of different reasons. Bear in mind that these are only my own speculations, so they absolutely will
not answer your question. If you simply must know why C is defined the way it is on this point, you'll have to ask Dennis
Ritchie. The speculations I'll give you are some of the ones which might now lead me to design a language the same way, but I
freely admit that my thinking on language design has been very heavily influenced by C, which I somehow find pretty
congenial and easy to get along with.
A language specification is a huge set of tradeoffs. No, don't worry, I won't talk about the forbidden tradeoff between the
programmer's freedom of expression and the compiler's license to optimize. The tradeoff I'm invoking now is one of
documentation: how much time and how many words can we afford to spend defining the language? As in so many other
areas, the law of diminishing returns sets in here, too. We have to ask ourselves whether an attempt to nail down some aspect
of the language more tightly is worth it in terms of the number of programs (or programmers) that absolutely need the extra
precision.
A language specification (like any detailed specification of a complex system) is also staggeringly difficult to write. The harder
you try, the more questions you end up inviting from devious folk who take your previous round of ever-more-detailed
specifications as a grand puzzle, the challenge being to find some loophole or ambiguous case or fascinating question which
remains unanswerable in the system as constructed.
Therefore, the standard makes certain simplifications. It makes these not just to make compilers easier to write, not just to
make the standard easier to write, but to make it easier for the rest of us to read, to wrap our brains around it and understand it.
The more exceptions it contains, or overly complex explanations of overly complex devices inserted just to placate the
nitpickers and puzzlemongers, the more likely it is to be unreadable by mere mortals, or unimplementable by mere mortals, and
so ultimately to fail.
A largely forgotten aspect of the "Unix philosophy," and an aspect which like many others is equally responsible for the design
of C as Unix, and an aspect which is responsible both for the success of the operating system and the language and for the
heaping truckloads of acrimonious criticism which both incessantly receive, is that neither system was ever intended to satisfy
everybody. The designers were shooting for about a 90% solution, and were unapologetically willing to call that "good
enough." They knew, if they tried to satisfy everybody, that the first 90% of the requirements would take 90% of their time,
and that 9% of the remaining requirements would take the other 90% of their time, and that 0.9% of the remaining
requirements would take another 90% of their time, and so on [footnote 1]. Discretion being the better part of valor, they
decided -- with remarkable restraint, which I for one could never manage -- to nip that infinite regression in the bud. And in
one of X3J11's admirable successes at preserving the "spirit of C," that minimalist attitude largely pervades the C standard, as
well.
Returning to the subject of expression evaluation, the simplification of interest to us here is, in a nutshell, that how you specify
the order that things happen in is with statements, and how you compute values where the order doesn't matter much is with
expressions. If you care about the order, you need separate statements. (There were always a few exceptions to this simplistic
rule, of course; ANSI added a few more). The modern, more precise statement is that if you need to be sure that side effects
have taken effect, you need to have a sequence point, but to keep everybody's life simple, there are still relatively few defined
sequence points. If you have a complex expression with complicated sequencing requirements, you may simply have to break
it up into several statements, and maybe use a temporary variable. That was true in K&R (I've already quoted the relevant
sentences from section 2.12), and it's true today. It's a simple rule to state, it's a simple rule to implement, and at least 90% of
expressions (and in fact far more, probably closer to 99.9%) can in fact be written as single, simple expressions, because they
are simple and don't have complicated sequencing requirements.
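As a small illustration of breaking an expression up, consider a function with a side effect that is called twice. Written as next_value() - next_value(), the order of the two calls would be unspecified; with separate statements and temporary variables it is fixed. The names below are hypothetical.

```c
#include <assert.h>

static int counter = 0;   /* hypothetical shared state */

/* A function with a side effect: each call returns the next integer. */
int next_value(void)
{
    return ++counter;
}

/* Separate statements pin down which call happens first. */
int gap_between_next_two(void)
{
    int first = next_value();    /* guaranteed to happen first */
    int second = next_value();   /* guaranteed to happen second */

    assert(second == first + 1);
    return second - first;
}
```

The single-expression version could legitimately yield either 1 or -1, depending on the compiler.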
I honestly believe that the existing rules are good enough, and that the excruciating discussions which we have about the issue
here from time to time tend to overstate its importance. I probably break an expression up into two statements to keep its
sequence correct about twice a year [footnote 2]. The vast majority of the time, you don't have to worry about the order of
multiple side effects (most of the time, because there aren't multiple side effects), and when you do, I claim (though I realize
how patronizing this sounds to the people I most wish would think about it) that the expression is probably too complicated
and that it should probably be simplified if for no other reason than so that people could understand it, and incidentally so
that it would be well-defined to the compiler.
In closing, let me offer another way of thinking about order of evaluation, which I came up with a day or two ago and refined
during a brief e-mail exchange with James Robinson. As I mentioned, compilers tend to build parse trees, and precedence has
some influence on the shape of parse trees which has some influence on the order of evaluation. What's a good way of thinking
about the ways in which the shape of a parse tree does and doesn't necessarily affect order of evaluation?
If it weren't for side effects -- assignment operators, ++, --, function calls, and (for those of you writing device drivers)
fetches from volatile locations -- [footnote 3], the entire meaning of a parse tree would be the computation of a single value.
Each node computes a new value from the values of its children, so values percolate up the tree. We have to begin evaluation
at the bottom, of course, and we can therefore say that there's some ordering of the evaluation from bottom to top, but we don't
care about the relative ordering of parallel branches of the tree; in fact for all we care they could be evaluated in parallel. The
reason we don't care is precisely that (for the moment) we are thinking about pure expression evaluation, without side effects.
If, as I claim here, the primary purpose of a parse tree is to generate the single value that pops out of the top of it, then a good
way to think about an expression (which is the basis for a parse tree) is that its primary purpose is to generate a single value,
too. We should think about side effects as in some sense secondary, because it turns out that the Standard (and, hence, the
compiler) accords them a good deal less respect, at least with respect to their scheduling. We should imagine that the compiler
goes about the business of evaluating an expression by working its way through the parse tree, applying no more ordering
constraints than the blatantly obvious one that you can't (in general) evaluate a node until you've evaluated its children. We
should imagine that, whenever the compiler encounters a node within the tree mandating a side effect, which would require it
to store some value in some location, it makes a little note to itself: "I must remember to write that value to that location,
sometime", and then goes back to its true love, which is evaluating away on that parse tree. Finally, we should imagine that
when the compiler reaches a point which the Standard labels as a sequence point, the compiler says to itself: "Oh, well, I guess
I can't play all day, now I'll have to get down to business and attack that `to do' list."
Of course, I'm not saying that you have to think about the situation in this way. But if you're looking for a model which will let
you think about expression evaluation in a way that matches the Standard's, I think this is a pretty good one. Even though we
only evaluate expressions for their side effects (that is, the statement
i + 1;
does nothing), the right way to think about expression evaluation is that we are, after all, evaluating an expression, or figuring
out what its value is. Its one value, singular. The only ordering dependencies are those which must apply in order to ensure that
we compute the correct value. If there are any intermediate values that we care about, because we expect them to be stashed
away via side effects, we must not care what order they occur in. Therefore, if there are multiple side effects, all of them had
better write to different locations. Also (again because we can't be sure when they'll happen) none of them had better write to
locations which we might later try to read from within the same expression.
This may have seemed like a roundabout set of explanations, and I'm sure it's still unsatisfying to those who insist on knowing
why C is as it is. To summarize the arguments I've tried to present here, C specifies expression evaluation as it does because it's
simple and good enough for most purposes while still allowing (taboo answer alert!) for decently optimized code generation,
and it does not provide more guarantees about expression evaluation because few expressions would need them.
Steve Summit
scs@eskimo.com
Footnote 1. Melanie has dubbed this "Zeno's 90/90 rule."
Footnote 2. The number of times I've had to break expressions up into separate statements decreased by about 2/3 when I
realized that the compiler does not have (and has probably never had) license to rearrange
int i, isc, fac1, fac2;
...
isc = (long)i * fac1 / fac2;
undefined behavior
[This article was originally posted on March 21, 1995, in the midst of a thread on undefined behavior. I have edited
the text slightly for this web page.]
Newsgroups: comp.lang.c
From: scs@eskimo.com (Steve Summit)
Subject: Re: Quick C test - whats the correct answer???
Message-ID: <D5swGz.4Cv@eskimo.com>
Summary: why is undefined behavior undefined?
References: <fjm.58.000B63C1@ti.com> <D5JLGy.A63@tigadmin.ml.com>
Date: Tue, 21 Mar 1995 17:26:59 GMT
In article <D5JLGy.A63@tigadmin.ml.com>, Jim Frohnhofer (jimf@nottingham.ml.com) writes:
> In article 000B63C1@ti.com, fjm@ti.com (Fred J. McCall) writes:
>> That's right, and that's why you should listen to him (and to me, and to
>> everyone else telling you this). The REASON it is undefined is BECAUSE THE
>> STANDARD SAYS SO.
>
> I may regret this, but I'll jump in anyway. I accept that the behaviour
> of such a construct is undefined simply because the Standard says so. But
> isn't it legitimate for me to ask why the Standard leaves it undefined?
It is, but you should probably be careful how you ask it. If you're not careful (you were careful, but most people
aren't), it ends up sounding like you don't believe that something is undefined, or that you believe -- and your
compiler happens to agree -- that it has a sensible meaning. But as long as we're very clear in our heads that we're
asking the abstract, intellectual question "Why are these things undefined?", and not anything at all practical like
"How will these things behave in practice?", we may proceed.
The first few answers I give won't be the ones you're looking for, but towards the end I may present one which you'll
find more satisfying.
First of all, you might want to take a look at who's saying what. I'll grant that statements such as Fred's above can be
annoying. I agree completely that it's usually very good to know the reasons behind things. But if the people who
have been posting to comp.lang.c the longest, and who have been programming in C for the longest and with the
most success, keep saying "it's undefined, it's undefined, quit worrying about why, just don't do it, it's undefined",
they might have a good reason, even if they don't -- or can't -- say what it is, and that might be a good enough reason
for you, too.
I've been programming in C for, oh, 15 years now. For at least 10 of those years, I've been regarded as somewhat of
an expert. For going on 5 of those years, I've been maintaining the comp.lang.c FAQ list. I am someone who is
usually incapable of learning abstract facts unless I understand the reasons behind them. When I used to study for
math tests, I wouldn't memorize formulas, I'd just make sure that I could rederive them if I needed them. Yet for
most of the 15 years I've been programming in C, I simply could not have told you why i=i++ is undefined. It's an
ugly expression; it's meaningless to me; I don't know what it does; I don't want to know what it does; I'd never write
it; I don't understand why anyone would write it; it's a mystery to me why anyone cares what it does; if I ever
encountered it in code I was maintaining, I'd rewrite it. When I was learning C from K&R1 in 1980 or whenever it
was, one of their nice little sentences, which they only say once, and which if you miss you're sunk, leaped up off
the page and wrapped itself around my brain and has never let go:
Naturally, it is necessary to know what things to avoid, but if you don't know how they are done on
various machines, that innocence may help to protect you.
As it happens, it's possible to read K&R too carefully here: the discussion at the end of chapter 2 about a[i] =
i++ suggests that it's implementation-defined, and the word "unspecified" appears in K&R2, while ANSI says that
the behavior is undefined. The sentence I've quoted above suggests not knowing how things are done on various
machines, while in fact what we really want to know is that maybe they can't be done on various machines.
Nevertheless, the message -- that a bit of innocence may do you good -- is in this case a good one.
That's my first answer. I realize that it's wholly unsatisfying to anyone who's still interested in this question. On to
answer number two.
> As far as I know, it was created by a committee not brought down by Moses
> from the mountain top. If I want to become a better C programmer, won't
> it help me to know why the committee acted as it did.
Perhaps, but again, only if you're very careful.
Let's say you're wondering why i++ * i++ is undefined. Someone explains that it's because no one felt like
defining which order the side effects happen in. That's a nice answer: it's a bit incomplete and perhaps a bit
incorrect, but it's certainly easy to understand, and since you insisted on an answer you could understand (as
opposed to something inscrutable like "it's just undefined; don't do it"), it's the kind of answer you're likely to get.
So next week, you find yourself writing something like
i = 7;
printf("%d\n", i++ * i++);
and you say to yourself, "Whoah, that might be undefined. How did that explanation go again? It's undefined...
because nobody felt like saying... which order the side effects happened in. So either the first i gets incremented
first, and it's 7 times 8 is 56, or the second i gets incremented first, and it's 8 times 7 is 56. Hey! It may be
undefined, but I get a predictable answer, so I can use the code after all!"
So in this case, knowing a reason why has not made you a better programmer, it has made you a worse programmer,
because some day when you're not around to defend it (or when you are around, but you don't have time to debug it),
that code is going to print 49, or maybe 42, or maybe "Floating point exception -- core dumped".
If, on the other hand, you didn't know why it was undefined, just that it was undefined, you would have instead
written
printf("%d\n", i * (i+1));
i += 2;
(or whatever it was that you meant), and your code would have been portable, robust, and well-defined.
person could be expected to use in 6,000 years might be relegated to the dustbin, instead. (Naturally, since we're
being precise, we'll have to precisely define the boundaries of the parts we've decided to be imprecise about; the
paraphrase above of Vroomfondel's demand from The Hitchhiker's Guide to the Galaxy is not facetious.)
In practice, you can only tell what a computer program has done when it causes a side effect. In a nutshell, then,
what a Standard for a programming language does is to define the mapping between the programs you write and the
side effects they produce. Consequently, we are very concerned with side effects: how they're expressed in our
programs, and how the Standard defines them to behave.
The fragment
int i;
i = 7;
printf("%d\n", i);
contains two obvious side effects, and it is blatantly obvious what their effect should be, and what the Standard says
agrees with what you think they should do.
The fragment
int a[10] = {0}, i = 1;
i = i++;
a[i] = i++;
printf("%d %d %d %d\n", i, a[1], a[2], a[3]);
contains a few more side effects, but (I assert) it is not obvious how they should behave. What does the second line
do? In the third line, do we decide which cell of a[] we assign to before or after we increment i again? Plenty of
people can probably come up with plenty of opinions of how this fragment ought to work, but they probably won't
all agree with each other, as the situation is not nearly so clear-cut.
Furthermore, a Standard obviously cannot talk about individual program fragments like i = i++ and a[i] =
i++, because there are an infinite number of those. It must make general prescriptions about how the various
primitive elements of the language behave, which we then use in combination to determine how real programs
behave. If you think you know what a[i] = i++ should do, you can't just say that; instead you have to come up
with a general statement which says how all expressions which modify and then use the same object should act, and
you have to convince yourself that this rule is appropriate not only for the a[i] = i++ case you thought about but
also for all the other expressions (infinitely many of them, remember) that you did not think about, and you have to
convince a bunch of other people of your conviction.
The people writing the C Standard decided that they could not do that. Instead, they decided that expressions such as
a[i] = i++ and i++ * i++ and i = i++ would be undefined. They decided this because these expressions
are too hard to define and too stupid to be worth defining. They came up with some definitions (no small feat in
itself) of which expressions this undefinedness applies to. You've seen these definitions; they're the ones that say
Between the previous and next sequence point an object shall have its stored value modified at most
once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to
contents of the incremented pointer". When we use words like "first" and "and then", we're talking about
order of evaluation, but order of evaluation can be a very slippery thing to talk about. Here, by talking
about it too soon, we'd confuse ourselves into getting the wrong answer, because as we'll see, although in
*p++ we do in fact take "the contents of the pointer as operated on by the postfix ++ operator", by the
definition of postfix ++, the pointer value we'll take the contents of is the prior, unincremented value.
So let's look at *p++ very carefully. Once more, since ++ has higher precedence, we're (a) applying ++
to p, and (b) applying * to the result. In other words, the correct interpretation is as if we had written
*(p++)
So we're taking the contents of the subexpression p++. Now, what is the value of the subexpression
p++? The definition of postfix ++ is that it increments a variable, but yields as its value the prior,
unincremented value. Remember, when you say
int i = 5;
int j = i++;
you end up with the value 5 in j, even though i ends up as 6. It's the same with pointer values: when you
evaluate *p++, you end up with the value that p used to point to, even though p ends up pointing one
beyond it. Try it -- compile and run this little program:
#include <stdio.h>
int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int main()
{
int *p = &a[5];
int i = *p++;
printf("%d %d\n", i, *p);
return 0;
}
The program prints "5 6" -- i receives 5, the value p started out pointing to, but p ends up pointing at 6.
Now, what if you don't want the value pointed to by the old value of p, but the new one? That is, what if
you want to increment the pointer, and then take the value pointed to? This is now an order of evaluation
question, and it turns out that one of the first things to think of whenever you have an order of evaluation
question is that there's a good chance that the answer does not involve precedence.
We want an expression like *p++, in which a pointer is incremented and its pointed-to value is fetched,
except that it's the new, incremented value that's fetched. One way, of course, would be to do it as two
separate statements: first evaluate p = p + 1, then apply *p. But if we want to be Real C
Programmers and use the slick shortcut ++ operator, all we have to do is remember that the distinction
between using the old or the new, the unincremented or the incremented value, is precisely the distinction
between the postfix and prefix forms of ++. So to increment p and fetch the incremented value, we'd
write *++p.
(Notice, by the way, that if the explanation of *p++ were "first we increment the pointer, then we take
the contents of the incremented pointer" -- which, again, it is not -- then there would be no difference
between *p++ and *++p. But of course there is -- and should be -- a very significant difference between
these two, different expressions.)
While we're here, let's explain a few more things. Suppose we were using *p++, and we noticed that it
was the old value of the pointer that was being used by the * operator, and we wanted it to be the new
value, and we said to ourselves, "I want it to first increment the pointer, then take the contents of the
incremented pointer", and suppose further that we thought that the way to get the order of evaluation we
wanted was to use explicit parentheses, leading us to write *(p++). What happens next?
Well, it doesn't work, of course. *(p++) acts precisely like plain *p++, and the reason is that the real
purpose of parentheses is not to force an order of evaluation, but rather to override the default
precedence. So when you write *(p++), all that the parentheses say is, "++ is applied to p, and * is
applied to the result", and that's the interpretation that the higher precedence of ++ implied already,
which is why the parentheses don't make any difference. In other words, when you've got an order of
evaluation problem, not only is the answer probably not going to involve precedence, it's probably not
going to involve parentheses, either.
What are parentheses good for, then? Well, let's go back to the earlier question of whether *p++
increments the pointer or the thing pointed to. We've seen that it increments the pointer, but what if we
want to increment the thing pointed to? That's a precedence problem, so the appropriate answer is, "use
explicit parentheses", and the result is (*p)++. This says, * is applied to p to fetch a value, and ++ is
applied to the fetched value.
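To make the role of the parentheses concrete, here is a minimal sketch that applies each of the three expressions to a fresh array {10, 20, 30} and records what happened. The names try_expression and struct outcome are my own, invented for illustration.

```c
#include <stddef.h>

/* What we observe after evaluating one expression on a fresh {10, 20, 30}. */
struct outcome {
    int value;         /* the value the expression yielded */
    ptrdiff_t offset;  /* how far p has moved from the start of the array */
    int first;         /* the contents of a[0] afterwards */
};

/* which: 0 selects *p++, 1 selects *(p++), anything else selects (*p)++ */
struct outcome try_expression(int which)
{
    int a[3] = {10, 20, 30};
    int *p = a;
    struct outcome r;

    switch (which) {
    case 0:  r.value = *p++;   break;  /* fetch a[0], advance the pointer */
    case 1:  r.value = *(p++); break;  /* same grouping, so same behavior */
    default: r.value = (*p)++; break;  /* fetch a[0], increment a[0] itself */
    }
    r.offset = p - a;
    r.first = a[0];
    return r;
}
```

try_expression(0) and try_expression(1) report exactly the same thing: the value 10, a pointer that has moved on by one element, and an untouched a[0]. try_expression(2) also yields 10, but the pointer has not moved and a[0] is now 11.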
Exercise for the reader: (*p)++ increments the pointed-to value, but (again, by the definition of postfix
++) returns the old, unincremented pointed-to value. What if you wanted to increment the pointed-to
value and return the new, incremented, pointed-to value? (Do you need explicit parentheses in this case?)
If I haven't put you to sleep by now, I'll launch into one more little digression concerning the use of time-related concepts to explain precedence. Ideally, to avoid confusion between precedence and order of
evaluation, we'd use time-related language to explain only order of evaluation, not precedence. But when
your C teacher was first explaining precedence, the explanation probably involved an expression along
the lines of
1 + 2 * 3
and the statement was probably made that "the multiplication happens first, before the addition, because
multiplication has higher precedence than addition". And having heard an explanation like that, it's all
too easy to come away with the impression that precedence controls order of evaluation, and it's also all
too easy to get into trouble later when considering an expression like *p++, where the precedence does
not control, doesn't even come close to controlling, the aspect of evaluation order that you're interested
in.
If we look at the expression 1 + 2 * 3 very carefully, the key aspect explained by precedence is that
the multiplication operator is applied to the operands 2 and 3, and the addition operator is applied to 1
and the result of the multiplication. (Notice that I said "and", not "and then"). Here, although there's
definitely an order of evaluation issue, and although the order of evaluation is apparently influenced by
the precedence somehow, the influence is actually not a direct one. The real constraint on the order of
evaluation is that we obviously can't complete an addition which involves the result of the multiplication
until we've got the result of that multiplication. So we're probably going to have to do the multiplication
first and the addition second, but this is mostly a consequence of causality, not precedence.
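One way to watch the two notions come apart is to replace the constants with function calls that leave a trace. The sketch below is only an illustration, and operand, evaluate, observed_order and call_log are hypothetical names of my own. Precedence fixes the grouping, so the result is always 7, but the order in which the three operands are evaluated is left to the compiler.

```c
#include <string.h>

static char call_log[8];  /* letters appended in the order the operands are evaluated */

/* Records its name in call_log, then hands back the value. */
static int operand(char name, int value)
{
    size_t n = strlen(call_log);
    if (n + 1 < sizeof call_log) {
        call_log[n] = name;
        call_log[n + 1] = '\0';
    }
    return value;
}

/* Computes 1 + 2 * 3, with each operand produced by a function call. */
int evaluate(void)
{
    call_log[0] = '\0';
    return operand('a', 1) + operand('b', 2) * operand('c', 3);
}

/* The order of calls seen during the most recent evaluate(). */
const char *observed_order(void)
{
    return call_log;
}
```

After a call to evaluate(), observed_order() might be "abc", "bca", or some other arrangement of the three letters, depending on the compiler. The multiplication still has to be finished before the addition can be, but nothing says operand('a', 1) has to wait for it.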
But if your C instructor misled you into thinking that precedence had more to do with order of evaluation
than it does, have pity, because I don't know of a good way of explaining precedence that doesn't involve
time-related concepts, either. My students used to have to watch me do battle with myself, lips flapping
like a fish but with no words coming out, as one part of my brain, paranoid about the possibility of later
misunderstandings, desperately tried to keep another part from blurting out the dreaded "the
multiplication happens first, before the addition, because multiplication has higher precedence than
addition". (And at least half the time, that's probably what I ended up saying anyway.)
> So, for example, having
> char *p = "Hello";
> printf("%c", *p++);
> shouldn't return 'e' ?
Nope. By the arguments above, it prints 'H'. To print 'e', you'd want *++p.
See also the comp.lang.c FAQ list, question 4.3. See also
http://www.eskimo.com/~scs/readings/precvsooe.960725.html.
Steve Summit
scs@eskimo.com
undefined behavior
[This article was originally posted on March 11, 1995, in the midst of a thread on undefined behavior. I am
indebted to Michael Hodous for having saved a copy of this article, and locating it and forwarding it to me over a
year later, after I'd lost mine.]
Newsgroups: comp.lang.c
From: scs@eskimo.com (Steve Summit)
Subject: undefined behavior and Doubting Thomases (was: Quick C test...)
Message-ID: <D5Aqno.II8@eskimo.com>
References: <danpop.794784912@rscernix> <3jq1mn$63g@news.sas.ab.ca>
<danpop.794862450@rscernix>
Date: Sat, 11 Mar 1995 22:04:35 GMT
Lines: 114
In article <danpop.794862450@rscernix>, Dan Pop (danpop@cernapo.cern.ch) writes:
> In <3jq1mn$63g@news.sas.ab.ca> dmartin@freenet.edmonton.ab.ca () writes:
>> You can say "THE STANDARD SAYS IT'S UNDEFINED BEHAVIOUR!!!" until your
>> keyboard wears out and you still won't have answered the basic question of
>> WHY?
>
> I did, because this is the right answer. Your "answer" doesn't explain
> why the standard chose "undefined behaviour" instead of "unspecified
> behaviour" or "implementation defined behaviour". It won't convince
> anybody who can think for himself.
Actually, Douglas and Dan are both right.
Posters to comp.lang.c do assert, over and over and over again, that certain behavior is undefined. It doesn't help.
Other posters keep asking "Why?" or saying "Oh, but if I don't care which outcome I get, I'm okay."
Posters to comp.lang.c do assert, over and over and over again, that "No, it's UNDEFINED! Undefined behavior
means that ANYTHING can happen! If you write i=i++, your computer can teleport itself to 16th century
Wittenberg and change Western history by nailing sections 1.6 and 3.3 of ANSI X3.159 to the church door!". It
still doesn't help. Hyperbolic scenarios like that are fun to construct, but if someone has heard but not accepted one
sober definition of undefined behavior, and persists in their dogged belief that i=i++ is meaningful to them and
their compiler, those increasingly fantastic explanations don't usually penetrate, either.
If we want to educate, if we want people to stop believing that they can write i=i++ or predict what it should do,
we can't just keep repeating "It's undefined". We can't say anything that sounds like "It's undefined because we and
the Standard say it is, so just accept it." If it sounds like that, if it can possibly be construed to sound like that, then
people with a hyperactive (but perhaps beneficial, in other situations) skepticism gland will construe it like that,
and will cross their arms, state that they're from Missouri, and demand that we show them why. We can't say
"because the Standard says so" for the nth time -- they remain unconvinced. We can't emulate an inane beer
commercial and say "Why ask why?" -- their arms remain crossed. We have to give them a reason, in language
they can understand, and if we think that the best reason really is "because the Standard says so," or that it's not
really sensible to ask why or meaningful to explain why, then we had better -- again, if we're interested in
http://www.eskimo.com/~scs/readings/undef.950311.html (1 of 3) [22/07/2003 5:53:23 PM]
educating and in keeping the noise level down -- keep those opinions to ourselves.
I keep trying to find ways of saying these things that really work. (Two other FAQ list issues that seem to defy
being put to rest are whether arrays are pointers, and whether it's possible to write a generic swap macro.) You may
think that the current FAQ list wording is adequate (if so, thank you! :-) ), and that people, having read it, shouldn't
keep wondering why things are the way they are, but FAQ lists are by their very nature pragmatic: if people
manage not to understand them, then they (the lists) are not doing their job. If an FAQ list which merely stated the
facts were adequate, if it were a reader's fault for misunderstanding it, then we wouldn't need FAQ lists at all; there
are already plenty of documents out there which merely state the facts, and which readers manage to
misunderstand.
Roger Miller came up with a very nice explanation of how behavior which the pedants claim is undefined can
seem to work "correctly" on the skeptic's computer:
Somebody once told me that in basketball you can't hold the ball and run. I got a basketball and tried
it and it worked just fine. He obviously didn't understand basketball.
(This observation can also be used to make Dan's point: about the only answer to the question "Why can't you hold
the ball and run in basketball?" is "because it's against the rules" or "because that's not the way the game is
played.")
At one time I was toying with statements of the form
Undefined means that, notwithstanding question 8.2, the code
printf("%d", j++ <= j);
can print 42, or "forty-two".
(For those who don't have a copy of the FAQ list in front of you, question 8.2 is the one that says that boolean
operators like <= always return 0 or 1. I confess that by mentioning the number 42, I've started down the road
towards hyperbole.) I think I even once posted here something along the lines of
Boolean operators like <= always return 0 or 1.
The code
printf("%d", j++ <= j);
can print 42, or "forty-two".
If you can hold these two thoughts simultaneously, you will have achieved Enlightenment.
These arguments do have a certain subtle appeal, but when you're trying to pierce those curtains of skepticism, it
seems that neither subtlety nor hit-em-over-the-head, nasal demons arguments are sufficient.
The massive FAQ list revision I'm currently immersed in (and which I shouldn't even be taking time out from to
post this article) has a bunch more to say about undefined behavior; we'll see if any of it helps.
Steve Summit
scs@eskimo.com
undefined behavior
The situation with ++j * ++j is actually even worse than with f1() * f2(). In the function call
case, it's probably true that either f1 is called first and then f2, or f2 is called first and then f1.
Applying the same logic to ++j * ++j, we might imagine that either the left-hand ++j gets evaluated
first and then the right-hand one, yielding 4 * 5 = 20, or else the right-hand one gets evaluated first and
then the left-hand one, yielding 5 * 4 = 20. But it turns out that this analysis doesn't work, either.
The problem is that ++j does two things. It delivers the value j+1 to the surrounding expression, and it
stores the value j+1 back into j, thus incrementing it. But we don't know (because the C Standard
doesn't tell us) precisely when the updated value of j gets written back to j.
In some expressions, it doesn't matter. If we wrote
i = ++j * k;
the compiler could emit code which essentially either interpreted the expression as
j = j + 1;
i = j * k;
or
i = (j+1) * k;
j = j + 1;
If we wrote
i = ++j * ++k;
there would be at least six possible interpretations:
j = j + 1;
k = k + 1;
i = j * k;

k = k + 1;
j = j + 1;
i = j * k;

i = (j+1) * (k+1);
j = j + 1;
k = k + 1;

i = (j+1) * (k+1);
k = k + 1;
j = j + 1;

j = j + 1;
i = j * (k+1);
k = k + 1;

k = k + 1;
i = (j+1) * k;
j = j + 1;
Now, all six of those interpretations yield the same answer (16, if j and k both start out as 3). But if you
replace k in each of them by j, resulting in six possible interpretations of the expression ++j * ++j,
you can get at least three different answers (16, 20, and 25, by my reckoning).
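If you'd rather check that reckoning than take my word for it, the six orderings can be written out as ordinary, well-defined statements and run. This is only a sketch under my own naming (interpret, with_distinct and with_same are hypothetical), and it says nothing about what any real compiler does with ++j * ++j. Passing the same address for both pointers plays the part of "replace k by j".

```c
/* Carries out one of six orderings, one statement at a time, and returns i.
   j and k may point to the same object. */
int interpret(int which, int *j, int *k)
{
    int i;

    switch (which) {
    case 1:  *j = *j + 1; *k = *k + 1; i = *j * *k;             break;
    case 2:  *k = *k + 1; *j = *j + 1; i = *j * *k;             break;
    case 3:  i = (*j + 1) * (*k + 1); *j = *j + 1; *k = *k + 1; break;
    case 4:  i = (*j + 1) * (*k + 1); *k = *k + 1; *j = *j + 1; break;
    case 5:  *j = *j + 1; i = *j * (*k + 1); *k = *k + 1;       break;
    default: *k = *k + 1; i = (*j + 1) * *k; *j = *j + 1;       break;
    }
    return i;
}

/* Two separate variables, both starting at 3. */
int with_distinct(int which)
{
    int j = 3, k = 3;
    return interpret(which, &j, &k);
}

/* One variable starting at 3, standing in for both j and k. */
int with_same(int which)
{
    int j = 3;
    return interpret(which, &j, &j);
}
```

with_distinct() returns 16 for every ordering. with_same() returns 25 for orderings 1 and 2, 16 for orderings 3 and 4, and 20 for orderings 5 and 6.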
But it gets worse. The official wording in the ANSI/ISO C Standard says that if you attempt to modify
the same object at one spot within an expression and refer to that same object at another spot (as in the
expression ++j * j), or if you attempt to modify the same object twice within an expression (as in the
expression ++j * ++j), the results are undefined [footnote 1]. What this means is that the compiler
isn't even required to choose among the two (or six, or however many you can come up with) "plausible"
interpretations of the expression; it can do any random thing it wants to, and it won't be in violation of
the Standard. (In practice, it's not so much that compilers do "any random thing they want to" in these
situations, but what does happen is that they do any random thing that falls out of their algorithms for
handling well-formed expressions, algorithms which are not designed -- because they do not have to be -- to do anything reasonable with undefined expressions.)
Now, you might say (and I'd agree with you) that the expression i = ++j * ++j would be so
unlikely to come up in a real program that it hardly matters what it does. But if you think you can analyze
that expression and figure out what it does, then there's a flaw either in your analysis or in your
understanding of C. And I would urge you to repair that flaw -- to teach yourself that you cannot analyze
the behavior of expressions like these -- because that way you won't falsely imagine that you can predict
the behavior of other, more insidious undefined expressions which do occasionally crop up in real
programs, expressions which will cause real problems if they are retained, expressions which it will do
nobody any positive good to be able to analyze or justify, expressions which should ideally be
recognized as the undefined expressions they are and banished from the program as soon as possible.
As an example of how a "more insidious undefined expression" can in fact crop up in a real program,
consider the following decidedly non-hypothetical scenario. (The reason I know it's not hypothetical is
that I know the programmer it happened to.) The program is the interpreter for a stack machine, and it
contains code like
switch(op)
{
case ADD:
push(pop() + pop());
break;
case SUBTRACT:
tmp = pop();
push(pop() - tmp);
break;
case MULTIPLY:
push(pop() * pop());
break;
case DIVIDE:
tmp = pop();
if(tmp == 0) {error("divide by 0"); break; }
push(pop() / tmp);
break;
...
In the original implementation, push() and pop() are function calls. The original programmer,
knowing that the order of function calls within an expression is unpredictable (see our discussion of f1
and f2 above) has taken care to use a temporary variable to ensure that the operands are popped in the
right order for subtraction and division, where it matters.
But the interpreter is too slow, and a later programmer reasons that part of the problem is function call
overhead. (Most of the time, it's a good idea not to worry about function call overhead too much, but in
some situations, such as the inner loop of an interpreter which might be interpreting huge amounts of
code, it can indeed be significant.)
I don't remember if the push and pop functions were
void push(val)
{
stack[stackp++] = val;
}
pop()
{
return stack[--stackp];
}
or
void push(val)
{
*stackp++ = val;
}
pop()
{
return *--stackp;
}
(That is, although the stack and the stack pointer were both global, I don't remember whether the stack
pointer was an index or a pointer into the stack.) But, in either case, it's easy to see how we might replace
the push and pop functions with macros, thus removing the function call overhead in the critical inner
loop, without rewriting the entire interpreter. (In fact, this interpreter had hundreds of operators, some
much more complicated than simple arithmetic, so there were hundreds of calls to push() and pop()
spread across many source files.)
The macros would, of course, either be
#define push(val) (stack[stackp++] = (val))
#define pop() stack[--stackp]
or
#define push(val) (*stackp++ = (val))
#define pop() *--stackp
and this seems like a very simple way to dramatically increase the efficiency of the interpreter. (It did,
too, by 20 or 30 percent.) The problem was that the interpreter also began delivering wildly erroneous
answers.
The problem turned out to be in those innocuous-looking lines
push(pop() + pop());
and
push(pop() * pop());
for addition and multiplication. When push() and pop() are macros, these lines expand to either
(stack[stackp++] = (stack[--stackp] + stack[--stackp]));
(stack[stackp++] = (stack[--stackp] * stack[--stackp]));
or
(*stackp++ = (*--stackp + *--stackp));
(*stackp++ = (*--stackp * *--stackp));
And these expressions bear an eerie similarity to the one that began this thread. (They're complicated by
an added level of indirection, and they each attempt to modify stackp three times within one expression.)
And yes, under a popular, real-world compiler, namely gcc, these expressions did produce some crazy
result which was not at all the expected one but which, by a strict and proper interpretation of the C
Standard, was not incorrect. (That is, it was not incorrect for the compiler and the emitted code to have
produced the undesired result.) The compiler had not done anything wrong; the error was on the part of
the programmer who attempted to replace the function calls with macros, macros which proved to be
dangerous because of their embedded side effects.
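For what it's worth, one way to keep the macros and still stay within the rules is to pop each operand into its own variable, in its own statement, for every operator and not just the non-commutative ones. The following is a sketch of that idea, not the fix that was actually applied to the interpreter in the story. STACKSIZE, execute and run_example are hypothetical names, the stack is assumed to hold ints, and overflow and underflow checks are omitted for brevity.

```c
#define STACKSIZE 100            /* hypothetical capacity; no bounds checking below */

static int stack[STACKSIZE];
static int stackp = 0;           /* index of the next free slot */

#define push(val) (stack[stackp++] = (val))
#define pop()     (stack[--stackp])

enum op { ADD, SUBTRACT, MULTIPLY };

/* Pops two operands, applies op, and pushes the result.
   Each statement modifies stackp exactly once. */
void execute(enum op op)
{
    int a, b;

    b = pop();                   /* the right-hand operand was pushed last */
    a = pop();
    switch (op) {
    case ADD:      push(a + b); break;
    case SUBTRACT: push(a - b); break;
    case MULTIPLY: push(a * b); break;
    }
}

/* Evaluates (7 - 2) * 3 on the stack machine and returns the result. */
int run_example(void)
{
    stackp = 0;
    push(7);
    push(2);
    execute(SUBTRACT);
    push(3);
    execute(MULTIPLY);
    return pop();
}
```

run_example() returns 15. The price is two extra variables; in exchange, no expression touches stackp more than once, so there is nothing left for the compiler to surprise us with.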
And finally, in case anyone is doubting the truth of this story, I can tell you that the maintenance
programmer who introduced the undefined behavior was (as others of you have probably already
guessed) none other than the maintainer of the comp.lang.c FAQ list, Steve Summit, who presumably
ought to have known better.
Steve Summit
scs@eskimo.com
Footnote 1: The real rule on undefined expressions is that you can't modify an object twice, or modify
and then inspect it, between sequence points. There are a few operators -- including function calls -- which do introduce sequence points, so it's possible to come up with expressions which contain multiple
side effects but which are not undefined.
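Here is a small sketch of three such expressions. Each modifies the same object twice, yet each is well-defined, because the &&, comma, and function-call operators introduce sequence points. The function names are my own, chosen for illustration.

```c
/* && has a sequence point after its left operand, so the first increment
   is complete before the second one (if it happens at all) begins. */
int both_nonzero(int *i)
{
    return (*i)++ && (*i)++;
}

/* The comma operator also evaluates its left operand completely first. */
int bump_twice(int *i)
{
    return ((*i)++, (*i)++);
}

/* Increments *p inside a function call and returns the new value. */
static int next(int *p)
{
    return ++*p;
}

/* The two calls cannot overlap, so *j is incremented twice, cleanly.
   Which call runs first is still unspecified, but the product is the same
   either way. */
int product_of_next_two(int *j)
{
    return next(j) * next(j);
}
```

With j starting at 3, product_of_next_two(&j) returns 20 and leaves j equal to 5. That is one of the "plausible" answers for ++j * ++j, but here it is actually guaranteed.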
There's actually a very interesting sort of paradox underlying this whole issue. It's true that strings in C
are arrays, and it's true that you can't assign arrays, and it's true that you therefore often need to call
strcpy when you want to "assign" a string. But this does not mean that whenever you have the notions
of "string" and "assign" in your head, you should reflexively call strcpy! The reason is that strings are
often referred to by pointers, and strcpy can be dangerous when used with unknown pointers (that is,
when you're not sure how much memory they point to, or whether the memory is writable).
The paradox is that although your C teacher told you that since arrays aren't assignable you have to call
strcpy to "assign" a string, pointers are assignable, and it's often much safer to use simple assignment
when working with pointers to strings! (Besides safety, if you're at all worried about efficiency, assigning
pointers can be significantly more efficient than copying strings around all the time.)
For example, the code
char *play[2] = {"1", "2"};
strcpy(play[0], "3");
strcpy(play[1], "4");
is wrong, because it attempts to overwrite constant strings. Furthermore, the code
char *play[2] = {"1", "2"};
strcpy(play[0], "345");
strcpy(play[1], "678");
is doubly wrong, because it attempts to overwrite constant strings and overflows them if it succeeds.
BUT, the code
char *play[2] = {"1", "2"};
play[0] = "3";
play[1] = "4";
is perfectly legal and correct, as is
char *play[2] = {"1", "2"};
play[0] = "345";
play[1] = "678";
In these cases, we're reassigning the pointers to point to different string constants, and although those
string constants might not be writable, either, we've successfully accomplished our goal of assigning new
string values to play[0] and play[1].
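For comparison, here is a minimal sketch that puts the two legitimate approaches side by side: re-pointing pointers, and calling strcpy on arrays which are known to be writable and big enough. The names demo_assign and board are hypothetical, and I've declared play with const to document that the pointed-to literals must not be written to.

```c
#include <string.h>

/* Returns 1 if both styles of "assignment" end up holding the same text. */
int demo_assign(void)
{
    const char *play[2] = {"1", "2"};  /* pointers to string constants */
    char board[2][4] = {"1", "2"};     /* writable arrays: 3 characters plus '\0' */

    /* Pointer assignment: nothing is copied, the pointers are just re-aimed. */
    play[0] = "345";
    play[1] = "678";

    /* strcpy is fine here: the destination is ours, writable, and big enough. */
    strcpy(board[0], "345");
    strcpy(board[1], "678");

    return strcmp(play[0], board[0]) == 0 && strcmp(play[1], board[1]) == 0;
}
```

The array version works only because board[0] and board[1] each have room for four characters. Copying "3456" into them would overflow, a problem the pointer version simply doesn't have.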
Here's another example. When attempting to write a function which returns a string, beginners frequently
write code like this:
char *digitvalue(int d)
{
    char result[10];
    switch(d) {
        case 0: strcpy(result, "zero"); break;
        case 1: strcpy(result, "one"); break;
        case 2: strcpy(result, "two"); break;
        ...
        case 9: strcpy(result, "nine"); break;
    }
    return result;
}
This code has a serious and fatal bug, of course: the local result array will not exist by the time the
function returns to its caller, so the pointer that the caller receives will be useless. (See the comp.lang.c
FAQ list, question 7.5.) One recommended fix for this problem is to declare the result array as
static, but taking the larger view, it turns out that might not be the best fix in this situation, after all!
Much better, I think, is to rewrite the function completely, as
const char *digitvalue(int d)
{
    switch(d) {
        case 0: return "zero";
        case 1: return "one";
        case 2: return "two";
        ...
        case 9: return "nine";
    }
}
Now we've gotten rid of the result array completely; this has the additional benefit that besides not
having to worry about its allocation, we also don't have to worry about making it big enough. (In other
words, the arbitrary value 10 from the previous implementation has disappeared.) Since the returned
pointers now point directly to string literals, the pointed-to storage must not be written to, and I've
documented this fact by declaring the function as returning const char *. (This may seem like an
unfortunate and perhaps unacceptable additional restriction over the initial implementation, but since the
initial implementation returned a pointer to storage which was neither readable nor writable, I'm perfectly
comfortable with the function as modified.) If your programming style disallows functions with multiple
exit points, you can use a temporary internal return value which is a pointer rather than an array:
const char *digitvalue(int d)
{
    const char *result;
    switch(d) {
        case 0: result = "zero"; break;
        case 1: result = "one"; break;
        case 2: result = "two"; break;
        ...
        case 9: result = "nine"; break;
    }
    return result;
}
Finally, as an aside, this particular switch statement is somewhat unnecessary -- since the case values are
dense and 0-based, it would suffice to pluck the digit names out of an array:
const char *digitnames[] = {"zero", "one", "two", "three",
"four", "five", "six", "seven", "eight", "nine"};
const char *digitvalue(int d)
{
if(d >= 0 && d <= 9)
return digitnames[d];
else
return "error";
}
(We can immediately tell that this last implementation is superior, because I've been able to present a
complete, equally-readable example, with error-checking, in less space than I was formerly using to
present abbreviated examples. This last example, too, could be written with one exit point by assigning
one of the digitname strings to a local temporary return value pointer, but the point is that it's superior to
one that used strcpy to copy one of the digitname strings to a local temporary return value array.)
To summarize the argument I've been making, IF all the strings you're working with are constant, it's
arguably safer to use pointer assignment all of the time, and never to call strcpy.
BUT, that's a mighty big "if", of course. If you're constructing any strings on the fly, perhaps by calling
strcat, you have to be more careful, because neither a rule like "always use arrays and call strcpy
for simple assignments" nor "always use pointers and use assignment for simple assignments" will
suffice. If you're constructing strings on the fly (or reading them in from the outside world), you have to
ensure that the storage you write them into is both writable and big enough. If you're using arrays, this
will probably mean declaring arrays having some explicit, maximum size (i.e. the maximum size of the
strings you'll ever be able to construct or read in), and if you're using pointers, it will probably mean
calling malloc or realloc. (And, if you've settled on arrays, you may end up going back to strcpy
for all string "assignments".)
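As a rough sketch of the pointer-and-malloc route, here is a function which builds a string on the fly into storage that is guaranteed to be both writable and big enough, because we compute the size first. The name join_words is hypothetical, and the policy of returning NULL on failure is just one reasonable choice.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding first, a space, and second,
   or NULL if memory could not be allocated.
   The caller owns the result and should free() it when done. */
char *join_words(const char *first, const char *second)
{
    size_t len = strlen(first) + 1 + strlen(second);  /* +1 for the space */
    char *result = malloc(len + 1);                   /* +1 for the '\0' */

    if (result == NULL)
        return NULL;

    strcpy(result, first);   /* safe: result has room for everything */
    strcat(result, " ");
    strcat(result, second);
    return result;
}
```

A call such as join_words("forty", "two") yields a pointer to "forty two". Note that the result can be handed around by simple pointer assignment from then on; only the construction needed strcpy and strcat.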
The bottom line is: if you have a program that manipulates a lot of strings, what should you choose as
your basic "string" type? If you choose "array of char", then yes, you will have to call strcpy all the
time. But if you choose "pointer to char", you probably do not want to call strcpy; most of the time
you will want to use pointer assignment instead. (And if you want to get the best of both worlds, by using
some arrays and some pointers, you'll have to be more careful still.)
Steve Summit
scs@eskimo.com
[This article was originally posted on January 18, 1999; I have touched it up lightly since then.]
From: scs@eskimo.com (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: a++ versus ++a
Date: 18 Jan 1999 16:31:45 GMT
Lines: 457
Message-ID: <77vnlh$8qq$1@eskinews.eskimo.com>
References: <36A2B0E3.95E2A918@newsandmail.com>
In article <36A2B0E3.95E2A918@newsandmail.com>, dan.johnson@newsandmail.com writes:
> Here's my understanding Incrimenting and Pre-Incrimenting....
> a++ means the same as a + 1
> ++a means the same as 1 + a
Well, no. That wouldn't be much of a difference, would it?
> What I don't understand is what the benefit of one versus the other
> is inside of code. I have used a++ many times in my for-loops
> [ie, for (a = 0; a > 10; a++) ... etc]
Part of the trick of understanding the difference between a++ and ++a is to realize that sometimes, there is no
difference. The difference between a++ and ++a is in what value is "returned" to the surrounding expression,
but this obviously makes a difference only when there is a surrounding expression, when the a++ or ++a
appears as a subexpression within a larger expression. When the a++ or ++a stands alone, as it does in the loop
header
for(a = 0; a < 10; a++)
there is no effective difference at all; that loop is 100% equivalent to one that begins
for(a = 0; a < 10; ++a)
When the a++ or ++a does appear as a subexpression within a larger expression, the difference is that a++
"returns" the old, unincremented value of a to the surrounding expression, while for ++a you get the new,
incremented value.
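In its barest form (and before we get to a more realistic example) the difference looks like this. The sketch below uses hypothetical names of my own, struct result, postfix_demo and prefix_demo, and simply reports both the value handed to the surrounding expression and the value a ends up with.

```c
struct result {
    int returned;  /* the value delivered to the surrounding expression */
    int a_after;   /* the value of a once the statement is complete */
};

/* Uses a++ as a subexpression of an assignment. */
struct result postfix_demo(int a)
{
    struct result r;
    r.returned = a++;  /* the old, unincremented value */
    r.a_after = a;
    return r;
}

/* Uses ++a as a subexpression of an assignment. */
struct result prefix_demo(int a)
{
    struct result r;
    r.returned = ++a;  /* the new, incremented value */
    r.a_after = a;
    return r;
}
```

Starting from 5, postfix_demo reports 5 and 6, while prefix_demo reports 6 and 6. In both cases a ends up as 6; the only difference is what the assignment sees.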
> I learn best by simple example, so if you have some code you can show
> off the best use and explanation of the differences between these, then
> by all means supply it in a reply.
Here's some code. (The example is going to seem excessively long, but underlying that length is the simplicity
you asked for, so please bear with me. It's long because I'm going to try to make it realistic; I'm going to
motivate the choice between prefix and postfix ++ with a convincing example, rather than an arbitrary fragment
like b = a++, and I'm going to provide a fair amount of commentary along the way.)
Suppose we have an array of integers
int numbers[10];
which we're going to fill in with numbers read from the user. We're not sure how many numbers the user is
going to type in -- sometimes it will be less than 10 -- so we'll keep a second variable,
int nnumbers;
to keep track of how many numbers the user did type in. Our very first cut at writing the code to read in the
numbers might look something like this:
int i, x;
for(i = 0; i < 10; i++)
{
x = getnumberfromuser();
if(x <= 0)
break;
numbers[i] = x;
}
nnumbers = i;
Here we've deferred the question of how exactly we're going to accept numbers from the user by moving that
functionality off into a function, getnumberfromuser(). (I'll provide a sample implementation below, for
completeness.) We've hypothesized that getnumberfromuser() will return 0 or a negative number to
indicate that the user has finished typing in numbers (perhaps because we'll simply ask the user to type in 0 or -1 to indicate the end of the list). Since the numbers[] array is size 10, we start off imagining that we'll make
10 trips through the loop to fill in up to 10 numbers. If the user indicates the end of the list (that is, types 0 or -1)
after typing fewer than 10 numbers, we won't be getting 10 numbers, after all, so we take an early exit from the
loop with the break statement.
The code above will work, but it has a few deficiencies. The most serious (though the least obvious) is that if
the getnumberfromuser() function explicitly prompts the user to enter a number (which is quite a likely
implementation), the user experience will be significantly different depending on whether the user types 10
numbers, or fewer than 10 numbers. If the user types, say, 7 numbers, another prompt will appear (which would
be for the 8th number), and the user will type 0 or -1 to indicate completion. If the user types 10 numbers,
however, the loop will take its normal exit (because the condition i < 10 in the for loop will no longer be
true), and getnumberfromuser() will not be called at all, and no 11th prompt will appear, and the user
won't have to type (or be given the opportunity to type) 0 or -1 at all.
Other deficiencies (though they're more minor stylistic quibbles) concern the loop used and the number of
variables used. The loop
else
The slightly less-than-obvious way, but which is ideal for while loops like the one we're trying to write, is to
bury the assignment inside the conditional:
(x = getnumberfromuser()) > 0
Here we store getnumberfromuser's return value into the variable x, and then use the "return" value of the
assignment operator as the left-hand operand of the > operator. The "return" value of an assignment operator is
simply the value assigned, so the net result is that we both store getnumberfromuser's return value away in
x for future reference, and also immediately compare it against 0 to see if it's a number we want, or not. Placed
into a while loop (but still using some pseudocode), the expression is used like this:
while((x = getnumberfromuser()) > 0)
{
store x in numbers[];
}
we're done reading numbers;
If you don't mind the embedded assignment-and-test epitomized by the expression (x =
getnumberfromuser()) > 0, this is a splendid way to write the loop, and it corrects the first stylistic
deficiency. (Actually, I hope you don't mind the embedded assignment-and-test, because it's quite a popular
idiom in C!) Now I'll move on to the second deficiency, because even though it was quite minor, it's going to be
at the center of our decision whether to use prefix or postfix ++ (which is, believe it or not, still what I'm
ultimately trying to demonstrate here).
I want to remove the extra variable involved in the first example. When we're done reading numbers, the
variable nnumbers is supposed to contain the number of numbers the user has entered. I'm going to assert that,
within the loop, nnumbers is always going to contain the number of numbers the user has entered so far. (The
notion that "nnumbers always contains the number of numbers the user has entered so far" is an example of a
"loop invariant", and picking clean, simple loop invariants and sticking to them is an excellent technique when
we want to write clean, readable code.)
Let's think about the values that nnumbers should contain as the program fragment proceeds. Before we make
any trips through the loop, the user won't have typed any numbers, so nnumbers should (rather obviously)
start out as 0. At the end of the first trip through the loop (assuming the user doesn't type 0 or -1 right away),
nnumbers should contain 1. At the end of the second complete trip through the loop, nnumbers should
contain 2, etc.
If we're going to stick with using the nnumbers variable, and eliminate i as redundant, we're going to have to
use nnumbers to decide which element of the numbers[] array to store each new number in. Since arrays in
C are 0-based, the first number read will be stored in numbers[0], the second number will be stored in
numbers[1], etc. So it looks like we can store each new number into numbers[nnumbers], as long as
we're careful to increment nnumbers after we use it to decide which element of the array to store in. Our code
(which is no longer pseudocode) therefore looks like this:
nnumbers = 0;
while((x = getnumberfromuser()) > 0)
{
    numbers[nnumbers] = x;
    nnumbers = nnumbers + 1;
}
We've met our goals of abandoning the for loop in favor of the (for this problem) more-natural while loop,
and of consolidating our use of the former variable i into the nnumbers variable. Our loop invariant is
(mostly) met: at the end of each trip through the loop, after incrementing nnumbers, nnumbers does indeed
contain the number of numbers read so far. (We've also lost some functionality, and introduced a bug: we're no
longer checking to make sure we store no more than 10 numbers in the array, so it could overflow. We'll fix this
up in a minute.)
Our purpose here (I still haven't forgotten) is to demonstrate the choice between prefix and postfix ++. If we
want to replace the "longhand" expression nnumbers = nnumbers + 1 with the shorthand ++ form, we
could (I suppose) write the body of the loop as
{
    numbers[nnumbers] = x;
    nnumbers++;
}
but this doesn't buy us much (nor is it an effective demonstration of the prefix vs. postfix decision). Here, the
expression nnumbers++ stands alone, with no surrounding expression, so it doesn't matter whether it "returns"
the old or the new value. We could just as well have written
{
    numbers[nnumbers] = x;
    ++nnumbers;
}
We can, however, collapse the two statements in the loop together, with a result that will look like (reverting to
pseudocode again for a moment):
numbers[use nnumbers and also increment it] = x;
There are two ways to look at this process of collapsing: we can say that it's just another example of C
programmers trying to show off and be obfuscated by cramming everything possible into one line, or we can
say that it's a good way of expressing an idiom and strengthening a loop invariant. (The tack this article is
taking will seem to favor the latter interpretation, but I'm not going to try to jam it down your throat; I concede
that the C programmer proclivity to cram everything into one line can be annoying even when there are good
based array in C -- just declare the array one bigger than it has to be:
int numbers[11];
Now you can use numbers[1] through numbers[10], and just kind of pretend that numbers[0] isn't
there.
In languages that support 1-based arrays by default, it's unfortunately rather common to see code which (to
mimic our ongoing example) initializes nnumbers to 1, since that's the first subscript value that will be used.
In other words, one way (though a poor way) to write the loop would be
nnumbers = 1;
while((x = getnumberfromuser()) > 0)
{
    numbers[nnumbers] = x;
    nnumbers = nnumbers + 1;
}
nnumbers = nnumbers - 1;
There's an obvious problem with this code: if its loop invariant was supposed to be "nnumbers always
contains the number of numbers the user has entered so far", it's now false most of the time; it's true only in the
middle of the loop, between the two statements. In fact, a more accurate loop invariant for the code as written
here would be that "nnumbers always contains one more than the number of numbers the user has entered so
far", which is obviously a much less useful statement. A poor loop invariant like this always has to be propped
up with fudge factors here and there, indicated by apologetic comments such as "correct for going one too far".
But 1-based arrays don't have to lead to convoluted code. Look how much simpler the code becomes if we
interchange the increment and the array assignment, and change the initial value of nnumbers back to 0:
nnumbers = 0;
while((x = getnumberfromuser()) > 0)
{
    nnumbers = nnumbers + 1;
    numbers[nnumbers] = x;
}
Now, it's again mostly true that "nnumbers always contains the number of numbers the user has entered so
far", but since we increment nnumbers before using it, the first number read goes into numbers[1].
You can probably see where this is going. Replacing the longhand form nnumbers = nnumbers + 1 with
the autoincrement operator ++, and restoring the overflow test, we end up with this 1-based version:
nnumbers = 0;
while((x = getnumberfromuser()) > 0)
{
    if(nnumbers >= 10)
    {
        printf("too many numbers!\n");
        break;
    }
    numbers[++nnumbers] = x;
}
If you compare this code to the previous 0-based version, you'll see that the only difference is that it uses prefix
++ instead of postfix.
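To make the comparison concrete without typing numbers in by hand, here is a small sketch of both versions reading from a fixed list instead of calling getnumberfromuser(). The names MAXNUMBERS, store_zero_based and store_one_based are my own inventions for this illustration, and the 0-based loop is my reconstruction of the postfix version, so treat it as an assumption rather than a quotation.

```c
#include <stddef.h>

#define MAXNUMBERS 10   /* assumed capacity, matching the "no more than 10" limit */

/* 0-based: uses numbers[0] .. numbers[MAXNUMBERS-1].
   Postfix ++ yields the old count, which is the next free slot. */
int store_zero_based(const int *input, size_t ninput, int numbers[MAXNUMBERS])
{
    int nnumbers = 0;
    size_t k;

    for (k = 0; k < ninput && input[k] > 0; k++) {
        if (nnumbers >= MAXNUMBERS)
            break;                       /* too many numbers */
        numbers[nnumbers++] = input[k];  /* store, then count it */
    }
    return nnumbers;
}

/* 1-based: uses numbers[1] .. numbers[MAXNUMBERS]; numbers[0] is ignored.
   Prefix ++ yields the new count, which is the slot just claimed. */
int store_one_based(const int *input, size_t ninput, int numbers[MAXNUMBERS + 1])
{
    int nnumbers = 0;
    size_t k;

    for (k = 0; k < ninput && input[k] > 0; k++) {
        if (nnumbers >= MAXNUMBERS)
            break;                       /* too many numbers */
        numbers[++nnumbers] = input[k];  /* count it, then store */
    }
    return nnumbers;
}
```

Both functions return the same count for the same input; only the slots they fill differ. In both, nnumbers holds the number of numbers stored so far at the end of every trip through the loop.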
***
To summarize, then: the difference between prefix and postfix ++ is that prefix "returns" to the surrounding
expression the new, incremented value of the variable, while postfix yields the old, unincremented value. This
distinction makes no difference if the prefix or postfix expression stands alone as a full expression; it only
makes a difference when the prefix or postfix expression sits as a subexpression within a larger expression
which cares about the value "returned." It may seem alarming that it's taken me 457 lines to describe this
distinction, and it's true that for most of the issues I've elaborated on here so agonizingly, experienced C
programmers make their decisions without thinking about each tiny detail so acutely. But if you're trying to
decide whether to use prefix or postfix autoincrement in some code, code which you've gotten as far as the
pseudocode
numbers[use nnumbers and also increment it] = x;
in defining, you do need to think about the fact that arrays are 0-based in C, and about your loop invariant,
before making your decision. When you do, you'll find that the prefix and postfix forms are not interchangeable,
that the distinction between them is significant and real, and that (for any given situation in which the value is
significant) one of them does just what you want while the other would be quite wrong.
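Stripped of the array and the loop, the distinction comes down to the following sketch. The two function names are hypothetical, chosen only to label which value each form hands back.

```c
/* Each function increments *n; they differ only in the value
   that the ++ expression yields to the surrounding return. */
int value_of_postfix(int *n)
{
    return (*n)++;      /* yields the old, unincremented value */
}

int value_of_prefix(int *n)
{
    return ++(*n);      /* yields the new, incremented value */
}
```

Starting from n equal to 5, value_of_postfix(&n) yields 5 and leaves n at 6; a following value_of_prefix(&n) yields 7 and leaves n at 7.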
Steve Summit
scs@eskimo.com
P.S. Here's the promised implementation of getnumberfromuser(), for completeness:
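#include <stdio.h>
#include <stdlib.h>     /* for atoi() */

int getnumberfromuser(void)
{
    char tmpbuf[25];

    printf("enter a number [-1 to exit]: ");
    fflush(stdout);
    /* treat end-of-file or a read error like the -1 exit value */
    if(fgets(tmpbuf, sizeof(tmpbuf), stdin) == NULL)
        return -1;
    return atoi(tmpbuf);
}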
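One caveat about this implementation: atoi() gives no indication of failure, so a line such as "hello" quietly comes back as 0 and ends the loop. If that matters, the conversion can be done with strtol() instead. What follows is only a sketch; parsenumber is a hypothetical helper of my own, not part of the implementation above.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the text in buf to an int.
   Returns 1 and stores the value in *result on success, 0 otherwise.
   Like atoi(), it ignores anything after the digits (such as '\n'). */
int parsenumber(const char *buf, int *result)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(buf, &end, 10);
    if (end == buf)
        return 0;               /* no digits were found */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX)
        return 0;               /* value does not fit in an int */
    *result = (int)val;
    return 1;
}
```

Inside getnumberfromuser(), the line "return atoi(tmpbuf);" could then become a call to parsenumber(), returning -1 (or prompting again) when it reports failure.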
[This article was originally posted on August 23, 1996, and again with annotations on January 14, 1997
and March 20, 1998. I have edited the text further for this web page.]
[A few of the points I make in this article are becoming dated. It is becoming less and less kosher to
assume that a function declared without an explicit return type will be implicitly declared as returning
int; or to fail to return a value from a non-void function. In fact, the new revision of the ANSI/ISO C
Standard ("C99") actively disallows the first of these, and at least partially the second.]
Newsgroups: comp.lang.c
From: scs@eskimo.com (Steve Summit)
Subject: Re: int main() ???? wrong?
Message-ID: <DwLyC9.5w5@eskimo.com>
References: <321b9857.75314178@news2.microserve.net>
<321cc6d4.152763064@news2.microserve.net>
Date: Fri, 23 Aug 1996 20:31:21 GMT
In article <321b9857.75314178@news2.microserve.net>, sidlip@lip.microserve.com writes:
> I see alot of reference to int main () being incorrect syntax..
> Ive gotten into the habit of doing it and would like to know what
> errors could pop up in my code from not voiding main.
I assume you mean, "what errors could pop up from not declaring main correctly." Speaking generally,
the same sorts of errors could pop up as occur any time a global variable or function is misdeclared.
Turning for a moment from main() and void to ints and doubles, let's consider the (incorrect)
code
#include <stdio.h>
extern int sqrt();
main()
{
    printf("%d\n", sqrt(144.));
    return 0;
}
This code is clearly trying to compute and print the square root of 144, but it almost as clearly will not
work. It declares that the library function sqrt() returns an int, and it tries to print sqrt's return
value using %d which expects an int. Yet, in actuality, sqrt() of course returns a double. The fact
that sqrt() returns a double, while the caller acts as if it returns an int, will almost certainly
prevent the program from working.
Why won't it work, exactly? It depends on the details of the particular machine's function call and return
http://www.eskimo.com/~scs/readings/voidmain.960823.html (1 of 7) [22/07/2003 5:53:40 PM]
mechanisms. We haven't said which machine we're using, and we shouldn't have to know the details of
these mechanisms (they're the compiler's worry, and to worry about them for us is one of the reasons
we're using a higher-level language like C in the first place), so we can speak only in general terms.
Perhaps the machine has one general-purpose register which is designated as the location where
functions return values of integer types, and one floating-point register which is designated as the
location where functions return values of floating-point types. If so, sqrt() will write its return value
to the floating-point return register, and the calling code will read a garbage int value from the
general-purpose return register. Or, perhaps integer and floating-point returns use the same locations. sqrt()
will write a floating-point value to the location, but the calling code will incorrectly interpret it as an
integer value. You'd then get approximately the same result as you'd get from executing
double d = 12.;
int *ip = (int *)&d;
printf("%d\n", *ip);
In both cases, the bit pattern of the floating-point number is interpreted as if it were an integer. Since the
bit-level representations of integer and floating-point data are invariably different, confusing and
inaccurate answers result. (Note that in neither case -- the incorrect declaration of sqrt(), nor the
int/double pointer game -- does the compiler emit any code to do double-to-int conversion. Any
such conversions have been effectively circumvented by the peculiarities of these two code fragments. In
the case of calling sqrt(), you'd get a proper double-to-int conversion only if sqrt() were
properly declared as returning double, and the return value were assigned or cast to an int.) Finally,
suppose that values are returned on the stack, but that sizeof(double) is greater than
sizeof(int) (as is usually the case). sqrt() will push a double-sized result on the stack, but the
caller will pop an int-sized one. Not only will the caller print a garbage answer (interpreting those bits
it did pop as if they represented an int), but the shards of the double remaining on the stack could
screw later uses of the stack up enough that the program could crash.
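The difference between reinterpreting a bit pattern and converting a value can be sketched as follows. The function names are hypothetical, and I have used memcpy() rather than the pointer cast shown above, because reading a double through an int pointer also runs afoul of the language's aliasing rules. The sketch assumes that sizeof(int) is no larger than sizeof(double), which holds on the usual machines but is not guaranteed.

```c
#include <string.h>

/* Reinterpretation: copy the first sizeof(int) bytes of the double's
   representation into an int.  No conversion code runs; the result
   depends on the machine's floating-point format and byte order. */
int reinterpret_as_int(double d)
{
    int i;

    memcpy(&i, &d, sizeof i);
    return i;
}

/* Conversion: the compiler knows the source type is double, so it
   emits code to convert the value, discarding any fractional part. */
int convert_to_int(double d)
{
    return (int)d;
}
```

Given 12., convert_to_int() yields 12 everywhere, while reinterpret_as_int() yields whatever the first few bytes of the double's representation happen to spell as an int.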
In the preceding example, the calling code was incorrect, because it misdeclared the return value of the
sqrt() function. We're not allowed to choose what we want the return value of sqrt() to be, because
neither the defined return type of sqrt() (as fixed by the Standard and long practice) nor the actual
return type of sqrt() (as implemented in the library provided with our compiler) is under our control.
Issuing an external declaration for sqrt() declaring that it returns an int does not make it return an
int. (Nor does the misdeclaration instruct the compiler to convert sqrt's return value from double to
int; if the declaration is wrong, how could the compiler even know that the type to be converted from
was double?) Abraham Lincoln used to ask, "If we call a tail a leg, how many legs does a dog have?"
His answer was "Four -- calling it a leg doesn't make it one."
When we write an implementation of main() with a return type of void, the situation is similar, but
the roles are reversed. Now, it is the caller that is fixed and beyond our control, and that caller is
assuming that main() returns an int. You may imagine that somewhere (it's actually in the compiler
vendor's source for the C run-time library) there is some code which looks something like this:
Yes. More precisely, main is supposed to return an int; what it actually returns is what you declare it
to return (and what your return statement(s), if any, say it returns). But if you implement main as
returning something other than what its caller expects it to return, you run the risk of its not working
correctly.
In article <321cc6d4.152763064@news2.microserve.net>, sidlip@lip.microserve.com goes on:
> Im curious why do i see void main(void) so much?
I honestly don't know.
The first few times I ever saw void main(), it was clear that the programmer was trying to avoid
warnings. The program
#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("Hello, world!\n");
    exit(0);
}
does not return a value from main(), but it doesn't matter, because control flow never falls off the end
of main(), because exit() never returns. But the compiler usually doesn't know that exit() doesn't
return, and may warn that "control reaches end of int-valued function without returning a value." In the
program
#include <stdio.h>

int main()
{
    printf("Hello, world!\n");
}
the compiler is likely to issue the same warning, and here it's perfectly appropriate, but the programmer
may not care, if the program isn't ever to be executed in an environment where the exit status matters. In
either case, declaring main() as void effectively shuts off that particular warning message, but at the
cost of making the program incorrect. (Indeed, some compilers will issue other warnings when main()
is misdeclared.)
It's because of this particular argument that the question currently known as 11.12 in the FAQ list is
worded the way it is, but this line of reasoning is not sufficient to explain void main()'s
unaccountable popularity today. It's become some kind of a meme plague: the "popular" textbooks all use
it, so hordes of unsuspecting C programmers learn it, and some of them go on to teach classes or write
more books which then spread the virus. In fact, my initial supposition (outlined above) of why people
might write void main() is now not only insufficient, it's almost completely forgotten! Today, when
people who use void main() are asked why they do, none of them ever mention shutting off
warnings about main() not returning a value. They all mumble something like "but everyone else uses
it" or "but it works on my compiler" or (particularly in the case of misguided teachers and textbook
authors) "that way we don't have to introduce the full complexity of return types right at first." (This last
justification is particularly specious, as we'll see.)
> what exactly is it saying? and why does it work ?
It tends to work (even though it doesn't have to and is not guaranteed to) for a combination of three
reasons:
1. Once upon a time, C did not have the void type. Functions which did not return values were
declared (usually implicitly) as returning int, but did not actually return values, and their callers
did not use the (nonexistent) return values. That's why, even today, the ANSI Standard does not
flatly prohibit falling off the end of a non-void function without executing a return statement,
or issuing an expressionless return statement within such a function. (The behavior is undefined
only if the caller tries to use the value, and never requires a diagnostic. Good compilers will emit
warnings.) In the old days, it was up to the programmer to remember which functions returned
values and which didn't, although this was of course one of the things which lint checked (and,
for those few who use it, still checks).
2. Partly because of #1, the actual low-level calling mechanisms for void-valued and int-valued
functions are usually not different. A void-valued function simply omits to place a value
in the designated return value location, and the caller doesn't try to fetch a value from that
location. In these cases, the worst thing that happens when a function is incorrectly declared as
void is that the caller gets a garbage return value. Misdeclaring main (or any function) as void
will typically only result in catastrophic failures if values are returned on the stack: if the caller
pops a value but the called function didn't push one, the caller may instead have popped something vital
from the stack, such as a frame pointer or return address, such that when the caller later returns,
chaos results.
3. Since so much code (including the examples in some compiler manuals!) does misdeclare
main() as void, compiler vendors pretty much have to arrange that it works. (Alas, this
strategy is an "enabling" one; it allows the users of void main() to perpetuate their
rationalizations.) Posters to comp.lang.c can arrogantly state that "Your code is broken. Fix it.",
but compiler vendors who believe that "the customer is always right" cannot (particularly if the
vendor's offerings to date have permitted void main()).
(As an aside, these three reasons conspire to require nonstandard "pascal" keywords in compilers for
environments where code written in C must call libraries written in Pascal, or vice versa. I gather that
many Pascal compilers do use different calling conventions for procedures and functions. If C compilers
didn't have to coddle broken code by making void-valued and int-valued functions compatible, they
could arrange that void-valued functions were completely compatible with Pascal procedures. Instead, a
[ Addendum: Despite the length of this article, it only barely gets around to explaining how, precisely,
declaring main() as void could fail. Besides the problems of a garbage exit status being received by
the calling environment, a more serious failure could occur if the compiler used different calling
conventions for void- and int-valued functions. If so, then for main() to push no return value on the
stack, and for the caller to pop an int value that main() hadn't pushed, could conceivably "screw later
uses of the stack up enough that the program could crash."]
[This article was originally posted on March 1, 1999. I have edited the text slightly for this web page.]
From: scs@eskimo.com (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: "void main" is a lifestyle choice
Date: 1 Mar 1999 17:16:39 GMT
Message-ID: <7bei1n$tsl$1@eskinews.eskimo.com>
References: <36D9A98B.56C1@customized.removethis.com>
<36D9B89F.287C@customized.removethis.com> <7bco1u$cjd$1@eskinews.eskimo.com>
<36DA149D.757D@customized.removethis.com>
In article <36DA149D.757D@customized.removethis.com>, Sven writes:
>Steve Summit wrote:
>> Ben went a bit too far. Yes, to keep your compiler or lint
>> from complaining, you will probably need a "return 0" statement.
>> Yes, this is a statement that may never be executed, inserted
>> just to keep everyone happy. Yes, inserting it will bloat your
>> executable by a few bytes. All of that is okay in this case --
>> really. If nothing else, it'll most certainly cost you less time
>> than arguing about it here.
>
> I made a point and I think you agree with it. I am not worried
> about code bloat or lost time for argument. Can't I raise an
> objection to an ANSI decision?
As a practical matter, no, you can't. That part of the Standard is very solidly set in stone. Furthermore,
like almost all aspects of the Standard, it codifies existing practice, which has always been that main has
been presumed to return int.
> Please keep this discussion in the academic/language-theory realm
> it was intended. If you disagree with me, then I expect a cogent
> argument in favour of "int main" and why it was chosen.
Unfortunately I don't have time for a full-blown argument this morning; perhaps another day. But
consider this: any interface specification must be adhered to by both sides unless renegotiated. You have
no more freedom to choose your own return type for main than you do for matherr, nor for the
functions passed as the last argument to qsort or bsearch, nor for the functions passed to atexit,
nor for the functions passed as the second argument to signal. Somebody else's code, code over which
you have no control, will be calling all of these functions, and unless you are prepared to get all compiler
and RTL authors to simultaneously rewrite their code to support your revised interface specification, and
to get all authors of C code in the world to simultaneously rewrite their code to work with the revised
compilers and RTLs, you're simply going to have to write this one off as one of life's little miseries, and
/* ... */
/* ... */
/* ... */
/* ... */
/* ... */
at the end of main(). That's the sort of thing I do whenever I'm forced to comply with an inept interface
specification which I don't have time to fix. This accomplishes your purpose: you have written code
which is correct and which compiles without warning or error, and you have stated for the record your
objection to the way you were forced to write it. Then, as I said, move on.
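The point about interface specifications may be easier to see with qsort than with main, since there the calling code is at least documented. Here is a sketch; compare_ints and sort_ints are names I made up for the example. The comparison function has to accept two const void * arguments and return int, because that is how the code inside qsort, which we didn't write, is going to call it.

```c
#include <stdlib.h>

/* qsort dictates this signature: two const void * in, int out.
   We are free to choose the body, but not the types. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);   /* negative, zero, or positive */
}

/* Hypothetical convenience wrapper around the library call. */
void sort_ints(int *values, size_t n)
{
    qsort(values, n, sizeof values[0], compare_ints);
}
```

Declaring compare_ints to take two int * arguments might read more naturally, but it would no longer match what qsort expects. main is in exactly the same position with respect to the run-time start-up code.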
I see that I haven't quite answered your question. Why was int chosen instead of, say, void? There are
two explanations. main has to have a single return type; there is no overloading in C. Some programs do
want to return an exit status from main and some don't, so one group or the other is sure to be
disappointed.
I was about to claim that it's easier for programs which don't want to return to comply with the dictum
that main must return int than it would be for programs that do want to return to deal with a world in
which ANSI specified that main was declared void, but I guess I might have gotten that backwards. In
a hypothetical world where main had to be declared void, programs which wanted to return an exit
status at the top level could simply call exit instead of returning. (This would wreak havoc with
programs which call main recursively, of course, but they're so rare they can almost be ignored.) So I
guess I have to fall back on the second explanation.
Once upon a time, C had no void type. Functions which didn't want to return anything just left out the
expressions in the return statements and left the return type out of function declarations. But the
default for a function with an omitted return type was always (until C99 came out) int. Ergo, in the
days before void, programmers who didn't want to return a value from main (and who thought they
were declaring main as returning nothing) were declaring main as int. Pre-ANSI, pre-void practice
was unanimously for main to be declared as returning int, so there was no other possible choice for
ANSI to standardize.
>> Other options -- depending on your compiler or lint -- are to
>> tell your compiler or lint that your program's main function
>> (that is, the one called from main that does all the work and has
>> a call to exit buried somewhere down within it) never returns,
>> or to insert some kind of a "notreached" directive right after
>> the call to that function in main(). Either of these options
http://www.eskimo.com/~scs/readings/voidmain.990301.html (2 of 5) [22/07/2003 5:53:45 PM]
>> (if available) may serve to suppress the "control reaches end of
>> int-valued function without return" message, without polluting
>> your code with a statement which will never be executed.
>
> Now we're talking about non-portable work-arounds to legitimately
> generated warnings and deliberately inserted unreachable code.
> Are you serious?
Yes.
Let me ask you this: what do you do when you have a function with an imposed, unchangeable signature
but which doesn't happen to need one of its parameters? Many compilers will warn you that "argument x
is unused in function f". Many times, this is an appropriate warning. Sometimes it is not. How do you
cleanly shut it off? (I honestly don't know. Right now, I'm ignoring it in a fair amount of production code
I'm compiling with gcc. Ten years ago, I would have inserted an /* ARGSUSED */ comment to
silence lint, but that's a nonstandard extension. If I finally get so sick of the warnings that I decide to
do something about it, I'll probably insert some dummy code which seems to do something with x, but
then that's wasted code which pollutes my clean program and accomplishes nothing at run time and
exists only to coddle various compiler warnings and interface specifications at compile time.)
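For what it's worth, the "dummy code" many C programmers settle on is a cast of the parameter to void. A sketch follows, using a signal handler because its signature is imposed by signal(). The names on_interrupt, interrupted and was_interrupted are hypothetical. Whether a particular compiler or lint counts the cast as a use is up to that tool; the Standard makes no promise about warnings either way.

```c
#include <signal.h>

static volatile sig_atomic_t interrupted = 0;

/* signal() requires handlers of type void (*)(int), so the parameter
   must be declared even though this handler has no use for it. */
void on_interrupt(int signum)
{
    (void)signum;       /* evaluate and discard: no effect at run time */
    interrupted = 1;
}

/* Lets the rest of the program ask whether the handler has run. */
int was_interrupted(void)
{
    return interrupted != 0;
}
```

The handler would be installed with signal(SIGINT, on_interrupt). The cast expression does nothing useful, which is exactly the complaint made above, but it is short and it is standard C.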
>>> Remember that lint or compilers set to their highest warning levels
>>> don't understand program flow.
>>
>> Actually, many of them do.
>
> They can not. Program flow is not deterministic. A compiler can not
> determine ahead of time whether a function called in main will ever
> return control to main.
Yes, Sven, I've heard of the halting problem, too.
Many compilers know, semantically, that a call to a function named exit() never returns. Based on
this information, when faced with the code
int f()
{
    exit(0);
}
they will suppress their normal warning about "control reaches end of non-void function without
return". Furthermore, some of these compilers allow you to specify (via some #pragma or
nonstandard declaration) that certain of your functions do not return, either.
My point is that these compilers do have a notion of control flow, a notion which is based not on
exhaustive analysis but on hints which the compiler has been given. Furthermore, since it's this very
same, weak form of control flow "analysis" that is driving the compiler's determinations that a "statement
is not reached" or a "variable is set but not used" or that "control reaches end of non-void function
without return", it is not inappropriate to be tying in with that weak control-flow analysis mechanism
when you're trying to suppress one of the warnings based on information which you have but which the
compiler cannot otherwise know. (But no, since all of this is compiler-specific, there is no Standard or
portable way to supply the hints or otherwise tie in with the mechanisms.)
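Since this was written, the 2011 revision of the C Standard (C11) has added a function specifier, _Noreturn, for supplying exactly this kind of hint. The sketch below assumes a compiler with C11 support; fatal and sign_or_die are hypothetical names. The Standard still does not say which warnings a compiler must or must not issue, so the effect on diagnostics remains up to the compiler.

```c
#include <stdio.h>
#include <stdlib.h>

/* C11: _Noreturn promises the compiler that fatal() never returns. */
_Noreturn void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Control can get past the last if statement only by calling fatal(),
   so a compiler that honors _Noreturn has no reason to complain that
   control reaches the end of a non-void function. */
int sign_or_die(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    fatal("zero has no sign");
}
```

This is still a hint rather than analysis: the compiler takes the declaration's word for it that fatal() never comes back.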
>> "They" could not have left it as void, because it never was void.
>> The defined return type of main has always been int. main does
>> have to be a function returning int, because that's what the
>> calling code (remember, there is some code, namely in the
>> program's run-time start-up, that is going to call main) expects
>> it to be.
>
> Perhaps what I'm asking for is bona fide support for ANSI convention in
> the form of a prototype for "main" - maybe it should be in <stdlib.h>.
That would be wonderful, and it might help to put this eternal debate to rest. Alas, it's impossible,
because there is no one prototype for main -- it's either
int main(void);
or
int main(int, char **);
This is one of the larger warts on the ANSI/ISO C Standard, and it's there (like all the others) to codify
existing practice.
> Until then I can code my main function anyway I want to, despite
> ANSI wishes or the yelling to the contrary from this newsgroup.
Sure. And you can write a[i] = i++, or cross the street after the DON'T WALK sign has started
flashing, or take 13 items through the "12 items or less" aisle at the supermarket, or do any of a million
other things that sit squarely in the grey area between legality and illegality and which you might or
might not get called on.
Look, don't get me wrong. I salute your wish to write clean programs which express the programmer's
intent. I salute your wish not to pollute your code with nonstandard directives or meaningless or
unreachable code. Both of these are goals which I have for my own work as well. But I urge you to pick
your battles. (Reinhold Niebuhr's serenity prayer is applicable, too.) The situation with respect to main's
return status is not perfect, but it is a comparatively minor issue, and you will not have forsaken your
quest to write good, clean code by conceding defeat on this one. When you design your own, perfect
language (no sarcasm here, either; this is what I'm doing, too) you can fix the situation with respect to
main's return value once and for all.
(Also, I might explain or apologize for some of the other responses you've gotten in this thread. We're
not as stupid as you think we are, but you're new here, I think, and you've taken a fairly argumentative
and intolerant tone. It looks like some of your respondents have dealt with you rather summarily, not
noticing where you're coming from underneath.)
Steve Summit
scs@eskimo.com
P.S. No, matherr isn't standard, either.
Elegant
[In January, 1998, in my ISP's general discussion newsgroup, we were discussing software and the
marketplace, and I mentioned that all too many people don't even know what they're missing. Someone
asked what I meant, and this was my response. I have added to it slightly since then.]
From: scs@eskimo.com (Steve Summit)
Newsgroups: lobby
Subject: Re: software elegance (was: Root for the Indians this time?)
Date: 17 Jan 1998 17:56:38 GMT
Lines: 173
Message-ID: <69qrcm$ben$1@eskinews.eskimo.com>
References: <34b7e300.18628677@news.eskimo.com> <69bncf$e0v$1@eskinews.eskimo.com>
<69o1ib$qjj$1@eskinews.eskimo.com> <34c057b5.9765222@news.eskimo.com>
In article <34c057b5.9765222@news.eskimo.com>, wayneld@eskimo.com writes:
> On 16 Jan 1998 16:23:39 GMT, scs@eskimo.com (Steve Summit) wrote:
> > -- know what they're missing in terms of software elegance.
>
> What exactly is the defintion here of "software elegance?"
Why, vague and unstated, of course; otherwise how could we have eternal pointless arguments about it? :)
Elegant software is compact and efficient. Mundane software is bloated and slow.
Elegant software has features that people need and use. Mundane software has features that may look
good on paper and impress magazine editors, but which few people use in practice and which serve only
to add to the bloat (and the bug count).
Elegant software lets you do what you want to do. Mundane software lets you do what you want to do, as
long as you do it its way.
Elegant software does one job and does it well. Mundane software tries to do two or more jobs, and/or
does its job(s) clumsily (or sometimes does no job at all).
Elegant software has a clean model for the data it works with and the operations it performs on it; this
model is easy to understand and when you do, you find that it matches the way you think about your data.
Mundane software either has no well-defined model at all, or has one so bizarre that only its programmers
comprehend it, leaving you largely in the dark, forced to perform operations by rote, without
understanding them and powerless to adapt (to new situations or in the face of simple mistakes, either
yours or the software's).
Elegant software lets you do things its designers never imagined. Mundane software lets you do only
http://www.eskimo.com/~scs/readings/software_elegance.html (1 of 4) [22/07/2003 5:53:47 PM]
Elegant software is programmable (when you're ready for it), and can be used by other programs.
Mundane software (as already mentioned) can perform only those functions which its designers explicitly
thought of, and none of those functions can ever be automated, because they depend heavily on visual,
graphical, user interactions.
Elegant software conforms (where applicable) to open, industry standards for file formats, networking
protocols, and the like. Mundane software invents its own formats and protocols, and depending on the
nerve of its vendor, attempts to foist them upon the industry as closed standards.
Elegant software lets you "pop the hood" and inspect your data or work with it in ways that the program
can't, i.e. makes it possible for you to access your data with external tools. Mundane software stores your
data in inscrutable formats which you can't use standard tools to access, and may actively prevent you
from even trying, in part by imposing a software license which forbids reverse engineering.
Elegant software is backwards and forwards compatible with past and future versions of itself. Mundane
software might be able to read one or two of its previous versions' formats, but it can't write them, and it
crashes if it tries to read one of its future versions' formats. (For that matter, elegant software uses
open-ended data formats which rarely if ever need to be supplanted, while mundane software gratuitously
changes its data formats with every release whether it needs to or not.)
Elegant software is relatively free of bugs, and those that it has are minor, or off in corners where hardly
anyone notices them, and are usually easy enough to work around. Mundane software is full of glaring
bugs, many of which cause unrecoverable crashes if the software is used to do anything its designer didn't
imagine, or is ever used outside the one narrow usage pattern which the designer had in mind and tested for.
When it happens to contain a bug serious enough to cause a crash, elegant software takes care to save
your data even as it's crashing. When mundane software crashes, it not only loses (or subtly garbles) your
data, it may also (if running under a mundane operating system) take other applications down with it, or
crash the operating system itself for good measure.
Elegant software works automatically on all machines you would expect it to, with all peripherals you
would expect it to, including those which haven't been built yet. Mundane software either works only
with the one specific hardware configuration that its designer used and tested for, or has been laboriously
tuned to work with a list of 437 existing hardware configurations which does not include yours.
Elegant software is general-purpose. Mundane software is unnecessarily specialized.
Elegant software assumes that you are intelligent and can learn, and that you are interested in using a
computer to save time, automate repetitive tasks, and eliminate busywork. Mundane software tries to do
everything for you (presuming that you can't), but ends up requiring you to specify everything you want
to do in minute, tedious, time-consuming, mind-numbing detail.
Elegant software is written by programmers who have their users in mind, who listen to user feedback,
who are rewarded for their creativity, who code with an eye to demonstrable, lasting correctness, and who
are constantly searching for new and better ways to write interesting programs with fewer bugs. Mundane
software is written by programmers who have themselves in mind as users, who try to tell other users
how to think like they do, who are rewarded for the number of lines of code they write per day or for how
well they're able to implement specs delivered from on high, who are happy with a program if it seems to
work today, who would like best to keep writing programs exactly as they wrote them yesterday, and who
regard bugs as inevitable.
Elegant software is a joy to use. Mundane software is a pain in the ass.
Steve Summit
scs@eskimo.com
Index of /~scs/readings

Name                      Last modified        Size   Description
Parent Directory          19-Jun-2003 11:00    -
index2.html               27-May-2001 12:53    2k
precvsooe.960725.html     15-Jul-2003 07:22    21k
precvsooe.960728.html     12-May-2001 08:10    13k
recursion.990209.html     09-Nov-2001 07:35    15k
software_elegance.html    03-Sep-2000 10:20    9k
strings.980926.html       05-Dec-1999 10:00    10k
undef.950311.html         03-Jan-1997 22:57    6k
undef.950321.html         03-Jan-1997 21:49    20k
undef.981105.html         26-Mar-1999 08:40    11k
voidmain.960823.html      30-May-2000 21:12    18k
voidmain.990301.html      30-May-2000 21:10    12k
Recursion
[The text here is extracted from a pair of articles originally posted on February 9, 1999.]
From: scs@eskimo.com (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: Other examples of recursion?
Date: 10 Feb 1999 04:52:03 GMT
Message-ID: <79r39j$4vs$1@eskinews.eskimo.com>
Here is one way to think about recursion.
Think about where you live, and think about the last time it was really messy, and you asked your
roommate or spouse or significant other to help clean up. If you're unlucky, they picked up one totally
obvious but well-constrained and easy-to-pick-up piece of trash, and then decided that they'd done
enough, and curled up on the couch with a beer to watch a football game while you did all the rest of the
hard work. (Or, if your roommate/spouse/significant other is the unlucky one, perhaps you pulled this
stunt on them. It doesn't matter what the roles were, the point is that there was a totally unfair division of
labor.)
Shifting from housecleaning to computer programming, let's imagine it's our job to take a number (in
binary, say) and print out the digits of its base-ten representation in conventional left-to-right order. If
you've never attacked it before, this is an at least slightly tricky problem. You can divide by 10 to get
individual digits, but of course dividing by 10 gives you the rightmost digit first, so if you're not careful,
you end up with the digits in the wrong order.
But let's stick with this approach a bit longer, and see if we can get anywhere. We can start with the
original number, take the remainder when dividing it by 10, and print that digit. Then all we have to do is
figure out how to print the remaining digits, first. In pseudocode, we have something like
    printdigits(int n)
    {
        int d;
        print all but the last digit of n;
        d = n % 10;
        print digit d;
    }
Printing one digit is easy -- the digit characters '0' to '9' are certain to have consecutive values in any
sane character set (and the ANSI/ISO C Standard mandates the choice of a character set that is sane in
this regard), so converting an integer d which is known to be in the range 0 to 9 simply requires adding
that integer as an offset to the character constant '0'. Applying that technique, and deferring the other
vaguely-defined part of the problem to a subfunction, we have a function which is no longer pseudocode,
but C:
    void printdigits(int n)
    {
        int d;
        print_all_but_the_last_digit_of(n);
        d = n % 10;
        putchar('0' + d);
    }
Notice how much like the lazy roommate this code is. It does the falling-off-a-log easy part, and lets
somebody else (namely, the hypothetical function print_all_but_the_last_digit_of) do all
the hard work. Moreover, it successfully pulls off an additional bit of brazen audacity which not even the
laziest roommate could get away with: it lets the somebody else do the hard part first, then waltzes back
from the couch and does the easy bit last, just before the final curtain falls and the applause begins. What
cheek!
Actually, we've got one piece of cheekiness left. We've still got to figure out how to print all but the last
digit of n, and of course we've got to print it as a string of left-to-right digits. So it looks as if we haven't
gotten anywhere -- (no surprise, right, with a roommate this lazy and unhelpful!) -- the
print_all_but_the_last_digit_of() function is going to be exactly as hard to write as
printdigits() was; the "work" that's been done (or that will have been done) so far hasn't helped us
out at all.
But wait -- printing out all but the last digit of n, in left-to-right-order, is exactly the same problem as
printing out the digits of the quantity n/10 in left-to-right order, where we use integer arithmetic to
compute n/10 and throw away the remainder. So we now have a glimmering idea of a staggeringly
audacious move which will make our earlier sloth seem virtuous by comparison: what if we call
ourselves to do the hard part, then finish up and be done with it? That is, what if we write
    void printdigits(int n)
    {
        int d;
        printdigits(n/10);
        d = n % 10;
        putchar('0' + d);
    }
where we've replaced the call to the unwritten function print_all_but_the_last_digit_of()
with a call to the function printdigits(), which is the function we were supposed to write in the
first place? (A function which in fact we've already "written", except that our attempt so far seemed
laughably incomplete and not at all functional...) We can't possibly get away with this, can we? This is
like trying to become rich by borrowing a dollar from yourself one million times so that you have
$1,000,000, or inventing a time machine by inventing a time machine and then going back in time and
giving it to yourself, or attempting to fly by pulling yourself up by your own bootstraps, or trying to
clean your room by asking your little brother to do it, except that you're an only child, and the "little
brother" you're asking is just your reflection in a mirror you're holding behind you, in which the
ephemeral little brother you're asking to do the work is simultaneously asking an even-more-ephemeral
little brother standing behind him, who is asking the reflection behind him, ad infinitum...
Remarkably enough, we can do this, as long as we back off from our stance of utter sloth by one little
notch. In the case when the number we're printing has only one digit, we don't need to delegate any of the
work. (Perhaps this decision only cements our reputation as a glory-hogging layabout: for one-digit
numbers, we take all the credit.) At any rate, the final implementation looks like this:
    void printdigits(int n)
    {
        int d;
        if(n > 9)
            printdigits(n/10);
        d = n % 10;
        putchar('0' + d);
    }
The additional test breaks the infinite recursion; the function won't call itself forever. It will call itself as
many times as there are digits in the number, and the deepest recursive call will print the leftmost digit
first, and on the way back, each successive invocation will append another to the right. If you're having
any trouble seeing this, try rewriting the function as
    void printdigits(int n)
    {
        int d;
        printf("printdigits(%d)\n", n);
        if(n > 9)
            printdigits(n/10);
        d = n % 10;
        printf("printing digit %c\n", '0' + d);
    }
When called with a number such as 12345, you'll first see a flurry of recursive calls, as each invocation
madly tries to pawn the job off on somebody else, until finally there's only one digit left, and the
innermost recursive call condescends to do some actual work, and the logjam is broken, and the
containing calls chip in their pieces in succession, and the job is done, in a symphony of concerted
laziness.
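If you'd like to experiment with the final version, here is a minimal, self-contained sketch. It assumes the number is nonnegative (hence unsigned int). The helper format_digits() is hypothetical and not part of the original posting: it uses exactly the same recursion, but stores the digits in a caller-supplied buffer instead of printing them, which makes the result easy to check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: store the decimal digits of n in buf, starting at
   index pos, and return the index just past the last digit stored. */
static size_t digits_to_buf(unsigned int n, char *buf, size_t pos)
{
    if (n > 9)
        pos = digits_to_buf(n / 10, buf, pos);  /* hard part first: all but the last digit */
    buf[pos] = (char)('0' + n % 10);             /* easy part last: the rightmost digit */
    return pos + 1;
}

/* Hypothetical wrapper: format n as a null-terminated string in buf and
   return the number of digits. The caller must supply a large enough
   buffer (11 bytes suffice for a 32-bit unsigned int). */
size_t format_digits(unsigned int n, char *buf)
{
    size_t len;

    assert(buf != NULL);
    len = digits_to_buf(n, buf, 0);
    buf[len] = '\0';
    return len;
}

/* The same recursion as in the text, printing directly to stdout. */
void printdigits(unsigned int n)
{
    if (n > 9)
        printdigits(n / 10);
    putchar((int)('0' + n % 10));
}
```

Calling printdigits(12345) prints 12345, and format_digits(12345, buf) leaves the string "12345" in buf and returns 5. Note that 0 needs no special case: it has one digit, so the n > 9 test is false and the single digit is emitted directly.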
[Figure: a binary tree containing the letters A through K, with F at the root and B at the top of F's left subtree.]
This is not the only binary tree containing the letters A through K; there's quite a number of different
ways of arranging it. But note that, for any letter, all the letters in the subtree below and to the left of it
come before it in the alphabet, and all the letters in the right subtree come after. (The top item in a node's
left subtree is not necessarily immediately less than the item at that node or anything; the
immediately-preceding item is merely down in the left subtree somewhere, along with all the rest of the preceding
items. In the tree I've just shown, for example, the letter F appears at the top of the tree although it's
obviously neither the first nor the last letter in the list. The letter B which is at the top of F's left subtree
does not immediately precede F; the letter immediately preceding F is E, which is indeed down in F's left
subtree somewhere.)
We could represent one node in such a tree with an instance of the structure
    struct tnode
    {
        char letter;
        struct tnode *left;
        struct tnode *right;
    };
To be clear: that's one structure describing one node; for a tree with many nodes we'd have many
instances of the structure. (We'd probably allocate them dynamically using malloc.)
This structure may look dangerous -- if we're trying to describe a struct tnode, what are those little
struct tnodes doing inside it? They're pointers to the node's left and right subtrees, and the reason
they're okay is precisely that they are merely pointers; they are not complete struct tnodes. (If
every struct tnode tried to contain one or more complete struct tnodes, the structure would
obviously need to be infinitely big.) It's perfectly acceptable (and quite common) for a structure to
contain pointers to other instances of itself; this is precisely how lists, trees, and all sorts of other linked
data structures are commonly implemented in C.
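As a rough sketch of how such nodes might be allocated with malloc and linked together, consider the following. The functions newnode(), insert(), build_tree(), and freetree() are hypothetical names introduced here for illustration; they are not from the original posting. The sketch assumes letters are kept in alphabetical order (smaller to the left, larger to the right) and that duplicates are simply ignored.

```c
#include <assert.h>
#include <stdlib.h>

struct tnode
{
    char letter;
    struct tnode *left;
    struct tnode *right;
};

/* Allocate one node holding `letter`, with no subtrees yet.
   Returns NULL if malloc fails. */
struct tnode *newnode(char letter)
{
    struct tnode *p = malloc(sizeof *p);
    if (p != NULL) {
        p->letter = letter;
        p->left = NULL;
        p->right = NULL;
    }
    return p;
}

/* Insert `letter` into the tree rooted at p and return the (possibly new)
   root. If allocation fails, the letter is silently dropped. */
struct tnode *insert(struct tnode *p, char letter)
{
    if (p == NULL)
        return newnode(letter);
    if (letter < p->letter)
        p->left = insert(p->left, letter);
    else if (letter > p->letter)
        p->right = insert(p->right, letter);
    return p;
}

/* Build a tree by inserting each character of `letters` in turn. */
struct tnode *build_tree(const char *letters)
{
    struct tnode *root = NULL;

    assert(letters != NULL);
    for (; *letters != '\0'; letters++)
        root = insert(root, *letters);
    return root;
}

/* Release every node: both subtrees first, then the node itself. */
void freetree(struct tnode *p)
{
    if (p == NULL)
        return;
    freetree(p->left);
    freetree(p->right);
    free(p);
}
```

With this sketch, build_tree("FBJACGK") yields a tree with F at the root, B at the top of its left subtree, and J at the top of its right subtree. The shape depends on insertion order, which is why the first letter inserted ends up at the root.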
Suppose it's our job to print one of these binary trees. We've just been handed a pointer to the base (root)
of the tree. What do we do? The only node we've got easy access to is the root node, but as we saw, it's
not a very interesting node (i.e. not the first or the last element or anything); it's generally a random node
somewhere in the middle of the eventual sorted list (distinguished only by the fact that it happened to be
inserted first). The node that needs to be printed first is buried somewhere down in the left subtree, and
the node to print just before the node we've got easy access to is buried somewhere else down in the left
subtree, and the node to print next (after the one we've got) is buried somewhere down in the right
subtree. In fact, everything down in the left subtree is to be printed before the node we've got, and
everything down in the right subtree is to be printed after. A pseudocode description of our task,
therefore, might be
print the left subtree (in order)
print the node we're at
print the right subtree (in order)
How can we print the left subtree, in order? The left subtree is, in general, another tree, so printing it out
sounds about as hard as printing an entire tree, which is what we were supposed to do in the first place.
In fact, it's exactly as hard: it's the same problem. Are we going in circles? Are we getting anywhere?
Yes, we are: the left subtree, even though it is still a tree, is at least smaller than the full tree we started
with. The same is true of the right subtree. Therefore, we can use a recursive call to do the hard work of
printing the subtrees, and all we have to do is the easy part: print the node we're at. The function might
look like this:
    void treeprint(struct tnode *p)
    {
        if(p->left != NULL)
            treeprint(p->left);
        printf("%c\n", p->letter);
        if(p->right != NULL)
            treeprint(p->right);
    }
In any recursive function, it is (obviously) important to terminate the recursion, that is, to make sure that
the function doesn't recursively call itself forever. In the case of binary trees, the fact that the left and
right subtrees are smaller (than the tree containing them, that is) gives us the leverage we need to make a
recursive algorithm work. When we reach a "leaf" of the tree (more precisely, when the left or right
subtree is a null pointer), there's nothing more to visit, so the recursion can stop. It turns out that we can
test for this in two different ways, either before or after we make the "last" recursive call. In the code just
above, we tested before the call. Here's the other way to do it:
    void treeprint(struct tnode *p)
    {
        if(p == NULL)
            return;
        treeprint(p->left);
        printf("%c\n", p->letter);
        treeprint(p->right);
    }
Sometimes, there's little difference between one approach and the other. Here, though, the second
approach has a distinct advantage: it will work even if the very first call is on an empty tree. It's
extremely nice if programs work well at their boundary conditions, even if we don't think those
conditions are likely to occur.
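To see the boundary condition in action, here is a small sketch using the test-on-entry form. The function treecollect() is a hypothetical variant of treeprint (not from the original posting) which stores the letters in a caller-supplied array instead of printing them, so that the visiting order can be checked; the caller is assumed to supply an array large enough for every node.

```c
#include <assert.h>
#include <stddef.h>

struct tnode
{
    char letter;
    struct tnode *left;
    struct tnode *right;
};

/* Visit the tree rooted at p in order, storing each letter at out[n], out[n+1], ...
   Returns the new count of letters stored. An empty tree (p == NULL) stores
   nothing and returns n unchanged. */
size_t treecollect(const struct tnode *p, char *out, size_t n)
{
    assert(out != NULL);
    if (p == NULL)
        return n;
    n = treecollect(p->left, out, n);      /* everything that comes before */
    out[n++] = p->letter;                  /* the node we're at */
    return treecollect(p->right, out, n);  /* everything that comes after */
}
```

Called on a null pointer, treecollect() returns immediately without touching the array. The first version of treeprint, by contrast, would dereference the null pointer in its very first test.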
Another impressive thing about a recursive treeprint function is that it's not just a way of writing it,
or a nifty way of writing it; it's really the only way of writing it. You might try to figure out how to write
a nonrecursive version. Once you've printed something down in the left subtree, how do you know where
to go back up to? Our struct tnode only has pointers down the tree; there aren't any pointers back to
the "parent" of each node. If you write a nonrecursive version, you have to keep track of how you got to
where you are, and it's not enough to keep track of the parent of the node you're at; you have to keep a
stack of all the nodes you've passed down through. (Even if we did have back pointers, it's not clear that
they'd give us enough extra information to work our way through the tree in the correct order.) When you
write a recursive version, on the other hand, the normal function-call stack essentially keeps track of all
this for you.
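For comparison, here is one way the nonrecursive version with an explicit stack might look. This is only a sketch under stated assumptions: treecollect_iter() and MAXDEPTH are hypothetical names, the tree is assumed never to be more than MAXDEPTH levels deep, and the letters are stored in a caller-supplied array (rather than printed) so the order can be checked.

```c
#include <assert.h>
#include <stddef.h>

#define MAXDEPTH 64   /* assumed upper bound on the height of the tree */

struct tnode
{
    char letter;
    struct tnode *left;
    struct tnode *right;
};

/* In-order traversal without recursion. The array `stack` remembers every
   node we have passed down through but not yet visited, which is exactly
   the bookkeeping the function-call stack does in the recursive version.
   Returns the number of letters stored in out. */
size_t treecollect_iter(const struct tnode *root, char *out)
{
    const struct tnode *stack[MAXDEPTH];
    size_t depth = 0;
    size_t n = 0;
    const struct tnode *p = root;

    while (p != NULL || depth > 0) {
        /* Go as far left as possible, remembering each node on the way. */
        while (p != NULL) {
            assert(depth < MAXDEPTH);
            stack[depth++] = p;
            p = p->left;
        }
        /* Nothing further to the left: visit the most recently remembered node... */
        p = stack[--depth];
        out[n++] = p->letter;
        /* ...and then deal with its right subtree in the same way. */
        p = p->right;
    }
    return n;
}
```

It works, but notice how much of the code is bookkeeping, and that it has to pick a maximum depth in advance (or grow the stack dynamically), neither of which the recursive version has to worry about.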
else
}
    int fib(int n)
    {
        static int fibs[24] = {0, 1};
        if(n < 0 || n >= 24)
            return -1;
        if(fibs[n] == 0 && n > 0)
            fibs[n] = fib(n-1) + fib(n-2);
        return fibs[n];
    }
[Both of these examples are lifted from Volume II, Chapter 3 of Macmillan's Handbook of Programming
Languages.]
The fib() implementation uses "memoization", involving static data, to avoid the exponential recursive
explosion which otherwise dooms Fibonacci numbers to being the classic example of when not to use
recursion.
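To get a feel for the size of that explosion, here is a small sketch which counts calls. The function fib_naive() and the two counters are additions for illustration only; fib_memo() is the memoized function above with a counter added, and the naive version assumes a nonnegative argument.

```c
#include <assert.h>

long naive_calls = 0;   /* number of calls made to fib_naive() */
long memo_calls = 0;    /* number of calls made to fib_memo() */

/* The textbook doubly-recursive definition, with no memory of past results. */
int fib_naive(int n)
{
    assert(n >= 0);
    naive_calls++;
    if (n < 2)
        return n;
    return fib_naive(n - 1) + fib_naive(n - 2);
}

/* The memoized version: each value is computed once and then remembered
   in a static table, so later requests for it cost a single call. */
int fib_memo(int n)
{
    static int fibs[24] = {0, 1};

    memo_calls++;
    if (n < 0 || n >= 24)
        return -1;
    if (fibs[n] == 0 && n > 0)
        fibs[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return fibs[n];
}
```

Both functions return 6765 for an argument of 20, but the naive version makes tens of thousands of calls to get there, while the memoized one makes a few dozen the first time and just one on any later request for the same value.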
Steve Summit
scs@eskimo.com
``Useful'' Links
Search engines
Dave Walker's Eskimo Weather Station
UW weather info
UW earthquake info
U.S. Patent & Trademark Office
U.S. Geological Survey (including EROS Data Center, USGS map data online!)
Northwest Folklife festival
The RISKS archive
DeLorme map company
U.S. Library of Congress (search)
Monty Python's Flying Circus
Unicode
NOAA Coastal & Estuarine Oceanography Branch, including tide predictions
U.S. Naval Observatory Directorate of Time
Transit and Transportation info
data file formats
cryptography, authentication, privacy, and anonymity links
Search Engines
Transportation info
Northeast
Other
Wotsit
James D. Murray's Graphics File Formats FAQ list (also published as the Encyclopedia of
Graphics File Formats)
The Audio File Formats FAQ (currently maintained by Chris Bagwell; originally by Guido van
Rossum)
.dbf files
Related Links
Home Pages
Personal:
Other:
Browser-specific home pages (the ones you get with default-configured browsers)
Eskimo North
RSA
Federal Express
U.S. Naval Observatory Directorate of Time
Full featured Linux friendly Internet dial and ISDN access, shell accounts, web hosting, usenet news and mail, with human tech
support as low as $12.00 per month (5-year dial plan L)
Copyright 1998-2003
Eskimo North
System Pages
User Pages (not yet)
56K Dial Access Linux Friendly! ISDN, Shells, USA & Canada
Linux Friendly 56k V.90 and Dual-56k (112k MLP) Dial Access
Included Services
56K or 112K dial access includes shell access, e-mail, two mailing lists, Usenet news, web hosting with CGI and
SSL, FTP hosting, and two IRC bots.
Dial access plan L is a combination of our own modem pools plus multiple national networks. Plan L provides
unlimited monthly usage, the longest session times, and the most access numbers. It is available as single or
dual channel, analog or ISDN, in most locations. 24x7 nailed use is available on Eskimo POPs only. We
recommend using our dial plan L modem pool. We can provide better service with our own POPs because we
have more control over factors that affect service. In areas where we don't have our own POPs, dial plan L has
multiple networks for redundancy.
The 12 hour session limit is only guaranteed on our own POPs. We pass this value in Radius to other networks
we use but some of them override it with shorter values.
24x7 service is available on Eskimo POPs only (the numbers in teal). 24x7 service disables idle and session
timeouts, so your session will remain up indefinitely; with this service it is permissible to run servers over
your connection.
British Columbia
Ontario
Manitoba
Prince Edward Is.
New Brunswick
Quebec
Newfoundland
Saskatchewan
Arkansas
District of Columbia
Illinois
Louisiana
Minnesota
Nevada
North Carolina
Pennsylvania
Tennessee
West Virginia
California
Florida
Indiana
Maine
Mississippi
New Hampshire
North Dakota
Rhode Island
Vermont
Wisconsin
Colorado
Georgia
Iowa
Maryland
Missouri
New Jersey
Ohio
South Carolina
Virginia
Wyoming
Arizona
Delaware
Idaho
Kentucky
Michigan
Nebraska
New York
Oregon
Texas
Washington
Setup
You will be called to confirm the information on your free trial account application. A 56k or dual-56k dial access
Username and Password will be e-mailed to you at the address specified on your application after successful
confirmation. The dial access Username and Password is used only for establishing your dial-up connection.
The login and password you supplied will be used for all host related functions such as pop-3 e-mail, NNTP
read/post access, shell access, ftp access to your account, etc.
For help with specific operating system setup, click on one of the following links:
Linux
Windows 95/98
MacOS
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no refund.
Since a free trial period is offered and lower rates are given for long term accounts, we do not provide refunds.
This is true even if your account is terminated for abuse (activities listed above).
Shells!
We sell C-shells by the seashore!
OK, not quite by the seashore: about four miles from the Puget Sound shore.
Apply for a FREE no-risk, no-obligation, two-week trial by filling out the New User Application.
Shell Account Rates

Shell Subscription Term   Disk Space Included   Primary Shell   Secondary Shell
Month                     20MB                  US $12          US $6
Quarter                   20MB                  US $30          US $15
Half-Year                 20MB                  US $54          US $27
One Year                  50MB                  US $96          US $32
Two Year                  110MB                 US $180         US $60
Five Year                 300MB                 US $420         US $140

Primary shell accounts allow you to use our host services from another access provider.
Secondary shell accounts are available only as an adjunct to dial access subscriptions.
Longer period subscriptions cost less and reduce time that must be spent doing accounting
each day allowing more time to be spent improving services and addressing customer concerns.
20MB of disk space is included (50MB with yearly payments, 110MB with two-year
payments). Additional disk quota is available for $1 per ten megabytes per month.
Primary shell accounts allow you to use our host services from another access
provider. If you have a cable modem, DSL, satellite connection, or dial access
through another provider but wish to utilize our host services, such as shell access,
e-mail, news, and web hosting, then you need a primary shell account.
Secondary shell accounts are available as an adjunct to dial access subscriptions.
They provide an additional POP3 e-mail box and web space for additional family
members or employees in the same physical location using the same computer or
router and associated dial access connection.
Shell Services
One shell account is included with each of
the following account types:
dial access
UUCP news/e-mail
frame relay
NNTP read/post access.
dedicated dial-up PPP
network game hosting
Web Hosting.
For an additional fee, Virtual
Domains.
An anonymous FTP directory
Two Smartlist mailing lists *
Hosting for two IRC bots *
X applications
E-mail
Usenet News
Menu Interface (optional)
Emulated SliP/PPP
Smartlist lists are used for discussion groups, subscriptions lists (like newsletters,
etc.) and the like. These are not extra POP accounts. Contact support@eskimo.com
to set these up with a requested list name and admin password.
IRC bots are allowed only on a dedicated bot server, "chat.eskimo.com". If
compiling an eggdrop, you currently need to compile version 1.1.15 or earlier on
the main shell server ("eskimo.com") and connect to the "chat" server to run it.
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no
refund.
Since a free trial period is offered and lower rates are given for long term accounts, we do not
provide refunds. This is true even if your account is terminated for abuse (activities listed
above).
If you are dialing from within the United States, Canada or other country using the North
American dialing plan, please use the format XXX-XXX-XXXX. We can not process an
application which is missing the area code.
If you are dialing internationally please use the format CC-CTY-LOCAL where CC is your
country code, CTY is your city code (does not have to be three digits) and LOCAL is your local
telephone number. We can not process an application which is missing the country code
or city code.
Please separate the fields with dashes. We need to be able to tell which of the digits belong to
the various parts of the telephone number in order to dial it properly.
Telephone Number:
The form requires a valid-format email address. If you're applying from a public terminal, library
computer, etc., and you do not have an existing email address yet, you can put
"none@eskimo.com".
Your current email address:
Please select a login name to use on Eskimo North. Your login:
Login:
Please select a temporary password to use on Eskimo North. Your password should be:
Password:
Which type of account are you applying for?
(All dial-access accounts include one shell/POP3 account with the requested login.)
Dial plan L provides unlimited monthly hours, the longest session times, and the most access
numbers.
Content:
Copyright 1998, 1999
Eskimo North
Content Restrictions
We have two major web sites that all users may place material on. One is intended
for general viewing and the other is intended specifically for adult content. Material
placed in the directory "public_html" will be visible at the URL
http://www.eskimo.com/~login where login is your account login. Material placed in
this directory should be suited to viewing by all audiences. Nudity, sexual content,
graphic violence, or anything else not suitable to audiences of all ages should not
be placed in this directory. Material placed in the directory "adult_html" will be
visible at the URL http://adult.eskimo.com/~login. Material intended for an adult
audience should be placed in this directory. Material governed by the Child Protection
Act must have access restricted to adults only. It is suggested that you use adult
check or similar adult validation services to restrict access.
Do not use this site in any manner that is not consistent with all applicable local,
state, and federal laws. Doing so will be considered abuse and grounds for
account termination.
Disk Quotas
There is no limit to the number of pages you can place on the web except for that
imposed by your disk quota. The standard disk quota for a single user account is
20-110MB, depending on the payment plan selected. Additional quota is available
for $1/10MB/month and must be purchased in 10MB increments for the term of the
subscription period.
CGI Scripts
CGI scripts are allowed and can be executed under your own private "cgi-bin"
directory. There is also a system "cgi-bin" directory and an assortment of CGI
programs already installed there for your use.
For more information on creating or uploading your web site to Eskimo North,
please see http://www.eskimo.com/support/www.html.
Virtual Domains
Virtual domain service provides web, ftp, and mail service under your own domain
name(s) hosted on our facilities. This service is available as an adjunct to dial
access or remote shell accounts.
Virtual Domain Rates
Discount  Month    Quarter  Half-Year  One Year  Two Year  Five Year
None      $15      $30      $54        $96       $180      $420
10%       $13.50   $27      $48.60     $86.40    $162      $378
20%       $12      $24      $43.20     $76.80    $144      $336
Longer subscription periods cost less and reduce the time that must be spent on accounting
each day, allowing more time to be spent improving services and addressing customer concerns.
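The discounted rows in the rate table follow directly from the undiscounted prices. As a rough illustration only (Python is used here purely for the sketch, and the names BASE_CENTS, discounted, and fmt are hypothetical), the 10% and 20% rows can be reproduced from the base row. Prices are held in cents so the arithmetic stays exact. Which customers qualify for each discount is not stated in this section, so the sketch does not model that.

```python
# Undiscounted virtual domain rates from the table, in cents.
BASE_CENTS = {
    "Month": 1500,
    "Quarter": 3000,
    "Half-Year": 5400,
    "One Year": 9600,
    "Two Year": 18000,
    "Five Year": 42000,
}

def discounted(cents, percent):
    """Apply a whole-number percentage discount to a price in cents."""
    return cents * (100 - percent) // 100

def fmt(cents):
    """Format cents the way the table does: $12 or $13.50."""
    dollars, rem = divmod(cents, 100)
    return f"${dollars}.{rem:02d}" if rem else f"${dollars}"

# Print one row per discount level, in the same column order as the table.
for percent in (0, 10, 20):
    row = [fmt(discounted(c, percent)) for c in BASE_CENTS.values()]
    print(f"{percent:>3}%", *row)
```

Working in integer cents avoids the small rounding surprises that floating-point dollars can produce.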
called ISP-Hosted in some sections of the site) option. The "Fast-Track" option
will map the domain to their name servers and we will not be able to service it.
Additionally the "Fast-Track" and "Essentials" options cost more.
There are other competing registries. If you register your domain through a
competing registry and are successful, please let us know. I would be happy to
provide links to any competing registries.
204.122.16.8
204.122.16.9
Required Information
Please include the following information with your payment for your virtual domain
when establishing a new domain or moving a domain from another provider:
Domain Name
Must be no more than 67 characters including the top level domain.
Must include the top-level domain: ".com", ".org", or ".net"
To register a domain in a national top-level domain check with us
first. Different nations place different restrictions on domain
registrations. For example, Hong Kong requires a business presence
in a Hong Kong business registry for a ".hk" domain.
Must include only letters, numbers, and dashes.
Must not start or end with a dash.
Organization Name
Organization Mailing Address
The Internic makes it very difficult to change your organization name
or address. They require a signed, notarized letter of authorization on
company letterhead, or their special form signed and notarized.
The above information must be included with your payment in order for us to
process your virtual domain request.
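The domain name rules listed above are mechanical enough to check before sending in a request. The following is only a sketch in Python, not anything Eskimo North or the registry actually runs. The function name domain_problems is hypothetical, and the sketch assumes the name is a single label followed by one of the three listed top-level domains (national domains, which the text says need a separate check, are simply reported as missing a listed top-level domain).

```python
import re

# The three top-level domains named in the rules above.
ALLOWED_TLDS = (".com", ".org", ".net")

def domain_problems(name):
    """Return a list of rule violations for a proposed domain name."""
    problems = []
    # Rule: no more than 67 characters including the top-level domain.
    if len(name) > 67:
        problems.append("longer than 67 characters")
    # Rule: must include one of the listed top-level domains.
    tld = next((t for t in ALLOWED_TLDS if name.lower().endswith(t)), None)
    if tld is None:
        problems.append("missing .com, .org, or .net")
        label = name
    else:
        label = name[:-len(tld)]
    # Rule: only letters, numbers, and dashes.
    if not re.fullmatch(r"[A-Za-z0-9-]+", label):
        problems.append("characters other than letters, numbers, and dashes")
    # Rule: must not start or end with a dash.
    if label.startswith("-") or label.endswith("-"):
        problems.append("starts or ends with a dash")
    return problems

print(domain_problems("my-site.com"))
print(domain_problems("-bad_name.org"))
```

An empty list means the name passed every check in this sketch; it does not mean the name is available for registration.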
Virtual domains are mapped to either a dial access or remote shell account. We
need to know the login of that account in order to build the virtual domain. Also, if
your shell account expires, the files will be removed and the virtual domain will no
longer function.
By default, the domain will be mapped to "public_html" in your directory. If the
domain has adult content, it will be mapped to "adult_html". Please let us know in
advance if your domain will contain adult content.
Your domain may be mapped to a subdirectory under "public_html" or "adult_html"
in order to allow multiple virtual domains to be mapped to a single account or to
allow your personal web page to have content that is different from your virtual
domain. If you wish your virtual domain to be mapped to a subdirectory please be
sure to include that directory name.
Normally we will process your virtual domain request the day that payment is
received.
If you would like us to add a link to your site, please contact the site mangler.
Be sure to include the Title, URL, and a short description of your site.
If you have a virtual domain, which allows your web files and ftp files to exist under
your own domain name, the root of the virtual domain is normally mapped
to public_html or adult_html in your directory. If you have multiple domains, or
wish to keep your virtual domain separate from your personal web site, we can
map your virtual domain to a sub-directory of one of those
directories.
http://www.eskimo.com/support/www.html (1 of 11) [22/07/2003 5:55:42 PM]
You can create your web pages on-line using any of the available Unix editors and
various image manipulation tools, or you can create your files off-line and then
upload them to our site. Either way, your main web page must be named
index.html, index.htm, or index.shtml (if using Server Side Includes, described
below) and placed in a directory called public_html if it is for a general audience
or adult_html if it is material intended for an adult audience.
To ftp your web files created off-line to our site, ftp ftp.eskimo.com. Be sure to use
ftp.eskimo.com as the ftp host, not eskimo.com, or you will be unable to connect.
Our ftp server will also refuse your connection if
the DNS for your originating IP address is incorrect. Your IP address must have
forward and inverse DNS, and they must agree with each other. Log in to ftp using
your login and password rather than as anonymous. This will place you in your home
directory here.
If the public_html or adult_html directory does not already exist, you will need to
create it before you can upload or create web pages on-line. You can do this by
typing:
mkdir public_html
-- or --
mkdir adult_html
The mkdir command in ftp will create a directory with read, write, and search
permissions for the owner, and read and search (but not write) permissions for
everyone else. There is no need to change the permissions when you create the
directory in ftp. If you create the directory from the Unix shell, the permissions the
directory is created with will be influenced by your umask value and may not
be initially correct. In that case, set the
permissions with the following command:
chmod 755 public_html
-- or --
chmod 755 adult_html
Now you are ready to create or upload your web files. Move to the directory you
just created, using the following command:
cd public_html
-- or --
cd adult_html
If you are uploading your web files, you can now use the ftp put command to send
each file:
put index.html
put otherfiles.html
The ftp put command will create the files with read and write permissions for the
owner and read permissions for everyone else. These are the permissions you
want so it is not necessary to change the permissions if you uploaded your web
files with ftp.
If you are creating your files on-line, you can use the text editor of your choice to
create the files now. The permissions on the files will be affected by your umask
value, so they may not be initially correct. To set the file permissions correctly, use
the following command for each file that you create:
chmod 644 index.html
chmod 644 otherfiles.html
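The numeric modes used above (755 for directories, 644 for files) are octal digits for owner, group, and everyone else, and the umask is a set of bits removed from a default mode at creation time. As a small illustration, sketched in Python rather than shell, the standard stat module can render a mode the way "ls -l" would. The helper names describe and mkdir_mode are hypothetical, and the umask values shown are just examples, not necessarily what your account uses.

```python
import stat

def describe(mode, is_dir=False):
    """Render a permission mode as ls -l would show it."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)

def mkdir_mode(umask):
    """Mode a new directory ends up with when created under the given umask."""
    return 0o777 & ~umask

print(describe(0o755, is_dir=True))  # owner rwx; group and others r-x
print(describe(0o644))               # owner rw-; group and others r--
print(oct(mkdir_mode(0o022)))        # a umask of 022 already yields 755
print(oct(mkdir_mode(0o077)))        # a umask of 077 yields 700, unreadable by the web server
```

The last line shows why a directory made from the shell may need an explicit chmod 755: a restrictive umask strips the read and search bits that the web server relies on.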
Under your public_html or adult_html directories you can create any number of
sub-directories and web pages, subject only to the limits of your disk quota. You
can use sub-directories to organize your web files. For example, you may wish to
create a sub-directory called images and place your images there. You might want
to create additional sub-directories under images to further organize the image
files. The directory separator used by Unix is the forward slash, '/', rather than the
backslash, '\', used by Windows.
You can use CGI programs on your web site. To do so you will need to create a
cgi-bin directory under your public_html directory and you will need to make it
mode 711:
cd ~/public_html
mkdir cgi-bin
chmod 711 cgi-bin
WARNING!
CGI programs execute with your user ID and your group ID. Evil people can
assume your identity and do evil things if you let them.
Poorly designed programs can result in damage to your files not limited to
your web site. When you put a CGI program on your web site you are
allowing people all over the Internet to execute programs with your
permissions.
Be very sure that your CGI programs do not allow a remote user to execute
arbitrary commands or read or write arbitrary filenames. Programs should
be very careful to eliminate any "../" back references, wild card or regular
expression references to filenames or commands, references to files or
commands starting at "/", and potential buffer overflow conditions which
can allow someone to cause arbitrary commands to be executed with your
permissions.
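One way to apply the filename advice in the warning above is to resolve any user-supplied name against a fixed base directory and refuse anything that escapes it. The following is a minimal sketch in Python, offered as an illustration rather than a complete defense. The function name safe_path and the /tmp/site directory are hypothetical, and a real CGI program would still need to validate everything else it receives.

```python
import os

def safe_path(base_dir, user_name):
    """Return the resolved path for user_name inside base_dir, or None if unsafe."""
    # Refuse empty names, names starting at "/", and shell wild card characters.
    if not user_name or user_name.startswith("/"):
        return None
    if any(ch in user_name for ch in "*?[]"):
        return None
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_name))
    # After "../" references and symlinks are resolved, the result
    # must still lie under the base directory.
    if os.path.commonpath([base, candidate]) != base:
        return None
    return candidate

print(safe_path("/tmp/site", "pages/a.html"))
print(safe_path("/tmp/site", "../../etc/passwd"))
```

Resolving the path first and comparing afterwards is more robust than searching the string for "../", because it also catches symbolic links that point outside the directory.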
Secure/Encrypted Server
Presently secure browsing is available on the "commerce.eskimo.com" server
only, due to the way domain certificates are obtained. Using this, you can use a
CGI script in your directory to send secure data, such as preparing credit card
transactions (we're currently working on that aspect; until we have that up and
running, you'll need to use another server for the actual processing of such card
data). Use the following type of URL for secure browsing:
https://commerce.eskimo.com/~username/filename.html
-- or --
https://commerce.eskimo.com/~username/cgi-bin/program.cgi
Normally, ".html" files are not parsed for SSI directives. There are two ways to tell
the server to parse a file for SSI directives. If you give the file an extension of
".shtml" the server will parse the file for SSI directives. Also, if you set the user
execute bit the file will be parsed for SSI directives. To do this type:
chmod u+x file.html
If you set the group execute bit, it will allow the content of the file to be cached in
proxy servers. Do this for STATIC content (pages kept in a file). Do not do this for
dynamic content such as a CGI script which produces different output each time it
is run. Do not do it in situations where caching must be prevented, such as for a
counter or a banner advertisement where a different advertisement is displayed for
each hit.
Another reason that you may wish to use sub-directories is so that you can
password protect a portion of your web site. You may wish to do this in order to
provide information only to internal members of an organization or to members of
a subscription service. The standard htaccess method of password protection is
supported. To use this method you need to create two files.
The following assumes that you are logged into the Unix shell. This is required
because some of the necessary commands, like "encrypt", are not available from
ftp. There may be a Windows program that can generate encrypted password data,
but I do not know of any offhand.
One file called ".htpasswd" contains a list of users and encrypted passwords. This
file should be placed in your home directory tree but it should NOT be placed
under either your public_html or adult_html directory. The reason for this is that
this file must have public read permission in order for the web server to be able to
access it but you do not want it available for download via the web because
people can run dictionary attack programs in an attempt to discover the passwords
that are encrypted within. The format of this file is as follows:
alice:kwzwe.fMQXsCU
bob:k2ZAZbMvR/eOM
charlie:keZqlq1fzdLxY
To create the encrypted passwords, use the encrypt command from the Unix shell:
encrypt password1
You get a response that looks like this:
"password1" = "kwzwe.fMQXsCU"
Once you have that you just need to paste the password into the file or copy it by
hand if you do not have cut-and-paste capability.
Another method of creating the ".htpasswd" file is with the shell command
"htpasswd":
Usage: htpasswd [-c] passwordfile username
The -c flag creates a new file.
Enter the password at both given prompts.
htpasswd -c .htpasswd alice
htpasswd .htpasswd bob
...
Once you have created the ".htpasswd" file you should make sure the permissions
are set correctly by using the following command:
chmod 644 .htpasswd
The second file required to password protect a portion of your web site is a file
called ".htaccess". This file must be placed in the directory to be protected and
protects the contents of that directory and any subdirectories nested within that
directory. The contents of this file tell the web server where to find the ".htpasswd"
file and what form of authentication to apply. In this example I am showing you
how to use password authentication to protect a portion of your web space. It is
also possible to limit access using groups, or by domain name or address space.
The format of the ".htaccess" file is:
AuthUserFile /u/l/login/.htpasswd
AuthGroupFile /dev/null
AuthName ByPassword
AuthType Basic
<Limit GET PUT POST>
require user alice
</Limit>
This limits the access to this directory to only 'alice'; although 'bob' and 'charlie'
have passwords, too, they can't access this directory. To allow anyone with a
password to gain entrance you can use:
...
<Limit GET PUT POST>
require valid-user
</Limit>
Substitute the path of your home directory for "/u/l/login". Case is important in the
above file: for example, "GET PUT POST" must be upper case, and "Limit"
must have an upper case "L" and lower case "imit". Placing all three limiters (GET,
PUT, and POST) there ensures that all access to that directory will be protected,
rather than one specific kind of access with the others unlimited. The keywords in the
above file should be up against the left column, with no spaces on the line prior to
the keywords.
Set the permissions so that the .htaccess file is publicly readable:
chmod 644 .htaccess
Now when you attempt to access the URL corresponding to the directory the
".htaccess" file is located in, your browser will prompt you for a username and
password. If you supply the correct password you will be permitted to access the
material requested; otherwise access will be denied.
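It may help to see what the browser actually sends once you answer that prompt. With "AuthType Basic", the username and password are joined with a colon and base64-encoded into an Authorization request header. The sketch below, in Python, is only an illustration of that encoding; the function name basic_auth_header is hypothetical and the credentials are the sample ones used earlier.

```python
import base64

def basic_auth_header(username, password):
    """Build the request header a browser sends for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"

header = basic_auth_header("alice", "password1")
print(header)

# base64 is an encoding, not encryption: anyone who sees the header can reverse it.
print(base64.b64decode(header.split()[-1]).decode("utf-8"))
```

Because the encoding is trivially reversible, Basic authentication keeps casual visitors out but does not by itself hide the password from anyone who can observe the connection.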
Blocking Access by IP
There are situations where you may want to block (or allow) access based on the
hostname or IP address of the browsing user rather than use a password method.
For instance, if someone is flooding your logs with errors by
probing for pages without following your links, you can block access from their site
without blocking everyone else. Or, if you want your workstation at work to be able
to access a directory but no one else, you can set the server to allow access from your
machine and deny it for everyone else. The trick lies in the same "<Limit GET PUT
POST>" section, but with different limiting rules.
To deny access to a set of addresses (sample address of "192.168.12.34"), while
allowing access from everywhere else, the pattern is:
<Limit GET PUT POST>
order allow,deny
allow from all
deny from 192.168.12.34
</Limit>
Similarly, to allow connections only from a limited set of addresses (again using our
sample of "192.168.12.34"), use the following; note that the "order" changes as well:
<Limit GET PUT POST>
order deny,allow
deny from all
allow from 192.168.12.34
</Limit>
These limits can also be applied to hostnames or patterns. For instance, a
directory specifically for Boeing employees might use:
<Limit GET PUT POST>
order deny,allow
deny from all
allow from .boeing.com
</Limit>
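The reason the "order" line changes between the examples above is that it decides both which rule list is consulted first and what happens to a client that matches neither list. The sketch below models that logic in Python as an illustration only. The function name access_allowed is hypothetical, and the matching is deliberately simplified to exact addresses and the keyword "all"; hostname patterns such as ".boeing.com" are not modelled.

```python
def access_allowed(order, allow, deny, client):
    """Simplified model of how allow/deny lists combine under each order."""
    def matches(rules):
        return any(rule == "all" or rule == client for rule in rules)

    if order == "allow,deny":
        # Must match an allow rule and no deny rule; default is to deny.
        return matches(allow) and not matches(deny)
    if order == "deny,allow":
        # A deny match is overridden by an allow match; default is to allow.
        return matches(allow) or not matches(deny)
    raise ValueError("order must be 'allow,deny' or 'deny,allow'")

# First example above: everyone except the sample address.
print(access_allowed("allow,deny", ["all"], ["192.168.12.34"], "192.168.12.34"))
print(access_allowed("allow,deny", ["all"], ["192.168.12.34"], "10.0.0.1"))

# Second example above: only the sample address.
print(access_allowed("deny,allow", ["192.168.12.34"], ["all"], "192.168.12.34"))
print(access_allowed("deny,allow", ["192.168.12.34"], ["all"], "10.0.0.1"))
```

Swapping the order without swapping the rules would invert the result for the sample address, which is why the two examples differ on both lines.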
done to keep a similar look and format across an entire set of pages. The format
of this line is shown here to replace both error pages (remember to replace "login"
with your own login name):
ErrorDocument 401 /~login/401.html
ErrorDocument 404 /~login/404.html
Only 400-series errors can be redirected in this manner. Substitute your login name
in the lines above, as well as the names of the files you've created for each error
page. Remember that the error pages themselves need proper permissions (644)
as well. This will change the referenced error output for the directory
containing the ".htaccess" file as well as all of its subdirectories.
Hosting a Domain
Now that you have a website up on our server, you might want to have your own
domain name ("http://www.something-of-your-own.com/") rather than listing it as a
user site on our main server's name ("http://www.eskimo.com/~yourname"). For
information on how to do this (registering a domain, sending in needed payments
to us for domain hosting, etc.), please see our page about "Virtual Domain
Hosting".
Additional Resources
Subject                                             URL
A Beginners Guide to HTML                           http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html
CGI Programming                                     http://www.eskimo.com/cgihelp
HTML quick reference                                http://www.cc.ukans.edu/~acs/docs/other/HTML_quick.shtml
Starter Home Page                                   http://www.eskimo.com/old/starter.html
Top Ten Ways To Tell If You Have A Sucky Home Page  http://jeffglover.com/sucky.html
On this page you'll find a collection of links to material relating to HTML and
CGI authoring, development, and programming. If you know of a resource that is not
currently listed below, please take a second to submit it and I'll add it to the list.
CGI
Mail Filtering
Newsgroups
alt.hypertext
CGI Specifications
comp.infosystems.www.authoring.cgi
comp.infosystems.www.authoring.html
Yahoo's Index
comp.infosystems.www.authoring.images
comp.infosystems.www.authoring.misc
comp.infosystems.www.announce
comp.infosystems.www.misc
comp.infosystems.www.providers
comp.infosystems.www.users
PERL
CGI Form Handling in Perl with cgi-lib.pl
HTML
Builder.Com
UNIX
HTML 4.0
HTML 4.01
Yahoo's Index
HTML Converters
HTML Quick Reference
WWW
HTML-Writers-Guild
Technical Specifications
Yahoo's Index
WWW OpenFAQs
If you have links you'd like to see listed on the Web Authoring References page, please fill
in the form below. This form is for submitting links only. Comments and suggestions
should be sent to erik@skytouch.com.
Here's a quick rundown of the scripts, utilities and programs installed on Eskimo
North:
AnyURL
A CGI form processing program which provides a way to use
forms input to point to URLs without doing any programming.
Look here for more information.
cgi-lib.pl
A Perl library to manipulate CGI input. Look here for its
documentation.
CGI.pm
This Perl library uses perl5 objects to simplify the creation and
parsing of fill-out HTML forms. The CGI.pm library can also be
found in /usr/local/lib/perl5/ on Eskimo. Look here
for its documentation.
Count.cgi
A very powerful and fully customizable CGI
counter. Look here for more information.
Please limit yourself to one data file per account. All data files
must be of the form: your_login.dat. For example, if your
login name is JoeUser, the counter data file will have to be
defined as follows:
<IMG SRC="/cgi-bin/Count.cgi?df=JoeUser.dat">
Data files which do not comply with this format will be
automatically deleted. (Note: data files cannot be edited directly
on Eskimo; you'll need to use a form to do so.)
date
A simple Bourne shell script which returns the current date/time.
http://www.eskimo.com/cgihelp/cgi-list.html (1 of 2) [22/07/2003 5:56:04 PM]
defined $fname;
$incfn{$name} .= (defined $in{$name} ? "\0" : "") . (defined $fname ? $fname : "");
($ctype) = $ct =~ /^\s*Content-type:\s*"([^"]+)"/i; #";
($ctype) = $ct =~ /^\s*Content-Type:\s*([^\s:;]+)/i
  unless defined $ctype;
$inct{$name} .= (defined $in{$name} ? "\0" : "") . $ctype;
if ($writefiles && defined $fname) {
  $ser++;
  $fn = $writefiles . ".$$.$ser";
  open (FILE, ">$fn") || &CgiDie("Couldn't open $fn\n");
  binmode (FILE); # write files accurately
}
substr($buf, 0, $lpos+4) = '';
undef $fname;
undef $ctype;
}
1;
END_MULTIPART
if ($errflag) {
  local ($errmsg, $value);
  $errmsg = $@ || $errflag;
  foreach $value (values %insfn) {
    unlink(split("\0",$value));
  }
  &CgiDie($errmsg);
} else {
  # everything's ok.
}
} else {
  &CgiDie("cgi-lib.pl: Unknown Content-type: $ENV{'CONTENT_TYPE'}\n");
}
# no-ops to avoid warnings
$insfn = $insfn;
$incfn = $incfn;
$inct = $inct;
$^W = $perlwarn;
return ($errflag ? undef : scalar(@in));
}
# PrintHeader
# Returns the magic line which tells WWW that we're an HTML document
sub PrintHeader {
  return "Content-type: text/html\n\n";
}
# HtmlTop
# Returns the <head> of a document and the beginning of the body
# with the title and a body
http://www.eskimo.com/cgihelp/scripts/CGI.pm.txt
package CGI;
require 5.001;
# See the bottom of this file for the POD documentation. Search for the
# string '=head'.
# You can run this file through either pod2man or pod2html to produce pretty
# documentation in manual or html file format (these utilities are part of the
# Perl 5 distribution).
#
#
#
#
#
# The most recent version and complete docs are available at:
#   http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html
#   ftp://ftp-genome.wi.mit.edu/pub/software/WWW/
# Modified by Erik C. Thauvin (erik@skytouch.com)
# Sun Apr 27 02:18:46 PDT 1997
#
# Added FILEPATH_INFO support.
# Set this to 1 to enable copious autoloader debugging messages
$AUTOLOAD_DEBUG=0;
# Set this to 1 to enable NPH scripts
# or:
#    1) use CGI qw(:nph)
#    2) $CGI::nph(1)
#    3) print header(-nph=>1)
$NPH=0;
$CGI::revision = '$Id: CGI.pm,v 2.35 1997/4/20 20:19 lstein Exp $';
$CGI::VERSION='2.35';
# OVERRIDE THE OS HERE IF CGI.pm GUESSES WRONG
$OS = 'UNIX';
# $OS = 'MACINTOSH';
# $OS = 'WINDOWS';
# $OS = 'VMS';
# $OS = 'OS2';
# HARD-CODED LOCATION FOR FILE UPLOAD TEMPORARY FILES.
# UNCOMMENT THIS ONLY IF YOU KNOW WHAT YOU'RE DOING.
$TempFile::TMPDIRECTORY = '/usr/tmp';
# ------------------ START OF THE LIBRARY ------------
# FIGURE OUT THE OS WE'RE RUNNING UNDER
# Some systems support the $^O variable. If not
# available then require() the Config library
unless ($OS) {
unless ($OS = $^O) {
require Config;
$OS = $Config::Config{'osname'};
}
}
if ($OS=~/Win/i) {
$OS = 'WINDOWS';
} elsif ($OS=~/vms/i) {
$OS = 'VMS';
} elsif ($OS=~/Mac/i) {
$OS = 'MACINTOSH';
} elsif ($OS=~/os2/i) {
$OS = 'OS2';
} else {
$OS = 'UNIX';
}
# Some OS logic. Binary mode enabled on DOS, NT and VMS
$needs_binmode = $OS=~/^(WINDOWS|VMS|OS2)/;
# This is the default class for the CGI object to use when all else fails.
$DefaultClass = 'CGI' unless defined $CGI::DefaultClass;
# This is where to look for autoloaded routines.
$AutoloadClass = $DefaultClass unless defined $CGI::AutoloadClass;
# The path separator is a slash, backslash or semicolon, depending
# on the platform.
$SL = {
UNIX=>'/',
OS2=>'\\',
WINDOWS=>'\\',
MACINTOSH=>':',
VMS=>'\\'
}->{$OS};
# Turn on NPH scripts by default when running under IIS server!
$NPH++ if defined($ENV{'SERVER_SOFTWARE'}) && $ENV{'SERVER_SOFTWARE'}=~/IIS/;
# Turn on special checking for Doug MacEachern's modperl
if (defined($ENV{'GATEWAY_INTERFACE'}) && ($MOD_PERL = $ENV{'GATEWAY_INTERFACE'} =~ /^CGI-Perl/))
{
$NPH++;
$| = 1;
$SEQNO = 1;
}
# This is really "\r\n", but the meaning of \n is different
# in MacPerl, so we resort to octal here.
$CRLF = "\015\012";
if ($needs_binmode) {
$CGI::DefaultClass->binmode(main::STDOUT);
$CGI::DefaultClass->binmode(main::STDIN);
$CGI::DefaultClass->binmode(main::STDERR);
}
# Cute feature, but it broke when the overload mechanism changed...
# %OVERLOAD = ('""'=>'as_string');
%EXPORT_TAGS = (
':html2'=>[h1..h6,qw/p br hr ol ul li dl dt dd menu code var strong em
tt i b blockquote pre img a address cite samp dfn html head
base body link nextid title meta kbd start_html end_html
input Select option/],
':html3'=>[qw/div table caption th td TR Tr super sub strike applet PARAM embed
basefont/],
':netscape'=>[qw/blink frameset frame script font fontsize center/],
':form'=>[qw/textfield textarea filefield password_field hidden checkbox
checkbox_group
$CGI::DefaultClass->_reset_globals() if $MOD_PERL;
$initializer = to_filehandle($initializer) if $initializer;
$self->init($initializer);
return $self;
}
# We provide a DESTROY method so that the autoloader
# doesn't bother trying to find it.
sub DESTROY { }
#### Method: param
# Returns the value(s) of a named parameter.
# If invoked in a list context, returns the
# entire list. Otherwise returns the first
# member of the list.
# If name is not provided, return a list of all
# the known parameter names available.
# If more than one argument is provided, the
# second and subsequent arguments are used to
# set the value of the parameter.
####
sub param {
my($self,@p) = self_or_default(@_);
return $self->all_parameters unless @p;
my($name,$value,@other);
# For compatibility between old calling style and use_named_parameters() style,
# we have to special case for a single parameter present.
if (@p > 1) {
($name,$value,@other) = $self->rearrange([NAME,[DEFAULT,VALUE,VALUES]],@p);
my(@values);
if (substr($p[0],0,1) eq '-' || $self->use_named_parameters) {
@values = defined($value) ? (ref($value) && ref($value) eq 'ARRAY' ? @{$value} :
$value) : ();
} else {
foreach ($value,@other) {
push(@values,$_) if defined($_);
}
}
# If values is provided, then we set it.
if (@values) {
$self->add_parameter($name);
$self->{$name}=[@values];
}
} else {
$name = $p[0];
}
return () unless defined($name) && $self->{$name};
return wantarray ? @{$self->{$name}} : $self->{$name}->[0];
}
#### Method: delete
# Deletes the named parameter entirely.
####
sub delete {
my($self,$name) = self_or_default(@_);
delete $self->{$name};
delete $self->{'.fieldnames'}->{$name};
@{$self->{'.parameters'}}=grep($_ ne $name,$self->param());
return wantarray ? () : undef;
}
sub self_or_default {
return @_ if defined($_[0]) && !ref($_[0]) && ($_[0] eq 'CGI');
unless (defined($_[0]) &&
ref($_[0]) &&
(ref($_[0]) eq 'CGI' ||
eval "\$_[0]->isaCGI()")) { # optimize for the common case
$CGI::DefaultClass->_reset_globals()
if defined($Q) && $MOD_PERL && $CGI::DefaultClass->_new_request();
$Q = $CGI::DefaultClass->new unless defined($Q);
unshift(@_,$Q);
}
return @_;
}
sub _new_request {
return undef unless (defined(Apache->seqno()) or eval { require Apache });
if (Apache->seqno() != $SEQNO) {
$SEQNO = Apache->seqno();
return 1;
} else {
return undef;
}
}
sub _reset_globals {
undef $Q;
undef @QUERY_PARAM;
}
sub self_or_CGI {
local $^W=0; # prevent a warning
if (defined($_[0]) &&
(substr(ref($_[0]),0,3) eq 'CGI'
|| eval "\$_[0]->isaCGI()")) {
return @_;
} else {
return ($DefaultClass,@_);
}
}
sub isaCGI {
return 1;
}
#### Method: import_names
# Import all parameters into the given namespace.
# Assumes namespace 'Q' if not specified
####
sub import_names {
my($self,$namespace) = self_or_default(@_);
$namespace = 'Q' unless defined($namespace);
die "Can't import names into 'main'\n"
if $namespace eq 'main';
my($param,@value,$var);
foreach $param ($self->param) {
# protect against silly names
($var = $param)=~tr/a-zA-Z0-9_/_/c;
$var = "${namespace}::$var";
@value = $self->param($param);
@{$var} = @value;
${$var} = $value[0];
}
}
#### Method: use_named_parameters
# Force CGI.pm to use named parameter-style method calls
# rather than positional parameters. The same effect
# will happen automatically if the first parameter
# begins with a -.
sub use_named_parameters {
my($self,$use_named) = self_or_default(@_);
return $self->{'.named'} unless defined ($use_named);
# stupidity to avoid annoying warnings
return $self->{'.named'}=$use_named;
}
########################################
# THESE METHODS ARE MORE OR LESS PRIVATE
# GO TO THE __DATA__ SECTION TO SEE MORE
# PUBLIC METHODS
########################################
#
#
#
#
#
#
sub init {
my($self,$initializer) = @_;
my($query_string,@lines);
my($meth) = '';
# if we get called more than once, we want to initialize
# ourselves from the original query (which may be gone
# if it was read from STDIN originally.)
if (defined(@QUERY_PARAM) && !defined($initializer)) {
foreach (@QUERY_PARAM) {
$self->param('-name'=>$_,'-value'=>$QUERY_PARAM{$_});
}
return;
}
$meth=$ENV{'REQUEST_METHOD'} if defined($ENV{'REQUEST_METHOD'});
# If initializer is defined, then read parameters
# from it.
METHOD: {
if (defined($initializer)) {
if (ref($initializer) && ref($initializer) eq 'HASH') {
foreach (keys %$initializer) {
$self->param('-name'=>$_,'-value'=>$initializer->{$_});
}
last METHOD;
}
$initializer = $$initializer if ref($initializer);
if (defined(fileno($initializer))) {
while (<$initializer>) {
chomp;
last if /^=/;
push(@lines,$_);
}
# massage back into standard format
if ("@lines" =~ /=/) {
$query_string=join("&",@lines);
} else {
$query_string=join("+",@lines);
}
last METHOD;
}
$query_string = $initializer;
last METHOD;
}
# If method is GET or HEAD, fetch the query from
# the environment.
if ($meth=~/^(GET|HEAD)$/) {
$query_string = $ENV{'QUERY_STRING'};
last METHOD;
}
# If the method is POST, fetch the query from standard
# input.
if ($meth eq 'POST') {
if (defined($ENV{'CONTENT_TYPE'})
&&
$ENV{'CONTENT_TYPE'}=~m|^multipart/form-data|) {
my($boundary) = $ENV{'CONTENT_TYPE'}=~/boundary=(\S+)/;
$self->read_multipart($boundary,$ENV{'CONTENT_LENGTH'});
} else {
$self->read_from_client(\*STDIN,\$query_string,$ENV{'CONTENT_LENGTH'},0)
if $ENV{'CONTENT_LENGTH'} > 0;
}
# Some people want to have their cake and eat it too!
# Uncomment this line to have the contents of the query string
# APPENDED to the POST data.
# $query_string .= ($query_string ? '&' : '') . $ENV{'QUERY_STRING'} if $ENV{'QUERY_STRING'};
last METHOD;
}
# If neither is set, assume we're being debugged offline.
# Check the command line and then the standard input for data.
# We use the shellwords package in order to behave the way that
# UN*X programmers expect.
$query_string = &read_from_cmdline;
}
# We now have the query string in hand. We do slightly
# different things for keyword lists and parameter lists.
if ($query_string) {
if ($query_string =~ /=/) {
$self->parse_params($query_string);
} else {
$self->add_parameter('keywords');
$self->{'keywords'} = [$self->parse_keywordlist($query_string)];
}
}
# Special case. Erase everything if there is a field named
# .defaults.
if ($self->param('.defaults')) {
undef %{$self};
}
# Associative array containing our defined fieldnames
$self->{'.fieldnames'} = {};
foreach ($self->param('.cgifields')) {
$self->{'.fieldnames'}->{$_}++;
}
# Clear out our default submission button flag if present
$self->delete('.submit');
$self->delete('.cgifields');
$self->save_request unless $initializer;
}
# FUNCTIONS TO OVERRIDE:
# Turn a string into a filehandle
sub to_filehandle {
my $string = shift;
if ($string && !ref($string)) {
my($package) = caller(1);
my($tmp) = $string=~/[':]/ ? $string : "$package\:\:$string";
return $tmp if defined(fileno($tmp));
}
return $string;
}
# Create a new multipart buffer
sub new_MultipartBuffer {
my($self,$boundary,$length,$filehandle) = @_;
return MultipartBuffer->new($self,$boundary,$length,$filehandle);
}
# Read data from a file handle
sub read_from_client {
my($self, $fh, $buff, $len, $offset) = @_;
local $^W=0; # prevent a warning
return read($fh, $$buff, $len, $offset);
}
# put a filehandle into binary mode (DOS)
sub binmode {
binmode($_[1]);
}
# send output to the browser
sub put {
my($self,@p) = self_or_default(@_);
$self->print(@p);
}
# print to standard output (for overriding in mod_perl)
sub print {
shift;
CORE::print(@_);
}
# unescape URL-encoded data
sub unescape {
my($todecode) = @_;
$todecode =~ tr/+/ /; # pluses become spaces
$todecode =~ s/%([0-9a-fA-F]{2})/pack("c",hex($1))/ge;
return $todecode;
}
# URL-encode data
sub escape {
my($toencode) = @_;
$toencode=~s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;
return $toencode;
}
sub save_request {
my($self) = @_;
# We're going to play with the package globals now so that if we get called
# again, we initialize ourselves in exactly the same way. This allows
# us to have several of these objects.
@QUERY_PARAM = $self->param; # save list of parameters
foreach (@QUERY_PARAM) {
$QUERY_PARAM{$_}=$self->{$_};
}
}
sub parse_keywordlist {
my($self,$tosplit) = @_;
$tosplit = &unescape($tosplit); # unescape the keywords
$tosplit=~tr/+/ /; # pluses to spaces
my(@keywords) = split(/\s+/,$tosplit);
return @keywords;
}
sub parse_params {
my($self,$tosplit) = @_;
my(@pairs) = split('&',$tosplit);
my($param,$value);
foreach (@pairs) {
($param,$value) = split('=');
$param = &unescape($param);
$value = &unescape($value);
$self->add_parameter($param);
push (@{$self->{$param}},$value);
}
}
sub add_parameter {
my($self,$param)=@_;
push (@{$self->{'.parameters'}},$param)
unless defined($self->{$param});
}
sub all_parameters {
my $self = shift;
return () unless defined($self) && $self->{'.parameters'};
return () unless @{$self->{'.parameters'}};
return @{$self->{'.parameters'}};
}
#### Method as_string
#
# synonym for "dump"
####
sub as_string {
&dump(@_);
}
sub AUTOLOAD {
print STDERR "CGI::AUTOLOAD for $AUTOLOAD\n" if $CGI::AUTOLOAD_DEBUG;
my($func) = $AUTOLOAD;
my($pack,$func_name) = $func=~/(.+)::([^:]+)$/;
$pack = ${"$pack\:\:AutoloadClass"} || $CGI::DefaultClass
unless defined(${"$pack\:\:AUTOLOADED_ROUTINES"});
my($sub) = \%{"$pack\:\:SUBS"};
unless (%$sub) {
my($auto) = \${"$pack\:\:AUTOLOADED_ROUTINES"};
eval "package $pack; $$auto";
die $@ if $@;
}
my($code) = $sub->{$func_name};
$code = "sub $AUTOLOAD { }" if (!$code and $func_name eq 'DESTROY');
if (!$code) {
if ($EXPORT{':any'} ||
$EXPORT{$func_name} ||
(%EXPORT_OK || grep(++$EXPORT_OK{$_},&expand_tags(':html')))
&& $EXPORT_OK{$func_name}) {
$code = $sub->{'HTML_FUNC'};
$code=~s/func_name/$func_name/mg;
}
}
die "Undefined subroutine $AUTOLOAD\n" unless $code;
eval "package $pack; $code";
if ($@) {
$@ =~ s/ at .*\n//;
die $@;
}
goto &{"$pack\:\:$func_name"};
}
# PRIVATE SUBROUTINE
# Smart rearrangement of parameters to allow named parameter
# calling. We do the rearrangement if:
# 1. The first parameter begins with a -
# 2. The use_named_parameters() method returns true
sub rearrange {
my($self,$order,@param) = @_;
return () unless @param;
return @param unless (defined($param[0]) && substr($param[0],0,1) eq '-')
|| $self->use_named_parameters;
my $i;
for ($i=0;$i<@param;$i+=2) {
$param[$i]=~s/^\-//; # get rid of initial - if present
$param[$i]=~tr/a-z/A-Z/; # parameters are upper case
}
my(%param) = @param;
my(@return_array);
my($key)='';
foreach $key (@$order) {
my($value);
# this is an awful hack to fix spurious warnings when the
# -w switch is set.
if (ref($key) && ref($key) eq 'ARRAY') {
foreach (@$key) {
last if defined($value);
$value = $param{$_};
delete $param{$_};
}
} else {
$value = $param{$key};
delete $param{$key};
}
push(@return_array,$value);
}
push (@return_array,$self->make_attributes(\%param)) if %param;
return (@return_array);
}
###############################################################################
################# THESE FUNCTIONS ARE AUTOLOADED ON DEMAND ####################
###############################################################################
$AUTOLOADED_ROUTINES = ''; # get rid of -w warning
$AUTOLOADED_ROUTINES=<<'END_OF_AUTOLOAD';
%SUBS = (
'URL_ENCODED'=> <<'END_OF_FUNC',
sub URL_ENCODED { 'application/x-www-form-urlencoded'; }
END_OF_FUNC
'MULTIPART' => <<'END_OF_FUNC',
sub MULTIPART { 'multipart/form-data'; }
END_OF_FUNC
'HTML_FUNC' => <<'END_OF_FUNC',
sub func_name {
# handle various cases in which we're called
# most of this bizarre stuff is to avoid -w errors
shift if $_[0] &&
(!ref($_[0]) && $_[0] eq $CGI::DefaultClass) ||
(ref($_[0]) &&
(substr(ref($_[0]),0,3) eq 'CGI' ||
eval "\$_[0]->isaCGI()"));
my($attr) = '';
if (ref($_[0]) && ref($_[0]) eq 'HASH') {
my(@attr) = CGI::make_attributes('',shift);
$attr = " @attr" if @attr;
}
my($tag,$untag) = ("\U<func_name\E$attr>","\U</func_name>\E");
return $tag unless @_;
if (ref($_[0]) eq 'ARRAY') {
my(@r);
foreach (@{$_[0]}) {
push(@r,"$tag$_$untag");
}
return "@r";
} else {
return "$tag@_$untag";
}
}
END_OF_FUNC
#### Method: keywords
# Keywords acts a bit differently. Calling it in a list context
# returns the list of keywords.
# Calling it in a scalar context gives you the size of the list.
####
'keywords' => <<'END_OF_FUNC',
sub keywords {
my($self,@values) = self_or_default(@_);
# If values is provided, then we set it.
$self->{'keywords'}=[@values] if @values;
my(@result) = @{$self->{'keywords'}};
@result;
}
END_OF_FUNC
# These are some tie() interfaces for compatibility
# with Steve Brenner's cgi-lib.pl routines
'ReadParse' => <<'END_OF_FUNC',
sub ReadParse {
local(*in);
if (@_) {
*in = $_[0];
} else {
my $pkg = caller();
*in=*{"${pkg}::in"};
}
tie(%in,CGI);
}
END_OF_FUNC
'PrintHeader' => <<'END_OF_FUNC',
sub PrintHeader {
my($self) = self_or_default(@_);
return $self->header();
}
END_OF_FUNC
'HtmlTop' => <<'END_OF_FUNC',
sub HtmlTop {
my($self,@p) = self_or_default(@_);
return $self->start_html(@p);
}
END_OF_FUNC
'HtmlBot' => <<'END_OF_FUNC',
sub HtmlBot {
my($self,@p) = self_or_default(@_);
return $self->end_html(@p);
}
END_OF_FUNC
'SplitParam' => <<'END_OF_FUNC',
sub SplitParam {
my ($param) = @_;
#
####
'redirect' => <<'END_OF_FUNC',
sub redirect {
my($self,@p) = self_or_default(@_);
my($url,$target,$cookie,$nph,@other) = $self->rearrange([[URI,URL],TARGET,COOKIE,NPH],@p);
$url = $url || $self->self_url;
my(@o);
foreach (@other) { push(@o,split("=")); }
if($MOD_PERL or exists $self->{'.req'}) {
my $r = $self->{'.req'} || Apache->request;
$r->header_out(Location => $url);
$r->err_header_out(Location => $url);
$r->status(302);
return;
}
push(@o,
'-Status'=>'302 Found',
'-Location'=>$url,
'-URI'=>$url,
'-nph'=>($nph||$NPH));
push(@o,'-Target'=>$target) if $target;
push(@o,'-Cookie'=>$cookie) if $cookie;
return $self->header(@o);
}
END_OF_FUNC
#### Method: start_html
# Canned HTML header
#
# Parameters:
# $title -> (optional) The title for this HTML document (-title)
# $author -> (optional) e-mail address of the author (-author)
# $base -> (optional) if set to true, will enter the BASE address of this document
#          for resolving relative references (-base)
# $xbase -> (optional) alternative base at some remote location (-xbase)
# $target -> (optional) target window to load all links into (-target)
# $script -> (optional) Javascript code (-script)
# $no_script -> (optional) Javascript <noscript> tag (-noscript)
# $meta -> (optional) Meta information tags
# @other -> (optional) any other named parameters you'd like to incorporate into
#           the <BODY> tag.
####
'start_html' => <<'END_OF_FUNC',
sub start_html {
my($self,@p) = &self_or_default(@_);
my($title,$author,$base,$xbase,$script,$noscript,$target,$meta,@other) =
$self->rearrange([TITLE,AUTHOR,BASE,XBASE,SCRIPT,NOSCRIPT,TARGET,META],@p);
# strangely enough, the title needs to be escaped as HTML
# while the author needs to be escaped as a URL
$title = $self->escapeHTML($title || 'Untitled Document');
$author = $self->escapeHTML($author);
my(@result);
push(@result,'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">');
push(@result,"<HTML><HEAD><TITLE>$title</TITLE>");
push(@result,"<LINK REV=MADE HREF=\"mailto:$author\">") if $author;
if ($base || $xbase || $target) {
my $href = $xbase || $self->url();
################################
# METHODS USED IN BUILDING FORMS
################################
#### Method: isindex
# Just prints out the isindex tag.
# Parameters:
# $action -> optional URL of script to run
# Returns:
#   A string containing a <ISINDEX> tag
'isindex' => <<'END_OF_FUNC',
sub isindex {
my($self,@p) = self_or_default(@_);
my($action,@other) = $self->rearrange([ACTION],@p);
$action = qq/ACTION="$action"/ if $action;
my($other) = @other ? " @other" : '';
return "<ISINDEX $action$other>";
}
END_OF_FUNC
#### Method: startform
# Start a form
# Parameters:
#   $method -> optional submission method to use (GET or POST)
#   $action -> optional URL of script to run
#   $enctype -> encoding to use (URL_ENCODED or MULTIPART)
'startform' => <<'END_OF_FUNC',
sub startform {
my($self,@p) = self_or_default(@_);
my($method,$action,$enctype,@other) =
$self->rearrange([METHOD,ACTION,ENCTYPE],@p);
$method = $method || 'POST';
$enctype = $enctype || &URL_ENCODED;
$action = $action ? qq/ACTION="$action"/ : $method eq 'GET' ?
'ACTION="'.$self->script_name.'"' : '';
my($other) = @other ? " @other" : '';
$self->{'.parametersToAdd'}={};
return qq/<FORM METHOD="$method" $action ENCTYPE="$enctype"$other>\n/;
}
END_OF_FUNC
#### Method: start_form
# synonym for startform
'start_form' => <<'END_OF_FUNC',
sub start_form {
&startform;
}
END_OF_FUNC
#### Method: start_multipart_form
# synonym for startform
'start_multipart_form' => <<'END_OF_FUNC',
sub start_multipart_form {
my($self,@p) = self_or_default(@_);
if ($self->use_named_parameters ||
(defined($p[0]) && substr($p[0],0,1) eq '-')) {
my(%p) = @p;
$p{'-enctype'}=&MULTIPART;
return $self->startform(%p);
} else {
my($method,$action,@other) =
$self->rearrange([METHOD,ACTION],@p);
return $self->startform($method,$action,&MULTIPART,@other);
}
}
END_OF_FUNC
#### Method: endform
# End a form
'endform' => <<'END_OF_FUNC',
sub endform {
my($self,@p) = self_or_default(@_);
return ($self->get_fields,"</FORM>");
}
END_OF_FUNC
#### Method: end_form
# synonym for endform
my($name,$checked,$value,$label,$override,@other) =
$self->rearrange([NAME,[CHECKED,SELECTED,ON],VALUE,LABEL,[OVERRIDE,FORCE]],@p);
if (!$override && defined($self->param($name))) {
$value = $self->param($name) unless defined $value;
$checked = $self->param($name) eq $value ? ' CHECKED' : '';
} else {
$checked = $checked ? ' CHECKED' : '';
$value = defined $value ? $value : 'on';
}
my($the_label) = defined $label ? $label : $name;
$name = $self->escapeHTML($name);
$value = $self->escapeHTML($value);
$the_label = $self->escapeHTML($the_label);
my($other) = @other ? " @other" : '';
$self->register_parameter($name);
return <<END;
<INPUT TYPE="checkbox" NAME="$name" VALUE="$value"$checked$other>$the_label
END
}
END_OF_FUNC
#### Method: checkbox_group
# Create a list of logically-linked checkboxes.
# Parameters:
#   $name -> Common name for all the check boxes
#   $values -> A pointer to a regular array containing the
#             values for each checkbox in the group.
#   $defaults -> (optional)
#             1. If a pointer to a regular array of checkbox values,
#             then this will be used to decide which
#             checkboxes to turn on by default.
#             2. If a scalar, will be assumed to hold the
#             value of a single checkbox in the group to turn on.
#   $linebreak -> (optional) Set to true to place linebreaks
#             between the buttons.
#   $labels -> (optional)
#             A pointer to an associative array of labels to print next to each checkbox
#             in the form $label{'value'}="Long explanatory label".
#             Otherwise the provided values are used as the labels.
# Returns:
#   An ARRAY containing a series of <INPUT TYPE="checkbox"> fields
####
'checkbox_group' => <<'END_OF_FUNC',
sub checkbox_group {
my($self,@p) = self_or_default(@_);
my($name,$values,$defaults,$linebreak,$labels,$rows,$columns,
$rowheaders,$colheaders,$override,$nolabels,@other) =
$self->rearrange([NAME,[VALUES,VALUE],[DEFAULTS,DEFAULT],
LINEBREAK,LABELS,ROWS,[COLUMNS,COLS],
ROWHEADERS,COLHEADERS,
[OVERRIDE,FORCE],NOLABELS],@p);
my($checked,$break,$result,$label);
my(%checked) = $self->previous_or_default($name,$defaults,$override);
$break = $linebreak ? "<BR>" : '';
$name=$self->escapeHTML($name);
$result .= "</TABLE>";
return $result;
}
END_OF_FUNC
#### Method: radio_group
# Create a list of logically-linked radio buttons.
# Parameters:
#   $name -> Common name for all the buttons.
#   $values -> A pointer to a regular array containing the
#             values for each button in the group.
#   $default -> (optional) Value of the button to turn on by default. Pass '-'
#             to turn _nothing_ on.
#   $linebreak -> (optional) Set to true to place linebreaks
#             between the buttons.
#   $labels -> (optional)
#             A pointer to an associative array of labels to print next to each checkbox
#             in the form $label{'value'}="Long explanatory label".
#             Otherwise the provided values are used as the labels.
# Returns:
#   An ARRAY containing a series of <INPUT TYPE="radio"> fields
####
'radio_group' => <<'END_OF_FUNC',
sub radio_group {
my($self,@p) = self_or_default(@_);
my($name,$values,$default,$linebreak,$labels,
$rows,$columns,$rowheaders,$colheaders,$override,$nolabels,@other) =
$self->rearrange([NAME,[VALUES,VALUE],DEFAULT,LINEBREAK,LABELS,
ROWS,[COLUMNS,COLS],
ROWHEADERS,COLHEADERS,
[OVERRIDE,FORCE],NOLABELS],@p);
my($result,$checked);
if (!$override && defined($self->param($name))) {
$checked = $self->param($name);
} else {
$checked = $default;
}
# If no check array is specified, check the first by default
$checked = $values->[0] unless $checked;
$name=$self->escapeHTML($name);
my(@elements);
my(@values) = $values ? @$values : $self->param($name);
my($other) = @other ? " @other" : '';
foreach (@values) {
my($checkit) = $checked eq $_ ? ' CHECKED' : '';
my($break) = $linebreak ? '<BR>' : '';
my($label)='';
unless (defined($nolabels) && $nolabels) {
$label = $_;
$label = $labels->{$_} if defined($labels) && $labels->{$_};
$label = $self->escapeHTML($label);
}
$_=$self->escapeHTML($_);
push(@elements,qq/<INPUT TYPE="radio" NAME="$name" VALUE="$_"$checkit$other>${label} ${break}/);
}
$self->register_parameter($name);
#             lines to turn on by default.
#             2. Otherwise holds the value of the single line to turn on.
#   $size -> (optional) Size of the list.
#   $multiple -> (optional) If set, allow multiple selections.
#   $labels -> (optional)
#             A pointer to an associative array of labels to print next to each checkbox
#             in the form $label{'value'}="Long explanatory label".
#             Otherwise the provided values are used as the labels.
# Returns:
#   A string containing the definition of a scrolling list.
####
'scrolling_list' => <<'END_OF_FUNC',
sub scrolling_list {
my($self,@p) = self_or_default(@_);
my($name,$values,$defaults,$size,$multiple,$labels,$override,@other)
= $self->rearrange([NAME,[VALUES,VALUE],[DEFAULTS,DEFAULT],
SIZE,MULTIPLE,LABELS,[OVERRIDE,FORCE]],@p);
my($result);
my(@values) = $values ? @$values : $self->param($name);
$size = $size || scalar(@values);
my(%selected) = $self->previous_or_default($name,$defaults,$override);
my($is_multiple) = $multiple ? ' MULTIPLE' : '';
my($has_size) = $size ? " SIZE=$size" : '';
my($other) = @other ? " @other" : '';
$name=$self->escapeHTML($name);
$result = qq/<SELECT NAME="$name"$has_size$is_multiple$other>\n/;
foreach (@values) {
my($selectit) = $selected{$_} ? 'SELECTED' : '';
my($label) = $_;
$label = $labels->{$_} if defined($labels) && $labels->{$_};
$label=$self->escapeHTML($label);
my($value)=$self->escapeHTML($_);
$result .= "<OPTION $selectit VALUE=\"$value\">$label\n";
}
$result .= "</SELECT>\n";
$self->register_parameter($name);
return $result;
}
END_OF_FUNC
#### Method: hidden
# Parameters:
#   $name -> Name of the hidden field
#   @default -> (optional) Initial values of field (may be an array)
#      or
#   $default->[initial values of field]
# Returns:
#   A string containing a <INPUT TYPE="hidden" NAME="name" VALUE="value">
####
'hidden' => <<'END_OF_FUNC',
sub hidden {
my($self,@p) = self_or_default(@_);
# this is the one place where we departed from our standard
# calling scheme, so we have to special-case (darn)
my(@result,@value);
my($name,$default,$override,@other) =
$self->rearrange([NAME,[DEFAULT,VALUE,VALUES],[OVERRIDE,FORCE]],@p);
my $do_override = 0;
if ( substr($p[0],0,1) eq '-' || $self->use_named_parameters ) {
@value = ref($default) ? @{$default} : $default;
$do_override = $override;
} else {
foreach ($default,$override,@other) {
push(@value,$_) if defined($_);
}
}
# use previous values if override is not set
my @prev = $self->param($name);
@value = @prev if !$do_override && @prev;
$name=$self->escapeHTML($name);
foreach (@value) {
$_=$self->escapeHTML($_);
push(@result,qq/<INPUT TYPE="hidden" NAME="$name" VALUE="$_">/);
}
return wantarray ? @result : join('',@result);
}
END_OF_FUNC
#### Method: image_button
# Parameters:
#   $name -> Name of the button
#   $src -> URL of the image source
#   $align -> Alignment style (TOP, BOTTOM or MIDDLE)
# Returns:
#   A string containing a <INPUT TYPE="image" NAME="name" SRC="url" ALIGN="alignment">
####
'image_button' => <<'END_OF_FUNC',
sub image_button {
my($self,@p) = self_or_default(@_);
my($name,$src,$alignment,@other) =
$self->rearrange([NAME,SRC,ALIGN],@p);
my($align) = $alignment ? " ALIGN=\U$alignment" : '';
my($other) = @other ? " @other" : '';
$name=$self->escapeHTML($name);
return qq/<INPUT TYPE="image" NAME="$name" SRC="$src"$align$other>/;
}
END_OF_FUNC
#### Method: self_url
# Returns a URL containing the current script and all its
# param/value pairs arranged as a query. You can use this
# to create a link that, when selected, will reinvoke the
# script with all its state information preserved.
####
'self_url' => <<'END_OF_FUNC',
sub self_url {
my($self) = self_or_default(@_);
my($query_string) = $self->query_string;
my $protocol = $self->protocol();
my $name = "$protocol://" . $self->server_name;
$name .= ":" . $self->server_port
unless $self->server_port == 80;
$name .= $self->script_name;
$name .= $self->path_info if $self->path_info;
return $name unless $query_string;
return "$name?$query_string";
}
END_OF_FUNC
# This is provided as a synonym to self_url() for people unfortunate
# enough to have incorporated it into their programs already!
'state' => <<'END_OF_FUNC',
sub state {
&self_url;
}
END_OF_FUNC
#### Method: url
# Like self_url, but doesn't return the query string part of
# the URL.
####
'url' => <<'END_OF_FUNC',
sub url {
my($self) = self_or_default(@_);
my $protocol = $self->protocol();
my $name = "$protocol://" . $self->server_name;
$name .= ":" . $self->server_port
unless $self->server_port == 80;
$name .= $self->script_name;
return $name;
}
END_OF_FUNC
#### Method: cookie
# Set or read a cookie from the specified name.
# Cookie can then be passed to header().
# Usual rules apply to the stickiness of -value.
# Parameters:
#   -name -> name for this cookie (optional)
#   -value -> value of this cookie (scalar, array or hash)
#   -path -> paths for which this cookie is valid (optional)
#   -domain -> internet domain in which this cookie is valid (optional)
#   -secure -> if true, cookie only passed through secure channel (optional)
#   -expires -> expiry date in format Wdy, DD-Mon-YY HH:MM:SS GMT (optional)
####
'cookie' => <<'END_OF_FUNC',
# temporary, for debugging.
sub cookie {
my($self,@p) = self_or_default(@_);
my($name,$value,$path,$domain,$secure,$expires) =
$self->rearrange([NAME,[VALUE,VALUES],PATH,DOMAIN,SECURE,EXPIRES],@p);
# if no value is supplied, then we retrieve the
# value of the cookie, if any. For efficiency, we cache the parsed
# cookie in our state variables.
unless (defined($value)) {
unless ($self->{'.cookies'}) {
my(@pairs) = split("; ",$self->raw_cookie);
foreach (@pairs) {
my($key,$value) = split("=");
# "+3M" -- in 3 months
# "+2y" -- in 2 years
# "-3m" -- 3 minutes ago(!)
# If you don't supply one of these forms, we assume you are
# specifying the date yourself
my($offset);
if (!$time || ($time eq 'now')) {
$offset = 0;
} elsif ($time=~/^([+-]?\d+)([mhdMy]?)/) {
$offset = ($mult{$2} || 1)*$1;
} else {
return $time;
}
my($sec,$min,$hour,$mday,$mon,$year,$wday) = gmtime(time+$offset);
$year += 1900 unless $year < 100;
return sprintf("%s, %02d-%s-%02d %02d:%02d:%02d GMT",
$WDAY[$wday],$mday,$MON[$mon],$year,$hour,$min,$sec);
}
END_OF_FUNC
###############################################
# OTHER INFORMATION PROVIDED BY THE ENVIRONMENT
###############################################
#### Method: path_info
# Return the extra virtual path information provided
# after the URL (if any)
####
'path_info' => <<'END_OF_FUNC',
sub path_info {
local($path_info) = $ENV{'FILEPATH_INFO'} || $ENV{'PATH_INFO'};
return $path_info;
}
END_OF_FUNC
#### Method: request_method
# Returns 'POST', 'GET', 'PUT' or 'HEAD'
####
'request_method' => <<'END_OF_FUNC',
sub request_method {
return $ENV{'REQUEST_METHOD'};
}
END_OF_FUNC
#### Method: path_translated
# Return the physical path information provided
# by the URL (if any)
####
'path_translated' => <<'END_OF_FUNC',
sub path_translated {
return $ENV{'PATH_TRANSLATED'};
}
END_OF_FUNC
#### Method: query_string
# Synthesize a query string from our current
# parameters
####
'query_string' => <<'END_OF_FUNC',
sub query_string {
my($self) = self_or_default(@_);
my($param,$value,@pairs);
foreach $param ($self->param) {
my($eparam) = &escape($param);
foreach $value ($self->param($param)) {
$value = &escape($value);
push(@pairs,"$eparam=$value");
}
}
return join("&",@pairs);
}
END_OF_FUNC
#### Method: accept
# Without parameters, returns an array of the
# MIME types the browser accepts.
# With a single parameter equal to a MIME
# type, will return undef if the browser won't
# accept it, 1 if the browser accepts it but
# doesn't give a preference, or a floating point
# value between 0.0 and 1.0 if the browser
# declares a quantitative score for it.
# This handles MIME type globs correctly.
####
'accept' => <<'END_OF_FUNC',
sub accept {
my($self,$search) = self_or_CGI(@_);
my(%prefs,$type,$pref,$pat);
my(@accept) = split(',',$self->http('accept'));
foreach (@accept) {
($pref) = /q=(\d\.\d+|\d+)/;
($type) = m#(\S+/[^;]+)#;
next unless $type;
$prefs{$type}=$pref || 1;
}
return keys %prefs unless $search;
#
#
#
#
$self->{'.tmpfiles'}->{$filename}= {
name=>$tmpfile,
info=>{%header}
}
}
}
END_OF_FUNC
'tmpFileName' => <<'END_OF_FUNC',
sub tmpFileName {
my($self,$filename) = self_or_default(@_);
return $self->{'.tmpfiles'}->{$filename}->{name}->as_string;
}
END_OF_FUNC
'uploadInfo' => <<'END_OF_FUNC'
sub uploadInfo {
my($self,$filename) = self_or_default(@_);
return $self->{'.tmpfiles'}->{$filename}->{info};
}
END_OF_FUNC
);
END_OF_AUTOLOAD
;
# Globals and stubs for other packages that we use
package MultipartBuffer;
# how many bytes to read at a time. We use
# a 5K buffer by default.
$FILLUNIT = 1024 * 5;
$TIMEOUT = 10*60; # 10 minute timeout
$SPIN_LOOP_MAX = 1000; # bug fix for some Netscape servers
$CRLF=$CGI::CRLF;
#reuse the autoload function
*MultipartBuffer::AUTOLOAD = \&CGI::AUTOLOAD;
###############################################################################
################# THESE FUNCTIONS ARE AUTOLOADED ON DEMAND ####################
###############################################################################
$AUTOLOADED_ROUTINES = ''; # prevent -w error
$AUTOLOADED_ROUTINES=<<'END_OF_AUTOLOAD';
%SUBS = (
'new' => <<'END_OF_FUNC',
sub new {
my($package,$interface,$boundary,$length,$filehandle) = @_;
my $IN;
if ($filehandle) {
my($package) = caller;
# force into caller's package if necessary
$IN = $filehandle=~/[':]/ ? $filehandle : "$package\:\:$filehandle";
}
$IN = "main::STDIN" unless $IN;
$CGI::DefaultClass->binmode($IN) if $CGI::needs_binmode;
# If the user types garbage into the file upload field,
# then Netscape passes NOTHING to the server (not good).
# We may hang on this read in that case. So we implement
$revision;
__END__
=head1 NAME
CGI - Simple Common Gateway Interface Class
=head1 SYNOPSIS
use CGI;
# the rest is too complicated for a synopsis; keep reading
=head1 ABSTRACT
This perl library uses perl5 objects to make it easy to create
Web fill-out forms and parse their contents. This package
defines CGI objects, entities that contain the values of the
current query string and other state variables.
Using a CGI object's methods, you can examine keywords and parameters
passed to your script, and create forms whose initial values
are taken from the current query (thereby preserving state
information).
The current version of CGI.pm is available at
http://www.genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html
ftp://ftp-genome.wi.mit.edu/pub/software/WWW/
=head1 INSTALLATION:
To install this package, just change to the directory in which this
file is found and type the following:
perl Makefile.PL
make
make install
This will copy CGI.pm to your perl library directory for use by all
perl scripts. You probably must be root to do this.
Now you can load the CGI routines in your Perl scripts with the line:
use CGI;
If you don't have sufficient privileges to install CGI.pm in the Perl
library directory, you can put CGI.pm into some convenient spot, such
as your home directory, or in cgi-bin itself and prefix all Perl
scripts that call it with something along the lines of the following
preamble:
use lib '/home/davis/lib';
use CGI;
If you are using a version of perl earlier than 5.002 (such as NT perl), use
this instead:
BEGIN {
unshift(@INC,'/home/davis/lib');
}
use CGI;
The CGI distribution also comes with a cute module called L<CGI::Carp>.
It redefines the die(), warn(), confess() and croak() error routines
so that they write nicely formatted error messages into the server's
error log (or to the output stream of your choice). This avoids long
hours of groping through the error and access logs, trying to figure
out which CGI script is generating error messages. If you choose,
you can even have fatal error messages echoed to the browser to avoid
the annoying and uninformative "Server Error" message.
=head1 DESCRIPTION
=head2 CREATING A NEW QUERY OBJECT:
$query = new CGI;
This will parse the input (from both POST and GET methods) and store
it into a perl5 object called $query.
=head2 CREATING A NEW QUERY OBJECT FROM AN INPUT FILE
$query = new CGI(INPUTFILE);
If you provide a file handle to the new() method, it
will read parameters from the file (or STDIN, or whatever). The
file can be in any of the forms described below under debugging
(i.e. a series of newline delimited TAG=VALUE pairs will work).
Conveniently, this type of file is created by the save() method
(see below). Multiple records can be saved and restored.
Perl purists will be pleased to know that this syntax accepts
references to file handles, or even references to filehandle globs,
which is the "official" way to pass a filehandle:
$query = new CGI(\*STDIN);
You can also initialize the query object from an associative array
reference:
$query = new CGI( {'dinosaur'=>'barney',
'song'=>'I love you',
'friends'=>[qw/Jessica George Nancy/]}
);
or from a properly formatted, URL-escaped query string:
$query = new CGI('dinosaur=barney&color=purple');
To create an empty query, initialize it from an empty string or hash:
$empty_query = new CGI("");
-or-
$empty_query = new CGI({});
=head2 FETCHING A LIST OF KEYWORDS FROM THE QUERY:
@keywords = $query->keywords
If the script was invoked as the result of an <ISINDEX> search, the
parsed keywords can be obtained as an array using the keywords() method.
=head2 FETCHING THE NAMES OF ALL THE PARAMETERS PASSED TO YOUR SCRIPT:
@names = $query->param
If the script was invoked with a parameter list
(e.g. "name1=value1&name2=value2&name3=value3"), the param()
method will return the parameter names as a list. If the
$query->self_url;
HREF=$myself#table1>See table 1</A>";
HREF=$myself#table2>See table 2</A>";
HREF=$myself#yourself>See for yourself</A>";
the proper order of arguments in CGI function calls that accepted five
or six different arguments. As of 2.0, there's a better way to pass
arguments to the various CGI functions. In this style, you pass a
series of name=>argument pairs, like this:
$field = $query->radio_group(-name=>'OS',
-values=>['Unix','Windows','Macintosh'],
-default=>'Unix');
The advantages of this style are that you don't have to remember the
exact order of the arguments, and if you leave out a parameter, in
most cases it will default to some reasonable value. If you provide
a parameter that the method doesn't recognize, it will usually do
something useful with it, such as incorporating it into the HTML form
tag. For example if Netscape decides next week to add a new
JUSTIFICATION parameter to the text field tags, you can start using
the feature without waiting for a new version of CGI.pm:
$field = $query->textfield(-name=>'State',
-default=>'gaseous',
-justification=>'RIGHT');
This will result in an HTML tag that looks like this:
<INPUT TYPE="text" NAME="State" VALUE="gaseous"
JUSTIFICATION="RIGHT">
Parameter names are case insensitive: you can use -name, or -Name or
-NAME. You don't have to use the hyphen if you don't want to. After
creating a CGI object, call the B<use_named_parameters()> method with
a nonzero value. This will tell CGI.pm that you intend to use named
parameters exclusively:
$query = new CGI;
$query->use_named_parameters(1);
$field = $query->radio_group('name'=>'OS',
'values'=>['Unix','Windows','Macintosh'],
'default'=>'Unix');
Actually, CGI.pm only looks for a hyphen in the first parameter. So
you can leave it off subsequent parameters if you like. Something to
be wary of is the potential that a string constant like "values" will
collide with a keyword (and in fact it does!). While Perl usually
figures out when you're referring to a function and when you're
referring to a string, you probably should put quotation marks around
all string constants just to play it safe.
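The named-argument convention is easy to imitate in plain Perl. The following is a minimal sketch, not CGI.pm's own code: the helper name parse_named_args() is hypothetical, and it simply lowercases each key and strips one leading hyphen, so that -name, -Name, -NAME and 'name' are all treated alike.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm): normalize a list of
# name=>value pairs so that -name, -Name, -NAME and 'name' all
# end up under the key "name".
sub parse_named_args {
    my (%raw) = @_;
    my %args;
    for my $key (keys %raw) {
        (my $name = lc $key) =~ s/^-//;   # drop one leading hyphen
        $args{$name} = $raw{$key};
    }
    return %args;
}

my %args = parse_named_args(-Name    => 'OS',
                            'values' => ['Unix', 'Windows', 'Macintosh'],
                            -DEFAULT => 'Unix');
print "$args{name} defaults to $args{default}\n";
```

Note that every string constant here is either quoted or sits to the left of =>, which quotes it automatically; that is the "play it safe" habit recommended above.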
=head2 CREATING THE HTTP HEADER:
print $query->header;
-or-
print $query->header('image/gif');
-or-
print $query->header('text/html','204 No response');
-or-
print $query->header(-type=>'image/gif',
-nph=>1,
A 'true' flag if you want to include a <BASE> tag in the header. This
helps resolve relative addresses to absolute ones when the document is moved,
but makes the document hierarchy non-portable. Use with care!
=item 4, 5, 6...
Any other parameters you want to include in the <BODY> tag. This is a good
place to put Netscape extensions, such as colors and wallpaper patterns.
=back
=head2 ENDING THE HTML DOCUMENT:
print $query->end_html
This ends an HTML document by printing the </BODY></HTML> tags.
=head1 CREATING FORMS:
I<General note> The various form-creating methods all return strings
to the caller, containing the tag or tags that will create the requested
form element. You are responsible for actually printing out these strings.
It's set up this way so that you can place formatting tags
around the form elements.
I<Another note> The default values that you specify for the forms are only
used the B<first> time the script is invoked (when there is no query
string). On subsequent invocations of the script (when there is a query
string), the former values are used even if they are blank.
If you want to change the value of a field from its previous value, you have two
choices:
(1) call the param() method to set it.
(2) use the -override (alias -force) parameter (a new feature in version 2.15).
This forces the default value to be used, regardless of the previous value:
print $query->textfield(-name=>'field_name',
-default=>'starting value',
-override=>1,
-size=>50,
-maxlength=>80);
I<Yet another note> By default, the text and labels of form elements are
escaped according to HTML rules. This means that you can safely use
"<CLICK ME>" as the label for a button. However, it also interferes with
your ability to incorporate special HTML character sequences, such as &Aacute;,
into your fields. If you wish to turn off automatic escaping, call the
autoEscape() method with a false value immediately after creating the CGI object:
$query = new CGI;
$query->autoEscape(undef);
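To see why escaping makes "<CLICK ME>" safe but spoils character entities, here is a minimal sketch of HTML escaping in plain Perl. The function name escape_html() is hypothetical and the list of characters is my own assumption; it is not taken from CGI.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the four characters that are special
# in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print escape_html('<CLICK ME>'), "\n";   # prints &lt;CLICK ME&gt;
print escape_html('&Aacute;'), "\n";     # prints &amp;Aacute;
```

The second call shows the interference: the ampersand that begins the entity is itself escaped, so the browser displays the literal text of the entity rather than the accented letter.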
=head2 CREATING AN ISINDEX TAG
print $query->isindex(-action=>$action);
-or-
print $query->isindex($action);
Netscape 2.0.
fields or that
importantly,
2.0 forms. For
encoding type
Forms that use this type of encoding are not easily interpreted
by CGI scripts unless they use CGI.pm or another library designed
to handle them.
=back
For compatibility, the startform() method uses the older form of
encoding by default. If you want to use the newer form of encoding
by default, you can call B<start_multipart_form()> instead of
B<startform()>.
JAVASCRIPTING: The B<-name> and B<-onSubmit> parameters are provided
for use with JavaScript. The -name parameter gives the
print $query->textfield(-name=>'field_name',
-default=>'starting value',
-override=>1,
-size=>50,
-maxlength=>80);
JAVASCRIPTING: You can also provide B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters to register JavaScript event handlers.
The onChange handler will be called whenever the user changes the
contents of the text field. You can do text validation if you like.
onFocus and onBlur are called respectively when the insertion point
moves into and out of the text field. onSelect is called when the
user changes the portion of the text that is selected.
=head2 CREATING A BIG TEXT FIELD
print $query->textarea(-name=>'foo',
-default=>'starting value',
-rows=>10,
-columns=>50);
-or-
print $query->textarea('foo','starting value',10,50);
textarea() is just like textfield, but it allows you to specify
rows and columns for a multiline text entry box. You can provide
a starting value for the field, which can be long and contain
multiple lines.
JAVASCRIPTING: The B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters are recognized. See textfield().
=head2 CREATING A PASSWORD FIELD
print $query->password_field(-name=>'secret',
-value=>'starting value',
-size=>50,
-maxlength=>80);
-or-
print $query->password_field('secret','starting value',50,80);
password_field() is identical to textfield(), except that its contents
will be starred out on the web page.
JAVASCRIPTING: The B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters are recognized. See textfield().
=head2 CREATING A FILE UPLOAD FIELD
print $query->filefield(-name=>'uploaded_file',
-default=>'starting value',
-size=>50,
-maxlength=>80);
-or-
print $query->filefield('uploaded_file','starting value',50,80);
filefield() will return a file upload field for Netscape 2.0 browsers.
In order to take full advantage of this I<you must use the new
multipart encoding scheme> for the form. You can do this either
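# Append the uploaded data to a file, 1024 bytes at a time
open (OUTFILE,">>/usr/local/web/users/feedback")
    or die "Can't open feedback file: $!";
while ($bytesread=read($filename,$buffer,1024)) {
    print OUTFILE $buffer;
}
close (OUTFILE);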
When a file is uploaded, the browser usually sends some information
along with it in the form of headers. The information
usually includes the MIME content type. Future browsers may send
other information as well (such as modification date and size). To
retrieve this information, call uploadInfo(). It returns a reference to
an associative array containing all the document headers.
$filename = $query->param('uploaded_file');
$type = $query->uploadInfo($filename)->{'Content-Type'};
unless ($type eq 'text/html') {
die "HTML FILES ONLY!";
}
If you are using a machine that recognizes "text" and "binary" data
modes, be sure to understand when and how to use them (see the Camel book).
Otherwise you may find that binary files are corrupted during file uploads.
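As a rough illustration of the binary-mode advice, the copy loop can be wrapped in a subroutine that calls binmode() on both filehandles before reading. This is a sketch under my own assumptions: the name copy_binary() is hypothetical, and in-memory filehandles stand in for the upload handle and the destination file so that the example runs on its own.

```perl
use strict;
use warnings;

# Copy everything from one filehandle to another in 1024-byte chunks.
# binmode() on both handles prevents newline translation on systems
# that distinguish "text" and "binary" files.
sub copy_binary {
    my ($in, $out) = @_;
    binmode($in);
    binmode($out);
    my ($buffer, $total) = ('', 0);
    while (my $bytesread = read($in, $buffer, 1024)) {
        print {$out} $buffer;
        $total += $bytesread;
    }
    return $total;
}

# Demonstration: every byte value from 0 to 255 should survive the copy.
my $data = join '', map { chr($_) } 0 .. 255;
open(my $src, '<', \$data) or die "Can't open source: $!";
my $copy = '';
open(my $dst, '>', \$copy) or die "Can't open destination: $!";
my $n = copy_binary($src, $dst);
close($dst);
print "copied $n bytes\n";
```

In a real script the first argument would be the filehandle of the uploaded file and the second an output file opened for writing.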
JAVASCRIPTING: The B<-onChange>, B<-onFocus>, B<-onBlur>
and B<-onSelect> parameters are recognized. See textfield()
for details.
=head2 CREATING A POPUP MENU
print $query->popup_menu('menu_name',
['eenie','meenie','minie'],
'meenie');
-or-
%labels = ('eenie'=>'your first choice',
'meenie'=>'your second choice',
'minie'=>'your third choice');
print $query->popup_menu('menu_name',
['eenie','meenie','minie'],
'meenie',\%labels);
-or (named parameter style)-
print $query->popup_menu(-name=>'menu_name',
-values=>['eenie','meenie','minie'],
-default=>'meenie',
-labels=>\%labels);
popup_menu() creates a menu.
=over 4
=item 1.
The required first argument is the menu's name (-name).
=item 2.
The required second
containing the list
method an anonymous
a named array, such
=item 3.
The optional third parameter (-default) is the name of the default
menu choice. If not specified, the first item will be the default.
The values of the previous choice will be maintained across queries.
=item 4.
The optional fourth parameter (-labels) is provided for people who
want to use different values for the user-visible label inside the
popup menu and the value returned to your script. It's a pointer to an
associative array relating menu values to user-visible labels. If you
leave this parameter blank, the menu values will be displayed by
default. (You can also leave a label undefined if you want to).
=back
When the form is processed, the selected value of the popup menu can
be retrieved using:
$popup_menu_value = $query->param('menu_name');
JAVASCRIPTING: popup_menu() recognizes the following event handlers:
B<-onChange>, B<-onFocus>, and B<-onBlur>. See the textfield()
section for details on when these handlers are called.
=head2 CREATING A SCROLLING LIST
print $query->scrolling_list('list_name',
['eenie','meenie','minie','moe'],
['eenie','moe'],5,'true');
-or-
print $query->scrolling_list('list_name',
['eenie','meenie','minie','moe'],
['eenie','moe'],5,'true',
\%labels);
-or-
print $query->scrolling_list(-name=>'list_name',
-values=>['eenie','meenie','minie','moe'],
-default=>['eenie','moe'],
-size=>5,
-multiple=>'true',
-labels=>\%labels);
scrolling_list() creates a scrolling list.
=over 4
=item B<Parameters:>
=item 1.
The first and second arguments are the list name (-name) and values
(-values). As in the popup menu, the second argument should be an
array reference.
=item 2.
The optional third argument (-default) can be either a reference to a
list containing the values to be selected by default, or can be a
single value to select. If this argument is missing or undefined,
then nothing is selected when the list first appears. In the named
parameter version, you can use the synonym "-defaults" for this
parameter.
=item 3.
The optional fourth argument is the size of the list (-size).
=item 4.
The optional fifth argument can be set to true to allow multiple
simultaneous selections (-multiple). Otherwise only one selection
will be allowed at a time.
=item 5.
The optional sixth argument is a pointer to an associative array
containing long user-visible labels for the list items (-labels).
If not provided, the values will be displayed.
When this form is processed, all selected list items will be returned as
a list under the parameter name 'list_name'. The values of the
selected items can be retrieved with:
@selected = $query->param('list_name');
=back
JAVASCRIPTING: scrolling_list() recognizes the following event handlers:
B<-onChange>, B<-onFocus>, and B<-onBlur>. See textfield() for
the description of when these handlers are called.
=head2 CREATING A GROUP OF RELATED CHECKBOXES
print $query->checkbox_group(-name=>'group_name',
-values=>['eenie','meenie','minie','moe'],
-default=>['eenie','moe'],
-linebreak=>'true',
-labels=>\%labels);
print $query->checkbox_group('group_name',
['eenie','meenie','minie','moe'],
['eenie','moe'],'true',\%labels);
HTML3-COMPATIBLE BROWSERS ONLY:
print $query->checkbox_group(-name=>'group_name',
-values=>['eenie','meenie','minie','moe'],
-rows=>2,-columns=>2);
checkbox_group() creates a list of checkboxes that are related
by the same name.
=over 4
=item B<Parameters:>
=item 1.
The first and second arguments are the checkbox name and values,
respectively (-name and -values). As in the popup menu, the second
argument should be an array reference. These values are used for the
user-readable labels printed next to the checkboxes as well as for the
print $query->checkbox(-name=>'checkbox_name',
-checked=>'checked',
-value=>'ON',
-label=>'CLICK ME');
-or-
print $query->checkbox('checkbox_name','checked','ON','CLICK ME');
checkbox() is used to create an isolated checkbox that isn't logically
related to any others.
=over 4
=item B<Parameters:>
=item 1.
The first parameter is the required name for the checkbox (-name). It
will also be used for the user-readable label printed next to the
checkbox.
=item 2.
The optional second parameter (-checked) specifies that the checkbox
is turned on by default. Synonyms are -selected and -on.
=item 3.
The optional third parameter (-value) specifies the value of the
checkbox when it is checked. If not provided, the word "on" is
assumed.
=item 4.
The optional fourth parameter (-label) is the user-readable label to
be attached to the checkbox. If not provided, the checkbox name is
used.
=back
The value of the checkbox can be retrieved using:
$turned_on = $query->param('checkbox_name');
JAVASCRIPTING: checkbox() recognizes the B<-onClick>
parameter. See checkbox_group() for further details.
=head2 CREATING A RADIO BUTTON GROUP
print $query->radio_group(-name=>'group_name',
-values=>['eenie','meenie','minie'],
-default=>'meenie',
-linebreak=>'true',
-labels=>\%labels);
-or-
print $query->radio_group('group_name',['eenie','meenie','minie'],
'meenie','true',\%labels);
HTML3-COMPATIBLE BROWSERS ONLY:
print $query->radio_group(-name=>'group_name',
-values=>['eenie','meenie','minie','moe'],
-rows=>2,-columns=>2);
radio_group() creates a set of logically-related radio buttons
(turning one member of the group on turns the others off).
=over 4
=item B<Parameters:>
=item 1.
The first argument is the name of the group and is required (-name).
=item 2.
The second argument (-values) is the list of values for the radio
buttons. The values and the labels that appear on the page are
identical. Pass an array I<reference> in the second argument, either
using an anonymous array, as shown, or by referencing a named array as
in "\@foo".
=item 3.
The optional third parameter (-default) is the name of the default
button to turn on. If not specified, the first item will be the
default. You can provide a nonexistent button name, such as "-" to
start up with no buttons selected.
=item 4.
The optional fourth parameter (-linebreak) can be set to 'true' to put
line breaks between the buttons, creating a vertical list.
=item 5.
The optional fifth parameter (-labels) is a pointer to an associative
array relating the radio button values to user-visible labels to be
used in the display. If not provided, the values themselves are
displayed.
=item 6.
B<HTML3-compatible browsers> (such as Netscape) can take advantage
of the optional parameters B<-rows> and B<-columns>. These parameters cause
radio_group() to return an HTML3 compatible table containing
the radio group formatted with the specified number of rows
and columns. You can provide just the -columns parameter if you
wish; radio_group will calculate the correct number of rows
for you.
To include row and column headings in the returned table, you
can use the B<-rowheader> and B<-colheader> parameters. Both
of these accept a pointer to an array of headings to use.
The headings are just decorative. They don't reorganize the
interpretation of the radio buttons -- they're still a single named
unit.
=back
When the form is processed, the selected radio button can
be retrieved using:
$which_radio_button = $query->param('group_name');
The value returned by radio_group() is actually an array of button
elements. You can capture them and use them within tables, lists,
or in other creative ways:
@h = $query->radio_group(-name=>'group_name',-values=>\@values);
&use_in_creative_way(@h);
=head2 CREATING A SUBMIT BUTTON
print $query->submit(-name=>'button_name',
-value=>'value');
-or-
print $query->submit('button_name','value');
submit() will create the query submission button. Every form
should have one of these.
=over 4
=item B<Parameters:>
=item 1.
The first argument (-name) is optional. You can give the button a
name if you have several submission buttons in your form and you want
to distinguish between them. The name will also be used as the
user-visible label. Be aware that a few older browsers don't deal with this correctly and
B<never> send back a value from a button.
=item 2.
The second argument (-value) is also optional. This gives the button
a value that will be passed to your script in the query string.
=back
You can figure out which button was pressed by using different
values for each one:
$which_one = $query->param('button_name');
JAVASCRIPTING: submit() recognizes the B<-onClick>
parameter. See checkbox_group() for further details.
=head2 CREATING A RESET BUTTON
print $query->reset
reset() creates the "reset" button. Note that it restores the
form to its value from the last time the script was called,
NOT necessarily to the defaults.
=head2 CREATING A DEFAULT BUTTON
print $query->defaults('button_label')
defaults() creates a button that, when invoked, will cause the
form to be completely reset to its defaults, wiping out all the
changes the user ever made.
The first argument (-name) is required and specifies the name of this
field.
=item 2.
The second argument (-src) is also required and specifies the URL
=item 3.
The third option (-align, optional) is an alignment type, and may be
TOP, BOTTOM or MIDDLE
=back
Fetch the value of the button this way:
$x = $query->param('button_name.x');
$y = $query->param('button_name.y');
=head2 CREATING A JAVASCRIPT ACTION BUTTON
print $query->button(-name=>'button_name',
-value=>'user visible label',
-onClick=>"do_something()");
-or-
print $query->button('button_name',"do_something()");
button() produces a button that is compatible with Netscape 2.0's
JavaScript. When it's pressed the fragment of JavaScript code
pointed to by the B<-onClick> parameter will be executed. On
non-Netscape browsers this form element will probably not even
display.
=head1 NETSCAPE COOKIES
Netscape browsers versions 1.1 and higher support a so-called
"cookie" designed to help maintain state within a browser session.
CGI.pm has several methods that support cookies.
A cookie is a name=value pair much like the named parameters in a CGI
query string. CGI scripts create one or more cookies and send
them to the browser in the HTTP header. The browser maintains a list
of cookies that belong to a particular Web server, and returns them
to the CGI script during subsequent interactions.
In addition to the required name=value pair, each cookie has several
optional attributes:
=over 4
=item 1. an expiration time
This is a time/date string (in a special GMT format) that indicates
when a cookie expires. The cookie will be saved and returned to your
script until this expiration date is reached, even if the user exits
Netscape and restarts it. If an expiration date isn't specified, the cookie
will remain active until the user quits Netscape.
=item 2. a domain
This is a partial or complete domain name for which the cookie is
valid. The browser will return the cookie to any host that matches
the partial domain name. For example, if you specify a domain name
of ".capricorn.com", then Netscape will return the cookie to
=over 4
=item B<-name>
The name of the cookie (required). This can be any string at all.
Although Netscape limits its cookie names to non-whitespace
alphanumeric characters, CGI.pm removes this restriction by escaping
and unescaping cookies behind the scenes.
=item B<-value>
The value of the cookie. This can be any scalar value,
array reference, or even associative array reference. For example,
you can store an entire associative array into a cookie this way:
$cookie=$query->cookie(-name=>'family information',
-value=>\%childrens_ages);
=item B<-path>
The optional partial path for which this cookie will be valid, as described
above.
=item B<-domain>
The optional partial domain for which this cookie will be valid, as described
above.
=item B<-expires>
The optional expiration date for this cookie. The format is as described
in the section on the B<header()> method:
"+1h" one hour from now
=item B<-secure>
If set to true, this cookie will only be used within a secure
SSL session.
=back
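To make the relative expiration notation concrete, here is a minimal sketch that turns an offset such as "+1h" into a GMT date string of the kind used in cookie headers. It is not CGI.pm's own routine: the name cookie_expiry() is hypothetical, only the units s, m, h and d are handled, and the exact date layout is my assumption about the Netscape-style format.

```perl
use strict;
use warnings;

my @DAYS    = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS  = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my %SECONDS = (s => 1, m => 60, h => 3600, d => 86400);

# Hypothetical helper: convert "+1h", "-1d", "+30m" and the like into
# a GMT date string, measured from $now (epoch seconds; defaults to
# the current time).
sub cookie_expiry {
    my ($offset, $now) = @_;
    $now = time unless defined $now;
    my ($sign, $count, $unit) = $offset =~ /^([+-])(\d+)([smhd])$/
        or die "Unrecognized offset: $offset";
    my $when = $now + ($sign eq '-' ? -1 : 1) * $count * $SECONDS{$unit};
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($when);
    return sprintf("%s, %02d-%s-%04d %02d:%02d:%02d GMT",
                   $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
                   $hour, $min, $sec);
}

# One hour after the epoch:
print cookie_expiry('+1h', 0), "\n";   # Thu, 01-Jan-1970 01:00:00 GMT
```

A negative offset such as "-1d" yields a date in the past, which is the usual way to ask the browser to discard a cookie.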
The cookie created by cookie() must be incorporated into the HTTP
header within the string returned by the header() method:
print $query->header(-cookie=>$my_cookie);
To create multiple cookies, give header() an array reference:
$cookie1 = $query->cookie(-name=>'riddle_name',
-value=>"The Sphynx's Question");
$cookie2 = $query->cookie(-name=>'answers',
-value=>\%answers);
print $query->header(-cookie=>[$cookie1,$cookie2]);
To retrieve a cookie, request it by name by calling the cookie()
method without the B<-value> parameter:
use CGI;
$query = new CGI;
%answers = $query->cookie(-name=>'answers');
# $query->cookie('answers') will work too!
The cookie and CGI namespaces are separate. If you have a parameter
named 'answers' and a cookie named 'answers', the values retrieved by
param() and cookie() are independent of each other. However, it's
simple to turn a CGI parameter into a cookie, and vice-versa:
# turn a CGI parameter into a cookie
$c=$q->cookie(-name=>'answers',-value=>[$q->param('answers')]);
# vice-versa
$q->param(-name=>'answers',-value=>[$q->cookie('answers')]);
See the B<cookie.cgi> example script for some ideas on how to use
cookies effectively.
B<NOTE:> There appear to be some (undocumented) restrictions on
Netscape cookies. In Netscape 2.01, at least, I haven't been able to
set more than three cookies at a time. There may also be limits on
the length of cookies. If you need to store a lot of information,
it's probably better to create a unique session ID, store it in a
cookie, and use the session ID to locate an external file/database
saved on the server's side of the connection.
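The session ID approach might look something like the following sketch. Everything in it is an assumption made for illustration: the subroutine names, the use of a temporary directory, and the one-pair-per-line file layout are hypothetical, and the ID generator is not cryptographically strong. Only the ID would travel in the cookie; the data stays on the server.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical session directory; a real script would use a fixed,
# server-private location instead of a temporary one.
my $SESSION_DIR = tempdir(CLEANUP => 1);

# Make an identifier that is unlikely to repeat. This is NOT
# cryptographically strong; it only illustrates the idea.
sub new_session_id {
    return md5_hex(join ':', time, $$, rand());
}

# Write name=value pairs, one per line, to the file named after the ID.
# (Values containing newlines would need escaping first.)
sub save_session {
    my ($id, %data) = @_;
    my $file = File::Spec->catfile($SESSION_DIR, $id);
    open(my $fh, '>', $file) or die "Can't write $file: $!";
    print {$fh} "$_=$data{$_}\n" for sort keys %data;
    close($fh) or die "Can't close $file: $!";
}

# Read the pairs back into a hash.
sub load_session {
    my ($id) = @_;
    my $file = File::Spec->catfile($SESSION_DIR, $id);
    open(my $fh, '<', $file) or die "Can't read $file: $!";
    my %data;
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $value) = split /=/, $line, 2;
        $data{$name} = $value;
    }
    close($fh);
    return %data;
}

my $id = new_session_id();
save_session($id, riddle => 'legs', answer => 'man');
my %restored = load_session($id);
print "session $id holds answer=$restored{answer}\n";
```

The script would then store $id as the cookie's value and call load_session() with whatever ID the browser sends back.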
=head1 WORKING WITH NETSCAPE FRAMES
It's possible for CGI.pm scripts to write into several browser
panels and windows using Netscape's frame mechanism.
There are three techniques for defining new frames programmatically:
=over 4
With
print $q->startform(-target=>'ResultsWindow');
When your script is reinvoked by the form, its output will be loaded
into the frame named "ResultsWindow". If one doesn't already exist,
a new window will be created.
=back
The script "frameset.cgi" in the examples directory shows one way to
create pages in which the fill-out form and the response live in
side-by-side frames.
=head1 DEBUGGING
If you are running the script
from the command line or in the perl debugger, you can pass the script
a list of keywords or parameter=value pairs on the command line or
from standard input (you don't have to worry about tricking your
script into reading from environment variables).
You can pass keywords like this:
your_script.pl keyword1 keyword2 keyword3
or this:
your_script.pl keyword1+keyword2+keyword3
or this:
your_script.pl name1=value1 name2=value2
or this:
your_script.pl 'name1=value1&name2=value2'
or even as newline-delimited parameters on standard input.
When debugging, you can use quotes and backslashes to escape
characters in the familiar shell manner, letting you place
spaces and other funny characters in your parameter=value
pairs:
your_script.pl "name1='I am a long value'" "name2=two\ words"
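For the curious, the kind of parsing applied to a name=value string can be sketched in a few lines of plain Perl. This is an illustration only, not CGI.pm's parser: the name parse_query_string() is hypothetical, and the sketch handles just the basic URL escapes (a plus sign for a space, %XX for an arbitrary byte).

```perl
use strict;
use warnings;

# Hypothetical helper: split "name1=value1&name2=value2" into a hash
# of array references, undoing URL escapes along the way.
sub parse_query_string {
    my ($string) = @_;
    my %params;
    for my $pair (split /[&;]/, $string) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                                # + means space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX means one byte
        }
        push @{ $params{$name} }, $value;           # names may repeat
    }
    return %params;
}

my %p = parse_query_string(
    'name1=I+am+a+long+value&name2=two%20words&name2=again');
print "$_: @{ $p{$_} }\n" for sort keys %p;
```

Each name maps to a list because a parameter may legitimately appear more than once, as name2 does here.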
=head2 DUMPING OUT ALL THE NAME/VALUE PAIRS
The dump() method produces a string consisting of all the query's
name/value pairs formatted nicely as a nested list. This is useful
for debugging purposes:
print $query->dump
Produces something that looks like:
<UL>
<LI>name1
<UL>
<LI>value1
<LI>value2
</UL>
<LI>name2
<UL>
<LI>value1
</UL>
</UL>
You can pass a value of 'true' to dump() in order to get it to
print the results out as plain text, suitable for incorporating
into a <PRE> section.
As a shortcut, as of version 1.56 you can interpolate the entire CGI
object into a string and it will be replaced with the nice HTML dump
shown above:
$query=new CGI;
print "<H2>Current Values</H2> $query\n";
=head1 FETCHING ENVIRONMENT VARIABLES
Some of the more useful environment variables can be fetched
through this interface. The methods are as follows:
=over 4
=item B<accept()>
Return a list of MIME types that the remote browser
accepts. If you give this method a single argument
corresponding to a MIME type, as in
$query->accept('text/html'), it will return a
floating point value corresponding to the browser's
preference for this type from 0.0 (don't want) to 1.0.
Glob types (e.g. text/*) in the browser's accept list
are handled correctly.
=item B<raw_cookie()>
name.
=item B<virtual_host ()>
When using virtual hosts, returns the name of the host that
the browser attempted to contact.
=item B<server_software ()>
Returns the server software and version number.
=item B<remote_user ()>
Return the authorization/verification name used for user
verification, if this script is protected.
=item B<user_name ()>
Attempt to obtain the remote user's name, using a variety
of different techniques. This only works with older browsers
such as Mosaic. Netscape does not reliably report the user
name!
=item B<request_method()>
Returns the method used to access your script, usually
one of 'POST', 'GET' or 'HEAD'.
=back
=head1 CREATING HTML ELEMENTS:
In addition to its shortcuts for creating form elements, CGI.pm
defines general HTML shortcut methods as well. HTML shortcuts are
named after a single HTML element and return a fragment of HTML text
that you can then print or manipulate as you like.
This example shows how to use the HTML methods:
$q = new CGI;
print $q->blockquote(
"Many years ago on the island of",
$q->a({href=>"http://crete.org/"},"Crete"),
"there lived a minotaur named",
$q->strong("Fred."),
),
$q->hr;
This results in the following HTML code (extra newlines have been
added for readability):
<blockquote>
Many years ago on the island of
<a HREF="http://crete.org/">Crete</a> there lived
a minotaur named <strong>Fred.</strong>
</blockquote>
<hr>
If you find the syntax for calling the HTML shortcuts awkward, you can
import them into your namespace and dispense with the object syntax
completely (see the next section for more details):
use CGI shortcuts;
print blockquote(
If you
print hr;
# gives "<hr>"
If you provide one or more string arguments, they are concatenated
together with spaces and placed between opening and closing tags:
print h1("Chapter","1");
# gives "<h1>Chapter 1</h1>"
If the first argument is an associative array reference, then the keys
and values of the associative array become the HTML tag's attributes:
print a({href=>'fred.html',target=>'_new'},
"Open a new frame");
# gives <a href="fred.html" target="_new">Open a new frame</a>
You are free to use CGI.pm-style dashes in front of the attribute
names if you prefer:
print img {-src=>'fred.gif',-align=>'LEFT'};
# gives <img ALIGN="LEFT" SRC="fred.gif">
=head2 Generating new HTML tags
Since no mere mortal can keep up with Netscape and Microsoft as they
battle it out for control of HTML, the code that generates HTML tags
is general and extensible. You can create new HTML tags freely just
by referring to them on the import line:
use CGI shortcuts,winkin,blinkin,nod;
Now, in addition to the standard CGI shortcuts, you've created HTML
tags named "winkin", "blinkin" and "nod". You can use them like this:
print blinkin {color=>'blue',rate=>'fast'},"Yahoo!";
# <blinkin COLOR="blue" RATE="fast">Yahoo!</blinkin>
=head1 IMPORTING CGI METHOD CALLS INTO YOUR NAME SPACE
As a convenience, you can import most of the CGI method calls directly
into your name space. The syntax for doing this is:
use CGI <list of methods>;
The listed methods will be imported into the current package; you can
call them directly without creating a CGI object first. This example
shows how to import the B<param()> and B<header()>
methods, and then use them directly:
use CGI param,header;
print header('text/plain');
$zipcode = param('zipcode');
http://www.eskimo.com/cgihelp/scripts/CGI.pm.txt (74 of 79) [22/07/2003 5:57:03 PM]
checkbox_group(-name=>'words',
-values=>['eenie','meenie','minie','moe'],
-defaults=>['eenie','moe']),p,
"What's your favorite color?",
popup_menu(-name=>'color',
-values=>['red','green','blue','chartreuse']),p,
submit,
end_form,
hr,"\n";
if (param) {
print
"Your name is ",em(param('name')),p,
"The keywords are: ",em(join(", ",param('words'))),p,
"Your favorite color is ",em(param('color')),".\n";
}
print end_html;
=head1 USING NPH SCRIPTS
NPH, or "no-parsed-header", scripts bypass the server completely by
sending the complete HTTP header directly to the browser. This has
slight performance benefits, but is of most use for taking advantage
of HTTP extensions that are not directly supported by your server,
such as server push and PICS headers.
Servers use a variety of conventions for designating CGI scripts as
NPH. Many Unix servers look at the beginning of the script's name for
the prefix "nph-". The Macintosh WebSTAR server and Microsoft's
Internet Information Server, in contrast, try to decide whether a
program is an NPH script by examining the first line of script output.
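To make "sending the complete HTTP header" concrete, here is a minimal sketch of an NPH-style response written in shell (the language of the date script elsewhere on this site) rather than with CGI.pm. The function name nph_header is hypothetical, and the two header lines shown are an assumption about the minimum a browser needs; the exact set of headers CGI.pm emits in NPH mode is not reproduced here.

```shell
#!/bin/sh
# nph_header is a hypothetical helper, not part of CGI.pm.
# An ordinary CGI script prints only "Content-type: ..." and lets the server
# supply the status line. An NPH script has to print the status line itself.
nph_header() {
    content_type=$1
    printf 'HTTP/1.0 200 OK\r\n'                    # status line, normally added by the server
    printf 'Content-type: %s\r\n' "$content_type"   # the usual CGI header
    printf '\r\n'                                   # blank line ends the header block
}

# Usage, in a script named with the "nph-" prefix on servers that use that convention:
#   nph_header text/plain
#   echo "Hello from an NPH script."
```

Whether a given server treats such a script as NPH still depends on the conventions described above.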
CGI.pm supports NPH scripts with a special NPH mode. When in this
mode, CGI.pm will output the necessary extra header information when
the header() and redirect() methods are called.
The Microsoft Internet Information Server requires NPH mode. As of version
2.30, CGI.pm will automatically detect when the script is running under IIS
and put itself into this mode. You do not need to do this manually, although
it won't hurt anything if you do.
There are a number of ways to put CGI.pm into NPH mode:
=over 4
=item In the B<use> statement
Simply add ":nph" to the list of symbols to be imported into your script:
use CGI qw(:standard :nph)
=item By calling the B<nph()> method:
Call B<nph()> with a non-zero parameter at any point after using CGI.pm in your program.
CGI->nph(1)
=item By using B<-nph> parameters in the B<header()> and B<redirect()> statements:
print $q->header(-nph=>1);
=back
print $query->startform;
print "<EM>What's your name?</EM><BR>";
print $query->textfield('name');
print $query->checkbox('Not my real name');
print "<P>",$query->reset;
print $query->submit('Action','Shout');
print $query->submit('Action','Scream');
print $query->endform;
print "<HR>\n";
}
sub do_work {
my($query) = @_;
my(@values,$key);
This form will allow you to edit, reset or even create the personalized data file used by
Eskimo North's CGI counter. For more information on the counter please refer to its
documentation.
Submit Form
You must have a valid Eskimo account in order to use this form. If you are an Eskimo
subscriber and keep on getting error messages, your browser is most likely not sending
the correct remote identification information. Don't worry about it; simply use the lynx
browser available on your Eskimo shell account.
This page is maintained by ECT, erik@skytouch.com (aka. webmaster@eskimo.com)
Copyright 1997 SkyTouch Communications. All rights reserved.
http://www.eskimo.com/cgihelp/scripts/date.txt
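#!/bin/sh
# Print the current date as a plain-text CGI response.
DATE=/bin/date
echo "Content-type: text/plain"
echo    # blank line terminates the CGI header
if [ -x "$DATE" ]; then
    "$DATE"
else
    echo "Cannot find date command on this system."
fi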
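The same pattern (print the header, print a blank line, then either run the program or report that it is missing) can be wrapped in a function so that other commands can be served the same way. This is only a sketch; the function name cgi_plain is hypothetical and does not appear in the original script.

```shell
#!/bin/sh
# cgi_plain is a hypothetical helper: emit a text/plain CGI response that
# runs the given program, or says so if the program is not executable.
cgi_plain() {
    prog=$1
    echo "Content-type: text/plain"
    echo                                  # blank line ends the CGI header
    if [ -x "$prog" ]; then
        "$prog"
    else
        echo "Cannot find $prog on this system."
    fi
}

# Usage:
#   cgi_plain /bin/date
```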
http://www.eskimo.com/cgi-bin/date
#!/usr/local/bin/perl5
# -*- Mode: Perl -*-
#
# lib5-cgi.pl   A perl library of CGI functions.
#
# copyright 1995 David Walker
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# You may also redistribute it and/or modify it under
# the terms of the Artistic License under which Perl is distributed.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# Ideas used in this program came from the cgi-lib.pl perl library
# by Steven Brenner's and
# "Managing Internet Information Services" by Lui, Peek, Jones,
# Buus & Nye. Published by O'Reilly and Associates, Inc.
#
# Sat May 6 11:36:18 GMT-0700 1995 version 1.0.b
# Fri Oct 20 12:30:04 1995 checked syntax for Perl 5, dw
# Modified by Erik C. Thauvin (erik@skytouch.com)
# Sun Apr 27 01:56:36 PDT 1997
#
# Added FILEPATH_INFO support.

package lib_cgi ;

sub print_doc_header {
    local($content_type) = @_ ;
    $* = 1 ; # why do we need this?
    if (defined($content_type)) {
        print "Content-type: $content_type\n\n" ;
    } else {
        print "Content-type: text/html\n\n" ;
    }
}

sub print_page_header {
    local($heading) = @_ ;
    print "\n\n" ;
    print "\n" ;
    print "\n" ;
    print "$heading\n" ;
}

sub print_page_trailer {
    print "\n\n" ;
}

sub error_message {
    local($message) = @_ ;
    &print_page_header("ERROR") ;
    print "\n$message\n" ;
    print "\n" ;
    &print_page_trailer ;
    exit 0 ;
}

sub get_args_post {
    local($buffer) ;

    # $CONTENT_TYPE is only set for POST calls.
    &error_message('This script can only process form results.')
        unless ($ENV{'CONTENT_TYPE'} eq 'application/x-www-form-urlencoded') ;

    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}) ;

    # Split the name-value pairs
    @pairs = split(/&/, $buffer) ;

    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair) ;
        $value =~ tr/+/ / ;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg ;
        $main::FORM{$name} = $value ;
    }
}

sub get_args_get {
    local($path_info) = $ENV{'FILEPATH_INFO'} || $ENV{'PATH_INFO'};
    @args = split('/', $path_info) ;
    foreach (@args) {
        tr/+/ / ;
        ($name, $value) = split(/=/, $_) ;
        $main'FORM{$name} = $value ;
    }
}

sub get_args {
    if ($ENV{'REQUEST_METHOD'} eq 'POST') {
        &get_args_post ;
    } elsif ($ENV{'REQUEST_METHOD'} eq 'GET') {
        &get_args_get ;
    } else {
        &error_message("Illegal Request Method $ENV{REQUEST_METHOD}, " .
                       "This script must be referenced using METHOD GET or POST.") ;
    }
}

# Look for evil characters
sub check_for_evil_characters {
    # $my_character_set must be enclosed in single (') quotes.
    local($str,$variable_name,$my_character_set) = @_ ;
    if (defined($my_character_set)) {
        $ok_character_set = $my_character_set ;
    } else {
        $ok_character_set = '[a
http://www.eskimo.com/cgihelp/scripts/lib5-cgi.pl.txt (1 of 2) [22/07/2003 5:57:17 PM]
#!/usr/local/bin/perl
# -*- Mode: Perl -*-
#
# lib-cgi.pl   A perl library of CGI functions.
#
# copyright 1995 David Walker
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
#
# You may also redistribute it and/or modify it under
# the terms of the Artistic License under which Perl is distributed.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# Ideas used in this program came from the cgi-lib.pl perl library
# by Steven Brenner's and
# "Managing Internet Information Services" by Lui, Peek, Jones,
# Buus & Nye. Published by O'Reilly and Associates, Inc.
#
# Sat May 6 11:36:18 GMT-0700 1995 version 1.0.b
# Fri Oct 20 12:30:04 1995 checked syntax for Perl 5, dw
# Modified by Erik C. Thauvin (erik@skytouch.com)
# Sun Apr 27 01:56:36 PDT 1997
#
# Added FILEPATH_INFO support.

package lib_cgi ;

sub print_doc_header {
    local($content_type) = @_ ;
    $* = 1 ; # why do we need this?
    if (defined($content_type)) {
        print "Content-type: $content_type\n\n" ;
    } else {
        print "Content-type: text/html\n\n" ;
    }
}

sub print_page_header {
    local($heading) = @_ ;
    print "\n\n" ;
    print "\n" ;
    print "\n" ;
    print "$heading\n" ;
}

sub print_page_trailer {
    print "\n\n" ;
}

sub error_message {
    local($message) = @_ ;
    &print_page_header("ERROR") ;
    print "\n$message\n" ;
    print "\n" ;
    &print_page_trailer ;
    exit 0 ;
}

sub get_args_post {
    local($buffer) ;

    # $CONTENT_TYPE is only set for POST calls.
    &error_message('This script can only process form results.')
        unless ($ENV{'CONTENT_TYPE'} eq 'application/x-www-form-urlencoded') ;

    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'}) ;

    # Split the name-value pairs
    @pairs = split(/&/, $buffer) ;

    foreach $pair (@pairs) {
        ($name, $value) = split(/=/, $pair) ;
        $value =~ tr/+/ / ;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg ;
        $main'FORM{$name} = $value ;
    }
}

sub get_args_get {
    local($path_info) = $ENV{'FILEPATH_INFO'} || $ENV{'PATH_INFO'};
    @args = split('/', $path_info) ;
    foreach (@args) {
        tr/+/ / ;
        ($name, $value) = split(/=/, $_) ;
        $main'FORM{$name} = $value ;
    }
}

sub get_args {
    if ($ENV{'REQUEST_METHOD'} eq 'POST') {
        &get_args_post ;
    } elsif ($ENV{'REQUEST_METHOD'} eq 'GET') {
        &get_args_get ;
    } else {
        &error_message("Illegal Request Method $ENV{REQUEST_METHOD}, " .
                       "This script must be referenced using METHOD GET or POST.") ;
    }
}

# Look for evil characters
sub check_for_evil_characters {
    # $my_character_set must be enclosed in single (') quotes.
    local($str,$variable_name,$my_character_set) = @_ ;
    if (defined($my_character_set)) {
        $ok_character_set = $my_character_set ;
    } else {
        $ok_character_set = '[a-zA-Z0-9_\-+. \t\/@%]' ;
    }
    if
http://www.eskimo.com/cgihelp/scripts/lib-cgi.pl.txt (1 of 2) [22/07/2003 5:57:19 PM]
RANDOM: ERROR!
#!/usr/local/bin/taintperl -- -*-perl-*-
#
# RANDOM - Generate random URL from a text file 'database.'
# By Jason A. Dour
#
# (c) 1995 Jason A. Dour & the University of Louisville
#
# Free for distribution, copying, editing, and hacking under the GNU public
# license. See file 'gnu.public.license.2.0' for specific information.
# This software comes with no guarantees implicit or implied, and the
# author(s) of this software cannot be held responsible for loss, damage,
# acts of god(s), large amounts of small rodentia, deafness, plague,
# baldness, or nose-bleeds occurring as a direct -- or indirect --
# result of the use of 'random.' This software is to be used for
# randomizin, peace, love, and spreading genuine feelings of well being.
# All other uses are denounced by the author(s).
#
# Love, Peace, Gerbils, & Hair Grease,
# Jason A. Dour
#
# Last modified on 08/14/97 by Erik C. Thauvin
#
# To call random, use something like:
#
# http://www.eskimo.com/cgi-bin/random/~yourlogin/filename
#
# where "filename" is a world readable file located in your public_html
# directory, and containing the listing of URLs.
#-------------------------------------------------------------------------
#
$ENV{'PATH'}="/bin:/usr/bin:/usr/lib";
$ENV{'IFS'} = "";
#
# Email address/name of contact for server problems.
#
$contact = "webmaster\@eskimo.com";
#
# HREF of the mail URL for the server contact.
#
$contact_url = "mailto:webmaster\@eskimo.com";
#
# Get the filename by reading the CGI env. variable.
#
&system_error("Incorrect CGI method! Use GET for proper operation.")
    if ( $ENV{'REQUEST_METHOD'} eq "POST" );
&system_error("No list file provided!")
    if ( !$ENV{'PATH_TRANSLATED'} );
&system_error("List file does not exist!")
    if ( ! -e $ENV{'PATH_TRANSLATED'} );
&system_error("List file cannot be accessed!")
#   if ( ! -r $ENV{'PATH_TRANSLATED'} or ! -R $ENV{'PATH_TRANSLATED'} );
#   Changed "or" to "||" -ECT 04/19/95
    if ( ! -r $ENV{'PATH_TRANSLATED'} || ! -R $ENV{'PATH_TRANSLATED'} );
&system_error("List file has nothing in it!")
    if ( -z $ENV{'PATH_TRANSLATED'} );
#
# Open the file and read in all of its contents. So far, this has been
# effective on files up to 1 Meg in size. Anything larger, I am not certain.
#
open(LIST_FILE,$ENV{'PATH_TRANSLATED'}) || &system_error("File cannot be opened!");
read(LIST_FILE,$list_buffer,(-s $ENV{'PATH_TRANSLATED'}));
close(LIST_FILE);
#
# Split the URL list.
#
@urls = split(/\n/,$list_buffer);
#
# Get a random array number; the random seed being based upon time, and
# current and parent process id's.
#
srand( (time/$$)*getppid );
$location = int(rand($#urls+1));
#
# Print out the random location if it is a valid URL.
#
if ( grep (/^http/,$urls[$location]) ||
     grep (/^file/,$urls[$location]) ||
     grep (/^ftp/,$urls[$location]) ||
     grep (/^gopher/,$urls[$location]) ||
     grep (/^news/,$urls[$location]) ||
     grep (/^telnet/,$urls[$location]) ||
     grep (/^tn3270/,$urls[$location]) ||
     grep (/^mailto/,$urls[$location]) ||
     grep (/^nntp/,$urls[$location]) ||
     grep (/^wais/,$urls[$location]) ||
     grep (/^prospero/,$urls[$location]) ) {
    print "Location: $urls[$location]\n\n";
    exit(0);
} else {
    $error = $location+1;
#   &system_error("Invalid URL: $urls[$location] at line $error of $ENV{'PATH_TRANSLATED'}. Please let the maintainer of this file know this error occurred.");
#   Changed to match standard CERN error format -- ECT 04/25/95
    $urlval = $urls[$location];
    $urlval =~ s/\&/\&amp\;/g;
    $urlval =~ s/\</\&lt\;/g;
    $urlval =~ s/\>/\&gt\;/g;
    &system_error("Invalid URL -- $urlval at line $error of $ENV{'PATH_TRANSLATED'}. Please let the maintainer of this file know this error occurred.");
};
#
# Generic system error page. Will print out whatever error text you provide.
#
sub system_error {
    print "Content-type: text/html\n\n";
#   print "";
#   print "
ERROR!
http://www.eskimo.com/cgihelp/scripts/random.txt (1 of 2) [22/07/2003 5:57:22 PM]
Error\n";
    print "$_[0]\n";
    print "\n";
    print "If you feel this error is server related, please contact $contact.\n";
    print "\n";
    print "\n";
    print "RANDOM 1995 Jason A. Dour & the University of Louisville.\n";
    print "\n";
    print "\n";
    exit(0);
}
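The core of the script (read the list, pick one line at random, and accept it only if it starts with a known scheme) can be sketched in shell. The function names pick_random_url and is_known_scheme are hypothetical, and awk's srand() seeds from the time of day rather than from the time and process IDs as the Perl script does, so this is an approximation of the approach, not a replacement for the script.

```shell
#!/bin/sh
# pick_random_url (hypothetical name): print one randomly chosen line of a file.
pick_random_url() {
    awk 'BEGIN { srand() }
         { line[NR] = $0 }
         END { if (NR > 0) print line[int(rand() * NR) + 1] }' "$1"
}

# is_known_scheme (hypothetical name): succeed if the string starts with one
# of the prefixes the Perl script accepts.
is_known_scheme() {
    case $1 in
        http*|file*|ftp*|gopher*|news*|telnet*|tn3270*|mailto*|nntp*|wais*|prospero*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Usage, with a hypothetical list file:
#   url=$(pick_random_url urls.txt)
#   if is_known_scheme "$url"; then
#       printf 'Location: %s\n\n' "$url"
#   fi
```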
Eskimo North
Linux Friendly
Canada
Alberta
British Columbia
Ontario
Manitoba
Quebec
Alabama
Arizona
Connecticut
Delaware
Hawaii
Kansas
Massachusetts
Montana
New Mexico
Oklahoma
South Dakota
Utah
Idaho
Kentucky
Michigan
Nebraska
New York
Oregon
Texas
Washington
Arkansas
District of Columbia
Illinois
Louisiana
Minnesota
Nevada
North Carolina
Pennsylvania
Tennessee
West Virginia
California
Colorado
Florida
Georgia
Indiana
Maryland
Mississippi
New Hampshire
North Dakota
Rhode Island
Vermont
Wisconsin
Iowa
Maine
Missouri
New Jersey
Ohio
South Carolina
Virginia
Wyoming
Host Services
Game Hosting
NNTP read/post
Shell Accounts
Virtual Domains
Linux
MacIntosh
OS/2
Windows
Client Setup
I started Eskimo North as a single line BBS in 1982. Over the years it has gradually evolved into the full featured
While you are here, please take advantage of our two-week free trial of our 56k, dual 56k, ISDN, and remote shell
Access Services
56k Dial Access
Dual 56k Dial Access
ISDN Access
DSL Access
Western Washington Only
Postal Mail
Telephone Numbers
Telephone: (206) 812-0051
Toll Free: (800) 246-6874
Facsimile: (206) 812-0054
E-Mail
Usenet News
Making a Website
CGI Help
Java Development Kit
WWW info
If you are an Eskimo North subscriber, and you would like to add your business or organization to our directory,
please send the following information to support@eskimo.com.
Name
Organization or Business Name
URL for the site you want linked
URL for your logo
The (existing or new) category you would like us to list your link under
A short description of your business/organization
NOTE: Your logo MUST be no bigger than 100x100 pixels.
Have you ever wondered where the name "Eskimo North" came from?
Why does the owner call himself "Nanook?"
Here's your chance to find out the answer to those questions and
to learn more about the origin and evolution of Eskimo.
Dialup Settings
[Linux] [OS/2] [Mac] [Windows] [Dreamcast]
Server Settings
[Web Hosting] [PostgreSQL] [Email] [Usenet News]
If you have Linux, connecting to the Internet is simple... Ok, after you get through
rolling around the floor we'll tell you how.
Requirements
Configuration
Dialing
Extra Tweaks
Alternately, O'Reilly & Associates Inc. has made chapter 6 ("Dial-out PPP Setup")
of their Using & Managing PPP book available on their website, including Linux
instructions:
http://www.oreilly.com/catalog/umppp/chapter/ch06.html
Requirements
First, if you need help configuring your modem, please see
http://www.redhat.com/mirrors/LDP/HOWTO/Modem-HOWTO.html.
After that, it actually is pretty simple.
You must be sure you have a kernel with PPP support. Typically, kernels will have
PPP configured in as a module. If this is the case, you will need to type modprobe
ppp.o to load the module.
You will need copies of (both come with most recent distributions):
With these utilities, dialup connections can be set up either through a shell (as
root) or with graphical front-ends (in which case the rest of this page can be
bypassed) such as:
GNOME-PPP
KPPP
Another method of connecting with Linux uses 'pppd' directly. These instructions
are available as Linux-pppd.html.
Configuration
To edit configuration files manually, or to connect without running X in the above
manner, the files you will be concerned with are:
/etc/resolv.conf
/etc/wvdial.conf
To create the /etc/wvdial.conf file with the proper initial settings, you'll need to run
the following command (as root):
/usr/bin/wvdialconf
(Your distribution or installation may have placed the command in other locations;
if the above is "not found", try it under /usr/local/bin or others.)
http://www.eskimo.com/support/Linux.html (2 of 5) [22/07/2003 5:58:30 PM]
Now edit /etc/wvdial.conf to change the following lines, removing the leading
semicolons and substituting your access number, username, and password in the
appropriate locations:
; Phone = <Target Phone Number>
; Username = <Your Login Name>
; Password = <Your Password>
You can place multiple access numbers or accounts in the same file to
set up alternate numbers, etc. To do this, simply add the following set of lines to
the file, choosing a name for the alternate connection:
[Dialer Alternate]
Phone = 555-1234
Username = en55555@usb.com
Password = mypassword
Since this file has passwords in it, be sure to set the permissions on this file so
that only root can read it:
chown root /etc/wvdial.conf
chmod 600 /etc/wvdial.conf
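After running the two commands above, the result can be checked before dialing. The sketch below is an illustration only: is_private is a hypothetical helper name, and it inspects just the mode string printed by ls, not the file's owner.

```shell
#!/bin/sh
# is_private (hypothetical name): succeed only if the file's mode is exactly
# rw------- (600), i.e. readable and writable by its owner and nobody else.
is_private() {
    # ls -ld prints the mode string first, e.g. "-rw-------"
    mode=$(ls -ld "$1" | cut -c1-10)
    [ "$mode" = "-rw-------" ]
}

# Usage:
#   if is_private /etc/wvdial.conf; then
#       echo "permissions look right"
#   else
#       echo "run: chmod 600 /etc/wvdial.conf"
#   fi
```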
Dialing
wvdial works differently than pppd itself; when run from a shell, it will continue to
log information from the connection to the shell window and terminate the call
when the user sends a break signal ("^C").
To run the connection manually, open a shell as root, and type:
/usr/bin/wvdial
If all is working, you will see the modem communication and connection
information on the terminal. Leave this terminal running (perhaps minimized if
running under X), and to disconnect the connection, come back to this terminal
and type "^C" (Ctrl-C).
To connect using an alternate connection setting, add the name of the alternate
after the command (case-insensitive):
/usr/bin/wvdial alternate
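Since each connection lives in its own "[Dialer Name]" section of /etc/wvdial.conf, the names that can be given on the command line can be listed straight from the file. This is a small sketch, assuming the section headers are written exactly as "[Dialer Name]" with nothing else on the line; list_dialers is a hypothetical helper name.

```shell
#!/bin/sh
# list_dialers (hypothetical name): print the name of every [Dialer ...]
# section found in a wvdial configuration file, one per line.
list_dialers() {
    sed -n 's/^\[Dialer \(.*\)]$/\1/p' "$1"
}

# Usage (as root, since the file should be readable only by root):
#   list_dialers /etc/wvdial.conf
```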
Extra Tweaks
Automatic Redial
Since auto-redial can play havoc with the lines when you're not actively using the
connection (idle disconnect, redial, idle disconnect, redial, etc.), please place the
mentioned line in the "Defaults" section above to disable it and use a separate
section to specifically enable it when you're closely monitoring the connection:
[Dialer Redial]
Auto Reconnect = on
In this way, the following will dial the same "Default" or "Alternate" numbers and
username/password combinations, but will redial if disconnected:
/usr/bin/wvdial redial
/usr/bin/wvdial alternate redial
Remember to "kill" the session ("^C") if leaving the computer idle or unmonitored
for long periods, as this would then tie up your own phone lines as well as ours.
[Linux Links]
[Home]
http://www.apache.org/
Apache HTTP Server Project
Web server we use under UltraLinux
http://www.applix.com/
Applix
Applix makes an office suite for Linux.
Linux and Unix Resources
Technically and legally speaking, Linux is not Unix. However, we run both side
by side, and the operational differences are not significant. This section
contains links for resources applicable to both.
If you have recommendations for sites to be linked to this section or if
you run into a broken link, please e-mail nanook@eskimo.com.
The web server providing this document to you is a Sun Ultra-1 with a single
http://linux.corel.com/linux8/index.htm
Corel
Corel Word Perfect 8 for Linux
http://www.cygnus.com/
Cygnus Solutions Development Tools
ANSI C and Java Development Tools for Linux
http://www.debian.org/
Debian
Debian Linux Distribution Home Page
http://www.eskimo.com/~dwalker/manpage.html
Eskimo North Man Pages
Manual Pages for Unix and Applications
http://www.qsl.net/f3wm/
F3WM Web Site
Amateur Radio, Linux for Novices, Humour... and more. This site
is mostly in French
http://www.gnu.org/
GNU software
GNU Free Software Foundation
http://howto.linuxberg.com/faq.html
HowTo!
TuCows Linux Resource Center
http://java.sun.com/
Java Home Page
Sun's Home Page for Java Language
http://www.wagoneers.com/UNIX
John's pages on Unix, Linux, and System Administration
John Meister provided many of the URLs on this site. There are a
lot of good resources at this site, but John apparently dislikes
writing index files, so you'll have to navigate through directories.
http://linux3d.netpedia.net/
Linux 3D
A page for 3D modeling and rendering enthusiasts using Linux
http://www.kalug.lug.net/linux-admin-FAQ/Linux-AdminFAQ.html
Linux Administrators FAQ
This is the list of Frequently Asked Questions on the linux-admin
mailing list at vger.rutgers.edu.
http://linuxcentral.com/
Linux Central
Mostly commercial Linux resources
http://metalab.unc.edu/LDP/
Linux Documentation Project
The Linux Documentation Project (LDP) is working on developing
good, reliable documentation for the Linux operating system.
http://www.cl.cam.ac.uk/users/iwj10/linux-faq/index.html
Linux Frequently Asked Questions with Answers
This is the list of Frequently Asked Questions about Linux, the free
Unix for 386/486/586
http://www.linuxgazette.com/
Linux Gazette
Making Linux just a little more fun and sharing ideas and
discoveries.
http://www.linux-howto.com/
Linux How-To
Linux Resource Center
http://www.li.org/
Linux International
Linux International is a non-profit association of groups,
corporations and others that work towards the promotion of and
helping direct the growth of the Linux operating system and the
Linux community.
http://www.linuxjournal.com/
Linux Journal
The Monthly Magazine of the Linux Community.
http://www.linuxbazis.hu/en/
Linux Links Base
Organized Links to Linux Resources on the Web.
http://www.croftj.net/~goob/
Linux Links by Goob
Another collection of links to various Linux resources
http://www.linuxmall.com/
Linux Mall
Commercial Linux related Products and Resources
http://www.linux-me.org/
Linux Middle East
A place where Linux resources can collaborate to create an
infrastructure to help your own Middle East Linux community
while providing a help and service to all other Linux users from all
over the world.
http://www.linuxnow.com/
Linux Now
Linux Reference
http://www.linux.org/
Linux Online
Almost everything you wanted to know about Linux
http://www.internetwk.com/linux
Linux Resource Center
Internet Week's Linux Resource Center - Good to see the
mainstream media is acknowledging that there IS an alternative to
NT.
http://www.ecst.csuchico.edu/~jtmurphy/
Linux Security Homepage
Lots of good Linux security information
http://webwatcher.org/
Linux Web Watcher
Watching 15 Categories, 69 Groups, and 466 Pages. Automatically
updated daily.
http://www.performance-computing.com/Linux-IT/
Linux-IT
News, columns, and exclusive features focusing on Linux and its
role in your enterprise.
http://www.linuxweb.com/
LinuxWeb
Linux Information
http://www.redhat.com
RedHat
RedHat Linux distribution home page. We use RedHat Linux on
our Sun Sparc and PC systems here.
http://www.redhat.com/mirrors/LDP/HOWTO/
RedHat
RedHat Linux HOWTO archives; manuals and tutorials.
http://www.slackware.com/
Slackware
Slackware Linux distribution home page.
http://www.ssc.com/
Specialized Systems Consultants, Inc.
Great pocket references on various Unix related topics, books, and
numerous other publications. They really helped us out early on by
supplying boxes of references free that we were able to give out to
our customers. Lots of good Linux and Unix info.
http://www.samag.com/
Sys Admin
The journal for Unix system administrators
http://www.eskimo.com/links/linux.html (4 of 5) [22/07/2003 5:58:34 PM]
http://www.gimp.org/
The Gimp
GNU Image Manipulation Program - Most of the graphics here
were produced with Gimp.
http://www.cs.uml.edu/~acahalan/linux/22wishlist.html
The Linux 2.1/2.2 Wish List
Tradition requires a great big list of features that might be nice for
Linux to have. Well, here is one collected from the Linux kernel
mailing list.
http://www.ultralinux.org/
UltraLinux HomePage
Linux for Sparc Processors
http://pubweb.bnl.gov/people/hoff/
Unix MIDI Plug-In
MIDI Plug-in for Netscape running under Linux/Unix
http://www.vierra.net/linux.htm
VIERRA.Net
Builds Linux Servers and has links to Linux Resources
http://vancouver-webpages.com/vanlug/
Vancouver Linux Users Group
Linux Users Group located in Vancouver BC
http://linuxresources.com/what.html
What is Linux?
A brief overview of Linux by the Linux Journal
http://www.yggdrasil.com/
Yggdrasil
Support for Free Software
http://egcs.cygnus.com/
egcs project home page
New maintainers of GCC Compiler
CAROLINE'S
2002
CALENDAR
January
February
March
April
May
June
July
August
September
October
November
December
EMAIL Caroline
Click on the rolodex image above to access a random Eskimo North User Page.
NOTE: This link will bring up a frames version of the randodex. This will allow you
to hit one random URL after another - simply by clicking on the "next" link.
However, this will not allow you to readily bookmark a site you get to. (If you have a
recently released browser (e.g. Netscape Communicator) you should be able to
right click inside the main frame to bookmark that frame). To bring up the randodex
in a site without frames, just use the "Random URL" link at the bottom of this page.
56K Dial Access Linux Friendly! ISDN, Shells, USA & Canada
Linux Friendly 56k V.90 and Dual-56k (112k MLP) Dial Access
Included Services
56K or 112K dial access includes shell access, e-mail, two mail lists, Usenet News, web hosting with CGI and
SSL, FTP hosting, and two IRC bots.
British Columbia
Manitoba
Ontario
Quebec
Arkansas
District of Columbia
Illinois
Louisiana
Minnesota
Nevada
North Carolina
Pennsylvania
Tennessee
West Virginia
California
Florida
Indiana
Maine
Mississippi
New Hampshire
North Dakota
Rhode Island
Vermont
Wisconsin
Colorado
Georgia
Iowa
Maryland
Missouri
New Jersey
Ohio
South Carolina
Virginia
Wyoming
Arizona
Delaware
Idaho
Kentucky
Michigan
Nebraska
New York
Oregon
Texas
Washington
Setup
You will be called to confirm the information on your free trial account application. A 56k or dual-56k dial access
Username and Password will be e-mailed to you at the address specified on your application after successful
confirmation. The dial access Username and Password are used only for establishing your dial-up connection.
The login and password you supplied will be used for all host-related functions such as POP3 e-mail, NNTP
read/post access, shell access, FTP access to your account, etc.
For help with specific operating system setup, click on one of the following links:
Linux
Windows 95/98
MacOS
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no refund.
Since a free trial period is offered and lower rates are given for long term accounts, we do not provide refunds.
This is true even if your account is terminated for abuse (activities listed above).
Dial Plan L
City           Number
Calgary        403-410-1065
Edmonton       780-990-1234
Lethbridge     403-380-5325
Medicine Hat   403-528-1900
Red Deer       403-309-1100
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com
Dial Plan W
City           Number
Calgary        403-781-5200
Edmonton       780-423-5600
Lethbridge     403-380-5325
Medicine Hat   403-528-1900
Red Deer       403-309-1100
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@usb.com
linux 56k, linux isdn, windows 56k, windows isdn, mac 56k, mac isdn, os2 56k, os2 isdn.
Contact Information
Telephone: (206) 812-0051
Toll-Free: (800) 246-6874
Facsimile: (206) 812-0054
E-Mail: support@eskimo.com
Snail Mail: Eskimo North, P.O. Box
55816, Shoreline, WA 98155-0816.
Setup Information
Linux
Mac
OS2
Windows
Dial Plan L
City           Number
Calgary        403-410-1065
Calgary        403-781-5200
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com
Dial Plan W
City           Number
Calgary        403-781-5200
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@usb.com
For Individuals
NNTP read / post Usenet access is provided with remote, 56k, dual 56k, ISDN dial
access, DSL, and frame relay accounts. Individuals looking for Usenet read/post
access should choose one of these account types. If you get internet access
elsewhere, consider a remote shell account. With frame relay and business class
DSL, access is provided for an entire domain.
For Organizations
NNTP Read/Post Access (Per Domain)
Month       $70
Quarter     $180
Half-Year   $330
One Year    $600
Two Year    $1080
Longer subscription periods cost less and reduce the time that must be spent on accounting
each day, allowing more time to be spent improving services and addressing customer concerns.
We provide NNTP read/post access to other sites, ISPs, businesses, BBSs, and
other organizations for $600/year, $180/quarter, or $70/month per domain. If
you're looking for a way to provide Usenet News services to your customers
without the expense of maintaining your own news server, and you want
a solution that doesn't impose a per-subscriber or per-megabyte charge, consider
using our news service.
Server Hardware
Our news server is a quad-CPU Sun 4/670MP equipped with 384MB
RAM, Ross-RTK625 90 MHz SPARC processors, and a Performance Technologies
SBS450 ultra-wide disk controller for the news spool drives. We presently have four
Seagate 47GB ultra-wide SCSI drives striped for the news spool. After formatting
and file system overhead, this provides a 180 gigabyte spool.
Server Software
We are using INN version 2.2.2 news server software running under Linux version
2.2.14.
Planned Upgrades
We are planning to upgrade the CPU to an UltraSparc system in the near future
(the hardware has been purchased and is here, and we are in the process of getting it set up).
Newsgroups and Retention
We carry a full news feed with more than 64,000 newsgroups. There is a two-day
retention on alt.binaries and a two-week retention on other groups, with the exception
of local groups and some groups used for FAQs, which are retained for one
month, and test groups and control groups, which are retained for a fraction of a
day.
Newsfeeds
We have a satellite and terrestrial feed from Cidera (aka
Skycache), and we also have a newsfeed from Sprint. I can't
say enough good things about the folks from Cidera. The
installation was painless and it has worked flawlessly. They
are the most pleasant and technically competent folks I've encountered in all my
professional dealings. Approximately 85% of the news articles arrive via Cidera,
and the bulk of the remaining 15% arrive from Sprint.
News Related Shell Services
News Readers
Nn
Nn-Tk
Pine
Readnews
Rn
Tass
Tin
Trn
Vnews
Xnews
News Posters
Postnews
Pnews
QWK (uqwk)
SOUP (uqwk)
ZipNews (uqwk)
Aub
There is a large collection of other utilities that can be used for news manipulation
and processing. All shell accounts include crontab access, so you can do things
like scheduling automatic binary extraction and FTP transfer of those files to your system.
There are utilities for gatewaying news to mail, image conversion and
manipulation, text processing, etc.
Try it Free!
Apply for a Free Two-Week Trial of our remote shell account or dial or ISDN
access service. This will allow you to try out the news server and our other host
services without any cost or obligation. Just click on "Free Two-Week Trial" and fill
out the trial application today.
British Columbia
Manitoba
Ontario
Quebec
Arkansas
District of Columbia
Illinois
Louisiana
Minnesota
California
Florida
Indiana
Maine
Mississippi
Colorado
Georgia
Iowa
Maryland
Missouri
Arizona
Delaware
Idaho
Kentucky
Michigan
Montana
New Mexico
Oklahoma
South Dakota
Utah
Nebraska
New York
Oregon
Texas
Washington
Nevada
North Carolina
Pennsylvania
Tennessee
West Virginia
New Hampshire
North Dakota
Rhode Island
Vermont
Wisconsin
New Jersey
Ohio
South Carolina
Virginia
Wyoming
Setup
You will be called to confirm the information on your free trial account application. An ISDN Username and
Password will be e-mailed to you at the address specified on your application after successful confirmation.
The ISDN Username and Password are used only for establishing your dial-up connection. The login and
password you supplied will be used for all host-related functions such as POP3 e-mail, NNTP read/post access,
shell access, FTP access to your account, etc.
For help with specific operating system setup, click on one of the following links:
Linux
Windows 95/98
MacOS
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no refund.
Excepting dial plan L 24x7 service, dial access is not intended for running servers. We bundle hosting
services free with your dial-up account for those needs. Client-server applications like Napster, WinMX,
and Net Meeting are permissible so long as you only leave them up while you are actively using them.
Since a free trial period is offered and lower rates are given for long term accounts, we do not provide refunds.
This is true even if your account is terminated for abuse (activities listed above).
Dial Plan L
City           Number
Edmonton       780-990-1234
Edmonton       780-423-5600
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com
Dial Plan W
City           Number
Edmonton       780-423-5600
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@usb.com
Dial Plan L
City                 Number
100 Mile House       250-395-2740
Abbotsford           604-504-0577
Campbell River       250-286-6315
Castlegar            250-365-4957
Chetwynd             250-788-2835
Chilliwack           604-795-2970
Courtenay            250-334-6220
Courtney             250-338-5726
Cranbrook            250-417-0276
Creston              250-402-6803
D'Arcy               604-894-5936
Dawson Creek         250-782-6018
Duncan               250-715-1841
Fernie               250-529-7533
Fort St. John        250-785-8613
Grand Forks          250-442-2752
Hope                 604-869-3508
Kamloops             250-374-9595
Kelowna              250-762-8623
Kitimat              250-632-6497
Lillooet             250-256-0379
Mackenzie            250-997-5762
Merritt              250-378-3454
Nanaimo              250-753-8497
Nelson               250-352-2061
Parksville           250-468-5759
Penticton            250-492-8539
Port Alberni         250-724-6045
Prince George        250-562-2160
Prince Rupert        250-627-5334
Princeton            250-295-0312
Quesnel              250-992-5936
Salt Spring Island   250-537-8795
Sechelt              604-740-8249
Smithers             250-847-8013
Squamish             604-892-5793
Terrace              250-635-4624
Trail                250-368-6721
Vancouver            604-688-1746
Vanderhoof           250-567-4831
Vernon               250-542-9562
Victoria             250-361-4138
Whistler             604-932-4094
Williams Lake        250-392-4099
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com
Dial Plan W
City                 Number
Abbotsford           604-557-5780
Chilliwack           604-702-3080
Courtenay            250-334-6220
Kamloops             250-571-5680
Kelowna              250-717-2380
Nanaimo              250-741-9470
Prince George        250-614-7980
Vancouver            604-609-3300
Vernon               250-503-3330
Victoria             250-361-9222
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@usb.com
Dial Plan L
City                 Number
100 Mile House       250-395-2740
V.90
Dial Plan W
City
Monthly User
Hours Realm
FreePPP FAQ
Rockstar Software
MIT Info-Mac HyperArchive
Massachusetts Institute of Technology
UMich Mac Archive
University of Michigan
Helpful Tools
Fetch FTP Client
Dartmouth College
BetterTelnet
Sassy Software
Telnet3
Kevin Grant
[Home]
The following are comprehensive instructions on using the iMac and G3 "Internet
Setup Assistant" to connect to Eskimo North.
The included screenshots walk through the setup for 56k dialup connections. For
connections on the slower-speed lines (Western WA only), a dial-in script is
needed, and we'll be happy to walk you through the recording and use of the
script(s) either by email to support@eskimo.com or on the support lines at 206-812-0051 or 800-246-6874.
56k and ISDN dialup accounts need no script for the connection. Click 'No' here,
and click the right-arrow to continue.
Our Domain Name Servers (DNS) are 204.122.16.8 and 204.122.16.9. You can hit
enter in the large box to get a second line. Leave "Domain Name" blank, and click
the right-arrow to continue.
Our POP3 server for downloading mail is called pop3.eskimo.com; here, you'll
need to place your login name and an @ symbol before that, as shown below. The
SMTP server for sending mail is mail.eskimo.com -- you may need to use
smtp.eskimo.com if this doesn't work on your connections. Click the right-arrow
to continue.
We do not run any proxies on our dialups. Click 'No' here, and the right-arrow to
continue. One window to go...
That's it. You're now ready to connect. If you're not connecting during this setup,
or need to connect in the future, you'd need to go to the Remote Access control in
the Control Panel.
Happy surfing!
[Mac Setup]
[Home]
The following are comprehensive instructions on using the MacTCP (or TCP/IP)
found in the control panel and ConfigPPP. This combination is used commonly on
Mac systems before 7.5.3.
Click on the PPP icon. Then click on More. Our Name Server addresses are
204.122.16.8 and 204.122.16.9; the domain for each can be set to eskimo.com.
Other settings will be set by the server once you connect. Click OK to continue on.
Click the "New" button, and type "Eskimo North" as the connection name ("POP"),
and click OK. Then choose "Config". For "Port Speed", use a speed just larger
than your modem speed (28.8k should be 38400 or 57600; 56k should be
115200). "Flow Control" should be "CTS & RTS (DTR)". Access lines can be found
at www.eskimo.com.
Emulated PPP:
NOT NEEDED FOR 56k or ISDN
Click the "Authentication" button. The name and password to use is one of the
following pairs of information:
Your 56k/ISDN login and password for accessing the 56k/ISDN lines,
Your email login and Non-Dedicated PPP password for those with the PPP
setup on the v.34 lines (Western WA only), or
Your email login and email password for those using Emulated SLIP/PPP
on the v.34 lines (Western WA only) or through a remote shell.
Click OK, then Done. You will now be back to the ConfigPPP window, where you
can click the "Open" button to connect.
Happy surfing!
[Mac Setup]
[Home]
If you already have an Eskimo Dial-Up entry, click on "Modify Entry" and select the Eskimo
Entry. If you do not yet have an Eskimo Dial-Up entry, select "Add Entry"...
Enter your Login ID (in all lowercase) and your password (which is case sensitive) in the
appropriate fields.
Enter your local Eskimo Dial-Up number.
The login sequence is very important. What you enter here depends on the type of account you
have.
If you have an Emulated SLIP/PPP Connection, you will need to enter the following login
sequence:
4
ogin:
[LOGINID]
word:
[PASSWORD]
ommand
i.s\s-P
eady
If you have a Non-Dedicated SLIP/PPP Connection, you will need to enter the following login
sequence:
p
name:
[LOGINID]
word:
[PASSWORD]
PPP
If you have purchased a 56k Dial-Up account, leave the login sequence field blank.
Under "Connection Type," select "PPP"...
The "Inactivity Timeout Option" doesn't really matter - because we have our own 12-minute idle
timeout here. Our modems will log you out after 12 minutes of inactivity.
If you have purchased a 56k Dial-Up account, enter "yourlogin.ndip.eskimo.net" in the "Your
Domain Name" field.
NOTE: Your login is the same as your Eskimo login name. Your password is the same as your
Eskimo password. If you have purchased a 56k Dial-Up account, use your other
(shell/emulated password) here.
[Home]
Emulated SLIP/PPP
Emulated SLIP/PPP is included with all shell accounts. Real SLIP/PPP is available
with all dial access accounts.
Emulated SLIP/PPP does not provide your host with its own IP address and name on
the net. Consequently, some services which require incoming connections to the host
cannot be supported, especially if they involve connections to random ports.
Additionally, the performance of emulated SLIP/PPP will vary with the load on the
server that is providing the emulation. With emulated SLIP or PPP, all connections
appear to originate from our hosts.
Non-Dedicated SLIP/PPP
With V.34 dial-up service we offer non-dedicated SLIP/PPP. Non-dedicated simply
means that the service is not dedicated to any one modem; you connect through a
shared modem pool. We also offer dedicated PPP service in Western Washington, which
provides a line and modem dedicated to your use, the ability to route multiple IP
addresses, and a full-time dialup connection.
time permits, which may be days or even weeks if things are busy.
This service is bundled into your v.34 dial access account. 56k, dual channel bonded
56k, and ISDN accounts connect via PPP only. For 56k and faster services dynamic
IP addresses are used. That means that each time you connect an IP address is
assigned for that session only. Static IP addresses are available for 56k and faster
services for an additional $5/month. Only one IP address can be routed and it is
restricted to your "home" POP.
Dedicated SLIP/PPP
Dedicated SLIP and PPP allows full-time Internet connectivity.
Rates
There is a one-time setup fee of $100. If you wish a domain name that is not a subdomain of
eskimo.net, there is an additional $70 fee charged by the NIC for domain registration. See
domain registration information for details.
Dedicated SLIP/PPP
Month       $100
Quarter     $270
Half-Year   $540
One Year    $1020
Two Year    $1920
There is also a fee of $100/month for this service; payment for the first month must be included.
Dedicated PPP is also available at a quarterly rate of $270/quarter and an annual rate of
$1020/year, or $1920 for 2 years.
With this account you also get an interactive login here. If you have an existing login that you
would like to be included, please include your existing login. Otherwise please include a
preferred login and password. Note that logins must be 3-8 characters in length, start with an
alphabetic character and contain only alphabetic and numeric characters. Logins must be typed
in all lower case.
Passwords may be 5 characters or longer, and may contain any printable character. The richer
the character set the better. That is, a mix of lower and UPPER case letters, numbers, and
punctuation is recommended. Your password should not be the same as your login, nor should
it be derived from your login, name, phone number, or anything else commonly known about
you.
Dedicated SLIP and PPP is available in the Seattle telephone area only.
fee.
Carolyn Blalock
Phone: 206-224-5562
Email: cblaloc@uswest.com
Eskimo North
Post Office Box 55816
Shoreline, WA 98155-0816
Telephone
Telephone: 206-812-0051
Toll Free: 800-246-6874
Facsimile: 206-812-0054
Or by E-mail
Email: support@eskimo.com
Coming soon!
Eskimo North will soon be providing Web Design Services and possibly some new
design/hosting packages. Keep an eye on this page for updates!
Game Hosting
Net Game Hosting
Month       $25
Quarter     $72
Half-Year   $138
One Year    $264
Two Year    $504
Five Year   $960
Longer subscription periods cost less and reduce the time that must be spent on accounting
each day, allowing more time to be spent improving services and addressing customer concerns.
A game account hosts a Unix network game such as a MOO, MUCK, MUSH, or MUD or a
telnettable BBS.
A game account includes one remote shell account with 1GB of disk quota, plus web hosting with CGI
so you can create a web page or interface for your game. GNU C and C++, Fortran, and JDK Java are
available. Non-terminating game processes are permitted on game accounts, and by request your game
will be scheduled for automatic startup after a system reboot.
The environment is a Sun SS-10 with 512MB of RAM, and quad Ross Hypersparc RTK-625
CPUs with a FDDI backbone connection running SunOS 4.1.4.
Acceptable Use
The following activities are expressly prohibited and will result in account termination with no
refund.
Since a free trial period is offered and lower rates are given for long term accounts, we do not
provide refunds. This is true even if your account is terminated for abuse (activities listed
above).
Please send payment with your login, real name, telephone number, address, and
type of service desired, to:
Eskimo North
P.O. Box 55816
Shoreline, WA. 98155-0816
If you are sending payment for a new account, please include both a login and a
password.
There will be a $25 charge for any returned checks. We reserve the right to refer bad
checks to a collection agency.
Your login:
Your password:
Alphabetic
characters [a-z]
Numeric characters
[0-9]
Dash [-]
We will accept cash but checks or money orders are preferable for mailed payments.
Payments can also be made at Eskimo North user meetings, which are held on the
third Sunday of every month at Ballard Godfathers starting at 2:30pm.
Your subscriptions make the continued operation and improvement of Eskimo North
possible. Thank you.
Dial Plan L
City           Number
Abbotsford     604-504-0577
Abbotsford     604-557-5780
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com
Dial Plan W
City           Number
Abbotsford     604-557-5780
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@usb.com
Dial Plan L
City           Number
Winnipeg       204-956-1440
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com
Dial Plan W
City
Monthly User
Hours Realm
Dial Plan L
City           Number
Winnipeg       204-956-1440
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com
Dial Plan W
City           Number
Winnipeg       204-956-1440
V.90; Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@usb.com
Windows 95/98
Windows NT
Windows XP
[Home]
http://www.eskimo.com/support/Win31.html
OS/2
MacOS
Win95
WinNT
Win 3.1
FAQ
http://www.eskimo.com/links/anime.html
Anime and Manga Related Web Sites
http://www.eskimo.com/links/cg.html
Computer Graphics
http://www.cprextreme.com/
Extreme Gaming for Extreme Gamers
http://www.igamer.net/
When It's Time To Get Back @ Reality
http://www.libraryspot.com/
Convenient gateway to more than 2,500 libraries around the world
http://www.eskimo.com/links/linux.html
Linux and Unix Resources
http://www.newnet.net/
Eskimo North founded the NewNet IRC network in 1996.
http://www.eskimo.com/links/science.html
Science - Mainstream & Fringe
http://community.animearchive.org/
Animanga Community
A meeting place for anime and manga fans on the Internet
http://www.animecastle.com/
Anime Castle
Video, Music, Books, Wall Scrolls, Cels
http://anime-genesis.com/
Anime Genesis
Not a bad image gallery, but banner ads everywhere
http://www.anipike.com/
Anime Web Turnpike
Guide to anime and manga on the Internet
ANIME
The character pictured
above is Lum from the
series Urusei Yatsura
(Those Obnoxious Aliens).
If you have
recommendations for sites
to be linked to this section
or if you run into a broken
link, please e-mail
nanook@eskimo.com.
http://www.csclub.uwaterloo.ca/u/mlvanbie/anime-list/
Anime and Manga Resource List
A long list of anime and manga web resources
http://members.aol.com/tprice1995/anime.html
AnimeLink
"Industrial Strength Anime Website"
http://freethought.site.yahoo.net/freethought/
AnimeNation
Anime related merchandise
http://www.animeigo.com/
Animeigo
You can buy Urusei Yatsura and other great anime here.
http://www.appliedfantasy.net/index_nav.html
Applied Fantasy
Comparative reviews of Anime; another Anime addict's page worth a
look.
http://www.grassroots1776.com/Skuld.html
Digestible Dojinshi of SKULD!
Ah My Goddess Info and Fanfics
http://home.interlink.or.jp/~lum/
LUM the bodyguards maximum cadre society Okawa Branch
Urusei Yatsura Home Page Another site devoted to the awesome
Anime Series Urusei Yatsura
http://www.maison-otaku.net/
Maison Otaku
A creative outlet for Maison otaku members
http://www.geocities.com/Tokyo/Harbor/4672/
Michael's Urusei Yatsura Page
More Urusei Yatsura Information
http://www.eskimo.com/~nekocon/
Neko-Con R Main Access Terminal
Nekocon anime convention site
No annoying
Geocities/Tripod style popups plaguing people who
visit your web site here.
You retain the intellectual
rights to your creations. No
data export fees.
Please click on Eskimo
North at the top of the page
to visit our home page.
http://www.tcp.com/doi/seiyuu/nogami-yukana.html
Nogami Yukana
She had a voice role in Moldiver, Vampire Hunter, and a voice and
live acting role in Hyper Dolls (which was a fun combination of
Anime and live performance).
http://otakuworld.com/
Otaku World
Anime and Manga Theme Guide, Reviews, Import Games, Desktop
Themes, Toys, and Links to other Anime Sites.
http://www.a-kon.com/
Project: A-Kon
An anime convention site
http://www.stormloader.com/two/
Pseudom Studio
This isn't Anime since it's made in America, but it is an attempt to
create an Anime style series.
http://www.geocities.com/Tokyo/Flats/3596/movie.htm
Quicktime Anime Clips
Computer
Graphics
This area is pretty broad and covers
basically any graphics that have been
generated using a computer.
Many of the sites here are drawings
by Japanese artists that resemble
Anime and Manga characters. In
some cases they are Anime or Manga
characters. In part that stems from
the fact that the way I found many of
these sites was by following links
from Anime sites.
It seems to be fairly common for
many creators of these graphics
web sites to celebrate hit-count
milestones with special graphics.
http://www.na.rim.or.jp/~autumn/
Autumn's Web Page **
Mostly cutesy characters
http://www2e.biglobe.ne.jp/~earth/
Earth CG Gallery **
http://www.taremeparadise.com/
Tareme Paradise **
Tareme (droopy-eyed) characters
http://www.alchemylab.com/
Alchemy Lab
Alchemy Information
http://www.livelinks.com./sumeria/phys/grav3.html
All about Gravitational Waves
This person theorizes that gravitational waves are responsible
for noise in electronic devices. I am skeptical since the fact that
cooling a device reduces noise supports the more conventional
theory of a thermal origin to electronic noise.
Science
Mainstream
& Fringe
Links to various science related
sites.
If you have recommendations
for sites to be linked to this
section or if you run into a
broken link, please e-mail
nanook@eskimo.com.
http://www.eskimo.com/links/astro.html
Astronomy & Astrophysics
Links to various sites dealing with Astronomy and Astrophysics
http://www.blacklightpower.com/tinsight.html
Blacklight Power Inc.
An alternative explanation for the "cold fusion" phenomenon that
explains the energy production as the result of hydrogen being
catalytically dropped to a lower energy level than its natural
ground state
http://vishnu.nirvana.phys.psu.edu/
Center for Gravitational Physics and Geometry
Pointers to various resources maintained by Penn State's
Department of Physics. Mostly stuff you can order for a fee.
http://www.fas.org/
Federation of American Scientists
The Federation of American Scientists is engaged in analysis
and advocacy on science, technology and public policy for
global security.
http://www.howstuffworks.com/
How Stuff Works
Ever wondered how stuff works? Here is a site that tells you!
http://www.geocities.com/CapeCanaveral/Lab/1287/
Jims Free Energy Page
Alternative and over-unity energy resources
http://www.trufax.org/
Leading Edge International Research Group
I think there is more entertainment value at this site than
information value but it is an interesting site.
http://www.accessnv.com/nids/
National Institute for Discovery Science
The National Institute for Discovery Science (NIDS) is a
privately funded research organization. It focuses on scientific
exploration that emphasizes emerging, novel, and sometimes
unconventional observations and theories.
http://www.weburbia.demon.co.uk/pg/theories.htm
New and Alternative Theories of Physics
The purpose of this page is to provide a list of links to various
new or alternative theories of physics which appear on the web.
http://www.princeton.edu/~pear/
Princeton Engineering Anomalies Research
The Princeton Engineering Anomalies Research (PEAR)
program was established at Princeton University in 1979 by
Robert G. Jahn, then Dean of the School of Engineering and
Applied Science, to pursue rigorous scientific study of the
interaction of human consciousness with sensitive physical
devices, systems, and processes common to contemporary
engineering practice.
http://www.df.uba.ar/~hprel/
Quantum Theories and Gravitation Group (QTGG)
Interesting material in the "local reprints" section. Most of
the documents are in PostScript format.
http://www.amasci.com/
Science Hobbyist
This is one of the best science related sites on the net, both for
mainstream and fringe science. Lots of good practical nuts and
bolts information for the hobbyist, science for kids, deep
mainstream science and fringe science all very neatly
categorized, with pointers to many other resources including
mail lists.
http://www.sciam.com/
Scientific American
The magazine is on the web with some extended information
not present in the hardcopy.
http://www.sfgate.com/offbeat/machine.html
Strange Machines
Information about various strange devices and machines that
nobody ever seems to find enough information on to fully
reproduce today. I'd love to see a working Searl
Disk, or Hendershot Device, but I haven't yet. It's still fun to
read about them.
http://www.pppl.gov/TFTR/
TFTR Project Home Page
Home page of the now out-of-service Tokamak Fusion Test
Reactor. Methods of removing the nuclear "ash", research on
materials that would withstand the constant neutron
bombardment experienced in a commercial
reactor, and increased knowledge of plasma physics all resulted
from the operation of this reactor.
http://www.eskimo.com/links/ufo.html
UFO
UFOs and related links such as crop circles, alien abductions,
UFO-government conspiracy theories, Area 51, the Roswell
crash, and other related phenomena
http://unisci.com/
UniSci
Daily University Science News
Astronomy &
Astrophysics
Links to astronomy and
astrophysics sites and related
topics.
If you have recommendations for
sites to be linked to this section
or if you run into a broken link,
please e-mail
nanook@eskimo.com.
http://astrosurf.com/neptune/astropix/index.html
Catching the Light
This is a web site of deep-sky astronomical
photographs, tips and techniques for
astrophotography, and digital enhancement in
Photoshop
http://cnn.com/TECH/9706/pathfinder/index.html
CNN Exploring Mars
CNN report about the 1997 Pathfinder mission to
Mars
http://www.eso.org/outreach/info-events/halebopp/comet-hale-bopp.html
Comet Hale-Bopp
ESO Homepage for the unusual Comet 1995 O1
(Hale-Bopp)
http://mpfwww.jpl.nasa.gov/default.html
Mars Pathfinder
JPL's web site covering the Mars Pathfinder
Mission
http://quest.arc.nasa.gov/mars/
Mars Team Online
NASA site dealing with Mars Missions and Data
http://www.solar.ifa.hawaii.edu/MWLT/mwlt.html
Mees White Light Telescope
Observations of the Sun in white light
Dial Plan L
City                Number        Protocol
Barrie              705-722-5354  V.90
Belleville          613-962-9198  V.90
Brantford           519-753-9478  V.90
Brockville          613-342-3151  V.90
Chatham             519-358-2245  V.90
Cornwall            613-930-9164  V.90
Guelph              519-822-8580  V.90
Hamilton            905-521-2211  V.90
Kingston            613-547-5479  V.90
Kitchener Waterloo  519-884-0239  V.90
London              519-660-4600  V.90
Newmarket           905-953-0217  V.90
North Bay           705-495-3117  V.90
Oshawa              905-404-6515  V.90
Ottawa              613-594-9044  V.90
Owen Sound          519-371-6472  V.90
Pembroke            613-732-5637  V.90
Peterborough        705-740-9692  V.90
Sarnia              519-337-2567  V.90
Sault Ste Marie     705-253-3280  V.90
St Catharines       905-685-1021  V.90
Sudbury             705-673-1028  V.90
Thunder Bay         807-624-0400  V.90
Toronto             416-603-4955  V.90
Windsor             519-973-0134  V.90
Network: Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com (listed for 22 of the 25 entries)
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Barrie                                      1         Worldcom Canada  150/month      ...@usb.com
Belleville                                  1         Worldcom Canada  150/month      ...@usb.com
Brantford                                   1         Worldcom Canada  150/month      ...@usb.com
Brockville          613-342-3151  V.90                Worldcom Canada  150/month      ...@usb.com
Chatham             519-358-2245  V.90                Worldcom Canada  150/month      ...@usb.com
Cornwall            613-930-9164  V.90                Worldcom Canada  150/month      ...@usb.com
Guelph              519-822-8580  V.90                Worldcom Canada  150/month      ...@usb.com
Hamilton            905-521-2211  V.90                Worldcom Canada  150/month      ...@usb.com
Kingston            613-547-5479  V.90                Worldcom Canada  150/month      ...@usb.com
Kitchener Waterloo  519-884-0239  V.90                Worldcom Canada  150/month      ...@usb.com
London              519-858-8451  V.90                Worldcom Canada  150/month      ...@usb.com
Newmarket           905-953-0217  V.90                Worldcom Canada  150/month      ...@usb.com
North Bay           705-495-3117  V.90                Worldcom Canada  150/month      ...@usb.com
Oshawa              905-404-6515  V.90                Worldcom Canada  150/month      ...@usb.com
Ottawa              613-594-9044  V.90                Worldcom Canada  150/month      ...@usb.com
Owen Sound          519-371-6472  V.90                Worldcom Canada  150/month      ...@usb.com
Pembroke            613-732-5637  V.90                Worldcom Canada  150/month      ...@usb.com
Peterborough        705-740-9692  V.90                Worldcom Canada  150/month      ...@usb.com
Sarnia              519-337-2567  V.90                Worldcom Canada  150/month      ...@usb.com
Sault Ste Marie     705-253-3280  V.90                Worldcom Canada  150/month      ...@usb.com
St Catharines       905-685-1021  V.90                Worldcom Canada  150/month      ...@usb.com
Sudbury             705-673-1028  V.90                Worldcom Canada  150/month      ...@usb.com
Thunder Bay         807-624-0400  V.90                Worldcom Canada  150/month      ...@usb.com
Toronto             416-368-2622  V.90      1         Worldcom Canada  150/month      ...@usb.com
Windsor             519-973-0134  V.90      1         Worldcom Canada  150/month      ...@usb.com
Contact Information
Telephone: (206) 812-0051
Toll-Free: (800) 246-6874
Facsimile: (206) 812-0054
E-Mail: support@eskimo.com
Snail Mail: Eskimo North, P.O. Box
55816, Shoreline, WA 98155-0816.
Setup Information
Linux
Mac
OS2
Windows
Dial Plan L
City                Number        Protocol
Barrie              705-722-5354  V.90
Barrie              705-737-4293  V.90
Network: Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com (listed once for the two numbers)
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Barrie              705-737-4293  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Belleville          613-962-9198  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Belleville          613-962-9198  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Brantford           519-753-9478  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Brantford           519-753-9478  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Brockville          613-342-3151  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Brockville          613-342-3151  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Chatham             519-358-2245  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Chatham             519-358-2245  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Cornwall            613-930-9164  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Cornwall            613-930-9164  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Guelph              519-822-8580  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Guelph              519-822-8580  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Hamilton            905-521-2211  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Hamilton            905-521-2211  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Kingston            613-547-5479  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Kingston            613-547-5479  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Kitchener Waterloo  519-884-0239  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Kitchener Waterloo  519-884-0239  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol
London              519-660-4600  V.90
London              519-858-8451  V.90
Network: Worldcom Canada; Monthly Hours: 150/month; User Realm: ...@eskimo.com (listed once for the two numbers)
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
London              519-858-8451  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Newmarket           905-953-0217  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Newmarket           905-953-0217  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
North Bay           705-495-3117  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
North Bay           705-495-3117  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Oshawa              905-404-6515  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Oshawa              905-404-6515  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Ottawa              613-594-9044  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Ottawa              613-594-9044  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Owen Sound          519-371-6472  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Owen Sound          519-371-6472  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Pembroke            613-732-5637  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Pembroke            613-732-5637  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Peterborough        705-740-9692  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Peterborough        705-740-9692  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Sarnia              519-337-2567  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Sarnia              519-337-2567  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Sault Ste Marie     705-253-3280  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Sault Ste Marie     705-253-3280  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
St Catharines       905-685-1021  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
St Catharines       905-685-1021  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Sudbury             705-673-1028  V.90                Worldcom Canada  150/month      ...@eskimo.com
Dial Plan W
City                Number        Protocol  Channels  Network          Monthly Hours  User Realm
Sudbury             705-673-1028  V.90                Worldcom Canada  150/month      ...@usb.com
Dial Plan L
Thunder Bay: 807-624-0400 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@eskimo.com
Dial Plan W
Thunder Bay: 807-624-0400 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@usb.com
Dial Plan L
Toronto: 416-603-4955 (V.90)
Toronto: 416-368-2622 (V.90)
Toronto: 416-572-4911 (V.90)
Dial Plan W
Toronto: 416-368-2622 (V.90), Worldcom Canada
Toronto: 416-572-4911 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@usb.com
Dial Plan L
Windsor: 519-973-0134 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@eskimo.com
Dial Plan W
Windsor: 519-973-0134 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@usb.com
Dial Plan L
Chicoutimi: 418-545-7095 (V.90), Worldcom Canada
Granby: 450-372-9814 (V.90), Worldcom Canada
Joliette: 450-755-5079 (V.90), Worldcom Canada
Montreal: 514-866-5278 (V.90), Worldcom Canada
Quebec City: 418-647-9794 (V.90), Worldcom Canada
Sherbrooke: 819-346-4029 (V.90), Worldcom Canada
St Hyacinthe: 450-771-1693 (V.90), Worldcom Canada
St Jerome: 450-436-4037 (V.90), Worldcom Canada
Trois Rivieres: 819-691-4276 (V.90), Worldcom Canada
Valleyfield: 450-373-7398 (V.90), Worldcom Canada
Monthly Hours (all cities): 150/month
User Realm (all cities): ...@eskimo.com
Dial Plan W
Chicoutimi: 418-545-7095 (V.90), Worldcom Canada
Granby: 450-372-9814 (V.90), Worldcom Canada
Joliette: 450-755-5079 (V.90), Worldcom Canada
Montreal: 514-866-5278 (V.90), Worldcom Canada
Quebec City: 418-647-9794 (V.90), Worldcom Canada
Sherbrooke: 819-346-4029 (V.90), Worldcom Canada
St Hyacinthe: 450-771-1693 (V.90), Worldcom Canada
St Jerome: 450-436-4037 (V.90), Worldcom Canada
Trois Rivieres: 819-691-4276 (V.90), Worldcom Canada
Valleyfield: 450-373-7398 (V.90), Worldcom Canada
Monthly Hours (all cities): 150/month
User Realm (all cities): ...@usb.com
Dial Plan L
Chicoutimi: 418-545-7095 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@eskimo.com
Dial Plan W
Chicoutimi: 418-545-7095 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@usb.com
Dial Plan L
Granby: 450-372-9814 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@eskimo.com
Dial Plan W
Granby: 450-372-9814 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@usb.com
Dial Plan L
Joliette: 450-755-5079 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@eskimo.com
Dial Plan W
Joliette: 450-755-5079 (V.90), Worldcom Canada
Monthly Hours: 150/month
User Realm: ...@usb.com