Advanced Linux Programming
Contents At a Glance
I Advanced UNIX Programming with Linux
1 Getting Started
2 Writing Good GNU/Linux Software
3 Processes
4 Threads
5 Interprocess Communication
II Mastering Linux
6 Devices
7 The /proc File System
8 Linux System Calls
9 Inline Assembly Code
10 Security
11 A Sample GNU/Linux Application
III Appendixes
A Other Development Tools
B Low-Level I/O
C Table of Signals
D Online Resources
E Open Publication License Version 1.0
F GNU General Public License
00 0430 FM 5/22/01 2:32 PM Page ii
Advanced Linux
Programming
Mark Mitchell, Jeffrey Oldham,
and Alex Samuel
www.newriders.com
201 West 103rd Street, Indianapolis, Indiana 46290
An Imprint of Pearson Education
Boston • Indianapolis • London • Munich • New York • San Francisco
Table of Contents
1 Getting Started
1.1 Editing with Emacs
1.2 Compiling with GCC
1.3 Automating the Process with GNU Make
1.4 Debugging with GNU Debugger (GDB)
1.5 Finding More Information
3 Processes
3.1 Looking at Processes
3.2 Creating Processes
3.3 Signals
3.4 Process Termination
4 Threads
4.1 Thread Creation
4.2 Thread Cancellation
4.3 Thread-Specific Data
4.4 Synchronization and Critical Sections
4.5 GNU/Linux Thread Implementation
4.6 Processes Vs. Threads
5 Interprocess Communication
5.1 Shared Memory
5.2 Processes Semaphores
5.3 Mapped Memory
5.4 Pipes
5.5 Sockets
6 Devices
6.1 Device Types
6.2 Device Numbers
6.3 Device Entries
6.4 Hardware Devices
6.5 Special Devices
6.6 PTYs
6.7 ioctl
10 Security
10.1 Users and Groups
10.2 Process User IDs and Process Group IDs
10.3 File System Permissions
10.4 Real and Effective IDs
10.5 Authenticating Users
10.6 More Security Holes
11 A Sample GNU/Linux Application
11.1 Overview
11.2 Implementation
11.3 Modules
11.4 Using the Server
11.5 Finishing Up
Index
Acknowledgments
We greatly appreciate the pioneering work of Richard Stallman, without whom
there would never have been the GNU Project, and of Linus Torvalds, without
whom there would never have been the Linux kernel. Countless others have worked
on parts of the GNU/Linux operating system, and we thank them all.
We thank the faculties of Harvard and Rice for our undergraduate educations, and
Caltech and Stanford for our graduate training. Without all who taught us, we would
never have dared to teach others!
W. Richard Stevens wrote three excellent books on UNIX programming, and we have
consulted them extensively. Roland McGrath, Ulrich Drepper, and many others wrote
the GNU C library and its outstanding documentation.
Robert Brazile and Sam Kendall reviewed early outlines of this book and made
wonderful suggestions about tone and content. Our technical editors and reviewers
(especially Glenn Becker and John Dean) pointed out errors, made suggestions, and
provided continuous encouragement. Of course, any errors that remain are no fault
of theirs!
Thanks to Ann Quinn, of New Riders, for handling all the details involved in
publishing a book; Laura Loveall, also of New Riders, for not letting us fall too far
behind on our deadlines; and Stephanie Wall, also of New Riders, for encouraging us
to write this book in the first place!
Introduction
GNU/Linux has taken the world of computers by storm. At one time, personal
computer users were forced to choose among proprietary operating environments and
applications. Users had no way of fixing or improving these programs, could not look
“under the hood,” and were often forced to accept restrictive licenses. GNU/Linux
and other open source systems have changed that: now PC users, administrators, and
developers can choose a free operating environment complete with tools, applications,
and full source code.
A great deal of the success of GNU/Linux is owed to its open source nature.
Because the source code for programs is publicly available, everyone can take part in
development, whether by fixing a small bug or by developing and distributing a
complete major application. This opportunity has enticed thousands of capable
developers worldwide to contribute new components and improvements to GNU/Linux,
to the point that modern GNU/Linux systems rival the features of any proprietary
system, and distributions include thousands of programs and applications spanning
many CD-ROMs or DVDs.
The success of GNU/Linux has also validated much of the UNIX philosophy.
Many of the application programming interfaces (APIs) introduced in AT&T and BSD
UNIX variants survive in Linux and form the foundation on which programs are
built. The UNIX philosophy of many small command line-oriented programs working
together is the organizational principle that makes GNU/Linux so powerful. Even
when these programs are wrapped in easy-to-use graphical user interfaces, the
underlying commands are still available for power users and automated scripts.
A powerful GNU/Linux application harnesses the power of these APIs and
commands in its inner workings. GNU/Linux’s APIs provide access to sophisticated
features such as interprocess communication, multithreading, and high-performance
networking. And many problems can be solved simply by assembling existing
commands and programs using simple scripts.
The kernel by itself doesn’t provide features that are useful to users. It can’t even
provide a simple prompt for users to enter basic commands. It provides no way for
users to manage or edit files, communicate with other computers, or write other
programs. These tasks require the use of a wide array of other programs, including
command shells, file utilities, editors, and compilers. Many of these programs, in turn,
use libraries of general-purpose functions, such as the library containing standard C
library functions, which are not included in the kernel.
On GNU/Linux systems, many of these other programs and libraries are software
developed as part of the GNU Project. A great deal of this software predates the
Linux kernel. The aim of the GNU Project is “to develop a complete UNIX-like
operating system which is free software” (from the GNU Project Web site,
http://www.gnu.org).
The Linux kernel and software from the GNU Project have proven to be a powerful
combination. Although the combination is often called “Linux” for short, the complete
system couldn’t work without GNU software, any more than it could operate without
the kernel. For this reason, throughout this book we’ll refer to the complete system as
GNU/Linux, except when we are specifically talking about the Linux kernel.
Conventions
This book follows a few typographical conventions:
• A new term is set in italics the first time it is introduced.
• Program text, functions, variables, and other “computer language” are set in a
  fixed-pitch font; for example, printf ("Hello, world!\n").
• Names of commands, files, and directories are also set in a fixed-pitch font; for
  example, cd /.
• When we show interactions with a command shell, we use % as the shell prompt
  (your shell is probably configured to use a different prompt). Everything after
  the prompt is what you type, while other lines of text are the system’s response.
  For example, in this interaction
  % uname
  Linux
  the system prompted you with %. You entered the uname command. The system
  responded by printing Linux.
• The title of each source code listing includes a filename in parentheses. If you
  type in the listing, save it to a file by this name. You can also download the
  source code listings from the Advanced Linux Programming Web site
  (http://www.newriders.com or http://www.advancedlinuxprogramming.com).
We wrote this book and developed the programs listed in it using the Red Hat 6.2
distribution of GNU/Linux. This distribution incorporates release 2.2.14 of the Linux
kernel, release 2.1.3 of the GNU C library, and the EGCS 1.1.2 release of the GNU
C compiler. The information and programs in this book should generally be applicable
to other versions and distributions of GNU/Linux as well, including 2.4 releases of
the Linux kernel and 2.2 releases of the GNU C library.
I
Advanced UNIX Programming
with Linux
1 Getting Started
2 Writing Good GNU/Linux Software
3 Processes
4 Threads
5 Interprocess Communication
1
Getting Started
This chapter shows you how to perform the basic steps required to create a
C or C++ Linux program. In particular, this chapter shows you how to create and
modify C and C++ source code, compile that code, and debug the result. If you’re
already accustomed to programming under Linux, you can skip ahead to Chapter 2,
“Writing Good GNU/Linux Software”; pay careful attention to Section 2.3, “Writing
and Using Libraries,” for information about static versus dynamic linking that you
might not already know.
Throughout this book, we’ll assume that you’re familiar with the C or C++
programming languages and the most common functions in the standard C library. The
source code examples in this book are in C, except when demonstrating a particular
feature or complication of C++ programming. We also assume that you know how to
perform basic operations in the Linux command shell, such as creating directories and
copying files. Because many Linux programmers got started programming in the
Windows environment, we’ll occasionally point out similarities and contrasts between
Windows and Linux.
About Emacs
Emacs is much more than an editor. It is an incredibly powerful program, so much so that at
CodeSourcery, it is affectionately known as the One True Program, or just the OTP for short. You can read
and send email from within Emacs, and you can customize and extend Emacs in ways far too numerous
to discuss here. You can even browse the Web from within Emacs!
If you’re familiar with another editor, you can certainly use it instead. Nothing in the
rest of this book depends on using Emacs. If you don’t already have a favorite Linux
editor, then you should follow along with the mini-tutorial given here.
If you like Emacs and want to learn about its advanced features, you might consider
reading one of the many Emacs books available. One excellent tutorial, Learning
GNU Emacs, is written by Debra Cameron, Bill Rosenblatt, and Eric S. Raymond
(O’Reilly, 1996).
1. If you’re not running in an X Window system, you’ll have to press F10 to access the
menus.
If you press the Tab key on the line with the call to printf, Emacs will reformat your
code to look like this:
int main ()
{
  printf ("Hello, world\n");
}
Save the file, exit Emacs, and restart. Now open a C or C++ source file and enjoy!
You might have noticed that the string you inserted into your .emacs looks like
code from the LISP programming language.That’s because it is LISP code! Much of
Emacs is actually written in LISP. You can add functionality to Emacs by writing more
LISP code.
2. Try running the command M-x dunnet if you want to play an old-fashioned text
adventure game.
#include <stdio.h>
#include <stdlib.h>
#include "reciprocal.hpp"

int main (int argc, char **argv)
{
  int i;
  i = atoi (argv[1]);
  printf ("The reciprocal of %d is %g\n", i, reciprocal (i));
  return 0;
}
#include <cassert>
#include "reciprocal.hpp"
4. In Windows, executables usually have names that end in .exe. Linux programs, on the
other hand, usually have no extension. So, the Windows equivalent of this program would
probably be called reciprocal.exe; the Linux version is just plain reciprocal.
There’s also one header file called reciprocal.hpp (see Listing 1.3).
#ifdef __cplusplus
extern "C" {
#endif

extern double reciprocal (int i);

#ifdef __cplusplus
}
#endif
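The function that main.c calls is small. The following is a minimal sketch of a
definition, written in plain C rather than C++ so that it can be compiled on its own.
It assumes that reciprocal takes an int and returns a double, which is consistent with
the %d and %g conversions in the printf call and with the assertion that appears in
the debugger session later in this chapter; treat it as an illustration, not as the
book's listing.

```c
#include <assert.h>

/* Return 1/i as a double.  The argument must be nonzero; the assert
   documents that assumption and enforces it unless NDEBUG is defined.  */
double reciprocal (int i)
{
  assert (i != 0);
  return 1.0 / i;
}
```

Because the check is an assert, compiling with -D NDEBUG removes it entirely, so the
caller remains responsible for never passing zero.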
The first step is to turn the C and C++ source code into object code:
% gcc -c main.c
% g++ -c reciprocal.cpp
The -c option tells g++ to compile the program to an object file only; without it, g++
will attempt to link the program to produce an executable. After you’ve typed the
second command, you’ll have an object file called reciprocal.o.
You’ll probably need a couple of other options to build any reasonably large program.
The -I option is used to tell GCC where to search for header files. By default, GCC
looks in the current directory and in the directories where headers for the standard
libraries are installed. If you need to include header files from somewhere else, you’ll
need the -I option. For example, suppose that your project has one directory called
src, for source files, and another called include. You would compile reciprocal.cpp
like this to indicate that g++ should also search the ../include directory to find
reciprocal.hpp:
% g++ -c -I ../include reciprocal.cpp
Sometimes you’ll want to define macros on the command line. For example, in
production code, you don’t want the overhead of the assertion check present in
reciprocal.cpp; that’s only there to help you debug the program. You turn off
the check by defining the macro NDEBUG. You could add an explicit #define to
reciprocal.cpp, but that would require changing the source itself. It’s easier to
simply define NDEBUG on the command line, like this:
% g++ -c -D NDEBUG reciprocal.cpp
If you had wanted to define NDEBUG to some particular value, you could have done
something like this:
% g++ -c -D NDEBUG=3 reciprocal.cpp
If you’re really building production code, you probably want to have GCC optimize
the code so that it runs as quickly as possible. You can do this by using the -O2
command-line option. (GCC has several different levels of optimization; the second
level is appropriate for most programs.) For example, the following compiles
reciprocal.cpp with optimization turned on:
% g++ -c -O2 reciprocal.cpp
Note that compiling with optimization can make your program more difficult to
debug with a debugger (see Section 1.4, “Debugging with GDB”). Also, in certain
instances, compiling with optimization can uncover bugs in your program that did not
manifest themselves previously.
You can pass lots of other options to gcc and g++. The best way to get a complete
list is to view the online documentation. You can do this by typing the following at
your command prompt:
% info gcc
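To link the object files into an executable, invoke g++ on them:
% g++ -o reciprocal main.o reciprocal.o
The -o option gives the name of the file to generate as output from the link step.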
Now you can run reciprocal like this:
% ./reciprocal 7
The reciprocal of 7 is 0.142857
As you can see, g++ has automatically linked in the standard C runtime library
containing the implementation of printf. If you had needed to link in another library
(such as a graphical user interface toolkit), you would have specified the library with
the -l option. In Linux, library names almost always start with lib. For example,
the Pluggable Authentication Module (PAM) library is called libpam.a. To link in
libpam.a, you use a command like this:
% g++ -o reciprocal main.o reciprocal.o -lpam
The compiler automatically adds the lib prefix and the .a suffix.
As with header files, the linker looks for libraries in some standard places, including
the /lib and /usr/lib directories that contain the standard system libraries. If you
want the linker to search other directories as well, you should use the -L option,
which is the parallel of the -I option discussed earlier. You can use this line to instruct
the linker to look for libraries in the /usr/local/lib/pam directory before looking in
the usual places:
% g++ -o reciprocal main.o reciprocal.o -L/usr/local/lib/pam -lpam
Although you don’t have to use the -I option to get the preprocessor to search the
current directory, you do have to use the -L option to get the linker to search the
current directory. In particular, you could use the following to instruct the linker to
find the test library in the current directory:
% gcc -o app app.o -L. -ltest
You can convey all that information to make by putting the information in a file
named Makefile. Here’s what Makefile contains:
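reciprocal: main.o reciprocal.o
	g++ $(CFLAGS) -o reciprocal main.o reciprocal.o
main.o: main.c reciprocal.hpp
	gcc $(CFLAGS) -c main.c
reciprocal.o: reciprocal.cpp reciprocal.hpp
	g++ $(CFLAGS) -c reciprocal.cpp
clean:
	rm -f *.o reciprocal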
You can see that targets are listed on the left, followed by a colon and then any
dependencies. The rule to build that target is on the next line. (Ignore the $(CFLAGS)
bit for the moment.) The line with the rule on it must start with a Tab character, or
make will get confused. If you edit your Makefile in Emacs, Emacs will help you with
the formatting.
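If you remove the object files that you’ve already built and just type make, you’ll see
the following:
% make
gcc -c main.c
g++ -c reciprocal.cpp
g++ -o reciprocal main.o reciprocal.o
You can see that make has automatically built the object files and then linked them.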
If you now change main.c in some trivial way and type make again, you’ll see the
following:
% make
gcc -c main.c
g++ -o reciprocal main.o reciprocal.o
You can see that make knew to rebuild main.o and to re-link the program, but it
didn’t bother to recompile reciprocal.cpp because none of the dependencies for
reciprocal.o had changed.
The $(CFLAGS) is a make variable. You can define this variable either in the
Makefile itself or on the command line. GNU make will substitute the value of the
variable when it executes the rule. So, for example, to recompile with optimization
enabled, you would do this:
% make clean
rm -f *.o reciprocal
% make CFLAGS=-O2
gcc -O2 -c main.c
g++ -O2 -c reciprocal.cpp
g++ -O2 -o reciprocal main.o reciprocal.o
Note that the -O2 flag was inserted in place of $(CFLAGS) in the rules.
In this section, you’ve seen only the most basic capabilities of make. You can find
out more by typing this:
% info make
In that manual, you’ll find information about how to make maintaining a Makefile
easier, how to reduce the number of rules that you need to write, and how to
automatically compute dependencies. You can also find more information in GNU
Autoconf, Automake, and Libtool by Gary V. Vaughan, Ben Elliston, Tom Tromey, and
Ian Lance Taylor (New Riders Publishing, 2000).
When you compile with -g, the compiler includes extra information in the object files
and executables. The debugger uses this information to figure out which addresses
correspond to which lines in which source files, how to print out local variables, and
so forth.
When gdb starts up, you should see the GDB prompt:
(gdb)
The first step is to run your program inside the debugger. Just enter the command run
and any program arguments. Try running the program without any arguments, like
this:
(gdb) run
Starting program: reciprocal
You can see from this display that main called the atoi function with a NULL pointer,
which is the source of the trouble.
You can go up two levels in the stack until you reach main by using the up
command:
(gdb) up 2
#2 0x804863e in main (argc=1, argv=0xbffff5e4) at main.c:8
8 i = atoi (argv[1]);
Note that gdb is capable of finding the source for main.c, and it shows the line where
the erroneous function call occurred. You can view the value of variables using the
print command:
That confirms that the problem is indeed a NULL pointer passed into atoi.
You can set a breakpoint by using the break command:
(gdb) break main
Breakpoint 1 at 0x804862e: file main.c, line 8.
This command sets a breakpoint on the first line of main.6 Now try rerunning the
program with an argument, like this:
(gdb) run 7
Starting program: reciprocal 7
You can see that the debugger has stopped at the breakpoint.
You can step over the call to atoi using the next command:
(gdb) next
9 printf ("The reciprocal of %d is %g\n", i, reciprocal (i));
If you want to see what’s going on inside reciprocal, use the step command like this:
(gdb) step
reciprocal (i=7) at reciprocal.cpp:6
6 assert (i != 0);
6. Some people have commented that saying break main is a little bit funny because
usually you want to do this only when main is already broken.
To see the man page for the sleep library function, use this command:
% man 3 sleep
1.5.2 Info
The Info documentation system contains more detailed documentation for many core
components of the GNU/Linux system, plus several other programs. Info pages are
hypertext documents, similar to Web pages. To launch the text-based Info browser, just
type info in a shell window. You’ll be presented with a menu of Info documents
installed on your system. (Press Control+H to display the keys for navigating an Info
document.)
Among the most useful Info documents are these:
• gcc: The gcc compiler
• libc: The GNU C library, including many system calls
• gdb: The GNU debugger
Almost all the standard Linux programming tools (including ld, the linker; as, the
assembler; and gprof, the profiler) come with useful Info pages. You can jump directly
to a particular Info document by specifying the page name on the command line:
% info libc
If you do most of your programming in Emacs, you can access the built-in Info
browser by typing M-x info or C-h i.
2
Writing Good GNU/Linux
Software
This chapter covers some basic techniques that most GNU/Linux programmers
use. By following the guidelines presented, you’ll be able to write programs that
work well within the GNU/Linux environment and meet GNU/Linux users’
expectations of how programs should operate.
The argument list that the ls program receives has three elements. The first one is the
name of the program itself, as specified on the command line, namely ls. The second
and third elements of the argument list are the two command-line arguments, -s and /.
The main function of your program can access the argument list via the argc and
argv parameters to main (if you don’t use them, you may simply omit them). The first
parameter, argc, is an integer that is set to the number of items in the argument list.
The second parameter, argv, is an array of character pointers. The size of the array is
argc, and the array elements point to the elements of the argument list, as
NUL-terminated character strings.
Using command-line arguments is as easy as examining the contents of argc and
argv. If you’re not interested in the name of the program itself, don’t forget to skip the
first element.
Listing 2.1 demonstrates how to use argc and argv.
return 0;
}
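As a rough illustration of walking the argument list, here is a minimal sketch. It is
not necessarily identical to Listing 2.1; the helper name print_arguments and its
FILE* parameter are assumptions made so that the logic can be reused and checked
outside of main.

```c
#include <stdio.h>

/* Print the program name and each command-line argument to OUT.
   Returns the number of arguments, not counting the program name.  */
int print_arguments (FILE *out, int argc, char *argv[])
{
  int i;

  /* argv[0] is the name of the program as it was invoked.  */
  fprintf (out, "The name of this program is '%s'.\n", argv[0]);
  fprintf (out, "This program was invoked with %d arguments.\n", argc - 1);

  /* The real arguments start at index 1.  */
  for (i = 1; i < argc; ++i)
    fprintf (out, "  %s\n", argv[i]);

  return argc - 1;
}
```

A main function would simply call print_arguments (stdout, argc, argv) and return 0.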
You invoke the getopt_long function, passing it the argc and argv arguments to main,
the character string describing short options, and the array of struct option elements
describing the long options.
• Each time you call getopt_long, it parses a single option, returning the
  short-option letter for that option, or -1 if no more options are found.
• Typically, you’ll call getopt_long in a loop, to process all the options the user has
  specified, and you’ll handle the specific options in a switch statement.
• If getopt_long encounters an invalid option (an option that you didn’t specify as
  a valid short or long option), it prints an error message and returns the character
  ? (a question mark). Most programs will exit in response to this, possibly after
  displaying usage information.
• When handling an option that takes an argument, the global variable optarg
  points to the text of that argument.
• After getopt_long has finished parsing all the options, the global variable optind
  contains the index (into argv) of the first nonoption argument.
Listing 2.2 shows an example of how you might use getopt_long to process your
arguments.
  do {
    next_option = getopt_long (argc, argv, short_options,
                               long_options, NULL);
    switch (next_option)
    {
    case 'h':   /* -h or --help */
      /* User has requested usage information. Print it to standard
         output, and exit with exit code zero (normal termination). */
      print_usage (stdout, 0);
  if (verbose) {
    int i;
    for (i = optind; i < argc; ++i)
      printf ("Argument: %s\n", argv[i]);
  }
  return 0;
}
Using getopt_long may seem like a lot of work, but writing code to parse the
command-line options yourself would take even longer. The getopt_long function is
very sophisticated and allows great flexibility in specifying what kind of options to
accept. However, it’s a good idea to stay away from the more advanced features and
stick with the basic option structure described.
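The pieces described above fit together roughly as in the following sketch. It is a
compact illustration, not the book's listing: the struct settings type, the
parse_options helper, and the particular options (-o/--output and -v/--verbose) are
assumptions chosen for the example. Only getopt_long, optarg, optind, and
struct option come from the GNU C library.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical container for the parsed settings.  */
struct settings
{
  const char *output_filename;  /* argument of -o / --output, or NULL */
  int verbose;                  /* nonzero if -v / --verbose was given */
};

/* Parse the options in ARGV into *S.  Returns the index of the first
   nonoption argument, or -1 if an invalid option was seen.  */
int parse_options (int argc, char *argv[], struct settings *s)
{
  /* A colon after a letter means that option takes an argument.  */
  const char *const short_options = "o:v";
  const struct option long_options[] = {
    { "output",  required_argument, NULL, 'o' },
    { "verbose", no_argument,       NULL, 'v' },
    { NULL,      0,                 NULL, 0   }   /* required terminator */
  };
  int next_option;

  s->output_filename = NULL;
  s->verbose = 0;
  optind = 1;   /* start scanning at the first argument */

  while ((next_option = getopt_long (argc, argv, short_options,
                                     long_options, NULL)) != -1)
    switch (next_option)
      {
      case 'o':   /* -o or --output: optarg points to the filename */
        s->output_filename = optarg;
        break;
      case 'v':   /* -v or --verbose */
        s->verbose = 1;
        break;
      default:    /* '?': getopt_long has already printed a message */
        return -1;
      }

  return optind;
}
```

A caller would typically print usage information and exit with a nonzero code when
parse_options returns -1, and otherwise treat argv[result] through argv[argc - 1] as
the nonoption arguments.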
These three streams are also accessible with the underlying UNIX I/O commands
(read, write, and so on) via file descriptors. These are file descriptors 0 for stdin, 1 for
stdout, and 2 for stderr.
When invoking a program, it is sometimes useful to redirect both standard output
and standard error to a file or pipe. The syntax for doing this varies among shells; for
Bourne-style shells (including bash, the default shell on most GNU/Linux
distributions), the syntax is this:
% program > output_file.txt 2>&1
% program 2>&1 | filter
The 2>&1 syntax indicates that file descriptor 2 (stderr) should be merged into
file descriptor 1 (stdout). Note that 2>&1 must follow a file redirection (the first
example) but must precede a pipe redirection (the second example).
Note that stdout is buffered. Data written to stdout is not sent to the console
(or other device, if it’s redirected) until the buffer fills, the program exits normally, or
stdout is closed. You can explicitly flush the buffer by calling the following:
fflush (stdout);
In contrast, stderr is not buffered; data written to stderr goes directly to the console.1
This can produce some surprising results. For example, this loop does not print one
period every second; instead, the periods are buffered, and a bunch of them are printed
together when the buffer fills.
while (1) {
  printf (".");
  sleep (1);
}
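One way to make each period appear as soon as it is written is to flush the stream
inside the loop. The sketch below assumes a hypothetical helper, print_dots, that
takes the stream, the number of periods, and the delay as parameters so that the
loop terminates and can be exercised with any stream.

```c
#include <stdio.h>
#include <unistd.h>

/* Write COUNT periods to OUT, pausing DELAY seconds after each one.
   Flushing after every period sends it to its destination right away
   instead of leaving it in the stream's buffer.  */
void print_dots (FILE *out, int count, unsigned int delay)
{
  int i;

  for (i = 0; i < count; ++i)
    {
      fputc ('.', out);
      fflush (out);   /* push the buffered period out now */
      sleep (delay);
    }
}
```

Calling print_dots (stdout, 10, 1) prints one period per second for ten seconds.
Writing the periods to stderr instead would have a similar visible effect, because
stderr is not buffered.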
1. In C++, the same distinction holds for cout and cerr, respectively. Note that the endl
token flushes a stream in addition to printing a newline character; if you don’t want to flush the
stream (for performance reasons, for example), use a newline constant, ‘\n’, instead.
A C or C++ program specifies its exit code by returning that value from the main
function. There are other methods of providing exit codes, and special exit codes
are assigned to programs that terminate abnormally (by a signal). These are discussed
further in Chapter 3.
• You can use the export command to export a shell variable into the
  environment. For example, to set the EDITOR environment variable, you would use
  this:
  % EDITOR=emacs
  % export EDITOR
#include <stdio.h>

/* The ENVIRON variable contains the environment. */
extern char** environ;

int main ()
{
  char** var;
  for (var = environ; *var != NULL; ++var)
    printf ("%s\n", *var);
  return 0;
}
Don’t modify environ yourself; use the setenv and unsetenv functions instead.
Usually, when a new program is started, it inherits a copy of the environment of
the program that invoked it (the shell program, if it was invoked interactively). So, for
instance, programs that you run from the shell may examine the values of environment
variables that you set in the shell.
Environment variables are commonly used to communicate configuration
information to programs. Suppose, for example, that you are writing a program that
connects to an Internet server to obtain some information. You could write the program
so that the server name is specified on the command line. However, suppose that the
server name is not something that users will change very often. You can use a special
environment variable, say SERVER_NAME, to specify the server name; if that variable
doesn’t exist, a default value is used. Part of your program might look as shown in
Listing 2.4.
int main ()
{
return 0;
}
Suppose that this program is named client. Assuming that you haven’t set the
SERVER_NAME variable, the default value for the server name is used:
% client
accessing server server.my-company.com
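The lookup-with-default pattern described above can be sketched as follows. The
helper name get_server_name is an assumption made for illustration; the variable
name SERVER_NAME and the default value are the ones used in the example
interaction.

```c
#include <stdlib.h>

/* Return the server name from the SERVER_NAME environment variable,
   or a built-in default if the variable is not set.  */
const char *get_server_name (void)
{
  const char *name = getenv ("SERVER_NAME");

  if (name == NULL)
    /* The variable is not in the environment; fall back to the default.  */
    name = "server.my-company.com";
  return name;
}
```

The string returned by getenv belongs to the environment, so the caller should not
modify or free it.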
Using mkstemp
The mkstemp function creates a unique temporary filename from a filename template,
creates the file with permissions so that only the current user can access it, and opens
the file for read/write. The filename template is a character string ending with
“XXXXXX” (six capital X’s); mkstemp replaces the X’s with characters so that the
filename is unique. The return value is a file descriptor; use the write family of
functions to write to the temporary file.
Temporary files created with mkstemp are not deleted automatically. It’s up to you
to remove the temporary file when it’s no longer needed. (Programmers should be
very careful to clean up temporary files; otherwise, the /tmp file system will fill up
eventually, rendering the system inoperable.) If the temporary file is for internal use
only and won’t be handed to another program, it’s a good idea to call unlink on the
temporary file immediately. The unlink function removes the directory entry
corresponding to a file, but because files in a file system are reference-counted, the file
itself is not removed until there are no open file descriptors for that file, either. This
way, your program may continue to use the temporary file, and the file goes away
automatically as soon as you close the file descriptor. Because Linux closes file
descriptors when a program ends, the temporary file will be removed even if your
program terminates abnormally.
The pair of functions in Listing 2.5 demonstrates mkstemp. Used together, these
functions make it easy to write a memory buffer to a temporary file (so that memory
can be freed or reused) and then read it back later.
Using tmpfile
If you are using the C library I/O functions and don’t need to pass the temporary file
to another program, you can use the tmpfile function. This creates and opens a
temporary file, and returns a file pointer to it. The temporary file is already unlinked,
as in the previous example, so it is deleted automatically when the file pointer is closed
(with fclose) or when the program terminates.
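As a small illustration, the following sketch stores a string in a file created by
tmpfile and reads it back. The helper name round_trip and its parameters are
assumptions made for the example; tmpfile, fputs, rewind, fread, and fclose are
standard C library functions.

```c
#include <stdio.h>

/* Store TEXT in an anonymous temporary file, read it back into RESULT
   (at most SIZE - 1 characters, NUL-terminated), and return the number
   of characters read, or -1 on failure.  */
int round_trip (const char *text, char *result, size_t size)
{
  FILE *temp;
  size_t count;

  if (size == 0)
    return -1;
  temp = tmpfile ();
  if (temp == NULL)
    return -1;

  fputs (text, temp);
  rewind (temp);                 /* back to the start before reading */
  count = fread (result, 1, size - 1, temp);
  result[count] = '\0';
  fclose (temp);                 /* the file is deleted here */
  return (int) count;
}
```

No filename ever appears in the code, so there is nothing to unlink and nothing for
another program to guess.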
GNU/Linux provides several other functions for generating temporary files and
temporary filenames, including mktemp, tmpnam, and tempnam. Don’t use these functions,
though, because they suffer from the reliability and security problems already mentioned.
Suppose, for example, that you call a function, do_something, repeatedly in a loop.
The do_something function returns zero on success and nonzero on failure, but you
don’t expect it ever to fail in your program. You might be tempted to write:
for (i = 0; i < 100; ++i)
  assert (do_something () == 0);
However, you might find that this runtime check imposes too large a performance
penalty and decide later to recompile with NDEBUG defined. This will remove the
assert call entirely, so the expression will never be evaluated and do_something will
never be called. You should write this instead:
for (i = 0; i < 100; ++i) {
  int status = do_something ();
  assert (status == 0);
}
Another thing to bear in mind is that you should not use assert to test for invalid
user input. Users don’t like it when applications simply crash with a cryptic error
message, even in response to invalid input. You should still always check for invalid
input and produce sensible error messages in response to it. Use assert for internal
runtime checks only.
Some good places to use assert are these:
• Check against null pointers, for instance, as invalid function arguments. The error
message generated by assert (pointer != NULL),
Assertion ‘pointer != ((void *)0)’ failed.
is more informative than the error message that would result if your program
dereferenced a null pointer:
Segmentation fault (core dumped)
This will help you detect misuses of the function, and it also makes it very clear
to someone reading the function’s source code that there is a restriction on the
parameter’s value.
Don’t hold back; use assert liberally throughout your programs.
Depending on your program and the nature of the system call, the appropriate action
in case of failure might be to print an error message, to cancel an operation, to abort
the program, to try again, or even to ignore the error. It’s important, though, to
include logic that handles all possible failure modes in some way or another.
2. Actually, for reasons of thread safety, errno is implemented as a macro, but it is used like a
global variable.
One possible error code that you should be on the watch for, especially with I/O
functions, is EINTR. Some functions, such as read, select, and sleep, can take
significant time to execute. These are considered blocking functions because program
execution is blocked until the call is completed. However, if the program receives a signal
while blocked in one of these calls, the call will return without completing the
operation. In this case, errno is set to EINTR. Usually, you’ll want to retry the system call in
this case.
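One common way to retry is a small wrapper loop. The following is a sketch for read only; the name read_retry is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Like read, but calls read again if a signal interrupted it before any
   data was transferred.  Returns whatever the final read returned.
   (Hypothetical helper.)  */
ssize_t read_retry (int fd, void* buffer, size_t count)
{
  ssize_t result;

  assert (buffer != NULL);
  do {
    result = read (fd, buffer, count);
  } while (result == -1 && errno == EINTR);
  return result;
}
```

Any other error is still reported to the caller through the return value and errno, so the usual error handling applies after the wrapper returns.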
Here’s a code fragment that uses the chown call to change the owner of a file given
by path to the user specified by user_id. If the call fails, the program takes action depending on
the value of errno. Notice that when we detect what’s probably a bug in the program,
we exit using abort or assert, which causes a core file to be generated. This can be
useful for post-mortem debugging. For other unrecoverable errors, such as
out-of-memory conditions, we exit using exit and a nonzero exit value instead because a
core file wouldn’t be very useful.
rval = chown (path, user_id, -1);
if (rval != 0) {
  /* Save errno because it's clobbered by the next system call. */
  int error_code = errno;
  /* The operation didn't succeed; chown should return -1 on error. */
  assert (rval == -1);
  /* Check the value of errno, and take appropriate action. */
  switch (error_code) {
  case EPERM:         /* Permission denied. */
  case EROFS:         /* PATH is on a read-only file system. */
  case ENAMETOOLONG:  /* PATH is too long. */
  case ENOENT:        /* PATH does not exist. */
  case ENOTDIR:       /* A component of PATH is not a directory. */
  case EACCES:        /* A component of PATH is not accessible. */
    /* Something's wrong with the file. Print an error message. */
    fprintf (stderr, "error changing ownership of %s: %s\n",
             path, strerror (error_code));
    /* Don't end the program; perhaps give the user a chance to
       choose another file... */
    break;
  case EFAULT:
    /* PATH contains an invalid memory address. This is probably a bug. */
    abort ();
  case ENOMEM:
    /* Ran out of kernel memory. */
    fprintf (stderr, "%s\n", strerror (error_code));
    exit (1);
  default:
    /* Some other, unexpected, error code. We've tried to handle all
       possible error codes; if we've missed one, that's a bug! */
    abort ();
  }
}
You could simply have used this code, which behaves the same way if the call succeeds:
rval = chown (path, user_id, -1);
assert (rval == 0);
But if the call fails, this alternative makes no effort to report, handle, or recover from
errors.
Whether you use the first form, the second form, or something in between
depends on the error detection and recovery requirements for your program.
Linux cleans up allocated memory, open files, and most other resources when a
program terminates, so it’s not necessary to deallocate buffers and close files before calling
exit. You might need to manually free other shared resources, however, such as
temporary files and shared memory, which can potentially outlive a program.
smaller, easier to upgrade, but harder to deploy. This section explains how to link both
statically and dynamically, examines the trade-offs in more detail, and gives some “rules
of thumb” for deciding which kind of linking is better for you.
2.3.1 Archives
An archive (or static library) is simply a collection of object files stored as a single file.
(An archive is roughly the equivalent of a Windows .LIB file.) When you provide an
archive to the linker, the linker searches the archive for the object files it needs,
extracts them, and links them into your program much as if you had provided those
object files directly.
You can create an archive using the ar command. Archive files traditionally use a .a
extension rather than the .o extension used by ordinary object files. Here’s how you
would combine test1.o and test2.o into a single libtest.a archive:
% ar cr libtest.a test1.o test2.o
The cr flags tell ar to create the archive.³ Now you can link with this archive using
the -ltest option with gcc or g++, as described in Section 1.2.2, “Linking Object
Files,” in Chapter 1, “Getting Started.”
When the linker encounters an archive on the command line, it searches the
archive for definitions of all symbols (functions or variables) that are referenced from
the object files it has already processed but that are not yet defined. The object files
that define those symbols are extracted from the archive and included in the final
executable. Because the linker searches the archive at the point where it is encountered
on the command line, it usually makes sense to put archives at the end of the command line. For
example, suppose that test.c contains the code in Listing 2.7 and app.c contains the
code in Listing 2.8.
3. You can use other flags to remove a file from an archive or to perform other operations on
the archive. These operations are rarely used but are documented on the ar man page.
Now suppose that test.o is combined with some other object files to produce the
libtest.a archive. The following command line will not work:
% gcc -o app -L. -ltest app.o
app.o: In function ‘main’:
app.o(.text+0x4): undefined reference to ‘f’
collect2: ld returned 1 exit status
The error message indicates that even though libtest.a contains a definition of f, the
linker did not find it. That’s because libtest.a was searched when it was first
encountered, and at that point the linker hadn’t seen any references to f.
On the other hand, if we use this line, no error messages are issued:
% gcc -o app app.o -L. -ltest
The reason is that the reference to f in app.o causes the linker to include the test.o
object file from the libtest.a archive.
The -fPIC option tells the compiler that you are going to be using test.o as part of a
shared object.
Then you combine the object files into a shared library, like this:
% gcc -shared -fPIC -o libtest.so test1.o test2.o
The -shared option tells the linker to produce a shared library rather than an ordinary
executable. Shared libraries use the extension .so, which stands for shared object. Like
static archives, the name always begins with lib to indicate that the file is a library.
Linking with a shared library is just like linking with a static archive. For example,
the following line will link with libtest.so if it is in the current directory, or one of
the standard library search directories on the system:
% gcc -o app app.o -L. -ltest
Suppose that both libtest.a and libtest.so are available. Then the linker must
choose one of the libraries and not the other. The linker searches each directory (first
those specified with -L options, and then those in the standard directories). When the
linker finds a directory that contains either libtest.a or libtest.so, the linker stops
searching directories. If only one of the two variants is present in the directory, the linker
chooses that variant. Otherwise, the linker chooses the shared library version, unless
you explicitly instruct it otherwise. You can use the -static option to demand static
archives. For example, the following line will use the libtest.a archive, even if the
libtest.so shared library is also available:
% gcc -static -o app app.o -L. -ltest
The ldd command displays the shared libraries that are linked into an executable.
These libraries need to be available when the executable is run. Note that ldd will list
an additional library called ld-linux.so, which is a part of GNU/Linux’s dynamic
linking mechanism.
Using LD_LIBRARY_PATH
When you link a program with a shared library, the linker does not put the full path
to the shared library in the resulting executable. Instead, it places only the name of the
shared library. When the program is actually run, the system searches for the shared
library and loads it. The system searches only /lib and /usr/lib, by default. If a shared
library that is linked into your program is installed outside those directories, it will not
be found, and the system will refuse to run the program.
One solution to this problem is to use the -Wl,-rpath option when linking the
program. Suppose that you use this:
% gcc -o app app.o -L. -ltest -Wl,-rpath,/usr/local/lib
Then, when app is run, the system will search /usr/local/lib for any required shared
libraries.
If you write a C++ program and link it using the c++ or g++ commands, you’ll also
get the standard C++ library, libstdc++, automatically.
4. You might see a reference to LD_RUN_PATH in some online documentation. Don’t believe
what you read; this variable does not actually do anything under GNU/Linux.
Save this source file as tifftest.c. To compile this program and link with libtiff,
specify -ltiff on your link line:
% gcc -o tifftest tifftest.c -ltiff
Static libraries, on the other hand, cannot point to other libraries. If you decide to link
with the static version of libtiff by specifying -static on your command line, you
will encounter unresolved symbols:
% gcc -static -o tifftest tifftest.c -ltiff
/usr/bin/../lib/libtiff.a(tif_jpeg.o): In function ‘TIFFjpeg_error_exit’:
tif_jpeg.o(.text+0x2a): undefined reference to ‘jpeg_abort’
/usr/bin/../lib/libtiff.a(tif_jpeg.o): In function ‘TIFFjpeg_create_compress’:
tif_jpeg.o(.text+0x8d): undefined reference to ‘jpeg_std_error’
tif_jpeg.o(.text+0xcf): undefined reference to ‘jpeg_CreateCompress’
...
To link this program statically, you must specify the other two libraries yourself:
% gcc -static -o tifftest tifftest.c -ltiff -ljpeg -lz
Occasionally, two libraries will be mutually dependent. In other words, the first archive
will reference symbols defined in the second archive, and vice versa. This situation
generally arises out of poor design, but it does occasionally arise. In this case, you can
provide a single library multiple times on the command line. The linker will search
the library again each time it occurs. For example, this line will cause libfoo.a to be
searched multiple times:
% gcc -o app app.o -lfoo -lbar -lfoo
So, even if libfoo.a references symbols in libbar.a, and vice versa, the program will
link successfully.
One major advantage of a shared library is that it saves space on the system where
the program is installed. If you are installing 10 programs, and they all make use of the
same shared library, then you save a lot of space by using a shared library. If you used a
static archive instead, the archive is included in all 10 programs. So, using shared
libraries saves disk space. It also reduces download times if your program is being
downloaded from the Web.
A related advantage to shared libraries is that users can upgrade the libraries with-
out upgrading all the programs that depend on them. For example, suppose that you
produce a shared library that manages HTTP connections. Many programs might
depend on this library. If you find a bug in this library, you can upgrade the library.
Instantly, all the programs that depend on the library will be fixed; you don’t have to
relink all the programs the way you do with a static archive.
Those advantages might make you think that you should always use shared
libraries. However, substantial reasons exist to use static archives instead. The fact that
an upgrade to a shared library affects all programs that depend on it can be a
disadvantage. For example, if you’re developing mission-critical software, you might rather link
to a static archive so that an upgrade to shared libraries on the system won’t affect
your program. (Otherwise, users might upgrade the shared library, thereby breaking
your program, and then call your customer support line, blaming you!)
If you’re not going to be able to install your libraries in /lib or /usr/lib, you
should definitely think twice about using a shared library. (You won’t be able to install
your libraries in those directories if you expect users to install your software without
administrator privileges.) In particular, the -Wl,-rpath trick won’t work if you don’t
know where the libraries are going to end up. And asking your users to set
LD_LIBRARY_PATH means an extra step for them. Because each user has to do this
individually, this is a substantial additional burden.
You’ll have to weigh these advantages and disadvantages for every program you
distribute.
(The second parameter is a flag that indicates how to bind symbols in the shared
library. You can consult the online man pages for dlopen if you want more
information, but RTLD_LAZY is usually the setting that you want.) To use dynamic loading
functions, include the <dlfcn.h> header file and link with the -ldl option to pick up the
libdl library.
The return value from this function is a void * that is used as a handle for the
shared library. You can pass this value to the dlsym function to obtain the address of a
function that has been loaded with the shared library. For example, if libtest.so
defines a function named my_function, you could call it like this:
void* handle = dlopen ("libtest.so", RTLD_LAZY);
void (*test)() = dlsym (handle, "my_function");
(*test)();
dlclose (handle);
The dlsym function can also be used to obtain a pointer to a static variable in the
shared library.
Both dlopen and dlsym return NULL if they do not succeed. In that event, you
can call dlerror (with no parameters) to obtain a human-readable error message
describing the problem.
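A version with the error checks in place might look like the sketch below. It assumes the system math library is available under the soname libm.so.6 and looks up its cos function; the wrapper name call_cosine is hypothetical. Link with -ldl as described above if your C library requires it.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Load the math library at run time and call its cos function on X.
   Returns 0 and stores the answer in *RESULT on success, -1 on failure.
   The soname "libm.so.6" is an assumption about the system.  */
int call_cosine (double x, double* result)
{
  void* handle;
  double (*cosine) (double);

  assert (result != NULL);

  handle = dlopen ("libm.so.6", RTLD_LAZY);
  if (handle == NULL) {
    fprintf (stderr, "dlopen failed: %s\n", dlerror ());
    return -1;
  }

  /* Converting the void* from dlsym to a function pointer is the
     customary idiom, although ISO C does not define the conversion.  */
  cosine = (double (*) (double)) dlsym (handle, "cos");
  if (cosine == NULL) {
    fprintf (stderr, "dlsym failed: %s\n", dlerror ());
    dlclose (handle);
    return -1;
  }

  *result = cosine (x);
  dlclose (handle);
  return 0;
}
```

Checking both return values before use avoids calling through a null pointer when the library or the symbol is missing.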
The dlclose function unloads the shared library. Technically, dlopen actually loads
the library only if it is not already loaded. If the library has already been loaded,
dlopen simply increments the library reference count. Similarly, dlclose decrements
the reference count and then unloads the library only if the reference count has
reached zero.
If you’re writing the code in your shared library in C++, you will probably want
to declare those functions and variables that you plan to access elsewhere with the
extern "C" linkage specifier. For instance, if the C++ function my_function is in a
shared library and you want to access it with dlsym, you should declare it like this:
extern "C" void my_function ();
This prevents the C++ compiler from mangling the function name, which would
change the function’s name from my_function to a different, funny-looking name that encodes
extra information about the function. A C compiler will not mangle names; it will use
whichever name you give to your function or variable.
3
Processes
#include <stdio.h>
#include <unistd.h>

int main ()
{
  printf ("The process ID is %d\n", (int) getpid ());
  printf ("The parent process ID is %d\n", (int) getppid ());
  return 0;
}
Observe that if you invoke this program several times, a different process ID is
reported because each invocation is in a new process. However, if you invoke it every
time from the same shell, the parent process ID (that is, the process ID of the shell
process) is the same.
This invocation of ps shows two processes. The first, bash, is the shell running on this
terminal. The second is the running instance of the ps program itself. The first
column, labeled PID, displays the process ID of each.
For a more detailed look at what’s running on your GNU/Linux system, invoke
this:
% ps -e -o pid,ppid,command
ps Output Formats
With the -o option to the ps command, you specify the information about processes that you want in
the output as a comma-separated list. For example, ps -o pid,user,start_time,command displays
the process ID, the name of the user owning the process, the wall clock time at which the process
started, and the command running in the process. See the man page for ps for the full list of field codes.
You can use the -f (full listing), -l (long listing), or -j (jobs listing) options instead to get three differ-
ent preset listing formats.
Here are the first few lines and last few lines of output from this command on my
system. You may see different output, depending on what’s running on your system.
% ps -e -o pid,ppid,command
PID PPID COMMAND
1 0 init [5]
2 1 [kflushd]
3 1 [kupdate]
...
21725 21693 xterm
21727 21725 bash
21728 21727 ps -e -o pid,ppid,command
Note that the parent process ID of the ps command, 21727, is the process ID of bash,
the shell from which I invoked ps. The parent process ID of bash is in turn 21725, the
process ID of the xterm program in which the shell is running.
1. You can also use the kill command to send other signals to a process. This is described in
Section 3.4, “Process Termination.”
#include <stdlib.h>

int main ()
{
  int return_value;
  return_value = system ("ls -l /");
  return return_value;
}
The system function returns the exit status of the shell command. If the shell itself
cannot be run, system returns 127; if another error occurs, system returns -1.
Because the system function uses a shell to invoke your command, it’s subject to
the features, limitations, and security flaws of the system’s shell. You can’t rely on the
availability of any particular version of the Bourne shell. On many UNIX systems,
/bin/sh is a symbolic link to another shell. For instance, on most GNU/Linux
systems, /bin/sh points to bash (the Bourne-Again SHell), and different GNU/Linux
distributions use different versions of bash. Invoking a program with root privilege
with the system function, for instance, can have different results on different
GNU/Linux systems. Therefore, it’s preferable to use the fork and exec method for
creating processes.
copy of its parent process. Linux provides another set of functions, the exec family, that
causes a particular process to cease being an instance of one program and to instead
become an instance of another program. To spawn a new process, you first use fork to
make a copy of the current process. Then you use exec to transform one of these
processes into an instance of the program you want to spawn.
Calling fork
When a program calls fork, a duplicate process, called the child process, is created. The
parent process continues executing the program from the point that fork was called.
The child process, too, executes the same program from the same place.
So how do the two processes differ? First, the child process is a new process and
therefore has a new process ID, distinct from its parent’s process ID. One way for a
program to distinguish whether it’s in the parent process or the child process is to call
getpid. However, the fork function provides different return values to the parent and
child processes: one process “goes in” to the fork call, and two processes “come out,”
with different return values. The return value in the parent process is the process ID of
the child. The return value in the child process is zero. Because no process ever has a
process ID of zero, this makes it easy for the program to determine whether it is now running as the
parent or the child process.
Listing 3.3 is an example of using fork to duplicate a program’s process. Note that
the first block of the if statement is executed only in the parent process, while the
else clause is executed in the child process.
int main ()
{
pid_t child_pid;
return 0;
}
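The return-value test described above can be sketched as follows. This is an illustration rather than the book's listing: the function name fork_and_collect is hypothetical, and it uses waitpid (covered with the other wait functions later in this chapter) so that the parent can report what the child did.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with EXIT_CODE.  The parent waits for
   the child and returns the exit code it observed, or -1 on error.
   (Hypothetical helper.)  */
int fork_and_collect (int exit_code)
{
  pid_t child_pid;
  int status;

  assert (exit_code >= 0 && exit_code <= 127);

  child_pid = fork ();
  if (child_pid == -1)
    return -1;              /* fork failed; no child was created. */

  if (child_pid == 0)
    _exit (exit_code);      /* fork returned zero: this is the child. */

  /* fork returned the child's process ID: this is the parent. */
  if (waitpid (child_pid, &status, 0) != child_pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Note the third case: fork returns -1 in the parent when no child could be created, so a robust program checks for it before testing for zero.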
int main ()
{
/* The argument list to pass to the “ls” command. */
char* arg_list[] = {
“ls”, /* argv[0], the name of the program. */
“-l”,
“/”,
NULL /* The argument list must end with a NULL. */
};
return 0;
}
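Combining fork with one of the exec functions might look like the sketch below, which uses execvp with a NULL-terminated argument list like the one above. The name run_and_wait is hypothetical, and waiting for the child is an addition made so that the result can be inspected.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run PROGRAM with ARG_LIST in a child process and wait for it.
   Returns the child's exit code, or -1 if it could not be obtained.
   (Hypothetical helper.)  */
int run_and_wait (const char* program, char* const arg_list[])
{
  pid_t child_pid;
  int status;

  assert (program != NULL && arg_list != NULL);

  child_pid = fork ();
  if (child_pid == -1)
    return -1;

  if (child_pid == 0) {
    /* In the child: replace this program with PROGRAM, searched for on
       the PATH.  execvp returns only if it fails.  */
    execvp (program, arg_list);
    _exit (127);
  }

  if (waitpid (child_pid, &status, 0) != child_pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling run_and_wait ("ls", arg_list) with the list shown above would run ls -l / in the child while the parent waits for it to finish.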
You can use the renice command to change the niceness of a running process from
the command line.
To change the niceness of a running process programmatically, use the nice func-
tion. Its argument is an increment value, which is added to the niceness value of the
process that calls it. Remember that a positive value raises the niceness value and thus
reduces the process’s execution priority.
Note that only a process with root privilege can run a process with a negative
niceness value or reduce the niceness value of a running process. This means that you may
specify negative values to the nice and renice commands only when logged in as
root, and only a process running as root can pass a negative value to the nice function.
This prevents ordinary users from grabbing execution priority away from others using
the system.
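A call to nice needs slightly unusual error handling, because -1 can be a legitimate return value. The sketch below shows one way to handle that; the wrapper name be_nicer is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Raise the calling process's niceness by INCREMENT, which lowers its
   priority.  Returns 0 on success, -1 on failure.
   (Hypothetical helper.)  */
int be_nicer (int increment)
{
  assert (increment >= 0);

  /* nice can legitimately return -1 as the new niceness, so clear errno
     first and inspect it afterward to detect a real failure.  */
  errno = 0;
  if (nice (increment) == -1 && errno != 0)
    return -1;
  return 0;
}
```

Because the increment is never negative here, the call does not need root privilege.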
3.3 Signals
Signals are mechanisms for communicating with and manipulating processes in Linux.
The topic of signals is a large one; here we discuss some of the most important signals
and techniques that are used for controlling processes.
A signal is a special message sent to a process. Signals are asynchronous; when a
process receives a signal, it processes the signal immediately, without finishing the
current function or even the current line of code. There are several dozen different
signals, each with a different meaning. Each signal type is specified by its signal number,
but in programs, you usually refer to a signal by its name. In Linux, these are defined
in /usr/include/bits/signum.h. (You shouldn’t include this header file directly in
your programs; instead, use <signal.h>.)
2. A method for serializing the two processes is presented in Section 3.4.1, “Waiting for
Process Termination.”
When a process receives a signal, it may do one of several things, depending on the
signal’s disposition. For each signal, there is a default disposition, which determines what
happens to the process if the program does not specify some other behavior. For most
signal types, a program may specify some other behavior—either to ignore the signal
or to call a special signal-handler function to respond to the signal. If a signal handler is
used, the currently executing program is paused, the signal handler is executed, and,
when the signal handler returns, the program resumes.
The Linux system sends signals to processes in response to specific conditions. For
instance, SIGBUS (bus error), SIGSEGV (segmentation violation), and SIGFPE (floating
point exception) may be sent to a process that attempts to perform an illegal
operation. The default disposition for these signals is to terminate the process and produce a
core file.
A process may also send a signal to another process. One common use of this
mechanism is to end another process by sending it a SIGTERM or SIGKILL signal.³
Another common use is to send a command to a running program. Two
“user-defined” signals are reserved for this purpose: SIGUSR1 and SIGUSR2. The SIGHUP signal
is sometimes used for this purpose as well, commonly to wake up an idling program
or cause a program to reread its configuration files.
The sigaction function can be used to set a signal disposition. The first parameter
is the signal number. The next two parameters are pointers to sigaction structures; the
first of these contains the desired disposition for that signal number, while the second
receives the previous disposition. The most important field in the first or second
sigaction structure is sa_handler. It can take one of three values:
• SIG_DFL, which specifies the default disposition for the signal.
• SIG_IGN, which specifies that the signal should be ignored.
• A pointer to a signal-handler function. The function should take one parameter,
the signal number, and return void.
Because signals are asynchronous, the main program may be in a very fragile state
when a signal is processed and thus while a signal handler function executes.
Therefore, you should avoid performing any I/O operations or calling most library
and system functions from signal handlers.
A signal handler should perform the minimum work necessary to respond to the
signal, and then return control to the main program (or terminate the program). In
most cases, this consists simply of recording the fact that a signal occurred. The main
program then checks periodically whether a signal has occurred and reacts accordingly.
It is possible for a signal handler to be interrupted by the delivery of another signal.
While this may sound like a rare occurrence, if it does occur, it will be very difficult to
diagnose and debug the problem. (This is an example of a race condition, discussed in
Chapter 4, “Threads,” Section 4.4, “Synchronization and Critical Sections.”) Therefore,
you should be very careful about what your program does in a signal handler.
3. What’s the difference? The SIGTERM signal asks a process to terminate; the process may
ignore the request by masking or ignoring the signal. The SIGKILL signal always kills the process
immediately because the process may not mask or ignore SIGKILL.
Even assigning a value to a global variable can be dangerous because the assignment
may actually be carried out in two or more machine instructions, and a second signal
may occur between them, leaving the variable in a corrupted state. If you use a global
variable to flag a signal from a signal-handler function, it should be of the special type
sig_atomic_t. Linux guarantees that assignments to variables of this type are per-
formed in a single instruction and therefore cannot be interrupted midway. In Linux,
sig_atomic_t is an ordinary int; in fact, assignments to integer types the size of int or
smaller, or to pointers, are atomic. If you want to write a program that’s portable to
any standard UNIX system, though, use sig_atomic_t for these global variables.
The program skeleton in Listing 3.5, for instance, uses a signal-handler function to
count the number of times that the program receives SIGUSR1, one of the signals
reserved for application use.
sig_atomic_t sigusr1_count = 0;
int main ()
{
struct sigaction sa;
memset (&sa, 0, sizeof (sa));
sa.sa_handler = &handler;
sigaction (SIGUSR1, &sa, NULL);
To send a signal from a program, use the kill function. The first parameter is the
target process ID. The second parameter is the signal number; use SIGTERM to simulate the
default behavior of the kill command. For instance, where child_pid contains the
process ID of the child process, you can use the kill function to terminate a child
process from the parent by calling it like this:
kill (child_pid, SIGTERM);
Include the <sys/types.h> and <signal.h> headers if you use the kill function.
By convention, the exit code is used to indicate whether the program executed
correctly. An exit code of zero indicates correct execution, while a nonzero exit code
indicates that an error occurred. In the latter case, the particular value returned may
give some indication of the nature of the error. It’s a good idea to stick with this
convention in your programs because other components of the GNU/Linux system
assume this behavior. For instance, shells assume this convention when you connect
multiple programs with the && (logical and) and || (logical or) operators. Therefore,
you should explicitly return zero from your main function, unless an error occurs.
With most shells, it’s possible to obtain the exit code of the most recently executed
program using the special $? variable. Here’s an example in which the ls command is
invoked twice and its exit code is displayed after each invocation. In the first case, ls
executes correctly and returns the exit code zero. In the second case, ls encounters an
error (because the filename specified on the command line does not exist) and thus
returns a nonzero exit code.
% ls /
bin coda etc lib misc nfs proc sbin usr
boot dev home lost+found mnt opt root tmp var
% echo $?
0
% ls bogusfile
ls: bogusfile: No such file or directory
% echo $?
1
Note that even though the parameter type of the exit function is int and the main
function returns an int, Linux does not preserve the full 32 bits of the return code. In
fact, you should use exit codes only between zero and 127. Exit codes above 128 have
a special meaning—when a process is terminated by a signal, its exit code is 128 plus
the signal number.
You can use the WIFEXITED macro to determine from a child process’s exit status
whether that process exited normally (via the exit function or returning from main)
or died from an unhandled signal. In the latter case, use the WTERMSIG macro to extract
from its exit status the signal number by which it died.
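As a sketch of how these macros fit together with kill, the function below starts a child that does nothing but wait, sends it a signal, and decodes the resulting status. The name kill_child_with is hypothetical, and WIFSIGNALED is used alongside the macros named above to confirm that a signal ended the child.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that waits for a signal, send it SIGNAL_NUMBER, and
   return the number of the signal that killed it.  Returns 0 if the
   child exited normally instead, or -1 on error.
   (Hypothetical helper.)  */
int kill_child_with (int signal_number)
{
  pid_t child_pid;
  int status;

  assert (signal_number > 0);

  child_pid = fork ();
  if (child_pid == -1)
    return -1;

  if (child_pid == 0) {
    /* Child: sleep until a signal arrives.  */
    for (;;)
      pause ();
  }

  kill (child_pid, signal_number);
  if (waitpid (child_pid, &status, 0) != child_pid)
    return -1;

  if (WIFEXITED (status))
    return 0;
  return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

Passing SIGTERM mirrors the default behavior of the kill command; the sketch assumes the child has not arranged to ignore the signal it is sent.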
Here is the main function from the fork and exec example again.This time, the
parent process calls wait to wait until the child process, in which the ls command
executes, is finished.
int main ()
{
int child_status;
return 0;
}
Several similar system calls are available in Linux, which are more flexible or provide
more information about the exiting child process. The waitpid function can be used
to wait for a specific child process to exit instead of any child process. The wait3
function returns CPU usage statistics about the exiting child process, and the wait4
function allows you to specify additional options about which processes to wait for.
A zombie process is a process that has terminated but has not been cleaned up yet. It
is the responsibility of the parent process to clean up its zombie children. The wait
functions do this, too, so it’s not necessary to track whether your child process is still
executing before waiting for it. Suppose, for instance, that a program forks a child
process, performs some other computations, and then calls wait. If the child process
has not terminated at that point, the parent process will block in the wait call until the
child process finishes. If the child process finishes before the parent process calls wait,
the child process becomes a zombie. When the parent process calls wait, the zombie
child’s termination status is extracted, the child process is deleted, and the wait call
returns immediately.
What happens if the parent does not clean up its children? They stay around in the
system, as zombie processes. The program in Listing 3.6 forks a child process, which
terminates immediately. The parent then goes to sleep for a minute, without ever cleaning up
the child process.
int main ()
{
pid_t child_pid;
Try compiling this file to an executable named make-zombie. Run it, and while it’s still
running, list the processes on the system by invoking the following command in
another window:
% ps -e -o pid,ppid,stat,cmd
This lists the process ID, parent process ID, process status, and process command
line. Observe that, in addition to the parent make-zombie process, there is another
make-zombie process listed. It’s the child process; note that its parent process ID is
the process ID of the main make-zombie process. The child process is marked as
<defunct>, and its status code is Z, for zombie.
What happens when the main make-zombie program ends when the parent process
exits, without ever calling wait? Does the zombie process stay around? No—try
running ps again, and note that both of the make-zombie processes are gone.When a
program exits, its children are inherited by a special process, the init program, which
always runs with process ID of 1 (it’s the first process started when Linux boots).The
init process automatically cleans up any zombie child processes that it inherits.
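To contrast with the zombie case, here is a minimal sketch of a parent that does clean up after its child. The helper name run_and_reap_child is an assumption made for illustration and is not one of the book's listings; it uses waitpid to collect the child's termination status.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once with EXIT_CODE,
   then reap it.  Returns the child's exit code, or -1 on failure.  */
int run_and_reap_child (int exit_code)
{
  int status;
  pid_t again;
  pid_t child_pid = fork ();
  if (child_pid == (pid_t) -1)
    return -1;
  if (child_pid == 0)
    /* Child: terminate immediately.  */
    _exit (exit_code);
  /* Parent: if the child has already finished, it is a zombie until
     this call collects its termination status.  */
  if (waitpid (child_pid, &status, 0) != child_pid)
    return -1;
  /* The child has been cleaned up, so waiting for it again must fail.  */
  again = waitpid (child_pid, NULL, WNOHANG);
  assert (again == -1);
  (void) again;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling run_and_reap_child (7) should return 7, and no defunct entry for the child should remain in the ps listing afterward.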
sig_atomic_t child_exit_status;
int main ()
{
/* Handle SIGCHLD by calling clean_up_child_process. */
struct sigaction sigchld_action;
memset (&sigchld_action, 0, sizeof (sigchld_action));
sigchld_action.sa_handler = &clean_up_child_process;
sigaction (SIGCHLD, &sigchld_action, NULL);
return 0;
}
Note how the signal handler stores the child process’s exit status in a global variable,
from which the main program can access it. Because the variable is assigned in a signal
handler, its type is sig_atomic_t.
4
Threads
from or write to that file descriptor. Because a process and all its threads can be exe-
cuting only one program at a time, if any thread inside a process calls one of the exec
functions, all the other threads are ended (the new program may, of course, create new
threads).
GNU/Linux implements the POSIX standard thread API (known as pthreads). All
thread functions and data types are declared in the header file <pthread.h>. The
pthread functions are not included in the standard C library. Instead, they are in
libpthread, so you should add -lpthread to the command line when you link your
program.
4. A thread argument value of type void*. Whatever you pass is simply passed as
the argument to the thread function when the thread begins executing.
A call to pthread_create returns immediately, and the original thread continues exe-
cuting the instructions following the call. Meanwhile, the new thread begins executing
the thread function. Linux schedules both threads asynchronously, and your program
must not rely on the relative order in which instructions are executed in the two
threads.
The program in Listing 4.1 creates a thread that prints x’s continuously to standard
error. After calling pthread_create, the main thread prints o’s continuously to standard
error.
int main ()
{
pthread_t thread_id;
/* Create a new thread. The new thread will run the print_xs
function. */
pthread_create (&thread_id, NULL, &print_xs, NULL);
/* Print o’s continuously to stderr. */
while (1)
fputc ('o', stderr);
return 0;
}
Try running it to see what happens. Notice the unpredictable pattern of x’s and o’s as
Linux alternately schedules the two threads.
Under normal circumstances, a thread exits in one of two ways. One way, as illustrated
previously, is by returning from the thread function. The return value from the
thread function is taken to be the return value of the thread. Alternatively, a thread can
exit explicitly by calling pthread_exit. This function may be called from within the
thread function or from some other function called directly or indirectly by the thread
function. The argument to pthread_exit is the thread’s return value.
/* Parameters to print_function. */
struct char_print_parms
{
/* The character to print. */
char character;
/* The number of times to print it. */
int count;
};
int main ()
{
pthread_t thread1_id;
pthread_t thread2_id;
struct char_print_parms thread1_args;
struct char_print_parms thread2_args;
return 0;
}
But wait! The program in Listing 4.2 has a serious bug in it. The main thread (which
runs the main function) creates the thread parameter structures (thread1_args and
thread2_args) as local variables, and then passes pointers to these structures to the
threads it creates. What’s to prevent Linux from scheduling the three threads in such a
way that main finishes executing before either of the other two threads is done?
Nothing! But if this happens, the memory containing the thread parameter structures
will be deallocated while the other two threads are still accessing it.
The moral of the story: Make sure that any data you pass to a thread by reference is
not deallocated, even by a different thread, until you’re sure that the thread is done with
it. This is true both for local variables, which are deallocated when they go out of
scope, and for heap-allocated variables, which you deallocate by calling free (or using
delete in C++).
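One common way to follow this rule is to wait for the thread with pthread_join before the data goes out of scope. The sketch below illustrates the idea under assumed names (char_count_parms, count_char, and count_in_thread are hypothetical and only loosely modeled on Listing 4.2).

```c
#include <assert.h>
#include <pthread.h>

/* Parameters for the thread function, in the spirit of char_print_parms.  */
struct char_count_parms
{
  const char* text;    /* The string to scan.  */
  char character;      /* The character to count.  */
  int result;          /* Filled in by the thread.  */
};

static void* count_char (void* parameters)
{
  struct char_count_parms* p = (struct char_count_parms*) parameters;
  const char* s;
  p->result = 0;
  for (s = p->text; *s != '\0'; ++s)
    if (*s == p->character)
      ++p->result;
  return NULL;
}

int count_in_thread (const char* text, char character)
{
  pthread_t thread_id;
  struct char_count_parms args;   /* Local: valid only until we return.  */
  int rc;
  args.text = text;
  args.character = character;
  rc = pthread_create (&thread_id, NULL, &count_char, &args);
  assert (rc == 0);
  /* Wait for the thread to finish before args goes out of scope.  */
  rc = pthread_join (thread_id, NULL);
  assert (rc == 0);
  (void) rc;
  return args.result;
}
```

Because count_in_thread does not return until the join completes, the thread never sees a dangling pointer, and reading args.result afterward is safe.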
1. Note that this is not portable, and it’s up to you to make sure that your value can be cast
safely to void* and back without losing bits.
while (1) {
int factor;
int is_prime = 1;
int main ()
{
pthread_t thread;
int which_prime = 5000;
int prime;
int main ()
{
pthread_attr_t attr;
pthread_t thread;
pthread_attr_init (&attr);
pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
pthread_create (&thread, &attr, &thread_function, NULL);
pthread_attr_destroy (&attr);
/* Do work here... */
Even if a thread is created in a joinable state, it may later be turned into a detached
thread. To do this, call pthread_detach. Once a thread is detached, it cannot be made
joinable again.
What constitutes a cancellation point, and where should these be placed? The most
direct way to create a cancellation point is to call pthread_testcancel. This does
nothing except process a pending cancellation in a synchronously cancelable thread.
You should call pthread_testcancel periodically during lengthy computations in a
thread function, at points where the thread can be canceled without leaking any
resources or producing other ill effects.
Certain other functions are implicitly cancellation points as well. These are listed on
the pthread_cancel man page. Note that other functions may use these functions
internally and thus will indirectly be cancellation points.
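As a rough illustration, the sketch below runs a thread whose loop offers a cancellation point on every pass, then cancels it from another thread. The names spin_until_canceled and cancel_spinning_thread are assumptions, and the counter merely stands in for real work.

```c
#include <assert.h>
#include <pthread.h>

static volatile unsigned long iterations;

/* Loop forever, offering a cancellation point on every pass.  */
static void* spin_until_canceled (void* unused)
{
  (void) unused;
  while (1) {
    ++iterations;             /* Stand-in for a slice of real work.  */
    pthread_testcancel ();    /* Nothing is held here, so canceling is safe.  */
  }
}

/* Returns 1 if the thread ended because it was canceled.  */
int cancel_spinning_thread (void)
{
  pthread_t thread;
  void* result;
  int rc = pthread_create (&thread, NULL, &spin_until_canceled, NULL);
  assert (rc == 0);
  pthread_cancel (thread);
  rc = pthread_join (thread, &result);
  assert (rc == 0);
  (void) rc;
  /* A canceled thread's return value is PTHREAD_CANCELED.  */
  return result == PTHREAD_CANCELED;
}
```

Without the pthread_testcancel call, this loop would contain no cancellation point, and the join would likely never return.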
float* account_balances;
return 0;
}
Note that it’s important to restore the old cancel state at the end of the critical section
rather than setting it unconditionally to PTHREAD_CANCEL_ENABLE. This enables you to
call the process_transaction function safely from within another critical section; in
that case, your function will leave the cancel state the same way it found it.
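The save-and-restore pattern can be sketched as follows. The function add_uncancelably and the variable shared_total are hypothetical stand-ins for process_transaction and its data; current_cancel_state is a small assumed helper for inspecting the state.

```c
#include <assert.h>
#include <pthread.h>

static int shared_total;

/* Hypothetical critical section that must not be cut short by a
   cancellation request.  */
void add_uncancelably (int amount)
{
  int old_cancel_state, ignored, rc;
  rc = pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &old_cancel_state);
  assert (rc == 0);
  shared_total += amount;
  /* Put back whatever state the caller had, rather than forcing
     PTHREAD_CANCEL_ENABLE.  */
  rc = pthread_setcancelstate (old_cancel_state, &ignored);
  assert (rc == 0);
  (void) rc;
}

/* Report the calling thread's cancel state without changing it.  */
int current_cancel_state (void)
{
  int state, ignored;
  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, &state);
  pthread_setcancelstate (state, &ignored);
  return state;
}
```

If the caller had already disabled cancellation, the state is still disabled when add_uncancelably returns, which is exactly what lets critical sections nest.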
You may create as many thread-specific data items as you want, each of type void*.
Each item is referenced by a key. To create a new key, and thus a new data item for
each thread, use pthread_key_create. The first argument is a pointer to a
pthread_key_t variable. That key value can be used by each thread to access its own
copy of the corresponding data item. The second argument to pthread_key_create is a
cleanup function. If you pass a function pointer here, GNU/Linux automatically calls
that function when each thread exits, passing the thread-specific value corresponding
to that key. This is particularly handy because the cleanup function is called even if the
thread is canceled at some arbitrary point in its execution. If the thread-specific value
is null, the thread cleanup function is not called. If you don’t need a cleanup function,
you may pass null instead of a function pointer.
After you’ve created a key, each thread can set its thread-specific value corresponding
to that key by calling pthread_setspecific. The first argument is the key, and the
second is the void* thread-specific value to store. To retrieve a thread-specific data
item, call pthread_getspecific, passing the key as its argument.
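The three calls fit together roughly as in the sketch below. All names here (slot_key, free_slot, store_and_check, run_tsd_demo) are assumptions for illustration; the mutex only protects the counter that records how often the cleanup function ran.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t slot_key;
static int cleanup_calls;
static pthread_mutex_t cleanup_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup function: called at thread exit with the thread's non-null value.  */
static void free_slot (void* value)
{
  free (value);
  pthread_mutex_lock (&cleanup_mutex);
  ++cleanup_calls;
  pthread_mutex_unlock (&cleanup_mutex);
}

/* Each thread stores its own heap-allocated int under the shared key.  */
static void* store_and_check (void* arg)
{
  int* mine = malloc (sizeof (int));
  int* seen;
  assert (mine != NULL);
  *mine = *(int*) arg;
  pthread_setspecific (slot_key, mine);
  /* Reading the key back yields this thread's value, not another's.  */
  seen = pthread_getspecific (slot_key);
  return (seen == mine && *seen == *(int*) arg) ? arg : NULL;
}

/* Returns 1 if every thread read back its own value and the cleanup
   function ran once per thread.  */
int run_tsd_demo (void)
{
  enum { THREADS = 4 };
  pthread_t threads[THREADS];
  int ids[THREADS];
  int i, matched = 0;
  cleanup_calls = 0;
  pthread_key_create (&slot_key, &free_slot);
  for (i = 0; i < THREADS; ++i) {
    ids[i] = i + 1;
    pthread_create (&threads[i], NULL, &store_and_check, &ids[i]);
  }
  for (i = 0; i < THREADS; ++i) {
    void* result;
    pthread_join (threads[i], &result);
    if (result != NULL)
      ++matched;
  }
  pthread_key_delete (slot_key);
  return matched == THREADS && cleanup_calls == THREADS;
}
```

Note that no thread frees its own value explicitly; the cleanup function registered with the key does that when the thread exits.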
Suppose, for instance, that your application divides a task among multiple threads.
For audit purposes, each thread is to have a separate log file, in which progress messages
for that thread’s tasks are recorded. The thread-specific data area is a convenient
place to store the file pointer for the log file for each individual thread.
Listing 4.7 shows how you might implement this. The main function in this sample
program creates a key to store the thread-specific file pointer and then stores it in
thread_log_key. Because this is a global variable, it is shared by all threads. When each
thread starts executing its thread function, it opens a log file and stores the file pointer
under that key. Later, any of these threads may call write_to_thread_log to write a
message to the thread-specific log file. That function retrieves the file pointer for the
thread’s log file from thread-specific data and writes the message.
Listing 4.7 (tsd.c) Per-Thread Log Files Implemented with Thread-Specific Data
#include <malloc.h>
#include <pthread.h>
#include <stdio.h>
/* The key used to associate a log file pointer with each thread. */
static pthread_key_t thread_log_key;
return NULL;
}
int main ()
{
int i;
pthread_t threads[5];
Observe that thread_function does not need to close the log file. That’s because when
the log file key was created, close_thread_log was specified as the cleanup function
for that key. Whenever a thread exits, GNU/Linux calls that function, passing the
thread-specific value for the thread log key. This function takes care of closing the
log file.
void do_some_work ()
{
/* Allocate a temporary buffer. */
Listing 4.9 (cxx-exit.cpp) Implementing Safe Thread Exit with C++ Exceptions
#include <pthread.h>
class ThreadExitException
{
public:
/* Create an exception-signaling thread exit with RETURN_VALUE. */
ThreadExitException (void* return_value)
: thread_return_value_ (return_value)
{
}
/* Actually exit the thread, using the return value provided in the
constructor. */
void* DoThreadExit ()
{
pthread_exit (thread_return_value_);
}
private:
/* The return value that will be used when exiting the thread. */
void* thread_return_value_;
};
void do_some_work ()
{
while (1) {
/* Do some useful things here... */
if (should_exit_thread_immediately ())
throw ThreadExitException (/* thread’s return value = */ NULL);
}
}
The ultimate cause of most bugs involving threads is that the threads are accessing
the same data. As mentioned previously, that’s one of the powerful aspects of threads,
but it can also be dangerous. If one thread is only partway through updating a data
structure when another thread accesses the same data structure, chaos is likely to
ensue. Often, buggy threaded programs contain code that works only if one
thread gets scheduled more often, or sooner, than another thread. These bugs are
called race conditions; the threads are racing one another to change the same data
structure.
Listing 4.10 (job-queue1.c) Thread Function to Process Jobs from the Queue
#include <malloc.h>
struct job {
/* Link field for linked list. */
struct job* next;
Now suppose that two threads happen to finish a job at about the same time, but only
one job remains in the queue. The first thread checks whether job_queue is null; finding
that it isn’t, the thread enters the loop and stores the pointer to the job object in
next_job. At this point, Linux happens to interrupt the first thread and schedules the
second. The second thread also checks job_queue and, finding it non-null, also assigns
the same job pointer to next_job. By unfortunate coincidence, we now have two
threads executing the same job.
To make matters worse, one thread will unlink the job object from the queue,
leaving job_queue containing null. When the other thread evaluates job_queue->next,
a segmentation fault will result.
This is an example of a race condition. Under “lucky” circumstances, this particular
schedule of the two threads may never occur, and the race condition may never
exhibit itself. Only under different circumstances, perhaps when running on a heavily
loaded system (or on an important customer’s new multiprocessor server!) may the
bug exhibit itself.
To eliminate race conditions, you need a way to make operations atomic. An atomic
operation is indivisible and uninterruptible; once the operation starts, it will not be
paused or interrupted until it completes, and no other operation will take place mean-
while. In this particular example, you want to check job_queue; if it’s not empty,
remove the first job, all as a single atomic operation.
4.4.2 Mutexes
The solution to the job queue race condition problem is to let only one thread access
the queue of jobs at a time. Once a thread starts looking at the queue, no other thread
should be able to access it until the first thread has decided whether to process a job
and, if so, has removed the job from the list.
Implementing this requires support from the operating system. GNU/Linux pro-
vides mutexes, short for MUTual EXclusion locks. A mutex is a special lock that only one
thread may lock at a time. If a thread locks a mutex and then a second thread also tries
to lock the same mutex, the second thread is blocked, or put on hold. Only when the
first thread unlocks the mutex is the second thread unblocked—allowed to resume
execution. GNU/Linux guarantees that race conditions do not occur among threads
attempting to lock a mutex; only one thread will ever get the lock, and all other
threads will be blocked.
Think of a mutex as the lock on a lavatory door. Whoever gets there first enters the
lavatory and locks the door. If someone else attempts to enter the lavatory while it’s
occupied, that person will find the door locked and will be forced to wait outside
until the occupant emerges.
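Applied to the job queue, the idea might look like the following sketch. It is not the book's listing: dequeue_job, drain_queue, and process_jobs are assumed names, the mutex is set up with the static initializer PTHREAD_MUTEX_INITIALIZER, and "processing" a job here just means counting and freeing it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct job {
  struct job* next;   /* Link field for linked list.  */
  int id;
};

static struct job* job_queue;
static pthread_mutex_t job_queue_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Remove and return the first job, or NULL if the queue is empty.  The
   emptiness check and the unlink happen as one step under the lock.  */
static struct job* dequeue_job (void)
{
  struct job* next_job;
  pthread_mutex_lock (&job_queue_mutex);
  next_job = job_queue;
  if (next_job != NULL)
    job_queue = next_job->next;
  pthread_mutex_unlock (&job_queue_mutex);
  return next_job;
}

/* Thread function: take jobs until none remain, counting each one.  */
static void* drain_queue (void* arg)
{
  int* processed = (int*) arg;
  struct job* next_job;
  while ((next_job = dequeue_job ()) != NULL) {
    ++*processed;
    free (next_job);
  }
  return NULL;
}

/* Queue JOB_COUNT jobs, let four threads drain them, and return the
   total number of jobs processed.  */
int process_jobs (int job_count)
{
  enum { THREADS = 4 };
  pthread_t threads[THREADS];
  int processed[THREADS] = { 0 };
  int i, total = 0;
  /* No other thread exists yet, so the queue can be filled unlocked.  */
  for (i = 0; i < job_count; ++i) {
    struct job* new_job = malloc (sizeof (struct job));
    assert (new_job != NULL);
    new_job->id = i;
    new_job->next = job_queue;
    job_queue = new_job;
  }
  for (i = 0; i < THREADS; ++i)
    pthread_create (&threads[i], NULL, &drain_queue, &processed[i]);
  for (i = 0; i < THREADS; ++i) {
    pthread_join (threads[i], NULL);
    total += processed[i];
  }
  return total;
}
```

If two threads could ever take the same job, the total would exceed job_count (or the double free would crash); with the lock in place, each job is handed out exactly once.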
To create a mutex, create a variable of type pthread_mutex_t and pass a pointer to
it to pthread_mutex_init. The second argument to pthread_mutex_init is a pointer
to a mutex attribute object, which specifies attributes of the mutex. As with
struct job {
/* Link field for linked list. */
struct job* next;
new_job->next = job_queue;
job_queue = new_job;
pthread_mutex_unlock (&job_queue_mutex);
}
pthread_mutexattr_init (&attr);
pthread_mutexattr_setkind_np (&attr, PTHREAD_MUTEX_ERRORCHECK_NP);
pthread_mutex_init (&mutex, &attr);
pthread_mutexattr_destroy (&attr);
As suggested by the “np” suffix, the recursive and error-checking mutex kinds are specific
to GNU/Linux and are not portable. Therefore, it is generally not advised to use
them in programs. (Error-checking mutexes can be useful when debugging, though.)
struct job {
/* Link field for linked list. */
struct job* next;
2. A nonzero value would indicate a semaphore that can be shared across processes, which is
not supported by GNU/Linux for this type of semaphore.
void initialize_job_queue ()
{
/* The queue is initially empty. */
job_queue = NULL;
/* Initialize the semaphore which counts jobs in the queue. Its
initial value should be zero. */
sem_init (&job_queue_count, 0, 0);
}
Before taking a job from the front of the queue, each thread will first wait on the
semaphore. If the semaphore’s value is zero, indicating that the queue is empty, the
thread will simply block until the semaphore’s value becomes positive, indicating that a
job has been added to the queue.
The enqueue_job function adds a job to the queue. Just like thread_function, it
needs to lock the queue mutex before modifying the queue. After adding a job to the
queue, it posts to the semaphore, indicating that a new job is available. In the version
shown in Listing 4.12, the threads that process the jobs never exit; if no jobs are avail-
able for a while, all the threads simply block in sem_wait.
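The interplay of the mutex and the semaphore can be sketched as below. This is a simplified stand-in for Listing 4.12, not a copy of it: the queue is a small array of integers rather than a linked list of jobs, and the names pending, consume_jobs, and run_semaphore_demo are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

enum { MAX_PENDING = 64 };

static sem_t job_queue_count;     /* Counts jobs waiting in the queue.  */
static pthread_mutex_t job_queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static int pending[MAX_PENDING];  /* Simplified queue of integer "jobs".  */
static int pending_length;
static long sum_processed;

static void enqueue_job (int value)
{
  pthread_mutex_lock (&job_queue_mutex);
  assert (pending_length < MAX_PENDING);
  pending[pending_length++] = value;
  pthread_mutex_unlock (&job_queue_mutex);
  /* Announce that one more job is available.  */
  sem_post (&job_queue_count);
}

/* Take exactly *ARG jobs, blocking in sem_wait while none is ready.  */
static void* consume_jobs (void* arg)
{
  int remaining = *(int*) arg;
  while (remaining-- > 0) {
    int value;
    /* Retry if a signal interrupts the wait.  */
    while (sem_wait (&job_queue_count) == -1 && errno == EINTR)
      continue;
    pthread_mutex_lock (&job_queue_mutex);
    value = pending[--pending_length];
    pthread_mutex_unlock (&job_queue_mutex);
    sum_processed += value;
  }
  return NULL;
}

/* Enqueue the jobs 1..JOB_COUNT and return the sum the consumer saw.  */
long run_semaphore_demo (int job_count)
{
  pthread_t thread;
  int i;
  assert (job_count >= 0 && job_count <= MAX_PENDING);
  pending_length = 0;
  sum_processed = 0;
  sem_init (&job_queue_count, 0, 0);
  pthread_create (&thread, NULL, &consume_jobs, &job_count);
  for (i = 1; i <= job_count; ++i)
    enqueue_job (i);
  pthread_join (thread, NULL);
  sem_destroy (&job_queue_count);
  return sum_processed;
}
```

The consumer never inspects the queue while it is empty: the semaphore's count guarantees that a job is present each time sem_wait returns.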
whenever the flag is not set, checking and rechecking the flag, each time locking and
unlocking the mutex.What you really want is a way to put the thread to sleep when
the flag is not set, until some circumstance changes that might cause the flag to
become set.
int thread_flag;
pthread_mutex_t thread_flag_mutex;
void initialize_flag ()
{
pthread_mutex_init (&thread_flag_mutex, NULL);
thread_flag = 0;
}
if (flag_is_set)
do_work ();
/* Else don’t do anything. Just loop again. */
}
return NULL;
}
A condition variable enables you to implement a condition under which a thread exe-
cutes and, inversely, the condition under which the thread is blocked. As long as every
thread that potentially changes the sense of the condition uses the condition variable
properly, Linux guarantees that threads blocked on the condition will be unblocked
when the condition changes.
As with a semaphore, a thread may wait on a condition variable. If thread A waits
on a condition variable, it is blocked until some other thread, thread B, signals the
same condition variable. Unlike a semaphore, a condition variable has no counter or
memory; thread A must wait on the condition variable before thread B signals it. If
thread B signals the condition variable before thread A waits on it, the signal is lost,
and thread A blocks until some other thread signals the condition variable again.
This is how you would use a condition variable to make the previous sample more
efficient:
• The loop in thread_function checks the flag. If the flag is not set, the thread
waits on the condition variable.
• The set_thread_flag function signals the condition variable after changing the
flag value. That way, if thread_function is blocked on the condition variable, it
will be unblocked and will check the condition again.
There’s one problem with this: there’s a race condition between checking the
flag value and signaling or waiting on the condition variable. Suppose that
thread_function checked the flag and found that it was not set. At that moment, the
Linux scheduler paused that thread and resumed the main one. By some coincidence,
the main thread is in set_thread_flag. It sets the flag and then signals the condition
variable. Because no thread is waiting on the condition variable at the time (remember
that thread_function was paused before it could wait on the condition variable), the
signal is lost. Now, when Linux reschedules the other thread, it starts waiting on the
condition variable and may end up blocked forever.
To solve this problem, we need a way to lock the flag and the condition variable
together with a single mutex. Fortunately, GNU/Linux provides exactly this mecha-
nism. Each condition variable must be used in conjunction with a mutex, to prevent
this sort of race condition. Using this scheme, the thread function follows these steps:
1. The loop in thread_function locks the mutex and reads the flag value.
2. If the flag is set, it unlocks the mutex and executes the work function.
3. If the flag is not set, it atomically unlocks the mutex and waits on the condition
variable.
The critical feature here is in step 3, in which GNU/Linux allows you to unlock the
mutex and wait on the condition variable atomically, without the possibility of
another thread intervening. This eliminates the possibility that another thread may
change the flag value and signal the condition variable in between thread_function’s
test of the flag value and wait on the condition variable.
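The steps above can be sketched as follows. The names wait_for_flag, run_flag_demo, and work_done are assumptions, and this sketch uses the static initializers PTHREAD_MUTEX_INITIALIZER and PTHREAD_COND_INITIALIZER rather than explicit init calls.

```c
#include <assert.h>
#include <pthread.h>

static int thread_flag;
static int work_done;
static pthread_mutex_t thread_flag_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t thread_flag_cv = PTHREAD_COND_INITIALIZER;

/* Block until the flag is set, then do the "work".  */
static void* wait_for_flag (void* unused)
{
  (void) unused;
  pthread_mutex_lock (&thread_flag_mutex);
  /* Test the flag in a loop: a wakeup alone does not prove it is set.  */
  while (!thread_flag)
    /* Atomically unlock the mutex and wait; the mutex is locked again
       by the time pthread_cond_wait returns.  */
    pthread_cond_wait (&thread_flag_cv, &thread_flag_mutex);
  pthread_mutex_unlock (&thread_flag_mutex);
  work_done = 1;
  return NULL;
}

/* Change the flag and signal the condition under the same mutex.  */
static void set_thread_flag (int flag_value)
{
  pthread_mutex_lock (&thread_flag_mutex);
  thread_flag = flag_value;
  pthread_cond_signal (&thread_flag_cv);
  pthread_mutex_unlock (&thread_flag_mutex);
}

/* Returns 1 once the waiting thread has seen the flag and finished.  */
int run_flag_demo (void)
{
  pthread_t thread;
  int rc;
  thread_flag = 0;
  work_done = 0;
  rc = pthread_create (&thread, NULL, &wait_for_flag, NULL);
  assert (rc == 0);
  (void) rc;
  set_thread_flag (1);
  pthread_join (thread, NULL);
  return work_done;
}
```

It does not matter whether set_thread_flag runs before or after the new thread reaches its wait. If the flag is already set, the thread never waits; if it is not, the thread is guaranteed to be waiting by the time the signal can be sent.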
int thread_flag;
pthread_cond_t thread_flag_cv;
pthread_mutex_t thread_flag_mutex;
void initialize_flag ()
{
/* Initialize the mutex and condition variable. */
pthread_mutex_init (&thread_flag_mutex, NULL);
pthread_cond_init (&thread_flag_cv, NULL);
/* Initialize the flag value. */
thread_flag = 0;
}
thread_flag = flag_value;
pthread_cond_signal (&thread_flag_cv);
/* Unlock the mutex. */
pthread_mutex_unlock (&thread_flag_mutex);
}
int main ()
{
pthread_t thread;
fprintf (stderr, "main thread pid is %d\n", (int) getpid ());
pthread_create (&thread, NULL, &thread_function, NULL);
/* Spin forever. */
while (1);
return 0;
}
Run the program in the background, and then invoke ps x to display your running
processes. Don’t forget to kill the thread-pid program afterward—it consumes lots of
CPU doing nothing. Here’s what the output might look like:
% cc thread-pid.c -o thread-pid -lpthread
% ./thread-pid &
[1] 14608
main thread pid is 14608
child thread pid is 14610
% ps x
PID TTY STAT TIME COMMAND
14042 pts/9 S 0:00 bash
14608 pts/9 R 0:01 ./thread-pid
Notice that there are three processes running the thread-pid program. The first of
these, with pid 14608, is the main thread in the program; the third, with pid 14610, is
the thread we created to execute thread_function.
How about the second thread, with pid 14609? This is the “manager thread,” which
is part of the internal implementation of GNU/Linux threads. The manager thread is
created the first time a program calls pthread_create to create a new thread.
The Linux clone system call is a generalized form of fork and pthread_create that
allows the caller to specify which resources are shared between the calling process and
the newly created process. Also, clone requires you to specify the memory region for
the execution stack that the new process will use. Although we mention clone here to
satisfy the reader’s curiosity, that system call should not ordinarily be used in programs.
Use fork to create new processes or pthread_create to create threads.
5
Interprocess Communication
the two with a pipe, represented by the “|” symbol. A pipe permits one-way communication
between two related processes. The ls process writes data into the pipe, and
the lpr process reads data from the pipe.
In this chapter, we discuss five types of interprocess communication:
• Shared memory permits processes to communicate by simply reading and
writing to a specified memory location.
• Mapped memory is similar to shared memory, except that it is associated with a
file in the filesystem.
• Pipes permit sequential communication from one process to a related process.
• FIFOs are similar to pipes, except that unrelated processes can communicate
because the pipe is given a name in the filesystem.
• Sockets support communication between unrelated processes even on different
computers.
These types of IPC differ by the following criteria:
• Whether they restrict communication to related processes (processes with a
common ancestor), to unrelated processes sharing the same filesystem, or to any
computer connected to a network
• Whether a communicating process is limited to only write data or only
read data
• The number of processes permitted to communicate
• Whether the communicating processes are synchronized by the IPC; for
example, a reading process halts until data is available to read
In this chapter, we omit discussion of IPC permitting communication only a limited
number of times, such as communicating via a child’s exit value.
Because the kernel does not synchronize accesses to shared memory, you must pro-
vide your own synchronization. For example, a process should not read from the
memory until after data is written there, and two processes must not write to the same
memory location at the same time. A common strategy to avoid these race conditions
is to use semaphores, which are discussed in the next section. Our illustrative pro-
grams, though, show just a single process accessing the memory, to focus on the shared
memory mechanism and to avoid cluttering the sample code with synchronization
logic.
5.1.3 Allocation
A process allocates a shared memory segment using shmget (“SHared Memory
GET”). Its first parameter is an integer key that specifies which segment to create.
Unrelated processes can access the same shared segment by specifying the same key
value. Unfortunately, other processes may have also chosen the same fixed key, which
could lead to conflict. Using the special constant IPC_PRIVATE as the key value guaran-
tees that a brand new memory segment is created.
Its second parameter specifies the number of bytes in the segment. Because seg-
ments are allocated using pages, the number of actually allocated bytes is rounded up
to an integral multiple of the page size.
The third parameter is the bitwise or of flag values that specify options to shmget.
The flag values include these:
• IPC_CREAT: This flag indicates that a new segment should be created. This permits
creating a new segment while specifying a key value.
• IPC_EXCL: This flag, which is always used with IPC_CREAT, causes shmget to fail
if a segment key is specified that already exists. Therefore, it arranges for the calling
process to have an “exclusive” segment. If this flag is not given and the key
of an existing segment is used, shmget returns the existing segment instead of
creating a new one.
• Mode flags: This value is made of 9 bits indicating permissions granted to
owner, group, and world to control access to the segment. Execution bits are
ignored. An easy way to specify permissions is to use the constants defined in
<sys/stat.h> and documented in the section 2 stat man page.¹ For example,
S_IRUSR and S_IWUSR specify read and write permissions for the owner of the
shared memory segment, and S_IROTH and S_IWOTH specify read and write permissions
for others.
For example, this invocation of shmget creates a new shared memory segment (or
obtains access to an existing one, if shm_key is already in use) that’s readable and
writable by the owner but not by other users.
int segment_id = shmget (shm_key, getpagesize (),
IPC_CREAT | S_IRUSR | S_IWUSR);
If the call succeeds, shmget returns a segment identifier. If the shared memory segment
already exists, the access permissions are verified and a check is made to ensure that
the segment is not marked for destruction.
1. These permission bits are the same as those used for files. They are described in Section
10.3, “File System Permissions.”
If the call succeeds, it returns the address of the attached shared segment. Children cre-
ated by calls to fork inherit attached shared segments; they can detach the shared
memory segments, if desired.
When you’re finished with a shared memory segment, the segment should be
detached using shmdt (“SHared Memory DeTach”). Pass it the address returned by
shmat. If the segment has been deallocated and this was the last process using it, it is
removed. Calls to exit and any of the exec family automatically detach segments.
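The life cycle of a segment (allocate, attach, use, detach, deallocate) might be exercised as in this sketch. The helper shm_round_trip is hypothetical; it uses IPC_PRIVATE so that it cannot collide with another program's key, and it deallocates the segment with shmctl and IPC_RMID.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy TEXT into a new shared segment, detach,
   attach again, and compare.  Returns 1 if the text survived, 0 if it
   did not, and -1 if a system call failed.  */
int shm_round_trip (const char* text)
{
  int segment_id;
  int matched;
  char* shared_memory;
  size_t segment_size = (size_t) getpagesize ();
  assert (strlen (text) < segment_size);

  /* Allocate a brand new, private segment, read-write for the owner.  */
  segment_id = shmget (IPC_PRIVATE, segment_size,
                       IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR);
  if (segment_id == -1)
    return -1;

  /* Attach, write, and detach.  */
  shared_memory = (char*) shmat (segment_id, NULL, 0);
  if (shared_memory == (char*) -1) {
    shmctl (segment_id, IPC_RMID, NULL);
    return -1;
  }
  strcpy (shared_memory, text);
  shmdt (shared_memory);

  /* Attach again, possibly at a different address.  The contents
     persist because the segment has not been deallocated.  */
  shared_memory = (char*) shmat (segment_id, NULL, 0);
  if (shared_memory == (char*) -1) {
    shmctl (segment_id, IPC_RMID, NULL);
    return -1;
  }
  matched = strcmp (shared_memory, text) == 0;
  shmdt (shared_memory);

  /* Deallocate the segment so that it does not outlive the program.  */
  shmctl (segment_id, IPC_RMID, NULL);
  return matched;
}
```

Forgetting the final shmctl call would leave the segment allocated after the program exits, even though every process has detached from it.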
int main ()
{
int segment_id;
char* shared_memory;
struct shmid_ds shmbuffer;
int segment_size;
const int shared_segment_size = 0x6400;
return 0;
}
5.1.7 Debugging
The ipcs command provides information on interprocess communication facilities,
including shared segments. Use the -m flag to obtain information about shared
memory. For example, this code illustrates that one shared memory segment,
numbered 1627649, is in use:
% ipcs -m
If this memory segment was erroneously left behind by a program, you can use the
ipcrm command to remove it.
% ipcrm shm 1627649
union semun {
int val;
struct semid_ds *buf;
unsigned short int *array;
struct seminfo *__buf;
};
If sem_op is zero, the operation blocks until the semaphore value becomes zero.
• sem_flg is a flag value. Specify IPC_NOWAIT to prevent the operation from
blocking; if the operation would have blocked, the call to semop fails instead.
If you specify SEM_UNDO, Linux automatically undoes the operation on the
semaphore when the process exits.
Listing 5.4 illustrates wait and post operations for a binary semaphore.
Listing 5.4 (sem_pv.c) Wait and Post Operations for a Binary Semaphore
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
/* Wait on a binary semaphore. Block until the semaphore value is positive, then
decrement it by 1. */
Specifying the SEM_UNDO flag permits dealing with the problem of terminating a
process while it has resources allocated through a semaphore.When a process termi-
nates, either voluntarily or involuntarily, the semaphore’s values are automatically
adjusted to “undo” the process’s effects on the semaphore. For example, if a process
that has decremented a semaphore is killed, the semaphore’s value is incremented.
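Since Listing 5.4 appears here only in part, the following sketch shows one way the wait and post operations might be written with semop and SEM_UNDO. The shared helper binary_semaphore_op and the driver semaphore_demo are assumptions, as is the use of IPC_PRIVATE to obtain a one-element semaphore set.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/stat.h>
#include <sys/types.h>

/* We must define union semun ourselves.  */
union semun {
  int val;
  struct semid_ds *buf;
  unsigned short int *array;
  struct seminfo *__buf;
};

/* Add DELTA to the first semaphore in the set, with automatic undo.  */
static int binary_semaphore_op (int semid, short delta)
{
  struct sembuf operations[1];
  operations[0].sem_num = 0;         /* Use the first (and only) semaphore.  */
  operations[0].sem_op = delta;      /* -1 waits, +1 posts.  */
  operations[0].sem_flg = SEM_UNDO;  /* Undo if the process terminates.  */
  return semop (semid, operations, 1);
}

/* Block until the semaphore value is positive, then decrement it.  */
int binary_semaphore_wait (int semid)
{
  return binary_semaphore_op (semid, -1);
}

/* Increment the semaphore value; this returns immediately.  */
int binary_semaphore_post (int semid)
{
  return binary_semaphore_op (semid, 1);
}

/* Hypothetical driver: returns 1 if the value goes 1 -> 0 -> 1.  */
int semaphore_demo (void)
{
  union semun argument;
  int after_wait, after_post, rc;
  int semid = semget (IPC_PRIVATE, 1, IPC_CREAT | S_IRUSR | S_IWUSR);
  if (semid == -1)
    return -1;
  argument.val = 1;
  rc = semctl (semid, 0, SETVAL, argument);
  assert (rc == 0);
  (void) rc;
  binary_semaphore_wait (semid);
  after_wait = semctl (semid, 0, GETVAL);
  binary_semaphore_post (semid);
  after_post = semctl (semid, 0, GETVAL);
  /* Deallocate the semaphore set.  */
  semctl (semid, 0, IPC_RMID);
  return after_wait == 0 && after_post == 1;
}
```

Had the process been killed between the wait and the post, SEM_UNDO would have caused the kernel to add the 1 back, leaving the semaphore available to other processes.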
return 0;
}
The mmap-write program opens the file, creating it if it did not previously exist. The
second argument to open (the flags) specifies that the file is opened for reading and
writing; the third argument gives the permissions to use if the file is created.
Because we do not know the file’s length, we use lseek to ensure that the file is large
enough to store an integer and then move back the file position to its beginning.
The program maps the file and then closes the file descriptor because it’s no longer
needed. The program then writes a random integer to the mapped memory, and thus
the file, and unmaps the memory. The munmap call is unnecessary because Linux would
automatically unmap the file when the program terminates.
return 0;
}
The mmap-read program reads the number out of the file and then writes the doubled
value to the file. First, it opens the file and maps it for reading and writing. Because
we can assume that the file is large enough to store an unsigned integer, we need not
use lseek, as in the previous program. The program reads and parses the value out
of memory using sscanf and then formats and writes the doubled value using sprintf.
Here’s an example of running these example programs. It maps the file
/tmp/integer-file.
% ./mmap-write /tmp/integer-file
% cat /tmp/integer-file
42
% ./mmap-read /tmp/integer-file
value: 42
% cat /tmp/integer-file
84
Observe that the text 42 was written to the disk file without ever calling write, and
was read back in again without calling read. Note that these sample programs write
and read the integer as a string (using sprintf and sscanf) for demonstration purposes
only; there’s no need for the contents of a memory-mapped file to be text. You can
store and retrieve arbitrary binary data in a memory-mapped file.
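A condensed sketch of the same write-then-read round trip follows. It is not the book's pair of programs: the helper names are assumptions, both directions share one mapping routine, and ftruncate (rather than lseek and write) is used to give the file its length.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define FILE_LENGTH 0x100

/* Map FILE_LENGTH bytes of the file at PATH, creating and sizing the
   file as needed.  Returns the mapping, or NULL on failure.  */
static char* map_file (const char* path)
{
  void* memory;
  int fd = open (path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
  if (fd == -1)
    return NULL;
  /* Assumption: ftruncate sets the length here, in place of lseek.  */
  if (ftruncate (fd, FILE_LENGTH) == -1) {
    close (fd);
    return NULL;
  }
  memory = mmap (NULL, FILE_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);   /* The mapping remains valid without the descriptor.  */
  return memory == MAP_FAILED ? NULL : (char*) memory;
}

/* Store VALUE in the file as text.  Returns 0 on success, -1 on failure.  */
int mmap_write_int (const char* path, int value)
{
  char* memory = map_file (path);
  if (memory == NULL)
    return -1;
  sprintf (memory, "%d\n", value);
  munmap (memory, FILE_LENGTH);
  return 0;
}

/* Parse the integer back out of a fresh mapping of the same file.  */
int mmap_read_int (const char* path, int* value)
{
  char* memory = map_file (path);
  int parsed;
  assert (value != NULL);
  if (memory == NULL)
    return -1;
  parsed = sscanf (memory, "%d", value);
  munmap (memory, FILE_LENGTH);
  return parsed == 1 ? 0 : -1;
}
```

Neither function calls read or write on the file; all data moves through ordinary memory accesses to the mapped region.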
• MS_INVALIDATE: All other file mappings are invalidated so that they can see the
updated values.
For example, to flush a shared file mapped at address mem_addr of length mem_length
bytes, call this:
msync (mem_addr, mem_length, MS_SYNC | MS_INVALIDATE);
5.4 Pipes
A pipe is a communication device that permits unidirectional communication. Data
written to the “write end” of the pipe is read back from the “read end.” Pipes are
serial devices; the data is always read from the pipe in the same order it was written.
Typically, a pipe is used to communicate between two threads in a single process or
between parent and child processes.
In a shell, the symbol | creates a pipe. For example, this shell command causes the
shell to produce two child processes, one for ls and one for less:
% ls | less
The shell also creates a pipe connecting the standard output of the ls subprocess with
the standard input of the less process. The filenames listed by ls are sent to less in
exactly the same order as if they were sent directly to the terminal.
A pipe’s data capacity is limited. If the writer process writes faster than the reader
process consumes the data, and if the pipe cannot store more data, the writer process
blocks until more capacity becomes available. If the reader tries to read but no data is
available, it blocks until data becomes available. Thus, the pipe automatically synchronizes
the two processes.
pipe (pipe_fds);
read_fd = pipe_fds[0];
write_fd = pipe_fds[1];
Data written to the file descriptor write_fd can be read back from read_fd.
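To make the direction of data flow concrete, here is a small single-process sketch: bytes go in at the write end, pipe_fds[1], and come out at the read end, pipe_fds[0]. The helper pipe_round_trip is an assumption, and it relies on the message being short enough to fit in the pipe's buffer.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send MESSAGE through a fresh pipe and read it
   back into BUFFER.  Returns the number of bytes read, or -1.  */
ssize_t pipe_round_trip (const char* message, char* buffer, size_t buffer_size)
{
  int pipe_fds[2];
  int read_fd, write_fd;
  ssize_t count;
  size_t length = strlen (message);
  assert (length < buffer_size);

  if (pipe (pipe_fds) == -1)
    return -1;
  read_fd = pipe_fds[0];
  write_fd = pipe_fds[1];

  /* A short message fits in the pipe's buffer, so this does not block.  */
  if (write (write_fd, message, length) != (ssize_t) length) {
    close (read_fd);
    close (write_fd);
    return -1;
  }
  /* Closing the write end lets a reader see end-of-file after the data.  */
  close (write_fd);

  count = read (read_fd, buffer, buffer_size - 1);
  close (read_fd);
  if (count >= 0)
    buffer[count] = '\0';
  return count;
}
```

In a real program the two ends would usually live in different processes or threads; writing and reading in one thread works here only because the message is small.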
int main ()
{
int fds[2];
pid_t pid;
/* Create a pipe. File descriptors for the two ends of the pipe are
placed in fds. */
pipe (fds);
/* Fork a child process. */
pid = fork ();
if (pid == (pid_t) 0) {
FILE* stream;
/* This is the child process. Close our copy of the write end of
the file descriptor. */
close (fds[1]);
/* Convert the read file descriptor to a FILE object, and read
from it. */
stream = fdopen (fds[0], "r");
reader (stream);
return 0;
}
At the beginning of main, fds is declared to be an integer array with size 2. The pipe
call creates a pipe and places the read and write file descriptors in that array. The
program then forks a child process. After closing the read end of the pipe, the parent
process starts writing strings to the pipe. After closing the write end of the pipe, the
child reads strings from the pipe.
Note that after writing in the writer function, the parent flushes the pipe by
calling fflush. Otherwise, the string may not be sent through the pipe immediately.
When you invoke the command ls | less, two forks occur: one for the ls child
process and one for the less child process. Both of these processes inherit the pipe file
descriptors so they can communicate using a pipe. To have unrelated processes
communicate, use a FIFO instead, as discussed in Section 5.4.5, “FIFOs.”
The symbolic constant STDIN_FILENO represents the file descriptor for the standard
input, which has the value 0. The call dup2 (fd, STDIN_FILENO) closes standard input
and then reopens it as a duplicate of fd so that the two may be used interchangeably.
Equated file descriptors share the same file position and the same set of file status
flags. Thus, characters read from fd are not reread from standard input.
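To make the last point concrete, here is a small sketch in which a child process reads one byte through the pipe descriptor, equates standard input with that descriptor using dup2, and then reads the next byte through standard input. The function name second_byte_via_stdin is hypothetical, and the child uses exit status 255 to signal failure, so the data should avoid that byte value.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send the two bytes of DATA through a pipe to a child process. The
   child reads the first byte directly from the pipe, makes standard
   input a duplicate of the pipe's read end, and reads the next byte
   from standard input. Returns the byte the child saw on standard
   input, or -1 on error. */
int second_byte_via_stdin (const char data[2])
{
  int fds[2];
  int status;
  pid_t pid;

  if (pipe (fds) == -1)
    return -1;
  pid = fork ();
  if (pid == (pid_t) -1) {
    close (fds[0]);
    close (fds[1]);
    return -1;
  }
  if (pid == (pid_t) 0) {
    unsigned char byte;
    close (fds[1]);
    /* Consume the first byte through the original descriptor. */
    if (read (fds[0], &byte, 1) != 1)
      _exit (255);
    /* Make standard input refer to the same pipe end. */
    if (dup2 (fds[0], STDIN_FILENO) == -1)
      _exit (255);
    /* The byte already read is not seen again; this read yields the
       second byte. */
    if (read (STDIN_FILENO, &byte, 1) != 1)
      _exit (255);
    _exit (byte);
  }
  close (fds[0]);
  if (write (fds[1], data, 2) != 2) {
    close (fds[1]);
    waitpid (pid, NULL, 0);
    return -1;
  }
  close (fds[1]);
  if (waitpid (pid, &status, 0) == -1 || !WIFEXITED (status)
      || WEXITSTATUS (status) == 255)
    return -1;
  return WEXITSTATUS (status);
}
```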
The program in Listing 5.8 uses dup2 to send the output from a pipe to the sort
command (see footnote 2). After creating a pipe, the program forks. The parent process
prints some strings to the pipe. The child process attaches the read file descriptor of
the pipe to its standard input using dup2. It then executes the sort program.
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main ()
{
  int fds[2];
  pid_t pid;

  /* Create a pipe. File descriptors for the two ends of the pipe are
     placed in fds. */
  pipe (fds);
  /* Fork a child process. */
  pid = fork ();
  if (pid == (pid_t) 0) {
    /* This is the child process. Close our copy of the write end of
       the file descriptor. */
    close (fds[1]);
    /* Connect the read end of the pipe to standard input. */
    dup2 (fds[0], STDIN_FILENO);
    /* Replace the child process with the "sort" program. The argument
       list must end with a null pointer. */
    execlp ("sort", "sort", (char*) NULL);
  }
  else {
    /* This is the parent process. */
    FILE* stream;
    /* Close our copy of the read end of the file descriptor. */
    close (fds[0]);
    /* Convert the write file descriptor to a FILE object, and write
       to it. */
    stream = fdopen (fds[1], "w");
    fprintf (stream, "This is a test.\n");
    fprintf (stream, "Hello, world.\n");
    fprintf (stream, "My dog has fleas.\n");
    fprintf (stream, "This program is great.\n");
    fprintf (stream, "One fish, two fish.\n");
    fflush (stream);
    /* Closing the write end lets sort see end-of-file. */
    close (fds[1]);
    /* Wait for the child process to finish. */
    waitpid (pid, NULL, 0);
  }
  return 0;
}
2. sort reads lines of text from standard input, sorts them into alphabetical order, and prints
them to standard output.
#include <stdio.h>

int main ()
{
  FILE* stream = popen ("sort", "w");
  /* popen returns a null pointer if it fails. */
  if (stream == NULL)
    return 1;
  fprintf (stream, "This is a test.\n");
  fprintf (stream, "Hello, world.\n");
  fprintf (stream, "My dog has fleas.\n");
  fprintf (stream, "This program is great.\n");
  fprintf (stream, "One fish, two fish.\n");
  return pclose (stream);
}
The call to popen creates a child process executing the sort command, replacing calls
to pipe, fork, dup2, and execlp. The second argument, "w", indicates that this process
wants to write to the child process. The return value from popen is one end of a pipe;
the other end is connected to the child process’s standard input. After the writing
finishes, pclose closes the child process’s stream, waits for the process to terminate,
and returns its status value.
The first argument to popen is executed as a shell command in a subprocess
running /bin/sh. The shell searches the PATH environment variable in the usual way to
find programs to execute. If the second argument is "r", the function returns the child
process’s standard output stream so that the parent can read the output. If the second
argument is "w", the function returns the child process’s standard input stream so that
the parent can send data. If an error occurs, popen returns a null pointer.
Call pclose to close a stream returned by popen. After closing the specified stream,
pclose waits for the child process to terminate.
5.4.5 FIFOs
A first-in, first-out (FIFO) file is a pipe that has a name in the filesystem. Any process
can open or close the FIFO; the processes on either end of the pipe need not be
related to each other. FIFOs are also called named pipes.
You can make a FIFO using the mkfifo command. Specify the path to the FIFO
on the command line. For example, create a FIFO in /tmp/fifo by invoking this:
% mkfifo /tmp/fifo
% ls -l /tmp/fifo
prw-rw-rw- 1 samuel users 0 Jan 16 14:04 /tmp/fifo
The first character of the output from ls is p, indicating that this file is actually a
FIFO (named pipe). In one window, read from the FIFO by invoking the following:
% cat < /tmp/fifo
In a second window, write to the FIFO by invoking this:
% cat > /tmp/fifo
Then type in some lines of text. Each time you press Enter, the line of text is sent
through the FIFO and appears in the first window. Close the FIFO by pressing
Ctrl+D in the second window. Remove the FIFO with this line:
% rm /tmp/fifo
Creating a FIFO
Create a FIFO programmatically using the mkfifo function. The first argument is the
path at which to create the FIFO; the second parameter specifies the pipe’s owner,
group, and world permissions, as discussed in Chapter 10, “Security,” Section 10.3,
“File System Permissions.” Because a pipe must have a reader and a writer, the
permissions must include both read and write permissions. If the pipe cannot be created
(for instance, if a file with that name already exists), mkfifo returns -1. Include
<sys/types.h> and <sys/stat.h> if you call mkfifo.
Accessing a FIFO
Access a FIFO just like an ordinary file. To communicate through a FIFO, one
program must open it for writing, and another program must open it for reading. Either
low-level I/O functions (open, write, read, close, and so on, as listed in Appendix B,
“Low-Level I/O”) or C library I/O functions (fopen, fprintf, fscanf, fclose, and so
on) may be used.
For example, to write a buffer of data to a FIFO using low-level I/O routines, you
could use this code:
int fd = open (fifo_path, O_WRONLY);
write (fd, data, data_length);
close (fd);
To read a string from the FIFO using C library I/O functions, you could use
this code:
FILE* fifo = fopen (fifo_path, "r");
fscanf (fifo, "%s", buffer);
fclose (fifo);
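The mkfifo call described under “Creating a FIFO” can be sketched as follows. The helper names create_fifo, is_fifo, and remove_fifo are hypothetical, and the permissions chosen here (read and write for the owner only) are just one reasonable assumption.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a FIFO at PATH that its owner can read and write. Returns 0
   on success, or -1 if mkfifo fails (for example, because PATH
   already exists). */
int create_fifo (const char* path)
{
  return mkfifo (path, S_IRUSR | S_IWUSR);
}

/* Return 1 if PATH names a FIFO, 0 if it names something else, and
   -1 if it cannot be examined. */
int is_fifo (const char* path)
{
  struct stat info;
  if (stat (path, &info) == -1)
    return -1;
  return S_ISFIFO (info.st_mode) ? 1 : 0;
}

/* Remove the FIFO's entry from the file system, as rm does. */
int remove_fifo (const char* path)
{
  return unlink (path);
}
```

Note that opening a FIFO for writing normally blocks until some process opens it for reading, and vice versa, so a program that does both needs two processes or nonblocking opens.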
A FIFO can have multiple readers or multiple writers. Bytes from each writer are
written atomically up to a maximum size of PIPE_BUF (4KB on Linux). Chunks from
simultaneous writers can be interleaved. Similar rules apply to simultaneous reads.
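Rather than hard-coding the atomic write limit, a program can ask the system for it. The following sketch queries the limit with fpathconf on a freshly created pipe; the function name atomic_pipe_write_limit is hypothetical.

```c
#include <unistd.h>

/* Return the largest write that the system guarantees to be atomic on
   a pipe or FIFO, or -1 if the limit cannot be determined. */
long atomic_pipe_write_limit (void)
{
  int fds[2];
  long limit;

  if (pipe (fds) == -1)
    return -1;
  /* _PC_PIPE_BUF asks for the PIPE_BUF value that applies to this
     descriptor. */
  limit = fpathconf (fds[0], _PC_PIPE_BUF);
  close (fds[0]);
  close (fds[1]);
  return limit;
}
```

Writers that keep each message at or below this size can share a FIFO without their messages being split apart.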
5.5 Sockets
A socket is a bidirectional communication device that can be used to communicate with
another process on the same machine or with a process running on other machines.
Sockets are the only interprocess communication we’ll discuss in this chapter that
permit communication between processes on different computers. Internet programs
such as Telnet, rlogin, FTP, talk, and the World Wide Web use sockets.
For example, you can obtain a WWW page from a Web server using the
Telnet program because they both use sockets for network communications (see footnote 4).
To open a connection to a WWW server at www.codesourcery.com, use
telnet www.codesourcery.com 80. The magic constant 80 specifies a connection to
the Web server program running on www.codesourcery.com instead of some other
process. Try typing GET / after the connection is established. This sends a message
through the socket to the Web server, which replies by sending the home page’s
HTML source and then closing the connection. For example:
% telnet www.codesourcery.com 80
Trying 206.168.99.1...
Connected to merlin.codesourcery.com (206.168.99.1).
Escape character is '^]'.
GET /
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
...
3. Note that only Windows NT can create a named pipe; Windows 9x programs can form
only client connections.
4. Usually, you’d use telnet to connect to a Telnet server for remote logins. But you can
also use telnet to connect to a server of a different kind and then type commands directly at it.
Calling connect
To create a connection between two sockets, the client calls connect, specifying the
address of a server socket to connect to. A client is the process initiating the
connection, and a server is the process waiting to accept connections. The call initiates
a connection from a local socket to the server socket specified by the second argument.
The third argument is the length, in bytes, of the address structure pointed to by the
second argument. Socket address formats differ according to the socket namespace.
Sending Information
Any technique to write to a file descriptor can be used to write to a socket. See
Appendix B for a discussion of Linux’s low-level I/O functions and some of the issues
surrounding their use. The send function, which is specific to socket file descriptors,
provides an alternative to write with a few additional choices; see the man page
for information.
5.5.3 Servers
A server’s life cycle consists of the creation of a connection-style socket, binding an
address to its socket, placing a call to listen that enables connections to the socket,
placing calls to accept incoming connections, and then closing the socket. Data isn’t
read and written directly via the server socket; instead, each time a program accepts a
new connection, Linux creates a separate socket to use in transferring data over that
connection. In this section, we introduce bind, listen, and accept.
An address must be bound to the server’s socket using bind if a client is to find it.
Its first argument is the socket file descriptor. The second argument is a pointer to a
socket address structure; the format of this depends on the socket’s address family. The
third argument is the length of the address structure, in bytes. When an address is
bound to a connection-style socket, the server must invoke listen to indicate that it is
a server. Its first argument is the socket file descriptor. The second argument specifies
how many pending connections are queued. If the queue is full, additional connections
will be rejected. This does not limit the total number of connections that a server
can handle; it limits just the number of clients attempting to connect that have not yet
been accepted.
A server accepts a connection request from a client by invoking accept. The first
argument is the socket file descriptor. The second argument points to a socket address
structure, which is filled with the client socket’s address. The third argument points to
a variable holding the length, in bytes, of the socket address structure. The server can
use the client address to determine whether it really wants to communicate with the
client. The call to accept creates a new socket for communicating with the client and
returns the corresponding file descriptor. The original server socket continues to accept
new client connections.
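The bind, listen, and accept sequence can be sketched with a local-namespace socket. This is an illustration under stated assumptions rather than a complete server: the function names are hypothetical, the client is simply a forked child of the same program, any existing file at PATH is removed first, and the message is assumed short enough to arrive in a single read.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill NAME with the local-namespace address PATH. */
static void make_local_address (struct sockaddr_un* name, const char* path)
{
  memset (name, 0, sizeof (*name));
  name->sun_family = AF_LOCAL;
  strncpy (name->sun_path, path, sizeof (name->sun_path) - 1);
}

/* Serve one connection on the local socket PATH. A forked client
   connects and sends MESSAGE; the server stores what it receives in
   OUT, NUL-terminated. Returns the byte count, or -1 on error. */
ssize_t serve_one_message (const char* path, const char* message,
                           char* out, size_t out_size)
{
  struct sockaddr_un name;
  struct sockaddr_un client_name;
  socklen_t client_name_len = sizeof (client_name);
  int socket_fd;
  int client_socket_fd;
  ssize_t count = -1;
  pid_t pid;

  if (out_size == 0)
    return -1;
  make_local_address (&name, path);
  socket_fd = socket (PF_LOCAL, SOCK_STREAM, 0);
  if (socket_fd == -1)
    return -1;
  unlink (path);  /* Remove a stale socket left by an earlier run. */
  if (bind (socket_fd, (struct sockaddr*) &name, sizeof (name)) == -1
      || listen (socket_fd, 5) == -1
      || (pid = fork ()) == (pid_t) -1) {
    close (socket_fd);
    unlink (path);
    return -1;
  }
  if (pid == (pid_t) 0) {
    /* Client: connect to the server's address and send the message. */
    int fd = socket (PF_LOCAL, SOCK_STREAM, 0);
    if (fd == -1
        || connect (fd, (struct sockaddr*) &name, sizeof (name)) == -1
        || write (fd, message, strlen (message)) == -1)
      _exit (1);
    close (fd);
    _exit (0);
  }
  /* Server: accept returns a new socket for this one connection. */
  client_socket_fd = accept (socket_fd, (struct sockaddr*) &client_name,
                             &client_name_len);
  if (client_socket_fd != -1) {
    count = read (client_socket_fd, out, out_size - 1);
    if (count >= 0)
      out[count] = '\0';
    close (client_socket_fd);
  }
  close (socket_fd);
  unlink (path);
  waitpid (pid, NULL, 0);
  return count;
}
```

Because listen is called before the fork, the client's connection request is queued even if it arrives before the server reaches accept.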
To read data from a socket without removing it from the input queue, use recv. It
takes the same arguments as read, plus an additional FLAGS argument. A flag of
MSG_PEEK causes data to be read but not removed from the input queue.
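A short sketch shows the effect of MSG_PEEK. To stay self-contained, it uses socketpair to obtain two connected sockets in one process, which is an assumption of this example rather than something recv requires; the function name peek_matches_read is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send MESSAGE over a connected socket pair, look at it with
   MSG_PEEK, and then read it for real. Returns 1 if both calls saw
   the same bytes, 0 if not, and -1 on error. */
int peek_matches_read (const char* message)
{
  int fds[2];
  char peeked[128];
  char consumed[128];
  size_t length = strlen (message);
  ssize_t peek_count;
  ssize_t read_count;

  if (length == 0 || length > sizeof (peeked))
    return -1;
  if (socketpair (PF_LOCAL, SOCK_STREAM, 0, fds) == -1)
    return -1;
  if (write (fds[0], message, length) != (ssize_t) length) {
    close (fds[0]);
    close (fds[1]);
    return -1;
  }
  /* MSG_PEEK copies the data but leaves it in the input queue. */
  peek_count = recv (fds[1], peeked, sizeof (peeked), MSG_PEEK);
  /* An ordinary read then returns the same bytes and removes them. */
  read_count = read (fds[1], consumed, sizeof (consumed));
  close (fds[0]);
  close (fds[1]);
  if (peek_count < 0 || read_count < 0)
    return -1;
  return peek_count == read_count
         && memcmp (peeked, consumed, peek_count) == 0;
}
```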
Call unlink to remove a local socket when you’re done with it.
/* Read text from the socket and print it out. Continue until the
socket closes. Return nonzero if the client sent a "quit"
message, zero otherwise. */
/* First, read the length of the text message from the socket. If
read returns zero, the client closed the connection. */
if (read (client_socket, &length, sizeof (length)) == 0)
return 0;
/* Allocate a buffer to hold the text. */
text = (char*) malloc (length);
/* Read the text itself, and print it. */
int socket_fd;
struct sockaddr_un name;
int client_sent_quit_message;
/* Accept a connection. */
client_socket_fd = accept (socket_fd, &client_name, &client_name_len);
/* Handle the connection. */
client_sent_quit_message = server (client_socket_fd);
/* Close our end of the connection. */
close (client_socket_fd);
}
while (!client_sent_quit_message);
return 0;
}
The client program, in Listing 5.11, connects to a local namespace socket and sends
a message. The path to the socket and the message are specified on the
command line.
Before the client sends the message text, it sends the length of that text by sending the
bytes of the integer variable length. Likewise, the server reads the length of the text by
reading from the socket into an integer variable. This allows the server to allocate an
appropriately sized buffer to hold the message text before reading it from the socket.
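The length-then-text framing described above can be sketched with two helper functions. The names send_text, receive_text, and read_fully are hypothetical; the length is sent as a raw int in host byte order, which is adequate only when both ends run on the same kind of machine, as with local sockets.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write TEXT to SOCKET_FD, preceded by its length (including the
   terminating NUL) as a raw int. Returns 0 on success, -1 on error. */
int send_text (int socket_fd, const char* text)
{
  int length = strlen (text) + 1;
  if (write (socket_fd, &length, sizeof (length)) != (ssize_t) sizeof (length))
    return -1;
  if (write (socket_fd, text, length) != (ssize_t) length)
    return -1;
  return 0;
}

/* Read exactly COUNT bytes, because a single read may return fewer.
   Returns 0 on success, -1 on error or premature end-of-file. */
static int read_fully (int fd, void* buffer, size_t count)
{
  char* next = buffer;
  while (count > 0) {
    ssize_t bytes_read = read (fd, next, count);
    if (bytes_read <= 0)
      return -1;
    next += bytes_read;
    count -= bytes_read;
  }
  return 0;
}

/* Read one length-prefixed message from SOCKET_FD. Returns a newly
   allocated string that the caller must free, or NULL if the
   connection closed or an error occurred. */
char* receive_text (int socket_fd)
{
  int length;
  char* text;

  if (read_fully (socket_fd, &length, sizeof (length)) == -1 || length <= 0)
    return NULL;
  text = malloc (length);
  if (text == NULL)
    return NULL;
  if (read_fully (socket_fd, text, length) == -1) {
    free (text);
    return NULL;
  }
  /* Guard against a sender that omitted the terminating NUL. */
  text[length - 1] = '\0';
  return text;
}
```

The two functions work on any connected stream descriptor, so they can be tried on a pipe as easily as on a socket.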
To try this example, start the server program in one window. Specify a path to a
socket, for example, /tmp/socket.
% ./socket-server /tmp/socket
In another window, run the client a few times, specifying the same socket path plus
messages to send to the server:
% ./socket-client /tmp/socket "Hello, world."
% ./socket-client /tmp/socket "This is a test."
The server program receives and prints these messages. To close the server, send the
message "quit" from a client:
% ./socket-client /tmp/socket "quit"
DNS Names
Because it is easier to remember names than numbers, the Domain Name Service (DNS) associates names
such as www.codesourcery.com with computers’ unique IP numbers. DNS is implemented by a
worldwide hierarchy of name servers, but you don’t need to understand DNS protocols to use Internet host
names in your programs.
Internet socket addresses contain two parts: a machine and a port number. This
information is stored in a struct sockaddr_in variable. Set the sin_family field to AF_INET
to indicate that this is an Internet namespace address. The sin_addr field stores the
Internet address of the desired machine as a 32-bit integer IP number. A port number
distinguishes a given machine’s different sockets. Because different machines store
multibyte values in different byte orders, use htons to convert the port number to
network byte order. See the man page for ip for more information.
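Filling in a struct sockaddr_in might look like the sketch below. The function name make_inet_address is hypothetical, and the example converts a dotted-notation address with inet_pton, a standard call not discussed in the text, so that it does not depend on a name lookup.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill NAME with the Internet address DOTTED_QUAD (for example,
   "10.0.0.1") and PORT. Returns 0 on success, or -1 if DOTTED_QUAD is
   not a valid address in dot notation. */
int make_inet_address (struct sockaddr_in* name, const char* dotted_quad,
                       unsigned short port)
{
  memset (name, 0, sizeof (*name));
  name->sin_family = AF_INET;
  /* The port must be stored in network byte order. */
  name->sin_port = htons (port);
  /* inet_pton stores the address in network byte order and returns 1
     on success. */
  if (inet_pton (AF_INET, dotted_quad, &name->sin_addr) != 1)
    return -1;
  return 0;
}
```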
To convert human-readable hostnames, either numbers in standard dot notation
(such as 10.0.0.1) or DNS names (such as www.codesourcery.com) into 32-bit IP
numbers, you can use gethostbyname. This returns a pointer to the struct hostent
structure; the h_addr field contains the host’s IP number. See the sample program in
Listing 5.12.
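A minimal sketch of the lookup follows. The function name lookup_ipv4 is hypothetical. The code reads h_addr_list[0], which is the entry that h_addr abbreviates, and it assumes the caller wants an IPv4 address.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Look up HOSTNAME, which may be a DNS name or a number in dot
   notation, and store its first IPv4 address in *ADDRESS (in network
   byte order). Returns 0 on success, -1 if the lookup fails. */
int lookup_ipv4 (const char* hostname, struct in_addr* address)
{
  struct hostent* host_info = gethostbyname (hostname);
  if (host_info == NULL || host_info->h_addrtype != AF_INET
      || host_info->h_addr_list[0] == NULL)
    return -1;
  /* h_addr_list[0] is the address that h_addr refers to. */
  memcpy (address, host_info->h_addr_list[0], sizeof (*address));
  return 0;
}
```

The result can be assigned directly to the sin_addr field of a struct sockaddr_in. Newer programs often use getaddrinfo instead, which also handles IPv6.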
Listing 5.12 illustrates the use of Internet-domain sockets. The program obtains the
home page from the Web server whose hostname is specified on the command line.
/* Print the contents of the home page for the server’s socket.
Return an indication of success. */
return 0;
}
This program takes the hostname of the Web server on the command line (not a
URL; that is, without the “http://”). It calls gethostbyname to translate the hostname
into a numerical IP address and then connects a stream (TCP) socket to port 80 on
that host. Web servers speak the Hypertext Transfer Protocol (HTTP), so the program
issues the HTTP GET command and the server responds by sending the text of the
home page.
For example, to retrieve the home page from the Web site www.codesourcery.com,
invoke this:
% ./socket-inet www.codesourcery.com
<html>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
...
The first three parameters of socketpair are the same as those of the socket call: They
specify the domain, connection style, and protocol. The last parameter is a two-integer
array, which is filled with the file descriptors of the two sockets, similar to pipe. When
you call socketpair, you must specify PF_LOCAL as the domain.
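As a brief sketch of what distinguishes a socket pair from a pipe, the function below sends one byte in each direction across the same pair of descriptors. The function name socketpair_is_bidirectional is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one byte in each direction across a socket pair. Returns 1 if
   both bytes arrive intact, 0 if not, and -1 if socketpair fails. */
int socketpair_is_bidirectional (void)
{
  int fds[2];
  char forward = 'a';
  char backward = 'b';
  char received[2] = { 0, 0 };
  int ok;

  if (socketpair (PF_LOCAL, SOCK_STREAM, 0, fds) == -1)
    return -1;
  /* Unlike the two ends of a pipe, either descriptor can be both
     written and read. */
  ok = write (fds[0], &forward, 1) == 1
       && read (fds[1], &received[0], 1) == 1
       && write (fds[1], &backward, 1) == 1
       && read (fds[0], &received[1], 1) == 1
       && received[0] == forward
       && received[1] == backward;
  close (fds[0]);
  close (fds[1]);
  return ok;
}
```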
II
Mastering Linux
6 Devices
7 The /proc File System
8 Linux System Calls
9 Inline Assembly Code
10 Security
11 A Sample GNU/Linux Application
6 Devices
Linux, like most operating systems, interacts with hardware devices via
modularized software components called device drivers. A device driver hides the
peculiarities of a hardware device’s communication protocols from the operating system
and allows the system to interact with the device through a standardized interface.
Under Linux, device drivers are part of the kernel and may be either linked
statically into the kernel or loaded on demand as kernel modules. Device drivers run as
part of the kernel and aren’t directly accessible to user processes. However, Linux
provides a mechanism by which processes can communicate with a device driver (and
through it with a hardware device) via file-like objects. These objects appear in the
file system, and programs can open them, read from them, and write to them
practically as if they were normal files. Using either Linux’s low-level I/O operations (see
Appendix B, “Low-Level I/O”) or the standard C library’s I/O operations, your
programs can communicate with hardware devices through these file-like objects.
Linux also provides several file-like objects that communicate directly with the
kernel rather than with device drivers. These aren’t linked to hardware devices; instead,
they provide various kinds of specialized behavior that can be of use to application and
system programs.
two different drivers, one a character device and one a block device. Minor device
numbers distinguish individual devices or components controlled by a single driver.
The meaning of a minor device number depends on the device driver.
For example, major device no. 3 corresponds to the primary IDE controller on the
system. An IDE controller can have two devices (disk, tape, or CD-ROM drives)
attached to it; the “master” device has minor device no. 0, and the “slave” device has
minor device no. 64. Individual partitions on the master device (if the device supports
partitions) are represented by minor device numbers 1, 2, 3, and so on. Individual
partitions on the slave device are represented by minor device numbers 65, 66, 67, and so on.
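The numbering scheme in this example reduces to simple arithmetic, which the sketch below encodes. The function name primary_ide_device and the enumeration constants are hypothetical; makedev, major, and minor are the macros that glibc provides in <sys/sysmacros.h> for packing and unpacking device numbers.

```c
#include <sys/sysmacros.h>
#include <sys/types.h>

enum { IDE_PRIMARY_MAJOR = 3, IDE_SLAVE_MINOR_BASE = 64 };

/* Return the device number for a drive on the primary IDE controller.
   IS_SLAVE selects the slave drive rather than the master; PARTITION
   is 0 for the whole drive or the partition number (1, 2, 3, ...). */
dev_t primary_ide_device (int is_slave, int partition)
{
  int minor_number = (is_slave ? IDE_SLAVE_MINOR_BASE : 0) + partition;
  return makedev (IDE_PRIMARY_MAJOR, minor_number);
}
```

For example, the second partition on the slave drive has major number 3 and minor number 64 + 2 = 66.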
Major device numbers are listed in the Linux kernel sources documentation.
On many GNU/Linux distributions, this documentation can be found in
/usr/src/linux/Documentation/devices.txt. The special entry /proc/devices lists
major device numbers corresponding to active device drivers currently loaded into the
kernel. (See Chapter 7, “The /proc File System,” for more information about /proc
file system entries.)
Remember that only superuser processes can create block and character devices, so
you must be logged in as root to invoke this command successfully.
The ls command displays device entries specially. If you invoke ls with the -l or
-o options, the first character on each line of output specifies the type of the entry.
Recall that - (a hyphen) designates a normal file, while d designates a directory.
Similarly, b designates a block device, and c designates a character device. For the latter
two, ls prints the major and minor device numbers where it would print the size of an
ordinary file. For example, we can display the character device that we just created:
% ls -l lp0
crw-r----- 1 root root 6, 0 Mar 7 17:03 lp0
In a program, you can determine whether a file system entry is a block or character
device and then retrieve its device numbers using stat. See Section B.2, “stat,” in
Appendix B, for instructions.
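A sketch of that check appears below. The function name device_type and its return convention are hypothetical; the S_ISCHR and S_ISBLK macros and the st_rdev field are the standard pieces that stat provides for this purpose.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Examine PATH. Returns 'c' for a character device, 'b' for a block
   device, 0 for anything else, and -1 if stat fails. For devices, the
   major and minor numbers are stored in *MAJOR_OUT and *MINOR_OUT. */
int device_type (const char* path, unsigned* major_out, unsigned* minor_out)
{
  struct stat info;

  if (stat (path, &info) == -1)
    return -1;
  if (!S_ISCHR (info.st_mode) && !S_ISBLK (info.st_mode))
    return 0;
  /* For a device entry, st_rdev holds the device's own numbers;
     st_dev describes the device that contains the entry itself. */
  *major_out = major (info.st_rdev);
  *minor_out = minor (info.st_rdev);
  return S_ISCHR (info.st_mode) ? 'c' : 'b';
}
```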
To remove the entry, use rm. This doesn’t remove the device or device driver; it
simply removes the device entry from the file system.
% rm ./lp0
Similarly, /dev has an entry for the parallel port character device that we used
previously:
% ls -l /dev/lp0
crw-rw---- 1 root daemon 6, 0 May 5 1998 /dev/lp0
In most cases, you should not use mknod to create your own device entries. Use the
entries in /dev instead. Non-superuser programs have no choice but to use preexisting
device entries because they cannot create their own. Typically, only system
administrators and developers working with specialized hardware devices will need to create
device entries. Most GNU/Linux distributions include facilities to help system
administrators create standard device entries with the correct names.
You must have permission to write to the device entry for this to succeed; on many
GNU/Linux systems, the permissions are set so that only root and the system’s printer
daemon (lpd) can write to the file. Also, what comes out of your printer depends on
how your printer interprets the contents of the data you send it. Some printers will
print plain text files that are sent to them (see footnote 2), while others will not. PostScript printers
will render and print PostScript files that you send to them.
In a program, sending data to a device is just as simple. For example, this code
fragment uses low-level I/O functions to send the contents of a buffer to /dev/lp0.
int fd = open ("/dev/lp0", O_WRONLY);
write (fd, buffer, buffer_length);
close (fd);
1. Windows users will recognize that this device is similar to the magic Windows file LPT1.
2. Your printer may require explicit carriage return characters, ASCII code 13, at the end of
each line, and may require a form feed character, ASCII code 12, at the end of each page.
You can access certain hardware components through more than one character device;
often, the different character devices provide different semantics. For example, when
you use the IDE tape device /dev/ht0, Linux automatically rewinds the tape in the
drive when you close the file descriptor. You can use the device /dev/nht0 to access
the same tape drive, except that Linux will not automatically rewind the tape when
you close the file descriptor. You sometimes might see programs using /dev/cua0 and
similar devices; these are older interfaces to serial ports such as /dev/ttyS0.
If you need to authenticate users in your program, you should learn about
GNU/Linux’s PAM facility. See Section 10.5, “Authenticating Users,” in
Chapter 10, “Security,” for more information.
• A program can play sounds through the system’s sound card by sending audio
data to /dev/audio. Note that the audio data must be in Sun audio format
(usually associated with the .au extension).
For example, many GNU/Linux distributions come with the classic sound file
/usr/share/sndconfig/sample.au. If your system includes this file, try playing it
by invoking the following:
% cat /usr/share/sndconfig/sample.au > /dev/audio
If you’re planning on using sound in your program, though, you should
investigate the various sound libraries and services available for GNU/Linux. The
Gnome windowing environment uses the Enlightenment Sound Daemon
(EsounD), at http://www.tux.org/~ricdude/EsounD.html. KDE uses aRts, at
http://space.twc.de/~stefan/kde/arts-mcop-doc/. If you use one of these
sound systems instead of writing directly to /dev/audio, your program will
cooperate better with other programs that use the computer’s sound card.
3. On most GNU/Linux systems, you can switch to the first virtual terminal by pressing
Ctrl+Alt+F1. Use Ctrl+Alt+F2 for the second virtual terminal, and so on.
6.5.1 /dev/null
The entry /dev/null, the null device, is very handy. It serves two purposes; you are
probably familiar at least with the first one:
• Linux discards any data written to /dev/null. A common trick is to specify
/dev/null as an output file in some context where the output is unwanted.
For example, to run a command and discard its standard output (without
printing it or writing it to a file), redirect standard output to /dev/null:
% verbose_command > /dev/null
6.5.2 /dev/zero
The device entry /dev/zero behaves as if it were an infinitely long file filled with 0
bytes. However much data you try to read from /dev/zero, Linux “generates” enough 0
bytes to satisfy the read.
To illustrate this, let’s run the hex dump program presented in Listing B.4 in
Section B.1.4, “Reading Data,” of Appendix B. This program prints the contents of a
file in hexadecimal form.
% ./hexdump /dev/zero
0x000000 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000010 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000020 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000030 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
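The same behavior can be checked from a program. The sketch below reads a requested number of bytes from /dev/zero and verifies that each one is 0; the function name zero_device_yields_zeros is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read COUNT bytes from /dev/zero. Returns 1 if every byte read is 0,
   0 if any byte is nonzero, and -1 on error. */
int zero_device_yields_zeros (size_t count)
{
  char buffer[256];
  int fd = open ("/dev/zero", O_RDONLY);

  if (fd == -1)
    return -1;
  while (count > 0) {
    size_t want = count < sizeof (buffer) ? count : sizeof (buffer);
    ssize_t got = read (fd, buffer, want);
    ssize_t i;
    if (got <= 0) {
      close (fd);
      return -1;
    }
    for (i = 0; i < got; ++i)
      if (buffer[i] != 0) {
        close (fd);
        return 0;
      }
    count -= got;
  }
  close (fd);
  return 1;
}
```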
6.5.3 /dev/full
The entry /dev/full behaves as if it were a file on a file system that has no more
room. A write to /dev/full fails and sets errno to ENOSPC, which ordinarily indicates
that the written-to device is full.
For example, you can try to write to /dev/full using the cp command:
% cp /etc/fstab /dev/full
cp: /dev/full: No space left on device
The /dev/full entry is primarily useful to test how your program behaves if it runs
out of disk space while writing to a file.
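For such a test, a program can write to /dev/full and inspect errno, as in this sketch. The function name errno_from_full_device is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to write one byte to /dev/full. Returns the errno value from
   the failed write, 0 if the write unexpectedly succeeds, and -1 if
   the device cannot be opened. */
int errno_from_full_device (void)
{
  int fd = open ("/dev/full", O_WRONLY);
  int result = 0;

  if (fd == -1)
    return -1;
  if (write (fd, "x", 1) == -1)
    result = errno;
  close (fd);
  return result;
}
```

On a system that provides /dev/full, the value returned should equal ENOSPC, the same error a program would see when a real disk fills up.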
4. We use od here instead of the hexdump program presented in Listing B.4, even though they
do pretty much the same thing, because hexdump terminates when it runs out of data, while od
waits for more data to become available. The -t x1 option tells od to print file contents in
hexadecimal.
% od -t x1 /dev/random
0000000 2c 9c 7a db 2e 79 3d 65 36 c2 e3 1b 52 75 1e 1a
0000020 d3 6d 1e a7 91 05 2d 4d c3 a6 de 54 29 f4 46 04
0000040 b3 b0 8d 94 21 57 f3 90 61 dd 26 ac 94 c3 b9 3a
0000060 05 a3 02 cb 22 0a bc c9 45 dd a6 59 40 22 53 d4
The number of lines of output that you see will vary—there may be quite a few—but
the output will eventually pause when Linux exhausts its store of randomness. Now
try moving your mouse or typing on the keyboard, and watch additional random
numbers appear. For even better randomness, let your cat walk on the keyboard.
A read from /dev/urandom, in contrast, will never block. If Linux runs out of
randomness, it uses a cryptographic algorithm to generate pseudorandom bytes from the
past sequence of random bytes. Although these bytes are random enough for many
purposes, they don’t pass as many tests of randomness as those obtained from
/dev/random.
For instance, if you invoke the following, the random bytes will fly by forever, until
you kill the program with Ctrl+C:
% od -t x1 /dev/urandom
0000000 62 71 d6 3e af dd de 62 c0 42 78 bd 29 9c 69 49
0000020 26 3b 95 bc b9 6c 15 16 38 fd 7e 34 f0 ba ce c3
0000040 95 31 e5 2c 8d 8a dd f4 c4 3b 9b 44 2f 20 d1 54
...
Using random numbers from /dev/random in a program is easy, too. Listing 6.1
presents a function that generates a random number using bytes read from
/dev/random. Remember that /dev/random blocks a read until there is enough
randomness available to satisfy it; you can use /dev/urandom instead if fast execution is
more important and you can live with the potential lower quality of random numbers.
char* next_random_byte;
int bytes_to_read;
unsigned random_value;
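As a hedged sketch of such a function (not a reproduction of Listing 6.1), the code below builds a random integer in a range from bytes read from a random device. It assumes /dev/urandom so that it never blocks, reuses the variable names from the fragment above, and the three-argument signature of random_number is an assumption made for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Store in *RESULT a random integer between MIN and MAX, inclusive,
   built from bytes read from /dev/urandom. Requires MIN <= MAX and a
   range narrower than the full range of int. Returns 0 on success,
   -1 on error. */
int random_number (int min, int max, int* result)
{
  unsigned random_value;
  char* next_random_byte = (char*) &random_value;
  int bytes_to_read = sizeof (random_value);
  int fd;

  if (min > max)
    return -1;
  fd = open ("/dev/urandom", O_RDONLY);
  if (fd == -1)
    return -1;
  /* A read may return fewer bytes than requested, so loop until the
     whole value is filled. */
  while (bytes_to_read > 0) {
    ssize_t bytes_read = read (fd, next_random_byte, bytes_to_read);
    if (bytes_read <= 0) {
      close (fd);
      return -1;
    }
    bytes_to_read -= bytes_read;
    next_random_byte += bytes_read;
  }
  close (fd);
  /* Reduce to the requested range. The modulus introduces a slight
     bias unless the range size evenly divides the number of possible
     unsigned values. */
  *result = min + (int) (random_value % ((unsigned) max - (unsigned) min + 1));
  return 0;
}
```

Changing the path to /dev/random gives the blocking behavior described above.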
To construct a virtual file system and mount it with a loopback device, follow
these steps:
1. Create an empty file to hold the virtual file system. The size of the file will be
the apparent size of the loopback device after it is mounted.
One convenient way to construct a file of a fixed size is with the dd command.
This command copies blocks (by default, 512 bytes each) from one file to
another. The /dev/zero file is a convenient source of bytes to copy from.
To construct a 10MB file named disk-image, invoke the following:
% dd if=/dev/zero of=/tmp/disk-image count=20480
20480+0 records in
20480+0 records out
% ls -l /tmp/disk-image
-rw-rw---- 1 root root 10485760 Mar 8 01:56 /tmp/disk-image
2. The file that you’ve just created is filled with 0 bytes. Before you mount it, you
must construct a file system. This sets up the various control structures needed to
organize and store files, and builds the root directory.
You can build any type of file system you like in your disk image. To construct
an ext2 file system (the type most commonly used for Linux disks), use the
mke2fs command. Because it’s usually run on a block device, not an ordinary
file, it asks for confirmation:
% mke2fs -q /tmp/disk-image
mke2fs 1.18, 11-Nov-1999 for EXT2 FS 0.5b, 95/08/09
disk-image is not a block special device.
Proceed anyway? (y,n) y
The -q option suppresses summary information about the newly created file
system. Leave this option out if you’re curious about it.
Now disk-image contains a brand-new file system, as if it were a freshly
initialized 10MB disk drive.
3. Mount the file system using a loopback device. To do this, use the mount
command, specifying the disk image file as the mount device. Also specify
loop=loopback-device as a mount option, using the -o option to mount to tell
mount which loopback device to use.
For example, to mount our disk-image file system, invoke these commands.
Remember, only the superuser may use a loopback device. The first command
creates a directory, /tmp/virtual-fs, to use as the mount point for the virtual
file system.
% mkdir /tmp/virtual-fs
% mount -o loop=/dev/loop0 /tmp/disk-image /tmp/virtual-fs
Now your disk image is mounted as if it were an ordinary 10MB disk drive.
% df -h /tmp/virtual-fs
Filesystem Size Used Avail Use% Mounted on
/tmp/disk-image 9.7M 13k 9.2M 0% /tmp/virtual-fs
5. If the file system is ever damaged, and some data is recovered but not associated with a
file, it is placed in lost+found.
You can delete disk-image if you like, or you can mount it later to access the
files on the virtual file system. You can also copy it to another computer and
mount it there; the whole file system that you created on it will be intact.
Instead of creating a file system from scratch, you can copy one directly from a device.
For instance, you can create an image of the contents of a CD-ROM simply by
copying it from the CD-ROM device.
If you have an IDE CD-ROM drive, use the corresponding device name, such as
/dev/hda, described previously. If you have a SCSI CD-ROM drive, the device name
will be /dev/scd0 or similar. Your system may also have a symbolic link /dev/cdrom
that points to the appropriate device. Consult your /etc/fstab file to determine what
device corresponds to your computer’s CD-ROM drive.
Simply copy that device to a file. The resulting file will be a complete disk image of
the file system on the CD-ROM in the drive. For example:
% cp /dev/cdrom /tmp/cdrom-image
This may take several minutes, depending on the CD-ROM you’re copying and the
speed of your drive. The resulting image file will be quite large, as large as the
contents of the CD-ROM.
Now you can mount this CD-ROM image without having the original CD-ROM
in the drive. For example, to mount it on /mnt/cdrom, use this line:
% mount -o loop=/dev/loop0 /tmp/cdrom-image /mnt/cdrom
Because the image is on a hard disk drive, it’ll perform much faster than the actual
CD-ROM disk. Note that most CD-ROMs use the file system type iso9660.
6.6 PTYs
If you run the mount command with no command-line arguments, which displays
the file systems mounted on your system, you’ll notice a line that looks something
like this:
none on /dev/pts type devpts (rw,gid=5,mode=620)
This indicates that a special type of file system, devpts, is mounted at /dev/pts. This
file system, which isn’t associated with any hardware device, is a “magic” file system
that is created by the Linux kernel. It’s similar to the /proc file system; see Chapter 7
for more information about how this works.
Like the /dev directory, /dev/pts contains entries corresponding to devices. But
unlike /dev, which is an ordinary directory, /dev/pts is a special directory that is
created dynamically by the Linux kernel. The contents of the directory vary with time
and reflect the state of the running system.
The entries in /dev/pts correspond to pseudo-terminals (or pseudo-TTYs, or PTYs).
Linux creates a PTY for every new terminal window you open and displays a
corresponding entry in /dev/pts. The PTY device acts like a terminal device: it accepts
input from the keyboard and displays text output from the programs that run in it.
PTYs are numbered, and the PTY number is the name of the corresponding entry in
/dev/pts.
You can display the terminal device associated with a process using the ps
command. Specify tty as one of the fields of a custom format with the -o option. To
display the process ID, TTY, and command line of each process sharing the same
terminal, invoke ps -o pid,tty,cmd.
Note that it is a character device, and its owner is the owner of the process for which
it was created.
You can read from or write to the PTY device. If you read from it, you’ll hijack
keyboard input that would otherwise be sent to the program running in the PTY. If
you write to it, the data will appear in that window.
Try opening a new terminal window, and determine its PTY number by invoking
ps -o pid,tty,cmd. From another window, write some text to the PTY device. For
example, if the new terminal window’s PTY number is 7, invoke this command from
another window:
% echo 'Hello, other window!' > /dev/pts/7
The output appears in the new terminal window. If you close the new terminal win-
dow, the entry 7 in /dev/pts disappears.
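The same experiment can be performed from a program, because a PTY entry can be opened and written like any other file. The following is a minimal sketch, not one of the book's listings; the function name write_to_terminal is our own, and the device path is whatever ps reports for the target window.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write MESSAGE to the terminal device at DEVICE_PATH (for example,
   "/dev/pts/7").  Returns 0 on success, or -1 if the device cannot
   be opened or the write fails.  */
int write_to_terminal (const char* device_path, const char* message)
{
  size_t length = strlen (message);
  ssize_t written;
  /* O_NOCTTY keeps the device from becoming our controlling terminal.  */
  int fd = open (device_path, O_WRONLY | O_NOCTTY);
  if (fd == -1)
    return -1;
  written = write (fd, message, length);
  close (fd);
  return (written == (ssize_t) length) ? 0 : -1;
}
```

Calling write_to_terminal ("/dev/pts/7", "Hello, other window!\n") should have the same effect as the echo command above, provided that you have write permission on the device.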
If you invoke ps to determine the TTY from a text-mode virtual terminal (press
Ctrl+Alt+F1 to switch to the first virtual terminal, for instance), you’ll see that it’s
running in an ordinary terminal device instead of a PTY:
% ps -o pid,tty,cmd
PID TT CMD
29325 tty1 -bash
29353 tty1 ps -o pid,tty,cmd
6.7 ioctl
The ioctl system call is an all-purpose interface for controlling hardware devices. The
first argument to ioctl is a file descriptor, which should be opened to the device that
you want to control. The second argument is a request code that indicates the
operation that you want to perform. Various request codes are available for different devices.
Depending on the request code, there may be additional arguments supplying data
to ioctl.
Many of the available request codes for various devices are listed in the ioctl_list
man page. Using ioctl generally requires a detailed understanding of the device driver
corresponding to the hardware device that you want to control. Most of these are
quite specialized and are beyond the scope of this book. However, we’ll present one
example to give you a taste of how ioctl is used.
return 0;
}
Listing 6.2 presents a short program that ejects the disk in a CD-ROM drive (if the
drive supports this). It takes a single command-line argument, the CD-ROM drive
device. It opens a file descriptor to the device and invokes ioctl with the request
code CDROMEJECT. This request, defined in the header <linux/cdrom.h>, instructs the
device to eject the disk.
For example, if your system has an IDE CD-ROM drive connected as the master
device on the secondary IDE controller, the corresponding device is /dev/hdc. To eject
the disk from the drive, invoke this line:
% ./cdrom-eject /dev/hdc
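Because only the tail of the listing survives here, the following sketch shows how the steps just described (open the device, issue the CDROMEJECT request, close the descriptor) might be written as a function. It is an illustration rather than the original listing: the name eject_cdrom and the error handling are our own, and the O_NONBLOCK flag is an addition that we assume is needed to open a drive with no disk in it.

```c
#include <fcntl.h>
#include <linux/cdrom.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the CD-ROM drive at DEVICE (for example, "/dev/hdc") to eject
   its disk.  Returns 0 on success and -1 on failure.  */
int eject_cdrom (const char* device)
{
  int rval;
  /* O_NONBLOCK lets the open succeed even if the drive is empty.  */
  int fd = open (device, O_RDONLY | O_NONBLOCK);
  if (fd == -1)
    return -1;
  /* CDROMEJECT takes no additional argument.  */
  rval = ioctl (fd, CDROMEJECT);
  close (fd);
  return (rval == -1) ? -1 : 0;
}
```

A program would typically pass its command-line argument straight to this function and report errno if it returns -1.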
7
The /proc File System
This is the special /proc file system. Notice that the first field, none, indicates that this
file system isn’t associated with a hardware device such as a disk drive. Instead, /proc
is a window into the running Linux kernel. Files in the /proc file system don’t corre-
spond to actual files on a physical device. Instead, they are magic objects that behave
like files but provide access to parameters, data structures, and statistics in the kernel.
The “contents” of these files are not always fixed blocks of data, as ordinary file con-
tents are. Instead, they are generated on the fly by the Linux kernel when you read
from the file. You can also change the configuration of the running kernel by writing
to certain files in the /proc file system.
Let’s look at an example:
% ls -l /proc/version
-r--r--r-- 1 root root 0 Jan 17 18:09 /proc/version
Note that the file size is zero; because the file’s contents are generated by the kernel,
the concept of file size is not applicable. Also, if you try this command yourself, you’ll
notice that the modification time on the file is the current time.
What’s in this file? The contents of /proc/version consist of a string describing the
Linux kernel version number. It contains the version information that would be
obtained by the uname system call, described in Chapter 8, “Linux System Calls,” in
Section 8.15, “uname,” plus additional information such as the version of the compiler
that was used to compile the kernel. You can read from /proc/version like you would
any other file. For instance, an easy way to display its contents is with the cat command.
% cat /proc/version
Linux version 2.2.14-5.0 (root@porky.devel.redhat.com) (gcc version egcs-2.91.
66 19990314/Linux (egcs-1.1.2 release)) #1 Tue Mar 7 21:07:39 EST 2000
The various entries in the /proc file system are described extensively in the proc man
page (Section 5). To view it, invoke this command:
% man 5 proc
In this chapter, we’ll describe some of the features of the /proc file system that are
most likely to be useful to application programmers, and we’ll give examples of using
them. Some of the features of /proc are handy for debugging, too.
If you’re interested in exactly how /proc works, take a look at the source code in
the Linux kernel sources, under /usr/src/linux/fs/proc/.
We’ll describe the interpretation of some of these fields in Section 7.3.1, “CPU
Information.”
A simple way to extract a value from this output is to read the file into a buffer and
parse it in memory using sscanf. Listing 7.1 shows an example of this. The program
includes the function get_cpu_clock_speed that reads from /proc/cpuinfo into
memory and extracts the first CPU’s clock speed.
float get_cpu_clock_speed ()
{
FILE* fp;
char buffer[1024];
size_t bytes_read;
char* match;
float clock_speed;
int main ()
{
printf ("CPU clock speed: %4.0f MHz\n", get_cpu_clock_speed ());
return 0;
}
Be aware, however, that the names, semantics, and output formats of entries in the
/proc file system might change in new Linux kernel revisions. If you use them in a
program, you should make sure that the program’s behavior degrades gracefully if the
/proc entry is missing or is formatted unexpectedly.
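As an illustration of graceful degradation, here is one way a function like get_cpu_clock_speed might be written so that it returns 0 instead of misbehaving when /proc/cpuinfo is missing or has no parsable "cpu MHz" line. This is a sketch under our own assumptions, not the book's listing: it scans the file line by line with fgets rather than reading it into a single buffer, and it assumes the field is labeled "cpu MHz", as on x86 systems.

```c
#include <stdio.h>
#include <string.h>

/* Return the first CPU's clock speed in MHz as reported by
   /proc/cpuinfo, or 0 if the entry is missing or unparsable.  */
float get_cpu_clock_speed (void)
{
  char line[256];
  float clock_speed = 0;
  FILE* fp = fopen ("/proc/cpuinfo", "r");
  if (fp == NULL)
    return 0;
  /* Scan line by line for the first "cpu MHz" field.  */
  while (fgets (line, sizeof (line), fp) != NULL)
    if (strncmp (line, "cpu MHz", 7) == 0)
      {
        if (sscanf (line, "cpu MHz : %f", &clock_speed) != 1)
          clock_speed = 0;
        break;
      }
  fclose (fp);
  return clock_speed;
}
```

A caller can treat a return value of 0 as "unknown" and carry on without the information.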
1. On some UNIX systems, the process IDs are padded with zeros. On GNU/Linux, they
are not.
2. The chroot call and command are outside the scope of this book. See the chroot man page
in Section 1 for information about the command (invoke man 1 chroot), or the chroot man
page in Section 2 (invoke man 2 chroot) for information about the call.
• stat contains lots of status and statistical information about the process. These
  are the same data as presented in the status entry, but in raw numerical format,
  all on a single line. The format is difficult to read but might be more suitable for
  parsing by programs.
  If you want to use the stat entry in your programs, see the proc man page,
  which describes its contents, by invoking man 5 proc.
• statm contains information about the memory used by the process. The statm
  entry is described in Section 7.2.6, “Process Memory Statistics.”
• status contains lots of status and statistical information about the process,
  formatted to be comprehensible by humans. Section 7.2.7, “Process Statistics,”
  contains a description of the status entry.
• The cpu entry appears only on SMP Linux kernels. It contains a breakdown of
  process time (user and system) by CPU.
Note that for security reasons, the permissions of some entries are set so that only the
user who owns the process (or the superuser) can access them.
7.2.1 /proc/self
One additional entry in the /proc file system makes it easy for a program to use /proc
to find information about its own process. The entry /proc/self is a symbolic link to
the /proc directory corresponding to the current process. The destination of the
/proc/self link depends on which process looks at it: Each process sees its own
process directory as the target of the link.
For example, the program in Listing 7.2 reads the target of the /proc/self link to
determine its process ID. (We’re doing it this way for illustrative purposes only; calling
the getpid function, described in Chapter 3, “Processes,” in Section 3.1.1, “Process
IDs,” is a much easier way to do the same thing.) This program uses the readlink sys-
tem call, described in Section 8.11, “readlink: Reading Symbolic Links,” to extract
the target of the symbolic link.
pid_t get_pid_from_proc_self ()
{
char target[32];
int pid;
/* Read the target of the symbolic link. */
readlink ("/proc/self", target, sizeof (target));
int main ()
{
printf ("/proc/self reports process id %d\n",
(int) get_pid_from_proc_self ());
printf ("getpid() reports process id %d\n", (int) getpid ());
return 0;
}
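The middle of the function is missing from the listing as reproduced here, so the following sketch shows one way the whole of get_pid_from_proc_self could be written. Treat it as an assumption about the approach rather than the original code. One detail deserves attention: readlink does not NUL-terminate the string it stores, so the sketch reserves a byte and adds the terminator itself before parsing.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the calling process's ID as reported by the /proc/self
   symbolic link, or -1 if the link cannot be read or parsed.  */
pid_t get_pid_from_proc_self (void)
{
  char target[32];
  int pid;
  /* readlink does not NUL-terminate, so leave room and do it here.  */
  ssize_t length = readlink ("/proc/self", target, sizeof (target) - 1);
  if (length == -1)
    return (pid_t) -1;
  target[length] = '\0';
  /* The target is the process's directory name: its process ID.  */
  if (sscanf (target, "%d", &pid) != 1)
    return (pid_t) -1;
  return (pid_t) pid;
}
```

When /proc is mounted, the value returned should match the one returned by getpid.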
In Section 2.1.1, we presented a program in Listing 2.1 that printed out its own
argument list. Using the cmdline entries in the /proc file system, we can implement a
program that prints the argument list of another process. Listing 7.3 is such a program; it
prints the argument list of the process with the specified process ID. Because there
may be several NULs in the contents of cmdline rather than a single one at the end,
we can’t determine the length of the string with strlen (which simply counts the
number of characters until it encounters a NUL). Instead, we determine the length of
cmdline from read, which returns the number of bytes that were read.
For example, suppose that process 372 is the system logger daemon, syslogd.
% ps 372
PID TTY STAT TIME COMMAND
372 ? S 0:00 syslogd -m 0
% ./print-arg-list 372
syslogd
-m
0
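The technique described above can be sketched as follows. This is an illustration, not Listing 7.3 itself: the function name print_process_arg_list, the 1,024-byte buffer, and the return value are our own choices. The important point is that the loop is bounded by the byte count returned by read, not by strlen.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Print each argument of process PID on its own line.  Returns the
   number of arguments printed, or -1 if cmdline cannot be read.  */
int print_process_arg_list (pid_t pid)
{
  char filename[32];
  char arg_list[1024];
  char* next_arg = arg_list;
  ssize_t length;
  int fd, count = 0;

  snprintf (filename, sizeof (filename), "/proc/%d/cmdline", (int) pid);
  fd = open (filename, O_RDONLY);
  if (fd == -1)
    return -1;
  /* Use the byte count from read; strlen would stop at the first NUL.  */
  length = read (fd, arg_list, sizeof (arg_list) - 1);
  close (fd);
  if (length == -1)
    return -1;
  /* Add a terminator in case the data was truncated mid-argument.  */
  arg_list[length] = '\0';
  while (next_arg < arg_list + length)
    {
      printf ("%s\n", next_arg);
      /* Skip past this argument and the NUL that ends it.  */
      next_arg += strlen (next_arg) + 1;
      ++count;
    }
  return count;
}
```

An argument list longer than the buffer is silently truncated in this sketch; a more careful program would read in a loop.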
Listing 7.5 (get-exe-path.c) Get the Path of the Currently Running Program
Executable
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main ()
{
char path[PATH_MAX];
get_executable_path (path, sizeof (path));
printf ("this program is in the directory %s\n", path);
return 0;
}
In this case, the shell (bash) is running in process 1261. Now open a second window,
and look at the contents of the fd subdirectory for that process.
% ls -l /proc/1261/fd
total 0
lrwx------ 1 samuel samuel 64 Jan 30 01:02 0 -> /dev/pts/4
lrwx------ 1 samuel samuel 64 Jan 30 01:02 1 -> /dev/pts/4
lrwx------ 1 samuel samuel 64 Jan 30 01:02 2 -> /dev/pts/4
(There may be other lines of output corresponding to other open file descriptors as
well.) Recall that we mentioned in Section 2.1.4, “Standard I/O,” that file descriptors
0, 1, and 2 are initialized to standard input, output, and error, respectively. Thus, by
writing to /proc/1261/fd/1, you can write to the device attached to stdout for the
shell process—in this case, the pseudo TTY in the first window. In the second win-
dow, try writing a message to that file:
% echo "Hello, world." >> /proc/1261/fd/1
Notice the entry for file descriptor 3, linked to the file /etc/fstab opened on this
descriptor.
File descriptors can be opened on sockets or pipes, too (see Chapter 5 for more
information about these). In such a case, the target of the symbolic link corresponding
to the file descriptor will state “socket” or “pipe” instead of pointing to an ordinary
file or device.
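A program can inspect its own descriptors in the same way through /proc/self/fd. The sketch below, with the hypothetical name get_fd_target, reads the link for one descriptor; for a pipe, the target is expected to begin with "pipe:" rather than name a file.

```c
#include <stdio.h>
#include <unistd.h>

/* Store in TARGET the destination of the /proc/self/fd entry for FD.
   Returns 0 on success, or -1 if the link cannot be read.  */
int get_fd_target (int fd, char* target, size_t len)
{
  char link_path[64];
  ssize_t length;
  if (len == 0)
    return -1;
  snprintf (link_path, sizeof (link_path), "/proc/self/fd/%d", fd);
  /* readlink does not NUL-terminate, so reserve a byte for that.  */
  length = readlink (link_path, target, len - 1);
  if (length == -1)
    return -1;
  target[length] = '\0';
  return 0;
}
```

This can be handy when debugging, for example to find out which file a leaked descriptor refers to.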
3. See the IA-32 Intel Architecture Software Developer’s Manual for documentation about MMX
instructions, and see Chapter 9, “Inline Assembly Code,” in this book for information on how to
use these and other special assembly instructions in GNU/Linux programs.
4. Note that under DOS and Windows, serial ports are numbered from 1, so COM1 corresponds
to serial port number 0 under Linux.
For example, this line from /proc/tty/driver/serial might describe serial port 1
(which would be COM2 under Windows):
1: uart:16550A port:2F8 irq:3 baud:9600 tx:11 rx:0
This indicates that the serial port is run by a 16550A-type UART, uses I/O port 0x2f8
and IRQ 3 for communication, and runs at 9,600 baud. The serial port has seen 11
transmit interrupts and 0 receive interrupts.
See Section 6.4, “Hardware Devices,” for information about serial devices.
This indicates that the system is running a 2.2.14 release of the Linux kernel, which
was compiled with EGCS release 1.1.2. (EGCS, the Experimental GNU Compiler
System, was a precursor to the current GCC project.)
The most important items in this output, the OS name and kernel version
and revision, are available in separate /proc entries as well. These are
/proc/sys/kernel/ostype, /proc/sys/kernel/osrelease, and /proc/sys/kernel/version,
respectively.
% cat /proc/sys/kernel/ostype
Linux
% cat /proc/sys/kernel/osrelease
2.2.14-5.0
% cat /proc/sys/kernel/version
#1 Tue Mar 7 21:07:39 EST 2000
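Reading one of these single-value entries from a program takes only a few lines. The following sketch reads /proc/sys/kernel/osrelease; the function name get_kernel_release is our own, and the function returns -1 rather than failing if the entry is absent.

```c
#include <stdio.h>
#include <string.h>

/* Read the kernel release string from /proc/sys/kernel/osrelease into
   BUFFER, without the trailing newline.  Returns 0 on success, or -1
   if the entry is missing or empty.  */
int get_kernel_release (char* buffer, size_t len)
{
  FILE* fp = fopen ("/proc/sys/kernel/osrelease", "r");
  char* result;
  if (fp == NULL)
    return -1;
  result = fgets (buffer, (int) len, fp);
  fclose (fp);
  if (result == NULL)
    return -1;
  /* Strip the newline that ends the entry.  */
  buffer[strcspn (buffer, "\n")] = '\0';
  return 0;
}
```

The string obtained this way should match the release field filled in by the uname system call.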
This shows 512MB physical memory, of which about 9MB is free, and 258MB of
swap space, of which 216MB is free. In the row corresponding to physical memory,
three other values are presented:
• The Shared column displays total shared memory currently allocated on the
  system (see Section 5.1, “Shared Memory”).
• The Buffers column displays the memory allocated by Linux for block device
  buffers. These buffers are used by device drivers to hold blocks of data being
  read from and written to disk.
• The Cached column displays the memory allocated by Linux to the page cache.
  This memory is used to cache accesses to mapped files.
You can use the free command to display the same memory information.
Table 7.1 Full Paths Corresponding to the Four Possible IDE Devices
Controller Device Subdirectory
Primary Master /proc/ide/ide0/hda/
See Section 6.4, “Hardware Devices,” for more information about IDE device names.
Each IDE device directory contains several entries providing access to identification
and configuration information for the device. A few of the most useful are listed here:
• model contains the device’s model identification string.
• media contains the device’s media type. Possible values are disk, cdrom, tape,
  floppy, and UNKNOWN.
• capacity contains the device’s capacity, in 512-byte blocks. Note that for
  CD-ROM devices, the value will be 2^31 - 1, not the capacity of the disk in the drive.
  Note that the value in capacity represents the capacity of the entire physical
  disk; the capacity of file systems contained in partitions of the disk will be
  smaller.
For example, these commands show how to determine the media type and device
identification for the master device on the secondary IDE controller. In this case, it
turns out to be a Toshiba CD-ROM drive.
% cat /proc/ide/ide1/hdc/media
cdrom
% cat /proc/ide/ide1/hdc/model
TOSHIBA CD-ROM XM-6702B
5. If properly configured, the Linux kernel can support additional IDE controllers. These are
numbered sequentially from ide2.
7.5.3 Mounts
The /proc/mounts file provides a summary of mounted file systems. Each line corre-
sponds to a single mount descriptor and lists the mounted device, the mount point, and
other information. Note that /proc/mounts contains the same information as the ordi-
nary file /etc/mtab, which is automatically updated by the mount command.
These are the elements of a mount descriptor:
• The first element on the line is the mounted device (see Chapter 6). For special
  file systems such as the /proc file system, this is none.
• The second element is the mount point, the place in the root file system at which
  the file system contents appear. For the root file system itself, the mount point is
  listed as /. For swap drives, the mount point is listed as swap.
• The third element is the file system type. Currently, most GNU/Linux systems
  use the ext2 file system for disk drives, but DOS or Windows drives may be
  mounted with other file system types, such as fat or vfat. Most CD-ROMs
  contain an iso9660 file system. See the man page for the mount command for a
  list of file system types.
• The fourth element lists mount flags. These are options that were specified when
  the mount was added. See the man page for the mount command for an
  explanation of flags for the various file system types.
In /proc/mounts, the last two elements are always 0 and have no meaning.
See the man page for fstab for details about the format of mount descriptors.6
GNU/Linux includes functions to help you parse mount descriptors; see the man
page for the getmntent function for information on using these.
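As a brief illustration of those functions, the sketch below walks /proc/mounts with setmntent, getmntent, and endmntent from <mntent.h> and prints the four elements described above. The function name print_mounts and the output format are our own choices.

```c
#include <mntent.h>
#include <stdio.h>

/* Print the device, mount point, type, and flags of each mounted file
   system.  Returns the number of mount descriptors read, or -1 on
   failure.  */
int print_mounts (void)
{
  struct mntent* entry;
  int count = 0;
  FILE* fp = setmntent ("/proc/mounts", "r");
  if (fp == NULL)
    return -1;
  /* getmntent returns NULL once every descriptor has been read.  */
  while ((entry = getmntent (fp)) != NULL)
    {
      printf ("%s on %s type %s (%s)\n", entry->mnt_fsname,
              entry->mnt_dir, entry->mnt_type, entry->mnt_opts);
      ++count;
    }
  endmntent (fp);
  return count;
}
```

Using getmntent spares you from splitting the fields yourself, including the escape sequences that stand in for spaces in mount points.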
7.5.4 Locks
Section 8.3, “fcntl: Locks and Other File Operations,” describes how to use the fcntl
system call to manipulate read and write locks on files. The /proc/locks entry
describes all the file locks currently outstanding in the system. Each row in the output
corresponds to one lock.
For locks created with fcntl, the first two entries on the line are POSIX ADVISORY.
The third is WRITE or READ, depending on the lock type. The next number is the
process ID of the process holding the lock. The following three numbers, separated by
colons, are the major and minor device numbers of the device on which the file
resides and the inode number, which locates the file in the file system. The remainder
of the line lists values internal to the kernel that are not of general utility.
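To make the field layout concrete, here is a sketch of a parser for one such line. It is illustrative only: the structure and function names are hypothetical, it handles just the POSIX ADVISORY lines described above, and it assumes that the device numbers are printed in hexadecimal.

```c
#include <stdio.h>
#include <string.h>

struct lock_info
{
  int is_write;            /* Nonzero for a WRITE lock, zero for READ.  */
  int pid;                 /* Process holding the lock.  */
  unsigned int dev_major;  /* Major device number.  */
  unsigned int dev_minor;  /* Minor device number.  */
  unsigned long inode;     /* Inode number of the locked file.  */
};

/* Parse one POSIX ADVISORY line from /proc/locks into INFO.  Returns 0
   on success, or -1 if LINE does not have the expected layout.  */
int parse_lock_line (const char* line, struct lock_info* info)
{
  char type[8];
  /* The device numbers are read as hexadecimal, hence %x.  */
  int fields = sscanf (line, "%*d: POSIX ADVISORY %7s %d %x:%x:%lu",
                       type, &info->pid, &info->dev_major,
                       &info->dev_minor, &info->inode);
  if (fields != 5)
    return -1;
  info->is_write = (strcmp (type, "WRITE") == 0);
  return 0;
}
```

Lines that do not match, such as locks of other kinds or lines in a format changed by a later kernel, are rejected with -1 rather than misparsed.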
Turning the contents of /proc/locks into useful information takes some detective
work. You can watch /proc/locks in action, for instance, by running the program in
Listing 8.2 to create a write lock on the file /tmp/test-file.
% touch /tmp/test-file
% ./lock-file /tmp/test-file
file /tmp/test-file
opening /tmp/test-file
locking
locked; hit enter to unlock...
6. The /etc/fstab file lists the static mount configuration of the GNU/Linux system.
There may be other lines of output, too, corresponding to locks held by other pro-
grams. In this case, 5467 is the process ID of the lock-file program. Use ps to figure
out what this process is running.
% ps 5467
PID TTY STAT TIME COMMAND
5467 pts/28 S 0:00 ./lock-file /tmp/test-file
The locked file, /tmp/test-file, resides on the device that has major and minor
device numbers 8 and 5, respectively. These numbers happen to correspond to
/dev/sda5.
% df /tmp
Filesystem 1k-blocks Used Available Use% Mounted on
/dev/sda5 8459764 5094292 2935736 63% /
% ls -l /dev/sda5
brw-rw---- 1 root disk 8, 5 May 5 1998 /dev/sda5
See Section 6.2, “Device Numbers,” for more information about device numbers.
The program in Listing 7.7 extracts the uptime and idle time from the system and dis-
plays them in friendly units.
Listing 7.7 (print-uptime.c) Print the System Uptime and Idle Time
#include <stdio.h>
int main ()
{
FILE* fp;
double uptime, idle_time;
/* Read the system uptime and accumulated idle time from /proc/uptime. */
fp = fopen ("/proc/uptime", "r");
fscanf (fp, "%lf %lf\n", &uptime, &idle_time);
fclose (fp);
/* Summarize it. */
print_time ("uptime ", (long) uptime);
print_time ("idle time", (long) idle_time);
return 0;
}
The uptime command and the sysinfo system call (see Section 8.14, “sysinfo:
Obtaining System Statistics”) also can obtain the system’s uptime. The uptime
command also displays the load averages found in /proc/loadavg.
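The print_time helper called from main in Listing 7.7 is not reproduced above. A plausible sketch of it follows; the output format and the intermediate split_duration function are our own assumptions, chosen to show the arithmetic that turns a count of seconds into "friendly units."

```c
#include <stdio.h>

struct duration
{
  long days;
  long hours;
  long minutes;
  long seconds;
};

/* Break TOTAL_SECONDS into days, hours, minutes, and seconds.  */
struct duration split_duration (long total_seconds)
{
  const long minute = 60;
  const long hour = minute * 60;
  const long day = hour * 24;
  struct duration d;
  d.days = total_seconds / day;
  d.hours = (total_seconds % day) / hour;
  d.minutes = (total_seconds % hour) / minute;
  d.seconds = total_seconds % minute;
  return d;
}

/* Print LABEL followed by TOTAL_SECONDS in a friendly format.  */
void print_time (const char* label, long total_seconds)
{
  struct duration d = split_duration (total_seconds);
  printf ("%s: %ld days, %ld:%02ld:%02ld\n", label, d.days, d.hours,
          d.minutes, d.seconds);
}
```

For instance, 93,784 seconds is 1 day, 2 hours, 3 minutes, and 4 seconds.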
8
Linux System Calls
So far, we’ve presented a variety of functions that your program can invoke
to perform system-related functions, such as parsing command-line options, manipu-
lating processes, and mapping memory. If you look under the hood, you’ll find that
these functions fall into two categories, based on how they are implemented.
• A library function is an ordinary function that resides in a library external to your
  program. Most of the library functions we’ve presented so far are in the standard
  C library, libc. For example, getopt_long and mkstemp are functions provided in
  the C library.
  A call to a library function is just like any other function call. The arguments are
  placed in processor registers or onto the stack, and execution is transferred to
  the start of the function’s code, which typically resides in a loaded shared library.
• A system call is implemented in the Linux kernel. When a program makes a
  system call, the arguments are packaged up and handed to the kernel, which
  takes over execution of the program until the call completes. A system call isn’t
  an ordinary function call, and a special procedure is required to transfer control
  to the kernel. However, the GNU C library (the implementation of the standard
  C library provided with GNU/Linux systems) wraps Linux system calls with
  functions so that you can call them easily. Low-level I/O functions such as open
  and read are examples of system calls on Linux.
  The set of Linux system calls forms the most basic interface between programs
  and the Linux kernel. Each call presents a basic operation or capability.
  Some system calls are very powerful and can exert great influence on the
  system. For instance, some system calls enable you to shut down the Linux
  system or to allocate system resources and prevent other users from accessing
  them. These calls have the restriction that only processes running with superuser
  privilege (programs run by the root account) can invoke them. These calls fail if
  invoked by a nonsuperuser process.
Note that a library function may invoke one or more other library functions or system
calls as part of its implementation.
Linux currently provides about 200 different system calls. A listing of system calls
for your version of the Linux kernel is in /usr/include/asm/unistd.h. Some of these
are for internal use by the system, and others are used only in implementing special-
ized library functions. In this chapter, we’ll present a selection of system calls that are
likely to be the most useful to application and system programmers.
Most of these system calls are declared in <unistd.h>.
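To see the distinction between a wrapper function and the underlying system call, you can invoke a system call by number. The sketch below uses the GNU C library's generic syscall function with the SYS_getpid constant from <sys/syscall.h>; neither is discussed in this chapter, and the function name getpid_direct is our own. It is meant only as a demonstration, because the ordinary getpid wrapper is what programs should normally use.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the process ID without going through the getpid
   wrapper, using the generic syscall function instead.  */
pid_t getpid_direct (void)
{
  return (pid_t) syscall (SYS_getpid);
}
```

Both routes end up in the same kernel code, so getpid_direct and getpid should return the same value.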
This produces a couple of screens of output. Each line corresponds to a single system
call. For each call, the system call’s name is listed, followed by its arguments (or
abbreviated arguments, if they are very long) and its return value. Where possible, strace
conveniently displays symbolic names instead of numerical values for arguments and
return values, and it displays the fields of structures passed by a pointer into the system
call. Note that strace does not show ordinary function calls.
In the output from strace hostname, the first line shows the execve system call
that invokes the hostname program:2
execve("/bin/hostname", ["hostname"], [/* 49 vars */]) = 0
1. hostname invoked without any flags simply prints out the computer’s hostname to
standard output.
2. In Linux, the exec family of functions is implemented via the execve system call.
The first argument is the name of the program to run; the second is its argument list,
consisting of only a single element; and the third is its environment list, which strace
omits for brevity. The next 30 or so lines are part of the mechanism that loads the
standard C library from a shared library file.
Toward the end are system calls that actually help do the program’s work. The
uname system call is used to obtain the system’s hostname from the kernel:
uname({sys="Linux", node="myhostname", ...}) = 0
Observe that strace helpfully labels the fields (sys and node) of the structure
argument. This structure is filled in by the system call: Linux sets the sys field to the
operating system name and the node field to the system’s hostname. The uname call is
discussed further in Section 8.15, “uname.”
Finally, the write system call produces output. Recall that file descriptor 1
corresponds to standard output. The third argument is the number of characters to write,
and the return value is the number of characters that were actually written.
write(1, "myhostname\n", 11) = 11
This may appear garbled when you run strace because the output from the hostname
program itself is mixed in with the output from strace.
If the program you’re tracing produces lots of output, it is sometimes more conve-
nient to redirect the output from strace into a file. Use the option -o filename to
do this.
Understanding all the output from strace requires detailed familiarity with the
design of the Linux kernel and execution environment. Much of this is of limited
interest to application programmers. However, some understanding is useful for debug-
ging tricky problems or understanding how other programs work.
The program shown in Listing 8.1 uses access to check for a file’s existence and to
determine read and write permissions. Specify the name of the file to check on the
command line.
For example, to check access permissions for a file named README on a CD-ROM,
invoke it like this:
% ./check-access /mnt/cdrom/README
/mnt/cdrom/README exists
/mnt/cdrom/README is readable
/mnt/cdrom/README is not writable (read-only filesystem)
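Listing 8.1 itself is not reproduced here. The sketch below shows how a check along these lines might be written with access, using F_OK, R_OK, and W_OK and examining errno to explain failures. The function name check_access and its return value are our own; the messages follow the sample output above.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Report whether PATH exists and whether the calling process may read
   and write it.  Returns 0 if the file exists, or -1 if it does not
   exist or cannot be checked.  */
int check_access (const char* path)
{
  /* F_OK tests only for existence.  */
  if (access (path, F_OK) != 0)
    {
      if (errno == ENOENT)
        printf ("%s does not exist\n", path);
      else if (errno == EACCES)
        printf ("%s is not accessible\n", path);
      return -1;
    }
  printf ("%s exists\n", path);
  printf ("%s is %s\n", path,
          access (path, R_OK) == 0 ? "readable" : "not readable");
  if (access (path, W_OK) == 0)
    printf ("%s is writable\n", path);
  else if (errno == EROFS)
    printf ("%s is not writable (read-only filesystem)\n", path);
  else
    printf ("%s is not writable\n", path);
  return 0;
}
```

Keep in mind that the answer can be out of date by the time you act on it, so a program must still check the result of the open or write that follows.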
printf ("unlocking\n");
/* Release the lock. */
lock.l_type = F_UNLCK;
fcntl (fd, F_SETLKW, &lock);
close (fd);
return 0;
}
Note that the second instance is blocked while attempting to lock the file. Go back to
the first window and press Enter:
unlocking
The program running in the second window immediately acquires the lock.
If you prefer fcntl not to block if the call cannot get the lock you requested,
use F_SETLK instead of F_SETLKW. If the lock cannot be acquired, fcntl returns –1
immediately.
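A nonblocking attempt might be written as in the following sketch. The function name try_write_lock is our own, and the sketch locks the entire file; on failure, the caller can examine errno (typically EACCES or EAGAIN when another process holds the lock).

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to place a write lock on the whole file open on FD without
   blocking.  Returns 0 if the lock was acquired, or -1 (with errno
   set) if it could not be acquired.  */
int try_write_lock (int fd)
{
  struct flock lock;
  /* A zeroed structure with l_whence of SEEK_SET, l_start of 0, and
     l_len of 0 describes the entire file.  */
  memset (&lock, 0, sizeof (lock));
  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  /* F_SETLK returns immediately instead of waiting for the lock.  */
  return fcntl (fd, F_SETLK, &lock) == -1 ? -1 : 0;
}
```

The file descriptor must be open for writing for a write lock to be granted.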
Linux provides another implementation of file locking with the flock call. The
fcntl version has a major advantage: It works with files on NFS3 file systems (as long
as the NFS server is reasonably recent and correctly configured). So, if you have access
to two machines that both mount the same file system via NFS, you can repeat the
previous example using two different machines. Run lock-file on one machine,
specifying a file on an NFS file system, and then run it again on another machine,
specifying the same file. NFS wakes up the second program when the lock is released
by the first program.
3. Network File System (NFS) is a common network file sharing technology, comparable to
Windows’ shares and network drives.
10 0430 Ch08 5/22/01 10:33 AM Page 173
Another system call, fdatasync, does the same thing. However, although fsync
guarantees that the file’s modification time will be updated, fdatasync does not; it guarantees
only that the file’s data will be written. This means that in principle, fdatasync can
execute faster than fsync because it needs to force only one disk write instead of two.
However, in current versions of Linux, these two system calls actually do the same
thing, both updating the file’s modification time.
The fsync system call enables you to force a buffer write explicitly. You can also
open a file for synchronous I/O, which causes all writes to be committed to disk
immediately. To do this, specify the O_SYNC flag when opening the file with the open call.
4. See the man page for your shell for more information about ulimit.
int main ()
{
struct rlimit rl;
return 0;
}
When the program is terminated by SIGXCPU, the shell helpfully prints out a message
interpreting the signal:
% ./limit_cpu
CPU time limit exceeded
The function in Listing 8.5 prints out the current process’s user and system time.
void print_cpu_time()
{
struct rusage usage;
getrusage (RUSAGE_SELF, &usage);
printf ("CPU time: %ld.%06ld sec user, %ld.%06ld sec system\n",
usage.ru_utime.tv_sec, usage.ru_utime.tv_usec,
usage.ru_stime.tv_sec, usage.ru_stime.tv_usec);
}
The strftime function additionally can produce from the struct tm pointer a
customized, formatted string displaying the date and time. The format is specified in a
manner similar to printf, as a string with embedded codes indicating which time
fields to include. For example, this format string
"%Y-%m-%d %H:%M:%S"
Pass strftime a character buffer to receive the string, the length of that buffer, the
format string, and a pointer to a struct tm variable. See the strftime man page for a
complete list of codes that can be used in the format string. Notice that neither
localtime nor strftime handles the fractional part of the current time, the part finer
than 1 second (the tv_usec field of struct timeval). If you want this in your
formatted time strings, you’ll have to include it yourself.
Include <time.h> if you call localtime or strftime.
The function in Listing 8.6 prints the current date and time of day, down to the
millisecond.
void print_time ()
{
struct timeval tv;
struct tm* ptm;
char time_string[40];
long milliseconds;
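Only the declarations of print_time appear above. As a sketch of the rest of the technique, the function below formats a struct timeval with localtime and strftime and then appends the milliseconds by hand. The name format_time_ms and the choice to write into a caller-supplied buffer are our own; a print_time function in the style of the listing could obtain the time with gettimeofday and print the result.

```c
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Format the time in TV into BUFFER as "YYYY-MM-DD HH:MM:SS.mmm".
   Returns 0 on success, or -1 if the result does not fit.  */
int format_time_ms (const struct timeval* tv, char* buffer, size_t len)
{
  struct tm* ptm;
  char time_string[40];
  long milliseconds;
  int needed;
  time_t seconds = tv->tv_sec;

  /* Format the date and time, down to a single second.  */
  ptm = localtime (&seconds);
  if (ptm == NULL
      || strftime (time_string, sizeof (time_string),
                   "%Y-%m-%d %H:%M:%S", ptm) == 0)
    return -1;
  /* strftime has no code for fractions; append milliseconds by hand.  */
  milliseconds = tv->tv_usec / 1000;
  needed = snprintf (buffer, len, "%s.%03ld", time_string, milliseconds);
  return (needed < 0 || (size_t) needed >= len) ? -1 : 0;
}
```

The %03ld conversion pads the milliseconds with leading zeros, so 7 milliseconds is shown as .007 rather than .7.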
A time-critical program might lock physical memory because the time delay of
paging memory out and back may be too long or too unpredictable. High-security
applications may also want to prevent sensitive data from being written out to a swap
file, where they might be recovered by an intruder after the program terminates.
Locking a region of memory is as simple as calling mlock with a pointer to the start
of the region and the region’s length. Linux divides memory into pages and can lock
only entire pages at a time; each page that contains part of the memory region
specified to mlock is locked. The getpagesize function returns the system’s page size, which
is 4KB on x86 Linux.
For example, to allocate 32MB of address space and lock it into RAM, you would
use this code:
const int alloc_size = 32 * 1024 * 1024;
char* memory = malloc (alloc_size);
mlock (memory, alloc_size);
Note that simply allocating a page of memory and locking it with mlock doesn’t
reserve physical memory for the calling process because the pages may be copy-on-
write.5 Therefore, you should write a dummy value to each page as well:
size_t i;
size_t page_size = getpagesize ();
for (i = 0; i < alloc_size; i += page_size)
memory[i] = 0;
The write to each page forces Linux to assign a unique, unshared memory page to the
process for that page.
To unlock a region, call munlock, which takes the same arguments as mlock.
If you want your program’s entire address space locked into physical memory, call
mlockall. This system call takes a single flag argument: MCL_CURRENT locks all currently
allocated memory, but future allocations are not locked; MCL_FUTURE locks all pages that
are allocated after the call. Use MCL_CURRENT|MCL_FUTURE to lock into memory both
current and subsequent allocations.
Locking large amounts of memory, especially using mlockall, can be dangerous to
the entire Linux system. Indiscriminate memory locking is a good method of bringing
your system to a grinding halt because other running processes are forced to compete
for smaller physical memory resources and swap rapidly into and back out of memory
(this is known as thrashing). If you lock too much memory, the system will run out of
memory entirely and Linux will start killing off processes.
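For this reason, only processes with superuser privilege may lock memory with
mlock or mlockall. If a nonsuperuser process calls one of these functions, it will fail,
return -1, and set errno to EPERM. (Linux kernels since 2.6.9 relax this rule: an unprivileged
process may lock memory up to its RLIMIT_MEMLOCK resource limit.)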
The munlockall call unlocks all memory locked by the current process, including
memory locked with mlock and mlockall.
5. Copy-on-write means that Linux makes a private copy of a page of memory for a process
only when that process writes a value somewhere into it.
A convenient way to monitor the memory usage of your program is to use the top
command. In the output from top, the SIZE column displays the virtual address space
size of each program (the total size of your program’s code, data, and stack, some of
which may be paged out to swap space). The RSS column (for resident set size) shows
the size of physical memory that each program currently resides in. The sum of all the
RSS values for all running programs cannot exceed your computer’s physical memory
size, and the sum of all address space sizes is limited to 2GB (for 32-bit versions of
Linux).
Include <sys/mman.h> if you use any of the mlock system calls.
Alternately, you can use the mmap system call to bypass malloc and allocate page-aligned memory
directly from the Linux kernel. See Section 5.3, “Mapped Memory,” for details.
For example, suppose that your program allocates a page of memory by mapping
/dev/zero, as described in Section 5.3.5, “Other Uses for mmap.” The memory is initially
both readable and writable.
int fd = open ("/dev/zero", O_RDONLY);
char* memory = mmap (NULL, page_size, PROT_READ | PROT_WRITE,
MAP_PRIVATE, fd, 0);
close (fd);
Later, your program could make the memory read-only by calling mprotect:
mprotect (memory, page_size, PROT_READ);
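The mapping and the mprotect call above can be combined into one small function that also checks for errors. This is only a sketch: the function name demo_read_only_page is hypothetical, and it assumes that /dev/zero can be opened on the system.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one zero-filled page from /dev/zero, write to it, then make it
   read-only.  Returns 0 on success, -1 on failure.
   (Hypothetical helper.)  */
int demo_read_only_page (void)
{
  size_t page_size = (size_t) getpagesize ();
  char* memory;
  int rval;
  int fd = open ("/dev/zero", O_RDONLY);
  if (fd == -1)
    return -1;
  memory = mmap (NULL, page_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE, fd, 0);
  close (fd);
  if (memory == MAP_FAILED)
    return -1;
  /* Allowed: the page is still writable.  */
  memory[0] = 'x';
  rval = mprotect (memory, page_size, PROT_READ);
  /* Reading is still permitted; a write at this point would raise
     SIGSEGV.  */
  if (rval == 0 && memory[0] != 'x')
    rval = -1;
  munmap (memory, page_size);
  return rval;
}
```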
int main ()
{
int fd;
struct sigaction sa;
while (1)
{
/* Sleep for the time specified in tv. If interrupted by a
signal, place the remaining time left to sleep back into tv. */
int rval = nanosleep (&tv, &tv);
if (rval == 0)
/* Completed the entire sleep time; all done. */
return 0;
else if (errno == EINTR)
/* Interrupted by a signal. Try again. */
continue;
else
/* Some other error; bail out. */
return rval;
}
return 0;
}
  if (len == -1) {
    /* The call failed. */
    if (errno == EINVAL)
      /* It's not a symbolic link; report that. */
      fprintf (stderr, "%s is not a symbolic link\n", link_path);
    else
      /* Some other problem occurred; print the generic message. */
      perror ("readlink");
    return 1;
  }
  else {
    /* NUL-terminate the target path. */
    target_path[len] = '\0';
    /* Print it. */
    printf ("%s\n", target_path);
    return 0;
  }
}
For example, here’s how you could make a symbolic link and use print-symlink to
read it back:
% ln -s /usr/bin/wc my_link
% ./print-symlink my_link
/usr/bin/wc
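The readlink handling shown above, including the NUL termination that readlink itself does not perform, can be wrapped in a reusable function. The following is a sketch under stated assumptions, not the book's listing; the name read_link_target is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the target of LINK_PATH into BUF (capacity SIZE) and
   NUL-terminate it.  Returns the length stored, or -1 on error.
   A target longer than SIZE - 1 bytes is silently truncated, as with
   readlink itself.  (Hypothetical helper.)  */
ssize_t read_link_target (const char* link_path, char* buf, size_t size)
{
  ssize_t len;
  if (size == 0)
    return -1;
  /* readlink does not NUL-terminate, so leave room for the
     terminator.  */
  len = readlink (link_path, buf, size - 1);
  if (len == -1)
    return -1;
  buf[len] = '\0';
  return len;
}
```

On failure, errno is left as readlink set it, so a caller can still test for EINVAL to detect a path that is not a symbolic link.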
Using sendfile, the intermediate buffer can be eliminated. Call sendfile, passing
the file descriptor to write to; the descriptor to read from; a pointer to an offset variable;
and the number of bytes to transfer. The offset variable contains the offset in the
input file from which the read should start (0 indicates the beginning of the file) and
is updated to the position in the file after the transfer. The return value is the number
of bytes transferred. Include <sys/sendfile.h> in your program if it uses sendfile.
The program in Listing 8.10 is a simple but extremely efficient implementation of
a file copy. When invoked with two filenames on the command line, it copies the contents
of the first file into a file named by the second. It uses fstat to determine the
size, in bytes, of the source file.
return 0;
}
The sendfile call can be used in many places to make copies more efficient. One
good example is a Web server or other network daemon that serves the contents of
a file over the network to a client program. Typically, a request is received from a
socket connected to the client computer. The server program opens a local disk file to
retrieve the data to serve and writes the file’s contents to the network socket. Using
sendfile can speed up this operation considerably. Other steps need to be taken to
make the network transfer as efficient as possible, such as setting the socket parameters
correctly. However, these are outside the scope of this book.
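The file copy described above (fstat to find the size, then sendfile with an offset variable) might be sketched as follows. This is not a reconstruction of Listing 8.10; the function name copy_file is hypothetical, and the loop is an added precaution because sendfile may transfer fewer bytes than requested. It assumes a kernel on which sendfile accepts a regular file as the destination.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the file SRC to DST with sendfile.  Returns 0 on success,
   -1 on error.  (Hypothetical helper.)  */
int copy_file (const char* src, const char* dst)
{
  struct stat info;
  off_t offset = 0;
  int rval = -1;
  int read_fd, write_fd;

  read_fd = open (src, O_RDONLY);
  if (read_fd == -1)
    return -1;
  /* Use fstat to learn the size and permissions of the source.  */
  if (fstat (read_fd, &info) == 0) {
    write_fd = open (dst, O_WRONLY | O_CREAT | O_TRUNC, info.st_mode);
    if (write_fd != -1) {
      rval = 0;
      /* sendfile updates OFFSET; loop until everything is sent.  */
      while (offset < info.st_size) {
        ssize_t sent = sendfile (write_fd, read_fd, &offset,
                                 (size_t) (info.st_size - offset));
        if (sent <= 0) {
          rval = -1;
          break;
        }
      }
      close (write_fd);
    }
  }
  close (read_fd);
  return rval;
}
```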
int main ()
{
struct sigaction sa;
struct itimerval timer;
/* Do busy work. */
while (1);
}
See the sysinfo man page for a full description of struct sysinfo. Include
<linux/kernel.h>, <linux/sys.h>, and <sys/sysinfo.h> if you use sysinfo.
The program in Listing 8.12 prints some statistics about the current system.
int main ()
{
  /* Conversion constants. */
  const long minute = 60;
  const long hour = minute * 60;
  const long day = hour * 24;
  const double megabyte = 1024 * 1024;
  /* Obtain system statistics. */
  struct sysinfo si;
  sysinfo (&si);
  /* Summarize interesting values. */
  printf ("system uptime : %ld days, %ld:%02ld:%02ld\n",
          si.uptime / day, (si.uptime % day) / hour,
          (si.uptime % hour) / minute, si.uptime % minute);
  printf ("total RAM : %5.1f MB\n", si.totalram / megabyte);
  printf ("free RAM : %5.1f MB\n", si.freeram / megabyte);
  printf ("process count : %d\n", si.procs);
  return 0;
}
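The uptime arithmetic in the listing above is easy to check in isolation. The sketch below pulls it into a function that takes a number of seconds directly, so it can be tried with known values; the name format_uptime is hypothetical and is not part of sysinfo.

```c
#include <stdio.h>

/* Format UPTIME (in seconds) as "D days, H:MM:SS" into BUF.
   Returns the number of characters written, as snprintf does.
   (Hypothetical helper.)  */
int format_uptime (long uptime, char* buf, size_t size)
{
  const long minute = 60;
  const long hour = minute * 60;
  const long day = hour * 24;
  return snprintf (buf, size, "%ld days, %ld:%02ld:%02ld",
                   uptime / day, (uptime % day) / hour,
                   (uptime % hour) / minute, uptime % minute);
}
```

For example, 90061 seconds is one day, one hour, one minute, and one second, so the function produces the text 1 days, 1:01:01.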
8.15 uname
The uname system call fills a structure with various system information, including the
computer’s network name and domain name, and the operating system version it’s
running. Pass uname a single argument, a pointer to a struct utsname object. Include
<sys/utsname.h> if you use uname.
The call to uname fills in these fields:
• sysname: The name of the operating system (such as Linux).
• release, version: The Linux kernel release number and version level.
• machine: Some information about the hardware platform running Linux. For
x86 Linux, this is i386 or i686, depending on the processor.
• nodename: The computer’s unqualified hostname.
• __domainname: The computer’s domain name (the field is named domainname
when _GNU_SOURCE is defined).
Listing 8.13 (print-uname) Print Linux Version Number and Hardware Information
#include <stdio.h>
#include <sys/utsname.h>

int main ()
{
  struct utsname u;
  uname (&u);
  printf ("%s release %s (version %s) on %s\n", u.sysname, u.release,
          u.version, u.machine);
  return 0;
}
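Listing 8.13 ignores the return value of uname and does not print the hostname. A hedged variant that checks for failure and also reports the nodename field might look like the sketch below; the function name describe_host and the output format are assumptions made for illustration.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Write "SYSNAME RELEASE on host NODENAME" into BUF.
   Returns 0 on success, -1 if uname fails or BUF is too small.
   (Hypothetical helper.)  */
int describe_host (char* buf, size_t size)
{
  struct utsname u;
  int n;
  if (uname (&u) == -1)
    return -1;
  n = snprintf (buf, size, "%s %s on host %s",
                u.sysname, u.release, u.nodename);
  /* snprintf reports the length it wanted; reject truncation.  */
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```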
9
Inline Assembly Code
1. The expression sin (angle) is usually implemented as a function call into the math
library, but if you specify the -O1 or higher optimization flag, GCC is smart enough to replace
the function call with a single fsin assembly instruction.
Observe that unlike ordinary assembly code instructions, asm statements permit you to
specify input and output operands using C syntax.
To read more about the x86 instruction set, which we will use in this
chapter, see http://developer.intel.com/design/pentiumii/manuals/ and
http://www.x86-64.org/documentation.
Remember that foo and bar each require two words of stack storage on a 32-bit x86
architecture. The register ebp points to data on the stack.
The first two instructions copy foo into registers EDX and ECX on which mycool_asm
operates. The compiler decides to use the same registers to store the answer, which is
copied into bar by the final two instructions. It chooses appropriate registers, even
reusing the same registers, and copies operands to and from the proper locations
automatically.
First, fucomip compares its two operands x and y, and stores values indicating the result
into the condition code register. Then seta converts these values into a 0 or 1 result.
9.3.2 Outputs
The second section specifies the instructions’ output operands using C syntax. Each
operand is specified by an operand constraint string followed by a C expression in
parentheses. For output operands, which must be lvalues, the constraint string should
begin with an equals sign. The compiler checks that the C expression for each output
operand is in fact an lvalue.
Letters specifying registers for a particular architecture can be found in the
GCC source code, in the REG_CLASS_FROM_LETTER macro. For example, the
gcc/config/i386/i386.h configuration file in GCC lists the register letters for the x86
architecture.3 Table 9.1 summarizes these.
3. You’ll need to have some familiarity with GCC’s internals to make sense of this file.
9.3.3 Inputs
The third section specifies the input operands for the assembler instructions. The constraint
string for an input operand should not have an equals sign, which indicates an
lvalue. Otherwise, an input operand’s syntax is the same as for output operands.
To indicate that a register is both read from and written to in the same asm, use an
input constraint string of the output operand’s number. For example, to indicate that
an input register is the same as the first output register number, use 0. Output
operands are numbered left to right, starting with 0. Merely specifying the same C
expression for an output operand and an input operand does not guarantee that the
two values will be placed in the same register.
This input section can be omitted if there are no input operands and the subse-
quent clobber section is empty.
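To illustrate the matching constraint described above, the sketch below ties an input to output operand 0 by using the constraint string "0". The function name add_in_place is hypothetical. The example uses the __asm__ spelling so that it also compiles in strict ISO C modes, and it falls back to plain C on non-x86 targets; both choices are assumptions of this sketch rather than something the text prescribes.

```c
/* Add B to A using the x86 addl instruction.  Operand 0 is the output
   SUM; the input A uses constraint "0", so it is placed in the same
   register as SUM before the instruction runs.
   (Hypothetical helper.)  */
unsigned add_in_place (unsigned a, unsigned b)
{
#if defined (__i386__) || defined (__x86_64__)
  unsigned sum;
  __asm__ ("addl %2, %0"      /* %0 = %0 + %2 (AT&T operand order) */
           : "=r" (sum)       /* operand 0: output                  */
           : "0" (a),         /* operand 1: same register as %0     */
             "r" (b)          /* operand 2: any general register    */
           : "cc");           /* addl modifies the condition codes  */
  return sum;
#else
  /* Not an x86 target: plain C fallback.  */
  return a + b;
#endif
}
```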
9.3.4 Clobbers
If an instruction modifies the values of one or more registers as a side effect, specify
the clobbered registers in the asm’s fourth section. For example, the fucomip instruc-
tion modifies the condition code register, which is denoted cc. Separate strings repre-
senting clobbered registers with commas. If the instruction can modify an arbitrary
memory location, specify memory. Using the clobber information, the compiler deter-
mines which values must be reloaded after the asm executes. If you don’t specify this
information correctly, GCC may assume incorrectly that registers still contain values
that have, in fact, been overwritten, which will affect your program’s correctness.
9.4 Example
The x86 architecture includes instructions that determine the positions of the least
significant set bit and the most significant set bit in a word. The processor can execute
these instructions quite efficiently. In contrast, implementing the same operation in C
requires a loop and a bit shift.
For example, the bsrl assembly instruction computes the position of the most significant
bit set in its first operand, and places the bit position (counting from 0, the
least significant bit) into its second operand. To place the bit position for number into
position, we could use this asm statement:
asm ("bsrl %1, %0" : "=r" (position) : "r" (number));
One way you could implement the same operation in C is using this loop:
long i;
for (i = (number >> 1), position = 0; i != 0; ++position)
i >>= 1;
To test the relative speeds of these two versions, we’ll place them in a loop that computes
the bit positions for a large number of values. Listing 9.1 does this using the C
loop implementation. The program loops over integers, from 1 up to the value specified
on the command line. For each value of number, it computes the most significant
bit that is set. Listing 9.2 does the same thing using the inline assembly instruction.
Note that in both versions, we assign the computed bit position to a volatile variable
result. This is to coerce the compiler’s optimizer so that it does not eliminate the
entire bit position computation; if the result is not used or stored in memory, the optimizer
eliminates the computation as “dead code.”
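Before timing the two approaches, it is worth confirming that they agree. The sketch below places the shift loop and the bsrl instruction side by side as functions; it is not a reconstruction of Listing 9.1 or 9.2. The names msb_loop and msb_asm are hypothetical, the cc clobber is an added precaution, and on non-x86 targets msb_asm simply calls the loop version.

```c
/* Position of the most significant set bit, computed with a shift
   loop.  Returns 0 for an input of 0 or 1.  (Hypothetical helper.)  */
unsigned msb_loop (unsigned number)
{
  unsigned position = 0;
  unsigned i;
  for (i = number >> 1; i != 0; i >>= 1)
    ++position;
  return position;
}

/* The same computation using bsrl on x86.  The result of bsrl is
   undefined when NUMBER is 0, so callers should pass a nonzero value.
   (Hypothetical helper.)  */
unsigned msb_asm (unsigned number)
{
#if defined (__i386__) || defined (__x86_64__)
  unsigned position;
  __asm__ ("bsrl %1, %0" : "=r" (position) : "r" (number) : "cc");
  return position;
#else
  /* Not an x86 target: fall back to the loop.  */
  return msb_loop (number);
#endif
}
```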
long i;
unsigned position;
volatile unsigned result;
return 0;
}
return 0;
}
Now let’s run each using the time command to measure execution time. We’ll specify
a large value as the command-line argument, to make sure that each version takes at
least a few seconds to run.
10
Security
Much of the power of a GNU/Linux system comes from its support for
multiple users and for networking. Many people can use the system at once, and they
can connect to the system from remote locations. Unfortunately, with this power
comes risk, especially for systems connected to the Internet. Under some circumstances,
a remote “hacker” can connect to the system and read, modify, or remove files
that are stored on the machine. Or, two users on the same machine can read, modify,
or remove each other’s files when they should not be allowed to do so. When this
happens, the system’s security is said to have been compromised.
The Linux kernel provides a variety of facilities to ensure that these events do not
take place. But to avoid security breaches, ordinary applications must be careful as well.
For example, imagine that you are developing accounting software. Although you
might want all users to be able to file expense reports with the system, you wouldn’t
want all users to be able to approve those reports. You might want users to be able to
view their own payroll information, but you certainly wouldn’t want them to be able
to view everyone else’s payroll information. You might want managers to be able to
view the salaries of employees in their departments, but you wouldn’t want them to
view the salaries of employees in other departments.
To enforce these kinds of controls, you have to be very careful. It’s amazingly easy
to make a mistake that allows users to do something you didn’t intend them to be able
to do. The best approach is to enlist the help of security experts. Still, every application
developer ought to understand the basics.
The first part shows you that the user ID for the user who ran the command was 501.
The command also figures out what the corresponding username is and displays that
in parentheses. The command shows that user ID 501 is actually in two groups: group
501 (called mitchell) and group 503 (called csl). You’re probably wondering why
group 501 appears twice: once in the gid field and once in the groups field. We’ll
explain this later.
1. The fact that there is only one special user gave AT&T the name for its UNIX operating
system. In contrast, an earlier operating system that had multiple special users was called
MULTICS. GNU/Linux, of course, is mostly compatible with UNIX.
To get the user ID and group ID for the current process, you can use the geteuid
and getegid functions, declared in <unistd.h>. These functions don’t take any parameters,
and they always work; you don’t have to check for errors. Listing 10.1 shows a
simple program that provides a subset of the functionality provided by the id command:
int main()
{
  uid_t uid = geteuid ();
  gid_t gid = getegid ();
  printf ("uid=%d gid=%d\n", (int) uid, (int) gid);
  return 0;
}
When this program is run (by the same user who ran the real id program) the output
is as follows:
% ./simpleid
uid=501 gid=501
2. Actually, there are some rare exceptions, involving sticky bits, discussed later in Section
10.3.2, “Sticky Bits.”
Linux enables you to designate which of these three actions—reading, writing, and
executing—can be performed by the owning user, owning group, and everybody else.
For example, you could say that the owning user can do anything she wants with the
file, that anyone in the owning group can read and execute the file (but not write to
it), and that nobody else can access the file at all.
You can view these permission bits interactively with the ls command by using the
-l or -o options and programmatically with the stat system call. You can set the permission
bits interactively with the chmod program3 or programmatically with the
system call of the same name. To look at the permissions on a file named hello, use
ls -l hello. Here’s how the output might look:
% ls -l hello
-rwxr-x--- 1 samuel csl 11734 Jan 22 16:29 hello
The samuel and csl fields indicate that the owning user is samuel and that the owning
group is csl.
The string of characters at the beginning of the line indicates the permissions associated
with the file. The first dash indicates that this is a normal file. It would be d for
a directory, or it can be other letters for special kinds of files such as devices (see
Chapter 6, “Devices”) or named pipes (see Chapter 5, “Interprocess Communication,”
Section 5.4, “Pipes”). The next three characters show permissions for the owning user;
they indicate that samuel can read, write, and execute the file. The next three characters
show permissions for members of the csl group; these members are allowed only
to read and execute the file. The last three characters show permissions for everyone
else; these users are not allowed to do anything with hello.
Let’s see how this works. First, let’s try to access the file as the user nobody, who is
not in the csl group:
% id
uid=99(nobody) gid=99(nobody) groups=99(nobody)
% cat hello
cat: hello: Permission denied
% echo hi > hello
sh: ./hello: Permission denied
% ./hello
sh: ./hello: Permission denied
We can’t read the file, which is why cat fails; we can’t write to the file, which is why
echo fails; and we can’t run the file, which is why ./hello fails.
3. You’ll sometimes see the permission bits for a file referred to as the file’s mode. The name
of the chmod command is short for “change mode.”
Things are better if we are accessing the file as mitchell, who is a member of the
csl group:
% id
uid=501(mitchell) gid=501(mitchell) groups=501(mitchell),503(csl)
% cat hello
#!/bin/bash
echo "Hello, world."
% ./hello
Hello, world.
% echo hi > hello
bash: ./hello: Permission denied
We can list the contents of the file, and we can run it (it’s a simple shell script), but we
still can’t write to it.
If we run as the owner (samuel), we can even overwrite the file:
% id
uid=502(samuel) gid=502(samuel) groups=502(samuel),503(csl)
% echo hi > hello
% cat hello
hi
You can change the permissions associated with a file only if you are the file’s owner
(or the superuser). For example, if you now want to allow everyone to execute the
file, you can do this:
% chmod o+x hello
% ls -l hello
-rwxr-x--x 1 samuel csl 3 Jan 22 16:38 hello
Note that there’s now an x at the end of the first string of characters. The o+x bit
means that you want to add the execute permission for other people (not the file’s
owner or members of its owning group). You could use g-w instead, to remove the
write permission from the group. See the man page in section 1 for chmod for details
about this syntax:
% man 1 chmod
Programmatically, you can use the stat system call to find the permissions associated
with a file. This function takes two parameters: the name of the file you want to find
out about, and the address of a data structure that is filled in with information about
the file. See Appendix B, “Low-Level I/O,” Section B.2, “stat,” for a discussion of other
information that you can obtain with stat. Listing 10.2 shows an example of using
stat to obtain file permissions.
The S_IWUSR constant corresponds to write permission for the owning user. There are
other constants for all the other bits. For example, S_IRGRP is read permission for the
owning group, and S_IXOTH is execute permission for users who are neither the owning
user nor a member of the owning group. If you store permissions in a variable, use
the typedef mode_t for that variable. Like most system calls, stat will return -1 and set
errno if it can’t obtain information about the file.
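As a further illustration of the permission constants, the sketch below converts a mode_t value into the nine-character string that ls -l prints. The function names permission_string and file_permission_string are hypothetical, and the sketch deliberately ignores the setuid, setgid, and sticky bits.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Fill BUF (at least 10 bytes) with the nine permission characters
   that ls -l shows for MODE, for example "rwxr-x---".
   (Hypothetical helper.)  */
void permission_string (mode_t mode, char* buf)
{
  static const mode_t bits[9] = {
    S_IRUSR, S_IWUSR, S_IXUSR,
    S_IRGRP, S_IWGRP, S_IXGRP,
    S_IROTH, S_IWOTH, S_IXOTH
  };
  static const char letters[] = "rwxrwxrwx";
  int i;
  for (i = 0; i < 9; ++i)
    buf[i] = (mode & bits[i]) ? letters[i] : '-';
  buf[9] = '\0';
}

/* Look up PATH with stat and format its permission bits into BUF.
   Returns 0 on success, -1 if stat fails.  (Hypothetical helper.)  */
int file_permission_string (const char* path, char* buf)
{
  struct stat info;
  if (stat (path, &info) == -1)
    return -1;
  permission_string (info.st_mode, buf);
  return 0;
}
```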
You can use the chmod function to change the permission bits on an existing file.
You call chmod with the name of the file you want to change and the permission bits
you want set, presented as the bitwise OR of the various permission constants mentioned
previously. For example, this next line would make hello readable and executable
by its owning user but would disable all other permissions associated with
hello:
chmod ("hello", S_IRUSR | S_IXUSR);
The same permission bits apply to directories, but they have different meanings. If a
user is allowed to read from a directory, the user is allowed to see the list of files that
are present in that directory. If a user is allowed to write to a directory, the user is
allowed to add or remove files from the directory. Note that a user may remove files
from a directory if she is allowed to write to the directory, even if she does not have permission
to modify the file she is removing. If a user is allowed to execute a directory, the
user is allowed to enter that directory and access the files therein. Without execute
access to a directory, a user is not allowed to access the files in that directory, independent
of the permissions on the files themselves.
To summarize, let’s review how the kernel decides whether to allow a process to
access a particular file. It checks to see whether the accessing user is the owning user, a
member of the owning group, or someone else. The category into which the accessing
user falls is used to determine which set of read/write/execute bits are checked. Then
the kernel checks the operation that is being performed against the permission bits
that apply to this user.4
4. The kernel may also deny access to a file if a component directory in its file path is inaccessible.
For instance, if a process may not access the directory /tmp/private/, it may not read
/tmp/private/data, even if the permissions on the latter are set to allow the access.
There is one important exception: Processes running as root (those with user ID 0)
are always allowed to access any file, regardless of the permissions associated with it.
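The decision procedure summarized above, together with the root exception, can be modeled in a few lines of C. This is a deliberately simplified sketch and not the kernel's actual code: the function name may_access is hypothetical, supplementary groups and directory permissions along the path are ignored, and root is treated as always allowed, as the text states.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Decide whether a process with effective IDs UID and GID may perform
   the operations in WANT (a combination of 4 = read, 2 = write,
   1 = execute) on a file owned by FILE_UID and FILE_GID with
   permission bits MODE.  Returns 1 if allowed, 0 if not.
   (Hypothetical, simplified model.)  */
int may_access (uid_t uid, gid_t gid, uid_t file_uid, gid_t file_gid,
                mode_t mode, int want)
{
  int granted;
  if (uid == 0)
    return 1;                     /* root bypasses the permission bits */
  if (uid == file_uid)
    granted = (mode >> 6) & 7;    /* owning-user bits  */
  else if (gid == file_gid)
    granted = (mode >> 3) & 7;    /* owning-group bits */
  else
    granted = mode & 7;           /* everyone else     */
  return (granted & want) == want;
}
```

Note that only one category is ever consulted: an owner whose own bits deny an operation is refused even if the group bits would have allowed it.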
5. This name is anachronistic; it goes back to a time when setting the sticky bit caused a program
to be retained in main memory even when it was done executing. The pages allocated to
the program were “stuck” in memory.
To set the sticky bit programmatically, call chmod with the S_ISVTX mode flag. For
example, to give the directory specified by dir_path the same permissions as /tmp
(the sticky bit plus full read, write, and execute permissions for all users), use this call:
chmod (dir_path, S_IRWXU | S_IRWXG | S_IRWXO | S_ISVTX);
Programs that authenticate users when they log in take advantage of the capability
to change user IDs as well. These login programs run as root. When the user enters a
username and password, the login program verifies the username and password in the
system password database. Then the login program changes both the effective user ID
and the real user ID to be those of the user. Finally, the login program calls exec to start the
user’s shell, leaving the user running a shell whose effective user ID and real user ID
are those of the user.
The function used to change the user IDs for a process is setreuid. (There is, of
course, a corresponding setregid function as well.) This function takes two arguments.
The first argument is the desired real user ID; the second is the desired effective
user ID. For example, here’s how you would exchange the effective and real user IDs:
setreuid (geteuid(), getuid ());
Obviously, the kernel won’t let just any process change its user IDs. If a process were
allowed to change its effective user ID at will, then any user could easily impersonate
any other user, simply by changing the effective user ID of one of his processes. The
kernel will let a process running with an effective user ID of 0 change its user IDs as
it sees fit. (Again, notice how much power a process running as root has! A process
whose effective user ID is 0 can do absolutely anything it pleases.) Any other process,
however, can do only one of the following things:
• Set its effective user ID to be the same as its real user ID
• Set its real user ID to be the same as its effective user ID
• Swap the two user IDs
The first alternative would be used by our accounting process when it has finished
accessing files as mitchell and wants to return to being root. The second alternative
could be used by a login program after it has set the effective user ID to that of the
user who just logged in. Setting the real user ID ensures that the user will never be
able to go back to being root. Swapping the two user IDs is almost a historical artifact;
modern programs rarely use this functionality.
You can pass -1 to either argument to setreuid if you want to leave that user ID
alone. There’s also a convenience function called seteuid. This function sets the effective
user ID, but it doesn’t modify the real user ID. The following two statements both
do exactly the same thing:
seteuid (id);
setreuid (-1, id);
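As a usage sketch of the -1 convention, the function below sets the effective user ID to the real user ID, which any process is permitted to do, and then confirms the change. The name drop_to_real_user is hypothetical, and the sketch does not address every way a setuid program might later regain privilege.

```c
#include <sys/types.h>
#include <unistd.h>

/* Set the effective user ID to the real user ID, leaving the real
   user ID alone.  Returns 0 on success, -1 on failure.
   (Hypothetical helper.)  */
int drop_to_real_user (void)
{
  /* Passing -1 as the first argument leaves the real user ID
     untouched; this has the same effect as seteuid (getuid ()).  */
  if (setreuid ((uid_t) -1, getuid ()) == -1)
    return -1;
  /* Verify rather than assume that the change took effect.  */
  return geteuid () == getuid () ? 0 : -1;
}
```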
Here’s a puzzle: Can you, running as a non-root user, ever become root? That
doesn’t seem possible, using the previous techniques, but here’s proof that it can be
done:
% whoami
mitchell
% su
Password: ...
% whoami
root
The whoami command is just like id, except that it shows only the effective user ID,
not all the other information. The su command enables you to become the superuser
if you know the root password.
How does su work? Because we know that the shell was originally running with
both its real user ID and its effective user ID set to mitchell, setreuid won’t allow us
to change either user ID.
The trick is that the su program is a setuid program. That means that when it is
run, the effective user ID of the process will be that of the file’s owner rather than the
effective user ID of the process that performed the exec call. (The real user ID will
still be that of the executing user.) To create a setuid program, you use chmod +s at the
command line, or use the S_ISUID flag if calling chmod programmatically.6
For example, consider the program in Listing 10.3.
int main ()
{
  printf ("uid=%d euid=%d\n", (int) getuid (), (int) geteuid ());
  return 0;
}
Now suppose that this program is setuid and owned by root. In that case, the ls out-
put will look like this:
-rwsrws--x 1 root root 11931 Jan 24 18:25 setuid-test
The s bits indicate that the file is not only executable (as an x bit would indicate) but
also setuid and setgid. When we use this program, we get output like this:
% whoami
mitchell
% ./setuid-test
uid=501 euid=0
6. Of course, there is a similar notion of a setgid program. When run, its effective group
ID is the same as that of the group owner of the file. Most setuid programs are also setgid
programs.
Note that the effective user ID is set to 0 when the program is run.
You can use the chmod command with the u+s or g+s arguments to set the setuid
and setgid bits on an executable file, respectively—for example:
% ls -l program
-rwxr-xr-x 1 samuel csl 0 Jan 30 23:38 program
% chmod g+s program
% ls -l program
-rwxr-sr-x 1 samuel csl 0 Jan 30 23:38 program
% chmod u+s program
% ls -l program
-rwsr-sr-x 1 samuel csl 0 Jan 30 23:38 program
You can also use the chmod call with the S_ISUID or S_ISGID mode flags.
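The same S_ISUID and S_ISGID flags can be tested as well as set. The sketch below checks them in a mode value and, through stat, on a file; the names mode_is_setuid, mode_is_setgid, and file_is_setuid are hypothetical.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if MODE has the setuid bit set, 0 otherwise.
   (Hypothetical helper.)  */
int mode_is_setuid (mode_t mode)
{
  return (mode & S_ISUID) != 0;
}

/* Return 1 if MODE has the setgid bit set, 0 otherwise.
   (Hypothetical helper.)  */
int mode_is_setgid (mode_t mode)
{
  return (mode & S_ISGID) != 0;
}

/* Return 1 if the file at PATH is setuid, 0 if it is not, and -1 if
   stat fails.  (Hypothetical helper.)  */
int file_is_setuid (const char* path)
{
  struct stat info;
  if (stat (path, &info) == -1)
    return -1;
  return mode_is_setuid (info.st_mode);
}
```

Calling file_is_setuid on the su program shown below would be expected to return 1 on a system where su is installed setuid.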
su is capable of changing the effective user ID through this mechanism. It runs
initially with an effective user ID of 0. Then it prompts you for a password. If the
password matches the root password, it sets its real user ID to be root as well and
then starts a new shell. Otherwise, it exits, unceremoniously leaving you as a nonprivileged
user.
Take a look at the permissions on the su program:
% ls -l /bin/su
-rwsr-xr-x 1 root root 14188 Mar 7 2000 /bin/su
Notice that it’s owned by root and that the setuid bit is set.
Note that su doesn’t actually change the user ID of the shell from which it was
run. Instead, it starts a new shell process with the new user ID. The original shell is
blocked until the new shell completes and su exits.
7. It has been found that system administrators tend to pick the word god as their password
more often than any other password. (Make of that what you will.) So, if you ever need root
access on a machine and the sysadmin isn’t around, a little divine inspiration might be just what
you need.
For example, many organizations now require the use of special “one-time” passwords
that are generated by special electronic ID cards that users keep with them. The
same password can’t be used twice, and you can’t get a valid password out of the ID
card without entering a PIN. So, an attacker must obtain both the physical card and
the PIN to break in. In a really secure facility, retinal scans or other kinds of biometric
testing are used.
If you’re writing a program that must perform authentication, you should allow the
system administrator to use whatever means of authentication is appropriate for that
installation. GNU/Linux comes with a very useful library that makes this very easy.
This facility, called Pluggable Authentication Modules, or PAM, makes it easy to write
applications that authenticate their users as the system administrator sees fit.
It’s easiest to see how PAM works by looking at a simple PAM application. Listing
10.4 illustrates the use of PAM.
int main ()
{
pam_handle_t* pamh;
struct pam_conv pamc;
To compile this program, you have to link it with two libraries: the libpam library and
a helper library called libpam_misc:
% gcc -o pam pam.c -lpam -lpam_misc
This program starts off by building up a PAM conversation object. This object is used
by the PAM library whenever it needs to prompt the user for information. The
misc_conv function used in this example is a standard conversation function that uses
the terminal for input and output. You could write your own function that pops up a
dialog box, or that uses speech for input and output, or that provides even more exotic
input and output methods.
The program then calls pam_start. This function initializes the PAM library. The first
argument is a service name. You should use a name that uniquely identifies your application.
For example, if your application is named whizbang, you should probably use
that for the service name, too. However, the program probably won’t work until the
system administrator explicitly configures the system to work with your service. So, in
this example, we use the su service, which says that our program should authenticate
users in the same way that the su command does. You should not use this technique in a
real program. Pick a real service name, and have your installation scripts help the system
administrator to set up a correct PAM configuration for your application.
The second argument is the name of the user whom you want to authenticate. In
this example, we use the value of the USER environment variable. (Normally, this is the
username that corresponds to the effective user ID of the current process, but that’s
not always the case.) In most real programs, you would prompt for a username at this
point. The third argument indicates the PAM conversation, discussed previously. The
call to pam_start fills in the handle provided as the fourth argument. Pass this handle
to subsequent calls to PAM library routines.
Next, the program calls pam_authenticate. The second argument enables you to
pass various flags; the value 0 means to use the default options. The return value from
this function indicates whether authentication succeeded.
Finally, the program calls pam_end to clean up any allocated data structures.
Let’s assume that the valid password for the current user is “password” (an exceptionally
poor password). Then, running this program with the correct password produces
the expected:
% ./pam
Password: password
Authentication OK.
If you run this program in a terminal, the password probably won’t actually appear
when you type it in; it’s hidden to prevent others from peeking at your password over
your shoulder as you type.
However, if a hacker tries to use the wrong password, the PAM library will cor-
rectly indicate failure:
% ./pam
Password: badguess
Authentication failed!
The basics covered here are enough for most simple programs. Full documentation
about how PAM works is available in /usr/doc/pam on most GNU/Linux systems.
The buggy versions of finger, talk, and sendmail all shared a common flaw. Each
used a fixed-length string buffer, which implied a constant upper limit on the size of
the string but then allowed network clients to provide strings that overflowed the
buffer. For example, they contained code similar to this:
#include <stdio.h>
int main ()
{
/* Nobody in their right mind would have more than 32 characters in
their username. Plus, I think UNIX allows only 8-character
usernames. So, this should be plenty of space. */
char username[32];
/* Prompt the user for the username. */
printf ("Enter your username: ");
/* Read a line of input. */
gets (username);
/* Do other things here... */
return 0;
}
The combination of the 32-character buffer with the gets function permits a buffer
overrun. The gets function reads user input up until the next newline character and
stores the entire result in the username buffer. The comments in the code are correct
in that people generally have short usernames, so no well-meaning user is likely to
type in more than 32 characters. But when you’re writing secure software, you must
consider what a malicious attacker might do. In this case, the attacker might deliberately
type in a very long username. Local variables such as username are stored on the
stack, so by exceeding the array bounds, it’s possible to put arbitrary bytes onto the
stack beyond the area reserved for the username variable. The username will overrun
the buffer and overwrite parts of the surrounding stack, allowing the kind of attack
described previously.
Fortunately, it’s relatively easy to prevent buffer overruns. When reading strings, you
should always use a function, such as getline, that either dynamically allocates a sufficiently
large buffer or stops reading input if the buffer is full. For example, you could
use this:
char* username = NULL; size_t size = 0;
getline (&username, &size, stdin);
Because username starts out as a null pointer, this call automatically uses malloc to
allocate a buffer big enough to hold the line and stores the buffer’s address in username.
You have to remember to call free to deallocate the buffer, of course,
to avoid leaking memory.
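As a rough sketch of how this might be wrapped up, the function below reads one line with getline, checks the return value, and strips the trailing newline. The name read_username is hypothetical and not part of any listing in this book; the feature-test macro is only there to request the POSIX declaration of getline.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the POSIX declaration of getline. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read one line from STREAM and return it without its trailing newline.
   The caller owns the result and must release it with free.  Returns
   NULL at end of file or on error.  (read_username is a hypothetical
   helper name used only for this sketch.)  */
char* read_username (FILE* stream)
{
  char* line = NULL;     /* NULL asks getline to allocate the buffer. */
  size_t capacity = 0;   /* Filled in by getline with the buffer size. */
  ssize_t length;

  assert (stream != NULL);
  length = getline (&line, &capacity, stream);
  if (length == -1) {
    /* getline may have allocated a buffer even though it failed. */
    free (line);
    return NULL;
  }
  /* Drop the newline that getline keeps at the end of the line. */
  if (length > 0 && line[length - 1] == '\n')
    line[length - 1] = '\0';
  return line;
}
```

No matter how long the input line is, the buffer grows to fit it, so there is no fixed limit for an attacker to exceed.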
Your life will be even easier if you use C++ or another language that provides
simple primitives for reading input. In C++, for example, you can simply use this:
string username;
getline (cin, username);
The username string will automatically be deallocated as well; you don’t have to
remember to free it.8
Of course, buffer overruns can occur with any statically sized array, not just with
strings. If you want to write secure code, you should never write into a data structure,
on the stack or elsewhere, without verifying that you’re not going to write beyond its
region of memory.
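One way to apply this rule to a fixed-size array is to compare the number of bytes you are about to write against the size of the destination before copying anything. The following is a minimal sketch; copy_checked is a hypothetical helper name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the string SOURCE into BUFFER, which holds BUFFER_SIZE bytes.
   Returns 0 on success.  Returns -1, leaving BUFFER untouched, if
   SOURCE (including its terminating NUL) would not fit.
   (copy_checked is a hypothetical name used only for this sketch.)  */
int copy_checked (char* buffer, size_t buffer_size, const char* source)
{
  size_t needed;

  assert (buffer != NULL && source != NULL);
  /* Count the terminating NUL character, too. */
  needed = strlen (source) + 1;
  if (needed > buffer_size)
    /* Refuse to write beyond the end of the buffer. */
    return -1;
  memcpy (buffer, source, needed);
  return 0;
}
```

The caller decides what to do when the data does not fit; silently truncating is not always the right answer for security-sensitive input such as usernames.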
8. Some programmers believe that C++ is a horrible and overly complex language. Their
arguments about multiple inheritance and other such complications have some merit, but it is
easier to write code that avoids buffer overruns and other similar problems in C++ than in C.
9. Obviously, if you’re also a system administrator, you shouldn’t mount /tmp over NFS.
One approach that works is to call lstat on the newly created file (lstat is discussed in
Section B.2, “stat”). The lstat function is like stat, except that if the file referred to
is a symbolic link, lstat tells you about the link, not the file to which it refers. If
lstat tells you that your new file is an ordinary file, not a symbolic link, and that it is
owned by you, then you should be okay.
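The check described above might be sketched as follows. This is an illustration only, not the code of Listing 10.5: the function names are hypothetical, and comparing the owner against geteuid is an assumption about what “owned by you” means for your program.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the POSIX declaration of lstat. */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if PATH names an ordinary file (not a symbolic link) owned
   by the effective user of this process, and 0 otherwise, including
   when lstat itself fails.  (Hypothetical helper for this sketch.)  */
int is_own_regular_file (const char* path)
{
  struct stat stat_buf;

  assert (path != NULL);
  if (lstat (path, &stat_buf) == -1)
    return 0;
  /* lstat reports on a symbolic link itself, so a link is never S_ISREG. */
  if (!S_ISREG (stat_buf.st_mode))
    return 0;
  return stat_buf.st_uid == geteuid ();
}

/* Create PATH exclusively, then verify it with lstat.  Returns an open
   file descriptor, or -1 if creation or verification fails.  */
int create_and_verify (const char* path)
{
  int fd;

  assert (path != NULL);
  fd = open (path, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd == -1)
    return -1;
  if (!is_own_regular_file (path)) {
    close (fd);
    return -1;
  }
  return fd;
}
```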
Listing 10.5 presents a function that tries to securely open a file in /tmp. The authors
of this book have not had it audited professionally, nor are we professional security
experts, so there’s a good chance that it has a weakness, too. We do not recommend that
you use this code without getting an audit, but it should at least convince you that
writing secure code is tricky. To help dissuade you, we’ve deliberately made the interface
difficult to use in real programs. Error checking is an important part of writing
secure software, so we’ve included error-checking logic in this example.
int secure_temp_file ()
{
/* This file descriptor points to /dev/random and allows us to get
a good source of random bits. */
static int random_fd = -1;
/* A random integer. */
unsigned int random;
/* A buffer, used to convert from a numeric to a string
representation of random. This buffer has fixed size, meaning
that we potentially have a buffer overrun bug if the integers on
this machine have a *lot* of bits. */
char filename[128];
/* The file descriptor for the new temporary file. */
int fd;
/* Information about the newly created file. */
struct stat stat_buf;
return fd;
}
This function calls open to create the file and then calls lstat a few lines later to make
sure that the file is not a symbolic link. If you’re thinking carefully, you’ll realize that
there seems to be a race condition at this point. In particular, an attacker could remove
the file and replace it with a symbolic link between the time we call open and the
time we call lstat. That won’t harm us directly because we already have an open file
descriptor to the newly created file, but it will cause us to indicate an error to our
caller. This attack doesn’t create any direct harm, but it does make it impossible for our
program to get its work done. Such an attack is called a denial-of-service (DoS) attack.
Fortunately, the sticky bit comes to the rescue. Because the sticky bit is set on /tmp,
nobody else can remove files from that directory. Of course, root can still remove files
from /tmp, but if the attacker has root privilege, there’s nothing you can do to protect
your program.
If you choose to assume competent system administration, then /tmp will not be
mounted via NFS. And if the system administrator was foolish enough to mount /tmp
over NFS, then there’s a good chance that the sticky bit isn’t set, either. So, for most
practical purposes, we think it’s safe to use mkstemp. But you should be aware of these
issues, and you should definitely not rely on O_EXCL to work correctly if the directory
in use is not /tmp, nor should you rely on the sticky bit being set anywhere else.
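For reference, a minimal use of mkstemp might look like the sketch below. The function name write_scratch_file and the file name template are assumptions made for illustration; the file is unlinked right away so that it disappears when the descriptor is closed.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the POSIX declaration of mkstemp. */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file in /tmp, write LENGTH bytes from BUFFER to
   it, and rewind it.  Returns an open file descriptor, or -1 on
   failure.  (write_scratch_file is a hypothetical name.)  */
int write_scratch_file (const char* buffer, size_t length)
{
  /* mkstemp replaces the six X characters with a unique suffix. */
  char filename[] = "/tmp/scratch.XXXXXX";
  int fd;

  assert (buffer != NULL);
  fd = mkstemp (filename);
  if (fd == -1)
    return -1;
  /* Remove the name immediately; the open descriptor keeps the file
     alive until it is closed. */
  unlink (filename);
  if (write (fd, buffer, length) != (ssize_t) length
      || lseek (fd, 0, SEEK_SET) == -1) {
    close (fd);
    return -1;
  }
  return fd;
}
```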
Here, word is the word that the user is curious about. The exit code from grep will tell
you whether that word appears in /usr/dict/words.10
Listing 10.6 shows how you might try to code the part of the server that
invokes grep:
10. If you don’t know about grep, you should look at the manual pages. It’s an incredibly
useful program.
Note that by calculating the number of characters we need and then allocating the
buffer dynamically, we’re sure to be safe from buffer overruns.
Unfortunately, the use of the system function (described in Chapter 3, “Processes,”
Section 3.2.1, “Using system”) is unsafe. This function invokes the standard system
shell to run the command and then returns the exit value. But what happens if a malicious
hacker sends a “word” that is actually the following line or a similar string?
foo /dev/null; rm -rf /
Now the problem is obvious. The user has turned one command, ostensibly the invocation
of grep, into two commands because the shell treats a semicolon as a command
separator. The first command is still a harmless invocation of grep, but the second
removes all files on the entire system! Even if the server is not running as root, all the
files that can be removed by the user running the server will be removed. The same
problem can arise with popen (described in Section 5.4.4, “popen and pclose”), which
creates a pipe between the parent and child process but still uses the shell to run the
command.
There are two ways to avoid these problems. One is to use the exec family of functions
instead of system or popen. That solution avoids the problem because characters
that the shell treats specially (such as the semicolon in the previous command) are not
treated specially when they appear in the argument list to an exec call. Of course, you
give up the convenience of system and popen.
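A sketch of the exec-based approach for the dictionary lookup might look like this. It is not the code of Listing 10.6: the function name grep_for_word is hypothetical, and the extra grep options (-q to suppress output, -F to treat the word as a fixed string, and -- to end option parsing) are additions chosen for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return 1 if WORD appears as a whole line in the file WORD_LIST, 0 if
   it does not, and -1 if grep could not be run or reported an error.
   grep is started directly with execvp, so no shell ever interprets
   WORD.  (grep_for_word is a hypothetical name.)  */
int grep_for_word (const char* word, const char* word_list)
{
  pid_t child_pid;
  int status;

  assert (word != NULL && word_list != NULL);
  child_pid = fork ();
  if (child_pid == -1)
    return -1;
  if (child_pid == 0) {
    /* Each array element reaches grep as one argument, whatever
       characters it contains. */
    char* const arg_list[] = {
      "grep", "-q", "-x", "-F", "--", (char*) word, (char*) word_list, NULL
    };
    execvp ("grep", arg_list);
    /* execvp returns only if an error occurred. */
    fprintf (stderr, "an error occurred in execvp\n");
    abort ();
  }
  if (waitpid (child_pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  /* grep exits with 0 if a line matched and 1 if none did. */
  switch (WEXITSTATUS (status)) {
  case 0:
    return 1;
  case 1:
    return 0;
  default:
    return -1;
  }
}
```

With this version, a “word” containing a semicolon is simply a string that fails to match any line in the word list.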
The other alternative is to validate the string to make sure that it is benign. In the
dictionary server example, you would make sure that the word provided contains only
alphabetic characters, using the isalpha function. If it doesn’t contain any other characters,
there’s no way to trick the shell into executing a second command. Don’t
implement the check by looking for dangerous and unexpected characters; it’s always
safer to explicitly check for the characters that you know are safe rather than try to
anticipate all the characters that might cause trouble.
11 A Sample GNU/Linux Application
11.1 Overview
The example program is part of a system for monitoring a running GNU/Linux
system. It includes these features:
• The program incorporates a minimal Web server. Local or remote clients access
  system information by requesting Web pages from the server via HTTP.
• The program does not serve static HTML pages. Instead, the pages are generated
  on the fly by modules, each of which provides a page summarizing one aspect of
  the system’s state.
• Modules are not linked statically into the server executable. Instead, they are
  loaded dynamically from shared libraries. Modules can be added, removed, or
  replaced while the server is running.
• The server services each connection in a child process. This enables the server to
  remain responsive even when individual requests take a while to complete, and
  it shields the server from failures in modules.
• The server does not require superuser privilege to run (as long as it is not run
  on a privileged port). However, this limits the system information that it can
  collect.
We provide four sample modules that demonstrate how modules might be written.
They further illustrate some of the techniques for gathering system information presented
previously in this book. The time module demonstrates using the gettimeofday
system call. The issue module demonstrates low-level I/O and the sendfile system
call. The diskfree module demonstrates the use of fork, exec, and dup2 by running a
command in a child process. The processes module demonstrates the use of the /proc
file system and various system calls.
11.1.1 Caveats
This program has many of the features you’d expect in an application program, such as
command-line parsing and error checking. At the same time, we’ve made some simplifications
to improve readability and to focus on the GNU/Linux-specific topics discussed
in this book. Bear in mind these caveats as you examine the code.
• We don’t attempt to provide a full implementation of HTTP. Instead, we
  implement just enough for the server to interact with Web clients. A real-world
  program either would provide a more complete HTTP implementation or
  would interface with one of the various excellent Web server implementations1
  available instead of providing HTTP services directly.
• Similarly, we don’t aim for full compliance with HTML specifications (see
  http://www.w3.org/MarkUp/). We generate simple HTML output that can be
  handled by popular Web browsers.
• The server is not tuned for high performance or minimum resource usage. In
  particular, we intentionally omit some of the network configuration code that
  you would expect in a Web server. This topic is outside the scope of this book.
  See one of the many excellent references on network application development,
  such as UNIX Network Programming, Volume 1: Networking APIs: Sockets and XTI,
  by W. Richard Stevens (Prentice Hall, 1997), for more information.
1. The most popular open source Web server for GNU/Linux is the Apache server, available
from http://www.apache.org.
HTTP
The Hypertext Transfer Protocol (HTTP) is used for communication between Web clients and servers. The
client connects to the server by establishing a connection to a well-known port (usually port 80 for
Internet Web servers, but any port may be used). HTTP requests and headers are composed of plain text.
Once connected, the client sends a request to the server. A typical request is GET /page HTTP/1.0.
The GET method indicates that the client is requesting that the server send it a Web page. The second
element is the path to that page on the server. The third element is the protocol and version. Subsequent
lines contain header fields, formatted similarly to email headers, which contain extra information about
the client. The header ends with a blank line.
The server sends back a response indicating the result of processing the request. A typical response is
HTTP/1.0 200 OK. The first element is the protocol version. The next two elements indicate the
result; in this case, result 200 indicates that the request was processed successfully. Subsequent lines
contain header fields, formatted similarly to email headers. The header ends with a blank line. The server
may then send arbitrary data to satisfy the request.
Typically, the server responds to a page request by sending back HTML source for the Web page. In this
case, the response headers will include Content-type: text/html, indicating that the result is
HTML source. The HTML source follows immediately after the header.
See the HTTP specification at http://www.w3.org/Protocols/ for more information.
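To make the shape of these messages concrete, the following sketch splits a request line into its three elements and formats a response header. The function names and the 64-byte element limit are assumptions made for illustration, not part of the server in this chapter. The sketch ends each header line with a carriage return and line feed, as the HTTP specification calls for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split a request line such as "GET /page HTTP/1.0" into its three
   elements.  Each output buffer must hold at least 64 bytes.  Returns
   0 on success, or -1 if the line does not have three elements.
   (Both function names here are hypothetical.)  */
int parse_request_line (const char* line,
                        char* method, char* path, char* protocol)
{
  assert (line != NULL && method != NULL && path != NULL && protocol != NULL);
  /* The field width of 63 leaves room for the terminating NUL. */
  if (sscanf (line, "%63s %63s %63s", method, path, protocol) != 3)
    return -1;
  return 0;
}

/* Write a status line, a Content-type header for HTML, and the blank
   line that ends the header into BUFFER.  Returns the number of
   characters written, or -1 if BUFFER is too small.  */
int format_response_header (char* buffer, size_t size,
                            int code, const char* reason)
{
  int length;

  assert (buffer != NULL && reason != NULL);
  length = snprintf (buffer, size,
                     "HTTP/1.0 %d %s\r\n"
                     "Content-type: text/html\r\n"
                     "\r\n",
                     code, reason);
  if (length < 0 || (size_t) length >= size)
    return -1;
  return length;
}
```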
11.2 Implementation
All but the very smallest programs written in C require careful organization to preserve
the modularity and maintainability of the source code. This program is divided
into four main source files.
Each source file exports functions or variables that may be accessed by the other
parts of the program. For simplicity, all exported functions and variables are declared in
a single header file, server.h (see Listing 11.1), which is included by the other files.
Functions that are intended for use within a single compilation unit only are declared
static and are not declared in server.h.
#include <netinet/in.h>
#include <sys/types.h>
/* Print an error message for a failed call OPERATION, using the value
of errno, and end the program. */
extern void system_error (const char* operation);
#endif /* SERVER_H */
#include "server.h"
int verbose;
char* get_self_executable_directory ()
{
int rval;
char link_target[1024];
char* last_slash;
size_t result_length;
char* result;
You could use these functions in other programs as well; the contents of this file might
be included in a common code library that is shared among many projects:
• xmalloc, xrealloc, and xstrdup are error-checking versions of the C library
  functions malloc, realloc, and strdup, respectively. Unlike the standard versions,
  which return a null pointer if the allocation fails, these functions immediately
  abort the program when insufficient memory is available.
  Early detection of memory allocation failure is a good idea. Otherwise, failed
  allocations introduce null pointers at unexpected places into the program.
  Because allocation failures are not easy to reproduce, debugging such problems
  can be difficult. Allocation failures are usually catastrophic, so aborting the program
  is often an acceptable course of action.
• The error function is for reporting a fatal program error. It prints a message to
  stderr and ends the program. For errors caused by failed system calls or library
  calls, system_error generates part of the error message from the value of errno
  (see Section 2.2.3, “Error Codes from System Calls,” in Chapter 2, “Writing
  Good GNU/Linux Software”).
• get_self_executable_directory determines the directory containing the executable
  file being run in the current process. The directory path can be used to
  locate other components of the program, which are installed in the same place
  at runtime. This function works by examining the symbolic link /proc/self/exe
  in the /proc file system (see Section 7.2.1, “/proc/self,” in Chapter 7, “The
  /proc File System”).
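The two ideas above, an allocator that aborts on failure and a lookup through /proc/self/exe, might be sketched like this. The _sketch suffixes mark these as illustrative stand-ins, not the functions defined in the listing; the 1024-byte buffer mirrors the link_target declaration shown earlier.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the POSIX declaration of readlink. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Like malloc, but abort the program if memory cannot be allocated.
   (Illustrative stand-in, not the xmalloc of the listing.)  */
void* xmalloc_sketch (size_t size)
{
  void* ptr = malloc (size);
  /* malloc may legitimately return NULL for a zero-byte request. */
  if (ptr == NULL && size != 0) {
    fprintf (stderr, "out of memory\n");
    abort ();
  }
  return ptr;
}

/* Return the directory containing the running executable, in a buffer
   that the caller must deallocate with free, or NULL if it cannot be
   determined.  */
char* self_executable_directory_sketch (void)
{
  char link_target[1024];
  char* last_slash;
  char* result;
  size_t result_length;
  ssize_t rval;

  /* readlink does not NUL-terminate, so leave room for the NUL. */
  rval = readlink ("/proc/self/exe", link_target, sizeof (link_target) - 1);
  if (rval == -1)
    return NULL;
  assert ((size_t) rval < sizeof (link_target));
  link_target[rval] = '\0';
  /* The directory is everything before the last slash. */
  last_slash = strrchr (link_target, '/');
  if (last_slash == NULL || last_slash == link_target)
    return NULL;
  result_length = (size_t) (last_slash - link_target);
  result = (char*) xmalloc_sketch (result_length + 1);
  memcpy (result, link_target, result_length);
  result[result_length] = '\0';
  return result;
}
```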
#include "server.h"
char* module_dir;
/* Construct the full path of the module shared library we’ll try to
load. */
module_path =
(char*) xmalloc (strlen (module_dir) + strlen (module_name) + 2);
sprintf (module_path, "%s/%s", module_dir, module_name);
Each module is a shared library file (see Section 2.3.2, “Shared Libraries,” in Chapter
2) and must define and export a function named module_generate. This function generates
an HTML Web page and writes it to the client socket file descriptor passed as
its argument.
module.c contains two functions:
• module_open attempts to load a server module with a given name. The name
  normally ends with the .so extension because server modules are implemented
  as shared libraries. This function opens the shared library with dlopen and
  resolves a symbol named module_generate from the library with dlsym (see
  Section 2.3.6, “Dynamic Loading and Unloading,” in Chapter 2). If the library
  can’t be opened, or if module_generate isn’t a name exported by the library, the
  call fails and module_open returns a null pointer. Otherwise, it allocates and
  returns a module object.
• module_close closes the shared library corresponding to the server module and
  deallocates the struct server_module object.
module.c also defines a global variable module_dir. This is the path of the directory in
which module_open attempts to find shared libraries corresponding to server modules.
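The dlopen and dlsym steps described above might be sketched as follows. This is a simplified illustration, not the module_open of the listing: the function name, the typedef, and the choice of RTLD_NOW are assumptions. Depending on your C library, you may need to link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* The assumed type of a module's page-generation function: it takes
   the client socket file descriptor and returns nothing.  */
typedef void (*generate_function_t) (int);

/* Open the shared library MODULE_PATH and look up module_generate in
   it.  On success, store the library handle in *HANDLE_OUT (so the
   caller can pass it to dlclose later) and return the function.  On
   failure, return NULL and leave *HANDLE_OUT unchanged.
   (load_generate_function is a hypothetical name.)  */
generate_function_t load_generate_function (const char* module_path,
                                            void** handle_out)
{
  void* handle;
  generate_function_t function;

  assert (module_path != NULL && handle_out != NULL);
  /* RTLD_NOW resolves all of the library's symbols immediately. */
  handle = dlopen (module_path, RTLD_NOW);
  if (handle == NULL)
    return NULL;
  function = (generate_function_t) dlsym (handle, "module_generate");
  if (function == NULL) {
    /* The library doesn't export the symbol; it isn't a module. */
    dlclose (handle);
    return NULL;
  }
  *handle_out = handle;
  return function;
}
```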
#include "server.h"
/* Process an HTTP "GET" request for PAGE, and send the results to the
file descriptor CONNECTION_FD. */
/* Make sure the requested page begins with a slash and does not
contain any additional slashes -- we don't support any
subdirectories. */
if (*page == '/' && strchr (page + 1, '/') == NULL) {
char module_file_name[64];
/* The page name looks OK. Construct the module name by appending
".so" to the page name. */
snprintf (module_file_name, sizeof (module_file_name),
"%s.so", page + 1);
/* Try to open the module. */
module = module_open (module_file_name);
}
if (module == NULL) {
/* Either the requested page was malformed, or we couldn’t open a
module with the indicated name. Either way, return the HTTP
response 404, Not Found. */
/* Send the HTTP response indicating success, and the HTTP header
for an HTML page. */
write (connection_fd, ok_response, strlen (ok_response));
/* Invoke the module, which will generate HTML output and send it
to the client file descriptor. */
(*module->generate_function) (connection_fd);
/* We’re done with the module. */
module_close (module);
}
}
if (verbose) {
/* In verbose mode, display the local address and port number
we’re listening on. */
socklen_t address_length;
2. Your computer might be configured to include such interfaces as eth0, an Ethernet card;
lo, the local (loopback) network; or ppp0, a dial-up network connection.
#include "server.h"
/* Set defaults for options. Bind the server to all local addresses,
and assign an unused port automatically. */
local_address.s_addr = INADDR_ANY;
port = 0;
/* Don’t print verbose messages. */
verbose = 0;
/* Load modules from the directory containing this executable. */
module_dir = get_self_executable_directory ();
assert (module_dir != NULL);
/* Parse options. */
do {
next_option =
getopt_long (argc, argv, short_options, long_options, NULL);
switch (next_option) {
case 'a':
/* User specified -a or --address. */
{
struct hostent* local_host_name;
case 'h':
/* User specified -h or --help. */
print_usage (0);
case 'm':
/* User specified -m or --module-dir. */
{
struct stat dir_info;
case 'p':
/* User specified -p or --port. */
{
long value;
char* end;
case 'v':
/* User specified -v or --verbose. */
verbose = 1;
break;
case -1:
/* Done with options. */
break;
default:
abort ();
}
} while (next_option != -1);
return 0;
}
The default value for the directory from which to load server modules
is the directory containing the server executable, as determined by
get_self_executable_directory. The user may override this with the
--module-dir (-m) option; main makes sure that the specified directory is
accessible.
By default, verbose messages are not printed. The user may enable them by
specifying the --verbose (-v) option.
• If the user specifies the --help (-h) option or specifies invalid options, main
  invokes print_usage, which prints a usage summary and exits.
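The option handling described here can be sketched in a self-contained form. The structure server_options, the function parse_options, and its return-value convention are hypothetical; the option letters and long names are those used by the server. Unlike main in the listing, this sketch only records the option values and leaves validation of the address and directory to the caller.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Values collected from the command line.  (Hypothetical structure.)  */
struct server_options {
  const char* address;     /* NULL means bind to all local addresses. */
  const char* module_dir;  /* NULL means use the executable directory. */
  long port;               /* 0 means let the system choose a port. */
  int verbose;
};

/* Parse ARGC and ARGV into OPTIONS.  Returns 0 on success, 1 if the
   user asked for help, and -1 if the options are invalid.  */
int parse_options (int argc, char* const argv[],
                   struct server_options* options)
{
  static const struct option long_options[] = {
    { "address",    required_argument, NULL, 'a' },
    { "help",       no_argument,       NULL, 'h' },
    { "module-dir", required_argument, NULL, 'm' },
    { "port",       required_argument, NULL, 'p' },
    { "verbose",    no_argument,       NULL, 'v' },
    { NULL,         0,                 NULL, 0   }
  };
  static const char* const short_options = "a:hm:p:v";
  int next_option;

  assert (argv != NULL && options != NULL);
  options->address = NULL;
  options->module_dir = NULL;
  options->port = 0;
  options->verbose = 0;
  /* Start scanning at the first argument, even on a repeated call. */
  optind = 1;
  while ((next_option = getopt_long (argc, argv, short_options,
                                     long_options, NULL)) != -1) {
    switch (next_option) {
    case 'a':
      options->address = optarg;
      break;
    case 'h':
      return 1;
    case 'm':
      options->module_dir = optarg;
      break;
    case 'p': {
      char* end;
      long value = strtol (optarg, &end, 10);
      /* Reject empty input, trailing junk, and out-of-range ports. */
      if (end == optarg || *end != '\0' || value < 0 || value > 65535)
        return -1;
      options->port = value;
      break;
    }
    case 'v':
      options->verbose = 1;
      break;
    default:
      /* getopt_long has already printed a message for the user. */
      return -1;
    }
  }
  return 0;
}
```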
11.3 Modules
We provide four modules to demonstrate the kind of functionality you could implement
using this server implementation. Implementing your own server module is as
simple as defining a module_generate function to return the appropriate HTML text.
#include "server.h"
This module uses standard C library I/O routines for convenience. The fdopen call
generates a stream pointer (FILE*) corresponding to the client socket file descriptor
(see Section B.4, “Relation to Standard C Library I/O Functions,” in Appendix B,
“Low-Level I/O”). The module writes to it using fprintf and flushes it using fflush
to prevent the loss of buffered data when the socket is closed.
The HTML page returned by the time.so module includes a <meta> element in
the page header that instructs clients to reload the page every 5 seconds. This way the
client displays the current time.
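A page generator along these lines might be sketched as follows. It is not the time.c listing: the function name write_time_page and the exact HTML are assumptions, and this sketch duplicates the descriptor with dup so that closing the stream with fclose (which also flushes it) leaves the caller’s descriptor open.

```c
#define _POSIX_C_SOURCE 200809L  /* Ask for the POSIX declaration of fdopen. */
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Write a small HTML page showing the current time to FD.  Returns 0
   on success and -1 on failure.  (write_time_page is a hypothetical
   name used only for this sketch.)  */
int write_time_page (int fd)
{
  struct timeval tv;
  struct tm* ptm;
  time_t now;
  char time_string[40];
  FILE* fp;
  int copy;

  assert (fd >= 0);
  /* Obtain the time of day and format it as hours, minutes, seconds. */
  gettimeofday (&tv, NULL);
  now = tv.tv_sec;
  ptm = localtime (&now);
  if (ptm == NULL)
    return -1;
  strftime (time_string, sizeof (time_string), "%H:%M:%S", ptm);

  /* Wrap a duplicate of FD in a stream so that fclose doesn't close
     the caller's descriptor. */
  copy = dup (fd);
  if (copy == -1)
    return -1;
  fp = fdopen (copy, "w");
  if (fp == NULL) {
    close (copy);
    return -1;
  }
  fprintf (fp,
           "<html>\n"
           " <head>\n"
           "  <meta http-equiv=\"refresh\" content=\"5\">\n"
           " </head>\n"
           " <body>\n"
           "  <p>The current time is %s.</p>\n"
           " </body>\n"
           "</html>\n",
           time_string);
  /* fclose flushes any buffered output before closing the stream. */
  return fclose (fp) == 0 ? 0 : -1;
}
```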
#include "server.h"
" <pre>\n";
/* HTML source for the page indicating there was a problem opening
/etc/issue. */
/* Open /etc/issue. */
input_fd = open ("/etc/issue", O_RDONLY);
if (input_fd == -1)
system_error ("open");
/* Obtain file information about it. */
rval = fstat (input_fd, &file_info);
if (rval == -1)
/* Either we couldn’t open the file or we couldn’t read from it. */
write (fd, error_page, strlen (error_page));
else {
int rval;
off_t offset = 0;
close (input_fd);
}
The module first tries to open /etc/issue. If that file can’t be opened, the module
sends an error page to the client. Otherwise, the module sends the start of the
HTML page, contained in page_start. Then it sends the contents of /etc/issue using
sendfile (see Section 8.12, “sendfile: Fast Data Transfers,” in Chapter 8). Finally, it
sends the end of the HTML page, contained in page_end.
You can easily adapt this module to send the contents of another file. If the file
contains a complete HTML page, simply omit the code that sends the contents of
page_start and page_end. You could also adapt the main server implementation to
serve static files, in the manner of a traditional Web server. Using sendfile provides an
extra degree of efficiency.
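Sending an arbitrary file with sendfile might be sketched like this. The function name send_whole_file is hypothetical, and the loop that repeats the call until the whole file has been sent is an addition for robustness rather than something taken from the listing.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the entire contents of the file PATH to the descriptor OUT_FD.
   Returns the number of bytes sent, or -1 on failure.
   (send_whole_file is a hypothetical name.)  */
ssize_t send_whole_file (int out_fd, const char* path)
{
  struct stat file_info;
  off_t offset = 0;
  ssize_t total = 0;
  int input_fd;

  assert (path != NULL);
  input_fd = open (path, O_RDONLY);
  if (input_fd == -1)
    return -1;
  /* fstat tells us how many bytes there are to send. */
  if (fstat (input_fd, &file_info) == -1) {
    close (input_fd);
    return -1;
  }
  /* sendfile advances OFFSET by the number of bytes it transferred. */
  while (offset < file_info.st_size) {
    ssize_t sent = sendfile (out_fd, input_fd, &offset,
                             (size_t) (file_info.st_size - offset));
    if (sent <= 0) {
      close (input_fd);
      return -1;
    }
    total += sent;
  }
  close (input_fd);
  return total;
}
```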
Listing 11.8 (diskfree.c) Server Module to Display Information About Free Disk
Space
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include "server.h"
While issue.so sends the contents of a file using sendfile, this module must invoke a
command and redirect its output to the client. To do this, the module follows these
steps:
1. First, the module creates a child process using fork (see Section 3.2.2, “Using
fork and exec,” in Chapter 3).
2. The child process copies the client socket file descriptor to file descriptors
STDOUT_FILENO and STDERR_FILENO, which correspond to standard output and
standard error (see Section 2.1.4, “Standard I/O,” in Chapter 2). The file descriptors
are copied using the dup2 call (see Section 5.4.3, “Redirecting the Standard
Input, Output, and Error Streams,” in Chapter 5). All further output from the
process to either of these streams is sent to the client socket.
3. The child process invokes the df command with the -h option by calling execv
(see Section 3.2.2, “Using fork and exec,” in Chapter 3).
4. The parent process waits for the child process to exit by calling waitpid (see
Section 3.4.2, “The wait System Calls,” in Chapter 3).
You could easily adapt this module to invoke a different command and redirect its
output to the client.
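The four steps above can be sketched as one general-purpose function. The name run_with_output_to and its interface are hypothetical; the caller supplies the full path of the program and its argument list.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run PROGRAM with the argument list ARGV, sending its standard output
   and standard error to FD.  Returns the exit code of the program, or
   -1 if it could not be run or did not exit normally.
   (run_with_output_to is a hypothetical name.)  */
int run_with_output_to (int fd, const char* program, char* const argv[])
{
  pid_t child_pid;
  int status;

  assert (program != NULL && argv != NULL);
  /* Step 1: create the child process. */
  child_pid = fork ();
  if (child_pid == -1)
    return -1;
  if (child_pid == 0) {
    /* Step 2: make standard output and standard error refer to FD. */
    if (dup2 (fd, STDOUT_FILENO) == -1 || dup2 (fd, STDERR_FILENO) == -1)
      _exit (127);
    /* Step 3: replace the child with the requested program. */
    execv (program, argv);
    /* execv returns only if an error occurred. */
    _exit (127);
  }
  /* Step 4: the parent waits for the child to finish. */
  if (waitpid (child_pid, &status, 0) == -1 || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```

To reproduce the behavior of the diskfree module, you would pass the client socket as FD, the path of df on your system as PROGRAM, and an argument list containing "df", "-h", and a terminating null pointer.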
#include "server.h"
/* Set *UID and *GID to the owning user ID and group ID, respectively,
of process PID. Return 0 on success, nonzero on failure. */
/* Return the name of user UID. The return value is a buffer that the
caller must deallocate with free. UID must be a valid user ID. */
/* Return the name of group GID. The return value is a buffer that the
caller must deallocate with free. GID must be a valid group ID. */
/* The program name is the second element of the file contents and is
surrounded by parentheses. Find the positions of the parentheses
in the file contents. */
open_paren = strchr (status_info, '(');
close_paren = strchr (status_info, ')');
if (open_paren == NULL
|| close_paren == NULL
|| close_paren < open_paren)
/* Couldn’t find them; bail. */
return NULL;
/* Allocate memory for the result. */
result = (char*) xmalloc (close_paren - open_paren);
/* Copy the program name into the result. */
strncpy (result, open_paren + 1, close_paren - open_paren - 1);
/* strncpy doesn’t NUL-terminate the result, so do it here. */
result[close_paren - open_paren - 1] = '\0';
/* All done. */
return result;
}
/* Generate an HTML table row for process PID. The return value is a
pointer to a buffer that the caller must deallocate with free, or
NULL if an error occurs. */
/* Compute the length of the string we’ll need to hold the result, and
allocate memory to hold it. */
result_length = strlen (program_name)
+ strlen (user_name) + strlen (group_name) + 128;
result = (char*) xmalloc (result_length);
/* Format the result. */
snprintf (result, result_length,
"<tr><td align=\"right\">%d</td><td><tt>%s</tt></td><td>%s</td>"
"<td>%s</td><td align=\"right\">%d</td></tr>\n",
(int) pid, program_name, user_name, group_name, rss);
/* Clean up. */
free (program_name);
free (user_name);
free (group_name);
/* All done. */
return result;
}
" <th>RSS (KB)</th>\n"
" </tr>\n"
" </thead>\n"
" <tbody>\n";
/* The first buffer is the HTML source for the start of the page. */
vec[vec_length].iov_base = page_start;
vec[vec_length].iov_len = strlen (page_start);
++vec_length;
/* Make sure the iovec array is long enough to hold this buffer
(plus one more because we’ll add an extra element when we’re done
listing processes). If not, grow it to twice its current size. */
if (vec_length == vec_size - 1) {
vec_size *= 2;
vec = xrealloc (vec, vec_size * sizeof (struct iovec));
}
/* Store this buffer as the next element of the array. */
vec[vec_length].iov_base = process_info;
vec[vec_length].iov_len = strlen (process_info);
++vec_length;
}
/* Add one last buffer with HTML that ends the page. */
vec[vec_length].iov_base = page_end;
vec[vec_length].iov_len = strlen (page_end);
++vec_length;
/* Output the entire page to the client file descriptor all at once. */
writev (fd, vec, vec_length);
/* Deallocate the buffers we created. The first and last are static
and should not be deallocated. */
for (i = 1; i < vec_length - 1; ++i)
free (vec[i].iov_base);
/* Deallocate the iovec array. */
free (vec);
}
Gathering process data and formatting it as an HTML table is broken down into
several simpler operations:
• get_uid_gid extracts the IDs of the owning user and group of a process. To do
  this, the function invokes stat (see Section B.2, “stat,” in Appendix B) on the
  process’s subdirectory in /proc (see Section 7.2, “Process Entries,” in Chapter 7).
  The user and group that own this directory are identical to the process’s owning
  user and group.
• get_user_name returns the username corresponding to a UID. This function
  simply calls the C library function getpwuid, which consults the system’s
  /etc/passwd file and returns a copy of the result. get_group_name returns the
  group name corresponding to a GID. It uses the getgrgid call.
• get_program_name returns the name of the program running in a specified
  process. This information is extracted from the stat entry in the process’s directory
  under /proc (see Section 7.2, “Process Entries,” in Chapter 7). We use this
  entry rather than examining the exe symbolic link (see Section 7.2.4, “Process
  Executable,” in Chapter 7) or cmdline entry (see Section 7.2.2, “Process
  Argument List,” in Chapter 7) because the latter two are inaccessible if the
  process running the server isn’t owned by the same user as the process being
  examined. Also, reading from stat doesn’t force Linux to page the process under
  examination back into memory, if it happens to be swapped out.
• get_rss returns the resident set size of a process. This information is available as
  the second element in the contents of the process’s statm entry (see Section
  7.2.6, “Process Memory Statistics,” in Chapter 7) in its /proc subdirectory.
• format_process_info generates a string containing HTML elements for a
  single table row, representing a single process. After calling the functions listed
  previously to obtain this information, it allocates a buffer and generates HTML
  using snprintf.
• module_generate generates the entire HTML page, including the table. The
  output consists of one string containing the start of the page and the table (in
  page_start), one string for each table row (generated by format_process_info),
  and one string containing the end of the table and the page (in page_end).
  module_generate determines the PIDs of the processes running on the system
  by examining the contents of /proc. It obtains a listing of this directory using
  opendir and readdir (see Section B.6, “Reading Directory Contents,” in
  Appendix B). It scans the contents, looking for entries whose names are composed
  entirely of digits; these are taken to be process entries.
  Potentially a large number of strings must be written to the client socket: one
  each for the page start and end, plus one for each process. If we were to write
  each string to the client socket file descriptor with a separate call to write, this
  would generate unnecessary network traffic because each string may be sent in a
  separate network packet.
  To optimize packing of data into packets, we use a single call to writev instead
  (see Section B.3, “Vector Reads and Writes,” in Appendix B). To do this, we
  must construct an array of struct iovec objects, vec. However, because we do
  not know the number of processes beforehand, we must start with a small array
  and expand it as new processes are added. The variable vec_length contains the
  number of elements of vec that are used, while vec_size contains the allocated
  size of vec. When vec_length is about to exceed vec_size, we expand vec to
  twice its size by calling xrealloc. When we’re done with the vector write, we
  must deallocate all of the dynamically allocated strings pointed to by vec, and
  then vec itself.
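The grow-by-doubling array and the gathered write might be sketched in isolation like this. The structure row_list and both function names are hypothetical. One deliberate difference from the description above: because writev accepts only a limited number of buffers per call, this sketch asks sysconf for that limit and issues several calls when the list is longer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* A growable array of buffers to write.  (Hypothetical structure.)  */
struct row_list {
  struct iovec* vec;  /* The array itself, or NULL when empty. */
  size_t length;      /* Number of elements in use. */
  size_t size;        /* Number of elements allocated. */
};

/* Append STRING to LIST without copying it, doubling the array when it
   is full.  Returns 0 on success, -1 if memory runs out.  */
int row_list_append (struct row_list* list, char* string)
{
  assert (list != NULL && string != NULL);
  if (list->length == list->size) {
    size_t new_size = list->size == 0 ? 16 : list->size * 2;
    struct iovec* new_vec =
      (struct iovec*) realloc (list->vec, new_size * sizeof (struct iovec));
    if (new_vec == NULL)
      return -1;
    list->vec = new_vec;
    list->size = new_size;
  }
  list->vec[list->length].iov_base = string;
  list->vec[list->length].iov_len = strlen (string);
  ++list->length;
  return 0;
}

/* Write every buffer in LIST to FD, using as few writev calls as the
   system's per-call limit allows.  Returns the number of bytes
   written, or -1 on error.  A short write is not retried here.  */
ssize_t row_list_write (int fd, const struct row_list* list)
{
  long max_per_call = sysconf (_SC_IOV_MAX);
  size_t done = 0;
  ssize_t total = 0;

  assert (list != NULL);
  if (max_per_call <= 0)
    max_per_call = 16;  /* Conservative fallback if the limit is unknown. */
  while (done < list->length) {
    size_t count = list->length - done;
    ssize_t written;
    if (count > (size_t) max_per_call)
      count = (size_t) max_per_call;
    written = writev (fd, list->vec + done, (int) count);
    if (written == -1)
      return -1;
    total += written;
    done += count;
  }
  return total;
}
```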
Listing 11.10 (Makefile) GNU Make Configuration File for Server Example
### Configuration. ####################################################
# Phony targets don’t correspond to files that are built; they’re names
# for conceptual build targets.
.PHONY: all clean
# All object files in the server depend on server.h. But use the
# default rule for building object files from source files.
$(OBJECTS): server.h
This builds the server program and the server module shared libraries.
% ls -l server *.so
-rwxr-xr-x 1 samuel samuel 25769 Mar 11 01:15 diskfree.so
-rwxr-xr-x 1 samuel samuel 31184 Mar 11 01:15 issue.so
-rwxr-xr-x 1 samuel samuel 41579 Mar 11 01:15 processes.so
-rwxr-xr-x 1 samuel samuel 71758 Mar 11 01:15 server
-rwxr-xr-x 1 samuel samuel 13980 Mar 11 01:15 time.so
The server is now running. Open a browser window, and attempt to contact the server
at this port number. Request a page whose name matches one of the modules. For
instance, to invoke the diskfree.so module, use this URL:
http://localhost:4000/diskfree
Instead of 4000, enter the port number you specified (or the port number that Linux
chose for you). Press Ctrl+C to kill the server when you’re done.
If you didn’t specify localhost as the server address, you can also connect to the
server with a Web browser running on another computer by using your computer’s
hostname in the URL—for example:
http://host.domain.com:4000/diskfree
If you specify the --verbose (-v) option, the server prints some information at startup
and displays the numerical Internet address of each client that connects to it. If you
connect via the localhost address, the client address will always be 127.0.0.1.
If you experiment with writing your own server modules, you may place them in a
different directory than the one containing the server executable. In this case, specify
that directory with the --module-dir (-m) option. The server will look in this directory
for server modules instead.
If you forget the syntax of the command-line options, invoke server with the
--help (-h) option.
% ./server --help
Usage: ./server [ options ]
  -a, --address ADDR        Bind to local address (by default, bind
                            to all local addresses).
  -h, --help                Print this information.
  -m, --module-dir DIR      Load modules from specified directory
                            (by default, use executable directory).
  -p, --port PORT           Bind to specified port.
  -v, --verbose             Print verbose messages.
11.5 Finishing Up
If you were really planning on releasing this program for general use, you’d need to
write documentation for it as well. Many people don’t realize that writing good
documentation is just as difficult and time-consuming, and just as important, as writing
good software. However, software documentation is a subject for another book, so
we’ll leave you with a few pointers on where to learn more about documenting
GNU/Linux software.
You’d probably want to write a man page for the server program, for instance. This
is the first place many users will look for information about a program. Man pages are
formatted using a classic UNIX formatting system called troff. To view the man page for
troff, which describes the format of troff files, invoke the following:
% man troff
To learn about how GNU/Linux locates man pages, consult the man page for the man
command itself by invoking this:
% man man
You might also want to write info pages, using the GNU Info system, for the server
and its modules. Naturally, documentation about the info system comes in info format;
to view it, invoke this line:
% info info
III
Appendixes
A
Other Development Tools
Using various command options, you can cause GCC to issue warnings about
many different types of questionable programming constructs. The -Wall option
enables most of these checks. For example, the compiler will produce a warning
about a comment that begins within another comment, about an incorrect return type
specified for main, and about a non-void function omitting a return statement. If you
specify the -pedantic option, GCC emits warnings demanded by strict ANSI C and
ISO C++ compliance. For example, use of the GNU asm extension causes a warning
when this option is used. A few GNU extensions, such as using alternate keywords beginning
with __ (two underscores), will not trigger warning messages. Although the GCC
info pages deprecate use of this option, we recommend that you use it anyway and
avoid most GNU language extensions because GCC extensions tend to change
through time and frequently interact poorly with code optimization.
Consider compiling the “Hello World” program shown in Listing A.1. Though GCC
compiles the program without complaint, the source code does not obey ANSI C
rules. If you enable warnings by compiling with the -Wall and -pedantic options, GCC
reveals three questionable constructs.
% gcc -Wall -pedantic hello.c
hello.c:2: warning: return type defaults to ‘int’
hello.c: In function ‘main’:
hello.c:3: warning: implicit declaration of function ‘printf’
hello.c:4: warning: control reaches end of non-void function
In the sections that follow, we first describe the two tools that are easier to use,
malloc checking and mtrace, and then ccmalloc and Electric Fence.
export turns on malloc checking. Specifying the value 2 causes the program to halt as
soon as an error is detected.
Using malloc checking is advantageous because the program need not be recompiled,
but its capability to diagnose errors is limited. Basically, it checks that the allocator
data structures have not been corrupted. Thus, it can detect double deallocation of
the same allocation. Also, writing just before the beginning of a memory allocation
can usually be detected because the allocator stores the size of each memory allocation
just before the allocated region. Thus, writing just before the allocated memory will
corrupt this number. Unfortunately, consistency checking can occur only when your
program calls allocation routines, not when it accesses memory, so many illegal reads
and writes can occur before an error is detected. In the previous example, the illegal
write was detected only when the allocated memory was deallocated.
1. Modify the source code to include <mcheck.h> and to invoke mtrace () as soon
as the program starts, at the beginning of main. The call to mtrace turns on
tracking of memory allocations and deallocations.
2. Specify the name of a file to store information about all memory allocations and
deallocations:
% export MALLOC_TRACE=memory.log
3. Run the program. All memory allocations and deallocations are stored in the
logging file.
4. Using the mtrace command, analyze the memory allocations and deallocations
to ensure that they match.
% mtrace my_program $MALLOC_TRACE
The messages produced by mtrace are relatively easy to understand. For example, for
our malloc-use example, the output would look like this:
- 0000000000 Free 3 was never alloc’d malloc-use.c:39
Execute the program to produce a report. For example, running our malloc-use program
to allocate but not deallocate memory produces the following report:
% ./ccmalloc-use 12
file-name=a.out does not contain valid symbols
trying to find executable in current directory ...
using symbols from ‘ccmalloc-use’
(to speed up this search specify ‘file ccmalloc-use’
in the startup file ‘.ccmalloc’)
Please enter a command: a 0 12
Please enter a command: q
.---------------.
|ccmalloc report|
========================================================
| total # of| allocated | deallocated | garbage |
+-----------+-------------+-------------+---------------+
| bytes| 60 | 48 | 12 |
+-----------+-------------+-------------+---------------+
|allocations| 2 | 1 | 1 |
+-------------------------------------------------------+
| number of checks: 1 |
| number of counts: 3 |
| retrieving function names for addresses ... done. |
| reading file info from gdb ... done. |
| sorting by number of not reclaimed bytes ... done. |
| number of call chains: 1 |
| number of ignored call chains: 0 |
| number of reported call chains: 1 |
| number of internal call chains: 1 |
| number of library call chains: 0 |
========================================================
|
*100.0% = 12 Bytes of garbage allocated in 1 allocation
| |
| | 0x400389cb in <???>
| |
| | 0x08049198 in <main>
| | at malloc-use.c:89
| |
| | 0x08048fdc in <allocate>
| | at malloc-use.c:30
| |
| ‘-----> 0x08049647 in <malloc>
| at src/wrapper.c:284
|
‘------------------------------------------------------
The last few lines indicate the chain of function calls that allocated memory that was
not deallocated.
To use ccmalloc to diagnose writes before the beginning or after the end of the
allocated region, you’ll have to modify the .ccmalloc file in the current directory. This
file is read when the program starts execution.
As with ccmalloc, your program’s object files must be linked with Electric Fence’s
library by appending -lefence to the linking command, for instance:
% gcc -g -Wall -pedantic malloc-use.o -o emalloc-use -lefence
As the program runs, allocated memory uses are checked for correctness. A violation
causes a segmentation fault:
% ./emalloc-use 12
Electric Fence 2.0.5 Copyright (C) 1987-1998 Bruce Perens.
Please enter a command: a 0 12
Please enter a command: r 0 12
Segmentation fault
Using a debugger, you can determine the context of the illegal action.
By default, Electric Fence diagnoses only accesses beyond the ends of allocations. To
find accesses before the beginning of allocations instead of accesses beyond the end of
allocations, set this environment variable:
% export EF_PROTECT_BELOW=1
Electric Fence to find illegal memory accesses. This will eliminate almost all memory
errors. When using Electric Fence, you will need to be careful not to perform too
many allocations and deallocations because each allocation requires at least two pages
of memory. Using these two tools will reveal most memory errors.
The user is responsible for obeying (or disobeying) the rules on dynamic
memory use. */
#ifdef MTRACE
#include <mcheck.h>
#endif /* MTRACE */
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
/* Deallocate memory. */
#ifdef MTRACE
mtrace ();
#endif /* MTRACE */
  if (argc != 2) {
    fprintf (stderr, "%s: array-size\n", argv[0]);
    return 1;
  }
    case 'a':
      fgets (command, sizeof (command), stdin);
      if (sscanf (command, "%u %i", &array_index, &size_or_position) == 2
          && array_index < array_size)
    case 'd':
      fgets (command, sizeof (command), stdin);
      if (sscanf (command, "%u", &array_index) == 1
          && array_index < array_size)
        deallocate (&(array[array_index]));
      else
        error = 1;
      break;
    case 'r':
      fgets (command, sizeof (command), stdin);
      if (sscanf (command, "%u %i", &array_index, &size_or_position) == 2
          && array_index < array_size)
        read_from_memory (array[array_index], size_or_position);
      else
        error = 1;
      break;
    case 'w':
      fgets (command, sizeof (command), stdin);
      if (sscanf (command, "%u %i", &array_index, &size_or_position) == 2
          && array_index < array_size)
        write_to_memory (array[array_index], size_or_position);
      else
        error = 1;
      break;
    case 'q':
      free ((void *) array);
      return 0;
    default:
      error = 1;
    }
}
A.3 Profiling
Now that your program is (hopefully) correct, we turn to speeding its execution.
Using the profiler gprof, you can determine which functions require the most execution
time. This can help you determine which parts of the program to optimize or
rewrite to execute more quickly. It can also help you find errors. For example, you
may find that a particular function is called many more times than you expect.
In this section, we describe how to use gprof. Rewriting code to run more quickly
requires creativity and careful choice of algorithms.
Obtaining profiling information requires three steps:
1. Compile and link your program to enable profiling.
2. Execute your program to generate profiling data.
3. Use gprof to analyze and display the profiling data.
Before we illustrate these steps, we introduce a large enough program to make
profiling interesting.
1. In postfix notation, a binary operator is placed after its operands instead of between them.
So, for example, to multiply 6 and 8, you would use 6 8 ×. To multiply 6 and 8 and then add 5
to the result, you would use 6 8 × 5 +.
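To make the notation concrete, here is a minimal sketch of a postfix evaluator. It is not the calculator from the listings in this appendix: it works on ordinary long integers rather than unary numbers, uses the ASCII character * in place of ×, ignores overflow, and the name eval_postfix is our own.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_STACK 64

/* Evaluate the postfix expression EXPR, which may contain nonnegative
   integers and the operators +, -, and *.  On success, store the value
   in *RESULT and return nonzero; return 0 if EXPR is malformed.  */
int eval_postfix (const char* expr, long* result)
{
  long stack[MAX_STACK];
  int depth = 0;
  const char* p = expr;
  assert (expr != NULL && result != NULL);
  while (*p != '\0') {
    if (isspace ((unsigned char) *p))
      ++p;
    else if (isdigit ((unsigned char) *p)) {
      /* An operand: push it on the stack.  */
      char* end;
      if (depth == MAX_STACK)
        return 0;
      stack[depth++] = strtol (p, &end, 10);
      p = end;
    }
    else if (*p == '+' || *p == '-' || *p == '*') {
      /* An operator: pop its two operands and push the answer.  */
      long left, right;
      if (depth < 2)
        return 0;
      right = stack[--depth];
      left = stack[--depth];
      if (*p == '+')
        stack[depth++] = left + right;
      else if (*p == '-')
        stack[depth++] = left - right;
      else
        stack[depth++] = left * right;
      ++p;
    }
    else
      return 0;
  }
  if (depth != 1)
    return 0;
  *result = stack[0];
  return 1;
}
```

With this sketch, the expression "6 8 * 5 +" evaluates to 53: the multiplication leaves 48 on the stack, and the addition then combines it with 5.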
This enables collecting information about function calls and timing information. To
collect line-by-line use information, also specify the debugging flag -g. To count basic
block executions, such as the number of do-loop iterations, use -a.
The second step is to run the program. While it is running, profiling data is collected
into a file named gmon.out, only for those portions of the code that are exercised.
You must vary the program’s input or commands to exercise the code sections
that you want to profile. The program must terminate normally for the profiling file to
be written.
Computing the function decrement_number and all the functions it calls required
26.07% of the program’s total execution time. It was called 20,795,463 times. Each
individual execution required 0.0 seconds, that is, a time too small to measure. The
add function was invoked 1,787 times, presumably to compute the product. Each call
required 0.92 milliseconds. The copy_number function was invoked only 1,788 times, while
it and the functions it calls required only 0.15% of the total execution time.
Sometimes the mcount and profil functions used by profiling appear in the data.
In addition to the flat profile data, which indicates the total time spent within each
function, gprof produces call graph data showing the time spent in each function and
its children within the context of a function call chain:
index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00    6.75                 main [1]
                0.00    6.75       2/2           apply_binary_function [2]
                0.00    0.00       1/1792        destroy_number [4]
                0.00    0.00       1/1           number_to_unsigned_int [10]
                0.00    0.00       3/3           string_to_number [12]
                0.00    0.00       3/5           push_stack [16]
                0.00    0.00       1/1           create_stack [18]
                0.00    0.00       1/11          empty_stack [14]
                0.00    0.00       1/5           pop_stack [15]
                0.00    0.00       1/1           clear_stack [17]
-----------------------------------------------
                0.00    6.75       2/2           main [1]
[2]    100.0    0.00    6.75       2         apply_binary_function [2]
                0.00    6.74       1/1           product [3]
                0.00    0.01       4/1792        destroy_number [4]
                0.00    0.00       1/1           subtract [11]
                0.00    0.00       4/11          empty_stack [14]
                0.00    0.00       4/5           pop_stack [15]
                0.00    0.00       2/5           push_stack [16]
-----------------------------------------------
                0.00    6.74       1/1           apply_binary_function [2]
[3]     99.8    0.00    6.74       1         product [3]
                1.02    2.65    1787/1792        destroy_number [4]
                1.65    1.43    1787/1787        add [5]
                0.00    0.00    1788/62413059    zerop [7]
                0.00    0.00       1/1792        make_zero [13]
The first frame shows that executing main and its children required 100% of the program’s
6.75 seconds. It called apply_binary_function twice, which was called a total
of two times throughout the entire program. Its caller was <spontaneous>; this indicates
that the profiler was not capable of determining who called main. This first frame
also shows that main called push_stack three times, although push_stack was called five
times throughout the program. The third frame shows that executing product and the
functions it calls required 99.8% of the program’s total execution time. It was invoked
once by apply_binary_function.
The call graph data displays the total time spent executing a function and its children.
If the function call graph is a tree, this number is easy to compute, but recursively
defined functions must be treated specially. For example, the even function calls
odd, which calls even. Each largest such call cycle is given its own number and is displayed
individually in the call graph data. Consider this profiling data from determining
whether 1787 × 13 × 3 is even:
-----------------------------------------------
                0.00    0.02       1/1           main [1]
[9]      0.1    0.00    0.02       1         apply_unary_function [9]
                0.01    0.00       1/1           even <cycle 1> [13]
                0.00    0.00       1/1806        destroy_number [5]
                0.00    0.00       1/13          empty_stack [17]
                0.00    0.00       1/6           pop_stack [18]
                0.00    0.00       1/6           push_stack [19]
-----------------------------------------------
[10]     0.1    0.01    0.00       1+69693   <cycle 1 as a whole> [10]
                0.00    0.00   34847             even <cycle 1> [13]
-----------------------------------------------
                               34847             even <cycle 1> [13]
[11]     0.1    0.01    0.00   34847         odd <cycle 1> [11]
                0.00    0.00   34847/186997954   zerop [7]
                0.00    0.00       1/1806        make_zero [16]
                               34846             even <cycle 1> [13]
The 1+69693 in the [10] frame indicates that cycle 1 was called once, while the functions
in the cycle were called 69,693 times. The cycle called the even function. The
next entry shows that odd was called 34,847 times by even.
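The call cycle is easier to picture with the recursion written out. The following sketch is not the calculator's implementation: it uses plain unsigned integers instead of unary numbers, and the names is_even, is_odd, and count_cycle_calls, as well as the call counters, are ours.

```c
#include <assert.h>
#include <stddef.h>

/* Counters for how often each function in the cycle is entered.  */
unsigned long even_calls = 0;
unsigned long odd_calls = 0;

int is_odd (unsigned n);

/* N is even if it is zero or if N - 1 is odd.  */
int is_even (unsigned n)
{
  ++even_calls;
  if (n == 0)
    return 1;
  return is_odd (n - 1);
}

/* N is odd if it is nonzero and N - 1 is even.  */
int is_odd (unsigned n)
{
  ++odd_calls;
  if (n == 0)
    return 0;
  return is_even (n - 1);
}

/* Run is_even on N from a clean slate and report how many times each
   function in the cycle was entered.  */
void count_cycle_calls (unsigned n, unsigned long* even_count,
                        unsigned long* odd_count)
{
  assert (even_count != NULL && odd_count != NULL);
  even_calls = 0;
  odd_calls = 0;
  is_even (n);
  *even_count = even_calls;
  *odd_count = odd_calls;
}
```

Each call peels one off the argument, so testing n takes n + 1 calls in total. For n = 69,693 (that is, 1787 × 13 × 3), this scheme enters each function 34,847 times: one external call into the cycle plus 69,693 calls within it, which agrees with the 1+69693 entry above.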
In this section, we have briefly discussed only some of gprof’s features. Its info
pages contain information about other useful features:
• Use the -s option to sum the execution results from several different runs.
• Use the -c option to identify children that could have been called but were not.
• Use the -l option to display line-by-line profiling information.
• Use the -A option to display source code annotated with percentage execution
  numbers.
The info pages also provide more information about the interpretation of the
analyzed data.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "definitions.h"
/* Apply the binary function with operands obtained from the stack,
pushing the answer on the stack. Return nonzero upon success. */
/* Apply the unary function with an operand obtained from the stack,
pushing the answer on the stack. Return nonzero upon success. */
int main ()
{
char command_line[1000];
char* command_to_parse;
char* token;
Stack number_stack = create_stack ();
while (1) {
printf ("Please enter a postfix expression:\n");
command_to_parse = fgets (command_line, sizeof (command_line), stdin);
if (command_to_parse == NULL)
return 0;
return 0;
}
The functions in Listing A.4 implement unary numbers using empty linked lists.
#include <assert.h>
#include <stdlib.h>
#include <limits.h>
#include "definitions.h"
number make_zero ()
{
return 0;
}
/* Add 1 to a number. */
/* Destroying a number. */
The functions in Listing A.5 implement a stack of unary numbers using a linked list.
#include <assert.h>
#include <stdlib.h>
#include "definitions.h"
Stack create_stack ()
{
return 0;
}
/* Operations on numbers. */
number make_zero ();
void destroy_number (number n);
number add (number n1, number n2);
number subtract (number n1, number n2);
number product (number n1, number n2);
number even (number n);
number odd (number n);
number string_to_number (char* char_number);
unsigned number_to_unsigned_int (number n);
#endif /* DEFINITIONS_H */
B
Low-Level I/O
1. The C++ standard library provides iostreams with similar functionality. The standard C
library is also available in the C++ language.
2. See Chapter 8, “Linux System Calls,” for an explanation of the difference between a system
call and an ordinary function call.
Throughout this book, we assume that you’re familiar with the calls described in this
appendix. You may already be familiar with them because they’re nearly the same as
those provided on other UNIX and UNIX-like operating systems (and on the Win32
platform as well). If you’re not familiar with them, however, read on; you’ll find the
rest of the book much easier to understand if you familiarize yourself with this
material first.
You can specify additional options by using the bitwise or of this value with one or
more flags. These are the most commonly used values:
• Specify O_TRUNC to truncate the opened file, if it previously existed. Data written
  to the file descriptor will replace previous contents of the file.
• Specify O_APPEND to append to an existing file. Data written to the file descriptor
  will be added to the end of the file.
• Specify O_CREAT to create a new file. If the filename that you provide to open
  does not exist, a new file will be created, provided that the directory containing
  it exists and that the process has permission to create files in that directory. If the
  file already exists, it is opened instead.
• Specify O_EXCL with O_CREAT to force creation of a new file. If the file already
  exists, the open call will fail.
If you call open with O_CREAT, provide an additional third argument specifying the
permissions for the new file. See Chapter 10, “Security,” Section 10.3, “File System
Permissions,” for a description of permission bits and how to use them.
For example, the program in Listing B.1 creates a new file with the filename specified
on the command line. It uses the O_EXCL flag with open, so if the file already
exists, an error occurs. The new file is given read and write permissions for the owner
and owning group, and read permissions only for others. (If your umask is set to a
nonzero value, the actual permissions may be more restrictive.)
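The combination of flags and permission bits just described might be used roughly as follows. This is a sketch rather than the program in Listing B.1; the function name create_empty_file is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create PATH as a new, empty file.  Returns 0 on success or -1 on
   failure; in particular, the call fails if PATH already exists,
   because O_EXCL is combined with O_CREAT.  */
int create_empty_file (const char* path)
{
  /* Read and write for the owner and the group, read-only for others.
     The process's umask may remove some of these bits.  */
  mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH;
  int fd;
  assert (path != NULL);
  fd = open (path, O_WRONLY | O_CREAT | O_EXCL, mode);
  if (fd == -1)
    return -1;
  return close (fd);
}
```

When the file already exists, open sets errno to EEXIST, so a caller can tell that case apart from, say, a missing directory or insufficient permissions.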
Umasks
When you create a new file with open, some permission bits that you specify may be turned off. This is
because your umask is set to a nonzero value. A process’s umask specifies bits that are masked out of all
newly created files’ permissions. The actual permissions used are the bitwise and of the permissions you
specify to open and the bitwise complement of the umask.
To change your umask from the shell, use the umask command, and specify the numerical value of the
mask, in octal notation. To change the umask for a running process, use the umask call, passing it the
desired mask value to use for subsequent open calls.
For example, calling this line
% umask 027
specifies that write permissions for group members and read, write, and execute permissions for others
will always be masked out of a new file’s permissions.
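The masking rule can be checked with a short sketch. The function names below are ours; create_with_umask temporarily changes the process's umask with the umask call and restores the previous value before returning.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* The permissions a new file receives: the requested bits and the
   bitwise complement of the umask.  */
mode_t effective_mode (mode_t requested, mode_t mask)
{
  return requested & ~mask;
}

/* Create PATH with the REQUESTED permissions while the umask is MASK,
   and return the permission bits the file actually received, or
   (mode_t) -1 on failure.  The previous umask is restored.  */
mode_t create_with_umask (const char* path, mode_t requested, mode_t mask)
{
  struct stat info;
  mode_t old_mask;
  int fd;
  int rval;
  assert (path != NULL);
  old_mask = umask (mask);
  fd = open (path, O_WRONLY | O_CREAT | O_EXCL, requested);
  umask (old_mask);
  if (fd == -1)
    return (mode_t) -1;
  rval = fstat (fd, &info);
  close (fd);
  if (rval == -1)
    return (mode_t) -1;
  return info.st_mode & 0777;
}
```

With a umask of 027, a request for mode 0666 yields 0640: group write and all permissions for others are masked out, exactly as in the shell example above.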
return 0;
}
Note that the length of the new file is 0 because the program didn’t write any data to it.
char* get_timestamp ()
{
  time_t now = time (NULL);
  return asctime (localtime (&now));
}
Note that the first time we invoke timestamp, it creates the file tsfile, while the
second time it appends to it.
The write call returns the number of bytes that were actually written, or -1 if an
error occurred. For certain kinds of file descriptors, the number of bytes actually written
may be less than the number of bytes requested. In this case, it’s up to you to call
write again to write the rest of the data. The function in Listing B.3 demonstrates
how you might do this. Note that for some applications, you may have to check for
special conditions in the middle of the writing operation. For example, if you’re writing
to a network socket, you’ll have to augment this function to detect whether the
network connection was closed in the middle of the write operation, and if it has, to
react appropriately.
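A retry loop of this kind might look like the following sketch. It is written in the spirit of the description above and is not a copy of Listing B.3; in particular, retrying after EINTR is our own addition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all COUNT bytes of BUFFER to FD, calling write repeatedly if
   it writes fewer bytes than requested.  Returns COUNT on success or
   -1 if write reports an error.  */
ssize_t write_all (int fd, const void* buffer, size_t count)
{
  const char* position = buffer;
  size_t left_to_write = count;
  assert (buffer != NULL || count == 0);
  while (left_to_write > 0) {
    ssize_t written = write (fd, position, left_to_write);
    if (written == -1) {
      /* Assumption of this sketch: a write interrupted by a signal
         before any data was transferred is simply retried.  */
      if (errno == EINTR)
        continue;
      return -1;
    }
    /* Advance past the bytes that were written.  */
    position += written;
    left_to_write -= (size_t) written;
  }
  return (ssize_t) count;
}
```

A caller writing to a socket would still need to decide what to do when write_all returns -1 partway through, because some of the data may already have been sent.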
  /* Read from the file, one chunk at a time. Continue until read
     "comes up short", that is, reads less than we asked for.
     This indicates that we've hit the end of the file. */
  do {
    /* Read the next line's worth of bytes. */
    bytes_read = read (fd, buffer, sizeof (buffer));
    /* Print the offset in the file, followed by the bytes themselves. */
    printf ("0x%06x : ", offset);
    for (i = 0; i < bytes_read; ++i)
      printf ("%02x ", buffer[i]);
    printf ("\n");
    /* Keep count of our position in the file. */
    offset += bytes_read;
  }
  while (bytes_read == sizeof (buffer));
  /* All done. */
  close (fd);
  return 0;
}
Here’s hexdump in action. It’s shown printing out a dump of its own executable file:
% ./hexdump hexdump
0x000000 : 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
0x000010 : 02 00 03 00 01 00 00 00 c0 83 04 08 34 00 00 00
0x000020 : e8 23 00 00 00 00 00 00 34 00 20 00 06 00 28 00
0x000030 : 1d 00 1a 00 06 00 00 00 34 00 00 00 34 80 04 08
...
Your output may be different, depending on the compiler you used to compile
hexdump and the compilation flags you specified.
The lseek call enables you to reposition a file descriptor in a file. Pass it the file
descriptor and two additional arguments specifying the new position.
• If the third argument is SEEK_SET, lseek interprets the second argument as a
  position, in bytes, from the start of the file.
• If the third argument is SEEK_CUR, lseek interprets the second argument as an
  offset, which may be positive or negative, from the current position.
• If the third argument is SEEK_END, lseek interprets the second argument as an
  offset from the end of the file. A positive value indicates a position beyond the
  end of the file.
The call to lseek returns the new position, as an offset from the beginning of the file.
The type of the offset is off_t. If an error occurs, lseek returns -1. You can’t use
lseek with some types of file descriptors, such as socket file descriptors.
If you want to find the position of a file descriptor in a file without changing it,
specify a 0 offset from the current position—for example:
off_t position = lseek (file_descriptor, 0, SEEK_CUR);
Linux enables you to use lseek to position a file descriptor beyond the end of the file.
Normally, if a file descriptor is positioned at the end of a file and you write to the file
descriptor, Linux automatically expands the file to make room for the new data. If you
position a file descriptor beyond the end of a file and then write to it, Linux first
expands the file to accommodate the “gap” that you created with the lseek operation
and then writes to the end of it. This gap, however, does not actually occupy space on
the disk; instead, Linux just makes a note of how long it is. If you later try to read
from the file, it appears to your program that the gap is filled with 0 bytes.
Using this behavior of lseek, it’s possible to create extremely large files that occupy
almost no disk space. The program lseek-huge in Listing B.5 does this. It takes as
command-line arguments a filename and a target file size, in megabytes. The program
opens a new file, advances past the end of the file using lseek, and then writes a single
0 byte before closing the file.
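The seek-then-write technique can be sketched as a single function. This is not Listing B.5: the name make_sparse_file is ours, and the size is given in bytes rather than megabytes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create PATH as a new file that is LENGTH bytes long (LENGTH must be
   at least 1) by seeking past the end and writing a single 0 byte.
   Returns 0 on success or -1 on failure.  */
int make_sparse_file (const char* path, off_t length)
{
  const char zero = 0;
  int fd;
  int ok;
  assert (path != NULL && length >= 1);
  fd = open (path, O_WRONLY | O_CREAT | O_EXCL, 0666);
  if (fd == -1)
    return -1;
  /* Seek to 1 byte short of the desired size, and then write the final
     byte.  Everything before it becomes a gap.  */
  ok = lseek (fd, length - 1, SEEK_SET) != (off_t) -1
       && write (fd, &zero, 1) == 1;
  if (close (fd) == -1)
    ok = 0;
  return ok ? 0 : -1;
}
```

Whether the gap really consumes no disk space depends on the file system, but the reported file size and the zero-filled contents are the same either way.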
return 0;
}
Using lseek-huge, we’ll make a 1GB (1024MB) file. Note the free space on the drive
before and after the operation.
% df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/hda5 2.9G 2.1G 655M 76% /
% ./lseek-huge bigfile 1024
% ls -l bigfile
-rw-r----- 1 samuel samuel 1073741824 Feb 5 16:29 bigfile
% df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/hda5 2.9G 2.1G 655M 76% /
No appreciable disk space is consumed, despite the enormous size of bigfile. Still, if
we open bigfile and read from it, it appears to be filled with 1GB worth of 0s. For
instance, we can examine its contents with the hexdump program of Listing B.4.
% ./hexdump bigfile | head -10
0x000000 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000010 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000020 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000030 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000040 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x000050 : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
If you run this yourself, you’ll probably want to kill it with Ctrl+C, rather than watching
it print out 2^30 0 bytes.
Note that these magic gaps in files are a special feature of the ext2 file system that’s
typically used for GNU/Linux disks. If you try to use lseek-huge to create a file on
some other type of file system, such as the fat or vfat file systems used to mount
DOS and Windows partitions, you’ll find that the resulting file does actually occupy
the full amount of disk space.
Linux does not permit you to rewind before the start of a file with lseek.
B.2 stat
Using open and read, you can extract the contents of a file. But how about other
information? For instance, invoking ls -l displays, for the files in the current direc-
tory, such information as the file size, the last modification time, permissions, and the
owner.
The stat call obtains this information about a file. Call stat with the path to the
file you’re interested in and a pointer to a variable of type struct stat. If the call to
stat is successful, it returns 0 and fills in the fields of the structure with information
about that file; otherwise, it returns -1.
These are the most useful fields in struct stat:
• st_mode contains the file’s access permissions. File permissions are explained in
  Section 10.3, “File System Permissions.”
• In addition to the access permissions, the st_mode field encodes the type of the
  file in higher-order bits. See the text immediately following this bulleted list for
  instructions on decoding this information.
• st_uid and st_gid contain the IDs of the user and group, respectively, to which
  the file belongs. User and group IDs are described in Section 10.1, “Users and
  Groups.”
• st_size contains the file size, in bytes.
• st_atime contains the time when this file was last accessed (read or written).
• st_mtime contains the time when this file was last modified.
These macros check the value of the st_mode field value to figure out what kind of
file you’ve invoked stat on. A macro evaluates to true if the file is of that type.
S_ISBLK (mode) block device
If you call stat on a symbolic link, stat follows the link and you can obtain the
information about the file that the link points to, not about the symbolic link itself.
This implies that S_ISLNK will never be true for the result of stat. Use the lstat
function if you don’t want to follow symbolic links; this function obtains information
about the link itself rather than the link’s target. If you call lstat on a file that isn’t a
symbolic link, it is equivalent to stat. Calling stat on a broken link (a link that points
to a nonexistent or inaccessible target) results in an error, while calling lstat on such
a link does not.
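The difference between stat and lstat can be turned into two small predicates. This is a sketch only; the names is_symbolic_link and is_broken_link are ours.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if PATH itself is a symbolic link, 0 if it is something
   else, or -1 if lstat fails (for instance, if PATH does not exist).  */
int is_symbolic_link (const char* path)
{
  struct stat info;
  assert (path != NULL);
  if (lstat (path, &info) == -1)
    return -1;
  return S_ISLNK (info.st_mode) ? 1 : 0;
}

/* Return 1 if PATH is a symbolic link whose target cannot be examined,
   that is, if lstat succeeds on the link but stat fails.  Return 0
   otherwise.  */
int is_broken_link (const char* path)
{
  struct stat info;
  if (is_symbolic_link (path) != 1)
    return 0;
  return stat (path, &info) == -1;
}
```

Because stat follows links, only the lstat result is ever tested with S_ISLNK; the stat call in is_broken_link is there solely to see whether the target can be reached.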
If you already have a file open for reading or writing, call fstat instead of stat.
This takes a file descriptor as its first argument instead of a path.
Listing B.6 presents a function that allocates a buffer large enough to hold the contents
of a file and then reads the file into the buffer. The function uses fstat to determine
the size of the buffer that it needs to allocate and also to check that the file is
indeed a regular file.
/* Finish up. */
close (fd);
return buffer;
}
Listing B.7 (write-args.c) Write the Argument List to a File with writev
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>
close (fd);
free (vec);
return 0;
}
Note that stream and file_descriptor correspond to the same opened file. If you call
this line, you may no longer write to file_descriptor:
fclose (stream);
Similarly, if you call this line, you may no longer write to stream:
close (file_descriptor);
To go the other way, from a file descriptor to a stream, use the fdopen function. This
constructs a FILE* stream pointer corresponding to a file descriptor. The fdopen function
takes a file descriptor argument and a string argument specifying the mode in
which to create the stream. The syntax of the mode argument is the same as that of
the second argument to fopen, and it must be compatible with the file descriptor. For
example, specify a mode of r for a read file descriptor or w for a write file descriptor.
As with fileno, the stream and file descriptor refer to the same open file, so if you
close one, you may not subsequently use the other.
/* Return a string that describes the type of the file system entry PATH. */
  if (argc >= 2)
    /* If a directory was specified on the command line, use it. */
    dir_path = argv[1];
  else
    /* Otherwise, use the current directory. */
    dir_path = ".";
  /* Copy the directory path into entry_path. */
  strncpy (entry_path, dir_path, sizeof (entry_path));
  path_len = strlen (dir_path);
  /* If the directory path doesn't end with a slash, append a slash. */
  if (entry_path[path_len - 1] != '/') {
    entry_path[path_len] = '/';
    entry_path[path_len + 1] = '\0';
    ++path_len;
  }
/* All done. */
closedir (dir);
return 0;
}
Here are the first few lines of output from listing the /dev directory. (Your output
might differ somewhat.)
% ./listdir /dev
directory : /dev/.
directory : /dev/..
socket : /dev/log
character device : /dev/null
regular file : /dev/MAKEDEV
fifo : /dev/initctl
character device : /dev/agpgart
...
To verify this, you can use the ls command on the same directory. Specify the -U flag
to instruct ls not to sort the entries, and specify the -a flag to cause the current
directory (.) and the parent directory (..) to be included.
% ls -lUa /dev
total 124
drwxr-xr-x 7 root root 36864 Feb 1 15:14 .
drwxr-xr-x 22 root root 4096 Oct 11 16:39 ..
srw-rw-rw- 1 root root 0 Dec 18 01:31 log
crw-rw-rw- 1 root root 1, 3 May 5 1998 null
-rwxr-xr-x 1 root root 26689 Mar 2 2000 MAKEDEV
prw------- 1 root root 0 Dec 11 18:37 initctl
crw-rw-r-- 1 root root 10, 175 Feb 3 2000 agpgart
...
The first character of each line in the output of ls indicates the type of the entry.
C
Table of Signals
Table C.1 lists some of the Linux signals you’re most likely to encounter or
use. Note that some signals have multiple interpretations, depending on where they
occur.
The names of the signals listed here are defined as preprocessor macros. To
use them in your program, include <signal.h>. The actual definitions are in
/usr/include/sys/signum.h, which is included as part of <signal.h>.
For a full list of Linux signals, including a short description of each and the default
behavior when the signal is delivered, consult the signal man page in Section 7 by
invoking the following:
% man 7 signal
D
Online Resources
E
Open Publication License
Version 1.0
The reference must be immediately followed with any options elected by the author(s)
or publisher of the document (see Section VI, “License Options”).
Commercial redistribution of Open Publication-licensed material is permitted.
Any publication in standard (paper) book form shall require the citation of the
original publisher and author. The publisher and author’s names shall appear on all
outer surfaces of the book. On all outer surfaces of the book, the original publisher’s
name shall be as large as the title of the work and cited as possessive with respect to
the title.
II. Copyright
The copyright to each Open Publication is owned by its author(s) or designee.
V. Good-Practice Recommendations
In addition to the requirements of this license, it is requested from and strongly
recommended of redistributors that:
1. If you are distributing Open Publication works on hard copy or CD-ROM, you
provide email notification to the authors of your intent to redistribute at least 30
days before your manuscript or media freeze, to give the authors time to provide
updated documents. This notification should describe modifications, if any, made
to the document.
F
GNU General Public License
Preamble
The licenses for most software are designed to take away your freedom to share and
change it. By contrast, the GNU General Public License is intended to guarantee your
freedom to share and change free software—to make sure the software is free for all its
users. This General Public License applies to most of the Free Software Foundation’s
software and to any other program whose authors commit to using it. (Some other
Free Software Foundation software is covered by the GNU Library General Public
License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our
General Public Licenses are designed to make sure that you have the freedom to
distribute copies of free software (and charge for this service if you wish), that you
receive source code or can get it if you want it, that you can change the software or
use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you
these rights or to ask you to surrender the rights. These restrictions translate to certain
responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee,
you must give the recipients all the rights that you have. You must make sure that they,
too, receive or can get the source code. And you must show them these terms so they
know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you
this license which gives you legal permission to copy, distribute and/or modify the
software.
Also, for each author’s protection and ours, we want to make certain that everyone
understands that there is no warranty for this free software. If the software is modified
by someone else and passed on, we want its recipients to know that what they have is
not the original, so that any problems introduced by others will not reflect on the
original authors’ reputations.
Finally, any free program is threatened constantly by software patents. We wish to
avoid the danger that redistributors of a free program will individually obtain patent
licenses, in effect making the program proprietary. To prevent this, we have made it
clear that any patent must be licensed for everyone’s free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
of any warranty; and give any other recipients of the Program a copy of this
License along with the Program.
You may charge a fee for the physical act of transferring a copy, and you may at
your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion of it, thus
forming a work based on the Program, and copy and distribute such modifications
or work under the terms of Section 1 above, provided that you also meet
all of these conditions:
a) You must cause the modified files to carry prominent notices stating
that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in whole
or in part contains or is derived from the Program or any part thereof, to
be licensed as a whole at no charge to all third parties under the terms of
this License.
c) If the modified program normally reads commands interactively when
run, you must cause it, when started running for such interactive use in
the most ordinary way, to print or display an announcement including an
appropriate copyright notice and a notice that there is no warranty (or
else, saying that you provide a warranty) and that users may redistribute
the program under these conditions, and telling the user how to view a
copy of this License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on the
Program is not required to print an announcement.)
3. You may copy and distribute the Program (or a work based on it, under Section
2) in object code or executable form under the terms of Sections 1 and 2 above
provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections 1 and
2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three years, to give
any third party, for a charge no more than your cost of physically performing
source distribution, a complete machine-readable copy of the corresponding
source code, to be distributed under the terms of Sections 1 and
2 above on a medium customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer to distribute
corresponding source code. (This alternative is allowed only for
noncommercial distribution and only if you received the program in
object code or executable form with such an offer, in accord with
Subsection b above.)
The source code for a work means the preferred form of the work for making
modifications to it. For an executable work, complete source code means all the
source code for all modules it contains, plus any associated interface definition
files, plus the scripts used to control compilation and installation of the executable.
However, as a special exception, the source code distributed need not
include anything that is normally distributed (in either source or binary form)
with the major components (compiler, kernel, and so on) of the operating system
on which the executable runs, unless that component itself accompanies the
executable.
If distribution of executable or object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the source code
from the same place counts as distribution of the source code, even though third
parties are not compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program except as
expressly provided under this License. Any attempt otherwise to copy, modify,
sublicense or distribute the Program is void, and will automatically terminate
your rights under this License. However, parties who have received copies, or
rights, from you under this License will not have their licenses terminated so
long as such parties remain in full compliance.
5. You are not required to accept this License, since you have not signed it.
However, nothing else grants you permission to modify or distribute the
Program or its derivative works. These actions are prohibited by law if you do
not accept this License. Therefore, by modifying or distributing the Program (or
any work based on the Program), you indicate your acceptance of this License
to do so, and all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the Program),
the recipient automatically receives a license from the original licensor to copy,
distribute or modify the Program subject to these terms and conditions. You may
not impose any further restrictions on the recipients’ exercise of the rights
granted herein. You are not responsible for enforcing compliance by third parties
to this License.
7. If, as a consequence of a court judgment or allegation of patent infringement or
for any other reason (not limited to patent issues), conditions are imposed on
you (whether by court order, agreement or otherwise) that contradict the conditions
of this License, they do not excuse you from the conditions of this License.
If you cannot distribute so as to satisfy simultaneously your obligations under
this License and any other pertinent obligations, then as a consequence you may
not distribute the Program at all. For example, if a patent license would not permit
royalty-free redistribution of the Program by all those who receive copies
directly or indirectly through you, then the only way you could satisfy both it
and this License would be to refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under any particular
circumstance, the balance of the section is intended to apply and the section
as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or
other property right claims or to contest validity of any such claims; this section
has the sole purpose of protecting the integrity of the free software distribution
system, which is implemented by public license practices. Many people have
made generous contributions to the wide range of software distributed through
that system in reliance on consistent application of that system; it is up to the
author/donor to decide if he or she is willing to distribute software through any
other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a
consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in certain countries
either by patents or by copyrighted interfaces, the original copyright holder who
places the Program under this License may add an explicit geographical distribution
limitation excluding those countries, so that distribution is permitted only
in or among countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions of the
General Public License from time to time. Such new versions will be similar in
spirit to the present version, but may differ in detail to address new problems or
concerns.
No Warranty
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE
IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED
BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN
WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU
ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR
AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR
ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR
DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE
OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS
BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
This General Public License does not permit incorporating your program into proprietary
programs. If your program is a subroutine library, you may consider it more useful
to permit linking proprietary applications with the library. If this is what you want
to do, use the GNU Library General Public License instead of this License.
Index
Symbols
$(CFLAGS), make variable, 10
/dev directory, 132
/dev/full, 137
/dev/loop# (loopback devices), 139-142
/dev/null (null device), 136
/dev/pts (PTYs), 142-144
/dev/random (random number device), 137-139
/dev/urandom (random number device), 137-139
/dev/zero, 136
  mapped memory, 109
/etc/services file, 125
/proc file system, 147-148
  CD-ROM drive information, 163
  CPU information, 159
  device information, 159
  file locks information, 164-165
  file size, 147
  file systems information, 161
  hostname and domain name, 160
  IDE device information, 162
  memory usage of kernel, 161
  mounted file system information, 163-164
  output from, 148-150
  partition information, 163
  PCI bus information, 159
  process argument list, 152-154
  process directories, 150-151
  process environment, 154-155
  process executable, 155-156
  process file descriptors, 156-158
  process memory statistics, 158
  process statistics, 158
  SCSI device information, 163
  serial port information, 159-160
  system load information, 165
  system uptime information, 165-166
  version number of kernel, 148, 160
/proc/cpuinfo (system CPU information), 148-150, 159
/proc/devices (device information), 159
/proc/filesystems (file systems information), 161
/proc/ide (IDE device information), 162
/proc/loadavg (system load information), 165
/proc/locks (file locks information), 164-165
/proc/meminfo (memory usage of kernel), 161
/proc/mounts (mounted file system information), 163-164
/proc/pci (PCI bus information), 159
/proc/scsi/scsi (SCSI device information), 163
/proc/self, 151-152
/proc/sys/dev/cdrom/info (CD-ROM drive information), 163
/proc/sys/kernel/domainname (domain names), 160
/proc/sys/kernel/hostname (hostnames), 160
/proc/tty/driver/serial (serial port information), 159-160
/proc/uptime (system uptime information), 165-166
/proc/version (version number of kernel), 148, 160
/tmp directory, race conditions (security hole), 213-216
| (pipe symbol), 110
documentation, 13
  header files, 15
  Info documentation system, 14-15
  man pages, 14
  sample application program, 255-256
  source code, 15
Domain Name Service (DNS), 123
domain names, /proc/sys/kernel/domainname, 160
DoS (denial-of-service) attack, 216
DOS/Windows text files, reading, 287
drivers. See device drivers
dup2 system call, 112-113
dup2.c (output redirection), listing 5.8, 113
dynamic linking (libraries), 36
dynamic memory allocation, 261-262
  ccmalloc, 264-265
  Electric Fence, 265-266
  malloc, 262-263
  mtrace, 263-264
  sample program, 267-269
  selecting development tools, 266-267
dynamic runtime loading, shared libraries, 42-43
dynamically linked libraries. See shared libraries

E
-e option (ps command), 47
editors
  defined, 4
  Emacs, 4
    automatic formatting, 5
    opening source files, 4
    running GDB in, 13
    syntax highlighting, 5
effective user IDs versus real user IDs, 205-206
  setuid programs, 206-208
EINTR error code, 34
Electric Fence (dynamic memory allocation), 265-266
  comparison with other dynamic memory allocation tools, 262
Emacs, 4
  automatic formatting, 5
  opening source files, 4
  running GDB in, 13
  syntax highlighting, 5
environ global variable, 26
environ process entry, 150, 154-155
environment
  defined, 25-27
  printing, 25
  processes, 154-155
environment variables, 25-27
  accessing, 26
  clearing, 26
  as configuration information, 26-27
  enumerating all, 26
  MALLOC_CHECK, 263
  MALLOC_TRACE, 264
  setting, 26
errno variable, 33
error checking, 30
  assert macro, 30-31
  resource allocation, 35-36
  system call failures, 32
    error codes, 33-35
error codes, system call failures, 33-35
error function, 225
error streams, redirection with pipes, 112-113
error-checking functions, memory allocation, 225
error-checking mutexes, locking, 82
errors, stderr (error stream), 23-24
example program. See sample application program
exe process entry, 150, 155-156
exec functions
  avoiding security holes, 217
  creating processes, 48, 50-51
executable files, processes, 155-156
execute permissions, warning about, 204
executing programs with the shell, security holes, 216-218
execve system call, 168
I
-I option (GCC compiler), 7
I/O (input/output)
  FIFO access, 115-116
  input/output and error streams, 23-24
  mmap function, 109
  redirection with pipes, 112-113
I/O functions, low-level. See low-level I/O functions
id command, 198
IDE (Integrated Development Environment), 9
IDE device information, /proc/ide, 162
idle time information, /proc/uptime, 165-166
Info documentation system, 14-15, 256
init process, 59
initialization, semaphores (processes), 102
inline assembly code. See assembly code
input operands, asm syntax, 193
input. See I/O (input/output)
Integrated Development Environment (IDE), 9
Intel x86 architectures, register letters, 193
Internet Protocol (IP), 123
Internet-domain sockets, 123-125
interprocess communication (IPC)
  defined, 95
  mapped memory, 105
    example programs, 106-108
    mmap function, 105-109
    private mappings, 109
    shared file access, 108-109
  pipes, 110
    creating, 110
    FIFOs, 114-116
    parent-child process communication, 110-112
    popen and pclose functions, 114
    redirection, 112-113
  semaphores, 101
    allocation and deallocation, 101
    debugging, 105
    initialization, 102
    wait and post operations, 103-104
  shared memory, 96
    access speed, 96-97
    advantages and disadvantages, 101
    allocation, 97-98
    attachment and detachment, 98-99
    deallocation, 99
    debugging, 100
    example program, 99-100
    memory model, 97
  sockets, 116
    connect function, 118
    creating, 118
    destroying, 118
    functions, list of, 117
    Internet-domain sockets, 123-125
    local sockets, 119-123
    send function, 118
    servers, 118-119
    socket pairs, 125-126
    terminology, 117
interval timers, setting, 185-186
ioctl system call, 144
IP (Internet Protocol), 123
IPC. See interprocess communication
ipcrm command, 100
ipcrm sem command, 105
ipcs -s command, 105
ipcs command, 100
issue.c (GNU/Linux distribution information), listing 11.7, 240-242
issue.so module (sample application program), 240-242
itemer.c (interval timers), listing 8.11, 185-186

J-K
-j option (ps command), 47
job control notification, in shell, 93
job-queue1.c (thread race conditions), listing 4.10, 78
W-Z
wait functions, terminating processes, 56-57
wait operation (semaphores), 83, 103-104
-Wall option (GCC compiler), 260
Web sites, list of online resources, 303-304
where command, GDB, 12
whoami command, 207
Win32 named pipes, versus FIFOs, 116
Windows text files, reading, 287
write system call, 169, 285-286
write-all.c (write all buffered data), listing B.3, 286
write-args.c (writev function), listing B.7, 294-295
writev system call, 293-295
write_journal_entry.c (data buffer flushing), listing 8.3, 173
writing
  data to file descriptors, low-level I/O functions, 285-286
  man pages, 255
HOW TO CONTACT US
EMAIL US
Contact us at: nrfeedback@newriders.com
• If you have comments or questions about this book
• To report errors that you have found in this book
• If you have a book proposal to submit or are interested in writing for New Riders
• If you are an expert in a computer topic or technology and are interested in being a
technical editor who reviews manuscripts for technical accuracy
Contact us at: nreducation@newriders.com
• If you are an instructor from an educational institution who wants to preview New
Riders books for classroom use. Email should include your name, title, school, depart-
ment, address, phone number, office days/hours, text in use, and enrollment, along
with your request for desk/examination copies and/or additional information.
Contact us at: nrmedia@newriders.com
• If you are a member of the media who is interested in reviewing copies of New
Riders books. Send your name, mailing address, and email address, along with the
name of the publication or Web site you work for.
BULK PURCHASES/CORPORATE SALES
If you are interested in buying 10 or more copies of a title or want to set up an
account for your company to purchase directly from the publisher at a substantial
discount, contact us at 800-382-3419 or email your contact information to
corpsales@pearsontechgroup.com. A sales representative will contact you with
more information.
WRITE TO US
CALL/FAX US
Toll-free (800) 571-5840
If outside U.S. (317) 581-3500
Ask for New Riders
FAX: (317) 581-4663
WWW.NEWRIDERS.COM
Colophon
The ruins of the Stabian Baths in Pompeii, captured by photographer Mel Curtis, are featured on
the cover of this book. Said to be the largest and oldest of the baths, the Stabian baths also offered
massages and poetry readings. Residents of Pompeii visited these public baths daily. The baths are
named for their location on Stabian Street.
This book was written and edited in LaTeX, and then converted to Microsoft Word by New Riders
and laid out in QuarkXPress. The fonts used for the body text are Bembo and MCPdigital. It was
printed on 50# Husky Offset Smooth paper at R.R. Donnelley & Sons in Crawfordsville, Indiana.
Prepress consisted of PostScript computer-to-plate technology (filmless process). The cover was
printed at Moore Langen Printing in Terre Haute, Indiana, on Carolina, coated on one side.