Linux Programming by Example

Prentice Hall Open Source Software Development Series
ARNOLD ROBBINS
Prentice Hall
Open Source Software Development Series
Arnold Robbins, Series Editor
"Real world code from real world applications"
Open Source technology has revolutionized the computing world. Many large-scale projects are
in production use worldwide, such as Apache, MySQL, and Postgres, with programmers writing
applications in a variety of languages including Perl, Python, and PHP. These technologies are in
use on many different systems, ranging from proprietary systems, to Linux systems, to traditional
UNIX systems, to mainframes.
The Prentice Hall Open Source Software Development Series is designed to bring you the
best of these Open Source technologies. Not only will you learn how to use them for your
projects, but you will learn from them. By seeing real code from real applications, you will learn
the best practices of Open Source developers the world over.
Titles currently in the series include:
Linux® Debugging and Performance Tuning: Tips and Techniques

Steve Best
0131492470, Paper, 10/14/2005
The book is not only a high-level strategy guide but also a book that combines strategy with
hands-on debugging sessions and performance tuning tools and techniques.
Linux Programming by Example: The Fundamentals

Arnold Robbins
0131429647, Paper, 4/12/2004
Gradually, one step at a time, Robbins teaches both high level principles and "under the hood"
techniques. This book will help the reader master the fundamentals needed to build serious
Linux software.
The Linux® Kernel Primer: A Top-Down Approach for x86 and PowerPC Architectures
Claudia Salzberg, Gordon Fischer, Steven Smolski
0131181637, Paper, 9/21/2005
A comprehensive view of the Linux Kernel is presented in a top down approach: the big picture
first with a clear view of all components, how they interrelate, and where the hardware/software
separation exists. The coverage of both the x86 and the PowerPC is unique to this book.
To my wife Miriam,
and my children,
Chana, Rivka, Nachum, and Malka.
Linux Programming
by Example
Arnold Robbins
PRENTICE HALL
Professional Technical Reference
Upper Saddle River, NJ 07458
www.phptr.com
© 2004 Pearson Education, Inc.
Publishing as Prentice Hall Professional Technical Reference
Upper Saddle River, New Jersey 07458
Prentice Hall PTR offers discounts on this book when ordered in quantity for bulk purchases or special sales. For
more information, please contact: U.S. Corporate and Government Sales, 1-800-382-3419,
corpsales@pearsontechgroup.com. For sales outside of the United States, please contact: International Sales,
1-317-581-3793, international@pearsontechgroup.com.
Portions of Chapter 1, Copyright © 1994 Arnold David Robbins, first appeared in an article in Issue 16 of Linux
Journal, reprinted by permission.
Portions of the documentation for Valgrind, Copyright © 2003 Julian Seward, reprinted by permission.
Portions of the documentation for the DBUG library, by Fred N. Fish, reprinted by permission.
The GNU programs in this book are Copyright © 1985-2003, Free Software Foundation, Inc. The full list of files
and copyright dates is provided in the Preface. Each program is "free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by the Free Software Foundation; either version
2 of the License, or (at your option) any later version." Appendix C of this book provides the text of the GNU
General Public License.
All V7 Unix code and documentation are Copyright © Caldera International Inc. 2001-2002. All rights reserved.
They are reprinted here under the terms of the Caldera Ancient UNIX License, which is reproduced in full in
Appendix B.
Cover image courtesy of Parks Sabers, Inc. The Arc-Wave(tm) saber is manufactured by Parks Sabers, Inc., Copyright
© 2001, www.parksabers.com. Parks Sabers is not associated with any Lucasfilm Ltd. property, film, or franchise.
The programs and applications presented in this book have been included for their instructional value. They have
been tested with care but are not guaranteed for any particular purpose. The publisher does not offer any warranties
or representations, nor does it accept any liabilities with respect to the programs or applications. UNIX is a registered
trademark of The Open Group in the United States and other countries.
Microsoft, MS, and MS-DOS are registered trademarks, and Windows is a trademark of Microsoft Corporation in
the United States and other countries. Linux is a registered trademark of Linus Torvalds.
All company and product names mentioned herein are the trademarks or registered trademarks of their respective
owners.
This material may be distributed only subject to the terms and conditions set forth in the Open Publication License,
v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/), with License
Option B.
Printed in the United States of America
ISBN 0-13-142964-7
Text printed on recycled paper
First printing
Pearson Education LTD.
Pearson Education Australia PTY, Limited
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd.
Pearson Education Canada, Ltd.
Pearson Educación de Mexico, S.A. de C.V.
Pearson Education-Japan
Pearson Education Malaysia, Pte. Ltd.
Contents
Preface
PART I Files and Users

Chapter 1 Introduction
1.1 The Linux/Unix File Model
1.1.1 Files and Permissions
1.1.2 Directories and Filenames
1.1.3 Executable Files
1.1.4 Devices
1.2 The Linux/Unix Process Model
1.2.1 Pipes: Hooking Processes Together
1.3 Standard C vs. Original C
1.4 Why GNU Programs Are Better
1.4.1 Program Design
1.4.2 Program Behavior
1.4.3 C Code Programming
1.4.4 Things That Make a GNU Program Better
1.4.5 Parting Thoughts about the "GNU Coding Standards"
1.5 Portability Revisited
1.6 Suggested Reading
1.7 Summary
Exercises
Chapter 2 Arguments, Options, and the Environment
2.1 Option and Argument Conventions
2.1.1 POSIX Conventions
2.1.2 GNU Long Options
2.2 Basic Command-Line Processing
2.2.1 The V7 echo Program
2.3 Option Parsing: getopt() and getopt_long()
2.3.1 Single-Letter Options
2.3.2 GNU getopt() and Option Ordering
2.3.3 Long Options
2.3.3.1 Long Options Table
2.3.3.2 Long Options, POSIX Style
2.3.3.3 getopt_long() Return Value Summary
2.3.3.4 GNU getopt() or getopt_long() in User Programs
2.4 The Environment
2.4.1 Environment Management Functions
2.4.2 The Entire Environment: environ
2.4.3 GNU env
2.5 Summary
Exercises
Chapter 3 User-Level Memory Management
3.1 Linux/Unix Address Space
3.2 Memory Allocation
3.2.1 Library Calls: malloc(), calloc(), realloc(), free()
3.2.1.1 Examining C Language Details
3.2.1.2 Initially Allocating Memory: malloc()
3.2.1.3 Releasing Memory: free()
3.2.1.4 Changing Size: realloc()
3.2.1.5 Allocating and Zero-filling: calloc()
3.2.1.6 Summarizing from the GNU Coding Standards
3.2.1.7 Using Private Allocators
3.2.1.8 Example: Reading Arbitrarily Long Lines
3.2.1.9 GLIBC Only: Reading Entire Lines: getline() and getdelim()
3.2.2 String Copying: strdup()
3.2.3 System Calls: brk() and sbrk()
3.2.4 Lazy Programmer Calls: alloca()
3.2.5 Address Space Examination
3.3 Summary
Exercises
Chapter 4 Files and File I/O
4.1 Introducing the Linux/Unix I/O Model
4.2 Presenting a Basic Program Structure
4.3 Determining What Went Wrong
4.3.1 Values for errno
4.3.2 Error Message Style
4.4 Doing Input and Output
4.4.1 Understanding File Descriptors
4.4.2 Opening and Closing Files
4.4.2.1 Mapping FILE * Variables to File Descriptors
4.4.2.2 Closing All Open Files
4.4.3 Reading and Writing
4.4.4 Example: Unix cat
4.5 Random Access: Moving Around within a File
4.6 Creating Files
4.6.1 Specifying Initial File Permissions
4.6.2 Creating Files with creat()
4.6.3 Revisiting open()
4.7 Forcing Data to Disk
4.8 Setting File Length
4.9 Summary
Exercises
Chapter 5 Directories and File Metadata
5.1 Considering Directory Contents
5.1.1 Definitions
5.1.2 Directory Contents
5.1.3 Hard Links
5.1.3.1 The GNU link Program
5.1.3.2 Dot and Dot-Dot
5.1.4 File Renaming
5.1.5 File Removal
5.1.5.1 Removing Open Files
5.1.5.2 Using ISO C: remove()
5.1.6 Symbolic Links
5.2 Creating and Removing Directories
5.3 Reading Directories
5.3.1 Basic Directory Reading
5.3.1.1 Portability Considerations
5.3.1.2 Linux and BSD Directory Entries
5.3.2 BSD Directory Positioning Functions
5.4 Obtaining Information about Files
5.4.1 Linux File Types
5.4.2 Retrieving File Information
5.4.3 Linux Only: Specifying Higher-Precision File Times
5.4.4 Determining File Type
5.4.4.1 Device Information
5.4.4.2 The V7 cat Revisited
5.4.5 Working with Symbolic Links
5.5 Changing Ownership, Permission, and Modification Times
5.5.1 Changing File Ownership: chown(), fchown(), and lchown()
5.5.2 Changing Permissions: chmod() and fchmod()
5.5.3 Changing Timestamps: utime()
5.5.3.1 Faking utime(file, NULL)
5.5.4 Using fchown() and fchmod() for Security
5.6 Summary
Exercises
Chapter 6 General Library Interfaces - Part 1
6.1 Times and Dates
6.1.1 Retrieving the Current Time: time() and difftime()
6.1.2 Breaking Down Times: gmtime() and localtime()
6.1.3 Formatting Dates and Times
6.1.3.1 Simple Time Formatting: asctime() and ctime()
6.1.3.2 Complex Time Formatting: strftime()
6.1.4 Converting a Broken-Down Time to a time_t
6.1.5 Getting Time-Zone Information
6.1.5.1 BSD Systems Gotcha: timezone(), Not timezone
6.2 Sorting and Searching Functions
6.2.1 Sorting: qsort()
6.2.1.1 Example: Sorting Employees
6.2.1.2 Example: Sorting Directory Contents
6.2.2 Binary Searching: bsearch()
6.3 User and Group Names
6.3.1 User Database
6.3.2 Group Database
6.4 Terminals: isatty()
6.5 Suggested Reading
6.6 Summary
Exercises
Chapter 7 Putting It All Together: ls
7.1 V7 ls Options
7.2 V7 ls Code
7.3 Summary
Exercises
Chapter 8 Filesystems and Directory Walks
8.1 Mounting and Unmounting Filesystems
8.1.1 Reviewing the Background
8.1.2 Looking at Different Filesystem Types
8.1.3 Mounting Filesystems: mount
8.1.4 Unmounting Filesystems: umount
8.2 Files for Filesystem Administration
8.2.1 Using Mount Options
8.2.2 Working with Mounted Filesystems: getmntent()
8.3 Retrieving Per-Filesystem Information
8.3.1 POSIX Style: statvfs() and fstatvfs()
8.3.2 Linux Style: statfs() and fstatfs()
8.4 Moving Around in the File Hierarchy
8.4.1 Changing Directory: chdir() and fchdir()
8.4.2 Getting the Current Directory: getcwd()
8.4.3 Walking a Hierarchy: nftw()
8.4.3.1 The nftw() Interface
8.4.3.2 The nftw() Callback Function
8.5 Walking a File Tree: GNU du
8.6 Changing the Root Directory: chroot()
8.7 Summary
Exercises
PART II Processes, IPC, and Internationalization

Chapter 9 Process Management and Pipes
9.1 Process Creation and Management
9.1.1 Creating a Process: fork()
9.1.1.1 After the fork(): Shared and Distinct Attributes
9.1.1.2 File Descriptor Sharing
9.1.1.3 File Descriptor Sharing and close()
9.1.2 Identifying a Process: getpid() and getppid()
9.1.3 Setting Process Priority: nice()
9.1.3.1 POSIX vs. Reality
9.1.4 Starting New Programs: The exec() Family
9.1.4.1 The execve() System Call
9.1.4.2 Wrapper Functions: execl() et al.
9.1.4.3 Program Names and argv[0]
9.1.4.4 Attributes Inherited across exec()
9.1.5 Terminating a Process
9.1.5.1 Defining Process Exit Status
9.1.5.2 Returning from main()
9.1.5.3 Exiting Functions
9.1.6 Recovering a Child's Exit Status
9.1.6.1 Using POSIX Functions: wait() and waitpid()
9.1.6.2 Using BSD Functions: wait3() and wait4()
9.2 Process Groups
9.2.1 Job Control Overview
9.2.2 Process Group Identification: getpgrp() and getpgid()
9.2.3 Process Group Setting: setpgid() and setpgrp()
9.3 Basic Interprocess Communication: Pipes and FIFOs
9.3.1 Pipes
9.3.1.1 Creating Pipes
9.3.1.2 Pipe Buffering
9.3.2 FIFOs
9.4 File Descriptor Management
9.4.1 Duplicating Open Files: dup() and dup2()
9.4.2 Creating Nonlinear Pipelines: /dev/fd/XX
9.4.3 Managing File Attributes: fcntl()
9.4.3.1 The Close-on-exec Flag
9.4.3.2 File Descriptor Duplication
9.4.3.3 Manipulation of File Status Flags and Access Modes
9.4.3.4 Nonblocking I/O for Pipes and FIFOs
9.4.3.5 fcntl() Summary
9.5 Example: Two-Way Pipes in gawk
9.6 Suggested Reading
9.7 Summary
Exercises
Chapter 10 Signals
10.1 Introduction
10.2 Signal Actions
10.3 Standard C Signals: signal() and raise()
10.3.1 The signal() Function
10.3.2 Sending Signals Programmatically: raise()
10.4 Signal Handlers in Action
10.4.1 Traditional Systems
10.4.2 BSD and GNU/Linux
10.4.3 Ignoring Signals
10.4.4 Restartable System Calls
10.4.4.1 Example: GNU Coreutils safe_read() and safe_write()
10.4.4.2 GLIBC Only: TEMP_FAILURE_RETRY()
10.4.5 Race Conditions and sig_atomic_t (ISO C)
10.4.6 Additional Caveats
10.4.7 Our Story So Far, Episode I
10.5 The System V Release 3 Signal APIs: sigset() et al.
10.6 POSIX Signals
10.6.1 Uncovering the Problem
10.6.2 Signal Sets: sigset_t and Related Functions
10.6.3 Managing the Signal Mask: sigprocmask() et al.
10.6.4 Catching Signals: sigaction()
10.6.5 Retrieving Pending Signals: sigpending()
10.6.6 Making Functions Interruptible: siginterrupt()
10.6.7 Sending Signals: kill() and killpg()
10.6.8 Our Story So Far, Episode II
10.7 Signals for Interprocess Communication
10.8 Important Special-Purpose Signals
10.8.1 Alarm Clocks: sleep(), alarm(), and SIGALRM
10.8.1.1 Harder but with More Control: alarm() and SIGALRM
10.8.1.2 Simple and Easy: sleep()
10.8.2 Job Control Signals
10.8.3 Parental Supervision: Three Different Strategies
10.8.3.1 Poor Parenting: Ignoring Children Completely
10.8.3.2 Permissive Parenting: Supervising Minimally
10.8.3.3 Strict Parental Control
10.9 Signals Across fork() and exec()
10.10 Summary
Exercises
Chapter 11 Permissions and User and Group ID Numbers .......... 403
11.1 Checking Permissions .......... 404
11.1.1 Real and Effective IDs .......... 405
11.1.2 Setuid and Setgid Bits .......... 406
11.2 Retrieving User and Group IDs .......... 407
11.3 Checking As the Real User: access() .......... 410
11.4 Checking as the Effective User: euidaccess() (GLIBC) .......... 412
11.5 Setting Extra Permission Bits for Directories .......... 412
11.5.1 Default Group for New Files and Directories .......... 412
11.5.2 Directories and the Sticky Bit .......... 414
11.6 Setting Real and Effective IDs .......... 415
11.6.1 Changing the Group Set .......... 416
11.6.2 Changing the Real and Effective IDs .......... 416
11.6.3 Using the Setuid and Setgid Bits .......... 419
11.7 Working with All Three IDs: getresuid() and setresuid() (Linux) .......... 421
11.8 Crossing a Security Minefield: Setuid root .......... 422
11.9 Suggested Reading .......... 423
11.10 Summary .......... 424
Exercises .......... 426
Chapter 12 General Library Interfaces - Part 2 .......... 427
12.1 Assertion Statements: assert() .......... 428
12.2 Low-Level Memory: The memXXX() Functions .......... 432
12.2.1 Setting Memory: memset() .......... 432
12.2.2 Copying Memory: memcpy(), memmove(), and memccpy() .......... 433
12.2.3 Comparing Memory Blocks: memcmp() .......... 434
12.2.4 Searching for a Byte Value: memchr() .......... 435
12.3 Temporary Files .......... 436
12.3.1 Generating Temporary Filenames (Bad) .......... 437
12.3.2 Creating and Opening Temporary Files (Good) .......... 441
12.3.3 Using the TMPDIR Environment Variable .......... 443
12.4 Committing Suicide: abort() .......... 445
12.5 Nonlocal Gotos .......... 446
12.5.1 Using Standard Functions: setjmp() and longjmp() .......... 447
12.5.2 Handling Signal Masks: sigsetjmp() and siglongjmp() .......... 449
12.5.3 Observing Important Caveats .......... 450
12.6 Pseudorandom Numbers .......... 454
12.6.1 Standard C: rand() and srand() .......... 455
12.6.2 POSIX Functions: random() and srandom() .......... 457
12.6.3 The /dev/random and /dev/urandom Special Files .......... 460
12.7 Metacharacter Expansions .......... 461
12.7.1 Simple Pattern Matching: fnmatch() .......... 462
12.7.2 Filename Expansion: glob() and globfree() .......... 464
12.7.3 Shell Word Expansion: wordexp() and wordfree() .......... 470
12.8 Regular Expressions .......... 471
12.9 Suggested Reading .......... 480
12.10 Summary .......... 481
Exercises .......... 482
Chapter 13 Internationalization and Localization .......... 485
13.1 Introduction .......... 486
13.2 Locales and the C Library .......... 487
13.2.1 Locale Categories and Environment Variables .......... 487
13.2.2 Setting the Locale: setlocale() .......... 489
13.2.3 String Collation: strcoll() and strxfrm() .......... 490
13.2.4 Low-Level Numeric and Monetary Formatting: localeconv() .......... 494
13.2.5 High-Level Numeric and Monetary Formatting: strfmon() and printf() .......... 498
13.2.6 Example: Formatting Numeric Values in gawk .......... 501
13.2.7 Formatting Date and Time Values: ctime() and strftime() .......... 503
13.2.8 Other Locale Information: nl_langinfo() .......... 504
13.3 Dynamic Translation of Program Messages .......... 507
13.3.1 Setting the Text Domain: textdomain() .......... 507
13.3.2 Translating Messages: gettext() .......... 508
13.3.3 Working with Plurals: ngettext() .......... 509
13.3.4 Making gettext() Easy to Use .......... 510
13.3.4.1 Portable Programs: "gettext.h" .......... 511
13.3.4.2 GLIBC Only: <libintl.h> .......... 513
13.3.5 Rearranging Word Order with printf() .......... 514
13.3.6 Testing Translations in a Private Directory .......... 515
13.3.7 Preparing Internationalized Programs .......... 516
13.3.8 Creating Translations .......... 517
13.4 Can You Spell That for Me, Please? .......... 521
13.4.1 Wide Characters .......... 523
13.4.2 Multibyte Character Encodings .......... 523
13.4.3 Languages .......... 524
13.4.4 Conclusion .......... 525
13.5 Suggested Reading .......... 526
13.6 Summary .......... 526
Exercises .......... 527
Chapter 14 Extended Interfaces .......... 529
14.1 Allocating Aligned Memory: posix_memalign() and memalign() .......... 530
14.2 Locking Files .......... 531
14.2.1 File Locking Concepts .......... 531
14.2.2 POSIX Locking: fcntl() and lockf() .......... 533
14.2.2.1 Describing a Lock .......... 533
14.2.2.2 Obtaining and Releasing Locks .......... 536
14.2.2.3 Observing Locking Caveats .......... 538
14.2.3 BSD Locking: flock() .......... 539
14.2.4 Mandatory Locking .......... 540
14.3 More Precise Times .......... 543
14.3.1 Microsecond Times: gettimeofday() .......... 544
14.3.2 Microsecond File Times: utimes() .......... 545
14.3.3 Interval Timers: setitimer() and getitimer() .......... 546
14.3.4 More Exact Pauses: nanosleep() .......... 550
14.4 Advanced Searching with Binary Trees .......... 551
14.4.1 Introduction to Binary Trees .......... 551
14.4.2 Tree Management Functions .......... 554
14.4.3 Tree Insertion: tsearch() .......... 554
14.4.4 Tree Lookup and Use of A Returned Pointer: tfind() and tsearch() .......... 555
14.4.5 Tree Traversal: twalk() .......... 557
14.4.6 Tree Node Removal and Tree Deletion: tdelete() and tdestroy() .......... 561
14.5 Summary .......... 562
Exercises .......... 563
PART III Debugging and Final Project .......... 565
Chapter 15 Debugging .......... 567
15.1 First Things First .......... 568
15.2 Compilation for Debugging .......... 569
15.3 GDB Basics .......... 570
15.3.1 Running GDB .......... 571
15.3.2 Setting Breakpoints, Single-Stepping, and Setting Watchpoints .......... 574
15.4 Programming for Debugging .......... 577
15.4.1 Compile-Time Debugging Code .......... 577
15.4.1.1 Use Debugging Macros .......... 577
15.4.1.2 Avoid Expression Macros If Possible .......... 580
15.4.1.3 Reorder Code If Necessary .......... 582
15.4.1.4 Use Debugging Helper Functions .......... 584
15.4.1.5 Avoid Unions When Possible .......... 591
15.4.2 Runtime Debugging Code .......... 595
15.4.2.1 Add Debugging Options and Variables .......... 595
15.4.2.2 Use Special Environment Variables .......... 597
15.4.2.3 Add Logging Code .......... 601
15.4.2.4 Runtime Debugging Files .......... 602
15.4.2.5 Add Special Hooks for Breakpoints .......... 603
15.5 Debugging Tools .......... 605
15.5.1 The dbug Library - A Sophisticated printf() .......... 606
15.5.2 Memory Allocation Debuggers .......... 612
15.5.2.1 GNU/Linux mtrace .......... 613
15.5.2.2 Electric Fence .......... 614
15.5.2.3 Debugging Malloc: dmalloc .......... 619
15.5.2.4 Valgrind: A Versatile Tool .......... 623
15.5.2.5 Other Malloc Debuggers .......... 629
15.5.3 A Modern lint .......... 631
15.6 Software Testing .......... 632
15.7 Debugging Rules .......... 633
15.8 Suggested Reading .......... 637
15.9 Summary .......... 638
Exercises .......... 639
Chapter 16 A Project That Ties Everything Together .......... 641
16.1 Project Description .......... 642
16.2 Suggested Reading .......... 644
PART IV Appendixes .......... 647
Appendix A Teach Yourself Programming in Ten Years .......... 649
Appendix B Caldera Ancient UNIX License .......... 655
Appendix C GNU General Public License .......... 657
Index .......... 667

Preface
One of the best ways to learn about programming is to read well-written
programs. This book teaches the fundamental Linux system call APIs, those
that form the core of any significant program, by presenting code from production
programs that you use every day.
By looking at concrete programs, you can not only see how to use the Linux APIs,
but you also can examine the real-world issues (performance, portability, robustness)
that arise in writing software.
While the book's title is Linux Programming by Example, everything we cover, unless
otherwise noted, applies to modern Unix systems as well. In general we use "Linux"
to mean the Linux kernel, and "GNU/Linux" to mean the total system (kernel,
libraries, tools). Also, we often say "Linux" when we mean all of Linux, GNU/Linux,
and Unix; if something is specific to one system or the other, we mention it explicitly.
Audience
This book is intended for the person who understands programming and is familiar
with the basics of C, at least on the level of The C Programming Language by Kernighan
and Ritchie. (Java programmers wishing to read this book should understand C pointers,
since C code makes heavy use of them.) The examples use both the 1990 version of
Standard C and Original C.
In particular, you should be familiar with all C operators, control-flow structures,
variable and pointer declarations and use, the string management functions, the use of
exit(), and the <stdio.h> suite of functions for file input/output.
You should understand the basic concepts of standard input, standard output, and
standard error and the fact that all C programs receive an array of character strings
representing invocation options and arguments. You should also be familiar with the
fundamental command-line tools, such as cd, cp, date, ln, ls, man (and info if you
have it), rmdir, and rm, the use of long and short command-line options, environment
variables, and I/O redirection, including pipes.
We assume that you want to write programs that work not just under GNU/Linux
but across the range of Unix systems. To that end, we mark each interface as to its
availability (GLIBC systems only, or defined by POSIX, and so on), and portability
advice is included as an integral part of the text.
The programming taught here may be at a lower level than you're used to; that's
OK. The system calls are the fundamental building blocks for higher operations and
are thus low-level by nature. This in turn dictates our use of C: The APIs were designed
for use from C, and code that interfaces them to higher-level languages, such as C++
and Java, will necessarily be lower level in nature, and most likely, written in C. It may
help to remember that "low level" doesn't mean "bad," it just means "more challenging."
What You Will Learn

This book focuses on the basic APIs that form the core of Linux programming:
• Memory management
• File input/output
• File metadata
• Processes and signals
• Users and groups
• Programming support (sorting, argument parsing, and so on)
• Internationalization
• Debugging
We have purposely kept the list of topics short. We believe that it is intimidating to
try to learn "all there is to know" from a single book. Most readers prefer smaller, more
focused books, and the best Unix books are all written that way.
So, instead of a single giant tome, we plan several volumes: one on Interprocess
Communication (IPC) and networking, and another on software development and
code portability. We also have an eye toward possible additional volumes in a Linux
Programming by Example series that will cover topics such as thread programming and
GUI programming.
The APIs we cover include both system calls and library functions. Indeed, at the C
level, both appear as simple function calls. A system call is a direct request for system
services, such as reading or writing a file or creating a process. A library function, on the
other hand, runs at the user level, possibly never requesting any services from the
operating system. System calls are documented in section 2 of the reference manual (viewable
online with the man command), and library functions are documented in section 3.
Our goal is to teach you the use of the Linux APIs by example: in particular, through
the use, wherever possible, of both original Unix source code and the GNU utilities.
Unfortunately, there aren't as many self-contained examples as we thought there'd be.
Thus, we have written numerous small demonstration programs as well. We stress
programming principles: especially those aspects of GNU programming, such as "no
arbitrary limits," that make the GNU utilities into exceptional programs.
The choice of everyday programs to study is deliberate. If you've been using
GNU/Linux for any length of time, you already understand what programs such as ls
and cp do; it then becomes easy to dive straight into how the programs work, without
having to spend a lot of time learning what they do.
Occasionally, we present both higher-level and lower-level ways of doing things.
Usually the higher-level standard interface is implemented in terms of the lower-level
interface or construct. We hope that such views of what's "under the hood" will help
you understand how things work; for all the code you write, you should always use the
higher-level, standard interface.
Similarly, we sometimes introduce functions that provide certain functionality and
then recommend (with a provided reason) that these functions be avoided! The primary
reason for this approach is so that you'll be able to recognize these functions when you
see them and thus understand the code using them. A well-rounded knowledge of a
topic requires understanding not just what you can do, but what you should and should
not do.
Finally, each chapter concludes with exercises. Some involve modifying or writing
code. Others are more in the category of "thought experiments" or "why do you
think ..." We recommend that you do all of them; they will help cement your
understanding of the material.
Small Is Beautiful: Unix Programs

Hoare's law:
"Inside every large program is a small program
struggling to get out."
-C.A.R. Hoare-
Initially, we planned to teach the Linux API by using the code from the GNU utilities.
However, the modern versions of even simple command-line programs (like mv and
cp) are large and many-featured. This is particularly true of the GNU variants of the
standard utilities, which allow long and short options, do everything required by POSIX,
and often have additional, seemingly unrelated options as well (like output highlighting).
It then becomes reasonable to ask, "Given such a large and confusing forest, how
can we focus on the one or two important trees?" In other words, if we present the
current full-featured program, will it be possible to see the underlying core operation
of the program?
That is when Hoare's law¹ inspired us to look to the original Unix programs for
example code. The original V7 Unix utilities are small and straightforward, making it
easy to see what's going on and to understand how the system calls are used. (V7 was
released around 1979; it is the common ancestor of all modern Unix systems, including
GNU/Linux and the BSD systems.)
For many years, Unix source code was protected by copyrights and trade secret license
agreements, making it difficult to use for study and impossible to publish. This is still
true of all commercial Unix source code. However, in 2002, Caldera (currently operating
as SCO) made the original Unix code (through V7 and 32V Unix) available under an
Open Source style license (see Appendix B, "Caldera Ancient UNIX License," page 655).
This makes it possible for us to include the code from the early Unix system in this book.
Standards
Throughout the book we refer to several different formal standards. A standard is a
document describing how something works. Formal standards exist for many things;
for example, the shape, placement, and meaning of the holes in the electrical outlet in
your wall are defined by a formal standard so that all the power cords in your country
work in all the outlets.
1. This famous statement was made at The International Workshop on Efficient Production of Large Programs in
Jablonna, Poland, August 10-14, 1970.
So, too, formal standards for computing systems define how they are supposed to
work; this enables developers and users to know what to expect from their software and
enables them to complain to their vendor when software doesn't work.
Of interest to us here are:
1. ISO/IEC International Standard 9899: Programming Languages - C, 1990.
The first formal standard for the C programming language.
2. ISO/IEC International Standard 9899: Programming Languages - C, Second
edition, 1999. The second (and current) formal standard for the C programming
language.
3. ISO/IEC International Standard 14882: Programming Languages - C++, 1998.
The first formal standard for the C++ programming language.
4. ISO/IEC International Standard 14882: Programming Languages - C++, 2003.
The second (and current) formal standard for the C++ programming language.
5. IEEE Standard 1003.1-2001: Standard for Information Technology - Portable
Operating System Interface (POSIX®). The current version of the POSIX
standard; describes the behavior expected of Unix and Unix-like systems. This
edition covers both the system call and library interface, as seen by the C/C++
programmer, and the shell and utilities interface, seen by the user. It consists
of several volumes:
• Base Definitions. The definitions of terms, facilities, and header files.
• Base Definitions - Rationale. Explanations and rationale for the choice of
facilities that both are and are not included in the standard.
• System Interfaces. The system calls and library functions. POSIX terms them
all "functions."
• Shell and Utilities. The shell language and utilities available for use with shell
programs and interactively.
Although language standards aren't exciting reading, you may wish to consider
purchasing a copy of the C standard: It provides the final definition of the language. Copies
can be purchased from ANSI² and from ISO.³ (The PDF version of the C standard is
quite affordable.)
The POSIX standard can be ordered from The Open Group.⁴ By working through
their publications catalog to the items listed under "CAE Specifications," you can find
individual pages for each part of the standard (named "C031" through "C034"). Each
one's page provides free access to the online HTML version of the particular volume.
The POSIX standard is intended for implementation on both Unix and Unix-like
systems, as well as non-Unix systems. Thus, the base functionality it provides is a subset
of what Unix systems have. However, the POSIX standard also defines optional exten-
sions-additional functionality, for example, for threads or real-time support. Of most
importance to us is the X/Open System Interface (XSI) extension, which describes facilities
from historical Unix systems.
Throughout the book, we mark each API as to its availability: ISO C, POSIX, XSI,
GLIBC only, or nonstandard but commonly available.
Features and Power: GNU Programs

Restricting ourselves to just the original Unix code would have made an interesting
history book, but it would not have been very useful in the 21st century. Modern
grams do not have the same constraints (memory, CPU power, disk space, and speed)
that the early Unix systems did. Furthermore, they need to operate in a multilingual
world-ASCII and American English aren't enough.
More importantly, one of the primary freedoms expressly promoted by the Free
Software Foundation and the GNU Project⁵ is the "freedom to study." GNU programs
are intended to provide a large corpus of well-written programs that journeyman
programmers can use as a source from which to learn.
2. http://www.ansi.org
3. http://www.iso.ch
4. http://www.opengroup.org
5. http://www.gnu.org
By using GNU programs, we want to meet both goals: show you well-written,
modern code from which you will learn how to write good code and how to use the
APIs well.
We believe that GNU software is better because it is free (in the sense of "freedom,"
not "free beer"). But it's also recognized that GNU software is often technically better
than the corresponding Unix counterparts, and we devote space in Section 1.4, "Why
GNU Programs Are Better," page 14, to explaining why.
A number of the GNU code examples come from gawk (GNU awk). The main
reason is that it's a program with which we're very familiar, and therefore it was easy
to pick examples from it. We don't otherwise make any special claims about it.
Summary of Chapters
Driving a car is a holistic process that involves multiple simultaneous tasks. In many
ways, Linux programming is similar, requiring understanding of multiple aspects
of the API, such as file I/O, file metadata, directories, storage of time information,
and so on.
The first part of the book looks at enough of these individual items to enable studying
the first significant program, the V7 ls. Then we complete the discussion of files and
users by looking at file hierarchies and the way filesystems work and are used.
Chapter 1, "Introduction," page 3,
describes the Unix and Linux file and process models, looks at the differences
between Original C and 1990 Standard C, and provides an overview of the principles
that make GNU programs generally better than standard Unix programs.
Chapter 2, "Arguments, Options, and the Environment," page 23,
describes how a C program accesses and processes command-line arguments and
options and explains how to work with the environment.
Chapter 3, "User-Level Memory Management," page 51,
provides an overview of the different kinds of memory in use and available in a
running process. User-level memory management is central to every nontrivial
application, so it's important to understand it early on.
Chapter 4, "Files and File I/O," page 83,
discusses basic file I/O, showing how to create and use files. This understanding
is important for everything else that follows.
Chapter 5, "Directories and File Metadata," page 117,
describes how directories, hard links, and symbolic links work. It then describes
file metadata, such as owners, permissions, and so on, as well as covering how to
work with directories.
Chapter 6, "General Library Interfaces - Part 1," page 165,
looks at the first set of general programming interfaces that we need so that we
can make effective use of a file's metadata.
Chapter 7, "Putting It All Together: ls," page 207,
ties together everything seen so far by looking at the V7 ls program.
Chapter 8, "Filesystems and Directory Walks," page 227,
describes how filesystems are mounted and unmounted and how a program
can tell what is mounted on the system. It also describes how a program can
easily "walk" an entire file hierarchy, taking appropriate action for each object
it encounters.
The second part of the book deals with process creation and management, interprocess
communication with pipes and signals, user and group IDs, and additional general
programming interfaces. Next, the book describes internationalization with GNU
gettext and then several advanced APIs.
Chapter 9, "Process Management and Pipes," page 283,
looks at process creation, program execution, IPC with pipes, and file descriptor
management, including nonblocking I/O.
Chapter 10, "Signals," page 347,
discusses signals, a simplistic form of interprocess communication. Signals also
play an important role in a parent process's management of its children.
Chapter 11, "Permissions and User and Group ID Numbers," page 403,
looks at how processes and files are identified, how permission checking works,
and how the setuid and setgid mechanisms work.
Chapter 12, "General Library Interfaces - Part 2," page 427,
looks at the rest of the general APIs; many of these are more specialized than the
first general set of APIs.
Chapter 13, "Internationalization and Localization," page 485,
explains how to enable your programs to work in multiple languages, with almost
no pain.
Chapter 14, "Extended Interfaces," page 529,
describes several extended versions of interfaces covered in previous chapters, as
well as covering file locking in full detail.
We round the book off with a chapter on debugging, since (almost) no one gets
things right the first time, and we suggest a final project to cement your knowledge of
the APIs covered in this book.
Chapter 15, "Debugging," page 567,
describes the basics of the GDB debugger, transmits as much of our programming
experience in this area as possible, and looks at several useful tools for doing
different kinds of debugging.
Chapter 16, "A Project That Ties Everything Together," page 641,
presents a significant programming project that makes use of just about everything
covered in the book.
Several appendices cover topics of interest, including the licenses for the source code
used in this book.
Appendix A, "Teach Yourself Programming in Ten Years," page 649,
invokes the famous saying, "Rome wasn't built in a day." So too, Linux/Unix ex-
pertise and understanding only come with time and practice. To that end, we
have included this essay by Peter Norvig which we highly recommend.
Appendix B, "Caldera Ancient UNIX License," page 655,
covers the Unix source code used in this book.
Appendix C, "GNU General Public License," page 657,
covers the GNU source code used in this book.
Typographical Conventions
Like all books on computer-related topics, we use certain typographical conventions
to convey information. Definitions or first uses of terms appear in italics, like the word
"Definitions" at the beginning of this sentence. Italics are also used for emphasis, for
citations of other works, and for commentary in examples. Variable items, such as
arguments or filenames, appear like this. Occasionally, we use a bold font when a point
needs to be made strongly.
Things that exist on a computer are in a constant-width font, such as filenames
(foo.c) and command names (ls, grep). Short snippets that you type are additionally
enclosed in single quotes: 'ls -l *.c'.
$ and > are the Bourne shell primary and secondary prompts and are used to display
interactive examples. User input appears in a different font from regular computer
output in examples. Examples look like this:
$ ls -1                         Look at files. Option is digit 1, not letter l
foo
bar
baz
We prefer the Bourne shell and its variants (ksh93, Bash) over the C shell; thus, all
our examples show only the Bourne shell. Be aware that quoting and line-continuation
rules are different in the C shell; if you use it, you're on your own!6
When referring to functions in programs, we append an empty pair of parentheses
to the function's name: printf(), strcpy(). When referring to a manual page
(accessible with the man command), we follow the standard Unix convention of writing the
command or function name in italics and the section in parentheses after it, in regular
type: awk(1), printf(3).
Where to Get Unix and GNU Source Code

You may wish to have copies of the programs we use in this book for your own ex-
perimentation and review. All the source code is available over the Internet, and your
GNU/Linux distribution contains the source code for the GNU utilities.
6 See the csh(1) and tcsh(1) man pages and the book Using csh & tcsh, by Paul DuBois, O'Reilly & Associates,
Sebastopol, CA, USA, 1995. ISBN: 1-56592-132-1.
Unix Code
Archives of various "ancient" versions of Unix are maintained by The UNIX Heritage
Society (TUHS), http://www.tuhs.org.
Of most interest is that it is possible to browse the archive of old Unix source code
on the Web. Start with http://minnie.tuhs.org/UnixTree/. All the example code
in this book is from the Seventh Edition Research UNIX System, also known as "V7."
The TUHS site is physically located in Australia, although there are mirrors of the
archive around the world; see http://www.tuhs.org/archive_sites.html.
This page also indicates that the archive is available for mirroring with rsync.
(See http://rsync.samba.org/ if you don't have rsync: It's standard on
GNU/Linux systems.)
You will need about 2-3 gigabytes of disk to copy the entire archive. To copy the
archive, create an empty directory, and in it, run the following commands:
mkdir Applications 4BSD PDP-11 PDP-11/Trees VAX Other
rsync -avz minnie.tuhs.org::UA_Root .
rsync -avz minnie.tuhs.org::UA_Applications Applications
rsync -avz minnie.tuhs.org::UA_4BSD 4BSD
rsync -avz minnie.tuhs.org::UA_PDP11 PDP-11
rsync -avz minnie.tuhs.org::UA_PDP11_Trees PDP-11/Trees
rsync -avz minnie.tuhs.org::UA_VAX VAX
rsync -avz minnie.tuhs.org::UA_Other Other
You may wish to omit copying the Trees directory, which contains extractions of
several versions of Unix, and occupies around 700 megabytes of disk.
You may also wish to consult the TUHS mailing list to see if anyone near you can
provide copies of the archive on CD-ROM, to avoid transferring so much data over
the Internet.
The folks at Southern Storm Software, Pty. Ltd., in Australia, have "modernized" a
portion of the V7 user-level code so that it can be compiled and run on current systems,
most notably GNU/Linux. This code can be downloaded from their web site.7
It's interesting to note that V7 code does not contain any copyright or permission
notices in it. The authors wrote the code primarily for themselves and their research,
leaving the permission issues to AT&T's corporate licensing department.
7 http://www.southern-storm.com.au/v7upgrade.html

GNU Code
If you're using GNU/Linux, then your distribution will have come with source code,
presumably in whatever packaging format it uses (Red Hat RPM files, Debian DEB
files, Slackware .tar.gz files, etc.). Many of the examples in the book are from the
GNU Coreutils, version 5.0. Find the appropriate CD-ROM for your GNU/Linux
distribution, and use the appropriate tool to extract the code. Or follow the instructions
in the next few paragraphs to retrieve the code.
If you prefer to retrieve the files yourself from the GNU ftp site, you will find them
at ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz.
You can use the wget utility to retrieve the file:
$ wget ftp://ftp.gnu.org/gnu/coreutils/coreutils-5.0.tar.gz      Retrieve the distribution
... lots of output here as file is retrieved ...
Alternatively, you can use good old-fashioned ftp to retrieve the file:
$ ftp ftp.gnu.org                                 Connect to GNU ftp site
Connected to ftp.gnu.org (199.232.41.7).
220 GNU FTP server ready.
Name (ftp.gnu.org:arnold): anonymous              Use anonymous ftp
331 please specify the password.
Password:                                         Password does not echo on screen
230-If you have any problems with the GNU software or its downloading,
230-please refer your questions to <gnu@gnu.org>.
                                                  Lots of verbiage deleted
230 Login successful. Have fun.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd /gnu/coreutils                            Change to Coreutils directory
250 Directory successfully changed.
ftp> bin
200 Switching to Binary mode.
ftp> hash                                         Print # signs as progress indicators
Hash mark printing on (1024 bytes/hash mark).
ftp> get coreutils-5.0.tar.gz                     Retrieve file
local: coreutils-5.0.tar.gz remote: coreutils-5.0.tar.gz
227 Entering Passive Mode (199,232,41,7,86,107)
150 Opening BINARY mode data connection for coreutils-5.0.tar.gz (6020616 bytes)
#################################################################################
#################################################################################
226 File send OK.
6020616 bytes received in 2.03e+03 secs (2.9 Kbytes/sec)
ftp> quit                                         Log off
221 Goodbye.
Once you have the file, extract it as follows:
$ gzip -dc < coreutils-5.0.tar.gz | tar -xvpf -     Extract files
... lots of output here as files are extracted ...
Systems using GNU tar may use this incantation:
$ tar -xvpzf coreutils-5.0.tar.gz                   Extract files
... lots of output here as files are extracted ...
In compliance with the GNU General Public License, here is the Copyright infor-
mation for all GNU programs quoted in this book. All the programs are "free software;
you can redistribute it and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either version 2 of the License,
or (at your option) any later version." See Appendix C, "GNU General Public License,"
page 657, for the text of the GNU General Public License.
Coreutils 5.0 File       Copyright dates
lib/safe-read.c          Copyright © 1993-1994, 1998, 2002
lib/safe-write.c         Copyright © 2002
lib/utime.c              Copyright © 1998, 2001-2002
lib/xreadlink.c          Copyright © 2001
src/du.c                 Copyright © 1988-1991, 1995-2003
src/env.c                Copyright © 1986, 1991-2003
src/install.c            Copyright © 1989-1991, 1995-2002
src/link.c               Copyright © 2001-2002
src/ls.c                 Copyright © 1985, 1988, 1990, 1991, 1995-2003
src/pathchk.c            Copyright © 1991-2003
src/sort.c               Copyright © 1988, 1991-2002
src/sys2.h               Copyright © 1997-2003
src/wc.c                 Copyright © 1985, 1991, 1995-2002

Gawk 3.0.6 File          Copyright dates
eval.c                   Copyright © 1986, 1988, 1989, 1991-2000

awk.h                    Copyright © 1986, 1988, 1989, 1991-2003
builtin.c                Copyright © 1986, 1988, 1989, 1991-2003
eval.c                   Copyright © 1986, 1988, 1989, 1991-2003
io.c                     Copyright © 1986, 1988, 1989, 1991-2003
main.c                   Copyright © 1986, 1988, 1989, 1991-2003
posix/gawkmisc.c         Copyright © 1986, 1988, 1989, 1991-1998, 2001-2003

builtin.c                Copyright © 1986, 1988, 1989, 1991-2004

GLIBC 2.3.2 File         Copyright dates
locale/locale.h          Copyright © 1991, 1992, 1995-2002
posix/unistd.h           Copyright © 1991-2003
time/sys/time.h          Copyright © 1991-1994, 1996-2003

Make 3.80 File           Copyright dates
read.c                   Copyright © 1988-1997, 2002
Where to Get the Example Programs Used in This Book

The example programs used in this book can be found at
http://authors.phptr.com/robbins.
About the Cover

"This is the weapon of a Jedi Knight ..., an elegant weapon for
a more civilized age. For over a thousand generations the Jedi
Knights were the guardians of peace and justice in the Old
Republic. Before the dark times, before the Empire."
-Obi-Wan Kenobi-
You may be wondering why we chose to put a light saber on the cover and to use it
throughout the book's interior. What does it represent, and how does it relate to Linux
programming?
In the hands of a Jedi Knight, a light saber is both a powerful weapon and a thing
of beauty. Its use demonstrates the power, knowledge, control of the Force, and arduous
training of the Jedi who wields it.
The elegance of the light saber mirrors the elegance of the original Unix API design.
There, too, the studied, precise use of the APIs and the Software Tools and GNU design
principles lead to today's powerful, flexible, capable GNU/Linux system. This system
demonstrates the knowledge and understanding of the programmers who wrote all its
components.
And, of course, light sabers are just way cool!
Acknowledgments
Writing a book is lots of work, and doing it well requires help from many people.
Dr. Brian W. Kernighan, Dr. Doug McIlroy, Peter Memishian, and Peter van der
Linden reviewed the initial book proposal. David J. Agans, Fred Fish, Don Marti, Jim
Meyering, Peter Norvig, and Julian Seward provided reprint permission for various
items quoted throughout the book. Thanks to Geoff Collyer, Ulrich Drepper, Yosef
Gold, Dr. C.A.R. (Tony) Hoare, Dr. Manny Lehman, Jim Meyering, Dr. Dennis M.
Ritchie, Julian Seward, Henry Spencer, and Dr. Wladyslaw M. Turski, who provided
much useful general information. Thanks also to the other members of the GNITS
gang: Karl Berry, Akim DeMaille, Ulrich Drepper, Greg McGary, Jim Meyering,
François Pinard, and Tom Tromey, who all provided helpful feedback about good
programming practice. Karl Berry, Alper Ersoy, and Dr. Nelson H.F. Beebe provided
valuable technical help with the Texinfo and DocBook/XML toolchains.
Good technical reviewers not only make sure that an author gets his facts right, they
also ensure that he thinks carefully about his presentation. Dr. Nelson H.F. Beebe,
Geoff Collyer, Russ Cox, Ulrich Drepper, Randy Lechlitner, Dr. Brian W. Kernighan,
Peter Memishian, Jim Meyering, Chet Ramey, and Louis Taber acted as technical re-
viewers for the entire book. Dr. Michael Brennan provided helpful comments on
Chapter 15. Both the prose and many of the example programs benefited from their
reviews. I hereby thank all of them. As most authors usually say here, "Any remaining
errors are mine."
I would especially like to thank Mark Taub of Pearson Education for initiating this
project, for his enthusiasm for the series, and for his help and advice as the book moved
through its various stages. Anthony Gemmellaro did a phenomenal job of realizing my
concept for the cover, and Gail Cocker's interior design is beautiful. Faye Gemmellaro
made the production process enjoyable, instead of a chore. Dmitry Kirsanov and
Alina Kirsanova did the figures, page layout, and indexing; they were a pleasure to
work with.
Finally, my deepest gratitude and love to my wife, Miriam, for her support and en-
couragement during the book's writing.
Arnold Robbins
Nof Ayalon
ISRAEL
Chapter 1 Introduction page 3
Chapter 2 Arguments, Options, and the Environment page 23
Chapter 3 User-Level Memory Management page 51
Chapter 4 Files and File I/O page 83
Chapter 5 Directories and File Metadata page 117
Chapter 6 General Library Interfaces - Part 1 page 165
Chapter 7 Putting It All Together: ls page 207
Chapter 8 Filesystems and Directory Walks page 227

In this chapter
• 1.1 The Linux/Unix File Model page 4
• 1.2 The Linux/Unix Process Model page 10
• 1.3 Standard C vs. Original C page 12
• 1.4 Why GNU Programs Are Better page 14
• 1.5 Portability Revisited page 19
• 1.6 Suggested Reading page 20
• 1.7 Summary page 21
• Exercises page 22
If there is one phrase that summarizes the primary GNU/Linux (and therefore
Unix) concepts, it's "files and processes." In this chapter we review the Linux
file and process models. These are important to understand because the system calls
are almost all concerned with modifying some attribute or part of the state of a file
or a process.
Next, because we'll be examining code in both styles, we briefly review the major
difference between 1990 Standard C and Original C. Finally, we discuss at some
length what makes GNU programs "better," programming principles that we'll see
in use in the code.
This chapter contains a number of intentional simplifications. The full details are
covered as we progress through the book. If you're already a Linux wizard, please
forgive us.
1.1 The Linux/Unix File Model

One of the driving goals in the original Unix design was simplicity. Simple concepts
are easy to learn and use. When the concepts are translated into simple APIs, simple
programs are then easy to design, write, and get correct. In addition, simple code is
often smaller and more efficient than more complicated designs.
The quest for simplicity was driven by two factors. From a technical point of view,
the original PDP-11 minicomputers on which Unix was developed had a small address
space: 64 Kilobytes total on the smaller systems, 64K code and 64K of data on the large
ones. These restrictions applied not just to regular programs (so-called user level code),
but to the operating system itself (kernel level code). Thus, not only "Small Is Beautiful"
aesthetically, but "Small Is Beautiful" because there was no other choice!
The second factor was a negative reaction to contemporary commercial operating
systems, which were needlessly complicated, with obtuse command languages, multiple
kinds of file I/O, and little generality or symmetry. (Steve Johnson once remarked that
"Using TSO is like trying to kick a dead whale down a beach." TSO is one of the obtuse
mainframe time-sharing systems just described.)
1.1.1 Files and Permissions

The Unix file model is as simple as it gets: A file is a linear stream of bytes. Period.
The operating system imposes no preordained structure on files: no fixed or varying
record sizes, no indexed files, nothing. The interpretation of file contents is entirely up
to the application. (This isn't quite true, as we'll see shortly, but it's close enough for
a start.)
Once you have a file, you can do three things with the file's data: read them, write
them, or execute them.
Unix was designed for time-sharing minicomputers; this implies a multiuser
environment from the get-go. Once there are multiple users, it must be possible to specify a
file's permissions: Perhaps user jane is user fred's boss, and jane doesn't want fred
to read the latest performance evaluations.
For file permission purposes, users are classified into three distinct categories: user:
the owner of a file; group: the group of users associated with this file (discussed shortly);
and other: anybody else. For each of these categories, every file has separate read, write,
and execute permission bits associated with it, yielding a total of nine permission bits.
This shows up in the first field of the output of 'ls -l':
$ ls -l progex.texi
-rw-r--r--    1 arnold   devel        5614 Feb 24 18:02 progex.texi
Here, arnold and devel are the owner and group of progex.texi, and -rw-r--r--
are the file type and permissions. The first character is a dash for regular file, a d for
directories, or one of a small set of other characters for other kinds of files that aren't
important at the moment. Each subsequent group of three characters represents read,
write, and execute permission for the owner, group, and "other," respectively.
In this example, progex.texi is readable and writable by the owner, and readable
by the group and other. The dashes indicate absent permissions, thus the file is not
executable by anyone, nor is it writable by the group or other.
The owner and group of a file are stored as numeric values known as the user ID
(UID) and group ID (GID); standard library functions that we present later in the book
make it possible to print the values as human-readable names.
A file's owner can change the permission by using the chmod (change mode)
command. (As such, file permissions are sometimes referred to as the "file mode.")
A file's group can be changed with the chgrp (change group) and chown (change
owner) commands.1
1 Some systems allow regular users to change the ownership on their files to someone else, thus "giving them away."
The details are standardized by POSIX but are a bit messy. Typical GNU/Linux configurations do not allow it.
Group permissions were intended to support cooperative work: Although one person
in a group or department may own a particular file, perhaps everyone in that group
needs to be able to modify it. (Consider a collaborative marketing paper or data from
a survey.)
When the system goes to check a file access (usually upon opening a file), if the UID
of the process matches that of the file, the owner permissions apply. If those permissions
deny the operation (say, a write to a file with -r--rw-rw- permissions), the operation
fails; Unix and Linux do not proceed to test the group and other permissions.2 The
same is true if the UID is different but the GID matches; if the group permissions deny
the operation, it fails.
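The order of these checks can be modeled in a few lines of C. This is only a sketch under simplifying assumptions: the function may_access() and its parameters are hypothetical, it ignores supplementary groups and any special privileges, and the real check is made inside the kernel, not in user code.

```c
#include <sys/types.h>

/*
 * may_access --- simplified, hypothetical model of the permission check.
 * `want' is a combination of 4 (read), 2 (write), and 1 (execute).
 * Returns 1 if access would be allowed, 0 if not.  Exactly one class of
 * permission bits is consulted; there is no fall-through to the others.
 */
int may_access(uid_t proc_uid, gid_t proc_gid,
               uid_t file_uid, gid_t file_gid,
               mode_t file_mode, int want)
{
    int granted;

    if (proc_uid == file_uid)
        granted = (file_mode >> 6) & 7;     /* owner bits only */
    else if (proc_gid == file_gid)
        granted = (file_mode >> 3) & 7;     /* group bits only */
    else
        granted = file_mode & 7;            /* other bits only */

    return (granted & want) == want;
}
```

With mode 0466 (-r--rw-rw-), the sketch reports that the owner may not write the file, even though the group and other bits would allow it.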
Unix and Linux support the notion of a superuser: a user with special privileges. This
user is known as root and has the UID of 0. root is allowed to do anything; all bets
are off, all doors are open, all drawers unlocked.3 (This can have significant security
implications, which we touch on throughout the book but do not cover exhaustively.)
Thus, even if a file is mode ----------, root can still read and write the file. (One
exception is that the file can't be executed. But as root can add execute permission,
the restriction doesn't prevent anything.)
The user/group/other, read/write/execute permissions model is simple, yet flexible
enough to cover most situations. Other, more powerful but more complicated, models
exist and are implemented on different systems, but none of them are well enough
standardized and broadly enough implemented to be worth discussing in a general-
purpose text like this one.
1.1.2 Directories and Filenames

Once you have a file, you need someplace to keep it. This is the purpose of the direc-
tory (known as a "folder" on Windows and Apple Macintosh systems). A directory is
a special kind of file, which associates filenames with particular collections of file
metadata, known as inodes. Directories are special because they can only be updated by
the operating system, by the system calls described in Chapter 4, "Files and File I/O,"
page 83. They are also special in that the operating system dictates the format of
directory entries.
2 The owner can always change the permission, of course. Most users don't disable write permission for themselves.
3 There are some rare exceptions to this rule, all of which are beyond the scope of this book.
Filenames may contain any valid 8-bit byte except the / (forward slash) character
and ASCII NUL, the character whose bits are all zero. Early Unix systems limited file-
names to 14 bytes; modern systems allow individual filenames to be up to 255 bytes.
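These rules are simple enough to state in code. The following sketch checks whether a string could serve as a single filename; the function valid_filename() and the constant MAX_NAME_LEN are hypothetical names chosen for illustration, and the 255-byte limit is simply the figure quoted above. A real program would likely ask the system for the actual limit instead of hard-coding it.

```c
#include <string.h>

#define MAX_NAME_LEN 255    /* assumed limit, taken from the text above */

/*
 * valid_filename --- hypothetical helper: return 1 if `name' could be a
 * single filename (one directory entry), 0 otherwise.
 */
int valid_filename(const char *name)
{
    /* A C string ends at the first NUL, so the "no NUL" rule is implicit. */
    size_t len = strlen(name);

    if (len == 0 || len > MAX_NAME_LEN)
        return 0;               /* empty or too long */
    if (strchr(name, '/') != NULL)
        return 0;               /* slash separates names; it can't be in one */
    return 1;
}
```

Note that every other byte value, including spaces, newlines, and control characters, passes the test.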
The inode contains all the information about a file except its name: the type, owner,
group, permissions, size, modification and access times. It also stores the locations on
disk of the blocks containing the file's data. All of these are data about the file, not the
file's data itself, thus the term metadata.
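A program can retrieve much of this metadata with the stat() system call, which fills in a struct stat from the file's inode. The sketch below prints a few of the fields; the helper name print_metadata() is hypothetical, and the casts are there only to keep the printf() formats safe across systems where the underlying types differ.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * print_metadata --- hypothetical helper: print some of the inode
 * information for `path'.  Returns 0 on success, -1 if stat() fails.
 */
int print_metadata(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) < 0) {
        perror(path);           /* e.g., file does not exist */
        return -1;
    }

    printf("inode number: %lu\n", (unsigned long) sb.st_ino);
    printf("mode (octal): %o\n", (unsigned int) sb.st_mode);
    printf("owner UID:    %ld\n", (long) sb.st_uid);
    printf("group GID:    %ld\n", (long) sb.st_gid);
    printf("size (bytes): %lld\n", (long long) sb.st_size);
    printf("modified:     %ld\n", (long) sb.st_mtime);
    printf("accessed:     %ld\n", (long) sb.st_atime);
    return 0;
}
```

Notice that nothing in struct stat holds the file's name; the name lives only in the directory that refers to the inode.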
Directory permissions have a slightly different meaning from file permissions.
Read permission means the ability to search the directory; that is, to look through it to
see what files it contains. Write permission is the ability to create and remove files in
the directory. Execute permission is the ability to go through a directory when opening
or otherwise accessing a contained file or subdirectory.
NOTE If you have write permission on a directory, you can remove files in that
directory, even if they don't belong to you! When used interactively, the rm
command notices this, and asks you for confirmation in such a case.
The /tmp directory has write permission for everyone, but your files in /tmp
are quite safe because /tmp usually has the so-called sticky bit set on it:
$ ls -ld /tmp
drwxrwxrwt   11 root     root         4096 May 15 17:11 /tmp
Note the t is the last position of the first field. On most directories this position
has an x in it. With the sticky bit set, only you, as the file's owner, or root may
remove your files. (We discuss this in more detail in Section 11.5.2, "Directories
and the Sticky Bit," page 414.)
1.1.3 Executable Files

Remember we said that the operating system doesn't impose a structure on files?
Well, we've already seen that that was a white lie when it comes to directories. It's also
the case for binary executable files. To run a program, the kernel has to know what part
of a file represents instructions (code) and what part represents data. This leads to the
notion of an object file format, which is the definition for how these things are laid out
within a file on disk.
Although the kernel will only run a file laid out in the proper format, it is up to user-
level utilities to create these files. The compiler for a programming language (such as
Ada, Fortran, C, or C++) creates object files, and then a linker or loader (usually named
ld) binds the object files with library routines to create the final executable. Note that
even if a file has all the right bits in all the right places, the kernel won't run it if the
appropriate execute permission bit isn't turned on (or at least one execute bit for root).
Because the compiler, assembler, and loader are user-level tools, it's (relatively) easy
to change object file formats as needs develop over time; it's only necessary to "teach"
the kernel about the new format and then it can be used. The part that loads executables
is relatively small and this isn't an impossible task. Thus, Unix file formats have evolved
over time. The original format was known as a.out (Assembler OUTput). The next
format, still used on some commercial systems, is known as COFF (Common Object
File Format), and the current, most widely used format is ELF (Extensible Linking
Format). Modern GNU/Linux systems use ELF.
The kernel recognizes that an executable file contains binary object code by looking
at the first few bytes of the file for special magic numbers. These are sequences of two
or four bytes that the kernel recognizes as being special. For backwards compatibility,
modern Unix systems recognize multiple formats. ELF files begin with the four characters
"\177ELF".
Besides binary executables, the kernel also supports executable scripts. Such a file also
begins with a magic number: in this case, the two regular characters #!. A script is a
program executed by an interpreter, such as the shell, awk, Perl, Python, or Tcl. The
#! line provides the full path to the interpreter and, optionally, one single argument:
#! /bin/awk -f
BEGIN { print "hello, world" }
Let's assume the above contents are in a file named hello.awk and that the file is
executable. When you type 'hello.awk', the kernel runs the program as if you had
typed '/bin/awk -f hello.awk'. Any additional command-line arguments are also
passed on to the program. In this case, awk runs the program and prints the universally
known hello, world message.
The #! mechanism is an elegant way of hiding the distinction between binary
executables and script executables. If hello.awk is renamed to just hello, the user typing
'hello' can't tell (and indeed shouldn't have to know) that hello isn't a binary
executable program.
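The magic-number test described in this section amounts to comparing the first few bytes of a file against known values. The following sketch shows the idea for the two cases discussed here, ELF binaries and #! scripts. It is an illustration only: the names classify_magic() and enum exec_kind are hypothetical, and the kernel's real logic handles more formats and more error cases.

```c
#include <stddef.h>
#include <string.h>

enum exec_kind { EXEC_UNKNOWN, EXEC_ELF, EXEC_SCRIPT };

/*
 * classify_magic --- hypothetical helper: look at the first `len' bytes
 * of a file and report which kind of executable they suggest.
 */
enum exec_kind classify_magic(const unsigned char *buf, size_t len)
{
    /* ELF: the byte with octal value 177, then the letters E, L, F */
    if (len >= 4 && memcmp(buf, "\177ELF", 4) == 0)
        return EXEC_ELF;

    /* script: the two ordinary characters '#' and '!' */
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return EXEC_SCRIPT;

    return EXEC_UNKNOWN;
}
```

A caller would read the first few bytes of a file into a buffer and pass the buffer and the number of bytes actually read; passing the length keeps the check safe for files shorter than four bytes.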
1.1.4 Devices
One of Unix's most notable innovations was the unification of file I/O and device
I/O.4 Devices appear as files in the filesystem, regular permissions apply to their access,
and the same I/O system calls are used for opening, reading, writing, and closing them.
All of the "magic" to make devices look like files is hidden in the kernel. This is just
another aspect of the driving simplicity principle in action: We might phrase it as no
special cases for user code.
Two devices appear frequently in everyday use, particularly at the shell level:
/dev/null and /dev/tty.
/dev/null is the "bit bucket." All data sent to /dev/null is discarded by the
operating system, and attempts to read from it always return end-of-file (EOF) immediately.
/dev/tty is the process's current controlling terminal: the one to which it listens
when a user types the interrupt character (typically CTRL-C) or performs job control
(CTRL-Z).
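Because devices are accessed with the ordinary file system calls, the behavior of /dev/null is easy to observe from a program. The sketch below opens it, writes a few bytes, and then reads; the function name devnull_demo() is hypothetical, and the code assumes a system where /dev/null exists and is readable and writable by the caller.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * devnull_demo --- write to and then read from /dev/null using the
 * same calls as for any regular file.  Returns the result of read(),
 * which should be 0 (end-of-file), or -1 if something failed.
 */
ssize_t devnull_demo(void)
{
    char buf[16];
    ssize_t n;
    int fd;

    fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;

    /* the data vanish, but the write itself reports success */
    if (write(fd, "discarded", 9) != 9) {
        close(fd);
        return -1;
    }

    n = read(fd, buf, sizeof buf);  /* immediate end-of-file */
    close(fd);
    return n;
}
```

Nothing in the code is specific to devices; replacing the pathname with that of a regular file would compile and run unchanged, which is exactly the point of the unified model.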
GNU/Linux systems, and many modern Unix systems, supply /dev/stdin,
/dev/stdout, and /dev/stderr devices, which provide a way to name the open files
each process inherits upon startup.
Other devices represent real hardware, such as tape and disk drives, CD-ROM drives,
and serial ports. There are also software devices, such as pseudo-ttys, that are used for
networking logins and windowing systems. /dev/console represents the system console,
a particular hardware device on minicomputers. On modern computers, /dev/console
is the screen and keyboard, but it could be a serial port.
Unfortunately, device-naming conventions are not standardized, and each operating
system has different names for tapes, disks, and so on. (Fortunately, that's not an issue
for what we cover in this book.) Devices have either a b or c in the first character of
'ls -l' output:
$ ls -l /dev/tty /dev/hda
brw-rw----    1 root     disk       3,   0 Aug 31 02:31 /dev/hda
crw-rw-rw-    1 root     root       5,   0 Feb 26 08:44 /dev/tty
The initial b represents block devices, and a c represents character devices. Device files
are discussed further in Section 5.4, "Obtaining Information about Files," page 139.
4 This feature first appeared in Multics, but Multics was never widely used.
1.2 The Linux/Unix Process Model

A process is a running program.5 Processes have the following attributes:
• A unique process identifier (the PID)
• A parent process (with an associated identifier, the PPID)
• Permission identifiers (UID, GID, groupset, and so on)
• An address space, separate from those of all other processes
• A program running in that address space
• A current working directory ('.')
• A current root directory (/; changing this is an advanced topic)
• A set of open files, directories, or both
• A permissions-to-deny mask for use in creating new files
• A set of strings representing the environment
• A scheduling priority (an advanced topic)
• Settings for signal disposition (an advanced topic)
• A controlling terminal (also an advanced topic)
When the main () function begins execution, all of these things have already been
put in place for the running program. System calls are available to query and change
each of the above items; covering them is the purpose of this book.
New processes are always created by an existing process. The existing process is termed
the parent, and the new process is termed the child. Upon booting, the kernel handcrafts
the first, primordial process, which runs the program /sbin/init; it has process ID
1 and serves several administrative functions. All other processes are descendants of
init. (init's parent is the kernel, often listed as process ID 0.)
5 Processes can be suspended, in which case they are not "running"; however, neither are they terminated. In any
case, in the early stages of the climb up the learning curve, it pays not to be too pedantic.
The child-to-parent relationship is one-to-one; each process has only one parent,
and thus it's easy to find out the PID of the parent. The parent-to-child relationship
is one-to-many; any given process can create a potentially unlimited number of children.
Thus, there is no easy way for a process to find out the PIDs of all its children. (In
practice, it's not necessary, anyway.) A parent process can arrange to be notified when
a child process terminates ("dies"), and it can also explicitly wait for such an event.
Each process's address space (memory) is separate from that of every other. Unless
two processes have made explicit arrangement to share memory, one process cannot
affect the address space of another. This is important; it provides a basic level of security
and system reliability. (For efficiency, the system arranges to share the read-only
executable code of the same program among all the processes running that program. This
is transparent to the user and to the running program.)
The current working directory is the one to which relative pathnames (those that
don't start with a / ) are relative. This is the directory you are "in" whenever you issue
a 'cd someplace' command to the shell.
By convention, all programs start out with three files already open: standard input,
standard output, and standard error. These are where input comes from, output goes
to, and error messages go to, respectively. In the course of this book, we will see how
these are put in place. A parent process can open additional files and have them already
available for a child process; the child will have to know they're there, either by way of
some convention or by a command-line argument or environment variable.
The environment is a set of strings, each of the form 'name=value'. Functions exist
for querying and setting environment variables, and child processes inherit the
environment of their parents. Typical environment variables are things like PATH and HOME in
the shell. Many programs look for the existence and value of specific environment
variables in order to control their behavior.
It is important to understand that a single process may execute multiple programs
during its lifetime. Unless explicitly changed, all of the other system-maintained
attributes (current directory, open files, PID, etc.) remain the same. The separation of
"starting a new process" from "choosing which program to run" is a key Unix innovation.
It makes many operations simple and straightforward. Other operating systems that
combine the two operations are less general and more complicated to use.
1.2.1 Pipes: Hooking Processes Together

You've undoubtedly used the pipe construct ('|') in the shell to connect two or more
running programs. A pipe acts like a file: One process writes to it using the normal
write operation, and the other process reads from it using the read operation. The
processes don't (usually) know that their input/output is a pipe and not a regular file.
Just as the kernel hides the "magic" for devices, making them act like regular files,
so too the kernel does the work for pipes, arranging to pause the pipe's writer when the
pipe fills up and to pause the reader when no data is waiting to be read.
The file I/O paradigm with pipes thus acts as a key mechanism for connecting running
programs; no temporary files are needed. Again, generality and simplicity at work: no
special cases for user code.
1.3 Standard C vs. Original C

For many years, the de facto definition of C was found in the first edition of the
book The C Programming Language, by Brian Kernighan and Dennis Ritchie. This
book described C as it existed for Unix and on the systems to which the Bell Labs de-
velopers had ported it. Throughout this book, we refer to it as "Original C," although
it's also common for it to be referred to as "K&R C," after the book's two authors.
(Dennis Ritchie designed and implemented C.)
The 1990 ISO Standard for C formalized the language's definition, including the
functions in the C library (such as printf() and fopen()). The C standards committee
did an admirable job of standardizing existing practice and avoided inventing new
features, with one notable exception (and a few minor ones). The most visible change in
the language was the use of function prototypes, borrowed from C++.
Standard C, C++, and the Java programming language use function prototypes for
function declarations and definitions. A prototype describes not only the function's
return value but also the number and type of its arguments. With prototypes, a compiler
can do complete type checking at the point of a function call:
extern int myfunc(struct my_struct *a,      Declaration
                  struct my_struct *b,
                  double c, int d);

int myfunc(struct my_struct *a,             Definition
           struct my_struct *b,
           double c, int d)
{
    ...
}

struct my_struct s, t;
int j;

/* Function call, somewhere else: */
j = myfunc(& s, & t, 3.1415, 42);
This function call is fine. But consider an erroneous call:

j = myfunc(-1, -2, 0);                      Wrong number and types of arguments
The compiler can immediately diagnose this call as being invalid. However, in
Original C, functions are declared without the argument list being specified:
extern int myfunc();                Returns int, arguments unknown
Furthermore, function definitions list the parameter names in the function header,
and then declare the parameters before the function body. Parameters of type int don't
have to be declared, and if a function returns int, that doesn't have to be declared either:
myfunc(a, b, c, d)                  Return type is int
struct my_struct *a, *b;
double c;                           Note, no declaration of parameter d
{
    ...
}
Consider again the same erroneous function call: 'j = myfunc(-1, -2, 0);'. In
Original C, the compiler has no way of knowing that you've (accidentally, we assume)
passed the wrong arguments to myfunc(). Such erroneous calls generally lead to
hard-to-find runtime problems (such as segmentation faults, whereby the program dies), and
the Unix lint program was created to deal with these kinds of things.
So, although function prototypes were a radical departure from existing practice,
their additional type checking was deemed too important to be without, and they were
added to the language with little opposition.
In 1990 Standard C, code written in the original style, for both declarations and
definitions, is valid. This makes it possible to continue to compile millions of lines of
existing code with a standard-conforming compiler. New code, obviously, should be
written with prototypes because of the improved possibilities for compile-time
error checking.
1999 Standard C continues to allow original style declarations and definitions.
However, the "implicit int" rule was removed; functions must have a return type, and
all parameters must be declared.
Furthermore, when a program called a function that had not been formally declared,
Original C would create an implicit declaration for the function, giving it a return type
of int. 1990 Standard C did the same, additionally noting that it had no information
about the parameters. 1999 Standard C no longer provides this "auto-declare" feature.
Other notable additions in Standard C are the const keyword, also from C++, and
the volatile keyword, which the committee invented. For the code you'll see in this
book, understanding the different function declaration and definition syntaxes is the
most important thing.
For V7 code using original style definitions, we have added comments showing the
equivalent prototype. Otherwise, we have left the code alone, preferring to show it
exactly as it was originally written and as you'll see it if you download the code yourself.
Although 1999 C adds some additional keywords and features beyond the 1990
version, we have chosen to stick to the 1990 dialect, since C99 compilers are not yet
commonplace. Practically speaking, this doesn't matter: C89 code should compile and
run without change when a C99 compiler is used, and the new C99 features don't affect
our discussion or use of the fundamental Linux/Unix APIs.
1.4 Why GNU Programs Are Better

What is it that makes a GNU program a GNU program?6 What makes GNU software
"better" than other (free or non-free) software? The most obvious difference is the GNU
General Public License (GPL), which describes the distribution terms for GNU software.
But this is usually not the reason you hear people saying "Get the GNU version of xyz,
it's much better." GNU software is generally more robust, and performs better, than
standard Unix versions. In this section we look at some of the reasons why, and at the
document that describes the principles of GNU software design.
6 This section is adapted from an article by the author that appeared in Issue 16 of Linux Journal. (See
http://www.linuxjournal.com/article.php?sid=1135.) Reprinted and adapted by permission.
The GNU Coding Standards describes how to write software for the GNU
project. It covers a range of topics. You can read the GNU Coding Standards online at
http://www.gnu.org/prep/standards.html. See the online version for pointers
to the source files in other formats.
In this section, we describe only those parts of the GNU Coding Standards that relate
to program design and implementation.
1.4.1 Program Design

Chapter 3 of the GNU Coding Standards provides general advice about program
design. The four main issues are compatibility (with standards and Unix), the language
to write in, reliance on nonstandard features of other programs (in a word, "none"),
and the meaning of "portability."
Compatibility with Standard C and POSIX, and to a lesser extent, with Berkeley
Unix is an important goal. But it's not an overriding one. The general idea is to provide
all necessary functionality, with command-line switches to provide a strict ISO or
POSIX mode.
C is the preferred language for writing GNU software since it is the most commonly
available language. In the Unix world, Standard C is now common, but if you can
easily support Original C, you should do so. Although the coding standards prefer C
over C++, C++ is now commonplace too. One widely used GNU package written in
C++ is groff (GNU troff). With GCC supporting C++, it has been our experience
that installing groff is not difficult.
The standards state that portability is a bit of a red herring. GNU utilities are
ultimately intended to run on the GNU kernel with the GNU C Library.7 But since the
kernel isn't finished yet and users are using GNU tools on non-GNU systems,
portability is desirable, just not paramount. The standard recommends using Autoconf for
achieving portability among different Unix systems.
7 This statement refers to the HURD kernel, which is still under development (as of early 2004). GCC and GNU
C Library (GLIBC) development take place mostly on Linux-based systems today.
1.4.2 Program Behavior

Chapter 4 of the GNU Coding Standards provides general advice about program be-
havior. We will return to look at one of its sections in detail, below. The chapter focuses
on program design, formatting error messages, writing libraries (by making them
reentrant), and standards for the command-line interface.
Error message formatting is important since several tools, notably Emacs, use the
error messages to help you go straight to the point in the source file or data file at which
an error occurred.
GNU utilities should use a function named getopt_long() for processing
the command line. This function provides command-line option parsing for both
traditional Unix-style options ('gawk -F: ...') and GNU-style long options
('gawk --field-separator=: ...'). All programs should provide --help and
--version options, and when a long name is used in one program, it should be used
the same way in other GNU programs. To this end, there is a rather exhaustive list of
long options used by current GNU programs.
As a simple yet obvious example, --verbose is spelled exactly the same way in all
GNU programs. Contrast this to -v, -V, -d, etc., in many Unix programs. Most of
Chapter 2, "Arguments, Options, and the Environment," page 23, is devoted to the
mechanics of argument and option parsing.
1.4.3 C Code Programming

The most substantive part of the GNU Coding Standards is Chapter 5, which describes
how to write C code, covering things like formatting the code, correct use of comments,
using C cleanly, naming your functions and variables, and declaring, or not declaring,
standard system functions that you wish to use.
Code formatting is a religious issue; many people have different styles that they prefer.
We personally don't like the FSF's style, and if you look at gawk, which we maintain,
you'll see it's formatted in standard K&R style (the code layout style used in both edi-
tions of the Kernighan and Ritchie book). But this is the only variation in gawk from
this part of the coding standards.
Nevertheless, even though we don't like the FSF's style, we feel that when modifying
some other program, sticking to the coding style already used is of the utmost impor-
tance. Having a consistent coding style is more important than which coding style you
pick. The GNU Coding Standards also makes this point. (Sometimes, there is no
detectable consistent coding style, in which case the program is probably overdue for a
trip through either GNU indent or Unix's cb.)
What we find important about the chapter on C coding is that the advice is good
for any C coding, not just if you happen to be working on a GNU program. So, if
you're just learning C or even if you've been working in C (or C++) for a while, we
recommend this chapter to you since it encapsulates many years of experience.
1.4.4 Things That Make a GNU Program Better

We now examine the section titled Writing Robust Programs in Chapter 4, Program
Behavior for All Programs, of the GNU Coding Standards. This section provides the
principles of software design that make GNU programs better than their Unix counter-
parts. We quote selected parts of the chapter, with some examples of cases in which
these principles have paid off.
Avoid arbitrary limits on the length or number of any data structure, including
file names, lines, files, and symbols, by allocating all data structures
dynamically. In most Unix utilities, "long lines are silently truncated." This is not
acceptable in a GNU utility.
This rule is perhaps the single most important rule in GNU software design: no
arbitrary limits. All GNU utilities should be able to manage arbitrary amounts of data.
While this requirement perhaps makes it harder for the programmer, it makes things
much better for the user. At one point, we had a gawk user who regularly ran an awk
program on more than 650,000 files (no, that's not a typo) to gather statistics. gawk
would grow to over 192 megabytes of data space, and the program ran for around seven
CPU hours. He would not have been able to run his program using another awk
implementation.8
Utilities reading files should not drop NUL characters, or any other
nonprinting characters including those with codes above 0177. The only sensible
exceptions would be utilities specifically intended for interface to certain types of
terminals or printers that can't handle those characters.
8 This situation occurred circa 1993; the truism is even more obvious today, as users process gigabytes of log files
with gawk.
It is also well known that Emacs can edit any arbitrary file, including files containing
binary data!
Whenever possible, try to make programs work properly with sequences of
bytes that represent multibyte characters, using encodings such as UTF-8
and others.9
Check every system call for an error return, unless you know
you wish to ignore errors. Include the system error text (from perror or
equivalent) in every error message resulting from a failing system call, as well
as the name of the file if any and the name of the utility. Just "cannot open
foo.c" or "stat failed" is not sufficient.
Checking every system call provides robustness. This is another case in which life is
harder for the programmer but better for the user. An error message detailing what
exactly went wrong makes finding and solving any problems much easier.10
Finally, we quote from Chapter 1 of the GNU Coding Standards, which discusses
how to write your program differently from the way a Unix program may have
been written.
For example, Unix utilities were generally optimized to minimize memory
use; if you go for speed instead, your program will be very different. You
could keep the entire input file in core and scan it there instead of using
stdio. Use a smarter algorithm discovered more recently than the Unix pro-
gram. Eliminate use of temporary files. Do it in one pass instead of two (we
did this in the assembler).
Or, on the contrary, emphasize simplicity instead of speed. For some appli-
cations, the speed of today's computers makes simpler algorithms adequate.
Or go for generality. For example, Unix programs often have static tables or
fixed-size strings, which make for arbitrary limits; use dynamic allocation
instead. Make sure your program handles NULs and other funny characters
in the input files. Add a programming language for extensibility and write
part of the program in that language.
Or turn some parts of the program into independently usable libraries. Or
use a simple garbage collector instead of tracking precisely when to free
memory, or use a new GNU facility such as obstacks.
9 Section 13.4, "Can You Spell That for Me, Please?", page 521, provides an overview of multibyte characters and
encodings.
10 The mechanics of checking for and reporting errors are discussed in Section 4.3, "Determining What Went
Wrong," page 86.
An excellent example of the difference an algorithm can make is GNU diff. One
of our system's early incarnations was an AT&T 3B1: a system with a MC68010 pro-
cessor, a whopping two megabytes of memory and 80 megabytes of disk. We did
(and do) lots of editing on the manual for gawk, a file that is almost 28,000 lines long
(although at the time, it was only in the 10,000-lines range). We used to use 'diff -c'
quite frequently to look at our changes. On this slow system, switching to GNU diff
made a stunning difference in the amount of time it took for the context diff to appear.
The difference is almost entirely due to the better algorithm that GNU diff uses.
The final paragraph mentions the idea of structuring a program as an independently
usable library, with a command-line wrapper or other interface around it. One example
of this is GDB, the GNU debugger, which is partially implemented as a command-line
tool on top of a debugging library. (The separation of the GDB core functionality from
the command interface is an ongoing development project.) This implementation makes
it possible to write a graphical debugging interface on top of the basic debugging
functionality.
1.4.5 Parting Thoughts about the "GNU Coding Standards"

The GNU Coding Standards is a worthwhile document to read if you wish to develop
new GNU software, enhance existing GNU software, or just learn how to be a better
programmer. The principles and techniques it espouses are what make GNU software
the preferred choice of the Unix community.
1.5 Portability Revisited

Portability is something of a holy grail; always sought after, but not always obtainable,
and certainly not easily. There are several aspects to writing portable code. The GNU
Coding Standards discusses many of them. But there are others as well. Keep portability
in mind at both higher and lower levels as you develop. We recommend these practices:
Code to standards.
Although it can be challenging, it pays to be familiar with the formal standards
for the language you're using. In particular, pay attention to the 1990 and 1999
ISO standards for C and the 2003 standard for C++ since most Linux programming
is done in one of those two languages.
Also, the POSIX standard for library and system call interfaces, while large, has
broad industry support. Writing to POSIX greatly improves the chances of suc-
cessfully moving your code to other systems besides GNU/Linux. This standard
is quite readable; it distills decades of experience and good practice.
Pick the best interface for the job.
If a standard interface does what you need, use it in your code. Use Autoconf to
detect an unavailable interface, and supply a replacement version of it for deficient
systems. (For example, some older systems lack the memmove () function, which
is fairly easy to code by hand or to pull from the GLIBC library.)
Isolate portability problems behind new interfaces.
Sometimes, you may need to do operating-system-specific tasks that apply on
some systems but not on others. (For example, on some systems, each program
has to expand command-line wildcards instead of the shell doing it.) Create a new
interface that does nothing on systems that don't need it but does the correct thing
on systems that do.
Use Autoconf for configuration.
Avoid #ifdef if possible. If not, bury it in low-level library code. Use Autoconf
to do the checking for the tests to be performed with #ifdef.
1.6 Suggested Reading

1. The C Programming Language, 2nd edition, by Brian W. Kernighan and Dennis
M. Ritchie. Prentice-Hall, Englewood Cliffs, New Jersey, USA, 1989. ISBN:
0-13-110370-9.
This is the "bible" for C, covering the 1990 version of Standard C. It is a rather
dense book, with lots of information packed into a startlingly small number
of pages. You may need to read it through more than once; doing so is well
worth the trouble.
2. C, A Reference Manual, 5th edition, by Samuel P. Harbison III and Guy L.
Steele, Jr. Prentice-Hall, Upper Saddle River, New Jersey, USA, 2002. ISBN:
0-13-089592-X.
This book is also a classic. It covers Original C as well as the 1990 and 1999
standards. Because it is current, it makes a valuable companion to The C
Programming Language. It covers many important items, such as
internationalization-related types and library functions, that aren't in the Kernighan and
Ritchie book.
3. Notes on Programming in C, by Rob Pike, February 21, 1989. Available
on the Web from many sites. Perhaps the most widely cited location is
http://www.lysator.liu.se/c/pikestyle.html. (Many other useful
articles are available from one level up: http://www.lysator.liu.se/c/.)
Rob Pike worked for many years at the Bell Labs research center where C and
Unix were invented and did pioneering development there. His notes distill
many years of experience into a "philosophy of clarity in programming" that
is well worth reading.
4. The various links at http://www.chris-lott.org/resources/cstyle/.
This site includes Rob Pike's notes and several articles by Henry Spencer. Of
particular note is the Recommended C Style and Coding Standards, originally
written at the Bell Labs Indian Hill site.
1.7 Summary
• "Files and processes" summarizes the Linux/Unix worldview. The treatment of
files as byte streams and devices as files, and the use of standard input, output,
and error, simplify program design and unify the data access model. The
permissions model is simple, yet flexible, applying to both files and directories.
• Processes are running programs that have user and group identifiers associated
with them for permission checking, as well as other attributes such as open files
and a current working directory.
• The most visible difference between Standard C and Original C is the use of
function prototypes for stricter type checking. A good C programmer should be
able to read Original-style code, since many existing programs use it. New code
should be written using prototypes.
• The GNU Coding Standards describe how to write GNU programs. They provide
numerous valuable techniques and guiding principles for producing robust, usable
software. The "no arbitrary limits" principle is perhaps the single most important
of these. This document is required reading for serious programmers.
• Making programs portable is a significant challenge. Guidelines and tools help,
but ultimately experience is needed too.
Exercises
1. Read and comment on the article "The GNU Project",11 by Richard M.
Stallman, originally written in August of 1998.
11 http://www.gnu.org/gnu/thegnuproject.html

In this chapter
• 2.1 Option and Argument Conventions
• 2.2 Basic Command-Line Processing
• 2.3 Option Parsing: getopt() and getopt_long()
• 2.4 The Environment
• 2.5 Summary
Command-line option and argument interpretation is usually the first task of
any program. This chapter examines how C (and C++) programs access their
command-line arguments, describes standard routines for parsing options, and takes
a look at the environment.
2.1 Option and Argument Conventions

The word arguments has two meanings. The more technical definition is "all the
'words' on the command line." For example:
$ ls main.c opts.c process.c
Here, the user typed four "words." All four words are made available to the program
as its arguments.
The second definition is more informal: Arguments are all the words on the command
line except the command name. By default, Unix shells separate arguments from each
other with whitespace (spaces or TAB characters). Quoting allows arguments to include
whitespace:
$ echo here    are   lots  of     spaces
here are lots of spaces                              The shell "eats" the spaces
$ echo "here    are   lots  of     spaces"
here    are   lots  of     spaces                    Spaces are preserved
Quoting is transparent to the running program; echo never sees the double-quote
characters. (Double and single quotes are different in the shell; a discussion of the rules
is beyond the scope of this book, which focuses on C programming.)
Arguments can be further classified as options or operands. In the previous two
examples all the arguments were operands: files for ls and raw text for echo.
Options are special arguments that each program interprets. Options change a pro-
gram's behavior, or they provide information to the program. By ancient convention,
(almost) universally adhered to, options start with a dash (a.k.a. hyphen, minus sign)
and consist of a single letter. Option arguments are information needed by an option,
as opposed to regular operand arguments. For example, the fgrep program's -f option
means "use the contents of the following file as a list of strings to search for." See
Figure 2.1.
fgrep -f patfile foo.c bar.c baz.c

  fgrep                  Command name
  -f                     Option
  patfile                Option argument
  foo.c bar.c baz.c      Operands

FIGURE 2.1
Command-line components
Thus, patfile is not a data file to search, but rather it's for use by fgrep in defining
the list of strings to search for.
2.1.1 POSIX Conventions

The POSIX standard describes a number of conventions that standard-conforming
programs adhere to. Nothing requires that your programs adhere to these standards,
but it's a good idea for them to do so: Linux and Unix users the world over understand
and use these conventions, and if your program doesn't follow them, your users will
be unhappy. (Or you won't have any users!) Furthermore, the functions we discuss
later in this chapter relieve you of the burden of manually adhering to these conventions
for each program you write. Here they are, paraphrased from the standard:
1. Program names should have no less than two and no more than nine characters.
2. Program names should consist of only lowercase letters and digits.
3. Option names should be single alphanumeric characters. Multidigit options
should not be allowed. For vendors implementing the POSIX utilities, the -W
option is reserved for vendor-specific options.
4. All options should begin with a '-' character.
5. For options that don't require option arguments, it should be possible to group
multiple options after a single '-' character. (For example, 'foo -a -b -c'
and 'foo -abc' should be treated the same way.)
6. When an option does require an option argument, the argument should be
separated from the option by a space (for example, 'fgrep -f patfile').
The standard, however, does allow for historical practice, whereby sometimes
the option and the option argument could be in the same string: 'fgrep -fpatfile'.
In practice, the getopt() and getopt_long() functions interpret '-fpatfile'
as '-f patfile', not as '-f -p -a -t ...'.
7. Option arguments should not be optional.
This means that when a program documents an option as requiring an option
argument, that option's argument must always be present or else the program
will fail. GNU getopt() does provide for optional option arguments since
they're occasionally useful.
8. If an option takes an argument that may have multiple values, the program
should receive that argument as a single string, with values separated by commas
or whitespace.
For example, suppose a hypothetical program myprog requires a list of users
for its -u option. Then, it should be invoked in one of these two ways:
myprog -u "arnold,joe,jane"          Separate with commas
myprog -u "arnold joe jane"          Separate with whitespace
In such a case, you're on your own for splitting out and processing each value
(that is, there is no standard routine), but doing so manually is usually
straightforward.
9. Options should come first on the command line, before operands. Unix versions
of getopt() enforce this convention. GNU getopt() does not by default,
although you can tell it to.
10. The special argument '--' indicates the end of all options. Any subsequent
arguments on the command line are treated as operands, even if they begin with
a dash.
11. The order in which options are given should not matter. However, for mutu-
ally exclusive options, when one option overrides the setting of another, then
(so to speak) the last one wins. If an option that has arguments is repeated, the
program should process the arguments in order. For example, 'myprog -u
arnold -u jane' is the same as 'myprog -u "arnold,jane"'. (You have
to enforce this yourself; getopt() doesn't help you.)
12. It is OK for the order of operands to matter to a program. Each program should
document such things.
13. Programs that read or write named files should treat the single argument '-' as
meaning standard input or standard output, as is appropriate for the program.
Note that many standard programs don't follow all of the above conventions. The
primary reason is historical compatibility; many such programs predate the codifying
of these conventions.
2.1.2 GNU Long Options

As we saw in Section 1.4.2, "Program Behavior," page 16, GNU programs are
encouraged to use long options of the form --help, --verbose, and so on. Such options,
since they start with '--', do not conflict with the POSIX conventions. They also can
be easier to remember, and they provide the opportunity for consistency across all GNU
utilities. (For example, --help is the same everywhere, as compared with -h for "help,"
-i for "information," and so on.) GNU long options have their own conventions,
implemented by the getopt_long() function:
1. For programs implementing POSIX utilities, every short (single-letter) option
should also have a long option.
2. Additional GNU-specific long options need not have a corresponding short
option, but we recommend that they do.
3. Long options can be abbreviated to the shortest string that remains unique.
For example, if there are two options --verbose and --verbatim, the
shortest possible abbreviations are --verbo and --verba.
4. Option arguments are separated from long options either by whitespace or by
an = sign. For example, --sourcefile=/some/file or --sourcefile
/some/file.
5. Options and arguments may be interspersed with operands on the command
line; getopt_long() will rearrange things so that all options are processed
and then all operands are available sequentially. (This behavior can be
suppressed.)
6. Option arguments can be optional. For such options, the argument is deemed
to be present if it's in the same string as the option. This works only for short
options. For example, if -x is such an option, given 'foo -xYANKEES -y', the
argument to -x is 'YANKEES'. For 'foo -x -y', there is no argument to -x.
7. Programs can choose to allow long options to begin with a single dash. (This
is common with many X Window programs.)
Much of this will become clearer when we examine getopt_long() later in
the chapter.
The GNU Coding Standards devotes considerable space to listing all the long and
short options used by GNU programs. If you're writing a program that accepts long
options, see if option names already in use might make sense for you to use as well.
2.2 Basic Command-Line Processing

A C program accesses its command-line arguments through its parameters, argc
and argv. The argc parameter is an integer, indicating the number of arguments there
are, including the command name. There are two common ways to declare main(),
varying in how argv is declared:
int main(int argc, char *argv[])        int main(int argc, char **argv)
Practically speaking, there's no difference between the two declarations, although
the first is conceptually clearer: argv is an array of pointers to characters. The second
is more commonly used: argv is a pointer to a pointer. Also, the second definition is
technically more correct, and it is what we use. Figure 2.2 depicts this situation.
FIGURE 2.2
Memory for argv
[Figure: argv, a char **, points to an array of char * pointers. The entries point in turn to the C strings "cat", "file1", and "file2", each terminated with '\0'; the final entry is a NULL pointer, binary zero.]
By convention, argv[0] is the program's name. (For details, see Section 9.1.4.3,
"Program Names and argv[0]," page 297.) Subsequent entries are the command line
arguments. The final entry in the argv array is a NULL pointer.
argc indicates how many arguments there are; since C is zero-based, it is always true
that 'argv[argc] == NULL'. Because of this, particularly in Unix code, you will see
different ways of checking for the end of arguments, such as looping until a counter is
greater than or equal to argc, or until 'argv[i] == 0' or while '*argv != NULL' and
so on. These are all equivalent.
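To make the equivalence concrete, here is a small sketch of three loops that visit the same arguments and differ only in how they detect the end. The function names are ours, chosen for illustration. All three assume the usual case in which argc is at least 1, so that argv[0] is not itself the NULL pointer.

```c
#include <stddef.h>
#include <string.h>

/* Sum argument lengths, stopping when the counter reaches argc. */
size_t total_by_argc(int argc, char **argv)
{
    size_t total = 0;
    int i;

    for (i = 1; i < argc; i++)
        total += strlen(argv[i]);
    return total;
}

/* Same sum, stopping at the NULL pointer that ends argv. */
size_t total_by_index(char **argv)
{
    size_t total = 0;
    int i;

    for (i = 1; argv[i] != NULL; i++)
        total += strlen(argv[i]);
    return total;
}

/* Same sum again, advancing argv itself, as much Unix code does. */
size_t total_by_pointer(char **argv)
{
    size_t total = 0;

    for (argv++; *argv != NULL; argv++)
        total += strlen(*argv);
    return total;
}
```

Which form to use is mostly a matter of taste; the pointer-stepping version is the one that pairs naturally with the 'argc--; argv++' idiom.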
2.2.1 The V7 echo Program

Perhaps the simplest example of command-line processing is the V7 echo program,
which prints its arguments to standard output, separated by spaces and terminated with
a newline. If the first argument is -n, then the trailing newline is omitted. (This is used
for prompting from shell scripts.) Here's the code: 1
 1  #include <stdio.h>
 2
 3  main(argc, argv)                        int main(int argc, char **argv)
 4  int argc;
 5  char *argv[];
 6  {
 7      register int i, nflg;
 8
 9      nflg = 0;
10      if(argc > 1 && argv[1][0] == '-' && argv[1][1] == 'n') {
11          nflg++;
12          argc--;
13          argv++;
14      }
15      for(i=1; i<argc; i++) {
16          fputs(argv[i], stdout);
17          if (i < argc-1)
18              putchar(' ');
19      }
20      if(nflg == 0)
21          putchar('\n');
22      exit(0);
23  }
Only 23 lines! There are two points of interest. First, decrementing argc and
simultaneously incrementing argv (lines 12 and 13) are common ways of skipping initial
arguments. Second, the check for -n (line 10) is simplistic. -no-newline-at-the-end
also works. (Compile it and try it!)
1 See /usr/src/cmd/echo.c in the V7 distribution.

Manual option parsing is common in V7 code because the getopt() function hadn't
been invented yet.
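The simplistic -n test is easy to see in isolation. The following sketch contrasts the V7-style two-character check with a stricter whole-string comparison. Both function names are hypothetical, and the stricter version is only one plausible tightening, not what any particular echo does.

```c
#include <string.h>

/* V7-style test: looks only at the first two characters. */
int v7_is_n_option(const char *arg)
{
    return arg[0] == '-' && arg[1] == 'n';
}

/* Stricter test: the whole argument must be exactly "-n". */
int strict_is_n_option(const char *arg)
{
    return strcmp(arg, "-n") == 0;
}
```

Given "-no-newline-at-the-end", the first function reports a match and the second does not; given "-n", both do.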
Finally, here and in other places throughout the book, we see use of the register
keyword. At one time, this keyword provided a hint to the compiler that the given
variables should be placed in CPU registers, if possible. Use of this keyword is obsolete;
modern compilers all base register assignment on analysis of the source code, ignoring
the register keyword. We've chosen to leave code using it alone, but you should be
aware that it has no real use anymore. 2
2.3 Option Parsing: getopt() and getopt_long()

Circa 1980, for System III, the Unix Support Group within AT&T noted that each
Unix program used ad hoc techniques for parsing arguments. To make things easier
for users and developers, they developed most of the conventions we listed earlier. (The
statement in the System III intro(1) man page is considerably less formal than what's
in the POSIX standard, though.)
The Unix Support Group also developed the getopt() function, along with several
external variables, to make it easy to write code that follows the standard conventions.
The GNU getopt_long() function supplies a compatible version of getopt(), as
well as making it easy to parse long options of the form described earlier.
2.3.1 Single-Letter Options

The getopt() function is declared as follows:
#include <unistd.h>                                                     POSIX
int getopt(int argc, char *const argv[], const char *optstring);
extern char *optarg;
extern int optind, opterr, optopt;
The arguments argc and argv are normally passed straight from those of main().
optstring is a string of option letters. If any letter in the string is followed by a colon,
then that option is expected to have an argument.
2 When we asked Jim Meyering, the Coreutils maintainer, about instances of register in the GNU Coreutils,
he gave us an interesting response. He removes them when modifying code, but otherwise leaves them alone to
make it easier to integrate changes submitted against existing versions.
To use getopt(), call it repeatedly from a while loop until it returns -1. Each time
that it finds a valid option letter, it returns that letter. If the option takes an argument,
optarg is set to point to it. Consider a program that accepts a -a option that doesn't
take an argument and a -b option that does:
int oc;             /* option character */
char *b_opt_arg;

while ((oc = getopt(argc, argv, "ab:")) != -1) {
    switch (oc) {
    case 'a':
        /* handle -a, set a flag, whatever */
        break;
    case 'b':
        /* handle -b, get arg value from optarg */
        b_opt_arg = optarg;
        break;
    case ':':
        /* error handling, see text */
        break;
    case '?':
    default:
        /* error handling, see text */
        break;
    }
}
As it works, getopt() sets several variables that control error handling.
char *optarg
The argument for an option, if the option accepts one.
int optind
The current index in argv. When the while loop has finished, remaining
operands are found in argv[optind] through argv[argc-1]. (Remember that
'argv[argc] == NULL'.)
int opterr
When this variable is nonzero (which it is by default), getopt() prints its own
error messages for invalid options and for missing option arguments.
int optopt
When an invalid option character is found, getopt() returns either a '?' or a
':' (see below), and optopt contains the invalid character that was found.
People being human, it is inevitable that programs will be invoked incorrectly, either
with an invalid option or with a missing option argument. In the normal case, getopt()
prints its own messages for these cases and returns the '?' character. However, you
can change its behavior in two ways.
First, by setting opterr to 0 before invoking getopt(), you can force getopt()
to remain silent when it finds a problem.
Second, if the first character in the optstring argument is a colon, then getopt()
is silent and it returns a different character depending upon the error, as follows:
Invalid option
getopt() returns a '?' and optopt contains the invalid option character. (This
is the normal behavior.)
Missing option argument
getopt() returns a ':'. If the first character of optstring is not a colon, then
getopt() returns a '?', making this case indistinguishable from the invalid
option case.
Thus, making the first character of optstring a colon is a good idea since it allows
you to distinguish between "invalid option" and "missing option argument." The cost
is that using the colon also silences getopt(), forcing you to supply your own error
messages. Here is the previous example, this time with error message handling:
int oc;             /* option character */
char *b_opt_arg;

while ((oc = getopt(argc, argv, ":ab:")) != -1) {
    switch (oc) {
    case 'a':
        /* handle -a, set a flag, whatever */
        break;
    case 'b':
        /* handle -b, get arg value from optarg */
        b_opt_arg = optarg;
        break;
    case ':':
        /* missing option argument */
        fprintf(stderr, "%s: option '-%c' requires an argument\n",
                argv[0], optopt);
        break;
    case '?':
    default:
        /* invalid option */
        fprintf(stderr, "%s: option '-%c' is invalid: ignored\n",
                argv[0], optopt);
        break;
    }
}
A word about flag or option variable-naming conventions: Much Unix code uses
names of the form xflg for any given option letter x (for example, nflg in the V7
echo; xflag is also common). This may be great for the program's author, who happens
to know what the x option does without having to check the documentation. But it's
unkind to someone else trying to read the code who doesn't know the meaning of all
the option letters by heart. It is much better to use names that convey the option's
meaning, such as no_newline for echo's -n option.
2.3.2 GNU getopt() and Option Ordering

The standard getopt() function stops looking for options as soon as it encounters
a command-line argument that doesn't start with a '-'. GNU getopt() is different:
It scans the entire command line looking for options. As it goes along, it permutes
(rearranges) the elements of argv, so that when it's done, all the options have been
moved to the front and code that proceeds to examine argv[optind] through
argv[argc-1] works correctly. In all cases, the special argument '--' terminates
option scanning.
You can change the default behavior by using a special first character in optstring,
as follows:
optstring[0] == '+'
GNU getopt() behaves like standard getopt(); it returns options in the order
in which they are found, stopping at the first nonoption argument. This will also
be true if POSIXLY_CORRECT exists in the environment.
optstring[0] == '-'
GNU getopt() returns every command-line argument, whether or not it represents
an option. In this case, for each nonoption argument, the function returns the
integer 1 and sets optarg to point to the string.
As for standard getopt(), if the first character of optstring is a ':', then GNU
getopt() distinguishes between "invalid option" and "missing option argument" by
returning '?' or ':', respectively. The ':' in optstring can be the second character
if the first character is '+' or '-'.
Finally, if an option letter in optstring is followed by two colon characters, then
that option is allowed to have an optional option argument. (Say that three times fast!)
Such an argument is deemed to be present if it's in the same argv element as the option,
and absent otherwise. In the case that it's absent, GNU getopt() returns the option
letter and sets optarg to NULL. For example, given
while ((c = getopt(argc, argv, "ab::")) != -1)
for -bYANKEES, the return value is 'b', and optarg points to "YANKEES", while
for -b or '-b YANKEES', the return value is still 'b' but optarg is set to NULL. In the
latter case, "YANKEES" is a separate command-line argument.
2.3.3 Long Options

The getopt_long() function handles the parsing of long options of the form
described earlier. An additional routine, getopt_long_only(), works identically, but it
is used for programs where all options are long and options begin with a single '-'
character. Otherwise, both work just like the simpler GNU getopt() function. (For
brevity, whenever we say "getopt_long()," it's as if we'd said "getopt_long() and
getopt_long_only().") Here are the declarations, from the GNU/Linux getopt(3)
manpage:
#include <getopt.h>                                                     GLIBC
int getopt_long(int argc, char *const argv[],
                const char *optstring,
                const struct option *longopts, int *longindex);
int getopt_long_only(int argc, char *const argv[],
                const char *optstring,
                const struct option *longopts, int *longindex);
The first three arguments are the same as for getopt(). The next argument is a pointer
to an array of struct option, which we refer to as the long options table and which
is described shortly. The longindex parameter, if not set to NULL, points to a variable
which is filled in with the index in longopts of the long option that was found. This
is useful for error diagnostics, for example.
2.3.3.1 Long Options Table

Long options are described with an array of struct option structures. The struct
option is declared in <getopt.h>; it looks like this:
struct option {
    const char *name;
    int has_arg;
    int *flag;
    int val;
};
The elements in the structure are as follows:
const char *name
This is the name of the option, without any leading dashes, for example, "help"
or "verbose".
int has_arg
This describes whether the long option has an argument, and if so, what kind of
argument. The value must be one of those presented in Table 2.1.
The symbolic constants are macros for the numeric values given in the table. While
the numeric values work, the symbolic constants are considerably easier to read,
and you should use them instead of the corresponding numbers in any code that
you write.
int *flag
If this pointer is NULL, then getopt_long() returns the value in the val field of
the structure. If it's not NULL, the variable it points to is filled in with the value
in val and getopt_long() returns 0. If the flag isn't NULL but the long option
is never seen, then the pointed-to variable is not changed.
int val
This is the value to return if the long option is seen or to load into *flag if flag
is not NULL. Typically, if flag is not NULL, then val is a true/false value, such as
1 or 0. On the other hand, if flag is NULL, then val is usually a character constant.
If the long option corresponds to a short one, the character constant should be
the same one that appears in the optstring argument for this option. (All of this
will become clearer shortly when we see some examples.)
Each long option has a single entry with the values appropriately filled in. The last
element in the array should have zeros for all the values. The array need not be sorted;
getopt_long() does a linear search. However, sorting it by long name may make it
easier for a programmer to read.
TABLE 2.1
Values for has_arg
Symbolic constant    Numeric value    Meaning
no_argument          0                The option does not take an argument.
required_argument    1                The option requires an argument.
optional_argument    2                The option's argument is optional.
The use of flag and val seems confusing at first encounter. Let's step back for a
moment and examine why it works the way it does. Most of the time, option processing
consists of setting different flag variables when different option letters are seen, like so:
while ((c = getopt(argc, argv, ":af:hv")) != -1) {
    switch (c) {
    case 'a':
        do_all = 1;
        break;
    case 'f':
        myfile = optarg;
        break;
    case 'h':
        do_help = 1;
        break;
    case 'v':
        do_verbose = 1;
        break;
    /* ... error handling code here ... */
    }
}
When flag is not NULL, getopt_long() sets the variable for you. This reduces the
three cases in the previous switch to one case. Here is an example long options table
and the code to go with it:
int do_all, do_help, do_verbose;    /* flag variables */
char *myfile;

struct option longopts[] = {
    { "all",     no_argument,       & do_all,     1   },
    { "file",    required_argument, NULL,         'f' },
    { "help",    no_argument,       & do_help,    1   },
    { "verbose", no_argument,       & do_verbose, 1   },
    { 0, 0, 0, 0 }
};

while ((c = getopt_long(argc, argv, ":f:", longopts, NULL)) != -1) {
    switch (c) {
    case 'f':
        myfile = optarg;
        break;
    case 0:
        /* getopt_long() set a variable, just keep going */
        break;
    /* ... error handling code here ... */
    }
}
Notice that the value passed for the optstring argument no longer contains 'a',
'h', or 'v'. This means that the corresponding short options are not accepted. To allow
both long and short options, you would have to restore the corresponding cases from
the first example to the switch.
Practically speaking, you should write your programs such that each short option
also has a corresponding long option. In this case, it's easiest to have flag be NULL and
val be the corresponding single letter.
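As a sketch of that recommendation, the table below gives every long option a NULL flag and a val equal to its short letter, so one switch handles both spellings. The wrapper function parse_long() and its return convention are assumptions of ours, not part of the getopt_long() interface.

```c
#include <stdio.h>
#include <getopt.h>

int do_all, do_help, do_verbose;    /* flag variables */
char *myfile;                       /* argument of -f or --file */

/*
 * parse_long --- hypothetical wrapper around getopt_long().
 * Returns 0 on success, or -1 if any option was invalid or lacked
 * a required argument.
 */
int parse_long(int argc, char *const argv[])
{
    static const struct option longopts[] = {
        { "all",     no_argument,       NULL, 'a' },
        { "file",    required_argument, NULL, 'f' },
        { "help",    no_argument,       NULL, 'h' },
        { "verbose", no_argument,       NULL, 'v' },
        { NULL, 0, NULL, 0 }
    };
    int c, status = 0;

    while ((c = getopt_long(argc, argv, ":af:hv", longopts, NULL)) != -1) {
        switch (c) {
        case 'a':   /* -a or --all */
            do_all = 1;
            break;
        case 'f':   /* -f FILE, --file FILE, or --file=FILE */
            myfile = optarg;
            break;
        case 'h':   /* -h or --help */
            do_help = 1;
            break;
        case 'v':   /* -v or --verbose */
            do_verbose = 1;
            break;
        case ':':   /* missing option argument */
            fprintf(stderr, "%s: option requires an argument\n", argv[0]);
            status = -1;
            break;
        case '?':
        default:    /* invalid option */
            fprintf(stderr, "%s: invalid option\n", argv[0]);
            status = -1;
            break;
        }
    }
    return status;
}
```

With this arrangement, 'prog --all -f data.txt --verbose input' and 'prog -a --file=data.txt -v input' set the same variables and leave "input" as the first operand.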
2.3.3.2 Long Options, POSIX Style

The POSIX standard reserves the -W option for vendor-specific features. Thus, by
definition, -W isn't portable across different systems.
If W appears in the optstring argument followed by a semicolon (note: not a colon),
then getopt_long() treats -Wlongopt the same as --longopt. Thus, in the previous
example, change the call to be:
while ((c = getopt_long(argc, argv, ":f:W;", longopts, NULL)) != -1)
With this change, -Wall is the same as --all and -Wfile=myfile is the same as
--file=myfile. The use of a semicolon makes it possible for a program to use -W as
a regular option, if desired. (For example, GCC uses it as a regular option, whereas
gawk uses it for POSIX conformance.)
2.3.3.3 getopt_long() Return Value Summary

As should be clear by now, getopt_long() provides a flexible mechanism for option
parsing. Table 2.2 summarizes the possible return values and their meaning.
TABLE 2.2
getopt_long() return values
Return code    Meaning
0              getopt_long() set a flag as found in the long option table.
1              optarg points at a plain command-line argument.
'?'            Invalid option.
':'            Missing option argument.
'x'            Option character 'x'.
-1             End of options.
Finally, we enhance the previous example code, showing the full switch statement:
int do_all, do_help, do_verbose;    /* flag variables */
char *myfile, *user;                /* input file, user name */

struct option longopts[] = {
    { "all",     no_argument,       & do_all,     1   },
    { "file",    required_argument, NULL,         'f' },
    { "help",    no_argument,       & do_help,    1   },
    { "verbose", no_argument,       & do_verbose, 1   },
    { "user",    optional_argument, NULL,         'u' },
    { 0, 0, 0, 0 }
};

while ((c = getopt_long(argc, argv, ":ahvf:u::W;", longopts, NULL)) != -1) {
    switch (c) {
    case 'a':
        do_all = 1;
        break;
    case 'f':
        myfile = optarg;
        break;
    case 'h':
        do_help = 1;
        break;
    case 'u':
        if (optarg != NULL)
            user = optarg;
        else
            user = "root";
        break;
    case 'v':
        do_verbose = 1;
        break;
    case 0:     /* getopt_long() set a variable, just keep going */
        break;
#if 0
    case 1:
        /*
         * Use this case if getopt_long() should go through all
         * arguments. If so, add a leading '-' character to optstring.
         * Actual code, if any, goes here.
         */
        break;
#endif
    case ':':   /* missing option argument */
        fprintf(stderr, "%s: option '-%c' requires an argument\n",
                argv[0], optopt);
        break;
    case '?':
    default:    /* invalid option */
        fprintf(stderr, "%s: option '-%c' is invalid: ignored\n",
                argv[0], optopt);
        break;
    }
}
In your programs, you may wish to have comments for each option letter explaining
what each one does. However, if you've used descriptive variable names for each option
letter, comments are not as necessary. (Compare do_verbose to vflg.)
2.3.3.4 GNU getopt() or getopt_long() in User Programs

You may wish to use GNU getopt() or getopt_long() in your own programs
and have them run on non-Linux systems. That's OK; just copy the source files from
a GNU program or from the GNU C Library (GLIBC) CVS archive. 3 The source files
are getopt.h, getopt.c, and getopt1.c. They are licensed under the GNU Lesser
General Public License, which allows library functions to be included even in proprietary
programs. You should include a copy of the file COPYING.LIB with your program,
along with the files getopt.h, getopt.c, and getopt1.c.
Include the source files in your distribution, and compile them along with any other
source files. In your source code that calls getopt_long(), use '#include
<getopt.h>', not '#include "getopt.h"'. Then, when compiling, add -I. to the
C compiler's command line. That way, the local copy of the header file will be
found first.
3 See http://sources.redhat.com.

You may be wondering, "Gee, I already use GNU/Linux. Why should I include
getopt_long () in my executable, making it bigger, if the routine is already in the C
library?" That's a good question. However, there's nothing to worry about. The source
code is set up so that if it's compiled on a system that uses GLIBC, the compiled files
will not contain any code! Here's the proof, on our system:
$ uname -a                                      Show system name and type
Linux example 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
$ ls -l getopt.o getopt1.o                      Show file sizes
-rw-r--r--    1 arnold   devel        9836 Mar 24 13:55 getopt.o
-rw-r--r--    1 arnold   devel       10324 Mar 24 13:55 getopt1.o
$ size getopt.o getopt1.o                       Show sizes included in executable
   text    data     bss     dec     hex filename
      0       0       0       0       0 getopt.o
      0       0       0       0       0 getopt1.o
The size command prints the sizes of the various parts of a binary object or
executable file. We explain the output in Section 3.1, "Linux/Unix Address Space," page 52.
What's important to understand right now is that, despite the nonzero sizes of the files
themselves, they don't contribute anything to the final executable. (We think this is
pretty neat.)
2.4 The Environment

The environment is a set of 'name=value' pairs for each program. These pairs are
termed environment variables. Each name consists of one to any number of alphanumeric
characters or underscores ('_'), but the name may not start with a digit. (This rule is
enforced by the shell; the C API can put anything it wants to into the environment, at
the likely cost of confusing subsequent programs.)
Environment variables are often used to control program behavior. For example, if
POSIXLY_CORRECT exists in the environment, many GNU programs disable extensions
or historical behavior that isn't compatible with the POSIX standard.
You can decide (and should document) the environment variables that your program
will use to control its behavior. For example, you may wish to use an environment
variable for debugging options instead of a command-line argument. The advantage
of using environment variables is that users can set them in their startup file and not
have to remember to always supply a particular set of command-line options.
Of course, the disadvantage to using environment variables is that they can silently
change a program's behavior. Jim Meyering, the maintainer of the Coreutils, put it
this way:
It makes it easy for the user to customize how the program works without
changing how the program is invoked. That can be both a blessing and a
curse. If you write a script that depends on your having a certain environment
variable set, but then have someone else use that same script, it may well fail
(or worse, silently produce invalid results) if that other person doesn't have
the same environment settings.
2.4.1 Environment Management Functions

Several functions let you retrieve the values of environment variables, change their
values, or remove them. Here are the declarations:
#include <stdlib.h>
char *getenv(const char *name);                 ISO C: Retrieve environment variable
int setenv(const char *name, const char *value, POSIX: Set environment variable
           int overwrite);
int putenv(char *string);                       XSI: Set environment variable, uses string
int unsetenv(const char *name);                 POSIX: Remove environment variable
int clearenv(void);                             Common: Clear entire environment
The getenv() function is the one you will use 99 percent of the time. The argument
is the environment variable name to look up, such as "HOME" or "PATH". If the variable
exists, getenv() returns a pointer to the character string value. If not, it returns NULL.
For example:
char *pathval;

/* Look for PATH; if not present, supply a default value */
if ((pathval = getenv("PATH")) == NULL)
    pathval = "/bin:/usr/bin:/usr/ucb";
Occasionally, environment variables exist, but with empty values. In this case, the
return value will be non-NULL, but the first character pointed to will be the zero byte,
which is the C string terminator, '\0'. Your code should be careful to check that the
return value is not NULL. Even if it isn't NULL, also check that the string is
not empty if you intend to use its value for something. In any case, don't just blindly
use the returned value.
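Both checks fit naturally into one small helper. This is a minimal sketch; the name env_or_default() is our own, and whether an empty value should fall back to the default is a policy decision that this example simply assumes.

```c
#include <stdlib.h>

/*
 * env_or_default --- hypothetical helper: return the value of NAME,
 * or FALLBACK if NAME is unset or set to the empty string.
 */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *val = getenv(name);

    if (val == NULL || *val == '\0')
        return fallback;
    return val;
}
```

A call such as env_or_default("PATH", "/bin:/usr/bin:/usr/ucb") then replaces the explicit test shown earlier.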
To change an environment variable or to add a new one to the environment,
use setenv():
if (setenv("PATH", "/bin:/usr/bin:/usr/ucb", 1) != 0) {
    /* handle failure */
}
It's possible that a variable already exists in the environment. If the third argument
is true (nonzero), then the supplied value overwrites the previous one. Otherwise, it
doesn't. The return value is -1 if there was no memory for the new variable, and 0
otherwise. setenv() makes private copies of both the variable name and the new value
for storing in the environment.
A simpler alternative to setenv() is putenv(), which takes a single "name=value"
string and places it in the environment:
if (putenv("PATH=/bin:/usr/bin:/usr/ucb") != 0) {
    /* handle failure */
}
putenv() blindly replaces any previous value for the same variable. Also, and perhaps
more importantly, the string passed to putenv() is placed directly into the environment.
This means that if your code later modifies this string (for example, if it was an array,
not a string constant), the environment is modified also. This in turn means that you
should not use a local variable as the parameter for putenv(). For all these reasons
setenv() is preferred.
NOTE The GNU putenv() has an additional (documented) quirk to its
behavior. If the argument string is a name, then without an = character, the
named variable is removed. The GNU env program, which we look at later in
this chapter, relies on this behavior.
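The difference between the two functions can be observed directly. In this sketch, the variable names LPBE_PUT and LPBE_SET and both function names are made up for the demonstration. The first function changes its buffer after calling putenv() and checks whether the environment changed with it; the second does the same after setenv().

```c
#include <stdlib.h>
#include <string.h>

/* Must outlive the putenv() call: static storage, not a local array. */
static char put_buf[] = "LPBE_PUT=first";

/*
 * alias_demo --- returns 1 if modifying put_buf after putenv() changed
 * what getenv() reports, 0 if not, and -1 if putenv() failed.
 */
int alias_demo(void)
{
    if (putenv(put_buf) != 0)
        return -1;
    memcpy(put_buf + strlen("LPBE_PUT="), "again", 5);  /* same length */
    return strcmp(getenv("LPBE_PUT"), "again") == 0;
}

/*
 * copy_demo --- returns 1 if the value stored by setenv() survived a
 * later change to the caller's buffer, 0 if not, -1 on failure.
 */
int copy_demo(void)
{
    char value[] = "first";

    if (setenv("LPBE_SET", value, 1) != 0)
        return -1;
    memcpy(value, "again", 5);
    return strcmp(getenv("LPBE_SET"), "first") == 0;
}
```

On a system where putenv() stores the caller's pointer, as described above, both functions return 1: the putenv() string is shared with the environment, and the setenv() value is a private copy.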
The unsetenv() function removes a variable from the environment:
unsetenv("PATH");
Finally, the clearenv() function clears the environment entirely:
if (clearenv() != 0) {
    /* handle failure */
}
This function is not standardized by POSIX, although it's available in GNU/Linux
and several commercial Unix variants. You should use it if your application must be
very security conscious and you want it to build its own environment entirely from
scratch. If clearenv() is not available, the GNU/Linux clearenv(3) manpage
recommends using 'environ = NULL;' to accomplish the task.
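As a sketch of building an environment from scratch, the function below clears everything and then installs two variables. The function name and the particular PATH and IFS values are illustrative assumptions; a real application would choose its own. The code needs a C library that provides clearenv(), such as GLIBC.

```c
#include <stdlib.h>

/*
 * reset_environment --- discard everything inherited and install a
 * small known environment. Returns 0 on success, -1 on failure.
 * The PATH and IFS values are illustrative assumptions.
 */
int reset_environment(void)
{
    if (clearenv() != 0)
        return -1;
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    if (setenv("IFS", " \t\n", 1) != 0)
        return -1;
    return 0;
}
```

Calling this early in main(), before running any other program, ensures that nothing the user placed in the environment reaches the child.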
2.4.2 The Entire Environment: environ

The correct way to deal with the environment is through the functions described in
the previous section. However, it's worth a look at how things are managed "under
the hood."
The external variable environ provides access to the environment in the same way
that argv provides access to the command-line arguments. You must declare the variable
yourself. Although standardized by POSIX, environ is purposely not declared by any
standardized header file. (This seems to evolve from historical practice.) Here is the
declaration:
extern char **environ;      /* Look Ma, no header file! */             POSIX
Like argv, the final element in environ is NULL. There is no "environment count"
variable that corresponds to argc, however. This simple program prints out the entire
environment:
/* ch02-printenv.c --- Print out the environment. */

#include <stdio.h>

extern char **environ;

int main(int argc, char **argv)
{
    int i;

    if (environ != NULL)
        for (i = 0; environ[i] != NULL; i++)
            printf("%s\n", environ[i]);

    return 0;
}
Although it's unlikely to happen, this program makes sure that environ isn't NULL
before attempting to use it.
Variables are kept in the environment in random order. Although some Unix shells
keep the environment sorted by variable name, there is no formal requirement that this
be so, and many shells don't keep them sorted.
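Because environ is just a NULL-terminated array of "name=value" strings in no particular order, looking up a variable means a linear search. The following sketch shows roughly what such a lookup involves. The function find_in_environ() is hypothetical and is not how any particular C library implements getenv().

```c
#include <stddef.h>
#include <string.h>

extern char **environ;

/*
 * find_in_environ --- hypothetical linear search of environ.
 * Returns a pointer to the value part of "NAME=value", or NULL
 * if NAME is not present.
 */
const char *find_in_environ(const char *name)
{
    size_t len = strlen(name);
    char **ep;

    if (environ == NULL)
        return NULL;
    for (ep = environ; *ep != NULL; ep++)
        if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
            return *ep + len + 1;
    return NULL;
}
```

The test for '=' immediately after the name matters: without it, a search for "PATH" would also match a variable named "PATHEXT". In real programs, use getenv() instead.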
As something of a quirk of the implementation, you can access the environment by
declaring a third parameter to main():
int main(int argc, char **argv, char **envp)
You can then use envp as you would have used environ. Although you may see this
occasionally in old code, we don't recommend its use; environ is the official, standard,
portable way to access the entire environment, should you need to do so.
2.4.3 GNU env

To round off the chapter, here is the GNU version of the env command. This command
adds variables to the environment for the duration of one command. It can also be used
to clear the environment for that command or to remove specific environment variables.
The program serves double-duty for us, since it demonstrates both getopt_long()
and several of the functions discussed in this section. Here is how the program is invoked:
$ env --help
Usage: env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
Set each NAME to VALUE in the environment and run COMMAND.

  -i, --ignore-environment   start with an empty environment
  -u, --unset=NAME           remove variable from the environment
      --help     display this help and exit
      --version  output version information and exit

A mere - implies -i.  If no COMMAND, print the resulting environment.

Report bugs to <bug-coreutils@gnu.org>.
Here are some sample invocations:

$ env - myprog arg1                             Clear environment, run program with args
$ env - PATH=/bin:/usr/bin myprog arg1          Clear environment, add PATH, run program
$ env -u IFS PATH=/bin:/usr/bin myprog arg1     Unset IFS, add PATH, run program
The code begins with a standard GNU copyright statement and explanatory comment.
We have omitted both for brevity. (The copyright statement is discussed in Appendix C,
"GNU General Public License," page 657. The --help output shown previously is
enough to understand how the program works.) Following the copyright and comments
are header includes and declarations. The 'N_("string")' macro invocation (line 93)
is for use in internationalization and localization of the software, topics covered in
Chapter 13, "Internationalization and Localization," page 485. For now, you can treat
it as if it were the contained string constant.
80  #include <config.h>
82  #include <getopt.h>
83  #include <sys/types.h>
84  #include <getopt.h>
85
86  #include "system.h"
87  #include "error.h"
88  #include "closeout.h"
89
90  /* The official name of this program (e.g., no 'g' prefix).  */
91  #define PROGRAM_NAME "env"
92
93  #define AUTHORS N_ ("Richard Mlynarik and David MacKenzie")
94
95  int putenv ();
96
97  extern char **environ;
98
99  /* The name by which this program was run. */
100  char *program_name;
101
102  static struct option const longopts[] =
103  {
104    {"ignore-environment", no_argument, NULL, 'i'},
105    {"unset", required_argument, NULL, 'u'},
106    {GETOPT_HELP_OPTION_DECL},
107    {GETOPT_VERSION_OPTION_DECL},
108    {NULL, 0, NULL, 0}
109  };
The GNU Coreutils contain a large number of programs, many of which perform
the same common tasks (for example, argument parsing). To make maintenance easier,
many common idioms are defined as macros. GETOPT_HELP_OPTION_DECL and
GETOPT_VERSION_OPTION_DECL (lines 106 and 107) are two such. We examine their
definitions shortly. The first function, usage(), prints the usage information and exits.
The _("string") macro (line 115, and used throughout the program) is also for
internationalization, and for now you should also treat it as if it were the contained
string constant.
111  void
112  usage (int status)
113  {
114    if (status != 0)
115      fprintf (stderr, _("Try '%s --help' for more information.\n"),
116               program_name);
117    else
118      {
119        printf (_("\
120  Usage: %s [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]\n"),
121                program_name);
122        fputs (_("\
123  Set each NAME to VALUE in the environment and run COMMAND.\n\
124  \n\
125    -i, --ignore-environment   start with an empty environment\n\
126    -u, --unset=NAME           remove variable from the environment\n\
127  "), stdout);
128        fputs (HELP_OPTION_DESCRIPTION, stdout);
129        fputs (VERSION_OPTION_DESCRIPTION, stdout);
130        fputs (_("\
131  \n\
132  A mere - implies -i.  If no COMMAND, print the resulting environment.\n\
133  "), stdout);
134        printf (_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
135      }
136    exit (status);
137  }
The first part of main() declares variables and sets up the internationalization. The
functions setlocale(), bindtextdomain(), and textdomain() (lines 147-149)
are all discussed in Chapter 13, "Internationalization and Localization," page 485. Note
that this program does use the envp argument to main() (line 140). It is the only one
of the Coreutils programs to do so. Finally, the call to atexit() on line 151 (see
Section 9.1.5.3, "Exiting Functions," page 302) registers a Coreutils library function that
flushes all pending output and closes stdout, reporting a message if there were problems.
The next bit processes the command-line arguments, using getopt_long().
139  int
140  main (register int argc, register char **argv, char **envp)
141  {
142    char *dummy_environ[1];
143    int optc;
144    int ignore_environment = 0;
145
146    program_name = argv[0];
147    setlocale (LC_ALL, "");
148    bindtextdomain (PACKAGE, LOCALEDIR);
149    textdomain (PACKAGE);
150
151    atexit (close_stdout);
152
153    while ((optc = getopt_long (argc, argv, "+iu:", longopts, NULL)) != -1)
154      {
155        switch (optc)
156          {
157          case 0:
158            break;
159          case 'i':
160            ignore_environment = 1;
161            break;
162          case 'u':
163            break;
164          case_GETOPT_HELP_CHAR;
165          case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
166          default:
167            usage (2);
168          }
169      }
170
171    if (optind != argc && !strcmp (argv[optind], "-"))
172      ignore_environment = 1;
Here are the macros, from src/sys2.h in the Coreutils distribution, that define
the declarations we saw earlier and the 'case_GETOPT_xxx' macros used above (lines
164-165):
/* Factor out some of the common --help and --version processing code.  */

/* These enum values cannot possibly conflict with the option values
   ordinarily used by commands, including CHAR_MAX + 1, etc.  Avoid
   CHAR_MIN - 1, as it may equal -1, the getopt end-of-options value.  */
enum
{
  GETOPT_HELP_CHAR = (CHAR_MIN - 2),
  GETOPT_VERSION_CHAR = (CHAR_MIN - 3)
};

#define GETOPT_HELP_OPTION_DECL \
  "help", no_argument, 0, GETOPT_HELP_CHAR
#define GETOPT_VERSION_OPTION_DECL \
  "version", no_argument, 0, GETOPT_VERSION_CHAR

#define case_GETOPT_HELP_CHAR \
  case GETOPT_HELP_CHAR: \
    usage (EXIT_SUCCESS); \
    break;

#define case_GETOPT_VERSION_CHAR(Program_name, Authors) \
  case GETOPT_VERSION_CHAR: \
    version_etc (stdout, Program_name, PACKAGE, VERSION, Authors); \
    exit (EXIT_SUCCESS); \
    break;
The upshot of this code is that --help prints the usage message and --version
prints version information. Both exit successfully. ("Success" and "failure" exit statuses
are described in Section 9.1.5.1, "Defining Process Exit Status," page 300.) Given that
the Coreutils have dozens of utilities, it makes sense to factor out and standardize as
much repetitive code as possible.
Returning to env.c:
174    environ = dummy_environ;
175    environ[0] = NULL;
176
177    if (!ignore_environment)
178      for (; *envp; envp++)
179        putenv (*envp);
180
181    optind = 0;                /* Force GNU getopt to re-initialize. */
182    while ((optc = getopt_long (argc, argv, "+iu:", longopts, NULL)) != -1)
183      if (optc == 'u')
184        putenv (optarg);         /* Requires GNU putenv. */
185
186    if (optind != argc && !strcmp (argv[optind], "-"))      Skip options
187      ++optind;
188
189    while (optind < argc && strchr (argv[optind], '='))     Set environment variables
190      putenv (argv[optind++]);
191
192    /* If no program is specified, print the environment and exit. */
193    if (optind == argc)
194      {
195        while (*environ)
196          puts (*environ++);
197        exit (EXIT_SUCCESS);
198      }
Lines 174-179 copy the existing environment into a fresh copy of the environment.
The global variable environ is set to point to an empty local array. The envp parameter
maintains access to the original environment.
Lines 181-184 remove any environment variables as requested by the -u option.
The program does this by rescanning the command line and removing names listed
there. Environment variable removal relies on the GNU putenv () behavior discussed
earlier: that when called with a plain variable name, putenv () removes the environment
variable.
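Where the GNU putenv() extension isn't available, unsetenv() does the same job portably. The following is a minimal sketch, not what env.c does; remove_all() is a hypothetical helper name.

```c
#include <stdlib.h>

/* remove_all --- hypothetical helper: unset each of the n names.
   Returns 0 on success, or -1 as soon as unsetenv() reports an error
   (for example, for a name that contains '='). */
int remove_all(const char *const *names, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (unsetenv(names[i]) != 0)
            return -1;          /* errno was set by unsetenv() */
    return 0;
}
```

Removing a variable that isn't set is not an error, so the helper can be called with names that may or may not be present.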
After any options, new or replacement environment variables are supplied on the
command line. Lines 189-190 continue scanning the command line, looking for
environment variable settings of the form 'name=value'.
Upon reaching line 192, if nothing is left on the command line, env is supposed to
print the new environment, and exit. It does so (lines 195-197).
If arguments are left, they represent a command name to run and arguments to pass
to that new command. This is done with the execvp() system call (line 200), which
replaces the current program with the new one. (This call is discussed in Section 9.1.4,
"Starting New Programs: The exec() Family," page 293; don't worry about the details
for now.) If this call returns to the current program, it failed. In such a case, env prints
an error message and exits.
200    execvp (argv[optind], &argv[optind]);
201
202    {
203      int exit_status = (errno == ENOENT ? 127 : 126);
204      error (0, errno, "%s", argv[optind]);
205      exit (exit_status);
206    }
207  }
The exit status values, 126 and 127 (determined on line 203), conform to POSIX.
127 means the program that execvp() attempted to run didn't exist. (ENOENT means
the file doesn't have an entry in the directory.) 126 means that the file exists, but
something else went wrong.
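The same convention can be sketched outside of env. In this illustration, exec_failure_status() and run_or_report() are hypothetical names, and fprintf() with strerror() stands in for the Coreutils error() function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* exec_failure_status --- hypothetical helper: map the errno left by a
   failed exec to the conventional exit status */
int exec_failure_status(int err)
{
    return (err == ENOENT) ? 127 : 126;
}

/* run_or_report --- hypothetical helper: try to run args[0]; this
   function returns only if execvp() failed */
int run_or_report(char **args)
{
    int saved_errno;

    execvp(args[0], args);
    saved_errno = errno;        /* save it before any other library call */
    fprintf(stderr, "%s: %s\n", args[0], strerror(saved_errno));
    return exec_failure_status(saved_errno);
}
```

A caller would typically pass the returned value straight to exit(). Saving errno first matters because fprintf() is itself allowed to change it.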
2.5 Summary
• C programs access their command-line arguments through the parameters argc
and argv. The getopt() function provides a standard way for consistent parsing
of options and their arguments. The GNU version of getopt() provides some
extensions, and getopt_long() and getopt_long_only() make it possible to
easily parse long-style options.
• The environment is a set of 'name=value' pairs that each program inherits from
its parent. Programs can, at their author's whim, use environment variables to
change their behavior, in addition to any command-line arguments. Standard
routines (getenv(), setenv(), putenv(), and unsetenv()) exist for retrieving
environment variable values, changing them, or removing them. If necessary, the
entire environment is available through the external variable environ or
through the char **envp third argument to main(). The latter technique is
discouraged.
Exercises
1. Assume a program accepts options -a, -b, and -c, and that -b requires an
argument. Write the manual argument parsing code for this program, without
using getopt() or getopt_long(). Accept -- to end option processing.
Make sure that -ac works, as do -bYANKEES, -b YANKEES, and -abYANKEES.
Test your program.
2. Implement getopt(). For the first version, don't worry about the case in which
'optstring[0] == ':''. You may also ignore opterr.
3. Add code for 'optstring[0] == ':'' and opterr to your version of
getopt().
4. Print and read the GNU getopt.h, getopt.c, and getopt1.c files.
5. Write a program that declares both environ and envp and compares their
values.
6. Parsing command line arguments and options is a wheel that many people
can't refrain from reinventing. Besides getopt() and getopt_long(), you
may wish to examine different argument-parsing packages, such as:
• The Plan 9 From Bell Labs arg(2) argument-parsing library,4
• Argp,5
• Argv,6
• Autoopts,7
• GNU Gengetopt,8
• Opt,9
• Popt.10 See also the popt(3) manpage on a GNU/Linux system.
7. Extra credit: Why can't a C compiler completely ignore the register keyword?
Hint: What operation cannot be applied to a register variable?
4 http://plan9.bell-labs.com/magic/man2html/2/arg
5 http://www.gnu.org/manual/glibc/html_node/Argp.html
6 http://256.com/sources/argv
7 http://autogen.sourceforge.net/autoopts.html
8 ftp://ftp.gnu.org/gnu/gengetopt/
9 http://nis-www.lanl.gov/~jt/Software/opt/opt-3.19.tar.gz
10 http://freshmeat.net/projects/popt/?topic_id=809
In this chapter
• 3.1 Linux/Unix Address Space
• 3.2 Memory Allocation
• Exercises
Without memory for storing data, it's impossible for a program to get any
work done. (Or rather, it's impossible to get any useful work done.) Real-world
programs can't afford to rely on fixed-size buffers or arrays of data structures.
They have to be able to handle inputs of varying sizes, from small to large. This in
turn leads to the use of dynamically allocated memory--memory allocated at runtime
instead of at compile time. This is how the GNU "no arbitrary limits" principle is
put into action.
Because dynamically allocated memory is such a basic building block for real-world
programs, we cover it early, before looking at everything else there is to do. Our
discussion focuses exclusively on the user-level view of the process and its memory;
it has nothing to do with CPU architecture.
3.1 Linux/Unix Address Space

For a working definition, we've said that a process is a running program. This means
that the operating system has loaded the executable file for the program into memory,
has arranged for it to have access to its command-line arguments and environment
variables, and has started it running. A process has five conceptually different areas of
memory allocated to it:
Code
Often referred to as the text segment, this is the area in which the executable
instructions reside. Linux and Unix arrange things so that multiple running instances
of the same program share their code if possible; only one copy of the instructions
for the same program resides in memory at any time. (This is transparent to the
running programs.) The portion of the executable file containing the text segment
is the text section.
Initialized data
Statically allocated and global data that are initialized with nonzero values live in
the data segment. Each process running the same program has its own data segment.
The portion of the executable file containing the data segment is the data section.
Zero-initialized data
Global and statically allocated data that are initialized to zero by default are kept
in what is colloquially called the BSS area of the process.1 Each process running
the same program has its own BSS area. When running, the BSS data are placed
in the data segment. In the executable file, they are stored in the BSS section.
The format of a Linux/Unix executable is such that only variables that are initialized
to a nonzero value occupy space in the executable's disk file. Thus, a large array
declared 'static char somebuf[2048];', which is automatically zero-filled,
does not take up 2 KB worth of disk space. (Some compilers have options that let
you place zero-initialized data into the data segment.)
Heap
The heap is where dynamic memory (obtained by malloc() and friends) comes
from. As memory is allocated on the heap, the process's address space grows, as
you can see by watching a running program with the ps command.
Although it is possible to give memory back to the system and shrink a process's
address space, this is almost never done. (We distinguish between releasing
no-longer-needed dynamic memory and shrinking the address space; this is discussed
in more detail later in this chapter.)
It is typical for the heap to "grow upward." This means that successive items that
are added to the heap are added at addresses that are numerically greater than
previous items. It is also typical for the heap to start immediately after the BSS
area of the data segment.
Stack
The stack segment is where local variables are allocated. Local variables are all
variables declared inside the opening left brace of a function body (or other left
brace) that aren't defined as static.
On most architectures, function parameters are also placed on the stack, as well
as "invisible" bookkeeping information generated by the compiler, such as room
for a function return value and storage for the return address representing the return
from a function to its caller. (Some architectures do all this with registers.)
1 BSS is an acronym for "Block Started by Symbol," a mnemonic from the IBM 7094 assembler.
Chapter 3 • User-Level Memory Management
It is the use of a stack for function parameters and return values that makes it
convenient to write recursive functions (functions that call themselves).
Variables stored on the stack "disappear" when the function containing them
returns; the space on the stack is reused for subsequent function calls.
On most modern architectures, the stack "grows downward," meaning that items
deeper in the call chain are at numerically lower addresses.
When a program is running, the initialized data, BSS, and heap areas are usually
placed into a single contiguous area: the data segment. The stack segment and code
segment are separate from the data segment and from each other. This is illustrated in
Figure 3.1.
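[Figure: the process address space drawn from high address to low address. At the top is the program stack (stack segment), which grows downward. Below it is a possible "hole" in the address space. Below that is the data segment: the heap (which grows upward), the BSS zero-filled variables, and the globals and static variables (data). At the bottom is the shared executable code (text segment).]
FIGURE 3.1
Linux/Unix process address space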
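One way to see these areas on a live system is to print an address from each. What follows is a small sketch under stated assumptions, not the ch03-memaddr.c program presented later in the chapter; show_areas(), data_var, and bss_var are hypothetical names. The actual addresses, and even their relative order, depend on the system and on whether it randomizes the address space layout.

```c
#include <stdio.h>
#include <stdlib.h>

int data_var = 42;      /* initialized to a nonzero value: data */
int bss_var;            /* zero-initialized by default: BSS */

/* show_areas --- hypothetical helper: print one address from each area.
   Returns 0 on success, -1 if the heap allocation failed. */
int show_areas(void)
{
    int stack_var = 0;                                  /* local: stack */
    int *heap_var = (int *) malloc(sizeof(*heap_var));  /* dynamic: heap */

    if (heap_var == NULL)
        return -1;

    /* Converting a function pointer to void * is not strictly ISO C,
       but it works on GNU/Linux and other POSIX systems. */
    printf("text:  %p\n", (void *) show_areas);
    printf("data:  %p\n", (void *) &data_var);
    printf("bss:   %p\n", (void *) &bss_var);
    printf("heap:  %p\n", (void *) heap_var);
    printf("stack: %p\n", (void *) &stack_var);

    free(heap_var);
    return 0;
}
```

Running it a few times in a row is a quick way to find out whether the system randomizes the heap and stack addresses.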
Although it's theoretically possible for the stack and heap to grow into each other,
the operating system prevents that event, and any program that tries to make it happen
is asking for trouble. This is particularly true on modern systems, on which process
address spaces are large and the gap between the top of the stack and the end of the
heap is a big one. The different memory areas can have different hardware memory
protection assigned to them. For example, the text segment might be marked "execute
only," whereas the data and stack segments would have execute permission disabled.
This practice can prevent certain kinds of security attacks. The details, of course, are
hardware and operating-system specific and likely to change over time. Of note is that
both Standard C and C++ allow const items to be placed in read-only memory. The
relationship among the different segments is summarized in Table 3.1.
TABLE 3.1
Executable program segments and their locations

Program memory      Address space segment   Executable file section
Code                Text                    Text
Initialized data    Data                    Data
BSS                 Data                    BSS
Heap                Data
Stack               Stack
The size program prints out the size in bytes of each of the text, data, and BSS
sections, along with the total size in decimal and hexadecimal. (The ch03-memaddr.c
program is shown later in this chapter; see Section 3.2.5, "Address Space Examination,"
page 78.)
$ cc -O ch03-memaddr.c -o ch03-memaddr        Compile the program
$ ls -l ch03-memaddr                          Show total size
-rwxr-xr-x    1 arnold   devel       12320 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                           Show component sizes
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr
$ strip ch03-memaddr                          Remove symbols
$ ls -l ch03-memaddr                          Show total size again
-rwxr-xr-x    1 arnold   devel        3480 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                           Component sizes haven't changed
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr
The total size of what gets loaded into memory is only 1742 bytes, in a file that is
12,320 bytes long. Most of that space is occupied by the symbols, a list of the program's
variables and function names. (The symbols are not loaded into memory when the
program runs.) The strip program removes the symbols from the object file. This can
save significant disk space for a large program, at the cost of making it impossible to
debug a core dump2 should one occur. (On modern systems this isn't worth the trouble;
don't use strip.) Even after removing the symbols, the file is still larger than what gets
loaded into memory since the object file format maintains additional data about the
program, such as what shared libraries it may use, if any.3
Finally, we'll mention that threads represent multiple threads of execution within a
single address space. Typically, each thread has its own stack, and a way to get thread
local data, that is, dynamically allocated data for private use by the thread. We don't
otherwise cover threads in this book, since they are an advanced topic.
3.2 Memory Allocation

Four library functions form the basis for dynamic memory management from C.
We describe them first, followed by descriptions of the two system calls upon which
these library functions are built. The C library functions in turn are usually used to
implement other library functions that allocate memory and the C++ new and delete
operators.
Finally, we discuss a function that you will see used frequently, but which we don't
recommend.
3.2.1 Library Calls: malloc(), calloc(), realloc(), free()

Dynamic memory is allocated by either the malloc() or calloc() functions. These
functions return pointers to the allocated memory. Once you have a block of memory
2 A core dump is the memory image of a running process created when the process terminates unexpectedly. It may
be used later for debugging. Unix systems named the file core, and GNU/Linux systems use core.pid, where
pid is the process ID of the process that died.
3 The description here is a deliberate simplification. Running programs occupy much more space than the size
program indicates, since shared libraries are included in the address space. Also, the data segment will grow as a
program allocates memory.
of a certain initial size, you can change its size with the realloc() function. Dynamic
memory is released with the free() function.
Debugging the use of dynamic memory is an important topic in its own right. We
discuss tools for this purpose in Section 15.5.2, "Memory Allocation Debuggers,"
page 612.
3.2.1.1 Examining C Language Details

Here are the function declarations from the GNU/Linux malloc(3) manpage:
#include <stdlib.h>                           ISO C
void *calloc(size_t nmemb, size_t size);      Allocate and zero fill
void *malloc(size_t size);                    Allocate raw memory
void free(void *ptr);                         Release memory
void *realloc(void *ptr, size_t size);        Change size of existing allocation
The allocation functions all return type void *. This is a typeless or generic pointer;
all you can do with such a pointer is cast it to a different type and assign it to a typed
pointer. Examples are coming up.
The type size_t is an unsigned integral type that represents amounts of memory.
It is used for dynamic memory allocation, and we see many uses of it throughout the
book. On most modern systems, size_t is unsigned long, but it's better to use
size_t explicitly than to use a plain unsigned integral type.
The ptrdiff_t type is used for address calculations in pointer arithmetic, such as
calculating where in an array a pointer may be pointing:
#define MAXBUF ...
char *p;
char buf[MAXBUF];
ptrdiff_t where;

p = buf;
while (some condition) {
    p += something;
    where = p - buf;    /* what index are we at? */
}
The <stdlib.h> header file declares many of the standard C library routines and
types (such as size_t), and it also defines the preprocessor constant NULL, which
represents the "null" or invalid pointer. (This is a zero value, such as 0 or '((void *) 0)'.
The C++ idiom is to use 0 explicitly; in C, however, NULL is preferred, and we find it
to be much more readable for C code.)
3.2.1.2 Initially Allocating Memory: malloc()

Memory is allocated initially with malloc(). The value passed in is the total number
of bytes requested. The return value is a pointer to the newly allocated memory or NULL
if memory could not be allocated. In the latter event, errno will be set to indicate the
error. (errno is a special variable that system calls and library functions set to indicate
what went wrong. It's described in Section 4.3, "Determining What Went Wrong,"
page 86.) For example, suppose we wish to allocate a variable number of some structure.
The code looks something like this:
struct coord {              /* 3D coordinates */
    int x, y, z;
} *coordinates;
unsigned int count;         /* how many we need */
size_t amount;              /* total amount of memory */

/* ... determine count somehow ... */

amount = count * sizeof(struct coord);          /* how many bytes to allocate */
coordinates = (struct coord *) malloc(amount);  /* get the space */

if (coordinates == NULL) {
    /* report error, recover or give up */
}
/* use coordinates ... */
The steps shown here are quite boilerplate. The order is as follows:
1. Declare a pointer of the proper type to point to the allocated memory.
2. Calculate the size in bytes of the memory to be allocated. This involves
multiplying a count of objects needed by the size of the individual object. This size
in turn is retrieved from the C sizeof operator, which exists for this purpose
(among others). Thus, while the size of a particular struct may vary across
compilers and architectures, sizeof always returns the correct value and the
source code remains correct and portable.
When allocating arrays for character strings or other data of type char, it is
not necessary to multiply by sizeof(char), since by definition this is always
1. But it won't hurt anything either.
3. Allocate the storage by calling malloc(), assigning the function's return value
to the pointer variable. It is good practice to cast the return value of malloc()
to that of the variable being assigned to. In C it's not required (although the
compiler may generate a warning). We strongly recommend always casting the
return value.
Note that in C++, assignment of a pointer value of one type to a pointer of
another type does require a cast, whatever the context. For dynamic memory
management, C++ programs should use new and delete, to avoid type problems,
and not malloc() and free().
4. Check the return value. Never assume that memory allocation will succeed. If
the allocation fails, malloc() returns NULL. If you use the value without
checking, it is likely that your program will immediately die from a segmentation
violation (or segfault), which is an attempt to use memory not in your address
space.
If you check the return value, you can at least print a diagnostic message and
terminate gracefully. Or you can attempt some other method of recovery.
Once we've allocated memory and set coordinates to point to it, we can then treat
coordinates as if it were an array, although it's really a pointer:
int cur_x, cur_y, cur_z;
size_t an_index;

an_index = something;
cur_x = coordinates[an_index].x;
cur_y = coordinates[an_index].y;
cur_z = coordinates[an_index].z;
The compiler generates correct code for indexing through the pointer to retrieve the
members of the structure at coordinates[an_index].
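Put together, the four steps and the array-style indexing might look like the following sketch. The function name alloc_coords() is hypothetical, and the check against SIZE_MAX is our own addition (the steps above don't mention it); it guards against the multiplication wrapping around for a very large count.

```c
#include <stdint.h>
#include <stdlib.h>

struct coord {          /* 3D coordinates, as in the text */
    int x, y, z;
};

/* alloc_coords --- hypothetical helper: allocate and fill count
   structures; returns NULL if nothing could be allocated */
struct coord *alloc_coords(size_t count)
{
    struct coord *coordinates;                      /* step 1 */
    size_t amount;
    size_t i;

    /* Assumption, not from the text: refuse a zero count, and refuse
       counts for which the size calculation would overflow. */
    if (count == 0 || count > SIZE_MAX / sizeof(struct coord))
        return NULL;

    amount = count * sizeof(struct coord);          /* step 2 */
    coordinates = (struct coord *) malloc(amount);  /* step 3 */
    if (coordinates == NULL)                        /* step 4 */
        return NULL;

    for (i = 0; i < count; i++) {   /* index through the pointer like an array */
        coordinates[i].x = (int) i;
        coordinates[i].y = 0;
        coordinates[i].z = 0;
    }
    return coordinates;
}
```

The caller owns the returned block and is responsible for eventually passing it to free().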
NOTE The memory returned by malloc() is not initialized. It can contain any
random garbage. You should immediately initialize the memory with valid data
or at least with zeros. To do the latter, use memset() (discussed in Section 12.2,
"Low-Level Memory: The memXXX() Functions," page 432):
memset(coordinates, '\0', amount);
Another option is to use calloc(), described shortly.
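As an illustration of the two options in the note, here is a small sketch; the helper names zeroed_ints_malloc() and zeroed_ints_calloc() are hypothetical. Both return a block of count zero-valued integers, or NULL on failure.

```c
#include <stdlib.h>
#include <string.h>

/* zeroed_ints_malloc --- hypothetical helper: malloc() followed by memset() */
int *zeroed_ints_malloc(size_t count)
{
    size_t amount = count * sizeof(int);
    int *p = (int *) malloc(amount);

    if (p != NULL)
        memset(p, '\0', amount);    /* raw memory must be cleared by hand */
    return p;
}

/* zeroed_ints_calloc --- hypothetical helper: calloc() does both jobs */
int *zeroed_ints_calloc(size_t count)
{
    return (int *) calloc(count, sizeof(int));
}
```

The calloc() version is shorter, and it takes the count and the element size as separate arguments, so the caller does not do the multiplication.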
Geoff Collyer recommends the following technique for allocating memory:

some_type *pointer;
pointer = malloc(count * sizeof(*pointer));

This approach guarantees that the malloc() will allocate the correct amount of
memory without your having to consult the declaration of pointer. If pointer's type
later changes, the sizeof operator automatically ensures that the count of bytes to
allocate stays correct. (Geoff's technique omits the cast that we just discussed. Having
the cast there also ensures a diagnostic if pointer's type changes and the call to
malloc() isn't updated.)
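Here is a minimal sketch of that technique in use; make_samples() is a hypothetical name. If the element type were later changed from double to, say, float, only the declaration and return type would need editing; the sizeof expression would follow along.

```c
#include <stdlib.h>

/* make_samples --- hypothetical helper: allocate and fill count samples;
   returns NULL if the allocation fails */
double *make_samples(size_t count)
{
    double *samples;
    size_t i;

    samples = malloc(count * sizeof(*samples));  /* size follows the pointer's type */
    if (samples == NULL)
        return NULL;

    for (i = 0; i < count; i++)
        samples[i] = i * 0.5;
    return samples;
}
```

As in Geoff's version, the cast is omitted here; adding it is a matter of the preference discussed above.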
3.2.1.3 Releasing Memory: free()

When you're done using the memory, you "give it back" by using the free()
function. The single argument is a pointer previously obtained from one of the other
allocation routines. It is safe (although useless) to pass a null pointer to free():
free(coordinates);
coordinates = NULL;     /* not required, but a good idea */
Once free(coordinates) is called, the memory pointed to by coordinates is
off limits. It now "belongs" to the allocation subroutines, and they are free to manage
it as they see fit. They can change the contents of the memory or even release it from
the process's address space! There are thus several common errors to watch out for with
free():
Accessing freed memory
If unchanged, coordinates continues to point at memory that no longer belongs
to the application. This is called a dangling pointer. In many systems, you can get
away with continuing to access this memory, at least until the next time more
memory is allocated or freed. In many others though, such access won't work.
In sum, accessing freed memory is a bad idea: It's not portable or reliable, and the
GNU Coding Standards disallows it. For this reason, it's a good idea to immediately
set the program's pointer variable to NULL. If you then accidentally attempt to
access freed memory, your program will immediately fail with a segmentation
fault (before you've released it to the world, we hope).
Freeing the same pointer twice
This causes "undefined behavior." Once the memory has been handed back to
the allocation routines, they may merge the freed block with other free storage
under management. Freeing something that's already been freed is likely to lead
to confusion or crashes at best, and so-called double frees have been known to
lead to security problems.
Passing a pointer not obtained from malloc(), calloc(), or realloc()
This seems obvious, but it's important nonetheless. Even passing in a pointer to
somewhere in the middle of dynamically allocated memory is bad:
free(coordinates + 10);     /* Release all but first 10 elements. */
This call won't work, and it's likely to lead to disastrous consequences, such as a
crash. (This is because many malloc() implementations keep "bookkeeping"
information in front of the returned data. When free() goes to use that
information, it will find invalid data there. Other implementations have the bookkeeping
information at the end of the allocated chunk; the same issues apply.)
Buffer overruns and underruns
Accessing memory outside an allocated chunk also leads to undefined behavior,
again because this is likely to be bookkeeping information or possibly memory
that's not even in the address space. Writing into such memory is much worse,
since it's likely to destroy the bookkeeping data.
Failure to free memory
Any dynamic memory that's not needed should be released. In particular, memory
that is allocated inside loops or recursive or deeply nested function calls should
be carefully managed and released. Failure to take care leads to memory leaks,
whereby the process's memory can grow without bounds; eventually, the process
dies from lack of memory.
This situation can be particularly pernicious if memory is allocated per input
record or as some other function of the input: The memory leak won't be noticed
when run on small inputs but can suddenly become obvious (and embarrassing)
when run on large ones. This error is even worse for systems that must run
continuously, such as telephone switching systems. A memory leak that crashes such a
system can lead to significant monetary or other damage.
Even if the program never dies for lack of memory, constantly growing programs
suffer in performance, because the operating system has to manage keeping in-use
data in physical memory. In the worst case, this can lead to behavior known as
thrashing, whereby the operating system is so busy moving the contents of the
address space into and out of physical memory that no real work gets done.
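The advice to set a pointer to NULL right after freeing it can be packaged so that the two steps can't be separated. This is a sketch only; FREE_AND_CLEAR and release_twice() are hypothetical names, not part of any standard library.

```c
#include <stdlib.h>

/* FREE_AND_CLEAR --- hypothetical macro: release the memory and reset the
   pointer in one step, so that a later free() of the same variable is
   harmless and a later dereference fails right away */
#define FREE_AND_CLEAR(ptr)  do { free(ptr); (ptr) = NULL; } while (0)

/* release_twice --- hypothetical demonstration: returns 1 if the pointer
   is NULL after being released twice, -1 if the allocation failed */
int release_twice(void)
{
    char *buf = (char *) malloc(64);

    if (buf == NULL)
        return -1;

    FREE_AND_CLEAR(buf);
    FREE_AND_CLEAR(buf);    /* safe: this is now free(NULL), which does nothing */
    return buf == NULL;
}
```

This protects only the one variable named in the macro call. Any other pointers into the same block are still dangling and need the same care.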
While it's possible for free() to hand released memory back to the system and shrink
the process address space, this is almost never done. Instead, the released memory is
kept available for allocation by the next call to malloc(), calloc(), or realloc().
Given that released memory continues to reside in the process's address space, it may
pay to zero it out before releasing it. Security-sensitive programs may choose to do this,
for example.
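A sketch of zeroing memory before release follows; wipe() and wipe_and_free() are hypothetical names. It writes through a volatile pointer, a common idiom meant to discourage the compiler from discarding stores to memory that is about to be freed. Treat that as an assumption about typical compilers rather than a guarantee.

```c
#include <stdlib.h>

/* wipe --- hypothetical helper: overwrite n bytes with zeros, writing
   through a volatile pointer so the stores are less likely to be
   optimized away */
void wipe(void *mem, size_t n)
{
    volatile unsigned char *p = mem;

    while (n-- > 0)
        *p++ = 0;
}

/* wipe_and_free --- hypothetical helper: clear the block, then release it.
   The caller must supply the size; free() has no way to report it. */
void wipe_and_free(void *mem, size_t n)
{
    if (mem == NULL)
        return;
    wipe(mem, n);
    free(mem);
}
```

Because the allocation routines don't tell you how big a block is, the program has to remember the size it asked for.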
See Section 15.5.2, "Memory Allocation Debuggers," page 612, for discussion of a
number of useful dynamic-memory debugging tools.
3.2.1.4 Changing Size: realloc()

Dynamic memory has a significant advantage over statically declared arrays, which
is that it's possible to use exactly as much memory as you need, and no more. It's not
necessary to declare a global, static, or automatic array of some fixed size and hope
that it's (a) big enough and (b) not too big. Instead, you can allocate exactly as much
as you need, no more and no less.
Additionally, it's possible to change the size of a dynamically allocated memory area.
Although it's possible to shrink a block of memory, more typically, the block is grown.
Changing the size is handled with realloc(). Continuing with the coordinates
example, typical code goes like this:
int new_coun t ;
size_t new_amount;
struct coord *newcoords;
/ * set new_ count , for example : * /

new_count = count * 2; / * double the storage * /
new_amount = new_count * s i z eof(struct coord ) ;
newcoords = (struct coord *) realloc(coord i nates, new_amount ) ;

if (ne wcoords == NULL) {
/ * report error, r e cover or give u p * /
coordinates = newcoords;
/ * cont inue usi n g coordinates ... */
As with malloc(), the steps are boilerplate in nature and are similar in concept:
1. Compute the new size to allocate, in bytes.
2. Call realloc() with the original pointer obtained from malloc() (or from
calloc() or an earlier call to realloc()) and the new size.
3. Cast and assign the return value of realloc(). More discussion of this shortly.
4. As for malloc(), check the return value to make sure it's not NULL. Any
memory allocation routine can fail.
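The four steps above can be collected into a small helper. The following is a minimal sketch, not code from any particular program: the name grow_coords() and the three-int layout of struct coord are assumptions made for illustration. It also rejects a zero count and a size computation that would overflow, two cases the fragment above leaves to the caller.

```c
#include <stdint.h>
#include <stdlib.h>

struct coord {          /* assumed layout, for illustration only */
    int x, y, z;
};

/*
 * grow_coords --- double the storage that *coordsp points to.
 * (Hypothetical helper.)
 *
 * Returns 0 on success, -1 on failure.  On failure, *coordsp and
 * *countp are untouched, so the caller can still use or free the
 * original block.
 */
int grow_coords(struct coord **coordsp, size_t *countp)
{
    size_t new_count, new_amount;
    struct coord *newcoords;

    /* refuse a zero count, and a byte count that would overflow */
    if (*countp == 0 || *countp > SIZE_MAX / (2 * sizeof(struct coord)))
        return -1;

    new_count = *countp * 2;                            /* step 1 */
    new_amount = new_count * sizeof(struct coord);

    /* steps 2 and 3: call realloc(), assign to a separate variable */
    newcoords = (struct coord *) realloc(*coordsp, new_amount);
    if (newcoords == NULL)                              /* step 4 */
        return -1;

    *coordsp = newcoords;   /* only now overwrite the old pointer */
    *countp = new_count;
    return 0;
}
```

Because the result of realloc() goes into a temporary first, a failed call leaves the caller's pointer and count exactly as they were.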
When growing a block of memory, realloc() often allocates a new block of the
right size, copies the data from the old block into the new one, and returns a pointer
to the new one.
When shrinking a block of data, realloc() can often just update the internal
bookkeeping information and return the same pointer. This saves having to copy the
original data. However, if this happens, don't assume you can still use the memory beyond
the new size!
In either case, you can assume that if realloc() doesn't return NULL, the old data
has been copied for you into the new memory. Furthermore, the old pointer is no
longer valid, as if you had called free() with it, and you should not use it. This is true
of all pointers into that block of data, not just the particular one used to call free().
You may have noticed that our example code used a separate variable to point to the
changed storage block. It would be possible (but a bad idea) to use the same initial
variable, like so:
coordinates = realloc(coordinates, new_amount);
This is a bad idea for the following reason. When realloc() returns NULL, the
original pointer is still valid; it's safe to continue using that memory. However, if you
reuse the same variable and realloc() returns NULL, you've now lost the pointer to
the original memory. That memory can no longer be used. More important, that
memory can no longer be freed! This creates a memory leak, which is to be avoided.
There are some special cases for the Standard C version of realloc(): When the
ptr argument is NULL, realloc() acts like malloc() and allocates a fresh block of
storage. When the size argument is 0, realloc() acts like free() and releases the
memory that ptr points to. Because (a) this can be confusing and (b) older systems
don't implement this feature, we recommend using malloc() when you mean
malloc() and free() when you mean free().
Here is another, fairly subtle, "gotcha."4 Consider a routine that maintains a static
pointer to some dynamically allocated data, which the routine occasionally has to grow.
It may also maintain automatic (that is, local) pointers into this data. (For brevity, we
omit error checking code. In production code, don't do that.) For example:
void manage_table(void)
{
    static struct table *table;
    struct table *cur, *p;
    int i;
    size_t count;

    table = (struct table *) malloc(count * sizeof(struct table));
    /* fill table */
    cur = & table[i];           /* point at i'th item */
    cur->i = j;                 /* use pointer */

    if (some condition) {       /* need to grow table */
        count += count/2;
        p = (struct table *) realloc(table, count * sizeof(struct table));
        table = p;
    }

    cur->i = j;                 /* PROBLEM 1: update table element */

    other_routine();            /* PROBLEM 2: see text */
    cur->j = k;                 /* PROBLEM 2: see text */
}
This looks straightforward; manage_table() allocates the data, uses it, changes the
size, and so on. But there are some problems that don't jump off the page (or the screen)
when you are looking at this code.
In the line marked 'PROBLEM 1', the cur pointer is used to update a table element.
However, cur was assigned on the basis of the initial value of table. If some
condition was true and realloc() returned a different block of memory, cur now
points into the original, freed memory! Whenever table changes, any pointers into
the memory need to be updated too. What's missing here is the statement
'cur = & table[i];' after table is reassigned following the call to realloc().
4 It is derived from real-life experience with gawk.

3.2 Memory Allocation 65
The two lines marked 'PROBLEM 2' are even more subtle. In particular, suppose
other_routine() makes a recursive call to manage_table(). The table variable
could be changed again, completely invisibly! Upon return from other_routine(),
the value of cur could once again be invalid.
One might think (as we did) that the only solution is to be aware of this and supply
a suitably commented reassignment to cur after the function call. However, Brian
Kernighan kindly set us straight. If we use indexing, the pointer maintenance issue
doesn't even arise:
    table = (struct table *) malloc(count * sizeof(struct table));
    /* fill table */
    table[i].i = j;             /* Update a member of the i'th element */

    if (some condition) {       /* need to grow table */
        count += count/2;
        p = (struct table *) realloc(table, count * sizeof(struct table));
        table = p;
    }

    table[i].i = j;             /* PROBLEM 1 goes away */

    other_routine();            /* Recursively calls us, modifies table */
    table[i].j = k;             /* PROBLEM 2 goes away also */
Using indexing doesn't solve the problem if you have a global copy of the original
pointer to the allocated data; in that case, you still have to worry about updating your
global structures after calling realloc().
NOTE As with malloc(), when you grow a piece of memory, the newly
allocated memory returned from realloc() is not zero-filled. You must clear
it yourself with memset() if that's necessary, since realloc() only allocates
the fresh memory; it doesn't do anything else.
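Clearing the newly added part of a grown block can be sketched as follows. The helper name grow_zeroed() and its interface are hypothetical; the caller is assumed to track the old size itself and to pass a nonzero new size.

```c
#include <stdlib.h>
#include <string.h>

/*
 * grow_zeroed --- resize a block from old_size to new_size bytes and
 * zero-fill whatever was added at the end.  (Hypothetical helper.)
 *
 * Returns the new pointer, or NULL on failure, in which case the old
 * block is still valid and still owned by the caller.
 */
void *grow_zeroed(void *ptr, size_t old_size, size_t new_size)
{
    char *p = realloc(ptr, new_size);

    if (p == NULL)
        return NULL;

    /* realloc() preserved the first old_size bytes; clear the rest */
    if (new_size > old_size)
        memset(p + old_size, 0, new_size - old_size);

    return p;
}
```

As with any realloc() call, assign the result to a separate variable so that the original pointer survives a failure.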
3.2.1.5 Allocating and Zero-filling: calloc()

The calloc() function is a straightforward wrapper around malloc(). Its primary
advantage is that it zeros the dynamically allocated memory. It also performs the size
calculation for you by taking as parameters the number of items and the size of each:
    coordinates = (struct coord *) calloc(count, sizeof(struct coord));
Conceptually, at least, the calloc() code is fairly simple. Here is one possible
implementation:
void *calloc(size_t nmemb, size_t size)
{
    void *p;
    size_t total;

    total = nmemb * size;           /* Compute size */
    p = malloc(total);              /* Allocate the memory */
    if (p != NULL)                  /* If it worked ... */
        memset(p, '\0', total);     /* Fill it with zeros */
    return p;                       /* Return value is NULL or pointer */
}
Many experienced programmers prefer to use calloc() since then there's never any
question about the contents of the newly allocated memory.
Also, if you know you'll need zero-filled memory, you should use calloc(), because
it's possible that the memory malloc() returns is already zero-filled. Although you,
the programmer, can't know this, calloc() can know about it and avoid the call
to memset().
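One detail the simple implementation glosses over is that nmemb * size can overflow a size_t, in which case far less memory would be allocated than the caller expects. The sketch below shows the same wrapper with that check added. The name checked_calloc() is hypothetical and is used only so that it doesn't clash with the library's own calloc(); it is not a description of how any particular C library does it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * checked_calloc --- like the simple calloc() shown in the text, but
 * fail cleanly if nmemb * size does not fit in a size_t.
 * (Hypothetical name; a sketch only.)
 */
void *checked_calloc(size_t nmemb, size_t size)
{
    size_t total;
    void *p;

    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;            /* the product would overflow */

    total = nmemb * size;       /* now safe to compute */
    p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);    /* fill it with zeros */
    return p;
}
```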
3.2.1.6 Summarizing from the GNU Coding Standards

To summarize, here is what the GNU Coding Standards has to say about using the
memory allocation routines:
    Check every call to malloc or realloc to see if it returned zero. Check
    realloc even if you are making the block smaller; in a system that rounds
    block sizes to a power of 2, realloc may get a different block if you ask for
    less space.

    In Unix, realloc can destroy the storage block if it returns zero. GNU
    realloc does not have this bug: If it fails, the original block is unchanged.
    Feel free to assume the bug is fixed. If you wish to run your program on
    Unix, and wish to avoid lossage in this case, you can use the GNU malloc.

    You must expect free to alter the contents of the block that was freed.
    Anything you want to fetch from the block, you must fetch before calling
    free.
In three short paragraphs, Richard Stallman has distilled the important principles
for doing dynamic memory management with malloc(). It is the use of dynamic
memory and the "no arbitrary limits" principle that makes GNU programs so robust
and more capable than their Unix counterparts.
We do wish to point out that the C standard requires realloc() to not destroy the
original block if it returns NULL.
3.2.1.7 Using Private Allocators

The malloc() suite is a general-purpose memory allocator. It has to be able to
handle requests for arbitrarily large or small amounts of memory and do all the
bookkeeping when different chunks of allocated memory are released. If your program does
considerable dynamic memory allocation, you may thus find that it spends a large
proportion of its time in the malloc() functions.
One thing you can do is write a private allocator: a set of functions or macros that
allocates large chunks of memory from malloc() and then parcels out small chunks
one at a time. This technique is particularly useful if you allocate many individual
instances of the same relatively small structure.
For example, GNU awk (gawk) uses this technique. From the file awk.h in the gawk
distribution (edited slightly to fit the page):
#define getnode(n)  if (nextfree) n = nextfree, nextfree = nextfree->nextp;\
                    else n = more_nodes()

#define freenode(n) ((n)->flags = 0, (n)->exec_count = 0,\
                     (n)->nextp = nextfree, nextfree = (n))
The nextfree variable points to a linked list of NODE structures. The getnode() macro
pulls the first structure off the list if one is there. Otherwise, it calls more_nodes() to
allocate a new list of free NODEs. The freenode() macro releases a NODE by putting it
at the head of the list.
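The same free-list idea can be written with functions instead of macros. What follows is a simplified sketch, not gawk's code: struct node, CHUNK, get_node(), and free_node() are names invented for this illustration, and this more_nodes() is only a guess at the general shape such a routine might take.

```c
#include <stdlib.h>

#define CHUNK 64        /* nodes obtained per malloc() call; arbitrary */

struct node {
    int value;
    struct node *nextp; /* link used while the node is on the free list */
};

static struct node *nextfree = NULL;    /* head of the free list */

/* more_nodes --- get CHUNK nodes at once; return one, keep the rest */
static struct node *more_nodes(void)
{
    struct node *block = malloc(CHUNK * sizeof(struct node));
    size_t i;

    if (block == NULL)
        return NULL;

    /* chain block[1] .. block[CHUNK-1] onto the free list */
    for (i = 1; i < CHUNK - 1; i++)
        block[i].nextp = &block[i + 1];
    block[CHUNK - 1].nextp = nextfree;
    nextfree = &block[1];

    return &block[0];
}

/* get_node --- take a node off the free list, refilling it if empty */
struct node *get_node(void)
{
    struct node *n;

    if (nextfree != NULL) {
        n = nextfree;
        nextfree = nextfree->nextp;
    } else
        n = more_nodes();

    return n;
}

/* free_node --- put a node back at the head of the free list */
void free_node(struct node *n)
{
    n->nextp = nextfree;
    nextfree = n;
}
```

Note that in this sketch the chunks are never handed back to malloc(); released nodes are only recycled through the list.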
NOTE When first writing your application, do it the simple way: use malloc()
and free() directly. If and only if profiling your program shows you that it's
spending a significant amount of time in the memory-allocation functions
should you consider writing a private allocator.
3.2.1.8 Example: Reading Arbitrarily Long Lines

Since this is, after all, Linux Programming by Example, it's time for a real-life
example. The following code is the readline() function from GNU Make 3.80
(ftp://ftp.gnu.org/gnu/make/make-3.80.tar.gz). It can be found in the file
read.c.
Following the "no arbitrary limits" principle, lines in a Makefile can be of any
length. Thus, this routine's primary job is to read lines of any length and make sure
that they fit into the buffer being used.
A secondary job is to deal with continuation lines. As in C, lines that end with a
backslash logically continue to the next line. The strategy used is to maintain a buffer.
As many lines as will fit in the buffer are kept there, with pointers keeping track of the
start of the buffer, the current line, and the next line. Here is the structure:
struct ebuffer
  {
    char *buffer;       /* Start of the current line in the buffer.  */
    char *bufnext;      /* Start of the next line in the buffer.  */
    char *bufstart;     /* Start of the entire buffer.  */
    unsigned int size;  /* Malloc'd size of buffer. */
    FILE *fp;           /* File, or NULL if this is an internal buffer.  */
    struct floc floc;   /* Info on the file in fp (if any).  */
  };
The size field tracks the size of the entire buffer, and fp is the FILE pointer for the
input file. The floc structure isn't of interest for studying the routine.
The function returns the number of lines in the buffer. (The line numbers here are
relative to the start of the function, not the source file.)
 1  static long
 2  readline (ebuf)                  static long readline(struct ebuffer *ebuf)
 3       struct ebuffer *ebuf;
 4  {
 5    char *p;
 6    char *end;
 7    char *start;
 8    long nlines = 0;
 9
10    /* The behaviors between string and stream buffers are different enough to
11       warrant different functions.  Do the Right Thing.  */
12
13    if (!ebuf->fp)
14      return readstring (ebuf);
15
16    /* When reading from a file, we always start over at the beginning of the
17       buffer for each new line.  */
18
19    p = start = ebuf->bufstart;
20    end = p + ebuf->size;
21    *p = '\0';
We start by noticing that GNU Make is written in K&R C for maximal portability.
The initial part declares variables, and if the input is coming from a string (such as
from the expansion of a macro), the code hands things off to a different function,
readstring() (lines 13 and 14). The test '!ebuf->fp' (line 13) is a shorter (and less
clear, in our opinion) test for a null pointer; it's the same as 'ebuf->fp == NULL'.
Lines 19-21 initialize the pointers, and insert a NUL byte, which is the C string
terminator character, at the end of the buffer. The function then starts a loop (lines
23-95), which runs as long as there is more input.
23    while (fgets (p, end - p, ebuf->fp) != 0)
24      {
25        char *p2;
26        unsigned long len;
27        int backslash;
28
29        len = strlen (p);
30        if (len == 0)
31          {
32            /* This only happens when the first thing on the line is a '\0'.
33               It is a pretty hopeless case, but (wonder of wonders) Athena
34               lossage strikes again!  (xmkmf puts NULs in its makefiles.)
35               There is nothing really to be done; we synthesize a newline so
36               the following line doesn't appear to be part of this line.  */
37            error (&ebuf->floc,
38                   _("warning: NUL character seen; rest of line ignored"));
39            p[0] = '\n';
40            len = 1;
41          }
The fgets() function (line 23) takes a pointer to a buffer, a count of bytes to read,
and a FILE * variable for the file to read from. It reads one less than the count so that
it can terminate the buffer with '\0'. This function is good since it allows you to avoid
buffer overflows. It stops upon encountering a newline or end-of-file, and if the newline
is there, it's placed in the buffer. It returns NULL on failure or the (pointer) value of the
first argument on success.
In this case, the arguments are a pointer to the free area of the buffer, the amount
of room left in the buffer, and the FILE pointer to read from.
The comment on lines 32-36 is self-explanatory; if a zero byte is encountered, the
program prints an error message and pretends it was an empty line. After compensating
for the NUL byte (lines 30-41), the code continues.
43        /* Jump past the text we just read.  */
44        p += len;
45
46        /* If the last char isn't a newline, the whole line didn't fit into the
47           buffer.  Get some more buffer and try again.  */
48        if (p[-1] != '\n')
49          goto more_buffer;
50
51        /* We got a newline, so add one to the count of lines.  */
52        ++nlines;
Lines 43-52 increment the pointer into the buffer past the data just read. The code
then checks whether the last character read was a newline. The construct p[-1] (line 48)
looks at the character in front of p, just as p[0] is the current character and p[1] is the
next. This looks strange at first, but if you translate it into terms of pointer math,
*(p-1), it makes more sense, and the indexing form is possibly easier to read.
If the last character was not a newline, this means that we've run out of space, and
the code goes off (with goto) to get more (line 49). Otherwise, the line count is
incremented.
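The "did the whole line fit?" test can be tried in isolation. Here is a minimal sketch; the function name read_chunk() and its return convention are made up for this illustration and are not part of GNU Make.

```c
#include <stdio.h>
#include <string.h>

/*
 * read_chunk --- read with fgets() and report what happened.
 * (Hypothetical helper.)
 *
 * Returns  1 if buf holds a complete line (it ends in '\n'),
 *          0 if only part of a line fit in buf (or the input has
 *            no final newline),
 *         -1 on end-of-file or error with nothing read.
 */
int read_chunk(char *buf, size_t size, FILE *fp)
{
    size_t len;

    if (fgets(buf, (int) size, fp) == NULL)
        return -1;

    len = strlen(buf);
    return (len > 0 && buf[len - 1] == '\n') ? 1 : 0;
}
```

A return of 0 is the situation in which readline() jumps to more_buffer: the buffer is full, but the newline hasn't been seen yet.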
54  #if !defined(WINDOWS32) && !defined(__MSDOS__)
55        /* Check to see if the line was really ended with CRLF; if so ignore
56           the CR.  */
57        if ((p - start) > 1 && p[-2] == '\r')
58          {
59            --p;
60            p[-1] = '\n';
61          }
62  #endif
Lines 54-62 deal with input lines that follow the Microsoft convention of ending
with a Carriage Return-Line Feed (CR-LF) combination, and not just a Line Feed (or
newline), which is the Linux/Unix convention. Note that the #if excludes the code
on Microsoft systems; apparently the <stdio.h> library on those systems handles this
conversion automatically. This is also true of other non-Unix systems that support
Standard C.
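The same conversion can be shown on an ordinary string. This is a small sketch with a hypothetical function name, strip_cr(); unlike the Make code, which adjusts a pointer into its buffer, it works from the string's length.

```c
#include <string.h>

/*
 * strip_cr --- if line ends in "\r\n", turn that into "\n".
 * Returns the (possibly shortened) length.  (Hypothetical helper.)
 */
size_t strip_cr(char *line)
{
    size_t len = strlen(line);

    if (len >= 2 && line[len - 2] == '\r' && line[len - 1] == '\n') {
        line[len - 2] = '\n';   /* overwrite the CR with the newline */
        line[len - 1] = '\0';   /* and terminate one byte earlier */
        len--;
    }
    return len;
}
```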
64        backslash = 0;
65        for (p2 = p - 2; p2 >= start; --p2)
66          {
67            if (*p2 != '\\')
68              break;
69            backslash = !backslash;
70          }
71
72        if (!backslash)
73          {
74            p[-1] = '\0';
75            break;
76          }
77
78        /* It was a backslash/newline combo.  If we have more space, read
79           another line.  */
80        if (end - p >= 80)
81          continue;
82
83        /* We need more space at the end of our buffer, so realloc it.
84           Make sure to preserve the current offset of p.  */
85      more_buffer:
86        {
87          unsigned long off = p - start;
88          ebuf->size *= 2;
89          start = ebuf->buffer = ebuf->bufstart = (char *) xrealloc (start,
90                                                                     ebuf->size);
91          p = start + off;
92          end = start + ebuf->size;
93          *p = '\0';
94        }
95      }
So far we've dealt with the mechanics of getting at least one complete line into the
buffer. The next chunk handles the case of a continuation line. It has to make sure,
though, that the final backslash isn't part of multiple backslashes at the end of the line.
It tracks whether the total number of such backslashes is odd or even by toggling the
backslash variable from 0 to 1 and back. (Lines 64-70.)
If the number is even, the test '!backslash' (line 72) will be true. In this case, the
final newline is replaced with a NUL byte, and the code leaves the loop.
On the other hand, if the number is odd, then the line contained an even number
of backslash pairs (representing escaped backslashes, \\ as in C), and a final
backslash-newline combination.5 In this case, if at least 80 free bytes are left in the buffer, the
program continues around the loop to read another line (lines 78-81). (The use of
the magic number 80 isn't great; it would have been better to define and use a symbolic
constant.)
5 This code has the scent of practical experience about it: It wouldn't be surprising to learn that earlier versions
simply checked for a final backslash before the newline, until someone complained that it didn't work when there
were multiple backslashes at the end of the line.
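The odd/even backslash test is easy to get wrong, so it may help to see it isolated from the buffer management. The following sketch uses a hypothetical function, is_continued(), that applies the same toggling idea to a complete, newline-terminated string.

```c
#include <string.h>

/*
 * is_continued --- does this newline-terminated line end in an
 * unescaped backslash?  Returns 1 if so, 0 otherwise.
 * (Hypothetical helper.)
 */
int is_continued(const char *line)
{
    size_t len = strlen(line);
    int backslash = 0;
    size_t i;

    if (len == 0 || line[len - 1] != '\n')
        return 0;       /* not a complete line */

    /* Walk backward over the run of backslashes just before the
       newline, toggling the flag once for each one. */
    for (i = len - 1; i > 0 && line[i - 1] == '\\'; i--)
        backslash = !backslash;

    return backslash;   /* 1 means an odd number of backslashes */
}
```

An odd count means the last backslash is unpaired and therefore escapes the newline; an even count means every backslash is part of an escaped pair and the line really ends.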
Upon reaching line 83, the program needs more space in the buffer. Here's where
the dynamic memory management comes into play. Note the comment about preserving
p (lines 83-84); we discussed this earlier in terms of reinitializing pointers into dynamic
memory. end is also reset. Line 89 resizes the memory.
Note that here the function being called is xrealloc(). Many GNU programs use
"wrapper" functions around malloc() and realloc() that automatically print an
error message and exit if the standard routines return NULL. Such a wrapper might look
like this:
extern const char *myname;    /* set in main() */

void *xrealloc(void *ptr, size_t amount)
{
    void *p = realloc(ptr, amount);

    if (p == NULL) {
        fprintf(stderr, "%s: out of memory!\n", myname);
        exit(1);
    }
    return p;
}
Thus, if xrealloc() returns, it's guaranteed to return a valid pointer. (This strategy
complies with the "check every call for errors" principle while avoiding the code clutter
that comes with doing so using the standard routines directly.) In addition, this allows
valid use of the construct 'ptr = xrealloc(ptr, new_size)', which we otherwise
warned against earlier.
Note that it is not always appropriate to use such a wrapper. If you wish to handle
errors yourself, you shouldn't use it. On the other hand, if running out of memory is
always a fatal error, then such a wrapper is quite handy.
 97    if (ferror (ebuf->fp))
 98      pfatal_with_name (ebuf->floc.filenm);
 99
100    /* If we found some lines, return how many.
101       If we didn't, but we did find _something_, that indicates we read the last
102       line of a file with no final newline; return 1.
103       If we read nothing, we're at EOF; return -1.  */
104
105    return nlines ? nlines : p == ebuf->bufstart ? -1 : 1;
106  }
Finally, the readline() routine checks for I/O errors, and then returns a descriptive
return value. The function pfatal_with_name() (line 98) doesn't return.
3.2.1.9 GLIBC Only: Reading Entire Lines: getline() and getdelim()
Now that you've seen how to read an arbitrary-length line, you can breathe a sigh
of relief that you don't have to write such a function for yourself. GLIBC provides two
functions to do this for you:
#define _GNU_SOURCE 1                                       GLIBC
#include <stdio.h>
#include <sys/types.h>    /* for ssize_t */

ssize_t getline(char **lineptr, size_t *n, FILE *stream);
ssize_t getdelim(char **lineptr, size_t *n, int delim, FILE *stream);
Defining the constant _GNU_SOURCE brings in the declaration of the getline()
and getdelim() functions. Otherwise, they're implicitly declared as returning int.
<sys/types.h> is needed so you can declare a variable of type ssize_t to hold the
return value. (An ssize_t is a "signed size_t." It's meant for the same use as a size_t,
but for places where you need to be able to hold negative values as well.)
Both functions manage dynamic storage for you, ensuring that the buffer containing
an input line is always big enough to hold the input line. They differ in that getline()
reads until a newline character, and getdelim() uses a user-provided delimiter character.
The common arguments are as follows:
char **lineptr
    A pointer to a char * pointer to hold the address of a dynamically allocated
    buffer. It should be initialized to NULL if you want getline() to do all the work.
    Otherwise, it should point to storage previously obtained from malloc().
size_t *n
    An indication of the size of the buffer. If you allocated your own buffer, *n should
    contain the buffer's size. Both functions update *n to the new buffer size if they
    change it.
FILE *stream
    The location from which to get input characters.
The functions return -1 upon end-of-file or error. The strings hold the terminating
newline or delimiter (if there was one), as well as a terminating zero byte. Using
getline() is easy, as shown in ch03-getline.c:
/* ch03-getline.c --- demonstrate getline(). */

#define _GNU_SOURCE 1
#include <stdio.h>
#include <sys/types.h>

/* main --- read a line and echo it back out until EOF. */

int main(void)
{
    char *line = NULL;
    size_t size = 0;
    ssize_t ret;

    while ((ret = getline(& line, & size, stdin)) != -1)
        printf("(%lu) %s", (unsigned long) size, line);

    return 0;
}
Here it is in action, showing the size of the buffer. The third input and output lines
are purposely long, to force getline() to grow the buffer; thus, they wrap around:
$ ch03-getline                                    Run the program
this is a line
(120) this is a line
And another line.
(120) And another line.
A llllllllllllllllloooooooooooooooooooooooooooooooonnnnnnnnnnnnnnnnnnngggg
gggggggg llliiiiiiiiiiiiiiiiiiinnnnnnnnnnnnnnnnnnnneeeeeeeeee
(240) A llllllllllllllllloooooooooooooooooooooooooooooooonnnnnnnnnnnnnnnn
nnngggggggggggg llliiiiiiiiiiiiiiiiiiinnnnnnnnnnnnnnnnnnnneeeeeeeeee
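For comparison, getdelim() can be used the same way with a delimiter other than newline. The sketch below counts colon-separated (or otherwise delimited) fields on a stream; the function name count_fields() is hypothetical. It also shows the one piece of housekeeping that the demonstration program leaves out: the buffer that the functions allocate belongs to the caller and should be released with free() when no longer needed.

```c
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * count_fields --- count the delim-separated fields remaining on fp.
 * (Hypothetical helper.)
 */
long count_fields(FILE *fp, int delim)
{
    char *field = NULL;     /* let getdelim() allocate the buffer */
    size_t size = 0;
    long n = 0;

    /* each call reads up to and including the next delimiter */
    while (getdelim(&field, &size, delim, fp) != -1)
        n++;

    free(field);            /* the buffer is ours to release */
    return n;
}
```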
3.2.2 String Copying: strdup()

One extremely common operation is to allocate storage for a copy of a string. It's so
common that many programs provide a simple function for it instead of using inline
code, and often that function is named strdup():
#include <stdlib.h>
#include <string.h>

/* strdup --- malloc() storage for a copy of string and copy it */

char *strdup(const char *str)
{
    size_t len;
    char *copy;

    len = strlen(str) + 1;    /* include room for terminating '\0' */
    copy = malloc(len);

    if (copy != NULL)
        strcpy(copy, str);

    return copy;    /* returns NULL if error */
}
With the 2001 POSIX standard, programmers the world over can breathe a little
easier: This function is now part of POSIX as an XSI extension:
#include <string.h>                                         XSI

char *strdup(const char *str);                              Duplicate str
The return value is NULL if there was an error or a pointer to dynamically allocated
storage holding a copy of str. The returned value should be freed with free() when
it's no longer needed.
3.2.3 System Calls: brk() and sbrk()

The four routines we've covered (malloc(), calloc(), realloc(), and free())
are the standard, portable functions to use for dynamic memory management.
On Unix systems, the standard functions are implemented on top of two additional,
very primitive routines, which directly change the size of a process's address space. We
present them here to help you understand how GNU/Linux and Unix work ("under
the hood" again); it is highly unlikely that you will ever need to use these functions in
a regular program. They are declared as follows:
#include <unistd.h>                                         Common
#include <malloc.h>    /* Necessary for GLIBC 2 systems */

int brk(void *end_data_segment);
void *sbrk(ptrdiff_t increment);
The brk () system call actually changes the process's address space. The address is a
pointer representing the end of the data segment (really the heap area, as shown earlier
in Figure 3.1). Its argument is an absolute logical address representing the new end of
the address space. It returns 0 on success or -1 on failure.
The sbrk() function is easier to use; its argument is the increment in bytes by which
to change the address space. By calling it with an increment of 0, you can determine
where the address space currently ends. Thus, to increase your address space by 32
bytes, use code like this:
    char *p = (char *) sbrk(0);    /* get current end of address space */
    if (brk(p + 32) < 0) {
        /* handle error */
    }
    /* else, change worked */
Practically speaking, you would not use brk() directly. Instead, you would use
sbrk() exclusively to grow (or even shrink) the address space. (We show how to do
this shortly, in Section 3.2.5, "Address Space Examination," page 78.)
Even more practically, you should never use these routines. A program using them
can't then use malloc() also, and this is a big problem, since many parts of the standard
library rely on being able to use malloc(). Using brk() or sbrk() is thus likely to
lead to hard-to-find program crashes.
But it's worth knowing about the low-level mechanics, and indeed, the malloc()
suite of routines is implemented with sbrk() and brk().
3.2.4 Lazy Programmer Calls: alloca()

"Danger, Will Robinson! Danger! "
-The Robot-
There is one additional memory allocation function that you should know about.
We discuss it only so that you'll understand it when you see it, but you should not use
it in new programs! This function is named alloca(); it's declared as follows:
/* Header on GNU/Linux, possibly not all Unix systems */    Common
#include <alloca.h>

void *alloca(size_t size);

The alloca() function allocates size bytes from the stack. What's nice about this
is that the allocated storage disappears when the function returns. There's no need to
explicitly free it because it goes away automatically, just as local variables do.
At first glance, alloca() seems like a programming panacea; memory can be
allocated that doesn't have to be managed at all. Like the Dark Side of the Force, this is
indeed seductive. And it is similarly to be avoided, for the following reasons:
• The function is nonstandard; it is not included in any formal standard, either ISO
C or POSIX.
• The function is not portable. Although it exists on many Unix systems and
GNU/Linux, it doesn't exist on non-Unix systems. This is a problem, since it's
often important for code to be multiplatform, above and beyond just Linux
and Unix.
• On some systems, alloca() can't even be implemented. All the world is not an
Intel x86 processor, nor is all the world GCC.
• Quoting the manpage (emphasis added): "The alloca function is machine
and compiler dependent. On many systems its implementation is buggy. Its use is
discouraged."
• Quoting the manpage again: "On many systems alloca cannot be used inside
the list of arguments of a function call, because the stack space reserved by alloca
would appear on the stack in the middle of the space for the function arguments."
• It encourages sloppy coding. Careful and correct memory management isn't hard;
you just have to think about what you're doing and plan ahead.
GCC generally uses a built-in version of the function that operates by using inline
code. As a result, there are other consequences of alloca(). Quoting again from
the manpage:
The fact that the code is inlined means that it is impossible to take the address
of this function, or to change its behavior by linking with a different library.
The inlined code often consists of a single instruction adjusting the stack
pointer, and does not check for stack overflow. Thus, there is no NULL error
return.
The manual page doesn't go quite far enough in describing the problem with GCC's
built-in alloca(). If there's a stack overflow, the return value is garbage. And you have
no way to tell! This flaw makes GCC's alloca() impossible to use in robust code.
All of this should convince you to stay away from alloca() for any new code that
you may write. If you're going to have to write portable code using malloc() and
free() anyway, there's no reason to also write code using alloca().
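To make the trade-off concrete, here is a sketch of a routine that needs a temporary scratch buffer, written with malloc() and free() rather than alloca(). The function is_palindrome() is a made-up example, not taken from any real program.

```c
#include <stdlib.h>
#include <string.h>

/*
 * is_palindrome --- return 1 if s reads the same in both directions,
 * 0 if not, and -1 if the scratch buffer could not be allocated.
 * (Hypothetical example.)
 */
int is_palindrome(const char *s)
{
    size_t len = strlen(s);
    char *scratch = malloc(len + 1);    /* where alloca() might tempt */
    size_t i;
    int result;

    if (scratch == NULL)
        return -1;      /* with malloc(), failure can be detected */

    /* build the reversed string in the scratch buffer */
    for (i = 0; i < len; i++)
        scratch[i] = s[len - 1 - i];
    scratch[len] = '\0';

    result = (strcmp(s, scratch) == 0);

    free(scratch);      /* the one extra line alloca() would save */
    return result;
}
```

The cost of doing it properly is a NULL check and a single call to free() on the way out.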
3.2.5 Address Space Examination

The following program, ch03-memaddr.c, summarizes everything we've seen about
the address space. It does many things that you should not do in practice, such as call
alloca() or use brk() and sbrk() directly:
 1  /*
 2   * ch03-memaddr.c --- Show address of code, data and stack sections,
 3   *                    as well as BSS and dynamic memory.
 4   */
 5
 6  #include <stdio.h>
 7  #include <malloc.h>     /* for definition of ptrdiff_t on GLIBC */
 8  #include <unistd.h>
 9  #include <alloca.h>     /* for demonstration only */
10
11  extern void afunc(void);    /* a function for showing stack growth */
12
13  int bss_var;            /* auto init to 0, should be in BSS */
14  int data_var = 42;      /* init to nonzero, should be data */
15
16  int
17  main(int argc, char **argv)     /* arguments aren't used */
18  {
19      char *p, *b, *nb;
20
21      printf("Text Locations:\n");
22      printf("\tAddress of main: %p\n", main);
23      printf("\tAddress of afunc: %p\n", afunc);
24
25      printf("Stack Locations:\n");
26      afunc();
27
28      p = (char *) alloca(32);
29      if (p != NULL) {
30          printf("\tStart of alloca()'ed array: %p\n", p);
31          printf("\tEnd of alloca()'ed array: %p\n", p + 31);
32      }
33
34      printf("Data Locations:\n");
35      printf("\tAddress of data_var: %p\n", & data_var);
36
37      printf("BSS Locations:\n");
38      printf("\tAddress of bss_var: %p\n", & bss_var);
39
40      b = sbrk((ptrdiff_t) 32);   /* grow address space */
41      nb = sbrk((ptrdiff_t) 0);
42      printf("Heap Locations:\n");
43      printf("\tInitial end of heap: %p\n", b);
44      printf("\tNew end of heap: %p\n", nb);
45
46      b = sbrk((ptrdiff_t) -16);  /* shrink it */
47      nb = sbrk((ptrdiff_t) 0);
48      printf("\tFinal end of heap: %p\n", nb);
49  }
50
51  void
52  afunc(void)
53  {
54      static int level = 0;       /* recursion level */
55      auto int stack_var;         /* automatic variable, on stack */
56
57      if (++level == 3)           /* avoid infinite recursion */
58          return;
59
60      printf("\tStack level %d: address of stack_var: %p\n",
61              level, & stack_var);
62      afunc();                    /* recursive call */
63  }
This program prints the locations of the two functions main() and afunc() (lines
22-23). It then shows how the stack grows downward, letting afunc() (lines 51-63)
print the address of successive instantiations of its local variable stack_var. (stack_var
is purposely declared auto, to emphasize that it's on the stack.) It then shows the
location of memory allocated by alloca() (lines 28-32). Finally it prints the locations of
data and BSS variables (lines 34-38), and then of memory allocated directly through
sbrk() (lines 40-48). Here are the results when the program is run on an Intel
GNU/Linux system:
$ ch03-memaddr
Text Locations:
        Address of main: 0x804838c
        Address of afunc: 0x80484a8
Stack Locations:
        Stack level 1: address of stack_var: 0xbffff864
        Stack level 2: address of stack_var: 0xbffff844     Stack grows downward
        Start of alloca()'ed array: 0xbffff860
        End of alloca()'ed array: 0xbffff87f                Addresses are on the stack
Data Locations:
        Address of data_var: 0x80496b8
BSS Locations:
        Address of bss_var: 0x80497c4                       BSS is above data variables
Heap Locations:
        Initial end of heap: 0x80497c8                      Heap is immediately above BSS
        New end of heap: 0x80497e8                          And grows upward
        Final end of heap: 0x80497d8                        Address spaces can shrink
3.3 Summary
• Every Linux (and Unix) program has different memory areas. They are stored in
separate parts of the executable program's disk file. Some of the sections are loaded
into the same part of memory when the program is run. All running copies of the
same program share the executable code (the text segment). The size program
shows the sizes of the different areas for relocatable object files and fully linked
executable files.
• The address space of a running program may have holes in it, and the size of the
address space can change as memory is allocated and released. On modern systems,
address 0 is not part of the address space, so don't attempt to dereference
NULL pointers.
• At the C level, memory is allocated or reallocated with one of malloc(),
calloc(), or realloc(). Memory is freed with free(). (Although realloc()
can do everything, using it that way isn't recommended.) It is unusual for freed
memory to be removed from the address space; instead, it is reused for
later allocations.
• Extreme care must be taken to
• Free only memory received from the allocation routines,
• Free such memory once and only once,
• Free unused memory, and
• Not "leak" any dynamically allocated memory.
• POSIX provides the strdup() function as a convenience, and GLIBC provides
getline() and getdelim() for reading arbitrary-length lines.
• The low-level system call interface functions, brk() and sbrk(), provide direct
but primitive access to memory allocation and deallocation. Unless you are writing
your own storage allocator, you should not use them.
• The alloca() function for allocating memory on the stack exists, but is not
recommended. Like being able to recognize poison ivy, you should know it only so
that you'll know to avoid it.
3.4 Exercises
1. Starting with the following structure, write your own readline() function
that will read an any-length line:

struct line {
    char *buf;
    FILE *fp;
};

Don't worry about backslash continuation lines. Instead of using fgets() to
read lines, use getc() to read characters one at a time.
2. Does your function preserve the terminating newline? Explain why or why not.
3. How does your function handle lines that end in CR-LF?
4. How do you initialize the structure? With a separate routine? With a
documented requirement for specific values in the structure?
5. How do you indicate end-of-file? How do you indicate that an I/O error has
occurred? For errors, should your function print an error message? Explain why
or why not.
6. Write a program that uses your function to test it, and another program to
generate input data to the first program. Test your function.
7. Rewrite your function to use fgets() and test it. Is the new code more complex
or less complex? How does its performance compare to the getc() version?
8. Study the V7 end(3) manpage (/usr/man/man3/end.3 in the V7 distribution).
Does it shed any light on how 'sbrk(0)' might work?
9. Enhance ch03-memaddr.c to print out the location of the arguments and the
environment. In which part of the address space do they reside?
In this chapter
• 4.1 Introducing the Linux/Unix I/O Model
• 4.2 Presenting a Basic Program Structure
• 4.3 Determining What Went Wrong
• 4.4 Doing Input and Output
• 4.5 Random Access: Moving Around within a File
• 4.6 Creating Files
• 4.7 Forcing Data to Disk
• 4.8 Setting File Length
• 4.9 Summary
• Exercises
This chapter describes basic file operations: opening and creating files, reading
and writing them, moving around in them, and closing them. Along the way
it presents the standard mechanisms for detecting and reporting errors. The chapter
ends by describing how to set a file's length and force file data and metadata
to disk.
4.1 Introducing the Linux/Unix I/O Model

The Linux/Unix API model for I/O is straightforward. It can be summed up in four
words: open, read, write, close. In fact, those are the names of the system calls: open(),
read(), write(), close(). Here are their declarations:
#include <sys/types.h>                                      POSIX
#include <sys/stat.h>       /* for mode_t */
#include <fcntl.h>          /* for flags for open() */
#include <unistd.h>         /* for ssize_t */

int open(const char *pathname, int flags, mode_t mode);
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
int close(int fd);
In the next and subsequent sections, we illustrate the model by writing a very simple
version of cat. It's so simple that it doesn't even have options; all it does is concatenate
the contents of the named files to standard output. It does do minimal error reporting.
Once it's written, we compare it to the V7 cat .
We present the program top-down, starting with the command line. In succeeding
sections, we present error reporting and then get down to brass tacks, showing how to
do actual file I/O .
4.2 Presenting a Basic Program Structure

Our version of cat follows a structure that is generally useful. The first part starts
with an explanatory comment, header includes, declarations, and the main() function:
 1  /*
 2   * ch04-cat.c --- Demonstrate open(), read(), write(), close(),
 3   *                errno and strerror().
 4   */
 5
 6  #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
 7  #include <errno.h>      /* declare errno */
 8  #include <fcntl.h>      /* for flags for open() */
 9  #include <string.h>     /* declare strerror() */
10  #include <unistd.h>     /* for ssize_t */
11  #include <sys/types.h>
12  #include <sys/stat.h>   /* for mode_t */
13
14  char *myname;
15  int process(char *file);
16
17  /* main --- loop over file arguments */
18
19  int
20  main(int argc, char **argv)
21  {
22      int i;
23      int errs = 0;
24
25      myname = argv[0];
26
27      if (argc == 1)
28          errs = process("-");
29      else
30          for (i = 1; i < argc; i++)
31              errs += process(argv[i]);
32
33      return (errs != 0);
34  }
... continued later in the chapter ...
The myname variable (line 14) is used later for error messages; main() sets it to the
program name (argv[0]) as its first action (line 25). Then main() loops over the
arguments. For each argument, it calls a function named process() to do the work.
When given the filename - (a single dash, or minus sign), Unix cat reads standard
input instead of trying to open a file named -. In addition, with no arguments, cat
reads standard input. ch04-cat implements both of these behaviors. The check for
'argc == 1' (line 27) is true when there are no filename arguments; in this case, main()
passes "-" to process(). Otherwise, main() loops over all the arguments, treating
them as files to be processed. If one of them happens to be "-", the program then
processes standard input.
If process() returns a nonzero value, it means that something went wrong. Errors
are added up in the errs variable (lines 28 and 31). When main() ends, it returns 0
if there were no errors, and 1 if there were (line 33). This is a fairly standard convention,
whose meaning is discussed in more detail in Section 9.1.5.1, "Defining Process Exit
Status," page 300.
86 Chapter 4 • Files and File 1/0
The structure presented in main() is quite generic: process() could do anything
we want to the file. For example (ignoring the special use of "-"), process() could
just as easily remove files as concatenate them!
Before looking at the process() function, we have to describe how system call errors
are represented and then how I/O is done. The process() function itself is presented
in Section 4.4.3, "Reading and Writing," page 96.
4.3 Determining What Went Wrong

"If anything can go wrong, it will."
-Murphy's Law-
"Be prepared ."
-The Boy Scouts-
Errors can occur anytime. Disks can fill up, users can enter invalid data, the server
on a network from which a file is being read can crash, the network can die, and so on.
It is important to always check every operation for success or failure.
The basic Linux system calls almost universally return -1 on error, and 0 or a positive
value on success. This lets you know that the operation has succeeded or failed:
int result;

result = some_system_call(param1, param2);
if (result < 0) {
    /* error occurred, do something */
} else {
    /* all ok, proceed */
}
Knowing that an error occurred isn't enough. It's necessary to know what error
occurred. For that, each process has a predefined variable named errno. Whenever a
system call fails, errno is set to one of a set of predefined error values. errno and the
predefined values are declared in the <errno.h> header file:
#include <errno.h>                                          ISO C

extern int errno;
errno itself may be a macro that acts like an int variable; it need not be a real integer.
In particular, in threaded environments, each thread will have its own private version
of errno. Practically speaking, though, for all the system calls and functions in this
book, you can treat errno like a simple int.
4.3.1 Values for errno

The 2001 POSIX standard defines a large number of possible values for errno.
Many of these are related to networking, IPC, or other specialized tasks. The man page
for each system call describes the possible errno values that can occur; thus, you can
write code to check for particular errors and handle them specially if need be. The
possible values are defined by symbolic constants. Table 4.1 lists the constants provided
by GLIBC.
TABLE 4.1
GLIBC values for errno
Name                Meaning
E2BIG               Argument list too long.
EACCES              Permission denied.
EADDRINUSE          Address in use.
EADDRNOTAVAIL       Address not available.
EAFNOSUPPORT        Address family not supported.
EAGAIN              Resource unavailable, try again (may be the same value as EWOULDBLOCK).
EALREADY            Connection already in progress.
EBADF               Bad file descriptor.
EBADMSG             Bad message.
EBUSY               Device or resource busy.
ECANCELED           Operation canceled.
ECHILD              No child processes.
ECONNABORTED        Connection aborted.
ECONNREFUSED        Connection refused.
ECONNRESET          Connection reset.
EDEADLK             Resource deadlock would occur.
EDESTADDRREQ        Destination address required.
EDOM                Mathematics argument out of domain of function.
EDQUOT              Reserved.
EEXIST              File exists.
EFAULT              Bad address.
EFBIG               File too large.
EHOSTUNREACH        Host is unreachable.
EIDRM               Identifier removed.
EILSEQ              Illegal byte sequence.
EINPROGRESS         Operation in progress.
EINTR               Interrupted function.
EINVAL              Invalid argument.
EIO                 I/O error.
EISCONN             Socket is connected.
EISDIR              Is a directory.
ELOOP               Too many levels of symbolic links.
EMFILE              Too many open files.
EMLINK              Too many links.
EMSGSIZE            Message too large.
EMULTIHOP           Reserved.
ENAMETOOLONG        Filename too long.
ENETDOWN            Network is down.
ENETRESET           Connection aborted by network.
ENETUNREACH         Network unreachable.
ENFILE              Too many files open in system.
ENOBUFS             No buffer space available.
ENODEV              No such device.
ENOENT              No such file or directory.
ENOEXEC             Executable file format error.
ENOLCK              No locks available.
ENOLINK             Reserved.
ENOMEM              Not enough space.
ENOMSG              No message of the desired type.
ENOPROTOOPT         Protocol not available.
ENOSPC              No space left on device.
ENOSYS              Function not supported.
ENOTCONN            The socket is not connected.
ENOTDIR             Not a directory.
ENOTEMPTY           Directory not empty.
ENOTSOCK            Not a socket.
ENOTSUP             Not supported.
ENOTTY              Inappropriate I/O control operation.
ENXIO               No such device or address.
EOPNOTSUPP          Operation not supported on socket.
EOVERFLOW           Value too large to be stored in data type.
EPERM               Operation not permitted.
EPIPE               Broken pipe.
EPROTO              Protocol error.
EPROTONOSUPPORT     Protocol not supported.
EPROTOTYPE          Protocol wrong type for socket.
ERANGE              Result too large.
EROFS               Read-only file system.
ESPIPE              Invalid seek.
ESRCH               No such process.
ESTALE              Reserved.
ETIMEDOUT           Connection timed out.
ETXTBSY             Text file busy.
EWOULDBLOCK         Operation would block (may be the same value as EAGAIN).
EXDEV               Cross-device link.
Many systems provide other error values as well, and older systems may not have all
the errors just listed. You should check your local intro(2) and errno(2) manpages for
the full story.
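As a rough illustration of checking for particular errors, the sketch below tries to open a file and classifies the failure by the value of errno. The function name describe_open_failure() and the strings it returns are hypothetical, chosen only for this example; a real program would pick its own handling for each case.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * describe_open_failure --- hypothetical helper, not from the text.
 * Try to open a file read-only and classify the outcome.
 * Returns a short, fixed description string.
 */
const char *
describe_open_failure(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd >= 0) {			/* success: nothing to classify */
		(void) close(fd);
		return "opened";
	}

	switch (errno) {		/* examine errno only after the failure */
	case ENOENT:
		return "missing";
	case EACCES:
		return "forbidden";
	default:
		return "other error";
	}
}
```

A caller might create a missing file, ask for different permissions, or simply give up, depending on which string (or, more realistically, which errno value) comes back.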
NOTE errno should be examined only after an error has occurred and before
further system calls are made. Its initial value is 0. However, nothing changes
errno between errors, meaning that a successful system call does not reset it
to 0. You can, of course, manually set it to 0 initially or whenever you like, but
this is rarely done.
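One of the rare cases where clearing errno by hand is useful involves the ISO C library function strtol(), which is not a system call and is outside this chapter's scope. Its return value alone cannot signal overflow, so the usual idiom is to set errno to 0 first and test it afterward. The helper name parse_long() below is an assumption made for this sketch.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * parse_long --- hypothetical helper. Returns 0 and stores the value
 * on success, -1 if the text is not a number or is out of range.
 */
int
parse_long(const char *text, long *result)
{
	char *end;
	long value;

	errno = 0;			/* clear any stale value first */
	value = strtol(text, &end, 10);
	if (errno == ERANGE)		/* set by strtol() on overflow */
		return -1;
	if (end == text || *end != '\0')	/* no digits, or trailing junk */
		return -1;

	*result = value;
	return 0;
}
```

Without the assignment to errno, a leftover ERANGE from some earlier failure could make a perfectly good number look like an overflow.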
Initially, we use errno only for error reporting. There are two useful functions for
error reporting. The first is perror():
#include <stdio.h>                                          ISO C

void perror(const char *s);
The perror() function prints a program-supplied string, followed by a colon, and
then a string describing the value of errno:
if (some_system_call(param1, param2) < 0) {
    perror("system call failed");
    return 1;
}
We prefer the strerror() function, which takes an error value parameter and returns
a pointer to a string describing the error:
#include <string.h>                                         ISO C

char *strerror(int errnum);
strerror() provides maximum flexibility in error reporting, since fprintf()
makes it possible to print the error in any way we like:
if (some_system_call(param1, param2) < 0) {
    fprintf(stderr, "%s: %d, %d: some_system_call failed: %s\n",
            argv[0], param1, param2, strerror(errno));
    return 1;
}
You will see many examples of both functions throughout the book.
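To make the flexibility of strerror() concrete, here is a minimal sketch of a helper that builds a message of the form "program: file: what: reason". The name format_error() and its parameter list are hypothetical; only snprintf() and strerror() are standard.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * format_error --- hypothetical helper. Build a message of the form
 * "program: file: what: reason" in buf, using strerror() for the reason.
 * Returns the value of snprintf().
 */
int
format_error(char *buf, size_t size, const char *program,
	     const char *file, const char *what, int errnum)
{
	return snprintf(buf, size, "%s: %s: %s: %s",
			program, file, what, strerror(errnum));
}
```

Passing the error number as a parameter, instead of reading errno inside the helper, lets the caller save errno right after the failing call and before anything else has a chance to change it.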
4.3.2 Error Message Style

C provides several special macros for use in error reporting. The most widely used
are __FILE__ and __LINE__, which expand to the name of the source file and the
current line number in that file. These have been available in C since its beginning.
C99 defines an additional predefined identifier, __func__, which represents the name
of the current function as a character string. The macros are used like this:
if (some_system_call(param1, param2) < 0) {
    fprintf(stderr, "%s: %s (%s %d): some_system_call(%d, %d) failed: %s\n",
            argv[0], __func__, __FILE__, __LINE__,
            param1, param2, strerror(errno));
    return 1;
}
Here, the error message includes not only the program's name but also the function
name, source file name, and line number. The full list of identifiers useful for diagnostics
is provided in Table 4.2.
TABLE 4.2
C99 diagnostic identifiers
Identifier    C version    Meaning
__DATE__      C89          Date of compilation in the form "Mmm nn yyyy".
__FILE__      Original     Source-file name in the form "program.c".
__LINE__      Original     Source-file line number in the form 42.
__TIME__      C89          Time of compilation in the form "hh:mm:ss".
__func__      C99          Name of current function, as if declared
                           const char __func__[] = "name".
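If these identifiers are used in more than one place, they are usually wrapped in a macro so that they describe the point of use. The sketch below shows one way to do that; the macro name WHERE and the function load_config() are hypothetical names for this example.

```c
#include <stdio.h>

/*
 * WHERE --- hypothetical macro. Formats "func (file line): msg" into buf.
 * It must be a macro, not a function, so that __FILE__, __LINE__, and
 * __func__ describe the place where it is used.
 */
#define WHERE(buf, size, msg) \
	snprintf((buf), (size), "%s (%s %d): %s", \
		 __func__, __FILE__, __LINE__, (msg))

int
load_config(char *buf, size_t size)	/* hypothetical caller */
{
	return WHERE(buf, size, "cannot load configuration");
}
```

Had WHERE been written as an ordinary function, every message would report that function's own name and line number instead of the caller's.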
The use of __FILE__ and __LINE__ was quite popular in the early days of Unix,
when most people had source code and could find the error and fix it. As Unix systems
became more commercial, use of these identifiers gradually diminished, since knowing
the source code location isn't of much help to someone who only has a binary executable.
Today, although GNU/Linux systems come with source code, said source code often
isn't installed by default. Thus, using these identifiers for error messages doesn't seem
to provide much additional value. The GNU Coding Standards don't even mention them.
4.4 Doing Input and Output

All I/O in Linux is accomplished through file descriptors. This section introduces file
descriptors, describes how to obtain and release them, and explains how to do I/O
with them.
4.4.1 Understanding File Descriptors

A file descriptor is an integer value. Valid file descriptors start at 0 and go up to some
system-defined limit. These integers are in fact simple indexes into each process's table
of open files. (This table is maintained inside the operating system; it is not accessible
to a running program.) On most modern systems, the size of the table is large. The
command 'ulimit -n' prints the value:
$ ulimit -n
1024
From C, the maximum number of open files is returned by the getdtablesize()
(get descriptor table size) function:
#include <unistd.h>                                         Common

int getdtablesize(void);
This small program prints the result of the function:

/* ch04-maxfds.c --- Demonstrate getdtablesize(). */

#include <stdio.h>      /* for printf() */
#include <stdlib.h>     /* for exit() */
#include <unistd.h>     /* for getdtablesize() */

int
main(int argc, char **argv)
{
    printf("max fds: %d\n", getdtablesize());
    exit(0);
}
When compiled and run, not surprisingly the program prints the same value as
printed by ulimit:
$ ch04-maxfds
max fds: 1024
File descriptors are held in normal int variables; it is typical to see declarations of the
form 'int fd' for use with I/O system calls. There is no predefined type for
file descriptors.
In the usual case, every program starts running with three file descriptors already
opened for it. These are standard input, standard output, and standard error, on file
descriptors 0, 1, and 2, respectively. (If not otherwise redirected, each one is connected
to your keyboard and screen.)
Obvious Manifest Constants. An Oxymoron?

When working with file-descriptor-based system calls and the standard input, output
and error, it is common practice to use the integer constants 0, 1, and 2 directly in code.
In the overwhelming majority of cases, such manifest constants are a bad idea. You never
know what the meaning is of some random integer constant and whether the same
constant used elsewhere is related to it or not. To this end, the POSIX standard requires
the definition of the following symbolic constants in <unistd.h>:
STDIN_FILENO    The "file number" for standard input: 0.
STDOUT_FILENO   The file number for standard output: 1.
STDERR_FILENO   The file number for standard error: 2.
However, in our humble opinion, using these macros is overkill. First, it's painful to
type 12 or 13 characters instead of just 1. Second, the use of 0, 1, and 2 is so standard
and so well known that there's really no grounds for confusion as to the meaning of these
particular manifest constants.
On the other hand, use of these constants leaves no doubt as to what was intended.
Consider this statement:
int fd = 0;
Is fd being initialized to refer to standard input, or is the programmer being careful to
initialize his variables to a reasonable value? You can't tell.
One approach (as recommended by Geoff Collyer) is to use the following enum definition:
enum { Stdin, Stdout, Stderr };
These constants can then be used in place of 0, 1, and 2. They are both readable and
easier to type.
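A brief sketch of the enum in use follows. The enumerators take the values 0, 1, and 2 in order, so they match the POSIX macros; the helper name put_warning() is hypothetical.

```c
#include <string.h>
#include <unistd.h>

enum { Stdin, Stdout, Stderr };	/* 0, 1, and 2, in that order */

/* put_warning --- hypothetical helper: write a message to standard error */
ssize_t
put_warning(const char *msg)
{
	return write(Stderr, msg, strlen(msg));
}
```

Because the enumerators are plain integer constants, they can be passed anywhere a file descriptor is expected, with no conversion.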
4.4.2 Opening and Closing Files

New file descriptors are obtained (among other sources) from the open() system
call. This system call opens a file for reading or writing and returns a new file descriptor
for subsequent operations on the file. We saw the declaration earlier:
#include <sys/types.h>                                      POSIX
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int open(const char *pathname, int flags, mode_t mode);
The three arguments are as follows:

const char *pathname
    A C string, representing the name of the file to open.
int flags
    The bitwise-OR of one or more of the constants defined in <fcntl.h>. We
    describe them shortly.
mode_t mode
    The permissions mode of a file being created. This is discussed later in the chapter,
    see Section 4.6, "Creating Files," page 106. When opening an existing file, omit
    this parameter. 1
The return value from open() is either the new file descriptor or -1 to indicate an
error, in which case errno will be set. For simple I/O, the flags argument should be
one of the values in Table 4.3.
TABLE 4.3
Flag values for open()
Symbolic constant   Value   Meaning
O_RDONLY            0       Open file only for reading; writes will fail.
O_WRONLY            1       Open file only for writing; reads will fail.
O_RDWR              2       Open file for reading and writing.
We will see example code shortly. Additional values for flags are described in
Section 4.6, "Creating Files," page 106. Much early Unix code didn't use the symbolic
values. Instead, the numeric value was used. Today this is considered bad practice, but
we present the values so that you'll recognize their meanings if you see them.
The close() system call closes a file: The entry for it in the system's file descriptor
table is marked as unused, and no further operations may be done with that file
descriptor. The declaration is
int close(int fd);
1 open() is one of the few variadic system calls.

The return value is 0 on success, -1 on error. There isn't much you can do if an error
does occur, other than report it. Errors closing files are unusual, but not unheard of,
particularly for files being accessed over a network. Thus, it's good practice to check
the return value, particularly for files opened for writing.
If you choose to ignore the return value, specifically cast it to void, to signify that
you don't care about the result:
(void) close(fd);    /* throw away return value */
The flip side of this advice is that too many casts to void tend to clutter the code.
For example, despite the "always check the return value" principle, it's exceedingly rare
to see code that checks the return value of printf() or bothers to cast it to void. As
with many aspects of C programming, experience and judgment should be applied
here too.
As mentioned, the number of open files, while large, is limited, and you should always
close files when you're done with them. If you don't, you will eventually run out of file
descriptors, a situation that leads to a lack of robustness on the part of your program.
The system closes all open files when a process exits, but, except for 0, 1, and 2, it's
bad form to rely on this.
When open() returns a new file descriptor, it always returns the lowest unused integer
value. Always. Thus, if file descriptors 0-6 are open and the program closes file descriptor
5, then the next call to open() returns 5, not 7. This behavior is important; we see
later in the book how it's used to cleanly implement many important Unix features,
such as I/O redirection and piping.
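The lowest-unused-descriptor rule can be observed directly. The following sketch opens a file twice, closes the first descriptor, and opens the file again to see which number comes back. The function name reopen_reuses_lowest() is hypothetical, and the code assumes that /dev/null exists and that no other thread is opening files at the same time.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * reopen_reuses_lowest --- hypothetical demonstration. Returns 1 if a
 * descriptor number freed by close() is handed out again by the next
 * open(), 0 if not, and -1 if any call fails.
 */
int
reopen_reuses_lowest(void)
{
	int first, second, again, result;

	if ((first = open("/dev/null", O_RDONLY)) < 0)
		return -1;
	if ((second = open("/dev/null", O_RDONLY)) < 0) {
		(void) close(first);
		return -1;
	}

	if (close(first) < 0) {		/* free the lower-numbered descriptor */
		(void) close(second);
		return -1;
	}
	again = open("/dev/null", O_RDONLY);	/* should get the freed number */
	result = (again == first);

	if (again >= 0)
		(void) close(again);
	(void) close(second);
	return again < 0 ? -1 : result;
}
```

Since first was the lowest unused number when it was obtained, every smaller descriptor is still in use after it is closed, so the next open() has no lower number to choose.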
4.4.2.1 Mapping FILE * Variables to File Descriptors

The Standard I/O library functions and FILE * variables from <stdio.h>, such as
stdin, stdout, and stderr, are built on top of the file-descriptor-based system calls.
Occasionally, it's useful to directly access the file descriptor associated with a
<stdio.h> file pointer if you need to do something not defined by the ISO C standard.
The fileno() function returns the underlying file descriptor:
#include <stdio.h>                                          POSIX

int fileno(FILE *stream);
We will see an example later, in Section 4.4.4, "Example: Unix cat," page 99.
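As a small sketch of fileno() in use, the helper below flushes a stream and then writes directly to its underlying descriptor. The name raw_write() is hypothetical. Flushing first is a precaution assumed here so that data already buffered by the Standard I/O library reaches the file before the bytes written with write().

```c
#define _POSIX_C_SOURCE 200809L	/* ask for the POSIX declaration of fileno() */
#include <stdio.h>
#include <unistd.h>

/*
 * raw_write --- hypothetical helper. Flush anything buffered on fp, then
 * write len bytes straight to its underlying descriptor with write().
 */
ssize_t
raw_write(FILE *fp, const void *data, size_t len)
{
	if (fflush(fp) == EOF)
		return -1;
	return write(fileno(fp), data, len);
}
```

For the three standard streams, fileno() returns 0, 1, and 2 for stdin, stdout, and stderr, respectively.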
4.4.2.2 Closing All Open Files

Open files are inherited by child processes from their parent processes. They are, in
effect, shared. In particular, the position in the file is shared. We leave the details for
discussion later, in Section 9.1.1.2, "File Descriptor Sharing," page 286.
Since programs can inherit open files, you may occasionally see programs that close
all their files in order to start out with a "clean slate." In particular, code like this
is typical:
int i;

/* leave 0, 1, and 2 alone */
for (i = 3; i < getdtablesize(); i++)
    (void) close(i);
Assume that the result of getdtablesize() is 1024. This code works, but it makes
(1024 - 3) * 2 = 2042 system calls. 1020 of them are needless, since the return value
from getdtablesize() doesn't change. Here is a better way to write this code:
int i, fds;

for (i = 3, fds = getdtablesize(); i < fds; i++)
    (void) close(i);
Such an optimization does not affect the readability of the code, and it can make a
difference, particularly on slow systems. In general, it's worth looking for cases in which
loops compute the same result repeatedly, to see if such a computation can be pulled
out of the loop. In all such cases, though, be sure that you (a) preserve the code's
correctness and (b) preserve its readability!
4.4.3 Reading and Writing

I/O is accomplished with the read() and write() system calls, respectively:
#include <sys/types.h>                                      POSIX
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
Each function is about as simple as can be. The arguments are the file descriptor for
the open file, a pointer to a buffer to read data into or to write data from, and the
number of bytes to read or write.
The return value is the number of bytes actually read or written. (This number can
be smaller than the requested amount: For a read operation this happens when fewer
than count bytes are left in the file, and for a write operation it happens if a disk fills
up or some other error occurs.) The return value is -1 if an error occurred, in which
case errno indicates the error. When read() returns 0, it means that end-of-file has
been reached.
We can now show the rest of the code for ch04-cat. The process() routine uses
0 if the input filename is "-", for standard input (lines 50 and 51). Otherwise, it opens
the given file:
36  /*
37   * process --- do something with the file, in this case,
38   *             send it to stdout (fd 1).
39   *             Returns 0 if all OK, 1 otherwise.
40   */
41
42  int
43  process(char *file)
44  {
45      int fd;
46      ssize_t rcount, wcount;
47      char buffer[BUFSIZ];
48      int errors = 0;
49
50      if (strcmp(file, "-") == 0)
51          fd = 0;
52      else if ((fd = open(file, O_RDONLY)) < 0) {
53          fprintf(stderr, "%s: %s: cannot open for reading: %s\n",
54                  myname, file, strerror(errno));
55          return 1;
56      }
The buffer buffer (line 47) is of size BUFSIZ; this constant is defined by <stdio.h>
to be the "optimal" block size for I/O. Although the value for BUFSIZ varies across
systems, code that uses this constant is clean and portable.
The core of the routine is the following loop, which repeatedly reads data until either
end-of-file or an error is encountered:
58      while ((rcount = read(fd, buffer, sizeof buffer)) > 0) {
59          wcount = write(1, buffer, rcount);
60          if (wcount != rcount) {
61              fprintf(stderr, "%s: %s: write error: %s\n",
62                      myname, file, strerror(errno));
63              errors++;
64              break;
65          }
66      }
The rcount and wcount variables (line 46) are of type ssize_t, "signed size_t,"
which allows them to hold negative values. Note that the count value passed to write()
is the return value from read() (line 59). While we want to read fixed-size BUFSIZ
chunks, it is unlikely that the file itself is a multiple of BUFSIZ bytes big. When the
final, smaller, chunk of bytes is read from the file, the return value indicates how many
bytes of buffer received new data. Only those bytes should be copied to standard
output, not the entire buffer.
The test 'wcount != rcount' on line 60 is the correct way to check for write errors;
if some, but not all, of the data were written, then wcount will be positive but smaller
than rcount.
Finally, process() checks for read errors (lines 68-72) and then attempts to close
the file. In the (unlikely) event that close() fails (line 75), it prints an error message.
Avoiding the close of standard input isn't strictly necessary in this program, but it's a
good habit to develop for writing larger programs, in case other code elsewhere wants
to do something with it or if a child program will inherit it. The last statement (line 82)
returns 1 if there were errors, 0 otherwise.
68      if (rcount < 0) {
69          fprintf(stderr, "%s: %s: read error: %s\n",
70                  myname, file, strerror(errno));
71          errors++;
72      }
73
74      if (fd != 0) {
75          if (close(fd) < 0) {
76              fprintf(stderr, "%s: %s: close error: %s\n",
77                      myname, file, strerror(errno));
78              errors++;
79          }
80      }
81
82      return (errors != 0);
83  }
ch04-cat checks every system call for errors. While this is tedious, it provides
robustness (or at least clarity): When something goes wrong, ch04-cat prints an error
message that is as specific as possible. The combination of errno and strerror()
makes this easy to do. That's it for ch04-cat, only 88 lines of code!
To sum up, there are several points to understand about Unix I/O:
I/O is uninterpreted.
    The I/O system calls merely move bytes around. They do no interpretation of the
    data; all interpretation is up to the user-level program. This makes reading and
    writing binary structures just as easy as reading and writing lines of text (easier,
    really, although using binary data introduces portability problems).
I/O is flexible.
    You can read or write as many bytes at a time as you like. You can even read and
    write data one byte at a time, although doing so for large amounts of data is more
    expensive than doing so in large chunks.
I/O is simple.
    The three-valued return (negative for error, zero for end-of-file, positive for a
    count) makes programming straightforward and obvious.
I/O can be partial.
    Both read() and write() can transfer fewer bytes than requested. Application
    code (that is, your code) must always be aware of this.
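One common way to cope with partial writes is a loop that keeps calling write() until every byte has been accepted. The sketch below is such a loop; the name write_all() is hypothetical, and retrying on EINTR (listed in Table 4.1 as "Interrupted function") is a choice made for this example, not something ch04-cat does.

```c
#include <errno.h>
#include <unistd.h>

/*
 * write_all --- hypothetical helper. Keep calling write() until all
 * count bytes are written. Returns 0 on success, -1 on error with
 * errno set by the failing write().
 */
int
write_all(int fd, const void *buf, size_t count)
{
	const char *p = buf;

	while (count > 0) {
		ssize_t n = write(fd, p, count);

		if (n < 0) {
			if (errno == EINTR)	/* interrupted: try again */
				continue;
			return -1;
		}
		p += n;			/* skip what was written ... */
		count -= (size_t) n;	/* ... and go around for the rest */
	}
	return 0;
}
```

ch04-cat takes the simpler route of treating any short write as an error, which is reasonable for a small program; a loop like this one is more typical when output goes to pipes or sockets.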
4.4.4 Example: Unix cat

As promised, here is the V7 version of cat. 2 It begins by checking for options. The
V7 cat accepts a single option, -u, for doing unbuffered output.
The basic design is similar to the one shown above; it loops over the files named by
the command-line arguments and reads each file, one character at a time, sending the
characters to standard output. Unlike our version, it uses the <stdio.h> facilities. In
many ways code using the Standard I/O library is easier to read and write, since all
buffering issues are hidden by the library.
2 See /usr/src/cmd/cat.c in the V7 distribution. The program compiles without change under GNU/Linux.
 1  /*
 2   * Concatenate files.
 3   */
 4
 5  #include <stdio.h>
 6  #include <sys/types.h>
 7  #include <sys/stat.h>
 8
 9  char    stdbuf[BUFSIZ];
10
11  main(argc, argv)                              int main(int argc, char **argv)
12  char **argv;
13  {
14      int fflg = 0;
15      register FILE *fi;
16      register c;
17      int dev, ino = -1;
18      struct stat statb;
19
20      setbuf(stdout, stdbuf);
21      for( ; argc>1 && argv[1][0]=='-'; argc--,argv++) {
22          switch(argv[1][1]) {                  Process options
23          case 0:
24              break;
25          case 'u':
26              setbuf(stdout, (char *)NULL);
27              continue;
28          }
29          break;
30      }
31      fstat(fileno(stdout), &statb);            Lines 31-36 explained in Chapter 5
32      statb.st_mode &= S_IFMT;
33      if (statb.st_mode!=S_IFCHR && statb.st_mode!=S_IFBLK) {
34          dev = statb.st_dev;
35          ino = statb.st_ino;
36      }
37      if (argc < 2) {
38          argc = 2;
39          fflg++;
40      }
41      while (--argc > 0) {                      Loop over files
42          if (fflg || (*++argv)[0]=='-' && (*argv)[1]=='\0')
43              fi = stdin;
44          else {
45              if ((fi = fopen(*argv, "r")) == NULL) {
46                  fprintf(stderr, "cat: can't open %s\n", *argv);
47                  continue;
48              }
49          }
50          fstat(fileno(fi), &statb);            Lines 50-56 explained in Chapter 5
51          if (statb.st_dev==dev && statb.st_ino==ino) {
52              fprintf(stderr, "cat: input %s is output\n",
53                  fflg?"-": *argv);
54              fclose(fi);
55              continue;
56          }
57          while ((c = getc(fi)) != EOF)         Copy file contents to stdout
58              putchar(c);
59          if (fi!=stdin)
60              fclose(fi);
61      }
62      return(0);
63  }
63
Of note is that the program always exits successfully (line 62); it could have been
written to note errors and indicate them in main()'s return value. (The mechanics of
process exiting and the meaning of different exit status values are discussed in
Section 9.1.5.1, "Defining Process Exit Status," page 300.)
The code dealing with the struct stat and the fstat() function (lines 31-36
and 50-56) is undoubtedly opaque, since we haven't yet covered these functions, and
won't until the next chapter. (But do note the use of fileno() on line 50 to get at the
underlying file descriptor associated with the FILE * variables.) The idea behind the
code is to make sure that no input file is the same as the output file. This is intended
to prevent infinite file growth, in case of a command like this:
$ cat myfile >> myfile                  Append one copy of myfile onto itself?
And indeed, the check works:
$ echo hi > myfile                      Create a file
$ v7cat myfile >> myfile                Attempt to append it onto itself
cat: input myfile is output
If you try this with ch04-cat, it will keep running, and myfile will keep growing
until you interrupt it. The GNU version of cat does perform the check. Note that
something like the following is beyond cat's control:
$ v7cat < myfile > myfile
cat: input - is output
$ ls -l myfile
-rw-r--r--    1 arnold   devel           0 Mar 24 14:17 myfile
In this case, it's too late because the shell truncated myfile (with the > operator) before
cat ever gets a chance to examine the file!
In Section 5.4.4.2, "The V7 cat Revisited," page 150, we explain the struct stat
code.
4.5 Random Access: Moving Around within a File

So far, we have discussed sequential I/O, whereby data are read or written beginning
at the front of the file and continuing until the end. Often, this is all a program needs
to do. However, it is possible to do random access I/O; that is, read data from an arbitrary
position in the file, without having to read everything before that position first.
The offset of a file descriptor is the position within an open file at which the next
read or write will occur. A program sets the offset with the lseek() system call:
    #include <sys/types.h>    /* for off_t */                              POSIX
    #include <unistd.h>       /* declares lseek() and whence values */

    off_t lseek(int fd, off_t offset, int whence);
The type off_t (offset type) is a signed integer type representing byte positions
(offsets from the beginning) within a file. On 32-bit systems, the type is usually a long.
However, many modern systems allow very large files, in which case off_t may be a
more unusual type, such as a C99 int64_t or some other extended type. lseek() takes
three arguments, as follows:
int fd
    The file descriptor for the open file.
off_t offset
    A position to which to move. The interpretation of this value depends on the
    whence parameter. offset can be positive or negative: Negative values move
    toward the front of the file; positive values move toward the end of the file.
int whence
    Describes the location in the file to which offset is relative. See Table 4.4.
TABLE 4.4
whence values for lseek()

Symbolic constant   Value   Meaning
SEEK_SET            0       offset is absolute, that is, relative to the beginning of the
                            file.
SEEK_CUR            1       offset is relative to the current position in the file.
SEEK_END            2       offset is relative to the end of the file.

Much old code uses the numeric values shown in Table 4.4. However, any new code
you write should use the symbolic values, whose meanings are clearer.
The meaning of the values and their effects upon file position are shown in Figure 4.1.
Assuming that the file has 3000 bytes and that the current offset is 2000 before each
call to lseek(), the new position after each call is as shown:

    File start: 0          Current: 2000          File end: 3000

    Call                                    New position
    lseek(fd, (off_t) 40, SEEK_END);        3040
    lseek(fd, (off_t) -40, SEEK_END);       2960
    lseek(fd, (off_t) 40, SEEK_CUR);        2040
    lseek(fd, (off_t) -40, SEEK_CUR);       1960
    lseek(fd, (off_t) 40, SEEK_SET);        40

FIGURE 4.1
Offsets for lseek()
Negative offsets relative to the beginning of the file are meaningless; they fail with
an "invalid argument" error.
The return value is the new position in the file. Thus, to find out where in the file
you are, use
    curpos = lseek(fd, (off_t) 0, SEEK_CUR);
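The same idea yields a file's size: seek to the end, note the return value, and go back. The following is a sketch under our own naming (file_size() is hypothetical); it works only for descriptors that support seeking, so it fails on pipes and terminals.

```c
#include <sys/types.h>
#include <unistd.h>

/* file_size --- hypothetical helper: return the size of the open file,
   leaving the current offset where it was; -1 on error. */
off_t file_size(int fd)
{
    off_t cur, end;

    if ((cur = lseek(fd, (off_t) 0, SEEK_CUR)) < 0)   /* where are we? */
        return -1;
    if ((end = lseek(fd, (off_t) 0, SEEK_END)) < 0)   /* offset of the end */
        return -1;
    if (lseek(fd, cur, SEEK_SET) < 0)                 /* go back */
        return -1;

    return end;
}
```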
The l in lseek() stands for long. lseek() was introduced in V7 Unix when file
sizes were extended; V6 had a simple seek() system call. As a result, much old
documentation (and code) treats the offset parameter as if it had type long, and instead
of a cast to off_t, it's not unusual to see an L suffix on constant offset values:
    curpos = lseek(fd, 0L, SEEK_CUR);
On systems with a Standard C compiler, where lseek() is declared with a prototype,
such old code continues to work since the compiler automatically converts the 0L
from long to off_t if they are different types.
One interesting and important aspect of lseek() is that it is possible to seek beyond
the end of a file. Any data that are subsequently written at that point go into the file,
but with a "gap" or "hole" between the data at the previous end of the file and the new
data. Data in the gap read as if they are all zeros.
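A short sketch makes the "reads as zeros" claim concrete. The function name is hypothetical, and we assume fd was opened for both reading and writing:

```c
#include <sys/types.h>
#include <unistd.h>

/* hole_reads_as_zeros --- hypothetical demonstration: leave a gap of gap
   bytes past the end of the file, write one byte after it, and read the
   gap back. Returns 1 if every gap byte is zero, 0 if not, -1 on error. */
int hole_reads_as_zeros(int fd, off_t gap)
{
    char c = 'x';
    off_t i;

    if (lseek(fd, gap, SEEK_END) < 0)           /* seek beyond the end */
        return -1;
    if (write(fd, &c, 1) != 1)                  /* data after the gap */
        return -1;
    if (lseek(fd, -(gap + 1), SEEK_END) < 0)    /* back to the old end */
        return -1;

    for (i = 0; i < gap; i++) {
        if (read(fd, &c, 1) != 1)
            return -1;
        if (c != '\0')
            return 0;
    }
    return 1;
}
```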
The following program demonstrates the creation of holes. It writes three instances
of a struct at the beginning, middle, and far end of a file. The offsets chosen (lines
16-18, the third element of each structure) are arbitrary but big enough to demonstrate
the point:
 1  /* ch04-holes.c --- Demonstrate lseek() and holes in files. */
 2
 3  #include <stdio.h>       /* for fprintf(), stderr, BUFSIZ */
 4  #include <errno.h>       /* declare errno */
 5  #include <fcntl.h>       /* for flags for open() */
 6  #include <string.h>      /* declare strerror() */
 7  #include <unistd.h>      /* for ssize_t */
 8  #include <sys/types.h>   /* for off_t, etc. */
 9  #include <sys/stat.h>    /* for mode_t */
10
11  struct person {
12      char name[10];       /* first name */
13      char id[10];         /* ID number */
14      off_t pos;           /* position in file, for demonstration */
15  } people[] = {
16      { "arnold", "123456789", 0 },
17      { "miriam", "987654321", 10240 },
18      { "joe", "192837465", 81920 },
19  };
20
21  int
22  main(int argc, char **argv)
23  {
24      int fd;
25      int i, j;
26
27      if (argc < 2) {
28          fprintf(stderr, "usage: %s file\n", argv[0]);
29          return 1;
30      }
31
32      fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
33      if (fd < 0) {
34          fprintf(stderr, "%s: %s: cannot open for read/write: %s\n",
35                  argv[0], argv[1], strerror(errno));
36          return 1;
37      }
38
39      j = sizeof(people) / sizeof(people[0]);    /* count of elements */
Lines 27-30 make sure that the program was invoked properly. Lines 32-37 open
the named file and verify that the open succeeded.
The calculation on line 39 of j, the array element count, uses a lovely, portable trick:
The number of elements is the size of the entire array divided by the size of the first
element. The beauty of this idiom is that it's always right: No matter how many elements
you add to or remove from such an array, the compiler will figure it out. It also doesn't
require a terminating sentinel element; that is, one in which all the fields are set to zero,
NULL, or some such.
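The idiom is often wrapped in a macro. The sketch below uses a macro name of our own choosing (ARRAY_COUNT) and a made-up table. One caution that the idiom depends on: it works only on a true array whose definition is visible. Applied to a pointer, including an array passed as a function parameter, it silently gives the wrong answer.

```c
#include <stddef.h>

/* ARRAY_COUNT --- hypothetical macro name: number of elements in a
   true array (not a pointer). */
#define ARRAY_COUNT(a)  (sizeof(a) / sizeof((a)[0]))

/* A made-up table; add or remove entries freely. */
static const char *const weekdays[] = { "Mon", "Tue", "Wed", "Thu", "Fri" };

/* weekday_count --- number of entries, computed by the compiler */
size_t weekday_count(void)
{
    return ARRAY_COUNT(weekdays);
}
```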
The work is done by a loop (lines 41-55), which seeks to the byte offset given in
each structure (line 42) and then writes the structure out (line 49):
41      for (i = 0; i < j; i++) {
42          if (lseek(fd, people[i].pos, SEEK_SET) < 0) {
43              fprintf(stderr, "%s: %s: seek error: %s\n",
44                      argv[0], argv[1], strerror(errno));
45              (void) close(fd);
46              return 1;
47          }
48
49          if (write(fd, &people[i], sizeof(people[i])) != sizeof(people[i])) {
50              fprintf(stderr, "%s: %s: write error: %s\n",
51                      argv[0], argv[1], strerror(errno));
52              (void) close(fd);
53              return 1;
54          }
55      }
56
57      /* all ok here */
58      (void) close(fd);
59      return 0;
60  }
Here are the results when the program is run:

$ ch04-holes peoplelist                 Run the program
$ ls -ls peoplelist                     Show size and blocks used
  16 -rw-r--r--    1 arnold   devel       81944 Mar 23 17:43 peoplelist
$ echo 81944 / 4096 | bc -l             Show blocks if no holes
20.00585937500000000000
We happen to know that each disk block in the file uses 4096 bytes. (How we know
that is discussed in Section 5.4.2, "Retrieving File Information," page 141. For now,
take it as a given.) The final bc command indicates that a file of size 81,944 bytes needs
21 disk blocks. However, the -s option to ls, which tells us how many blocks a file
really uses, shows that the file uses only 16 blocks!³ The missing blocks in the file are
the holes. This is illustrated in Figure 4.2.
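[Figure: the file's logical size spans 21 blocks. The record for arnold is at the start of
the file, miriam's is in block 3, and joe's is in block 21; the blocks in between are holes.]
FIGURE 4.2
Holes in a file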
NOTE ch04-holes.c does direct binary I/O. This nicely illustrates the beauty
of random access I/O: You can treat a disk file as if it were a very large array of
binary data structures.
In practice, storing live data by using binary I/O is a design decision that you
should consider carefully. For example, suppose you need to move the data to
a system using different byte orders for integers? Or different floating-point
formats? Or to a system with different alignment requirements? Ignoring such
issues can become significantly costly.
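To make the "very large array" idea concrete, here is a sketch of indexed record access. The struct and both function names are hypothetical; element idx simply lives at byte offset idx times the record size. The portability caveats from the note apply in full.

```c
#include <sys/types.h>
#include <unistd.h>

struct record {             /* hypothetical fixed-size record */
    char name[10];
    char id[10];
};

/* put_record --- store *rec as element number idx of the file */
int put_record(int fd, long idx, const struct record *rec)
{
    if (lseek(fd, (off_t) idx * (off_t) sizeof(*rec), SEEK_SET) < 0)
        return -1;
    return write(fd, rec, sizeof(*rec)) == (ssize_t) sizeof(*rec) ? 0 : -1;
}

/* get_record --- fetch element number idx of the file into *rec;
   fails if the file does not extend that far */
int get_record(int fd, long idx, struct record *rec)
{
    if (lseek(fd, (off_t) idx * (off_t) sizeof(*rec), SEEK_SET) < 0)
        return -1;
    return read(fd, rec, sizeof(*rec)) == (ssize_t) sizeof(*rec) ? 0 : -1;
}
```

Storing element 3 in an empty file leaves elements 0 through 2 as a hole, so fetching them yields records filled with zero bytes.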
4.6 Creating Files

As described earlier, open() apparently opens existing files only. This section describes
how brand-new files are created. There are two choices: creat() and open() with
additional flags. Initially, creat() was the only way to create a file, but open() was
later enhanced with this functionality as well. Both mechanisms require specification
of the initial file permissions.
4.6.1 Specifying Initial File Permissions

As a GNU/Linux user, you are familiar with file permissions as printed by 'ls -l':
read, write, and execute for each of user (the file's owner), group, and other. The various
combinations are often expressed in octal, particularly for the chmod and umask
commands. For example, file permissions -rw-r--r-- is equivalent to octal 0644 and
-rwxr-xr-x is equivalent to octal 0755. (The leading 0 is C's notation for octal values.)
³ At least three of these blocks contain the data that we wrote out; the others are for use by the operating system
in keeping track of where the data reside.
When you create a file, you must know the protections to be given to the new file.
You can do this as a raw octal number if you choose, and indeed it's not uncommon
to see such numbers in older code. However, it is better to use a bitwise OR of one or
more of the symbolic constants from <sys/stat.h>, described in Table 4.5.
TABLE 4.5
POSIX symbolic constants for file modes

Symbolic constant   Value    Meaning
S_IRWXU             00700    User read, write, and execute permission.
S_IRUSR             00400    User read permission.
S_IREAD                      Same as S_IRUSR.
S_IWUSR             00200    User write permission.
S_IWRITE                     Same as S_IWUSR.
S_IXUSR             00100    User execute permission.
S_IEXEC                      Same as S_IXUSR.
S_IRWXG             00070    Group read, write, and execute permission.
S_IRGRP             00040    Group read permission.
S_IWGRP             00020    Group write permission.
S_IXGRP             00010    Group execute permission.
S_IRWXO             00007    Other read, write, and execute permission.
S_IROTH             00004    Other read permission.
S_IWOTH             00002    Other write permission.
S_IXOTH             00001    Other execute permission.
The following fragment shows how to create variables representing permissions
-rw-r--r-- and -rwxr-xr-x (0644 and 0755 respectively):
    mode_t rw_mode, rwx_mode;

    rw_mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;              /* 0644 */
    rwx_mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;   /* 0755 */
Older code used S_IREAD, S_IWRITE, and S_IEXEC together with bit shifting to
produce the same results:
    rw_mode = (S_IREAD|S_IWRITE) | (S_IREAD >> 3) | (S_IREAD >> 6);      /* 0644 */
    rwx_mode = (S_IREAD|S_IWRITE|S_IEXEC) |
               ((S_IREAD|S_IEXEC) >> 3) | ((S_IREAD|S_IEXEC) >> 6);      /* 0755 */
Unfortunately, neither notation is incredibly clear. The modern version is preferred
since each permission bit has its own name and there is less opportunity to do the bitwise
operations incorrectly.
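To see how the named bits line up with what 'ls -l' prints, consider this sketch. mode_string() is a hypothetical function of ours; it handles only the nine basic permission bits and ignores the file type and any additional bits.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* mode_string --- hypothetical helper: fill out[] with the nine
   permission characters for mode, in the order that ls -l prints them. */
void mode_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,      /* user */
        S_IRGRP, S_IWGRP, S_IXGRP,      /* group */
        S_IROTH, S_IWOTH, S_IXOTH       /* other */
    };
    static const char letters[] = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}
```

Given 'S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH', the function produces "rw-r--r--", the same string that the octal value 0644 yields.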
The additional permission bits shown in Table 4.6 are available for use when you
are changing a file's permission, but they should not be used when you initially create
a file. Whether these bits may be included varies wildly by operating system. It's best
not to try; rather, you should explicitly change the permissions after the file is created.
(Changing permission is described in Section 5.5.2, "Changing Permissions: chmod()
and fchmod()," page 156. The meanings of these bits are discussed in Chapter 11,
"Permissions and User and Group ID Numbers," page 403.)
TABLE 4.6
Additional POSIX symbolic constants for file modes

Symbolic constant   Value    Meaning
S_ISUID             04000    Set user ID.
S_ISGID             02000    Set group ID.
S_ISVTX             01000    Save text.
When standard utilities create files, the default permissions they use are -rw-rw-rw-
(or 0666). Because most users prefer to avoid having files that are world-writable, each
process carries with it a umask. The umask is a set of permission bits indicating those
bits that should never be allowed when new files are created. (The umask is not used
when changing permissions.) Conceptually, the operation that occurs is
    actual-permissions = (requested-permissions & (~umask));
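The computation is simple enough to model directly. The function below is purely illustrative (effective_mode() is our own name); the real work is done by the operating system when the file is created.

```c
#include <sys/types.h>

/* effective_mode --- hypothetical model of how the umask is applied:
   every bit set in mask is cleared from the requested permissions. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & ~mask;
}
```

For example, 'effective_mode(0666, 022)' yields 0644: a umask of 022 removes write permission for group and other.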
The umask is usually set by the umask command in $HOME/.profile when you
log in. From a C program, it's set with the umask() system call:
    #include <sys/types.h>                                                  POSIX
    #include <sys/stat.h>

    mode_t umask(mode_t mask);
The return value is the old umask. Thus, to determine the current mask, you must
set it to a value and then reset it (or change it, as desired):
    mode_t mask = umask(0);     /* retrieve current mask */
    (void) umask(mask);         /* restore it */
Here is an example of the umask in action, at the shell level:

$ umask                                 Show the current mask
0022
$ touch newfile                         Create a file
$ ls -l newfile                         Show permissions of new file
-rw-r--r--    1 arnold   devel           0 Mar 24 15:43 newfile
$ umask 0                               Set mask to empty
$ touch newfile2                        Create a second file
$ ls -l newfile2                        Show permissions of new file
-rw-rw-rw-    1 arnold   devel           0 Mar 24 15:44 newfile2
4.6.2 Creating Files with creat()

The creat()⁴ system call creates new files. It is declared as follows:
    #include <sys/types.h>                                                  POSIX
    #include <sys/stat.h>
    #include <fcntl.h>

    int creat(const char *pathname, mode_t mode);
The mode argument represents the permissions for the new file (as discussed in the
previous section). The file named by pathname is created, with the given permission
as modified by the umask. It is opened for writing (only), and the return value is the
file descriptor for the new file or -1 if there was a problem. In this case, errno indicates
the error. If the file already exists, it will be truncated when opened.
In all other respects, file descriptors returned by creat() are the same as those
returned by open(); they're used for writing and seeking and must be closed with
close():
    int fd, count;

    /* Error checking omitted for brevity */
    fd = creat("/some/new/file", 0666);
    count = write(fd, "some data\n", 10);
    (void) close(fd);
⁴ Yes, that's how it's spelled. Ken Thompson, one of the two "fathers" of Unix, was once asked what he would
have done differently if he had it to do over again. He replied that he would have spelled creat() with an "e."
Indeed, that is exactly what he did for the Plan 9 From Bell Labs operating system.
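With the error checking put back, such code might look like the sketch below. save_text() is a hypothetical name, and treating a short write as a failure is a simplifying assumption on our part.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* save_text --- hypothetical helper: create (or truncate) path and
   write len bytes of text to it. Returns 0 on success, -1 on error. */
int save_text(const char *path, const char *text, size_t len)
{
    int fd = creat(path, 0666);     /* the umask trims the permissions */

    if (fd < 0)
        return -1;                  /* errno was set by creat() */

    if (write(fd, text, len) != (ssize_t) len) {
        (void) close(fd);
        return -1;                  /* error or short write */
    }
    return close(fd);               /* close() can report errors too */
}
```

Because creat() truncates an existing file, calling save_text() twice on the same path leaves only the second call's data.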
4.6.3 Revisiting open()

You may recall the declaration for open():
    int open(const char *pathname, int flags, mode_t mode);
Earlier, we said that when opening a file for plain I/O, we could ignore the mode
argument. Having seen creat(), though, you can probably guess that open() can
also be used for creating files and that the mode argument is used in this case. This is
indeed true.
Besides the O_RDONLY, O_WRONLY, and O_RDWR flags, additional flags may be bitwise
OR'd when open() is called. The POSIX standard mandates a number of these
additional flags. Table 4.7 presents the flags that are used for most mundane applications.
TABLE 4.7
Additional POSIX flags for open()

Flag        Meaning
O_APPEND    Force all writes to occur at the end of the file.
O_CREAT     Create the file if it doesn't exist.
O_EXCL      When used with O_CREAT, cause open() to fail if the file already exists.
O_TRUNC     Truncate the file (set it to zero length) if it exists.
Given O_APPEND and O_TRUNC, you can imagine how the shell might open or create
files corresponding to the > and >> operators. For example:
    int fd;
    extern char *filename;
    mode_t mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;   /* 0666 */

    fd = open(filename, O_CREAT|O_WRONLY|O_TRUNC, mode);    /* for > */
    fd = open(filename, O_CREAT|O_WRONLY|O_APPEND, mode);   /* for >> */

Note that the O_EXCL flag would not be used here, since for both > and >>, it's not
an error for the file to exist. Remember also that the system applies the umask to the
requested permissions.
Also, it's easy to see that, at least conceptually, creat() could be written this easily:
    int creat(const char *path, mode_t mode)
    {
        return open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
    }
NOTE If a file is opened with O_APPEND, all data will be written at the end of
the file, even if the current position has been reset with lseek().
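This behavior is easy to demonstrate. In the following sketch (the function name is our own), the program seeks to the front of the file before writing, yet the byte still lands at the end:

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* append_after_seek --- hypothetical demonstration: open an existing
   file with O_APPEND, seek to the front, and write one byte. Returns
   the offset after the write, or -1 on error. */
off_t append_after_seek(const char *path)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    off_t end;

    if (fd < 0)
        return -1;

    if (lseek(fd, (off_t) 0, SEEK_SET) < 0      /* move to the front ... */
        || write(fd, "Z", 1) != 1) {            /* ... write goes to the end */
        (void) close(fd);
        return -1;
    }

    end = lseek(fd, (off_t) 0, SEEK_CUR);       /* just past the new byte */
    (void) close(fd);
    return end;
}
```

For a file that initially holds the three bytes "abc", the function returns 4 and the file ends up containing "abcZ".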
Modern systems provide additional flags whose uses are more specialized. Table 4.8
describes them briefly.
TABLE 4.8
Additional advanced POSIX flags for open()

Flag          Meaning
O_NOCTTY      If the device being opened is a terminal, it does not become the process's
              controlling terminal. (This is a more advanced topic, discussed briefly in
              Section 9.2.1, page 312.)
O_NONBLOCK    Disables blocking of I/O operations in certain cases (see Section 9.4.3.4,
              page 333).
O_DSYNC       Ensure that data written to a file make it all the way to physical storage before
              write() returns.
O_RSYNC       Ensure that any data that read() would read, which may have been written
              to the file being read, have made it all the way to physical storage before
              read() returns.
O_SYNC        Like O_DSYNC, but also ensure that all file metadata, such as access times, have
              also been written to physical storage.
The O_DSYNC, O_RSYNC, and O_SYNC flags need some explanation. Unix systems
(including Linux) maintain an internal cache of disk blocks, called the buffer cache.
When the write() system call returns, the data passed to the operating system have
been copied to a buffer in the buffer cache. They are not necessarily written out to
the disk.
The buffer cache provides considerable performance improvement: Since disk I/O
is often an order of magnitude or more slower than CPU and memory operations,
programs would slow down considerably if they had to wait for every write to go all
the way through to the disk. In addition, if data have recently been written to a file, a
subsequent read of that same data will find the information already in the buffer cache,
where it can be returned immediately instead of having to wait for an I/O operation
to read it from the disk.
Unix systems also do read-ahead; since most reads are sequential, upon reading one
block, the operating system will read several more consecutive disk blocks so that their
information will already be in the buffer cache when a program asks for it. If multiple
programs are reading the same file, they all benefit since they will all get their data from
the same copy of the file's disk blocks in the buffer cache.
All of this caching is wonderful, but of course there's no free lunch. While data are
in the buffer cache and before they have been written to disk, there's a small, but very
real, window in which disaster can strike; for example, if the power goes out. Modern
disk drives exacerbate this problem: Many have their own internal buffers, so while
data may have made it to the drive, they may not have made it onto the media when the
power goes out! This can be a significant issue for small systems that aren't in a data
center with controlled power or that don't have an uninterruptible power supply (UPS).⁵
For most applications, the chance that data in the buffer cache might be inadvertently
lost is acceptably small. However, for some applications, any such chance is not
acceptable. Thus, the notion of synchronous I/O was added to Unix systems, whereby a program
can be guaranteed that if a system call has returned, the data are safely written on a
physical storage device.
The O_DSYNC flag guarantees data integrity; the data and any other information that
the operating system needs to find the data are written to disk before write() returns.
However, metadata, such as access and modification times, may not be written to disk.
The O_SYNC flag requires that metadata also be written to disk before write() returns.
(Here too there is no free lunch; synchronous writes can seriously affect the performance
of a program, slowing it down noticeably.)
⁵ If you don't have a UPS and you use your system for critical work, we highly recommend investing in one. You
should also be doing regular backups.
The O_RSYNC flag is for data reads: If read() finds data in the buffer cache that were
scheduled for writing to disk, then read() won't return that data until they have been
written to disk. The other two flags can affect this: In particular, O_SYNC will cause
read() to wait until the file metadata have been written out as well.
NOTE As of kernel version 2.4, Linux treats all three flags the same, with
essentially the meaning of O_SYNC. Furthermore, Linux defines additional flags
that are Linux specific and intended for specialized uses. Check the GNU/Linux
open(2) manpage for more information.
4.7 Forcing Data to Disk

Earlier, we described the O_DSYNC, O_RSYNC, and O_SYNC flags for open(). We
noted that using these flags could slow a program down since each write() does not
return until all data have been written to physical media.
For a slightly higher risk level, we can have our cake and eat it too. We do this by
opening a file without one of the O_xSYNC flags and then using one of the following
two system calls at whatever point it's necessary to have the data safely moved to
physical storage:
    #include <unistd.h>

    int fsync(int fd);                                                   POSIX FSC
    int fdatasync(int fd);                                               POSIX SIO
The fdatasync() system call is like O_DSYNC: It forces all file data to be written to
the final physical device. The fsync() system call is like O_SYNC, forcing not just file
data, but also file metadata, to physical storage. The fsync() call is more portable; it
has been around in the Unix world for longer and is more likely to exist across a broad
range of systems.
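A typical pattern is to do ordinary (cached) writes and then sync once at a point that matters. The sketch below assumes a GNU/Linux system, where both calls are declared in <unistd.h>. The function name and its data_only parameter are our own invention, and a short write is simply treated as failure.

```c
#include <unistd.h>

/* write_then_sync --- hypothetical helper: write count bytes through the
   buffer cache, then force them to storage before returning. If
   data_only is nonzero, use fdatasync() instead of fsync().
   Returns 0 on success, -1 on error. */
int write_then_sync(int fd, const void *buf, size_t count, int data_only)
{
    if (write(fd, buf, count) != (ssize_t) count)
        return -1;                  /* error or short write */

    return data_only ? fdatasync(fd) : fsync(fd);
}
```

A program that writes many records could just as well call write() repeatedly and issue a single fsync() at the end of each batch.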
You can use these calls with <stdio.h> file pointers by first calling fflush() and
then using fileno() to obtain the underlying file descriptor. Here is an fpsync()
function that can be used to wrap both operations in one call. It returns 0 on success:
    /* fpsync --- sync a stdio FILE * variable */

    int fpsync(FILE *fp)
    {
        if (fp == NULL || fflush(fp) == EOF || fsync(fileno(fp)) < 0)
            return -1;

        return 0;
    }
Technically, both of these calls are extensions to the base POSIX standard: fsync()
in the "File Synchronization" extension (FSC), and fdatasync() in the "Synchronized
Input and Output" extension (SIO). Nevertheless, you can use them on a GNU/Linux system
without any problem.
4.8 Setting File Length

Two system calls make it possible to adjust the size of a file:
    #include <unistd.h>
    #include <sys/types.h>

    int truncate(const char *path, off_t length);                        XSI
    int ftruncate(int fd, off_t length);                                 POSIX
As should be obvious from the parameters, truncate() takes a filename argument,
whereas ftruncate() works on an open file descriptor. (The xxx() and fxxx()
naming convention for system call pairs that work on a filename or file descriptor is
common. We see several examples in this and subsequent chapters.) For both, the
length argument is the new size of the file.
These system calls originated in 4.2 BSD Unix, and in early systems could only be used
to shorten a file's length, hence the name. (They were created to simplify implementation
of the truncate operation in Fortran.) On modern systems, including Linux, the name
is a misnomer, since it's possible to extend the length of a file with these calls, not
just shorten a file. (However, POSIX indicates that the ability to extend a file is an
XSI extension.)
For these calls, the file being truncated must have write permission (for truncate()),
or have been opened for writing (for ftruncate()). If the file is being shortened, any
data past the new end of the file are lost. (Thus, you can't shorten the file, lengthen it
again, and expect to find the original data.) If the file is extended, as with data written
after an lseek(), the data between the old end of the file and the new end of file read
as zeros.
These calls are very different from 'open(file, ... | O_TRUNC, mode)'. The latter
truncates a file completely, throwing away all its data. These calls simply set the file's
absolute length to the given value.
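A brief sketch shows both directions. set_length() is a hypothetical helper of ours; note that, as written, it leaves the file offset at the new end of the file.

```c
#include <sys/types.h>
#include <unistd.h>

/* set_length --- hypothetical helper: set the file's length, then report
   the new size as seen by lseek(). Returns -1 on error. */
off_t set_length(int fd, off_t length)
{
    if (ftruncate(fd, length) < 0)
        return -1;

    return lseek(fd, (off_t) 0, SEEK_END);
}
```

Starting with a 10-byte file, 'set_length(fd, 4)' discards the last six bytes. A following 'set_length(fd, 100)' makes the file 100 bytes long, but the discarded bytes do not come back: everything from offset 4 onward reads as zeros.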
These functions are fairly specialized; they're used only four times in all of the
GNU Coreutils code. We present an example use of ftruncate() in Section 5.5.3,
"Changing Timestamps: utime()," page 157.
4.9 Summary
• When a system call fails, it usually returns -1, and the global variable errno is set
to a predefined value indicating the problem. The functions perror() and
strerror() can be used for reporting errors.
• Files are manipulated by small integers called file descriptors. File descriptors for
standard input, standard output, and standard error are inherited from a program's
parent process. Others are obtained with open() or creat(). They are closed
with close(), and getdtablesize() returns the maximum number of allowed
open files. The value of the umask (set with umask()) affects the permissions
given to new files created with creat() or the O_CREAT flag for open().
• The read() and write() system calls read and write data, respectively. Their
interface is simple. In particular, they do no interpretation of the data; files are
linear streams of bytes. The lseek() system call provides random access I/O: the
ability to move around within a file.
• Additional flags for open() provide for synchronous I/O, whereby data make it
all the way to the physical storage media before write() or read() return. Data
can also be forced to disk on a controlled basis with fsync() or fdatasync().
• The truncate() and ftruncate() system calls set the absolute length of a file.
(On older systems, they can only be used to shorten a file; on modern systems
they can also extend a file.)
Exercises
1. Using just open(), read(), write(), and close(), write a simple copy
program that copies the file named by its first argument to the file named by
its second.
2. Enhance the copy program to accept "-" to mean "standard input" if used
as the first argument and "standard output" as the second. Does 'copy - -'
work correctly?
3. Look at the proc(5) manpage on a GNU/Linux system, in particular the fd
subsection. Do an 'ls -l /dev/fd' and examine the files in
/proc/self/fd directly. If /dev/stdin and friends had been around in the
early versions of Unix, how would that have simplified the code for the V7 cat
program? (Many other modern Unix systems have a /dev/fd directory or
filesystem. If you're not using GNU/Linux, see what you can discover about
your Unix version.)
4. Even though you don't understand it yet, try to copy the code segment from
the V7 cat.c that uses the struct stat and the fstat() function into
ch04-cat.c so that it too reports an error for 'cat file >> file'.
5. (Easy.) Assuming the existence of strerror(), write your own version of
perror().
6. What is the result of 'ulimit -n' on your system?
7. Write a simple version of the umask program, named myumask, that takes an
octal mask on the command line. Use strtol() with a base of 8 to convert
the character string command-line argument into an integer value. Change the
umask to the new mask with the umask() system call.
Compile and run myumask, and then examine the value of the umask
with the regular umask command. Explain the results. (Hint: in Bash, enter
'type umask'.)
8. Change the simple copy program you wrote earlier to use open() with the
O_SYNC flag. Using the time command, compare the performance of the
original version and the new version on a large file.
9. For ftruncate(), we said that the file must have been opened for writing.
How can a file be open for writing when the file itself doesn't have write
permission?
10. Write a truncate program whose usage is 'truncate file length'.
In this chapter
• 5.1 Considering Directory Contents
• 5.2 Creating and Removing Directories
• 5.3 Reading Directories
• 5.4 Obtaining Information about Files
• 5.5 Changing Ownership, Permission, and Modification Times
• Exercises
This chapter continues the climb up the learning curve toward the next plateau:
understanding directories and information about files.
In this chapter we explore how file information is stored in a directory, how
directories themselves are read, created, and removed, what information about files is
available, and how to retrieve it. Finally, we explore other ways to update file
metadata, such as the owner, group, permissions, and access and modification times.
5.1 Considering Directory Contents

All Unix systems, including Linux, use the same conceptual design for storing file
information on disk. Although there is considerable variation in the implementation
of the design, the interface at the C level remains consistent, making it possible to write
portable programs that compile and run on many different systems.
5.1.1 Definitions
[Cartoon: USER FRIENDLY by Illiad. Copyright 1997-2004 © J.D. "Illiad" Frazer.
Used with permission. http://www.userfriendly.org]
We start the discussion by defining some terms.
Partition
A unit of physical storage. Physical partitions are typically either part of a disk or
an entire disk. Modern systems make it possible to create logical partitions from
multiple physical ones.
Filesystem
A partition (physical or logical) that co ntains file data and metadata, information
abo ut files (as opposed to the file contents, which is information in the files). Such
metadata include file ownership, permissions, size, and so on, as well as information
for use by the operating system in locating file contents .
You place filesystems "in" partitions (a one-to-one correspondence) by writing
standard information in them. This is done with a user-level program, such as
mke2fs on GNU/Linux, or newfs on Unix. (The Unix mkfs command makes
filesystems but is difficult to use directly. newfs calls it with the correct parameters.
If your system is a Unix system, see the newfs(8) and mkfs(8) manpages for
the details.)
For the most part, GNU/Linux and Unix hide the existence of filesystems and
partitions. (Further details are given in Section 8.1, "Mounting and Unmounting
Filesystems," page 228.) Everything is accessed by pathnames, without reference
to which disk a file lives on. (Contrast this with almost every other commercial
operating system, such as OpenVMS, or the default behavior of any
Microsoft system.)
Inode
Short for "index node," initially abbreviated "i-node" and now written "inode."
A small block of information describing everything about a file except the file's
name(s). The number of inodes, and thus the number of unique files per filesystem,
is set and made permanent when the filesystem is created. 'df -i' can tell you
how many inodes you have and how many are used.
Device
In the context of files, filesystems, and file metadata, a unique number representing
an in-use ("mounted") filesystem. The (device, inode) pair uniquely identifies the
file: Two different files are guaranteed to have different (device, inode) pairs. This
is discussed in more detail later in this chapter.
Directory
A special file, containing a list of (inode number, name) pairs. Directories can be
opened for reading but not for writing; the operating system makes all the changes
to a directory's contents.
120 Chapter 5 • Directories and File Metadata
Conceptually, each disk block contains either some number of inodes, or file data.
The inode, in turn, contains pointers to the blocks that contain the file's data. See
Figure 5.1.
[Figure: disk blocks laid out linearly in a partition. A run of inode blocks comes
first, followed by the data blocks.]
FIGURE 5.1
Conceptual view of inode and data blocks
The figure shows all the inode blocks at the front of the partition and the data blocks
after them. Early Unix filesystems were indeed organized this way. However, while all
modern systems still have inodes and data blocks, the organization has changed for
improved efficiency and robustness. The details vary from system to system, and even
within GNU/Linux systems there are multiple kinds of filesystems, but the concepts
are the same.
5.1.2 Directory Contents

Directories make the connection between a filename and an inode. Directory entries
contain an inode number and a filename. They also contain additional bookkeeping
information that is not of interest to us here. See Figure 5.2.
[Figure: a directory drawn as a column of (inode number, name) slots: 23 '.' (dot),
19 '..' (dot-dot), an ordinary filename entry, 0 'tempdata' (empty slot), and
37 '.profile' (filename).]
FIGURE 5.2
Conceptual directory contents
Early Unix systems had two-byte inode numbers and up to 14-byte filenames. Here
is the entire content of the V7 /usr/include/sys/dir.h:
    #ifndef DIRSIZ
    #define DIRSIZ 14
    #endif
    struct direct
    {
        ino_t d_ino;
        char d_name[DIRSIZ];
    };
An ino_t is defined in the V7 <sys/types.h> as 'typedef unsigned int
ino_t;'. Since a PDP-11 int is 16 bits, so too is the ino_t. This organization made
it easy to read directories directly; since the size of an entry was fixed, the code was
simple. (The only thing to watch out for was that a full 14-character d_name was not
NUL-terminated.)
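The NUL-termination caveat is easy to get wrong, so here is a minimal sketch of one way to handle it. The struct v7_direct type, the V7_DIRSIZ macro, and the v7_name_copy() helper are our own hypothetical names that mimic the V7 layout; they are not taken from the V7 sources. The idea is simply to copy the name into a buffer that is one byte larger than the field and terminate it there.

```c
#include <string.h>

#define V7_DIRSIZ 14    /* hypothetical stand-in for the V7 DIRSIZ */

/* Hypothetical type modeled on the V7 struct direct. */
struct v7_direct {
    unsigned short d_ino;        /* two-byte inode number; 0 marks an empty slot */
    char d_name[V7_DIRSIZ];      /* NOT NUL-terminated when all 14 bytes are used */
};

/* Copy d_name into buf, which must hold at least V7_DIRSIZ + 1 bytes,
   and guarantee that the result is NUL-terminated. */
void v7_name_copy(const struct v7_direct *d, char *buf)
{
    memcpy(buf, d->d_name, V7_DIRSIZ);
    buf[V7_DIRSIZ] = '\0';       /* harmless for short names, essential for full ones */
}
```

Code that only needs to print a name can avoid the copy by limiting the field width, as in 'printf("%.*s", V7_DIRSIZ, d->d_name)'.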
Directory content management was also easy for the system. When a file was removed
from a directory, the system replaced the inode number with a binary zero, signifying
that the "slot" in the directory was unused. New files could then reuse the empty slot.
This helped keep the size of directory files themselves reasonable. (By convention, inode
number 1 is unused; inode number 2 is always the first usable inode. More details are
provided in Section 8.1, "Mounting and Unmounting Filesystems," page 228.)
Modern systems provide long filenames. Each directory entry is of variable length,
with a common limit of 255 bytes for the filename component of the directory. Later
on, we show how to read a directory's contents on a modern system. Modern systems
also provide 32-bit (or even 64-bit!) inode numbers.
5.1.3 Hard Links

When a file is created with open() or creat(), the system finds an unused inode
and assigns it to the new file. It creates the directory entry for the file, with the file's
name and inode number in it. The -i option to ls shows the inode number:
    $ echo hello, world > message        Create new file
    $ ls -il message                     Show inode number too
    228786 -rw-r--r-- 1 arnold devel 13 May 4 15:43 message
Since directory entries associate filenames with inodes , it is possible for one file to
have multiple names. Each directory entry referring to the same inode is called a link,
or hard link, to the file. Links are created with the ln command. The usage is 'ln
oldfile newfile':
    $ ln message msg                     Create a link
    $ cat msg                            Show contents of new name
    hello, world
    $ ls -il msg message                 Show inode numbers
    228786 -rw-r--r-- 2 arnold devel 13 May 4 15:43 message
    228786 -rw-r--r-- 2 arnold devel 13 May 4 15:43 msg
The output shows that the inode numbers of the two files are the same, and the third
field in the long output is now 2. This field is the link count, which reports how many
links (directory entries referring to the inode) the file has.
It cannot be emphasized enough: Hard links all refer to the same file. If you change
one, you have changed the others:
    $ echo "Hi, how ya doin' ?" > msg    Change file by new name
    $ cat message                        Show contents by old name
    Hi, how ya doin' ?
    $ ls -il message msg                 Show info. Size changed
    228786 -rw-r--r-- 2 arnold devel 19 May 4 15:51 message
    228786 -rw-r--r-- 2 arnold devel 19 May 4 15:51 msg
Although we've created two links to the same file in a single directory, hard links are
not restricted to being in the same directory; they can be in any other directory on the
same filesystem. (This is discussed a bit more in Section 5.1.6, "Symbolic Links,"
page 128.)
Additionally, you can create a link to a file you don't own as long as you have write
permission in the directory in which you're creating the link. (Such a file retains all the
attributes of the original file: the owner, permissions, and so on. This is because it is
the original file; it has only acquired an additional name.) User-level code cannot create
a hard link to a directory.
Once a link is removed, creating a new file by the same name as the original file
creates a new file:
    $ rm message                           Remove old name
    $ echo "What's happenin?" > message    Reuse the name
    $ ls -il msg message                   Show information
    228794 -rw-r--r-- 1 arnold devel 17 May 4 15:58 message
    228786 -rw-r--r-- 1 arnold devel 19 May 4 15:51 msg
Notice that the link counts for both files are now equal to 1.
At the C level, links are created with the link() system call:
    #include <unistd.h>                                       POSIX

    int link(const char *oldpath, const char *newpath);
The return value is 0 if the link was created successfully, or -1 otherwise, in which
case errno reflects the error. An important failure case is one in which newpath already
exists. The system won't remove it for you, since attempting to do so can cause
inconsistencies in the filesystem.
5.1.3.1 The GNU link Program

The ln program is complicated and large. However, the GNU Coreutils contains a
simple link program that just calls link() on its first two arguments. The following
example shows the code from link.c, with some irrelevant parts deleted. Line numbers
relate to the actual file.
     20  /* Implementation overview:
     21
     22     Simply call the system 'link' function */
     23
     ... #include statements omitted for brevity ...
     34
     35  /* The official name of this program (e.g., no 'g' prefix). */
     36  #define PROGRAM_NAME "link"
     37
     38  #define AUTHORS "Michael Stone"
     39
     40  /* Name this program was run with. */
     41  char *program_name;
     42
     43  void
     44  usage (int status)
     45  {
     ... omitted for brevity ...
     62  }
     63
     64  int
     65  main (int argc, char **argv)
     66  {
     67    program_name = argv[0];
     68    setlocale (LC_ALL, "");
     69    bindtextdomain (PACKAGE, LOCALEDIR);
     70    textdomain (PACKAGE);
     71
     72    atexit (close_stdout);
     73
     74    parse_long_options (argc, argv, PROGRAM_NAME, GNU_PACKAGE, VERSION,
     75                        AUTHORS, usage);
     76
     77    /* The above handles --help and --version.
     78       Since there is no other invocation of getopt, handle '--' here. */
     79    if (1 < argc && STREQ (argv[1], "--"))
     80      {
     81        --argc;
     82        ++argv;
     83      }
     84
     85    if (argc < 3)
     86      {
     87        error (0, 0, _("too few arguments"));
     88        usage (EXIT_FAILURE);
     89      }
     90
     91    if (3 < argc)
     92      {
     93        error (0, 0, _("too many arguments"));
     94        usage (EXIT_FAILURE);
     95      }
     96
     97    if (link (argv[1], argv[2]) != 0)
     98      error (EXIT_FAILURE, errno, _("cannot create link %s to %s"),
     99             quote_n (0, argv[2]), quote_n (1, argv[1]));
    100
    101    exit (EXIT_SUCCESS);
    102  }
Lines 67-75 are typical Coreutils boilerplate, setting up internationalization, the
final action upon exit, and parsing the arguments. Lines 79-95 make sure that link is
called with exactly two arguments. The link() system call itself occurs on line 97. (The
quote_n() function provides quoting of the arguments in a style suitable for the current
locale; the details aren't important here.)
5.1.3.2 Dot and Dot-Dot

Rounding off the discussion of links, let's look at how the '.' and '..' special names
are managed. They are really just hard links: '.' is a hard link to the
directory containing it, and '..' is a hard link to the parent directory. The operating
system creates these links for you; as mentioned earlier, user-level code cannot create a
hard link to a directory. This example illustrates the links:
    $ pwd                          Show current directory
    /tmp
    $ ls -ldi /tmp                 Show its inode number
    225345 drwxrwxrwt 14 root root 4096 May 4 16:15 /tmp
    $ mkdir x                      Create a new directory
    $ ls -ldi x                    And show its inode number
    52794 drwxr-xr-x 2 arnold devel 4096 May 4 16:27 x
    $ ls -ldi x/. x/..             Show . and .. inode numbers
    52794 drwxr-xr-x 2 arnold devel 4096 May 4 16:27 x/.
    225345 drwxrwxrwt 15 root root 4096 May 4 16:27 x/..
The root's parent directory (/..) is a special case; we defer discussion of it until
Chapter 8, "Filesystems and Directory Walks," page 227.
5.1.4 File Renaming

Given the way in which directory entries map names to inode numbers , renaming
a file is conceptually quite easy:
1. If the new name for the file names an existing file, remove the existing file first.
2. Create a new link to the file by the new name.
3. Remove the old name (link) for the file. (Removing names is discussed in the
next section.)
Early versions of the mv command did work this way. However, when done this way,
file renaming is not atomic; that is, it doesn't happen in one uninterruptible operation.
And, on a heavily loaded system, a malicious user could take advantage of race
conditions,1 subverting the rename operation and substituting a different file for the
original one.
1 A race condition is a situation in which details of timing can produce unintended side effects or bugs. In this case,
the directory, for a short period of time, is in an inconsistent state, and it is this inconsistency that introduces
the vulnerability.
For this reason, 4.2 BSD introduced the rename() system call:

    #include <stdio.h>                                        ISO C

    int rename(const char *oldpath, const char *newpath);
On Linux systems, the renaming operation is atomic; the manpage states:
    If newpath already exists it will be atomically replaced ..., so that there is
    no point at which another process attempting to access newpath will find
    it missing.
    If newpath exists but the operation fails for some reason, rename guarantees
    to leave an instance of newpath in place.
    However, when overwriting there will probably be a window in which both
    oldpath and newpath refer to the file being renamed.
As with other system calls, a 0 return indicates success, and a return value of -1 indi-
cates an error.
5.1.5 File Removal

Removing a file means removing the file's entry in the directory and decrementing
the file's link count (maintained in the inode). The contents of the file, and the disk
blocks holding them, are not freed until the link count reaches zero.
The system call is named unlink():
    #include <unistd.h>                                       POSIX

    int unlink(const char *pathname);
Given our discussion of file links, the name makes sense; this call removes the given
link (directory entry) for the file. It returns 0 on success and -1 on error.
The ability to remove a file requires write permission only for the directory and not for
the file itself. This fact can be confusing, particularly for new Linux/Unix users. However,
since the operation is one on the directory, this makes sense; it is the directory contents
that are being modified, not the file's contents.2
2 Indeed, the file's metadata are changed (the number of links), but that does not affect any other file attribute,
nor does it affect the file's contents. Updating the link count is the only operation on a file that doesn't involve
checking the file's permissions.
5.1.5.1 Removing Open Files

Since the earliest days of Unix, it has been possible to remove open files. Simply call
unlink() with the filename after a successful call to open() or creat().
At first glance, this seems to be a strange thing to do. Since the system frees the data
blocks when a file's link count goes to zero, is it even possible to use the open file?
The answer is yes, you can continue to use the open file normally. The system knows
that the file is open, and therefore it delays the release of the file's storage until the last
file descriptor on the file is closed. Once the file is completely unused, the storage is freed.
This operation also happens to be a useful one: It is an easy way for a program to
get temporary file storage that is guaranteed to be both private and automatically released
when no longer needed:
    /* Obtaining private temporary storage, error checking omitted for brevity */
    int fd;
    mode_t mode = O_CREAT|O_EXCL|O_TRUNC|O_RDWR;

    fd = open("/tmp/myfile", mode, 0000);      Open the file
    unlink("/tmp/myfile");                     Remove it
    ... continue to use file ...
    close(fd);                                 Close file, free storage
The downside to this approach is that it's also possible for a runaway application to
fill up a filesystem with an open but anonymous file, in which case the system
administrator has to try to find and kill the process. In olden days, a reboot and filesystem
consistency check might have been required; thankfully, this is exceedingly rare on
modern systems.
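For completeness, here is a sketch of the same open-then-unlink idiom with the error checking put back in. The open_anonymous() name is our own, and we assume a permission value of 0600 rather than the 0000 used in the fragment above; either works, since the descriptor obtained at creation time stays usable.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create path exclusively, unlink it right away, and return the open
   descriptor, or -1 on error with errno set by the failing call.
   (open_anonymous is a hypothetical helper for illustration.) */
int open_anonymous(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd < 0)
        return -1;              /* for example, EEXIST if path already exists */

    if (unlink(path) != 0) {    /* the name goes away; the storage stays */
        close(fd);
        return -1;
    }
    return fd;                  /* storage is released when fd is closed */
}
```

After a successful call, the name no longer appears in the directory, yet read(), write(), and lseek() on the returned descriptor continue to work until it is closed.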
5.1.5.2 Using ISO C: remove()

ISO C provides the remove() function for removing files; this is intended to be
a general function, usable on any system that supports ISO C, not just Unix and
GNU/Linux:
    #include <stdio.h>                                        ISO C

    int remove(const char *pathname);
While not technically a system call, the return value is in the same vein: 0 on success
and -1 on error, with errno indicating the error.
On GNU/Linux, remove() uses the unlink() system call to remove files, and the
rmdir() system call (discussed later in the chapter) to remove directories. (On older
GNU/Linux systems not using GLIBC, remove() is an alias for unlink(); this fails
on directories. If you have such a system, you should probably upgrade it.)
5.1.6 Symbolic Links

We started the chapter with a discussion of partitions, filesystems, and inodes. We
also saw that directory entries associate names with inode numbers. Because directory
entries contain no other information, hard links are restricted to files within the same
filesystem. This has to be; there is no way to distinguish inode 2341 on one filesystem
from inode 2341 on another filesystem. Here is what happens when we try:
    $ mount                         Show filesystems in use
    /dev/hda2 on / type ext3 (rw)
    /dev/hda5 on /d type ext3 (rw)
    $ ls -li /tmp/message           Earlier example was on filesystem for /
    228786 -rw-r--r-- 2 arnold devel 19 May 4 15:51 /tmp/message
    $ cat /tmp/message
    Hi, how ya doin' ?
    $ /bin/pwd                      Current directory is on a different filesystem
    /d/home/arnold
    $ ln /tmp/message               Attempt the link
    ln: creating hard link './message' to '/tmp/message': Invalid cross-device link
Large systems often have many partitions, both on physically attached local disks
and on remotely mounted network filesystems. The hard-link restriction to the same
filesystem is inconvenient, for example, if some files or directories must be moved to a
new location, but old software uses a hard-coded filename for the old location.
To get around this restriction, 4.2 BSD introduced symbolic links. A symbolic link
(also referred to as a soft link) is a special kind of file (just as a directory is a special kind
of file). The contents of the file are the pathname of the file being "pointed to." All
modern Unix systems, including Linux, provide symbolic links; indeed they are now
part of POSIX.
Symbolic links may refer to any file anywhere on the system. They may also refer to
directories. This makes it easy to move directories from place to place, with a symbolic
link left behind in the original location pointing to the new location.
When processing a filename, the system notices symbolic links and instead performs
the action on the pointed-to file or directory. Symbolic links are created with the -s
option to ln:
    $ /bin/pwd                          Where are we
    /d/home/arnold                      On a different filesystem
    $ ln -s /tmp/message ./hello        Create a symbolic link
    $ cat hello                         Use it
    Hi, how ya doin' ?
    $ ls -l hello                       Show information about it
    lrwxrwxrwx 1 arnold devel 12 May 4 16:41 hello -> /tmp/message
The file pointed to by the link need not exist. The system detects this at runtime
and acts appropriately:
    $ rm /tmp/message                   Remove pointed-to file
    $ cat ./hello                       Attempt to use it by the soft link
    cat: ./hello: No such file or directory
    $ echo hi again > hello             Create new file contents
    $ ls -l /tmp/message                Show pointed-to file info ...
    -rw-r--r-- 1 arnold devel 9 May 4 16:45 /tmp/message
    $ cat /tmp/message                  ... and contents
    hi again
Symbolic links are created with the symlink() system call:
    #include <unistd.h>                                       POSIX

    int symlink(const char *oldpath, const char *newpath);
The oldpath argument names the pointed-to file or directory, and newpath is the
name of the symbolic link to be created. The return value is 0 on success and -1 on
error; see your symlink(2) manpage for the possible errno values.
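The shell session above translates directly into C. What follows is a minimal sketch; the make_symlink() name is our own invention for illustration. Note that symlink() never looks at the target: it merely stores the given string as the link's contents, which is why a link to a nonexistent file can be created without complaint.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create a symbolic link named linkpath whose contents are target.
   The target need not exist. Returns 0 on success, -1 on error.
   (make_symlink is a hypothetical helper for illustration.) */
int make_symlink(const char *target, const char *linkpath)
{
    if (symlink(target, linkpath) != 0) {
        fprintf(stderr, "cannot create symbolic link %s to %s: %s\n",
                linkpath, target, strerror(errno));
        return -1;
    }
    return 0;
}
```

As with link(), the call fails (with errno set to EEXIST) if linkpath already exists. Opening a dangling link for reading fails with ENOENT, while opening it for writing with file creation enabled creates the pointed-to file, just as 'echo hi again > hello' did above.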
Symbolic links have their disadvantages:
• They take up extra disk space, requiring a separate inode and data block. Hard
links take up only a directory slot.
• They add overhead. The kernel has to work harder to resolve a pathname containing
symbolic links.
• They can introduce "loops." Consider the following:
    $ rm -f a b                  Make sure 'a' and 'b' don't exist
    $ ln -s a b                  Symlink old file 'a' to new file 'b'
    $ ln -s b a                  Symlink old file 'b' to new file 'a'
    $ cat a                      What happens?
    cat: a: Too many levels of symbolic links
The kernel has to be able to detect this case and produce an error message.
• They are easy to break. If you move the pointed-to file to a different location or
rename it, the symbolic link is no longer valid. This can't happen with a hard link.
5.2 Creating and Removing Directories

Creating and removing directories is straightforward. The two system calls, not
surprisingly, are mkdir() and rmdir(), respectively:
    #include <sys/types.h>                                    POSIX
    #include <sys/stat.h>

    int mkdir(const char *pathname, mode_t mode);
    int rmdir(const char *pathname);
Both return 0 on success and -1 on error, with errno set appropriately. For mkdir(),
the mode argument represents the permissions to be applied to the directory. It is
completely analogous to the mode arguments for creat() and open() discussed in
Section 4.6, "Creating Files," page 106.
Both functions handle the '.' and '..' in the directory being created or removed. A
directory must be empty before it can be removed; errno is set to ENOTEMPTY if
the directory isn't empty. (In this case, "empty" means the directory contains only '.'
and '..'.)
New directories, like all files, are assigned a group ID number. Unfortunately, how
this works is complicated. We delay discussion until Section 11.5.1, "Default Group
for New Files and Directories," page 412.
Both functions work one directory level at a time. If /somedir exists and
/somedir/sub1 does not, 'mkdir("/somedir/sub1/sub2")' fails. Each component
in a long pathname has to be created individually (thus the -p option to mkdir,
see mkdir(1)).
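To illustrate what "created individually" means in practice, here is a minimal sketch of the component-by-component loop. The make_path() name, its fixed 4096-byte buffer, and its policy of treating EEXIST as success are our own assumptions; this is not how the real mkdir program is written, and it applies the same mode to every component.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create path and any missing leading components, one mkdir() per level.
   Returns 0 on success, -1 on error with errno set.
   (make_path is a hypothetical helper. An existing name counts as success
   without any check that it really is a directory.) */
int make_path(const char *path, mode_t mode)
{
    char buf[4096];
    size_t len = strlen(path);
    char *p;

    if (len == 0 || len >= sizeof buf) {
        errno = (len == 0) ? ENOENT : ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* Cut the string at each '/' in turn and create that prefix. */
    for (p = buf + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, mode) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }

    /* Finally, create the full path itself. */
    if (mkdir(buf, mode) != 0 && errno != EEXIST)
        return -1;
    return 0;
}
```

Starting the scan at buf + 1 keeps a leading '/' from producing an empty first component.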
Also, if pathname ends with a / character, mkdir() and rmdir() will fail on some
systems and succeed on others. The following program, ch05-trymkdir.c,
demonstrates both aspects.
    /* ch05-trymkdir.c --- Demonstrate mkdir() behavior.
       Courtesy of Nelson H.F. Beebe. */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>      /* for strerror() */
    #include <errno.h>
    #include <sys/types.h>   /* for mode_t */
    #include <sys/stat.h>    /* for mkdir() */

    #if !defined(EXIT_SUCCESS)
    #define EXIT_SUCCESS 0
    #endif

    void do_test(const char *path)
    {
        int retcode;

        errno = 0;
        retcode = mkdir(path, 0755);
        printf("mkdir(\"%s\") returns %d: errno = %d [%s]\n",
               path, retcode, errno, strerror(errno));
    }

    int main(void)
    {
        do_test("/tmp/t1/t2/t3/t4");     /* Attempt creation in subdirs */
        do_test("/tmp/t1/t2/t3");
        do_test("/tmp/t1/t2");
        do_test("/tmp/t1");

        do_test("/tmp/u1");              /* Make subdirs */
        do_test("/tmp/u1/u2");
        do_test("/tmp/u1/u2/u3");
        do_test("/tmp/u1/u2/u3/u4");

        do_test("/tmp/v1/");             /* How is trailing '/' handled? */
        do_test("/tmp/v1/v2/");
        do_test("/tmp/v1/v2/v3/");
        do_test("/tmp/v1/v2/v3/v4/");

        return (EXIT_SUCCESS);
    }
Here are the results under GNU/Linux:

    $ ch05-trymkdir
    mkdir("/tmp/t1/t2/t3/t4") returns -1: errno = 2 [No such file or directory]
    mkdir("/tmp/t1/t2/t3") returns -1: errno = 2 [No such file or directory]
    mkdir("/tmp/t1/t2") returns -1: errno = 2 [No such file or directory]
    mkdir("/tmp/t1") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1/u2") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1/u2/u3") returns 0: errno = 0 [Success]
    mkdir("/tmp/u1/u2/u3/u4") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/v2/") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/v2/v3/") returns 0: errno = 0 [Success]
    mkdir("/tmp/v1/v2/v3/v4/") returns 0: errno = 0 [Success]
Note how GNU/Linux accepts a trailing slash. Not all systems do.
5.3 Reading Directories

On the original Unix systems, reading directory contents was easy. A program opened
the directory with open() and read binary struct direct structures directly, 16
bytes at a time. The following fragment of code is from the V7 rmdir program,3 lines
60-74. It shows the check for the directory being empty.
    60      if((fd = open(name,0)) < 0) {
    61          fprintf(stderr, "rmdir: %s unreadable\n", name);
    62          ++Errors;
    63          return;
    64      }
    65      while(read(fd, (char *)&dir, sizeof dir) == sizeof dir) {
    66          if(dir.d_ino == 0) continue;
    67          if(!strcmp(dir.d_name, ".") || !strcmp(dir.d_name, ".."))
    68              continue;
    69          fprintf(stderr, "rmdir: %s not empty\n", name);
    70          ++Errors;
    71          close(fd);
    72          return;
    73      }
    74      close(fd);
Line 60 opens the directory for reading (a second argument of 0, equal to O_RDONLY).
Line 65 reads the struct direct. Line 66 is the check for an empty directory slot;
that is, one with an inode number of 0. Lines 67 and 68 check for '.' and '..'. Upon
reaching line 69, we know that some other filename has been seen and, therefore, that
the directory isn't empty.
(The test '!strcmp(s1, s2)' is a shorter way of saying 'strcmp(s1, s2) == 0';
that is, testing that the strings are equal. For what it's worth, we consider the
'!strcmp(s1, s2)' form to be poor style. As Henry Spencer once said, "strcmp()
is not a boolean!")
3 See /usr/src/cmd/rmdir.c in the V7 distribution.

When 4.2 BSD introduced a new filesystem format that allowed longer filenames
and provided better performance, it also introduced several new functions to provide
a directory-reading abstraction. This suite of functions is usable no matter what the
underlying filesystem and directory organization are. The basic parts of it are what is
standardized by POSIX, and programs using it are portable across GNU/Linux and
Unix systems.
5.3.1 Basic Directory Reading

Directory entries are represented by a struct dirent (not the same as the V7
struct direct!):
    struct dirent {
        ino_t d_ino;          /* XSI extension --- see text */
        char d_name[...];     /* See text on the size of this array */
    };
For portability, POSIX specifies only the d_name field, which is a zero-terminated
array of bytes representing the filename part of the directory entry. The size of d_name
is not specified by the standard, other than to say that there may be at most NAME_MAX
bytes before the terminating zero. (NAME_MAX is defined in <limits.h>.) The XSI
extension to POSIX provides for the d_ino inode number field.
In practice, since filenames can be of variable length and NAME_MAX is usually fairly
large (like 255), the struct dirent contains additional members that aid in the
bookkeeping of variable-length directory entries on disk. These additional members
are not relevant for everyday code.
The following functions provide the directory-reading interface:
    #include <sys/types.h>                                    POSIX
    #include <dirent.h>

    DIR *opendir(const char *name);          Open a directory for reading
    struct dirent *readdir(DIR *dir);        Return one struct dirent at a time
    int closedir(DIR *dir);                  Close an open directory
    void rewinddir(DIR *dirp);               Return to the front of a directory
The DIR type is analogous to the FILE type in <stdio.h>. It is an opaque type,
meaning that application code is not supposed to know what's inside it; its contents
are for use by the other directory routines. If opendir() returns NULL, the named
directory could not be opened for reading and errno is set to indicate the error.
Once you have an open DIR * variable, it can be used to retrieve a pointer to a
struct dirent representing the next directory entry. readdir() returns NULL upon
end-of-file or error.
Finally, closedir() is analogous to the fclose() function in <stdio.h>; it closes
the open DIR * variable. The rewinddir() function can be used to start over at the
beginning of a directory.
With these routines in hand (or at least in the C library), we can write a simple
catdir program that "cats" the contents of a directory. Such a program is presented
in ch05-catdir.c:
     1  /* ch05-catdir.c --- Demonstrate opendir(), readdir(), closedir(). */
     2
     3  #include <stdio.h>       /* for printf() etc. */
     4  #include <errno.h>       /* for errno */
     5  #include <sys/types.h>   /* for system types */
     6  #include <dirent.h>      /* for directory functions */
     7  #include <string.h>      /* for strerror() */
     8  char *myname;
     9  int process(char *dir);
    10
    11  /* main --- loop over directory arguments */
    12
    13  int main(int argc, char **argv)
    14  {
    15      int i;
    16      int errs = 0;
    17
    18      myname = argv[0];
    19
    20      if (argc == 1)
    21          errs = process(".");   /* default to current directory */
    22      else
    23          for (i = 1; i < argc; i++)
    24              errs += process(argv[i]);
    25
    26      return (errs != 0);
    27  }
This program is quite similar to ch04-cat.c (see Section 4.2, "Presenting a Basic
Program Structure," page 84); the main() function is almost identical. The primary
difference is that it defaults to using the current directory if there are no arguments
(lines 20-21).
    29  /*
    30   * process --- do something with the directory, in this case,
    31   *             print inode/name pairs on standard output.
    32   *             Returns 0 if all ok, 1 otherwise.
    33   */
    34
    35  int
    36  process(char *dir)
    37  {
    38      DIR *dp;
    39      struct dirent *ent;
    40
    41      if ((dp = opendir(dir)) == NULL) {
    42          fprintf(stderr, "%s: %s: cannot open for reading: %s\n",
    43                  myname, dir, strerror(errno));
    44          return 1;
    45      }
    46
    47      errno = 0;
    48      while ((ent = readdir(dp)) != NULL)
    49          printf("%8ld %s\n", ent->d_ino, ent->d_name);
    50
    51      if (errno != 0) {
    52          fprintf(stderr, "%s: %s: reading directory entries: %s\n",
    53                  myname, dir, strerror(errno));
    54          return 1;
    55      }
    56
    57      if (closedir(dp) != 0) {
    58          fprintf(stderr, "%s: %s: closedir: %s\n",
    59                  myname, dir, strerror(errno));
    60          return 1;
    61      }
    62
    63      return 0;
    64  }
The process() function does all the work, and the majority of it is error-checking
code. The heart of the function is lines 48 and 49:
    while ((ent = readdir(dp)) != NULL)
        printf("%8ld %s\n", ent->d_ino, ent->d_name);
This loop reads directory entries, one at a time, until readdir() returns NULL. The
loop body prints the inode number and filename of each entry. Here's what happens
when the program is run:
    $ ch05-catdir                    Default to current directory
      639063 .
      639062 ..
      639064 proposal.txt
      639012 lightsabers.url
      688470 code
      638976 progex.texi
      639305 texinfo.tex
      639007 15-processes.texi
      639011 00-preface.texi
      639020 18-tty.texi
      638980 Makefile
      639239 19-i18n.texi
The output is not sorted in any way; it represents the linear contents of the directory.
(We describe how to sort the directory contents in Section 6.2, "Sorting and Searching
Functions," page 181 .)
5.3.1.1 Portability Considerations

There are several portability considerations. First, you should not assume that the
first two entries returned by readdir() will always be '.' and '..'. Many filesystems
use directory organizations that are different from that of the original Unix design, and
'.' and '..' could be in the middle of the directory or possibly not even present.4
Second, the POSIX standard is silent about possible values for d_ino. It does say
that the returned structures represent directory entries for files; this implies that empty
slots are not returned by readdir(), and thus the GNU/Linux readdir()
implementation doesn't bother returning entries when 'd_ino == 0'; it continues to the next
valid directory entry.
So, on GNU/Linux and Unix systems at least, it is unlikely that d_ino will ever be
zero. However, it is best to avoid using this field entirely if you can.
Finally, some systems use d_fileno instead of d_ino inside the struct dirent.
Be aware of this if you have to port directory-reading code to such systems.
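These considerations suggest identifying '.' and '..' by name, never by position, and leaving d_ino alone. As a minimal sketch, here is the V7 rmdir emptiness check from earlier redone with the portable interface. The dir_is_empty() name and its three-way return value are our own choices for illustration.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if dir holds nothing besides '.' and '..', 0 if it holds
   anything else, or -1 on error with errno set.
   (dir_is_empty is a hypothetical helper for illustration.) */
int dir_is_empty(const char *dir)
{
    DIR *dp;
    struct dirent *ent;
    int empty = 1;
    int saved;

    if ((dp = opendir(dir)) == NULL)
        return -1;

    errno = 0;
    while ((ent = readdir(dp)) != NULL) {
        /* Compare names; don't rely on entry order or on d_ino. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        empty = 0;
        break;
    }
    saved = errno;              /* examined only if the loop ran to the end */
    closedir(dp);

    if (empty && saved != 0) {  /* readdir() failed rather than finished */
        errno = saved;
        return -1;
    }
    return empty;
}
```

Because the loop skips the two special names wherever they occur, the function gives the same answer on filesystems that store them last or omit them altogether.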
4 GNU/Linux systems are capable of mounting filesystems from many non-Unix operating systems. Many
commercial Unix systems can also mount MS-DOS filesystems. Assumptions about Unix filesystems don't apply in
such cases.
Indirect System Calls
"Don't try this at home, kids!"
-Mr. Wizard-
Many system calls, such as open(), read(), and write(), are meant to be called
directly from user-level application code: in other words, from code that you, as a
GNU/Linux developer, would write.
However, other system calls exist only to make it possible to implement higher-level,
standard library functions and should not be called directly. The GNU/Linux
getdents() system call is one such; it reads multiple directory entries into a buffer
provided by the caller, in this case the code that implements readdir(). The
readdir() code then returns valid directory entries from the buffer, one at a time,
refilling the buffer as needed.
These for-library-use-only system calls can be distinguished from for-user-use system
calls by their appearance in the manpage. For example, from getdents(2):
    NAME
        getdents - get directory entries
    SYNOPSIS
        #include <unistd.h>
        #include <linux/types.h>
        #include <linux/dirent.h>
        #include <linux/unistd.h>

        _syscall3(int, getdents, uint, fd, struct dirent *, dirp, uint, count);

        int getdents(unsigned int fd, struct dirent *dirp, unsigned int count);
Any system call that uses a _syscallX() macro should not be called by application
code. (More information on these calls can be found in the intro(2) manpage; you should
read that manpage if you haven't already.)
In the case of getdents(), many other Unix systems have a similar system call;
sometimes with the same name, sometimes with a different name. Thus, trying to use
these calls would only lead to a massive portability mess anyway; you're much better off
in all cases using readdir(), whose interface is well defined, standard, and portable.
5.3.1.2 Linux and BSD Directory Entries

Although we just said that you should only use the d_ino and d_name members of
the struct dirent, it's worth knowing about the d_type member in the BSD and
Linux struct dirent. This is an unsigned char value that stores the type of the
file named by the directory entry:
struct dirent {
    ino_t d_ino;                /* As before */
    char d_name[...];           /* As before */
    unsigned char d_type;       /* Linux and modern BSD */
};
d_type can have any of the values described in Table 5.1.
TABLE 5.1
Values for d_type

Name           Meaning
DT_BLK         Block device file.
DT_CHR         Character device file.
DT_DIR         Directory.
DT_FIFO        FIFO or named pipe.
DT_LNK         Symbolic link.
DT_REG         Regular file.
DT_SOCK        Socket.
DT_UNKNOWN     Unknown file type.
DT_WHT         Whiteout entry (BSD systems only).
Knowing the file's type just by reading the directory entry is very handy; it can save
a possibly expensive stat() system call. (The stat() call is described shortly, in
Section 5.4.2, "Retrieving File Information," page 141.)
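The following sketch shows one way to take advantage of d_type while still coping
with filesystems that report DT_UNKNOWN. It counts the subdirectories of a directory,
calling lstat() only when the directory entry doesn't supply the type. The function
name count_subdirs() and the fixed 4096-byte path buffer are our own assumptions for
illustration; the S_ISDIR() macro it uses is described in Section 5.4.4.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for d_type and DT_xxx in glibc's <dirent.h> */
#endif
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* count_subdirs --- hypothetical helper: count the subdirectories of dirname,
   using d_type when it's available and lstat() when it isn't */
long count_subdirs(const char *dirname)
{
    DIR *dp;
    struct dirent *ent;
    struct stat sbuf;
    char path[4096];            /* arbitrary size chosen for this sketch */
    long count = 0;

    if ((dp = opendir(dirname)) == NULL)
        return -1;

    while ((ent = readdir(dp)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        if (ent->d_type == DT_DIR) {
            count++;            /* type known from the directory entry alone */
        } else if (ent->d_type == DT_UNKNOWN) {
            /* the filesystem didn't say; ask about the file itself */
            int n = snprintf(path, sizeof path, "%s/%s", dirname, ent->d_name);

            if (n < 0 || (size_t) n >= sizeof path)
                continue;       /* name too long for this sketch; skip it */
            if (lstat(path, & sbuf) == 0 && S_ISDIR(sbuf.st_mode))
                count++;
        }
    }

    closedir(dp);
    return count;
}
```

The fallback matters: code that treats DT_UNKNOWN as "not a directory" silently
misses subdirectories on filesystems that don't fill in d_type.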
5.3.2 BSD Directory Positioning Functions

Occasionally, it's useful to mark the current position in a directory in order to be
able to return to it later. For example, you might be writing code that traverses a
directory tree and wish to recursively enter each subdirectory as you come across it. (How
to distinguish files from directories is discussed in the next section.) For this reason,
the original BSD interface included two additional routines:

#include <dirent.h>                                          XSI

/* Caveat Emptor: POSIX XSI uses long, not off_t, for both functions */
off_t telldir(DIR *dir);                     Return current position
void seekdir(DIR *dir, off_t offset);        Move to given position
These routines are similar to the ftell() and fseek() functions in <stdio.h>.
They return the current position in a directory and set the current position to a
previously retrieved value, respectively.
These routines are included in the XSI part of the POSIX standard, since they make
sense only for directories that are implemented with linear storage of directory entries.
Besides the assumptions made about the underlying directory structure, these routines
are riskier to use than the simple directory-reading routines. This is because the contents
of a directory might be changing dynamically: As files are added to or removed from a
directory, the operating system adjusts the contents of the directory. Since directory
entries are of variable length, it may be that the absolute offset saved at an earlier time
no longer represents the start of a directory entry! Thus, we don't recommend that you
use these functions unless you have to.
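If you do have to use them, the pattern looks something like the following sketch. It
marks the starting position, reads the whole directory, seeks back, and checks that
the same first entry comes back. The function name reread_check() and the 256-byte
name buffer are assumptions made for illustration, and the check is meaningful only
if the directory isn't modified in the meantime.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* telldir() and seekdir() are XSI extensions */
#endif
#include <dirent.h>
#include <string.h>

/* reread_check --- hypothetical helper: mark a position with telldir(), read
   on, return with seekdir(), and read again. Returns 1 if the same name comes
   back, 0 if not, and -1 on error or if the directory has no entries. */
int reread_check(const char *dirname)
{
    DIR *dp;
    struct dirent *ent;
    long pos;                   /* long per POSIX XSI; see the caveat above */
    char first[256];            /* assumes names of at most 255 bytes, as on Linux */
    int result;

    if ((dp = opendir(dirname)) == NULL)
        return -1;

    pos = telldir(dp);          /* mark the current position */
    if (pos == -1 || (ent = readdir(dp)) == NULL) {
        closedir(dp);
        return -1;
    }
    strncpy(first, ent->d_name, sizeof first - 1);
    first[sizeof first - 1] = '\0';

    while (readdir(dp) != NULL) /* run off to the end of the directory */
        continue;

    seekdir(dp, pos);           /* return to the marked position */
    ent = readdir(dp);
    result = (ent != NULL && strcmp(first, ent->d_name) == 0);

    closedir(dp);
    return result;
}
```

Note that the value from telldir() is used only with the same open DIR * that produced
it; a saved position is not meaningful after closedir().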
5.4 Obtaining Information about Files

Reading a directory to retrieve filenames is only half the battle. Once you have a
filename, you need to know how to retrieve the other information associated with a
file, such as the file's type, its permissions, owner, and so on.
5.4.1 Linux File Types

Linux (and Unix) supports the following different kinds of file types:
Regular files
    As the name implies; used for data, executable programs, and anything else you
    might like. In an 'ls -l' listing, they show up with a '-' in the first character of
    the permissions (mode) field.
Directories
    Special files for associating file names with inodes. In an 'ls -l' listing, they show
    up with a d in the first character of the permissions field.
Symbolic links
    As described earlier in the chapter. In an 'ls -l' listing, they show up with an l
    (letter "ell," not digit 1) in the first character of the permissions field.
Devices
    Files representing both physical hardware devices and software pseudo-devices.
    There are two kinds:
    Block devices
        Devices on which I/O happens in chunks of some fixed physical record size,
        such as disk drives and tape drives. Access to such devices goes through the
        kernel's buffer cache. In an 'ls -l' listing, they show up with a b in the first
        character of the permissions field.
    Character devices
        Also known as raw devices. Originally, character devices were those on which
        I/O happened a few bytes at a time, such as terminals. However, the character
        device is also used for direct I/O to block devices such as tapes and disks,
        bypassing the buffer cache.5 In an 'ls -l' listing, they show up with a c in
        the first character of the permissions field.
Named pipes
    Also known as FIFOs ("first-in first-out") files. These special files act like pipes;
    data written into them by one program can be read by another; no data go to or
    from the disk. FIFOs are created with the mkfifo command; they are discussed
    in Section 9.3.2, "FIFOs," page 319. In an 'ls -l' listing, they show up with a
    p in the first character of the permissions field.
Sockets
    Similar in purpose to named pipes,6 they are managed with the socket interprocess
    communication (IPC) system calls and are not otherwise dealt with in this book.
    In an 'ls -l' listing, they show up with an s in the first character of the
    permissions field.
5 Linux uses the block device for disks exclusively. Other systems use both.
6 Named pipes and sockets were developed independently by the System V and BSD Unix groups, respectively.
As Unix systems reconverged, both kinds of files became universally available.
5.4.2 Retrieving File Information

Three system calls return information about files:
#include <sys/types.h>                                       POSIX
#include <sys/stat.h>

int stat(const char *file_name, struct stat *buf);
int fstat(int filedes, struct stat *buf);
int lstat(const char *file_name, struct stat *buf);
The stat() function accepts a pathname and returns information about the given
file. It follows symbolic links; that is, when applied to a symbolic link, stat() returns
information about the pointed-to file, not about the link itself. For those times when
you want to know if a file is a symbolic link, use the lstat() function instead; it does
not follow symbolic links.
The fstat() function retrieves information about an already open file. It is
particularly useful for file descriptors 0, 1, and 2 (standard input, output, and error), which
are already open when a process starts up. However, it can be applied to any open file.
(An open file descriptor will never relate to a symbolic link; make sure you
understand why.)
The value passed in as the second parameter should be the address of a struct
stat, declared in <sys/stat.h>. As with the struct dirent, the struct stat
contains at least the following members:
struct stat {
    dev_t     st_dev;       /* device */
    ino_t     st_ino;       /* inode */
    mode_t    st_mode;      /* type and protection */
    nlink_t   st_nlink;     /* number of hard links */
    uid_t     st_uid;       /* user ID of owner */
    gid_t     st_gid;       /* group ID of owner */
    dev_t     st_rdev;      /* device type (block or character device) */
    off_t     st_size;      /* total size, in bytes */
    blksize_t st_blksize;   /* blocksize for filesystem I/O */
    blkcnt_t  st_blocks;    /* number of blocks allocated */
    time_t    st_atime;     /* time of last access */
    time_t    st_mtime;     /* time of last modification */
    time_t    st_ctime;     /* time of last inode change */
};
(The layout may be different on different architectures.) This structure uses a number
of typedef'd types. Although they are all (typically) integer types, the use of specially
defined types allows them to have different sizes on different systems. This keeps
user-level code that uses them portable. Here is a fuller description of each field.
st_dev
    The device for a mounted filesystem. Each mounted filesystem has a unique value
    for st_dev.
st_ino
    The file's inode number within the filesystem. The (st_dev, st_ino) pair
    uniquely identifies the file.
st_mode
    The file's type and its permissions encoded together in one field. We will shortly
    see how to extract this information.
st_nlink
    The number of hard links to the file (the link count). This can be zero if the file
    was unlinked after being opened.
st_uid
    The file's UID (owner number).
st_gid
    The file's GID (group number).
st_rdev
    The device type if the file is a block or character device. st_rdev encodes
    information about the device. We will shortly see how to extract this information. This
    field has no meaning if the file is not a block or character device.
st_size
    The logical size of the file. As mentioned in Section 4.5, "Random Access: Moving
    Around within a File," page 102, a file may have holes in it, in which case the size
    may not reflect the true amount of storage space that it occupies.
st_blksize
    The "block size" of the file. This represents the preferred size of a data block for
    I/O to or from the file. This is almost always larger than a physical disk sector.
    Older Unix systems don't have this field (or st_blocks) in the struct stat.
    For the Linux ext2 and ext3 filesystems, this value is 4096.
st_blocks
    The number of "blocks" used by the file. On Linux, this is in units of 512-byte
    blocks. On other systems, the size of a block may be different; check your local
    stat(2) manpage. (This number comes from the DEV_BSIZE constant in
    <sys/param.h>. This constant isn't standardized, but it is fairly widely used on
    Unix systems.)
    The number of blocks may be more than 'st_size / 512'; besides the data
    blocks, a filesystem may use additional blocks to store the locations of the data
    blocks. This is particularly necessary for large files.
st_atime
    The file's access time; that is, the last time the file's data were read.
st_mtime
    The file's modification time; that is, the last time the file's data were written or
    truncated.
st_ctime
    The file's inode change time. This indicates the last time when the file's metadata
    changed, such as the permissions or the owner.
NOTE The st_ctime field is not the file's "creation time"! There is no such
thing in a Linux or Unix system. Some early documentation referred to the
st_ctime field as the creation time. This was a misguided effort to simplify
the presentation of the file metadata.
The time_t type used for the st_atime, st_mtime, and st_ctime fields represents
dates and times. These time-related values are sometimes termed timestamps. Discussion
of how to use a time_t value is delayed until Section 6.1, "Times and Dates," page 166.
Similarly, the uid_t and gid_t types represent user and group ID numbers, which
are discussed in Section 6.3, "User and Group Names," page 195. Most of the other
types are not of general interest.
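The relationship between st_size and st_blocks described above can be put to work.
The following sketch compares a file's logical size with the storage allocated to it,
as a rough test for holes. The helper names and the ST_BLOCK_UNIT constant are our own;
the 512-byte unit is the Linux value given above and is only an assumption elsewhere.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Unit for st_blocks: 512 bytes on Linux. This is an assumption on
   other systems; check the local stat(2) manpage. */
#define ST_BLOCK_UNIT 512

/* allocated_bytes --- hypothetical helper: storage actually allocated */
long long allocated_bytes(const struct stat *sb)
{
    return (long long) sb->st_blocks * ST_BLOCK_UNIT;
}

/* probably_has_holes --- hypothetical helper: true if less storage is
   allocated than the logical size calls for */
int probably_has_holes(const struct stat *sb)
{
    return allocated_bytes(sb) < (long long) sb->st_size;
}
```

Typical use is to call stat() on a file and pass the address of the filled-in struct
stat to these functions. The test is only a heuristic: as noted above, the block count
can also exceed 'st_size / 512', and a filesystem is free to account for storage in
its own way.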
5.4.3 Linux Only: Specifying Higher-Precision File Times

The 2.6 and later Linux kernel supplies three additional fields in the struct stat.
These provide nanosecond resolution on the file times:
The nanoseconds component of the file's access time.
The nanoseconds component of the file's modification time.
The nanoseconds component of the file's inode change time.
Some other systems also provide such high-resolution time fields, but the member
names for the struct stat are not standardized, making it difficult to write portable
code that uses these times. (See Section 14.3.2, "Microsecond File Times: utimes(),"
page 545, for a related advanced system call.)
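For illustration only, here is a sketch that retrieves a file's modification time with
nanosecond resolution. It does not use the kernel-specific names discussed above;
instead it assumes a system that provides the st_mtim member (a struct timespec),
which was standardized later, in the 2008 revision of POSIX. Older systems may not
have it. The function name get_mtime_ns() is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* ask for the POSIX.1-2008 st_mtim member */
#endif
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* get_mtime_ns --- hypothetical helper: fetch a file's modification time as
   whole seconds plus nanoseconds; returns 0 on success, -1 on error */
int get_mtime_ns(const char *path, time_t *sec, long *nsec)
{
    struct stat sbuf;

    if (stat(path, & sbuf) < 0)
        return -1;

    *sec = sbuf.st_mtim.tv_sec;     /* whole seconds */
    *nsec = sbuf.st_mtim.tv_nsec;   /* 0 through 999999999 */
    return 0;
}
```

Whether the nanoseconds value is ever nonzero depends on the filesystem; some store
file times with only one-second resolution.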
5.4.4 Determining File Type

Recall that the st_mode field encodes both the file's type and its permissions.
<sys/stat.h> defines a number of macros that determine the file's type. In particular,
these macros return true or false when applied to the st_mode field. The macros
correspond to each of the file types described earlier. Assume that the following code has
been executed:
struct stat stbuf;
char filename[PATH_MAX];        /* PATH_MAX is from <limits.h> */

... fill in filename with a file name ...
if (stat(filename, & stbuf) < 0) {
    /* handle error */
}
Once stbuf has been filled in by the system, the following macros can be called,
being passed stbuf.st_mode as the argument:
S_ISREG(stbuf.st_mode)
    Returns true if filename is a regular file.
S_ISDIR(stbuf.st_mode)
    Returns true if filename is a directory.
S_ISCHR(stbuf.st_mode)
    Returns true if filename is a character device. Devices are shortly discussed in
    more detail.
S_ISBLK(stbuf.st_mode)
    Returns true if filename is a block device.
S_ISFIFO(stbuf.st_mode)
    Returns true if filename is a FIFO.
S_ISLNK(stbuf.st_mode)
    Returns true if filename is a symbolic link. (This can never return true if stat()
    or fstat() were used instead of lstat().)
S_ISSOCK(stbuf.st_mode)
    Returns true if filename is a socket.
NOTE It happens that on GNU/Linux, these macros return 1 for true and 0
for false. However, on other systems, it's possible that they return an arbitrary
nonzero value for true, instead of 1. (POSIX specifies only nonzero vs. zero.)
Thus, you should always use these macros as standalone tests instead of testing
the return value:
if (S_ISREG(stbuf.st_mode)) ...              Correct
if (S_ISREG(stbuf.st_mode) == 1) ...         Incorrect
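To tie the macros back to the file types listed in Section 5.4.1, here is a small
sketch that maps an st_mode value to the character that 'ls -l' shows in the first
position of the permissions field. The function name file_type_char() is our own
invention, and the '?' returned for an unrecognized type is an arbitrary choice. Each
macro is used purely as a truth test, never compared against 1.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for S_ISLNK() and S_ISSOCK() under glibc */
#endif
#include <sys/types.h>
#include <sys/stat.h>

/* file_type_char --- hypothetical helper: return the 'ls -l' type
   character for the file type encoded in mode */
char file_type_char(mode_t mode)
{
    if (S_ISREG(mode))
        return '-';
    if (S_ISDIR(mode))
        return 'd';
    if (S_ISLNK(mode))
        return 'l';
    if (S_ISBLK(mode))
        return 'b';
    if (S_ISCHR(mode))
        return 'c';
    if (S_ISFIFO(mode))
        return 'p';
    if (S_ISSOCK(mode))
        return 's';
    return '?';                 /* not one of the types described here */
}
```

Remember that the 'l' case can be reached only if the mode came from lstat().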
Along with the macros, <sys/stat.h> provides two sets of bit masks. One set is for
testing permission, and the other set is for testing the type of a file. We saw the
permission masks in Section 4.6, "Creating Files," page 106, when we discussed the mode_t
type and values for open() and creat(). The bitmasks, their values for GNU/Linux,
and their meanings are described in Table 5.2.
Several of these masks serve to isolate the different sets of bits encoded in the
st_mode field:
• S_IFMT represents bits 12-15, which are where the different types of files are
  encoded.
• S_IRWXU represents bits 6-8, which are the user's permission (read, write, execute
  for User).
• S_IRWXG represents bits 3-5, which are the group's permission (read, write, execute
  for Group).
• S_IRWXO represents bits 0-2, which are the "other" permission (read, write, execute
  for Other).
The permission and file type bits are depicted graphically in Figure 5.3.
TABLE 5.2
POSIX file-type and permission bitmasks in <sys/stat.h>

Mask        Value      Meaning
S_IFMT      0170000    Bitmask for the file type bitfields.
S_IFSOCK    0140000    Socket.
S_IFLNK     0120000    Symbolic link.
S_IFREG     0100000    Regular file.
S_IFBLK     0060000    Block device.
S_IFDIR     0040000    Directory.
S_IFCHR     0020000    Character device.
S_IFIFO     0010000    FIFO.
S_ISUID     0004000    Setuid bit.
S_ISGID     0002000    Setgid bit.
S_ISVTX     0001000    Sticky bit.
S_IRWXU     0000700    Mask for owner permissions.
S_IRUSR     0000400    Owner read permission.
S_IWUSR     0000200    Owner write permission.
S_IXUSR     0000100    Owner execute permission.
S_IRWXG     0000070    Mask for group permissions.
S_IRGRP     0000040    Group read permission.
S_IWGRP     0000020    Group write permission.
S_IXGRP     0000010    Group execute permission.
S_IRWXO     0000007    Mask for permissions for others.
S_IROTH     0000004    Other read permission.
S_IWOTH     0000002    Other write permission.
S_IXOTH     0000001    Other execute permission.
The file-type masks are standardized primarily for compatibility with older code;
they should not be used directly, because such code is less readable than the
corresponding macros. It happens that the macros are implemented, logically enough, with the
masks, but that's irrelevant for user-level code.
[Figure: bit positions 15 through 0 of the st_mode field; the labeled fields
include the file type, group r/w/x, and other r/w/x]
FIGURE 5.3
Permission and file-type bits
The POSIX standard explicitly states that no new bitmasks will be standardized in
the future and that tests for any additional kinds of file types that may be added will
be available only as S_ISxxx() macros.
5.4.4.1 Device Information

Because it is meant to apply to non-Unix systems as well as Unix systems, the POSIX
standard doesn't define the meaning for the dev_t type. However, it's worthwhile to
know what's in a dev_t.
When S_ISBLK(sbuf.st_mode) or S_ISCHR(sbuf.st_mode) is true, then the
device information is found in the sbuf.st_rdev field. Otherwise, this field does not
contain any useful information.
Traditionally, Unix device files encode a major device number and a minor device
number within the dev_t value. The major number distinguishes the device type, such
as "disk drive" or "tape drive." Major numbers also distinguish among different types
of devices, such as SCSI disk vs. IDE disk. The minor number distinguishes the unit
of that type, for example, the first disk or the second one. You can see these values with
'ls -l':
$ ls -l /dev/hda /dev/hda?                   Show numbers for first hard disk
brw-rw----    1 root     disk       3,   0 Aug 31  2002 /dev/hda
brw-rw----    1 root     disk       3,   1 Aug 31  2002 /dev/hda1
brw-rw----    1 root     disk       3,   5 Aug 31  2002 /dev/hda5
brw-rw----    1 root     disk       3,   6 Aug 31  2002 /dev/hda6
brw-rw----    1 root     disk       3,   9 Aug 31  2002 /dev/hda9
$ ls -l /dev/null                            Show info for /dev/null, too
crw-rw-rw-    1 root     root       1,   3 Aug 31  2002 /dev/null
Instead of the file size, ls displays the major and minor numbers. In the case of the
hard disk, /dev/hda represents the whole drive. /dev/hda1, /dev/hda2, and so on,
represent partitions within the drive. They all share the same major device number (3),
but have different minor device numbers.
Note that the disk devices are block devices, whereas /dev/null is a character device.
Block devices and character devices are separate entities; even if a character device and
a block device share the same major device number, they are not necessarily related.
The major and minor device numbers can be extracted from a dev_t value with the
major() and minor() functions defined in <sys/sysmacros.h>:
#include <sys/types.h>                                       Common
#include <sys/sysmacros.h>

int major(dev_t dev);                        Major device number
int minor(dev_t dev);                        Minor device number
dev_t makedev(int major, int minor);         Create a dev_t value
(Some systems implement them as macros.)
The makedev() function goes the other way; it takes separate major and minor values
and encodes them into a dev_t value. Its use is otherwise beyond the scope of this
book; the morbidly curious should see mknod(2).
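As a quick sketch of how the three fit together, the following function renders a
dev_t value in the "major, minor" form that ls uses. The function name format_dev()
is our own. Feeding it the result of makedev(3, 5), the numbers shown earlier for
/dev/hda5, demonstrates the round trip.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/* format_dev --- hypothetical helper: render dev as "major, minor" in buf;
   returns what snprintf() returns */
int format_dev(dev_t dev, char *buf, size_t len)
{
    return snprintf(buf, len, "%u, %u",
                    (unsigned int) major(dev), (unsigned int) minor(dev));
}
```

The casts are there because the exact integer types returned by major() and minor()
vary from system to system.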
The following program, ch05-devnum.c, shows how to use the stat() system call,
the file-type test macros, and finally, the major() and minor() macros.
/* ch05-devnum.c --- Demonstrate stat(), major(), minor(). */

#include <stdio.h>
#include <stdlib.h>     /* for exit() */
#include <string.h>     /* for strerror() */
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(int argc, char **argv)
{
    struct stat sbuf;
    char *devtype;

    if (argc != 2) {
        fprintf(stderr, "usage: %s path\n", argv[0]);
        exit(1);
    }

    if (stat(argv[1], & sbuf) < 0) {
        fprintf(stderr, "%s: stat: %s\n", argv[1], strerror(errno));
        exit(1);
    }

    if (S_ISCHR(sbuf.st_mode))
        devtype = "char";
    else if (S_ISBLK(sbuf.st_mode))
        devtype = "block";
    else {
        fprintf(stderr, "%s is not a block or character device\n", argv[1]);
        exit(1);
    }

    printf("%s: major: %d, minor: %d\n", devtype,
            major(sbuf.st_rdev), minor(sbuf.st_rdev));

    exit(0);
}
Here is what happens when the program is run:
$ ch05-devnum /tmp                           Try a nondevice
/tmp is not a block or character device
$ ch05-devnum /dev/null                      Character device
char: major: 1, minor: 3
$ ch05-devnum /dev/hda2                      Block device
block: major: 3, minor: 2
Fortunately, the output agrees with that of ls, giving us confidence7 that we have
indeed written correct code.
Reproducing the output of ls is all fine and good, but is it really useful? The answer
is yes. Any application that works with file hierarchies must be able to distinguish among
all the different types of files. Consider an archiver such as tar or cpio. It would be
disastrous if such a program treated a disk device file as a regular file, attempting to
read it and store its contents in an archive! Or consider find, which can perform
7 The technical term is a warm fuzzy.

arbitrary actions based on the type and other attributes of files it encounters. (find is
a complicated program; see find(1) if you're not familiar with it.) Or even something
as simple as a disk space accounting package has to distinguish regular files from
everything else.
5.4.4.2 The V7 cat Revisited

In Section 4.4.4, "Example: Unix cat," page 99, we promised to return to the V7
cat program to review its use of the stat() system call. The first group of lines that
used it were these:
31      fstat(fileno(stdout), &statb);
32      statb.st_mode &= S_IFMT;
34          dev = statb.st_dev;
35          ino = statb.st_ino;
36      }
This code should now make sense. Line 31 calls fstat() on the standard output
to fill in the statb structure. Line 32 throws away all the information in
statb.st_mode except the file type, by ANDing the mode with the S_IFMT mask.
Line 33 checks that the file being used for standard output is not a device file. In that
case, the program saves the device and inode numbers in dev and ino. These values
are then checked for each input file in lines 50-56:
50          fstat(fileno(fi), &statb);
51          if (statb.st_dev==dev && statb.st_ino==ino) {
52              fprintf(stderr, "cat: input %s is output\n",
53                 fflg?"-": *argv);
54              fclose(fi);
55              continue;
56          }
If an input file's st_dev and st_ino values match those of the output file, then cat
complains and continues to the next file named on the command line.
The check is done unconditionally, even though dev and ino are set only if
the output is not a device file. This works out OK, because of how those variables
are declared:
17      int dev, ino = -1;
Since ino is initialized to -1, no valid inode number will ever be equal to it.8 That
dev is not so initialized is sloppy, but not a problem, since the test on line 51 requires
that both the device and inode be equal. (A good compiler will complain that dev is
used without being initialized: 'gcc -Wall' does.)
Note also that neither call to fstat() is checked for errors. This too is sloppy,
although less so; it is unlikely that fstat() will fail on a valid file descriptor.
The test for input file equals output file is done only for nondevice files. This makes
it possible to use cat to copy input from device files to themselves, such as
with terminals:
$ tty                                        Print current terminal device name
/dev/pts/3
$ cat /dev/pts/3 > /dev/pts/3                Copy keyboard input to screen
this is a line of text                       Type in a line
this is a line of text                       cat repeats it
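For comparison, here is a sketch of how the same "input is output" test might be
written today, with both fstat() calls checked for errors and without relying on a
sentinel inode value. This is not the V7 code; the function names same_open_file()
and input_is_output() are hypothetical, and the sketch leaves out V7 cat's exemption
for device files.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* same_open_file --- hypothetical helper: do two open file descriptors refer
   to the same file? Returns 1 if so, 0 if not, -1 if fstat() fails */
int same_open_file(int fd1, int fd2)
{
    struct stat sb1, sb2;

    if (fstat(fd1, & sb1) < 0 || fstat(fd2, & sb2) < 0)
        return -1;

    /* both halves of the (st_dev, st_ino) pair must match */
    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}

/* input_is_output --- hypothetical helper: open path for reading and report
   whether it is the same file as out_fd; 1 if yes, 0 if no, -1 on error */
int input_is_output(const char *path, int out_fd)
{
    int in_fd, result;

    if ((in_fd = open(path, O_RDONLY)) < 0)
        return -1;

    result = same_open_file(in_fd, out_fd);
    close(in_fd);
    return result;
}
```

Returning a distinct value for failure lets the caller decide whether an fstat() error
should be treated as "different files" or reported to the user.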
5.4.5 Working with Symbolic Links

In general, symbolic links act like hard links; file operations such as open() and
stat() apply to the pointed-to file instead of to the symbolic link itself. However,
there are times when it really is necessary to work with the symbolic link instead of
with the file the link points to.
For this reason, the lstat() system call exists. It behaves exactly like stat(), but
if the file being checked happens to be a symbolic link, then the information returned
applies to the symbolic link, and not to the pointed-to file. Specifically:
• S_ISLNK(sbuf.st_mode) will be true.
• sbuf.st_size is the number of bytes used by the name of the pointed-to file.
We already saw that the symlink() system call creates a symbolic link. But given
an existing symbolic link, how can we retrieve the name of the file it points to? (ls
obviously can, so we ought to be able to also.)
Opening the link with open() in order to read it with read() won't work; open()
follows the link to the pointed-to file. Symbolic links thus necessitate an additional
system call, named readlink():
8 This statement was true for V7; there are no such guarantees on modern systems.
#include <unistd.h>                                          POSIX

int readlink(const char *path, char *buf, size_t bufsiz);
readlink() places the contents of the symbolic link named by path into the buffer
pointed to by buf. No more than bufsiz characters are copied. The return value is
the number of characters placed in buf or -1 if an error occurred. readlink() does
not supply the trailing zero byte.
Note that if the buffer passed in to readlink() is too small, you will lose
information; the full name of the pointed-to file won't be available. To properly use
readlink(), your code should do the following:
1. Use lstat() to verify that you have a symbolic link.
2. Make sure that your buffer to hold the link contents is at least 'sbuf.st_size
   + 1' bytes big; the '+ 1' is for the trailing zero byte to turn the buffer into a
   usable C string.
3. Call readlink(). It doesn't hurt to verify that the returned value is the same
   as sbuf.st_size.
4. Assign '\0' to the byte after the contents of the link, to make it into a C string.
Code to do all that would look something like this:

/* Error checking omitted for brevity */
int count;
char linkfile[PATH_MAX], realfile[PATH_MAX];    /* PATH_MAX is in <limits.h> */
struct stat sbuf;

... fill in linkfile with path to symbolic link of interest ...
lstat(linkfile, & sbuf);                        Get stat information
if (! S_ISLNK(sbuf.st_mode))                    Check that it's a symlink
    /* not a symbolic link, handle it */
if (sbuf.st_size + 1 > PATH_MAX)                Check buffer size
    /* handle buffer size problems */
count = readlink(linkfile, realfile, PATH_MAX); Read the link
if (count != sbuf.st_size)
    /* something weird going on, handle it */
realfile[count] = '\0';                         Make it into a C string

This example uses fixed-size buffers for simplicity of presentation. Real code would
use malloc() to allocate a buffer of the correct size since the fixed-size arrays might
be too small. The file lib/xreadlink.c in the GNU Coreutils does just this. It reads
the contents of a symbolic link into storage allocated by malloc(). We show here just
the function; most of the file is boilerplate definitions. Line numbers are relative to the
start of the file:
55  /* Call readlink to get the symbolic link value of FILENAME.
56     Return a pointer to that NUL-terminated string in malloc'd storage.
57     If readlink fails, return NULL (caller may use errno to diagnose).
58     If realloc fails, or if the link value is longer than SIZE_MAX :-),
59     give a diagnostic and exit.  */
60
61  char *
62  xreadlink (char const *filename)
63  {
64    /* The initial buffer size for the link value.  A power of 2
65       detects arithmetic overflow earlier, but is not required.  */
66    size_t buf_size = 128;
67
68    while (1)
69      {
70        char *buffer = xmalloc (buf_size);
71        ssize_t link_length = readlink (filename, buffer, buf_size);
72
73        if (link_length < 0)
74          {
75            int saved_errno = errno;
76            free (buffer);
77            errno = saved_errno;
78            return NULL;
79          }
80
81        if ((size_t) link_length < buf_size)
82          {
83            buffer[link_length] = 0;
84            return buffer;
85          }
86
87        free (buffer);
88        buf_size *= 2;
89        if (SSIZE_MAX < buf_size || (SIZE_MAX / 2 < SSIZE_MAX && buf_size == 0))
90          xalloc_die ();
91      }
92  }
The function body consists of an infinite loop (lines 68-91), broken at line 84 which
returns the allocated buffer. The loop starts by allocating an initial buffer (line 70) and
reading the link (line 71). Lines 73-79 handle the error case, saving and restoring errno
so that it can be used correctly by the calling code.
Lines 81-85 handle the "success" case, in which the link's contents' length is smaller
than the buffer size. In this case, the terminating zero is supplied (line 83) and then the
buffer returned (line 84), breaking the infinite loop. This ensures that the entire link
contents have been placed into the buffer, since readlink() has no way to indicate
"insufficient space in buffer."
Lines 87-88 free the buffer and double the buffer size for the next try at the top of
the loop. Lines 89-90 handle the case in which the link's size is too big: buf_size is
greater than SSIZE_MAX, or SSIZE_MAX is larger than the value that can be represented
in a signed integer of the same size as used to hold SIZE_MAX and buf_size has wrapped
around to zero. (These are unlikely conditions, but strange things do happen.) If either
condition is true, the program dies with an error message. Otherwise, the function
continues around to the top of the loop to make another try at allocating a buffer and
reading the link.
Some further explanation: The 'SIZE_MAX / 2 < SSIZE_MAX' condition is true
only on systems on which 'SIZE_MAX < 2 * SSIZE_MAX'; we don't know of any, but
only on such a system can buf_size wrap around to zero. Since in practice this
condition can't be true, the compiler can optimize away the whole expression, including the
following 'buf_size == 0' test.
After reading this code, you might ask, "Why not use
lstat() to retrieve the size of the symbolic link, allocate a buffer of the right size with
malloc(), and be done?" Well, there are a number of reasons.9
• lstat() is a system call; it's best to avoid the overhead of making it since the
  contents of most symbolic links will fit in the initial buffer size of 128.
• Calling lstat() introduces a race condition: The link could change between the
  execution of lstat() and readlink(), forcing the need to iterate anyway.
• Some systems don't properly fill in the st_size member for symbolic links. (Sad,
  but true.) In a similar fashion, as we see in Section 8.4.2, "Getting the Current
  Directory: getcwd()," page 258, Linux provides special symbolic links under
  /proc whose st_size is zero, but for which readlink() does return
  valid content.
Finally, when the buffer isn't big enough, xreadlink() uses free() and malloc()
with a bigger size, instead of realloc(), to avoid the useless copying that realloc()
9 Thanks to Jim Meyering for explaining the issues.

does. (The comment on line 58 is thus out of date since realloc() isn't being used;
this is fixed in the post-5.0 version of the Coreutils.)
5.5 Changing Ownership, Permission, and Modification Times

Several additional system calls let you change other file-related information: in
particular, the owner and group of a file, the file's permissions, and the file's access and
modification times.
5.5.1 Changing File Ownership: chown(), fchown(), and lchown()

File ownership and group are changed with three similar system calls:
#include <sys/types.h>                              /* POSIX */
#include <unistd.h>
int chown(const char *path, uid_t owner, gid_t group);
int fchown(int fd, uid_t owner, gid_t group);
int lchown(const char *path, uid_t owner, gid_t group);
chown() works on a pathname argument, fchown() works on an open file, and
lchown() works on symbolic links instead of on the files pointed to by symbolic links.
In all other respects, the three calls work identically, returning 0 on success and -1
on error.
It is noteworthy that one system call changes both the owner and group of a file. To
change only the owner or only the group, pass in a value of -1 for the ID number that
is to be left unchanged.
While you might think that you could pass in the corresponding value from a
previously retrieved struct stat for the file or file descriptor, that method is more error
prone. There's a race condition: The owner or group could have changed between the
call to stat() and the call to chown().
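As a minimal sketch of the -1 convention, the following helper changes only a file's group and leaves the owner untouched. The name change_group_only() is ours, invented purely for illustration; it is not part of any library:

```c
#include <sys/types.h>
#include <unistd.h>

/* change_group_only --- hypothetical helper: set the group of PATH,
   leaving the owner alone. Returns 0 on success, -1 on error (errno set). */

int change_group_only(const char *path, gid_t group)
{
    /* (uid_t) -1 tells chown() to leave the owner unchanged */
    return chown(path, (uid_t) -1, group);
}
```

Because the owner is never read and written back, there is no window in which a concurrent ownership change could be undone by accident.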
You might wonder, "Why be able to change ownership of a symbolic link? The
permissions and ownership on them don't matter." But what happens if a user leaves,
but all his files are still needed? It's necessary to be able to change the ownership on all
the person's files to someone else, including symbolic links.
GNU/Linux systems normally do not permit ordinary (non-root) users to change
the ownership of ("give away") their files. Changing the group to one of the user's
groups is allowed, of course. The restriction on changing owners follows BSD systems,
which also have this prohibition. The primary reason is that allowing users to give away
files can defeat disk accounting. Consider a scenario like this:
$ mkdir mywork                         Make a directory
$ chmod go-rwx mywork                  Set permissions to drwx------
$ cd mywork                            Go there
$ myprogram > large_data_file          Create a large file
$ chmod ugo+rw large_data_file         Set permissions to -rw-rw-rw-
$ chown otherguy large_data_file       Give file away to otherguy
In this example, large_data_file now belongs to user otherguy. The original
user can continue to read and write the file, because of the permissions. But otherguy
will be charged for the disk space it occupies. However, since it's in a directory that
belongs to the original user, which cannot be accessed by otherguy, there is no way
for otherguy to remove the file.
Some System V systems do allow users to give away files. (Setuid and setgid files have
the corresponding bit removed when the owner is changed.) This can be a particular
problem when files are extracted from a .tar or .cpio archive; the extracted files end
up belonging to the UID or GID encoded in the archive. On such systems, the tar
and cpio programs have options that prevent this, but it's important to know that
chown()'s behavior does vary across systems.
We will see in Section 6.3, "User and Group Names," page 195, how to relate user
and group names to their corresponding numeric values.
5.5.2 Changing Permissions: chmod() and fchmod()

After all the discussion in Chapter 4, "Files and File I/O," page 83, and in this
chapter, changing permissions is almost anticlimactic. It's done with one of two system
calls, chmod() and fchmod():
#include <sys/types.h>                              /* POSIX */
#include <sys/stat.h>
int chmod(const char *path, mode_t mode);
int fchmod(int fildes, mode_t mode);
chmod() works on a pathname argument, and fchmod() works on an open file.
(There is no lchmod() call in POSIX, since the system ignores the permission settings
on symbolic links. Some systems do have such a call, though.) As with most other system
calls, these return 0 on success and -1 on failure. Only the file's owner or root can
change a file's permissions.
The mode value is created in the same way as for open() and creat(), as discussed
in Section 4.6, "Creating Files," page 106. See also Table 5.2, which lists the
permission constants.
The system will not allow setting the setgid bit (S_ISGID) if the group of the file
does not match the effective group ID of the process or one of its supplementary groups.
(We have not yet discussed these issues in detail; see Section 11.1.1, "Real and Effective
IDs," page 405.) Of course, this check does not apply to root or to code running
as root.
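Since chmod() replaces the whole mode, adding a single permission means starting from the current one. The following is a rough sketch of that idea; add_mode_bits() is a name we made up for illustration, and the stat()-then-chmod() sequence is subject to the race condition discussed in Section 5.5.4:

```c
#include <sys/types.h>
#include <sys/stat.h>

/* add_mode_bits --- hypothetical helper: turn on the permission bits in
   BITS for PATH, keeping the bits already set. Returns 0 or -1. */

int add_mode_bits(const char *path, mode_t bits)
{
    struct stat sb;

    if (stat(path, & sb) < 0)
        return -1;

    /* 07777 keeps only the permission, setuid, setgid, and sticky bits;
       the file-type bits in st_mode are not part of the mode for chmod() */
    return chmod(path, (sb.st_mode & 07777) | bits);
}
```

For example, add_mode_bits("notes.txt", S_IRGRP) would make a file with mode 0600 group-readable (0640); the filename here is only a placeholder.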
5.5.3 Changing Timestamps: utime()

The struct stat structure contains three fields of type time_t:
st_atime    The time the file was last accessed (read).
st_mtime    The time the file was last modified (written).
st_ctime    The time the file's inode was last changed (for example, renamed).
A time_t value represents time in "seconds since the Epoch." The Epoch is the
Beginning of Time for computer systems. GNU/Linux and Unix use Midnight, January
1, 1970 UTC¹⁰ as the Epoch. Microsoft Windows systems use Midnight January 1,
1980 (local time, apparently) as the Epoch.
10 UTC is a language-independent acronym for Coordinated Universal Time. Older code (and sometimes older
people) refer to this as "Greenwich Mean Time" (GMT), which is the time in Greenwich, England. When time
zones came into widespread use, Greenwich was chosen as the location to which all other time zones are relative,
either behind it or ahead of it.
time_t values are sometimes referred to as timestamps. In Section 6.1, "Times and
Dates," page 166, we look at how these values are obtained and at how they're used.
For now, it's enough to know what a time_t value is and that it represents seconds
since the Epoch.
The utime() system call allows you to change a file's access and modification
timestamps:
#include <sys/types.h>                              /* POSIX */
#include <utime.h>
int utime(const char *filename, struct utimbuf *buf);
A struct utimbuf looks like this:
struct utimbuf {
    time_t actime;     /* access time */
    time_t modtime;    /* modification time */
};
If the call is successful, it returns 0; otherwise, it returns -1. If buf is NULL, then the
system sets both the access time and the modification time to the current time.
To change one time but not the other, use the original value from the struct stat.
For example:
/* Error checking omitted for brevity */
struct stat sbuf;
struct utimbuf ut;
time_t now;
time(& now);                           /* Get current time of day, see next chapter */
stat("/some/file", & sbuf);            /* Fill in sbuf */
ut.actime = sbuf.st_atime;             /* Access time unchanged */
ut.modtime = now - (24 * 60 * 60);     /* Set modtime to 24 hours ago */
utime("/some/file", & ut);             /* Set the values */

About now, you may be asking yourself, "Why would anyone want to change a file's
access and modification times?" Good question.
To answer it, consider the case of a program that creates backup archives, such as
tar or cpio. These programs have to read the contents of a file in order to archive
it. Reading the file, of course, changes the file's access time.
However, that file might not have been read by a human in 10 years. Someone doing
an 'ls -lu', which displays the access time (instead of the default modification time),
should see that the last time the file was read was 10 years ago. Thus, the backup program
should save the original access and modification times, read the file in order to archive
it, and then restore the original times with utime().
Similarly, consider the case of an archiving program restoring a file from an archive.
The archive stores the file's original access and modification times. However, when a
file is extracted from an archive to a newly created copy on disk, the new file has the
current date and time of day for its access and modification times.
However, it's more useful if the newly created file looks as if it's the same age as the
original file in the archive. Thus, the archiver needs to be able to set the access and
modification times to those stored in the archive.
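The save, read, restore sequence that a backup program follows might be sketched like this. The function name read_preserving_times() is hypothetical, and a real archiver would of course store the data instead of just counting it:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <utime.h>

/* read_preserving_times --- hypothetical sketch of what an archiver does:
   read all of PATH, then put the access and modification times back.
   Returns the number of bytes read, or -1 on error. */

long read_preserving_times(const char *path)
{
    struct stat sb;
    struct utimbuf ut;
    char buf[4096];
    ssize_t n;
    long total = 0;
    int fd;

    if (stat(path, & sb) < 0)             /* save the original times */
        return -1;

    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;                       /* a real archiver would store buf */

    if (close(fd) < 0 || n < 0)           /* close() is always called first */
        return -1;

    ut.actime = sb.st_atime;              /* restore both saved times */
    ut.modtime = sb.st_mtime;
    if (utime(path, & ut) < 0)
        return -1;

    return total;
}
```

Note that the utime() call itself updates the inode change time (st_ctime), which cannot be set this way.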
NOTE In new code, you may wish to use the utimes() call (note the s in the
name), which is described later in the book, in Section 14.3.2, "Microsecond
File Times: utimes()," page 545.
5.5.3.1 Faking utime(file, NULL)

Some older systems don't set the access and modification times to the current time
when the second argument to utime() is NULL. Yet, higher-level code (such as GNU
touch) is simpler and more straightforward if it can rely on a single standardized
interface.
The GNU Coreutils library thus contains a replacement function for utime() that
handles this case, which can then be called by higher-level code. This reflects the "pick
the best interface for the job" design principle we described in Section 1.5, "Portability
Revisited," page 19.
The replacement function is in the file lib/utime.c in the Coreutils distribution.
The following code is the version from Coreutils 5.0. Line numbers are relative to the
start of the file:
24  #include <sys/types.h>
25
26  #ifdef HAVE_UTIME_H
27  # include <utime.h>
28  #endif
29
30  #include "full-write.h"
31  #include "safe-read.h"
32
33  /* Some systems (even some that do have <utime.h>) don't declare this
34     structure anywhere.  */
35  #ifndef HAVE_STRUCT_UTIMBUF
36  struct utimbuf
37  {
38    long actime;
39    long modtime;
40  };
41  #endif
42
43  /* Emulate utime (file, NULL) for systems (like 4.3BSD) that do not
44     interpret it to set the access and modification times of FILE to
45     the current time.  Return 0 if successful, -1 if not. */
46
47  static int
48  utime_null (const char *file)
49  {
50  #if HAVE_UTIMES_NULL
51    return utimes (file, 0);
52  #else
53    int fd;
54    char c;
55    int status = 0;
56    struct stat sb;
57
58    fd = open (file, O_RDWR);
59    if (fd < 0
60        || fstat (fd, &sb) < 0
61        || safe_read (fd, &c, sizeof c) == SAFE_READ_ERROR
62        || lseek (fd, (off_t) 0, SEEK_SET) < 0
63        || full_write (fd, &c, sizeof c) != sizeof c
64        /* Maybe do this -- it's necessary on SunOS4.1.3 with some combination
65           of patches, but that system doesn't use this code: it has utimes.
66           || fsync (fd) < 0
67        */
68        || (st.st_size == 0 && ftruncate (fd, st.st_size) < 0)
69        || close (fd) < 0)
70      status = -1;
71    return status;
72  #endif
73  }
74
75  int
76  rpl_utime (const char *file, const struct utimbuf *times)
77  {
78    if (times)
79      return utime (file, times);
80
81    return utime_null (file);
82  }
Lines 33-41 define the struct utimbuf; as the comment says, some systems don't
declare the structure. The utime_null() function does the work. If the utimes()
system call is available, it is used. (utimes() is a similar, but more advanced, system
call, which is covered in Section 14.3.2, "Microsecond File Times: utimes()," page 545.
It also allows NULL for the second argument, meaning use the current time.)
In the case that the times must be updated manually, the code does the update by
first reading a byte from the file, and then writing it back. (The original Unix touch
worked this way.) The operations are as follows:
1. Open the file, line 58.
2. Call fstat() on the file, line 60.
3. Read one byte, line 61. For our purposes, safe_read() acts like read(); it's
explained in Section 10.4.4, "Restartable System Calls," page 357.
4. Seek back to the front of the file with lseek(), line 62. This is done to write
the just-read byte back on top of itself.
5. Write the byte back, line 63. full_write() acts like write(); it is also covered
in Section 10.4.4, "Restartable System Calls," page 357.
6. If the file is of zero size, use ftruncate() to set it to zero size (line 68). This
doesn't change the file, but it has the side effect of updating the access and
modification times. (ftruncate() was described in Section 4.8, "Setting File
Length," page 114.)
7. Close the file, line 69.
These steps are all done in one long successive chain of tests, inside an if. The tests
are set up so that if any operation fails, utime_null() returns -1, like a regular system
call. errno is automatically set by the system, for use by higher-level code.
The rpl_utime() function (lines 75-82) is the "replacement utime()." If the
second argument is not NULL, then it calls the real utime(). Otherwise, it calls
utime_null().
5.5.4 Using fchown() and fchmod() for Security

The original Unix systems had only chown() and chmod() system calls. However,
on heavily loaded systems, these system calls are subject to race conditions, by which
an attacker could arrange to replace with a different file the file whose ownership or
permissions were being changed.
However, once a file is opened, race conditions aren't an issue anymore. A program
can use stat() on a pathname to obtain information about the file. If the information
is what's expected, then after the file is opened, fstat() can verify that the file is the
same (by comparing the st_dev and st_ino fields of the "before" and "after" struct
stat structures).
Once the program knows that the files are the same, the ownership or permissions
can then be changed with fchown() or fchmod().
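The stat(), open(), fstat(), compare, fchmod() sequence just described could be sketched as follows. The helper name checked_fchmod() is ours, and the sketch assumes the caller is able to open the file for reading:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* checked_fchmod --- hypothetical helper: change the permissions of PATH
   only if the file we opened is the one we examined. Returns 0 or -1. */

int checked_fchmod(const char *path, mode_t mode)
{
    struct stat before, after;
    int fd, ret = -1;

    if (stat(path, & before) < 0)         /* examine by name */
        return -1;

    /* ... decide here whether `before' describes the expected file ... */

    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;

    if (fstat(fd, & after) == 0           /* examine the open file */
        && before.st_dev == after.st_dev
        && before.st_ino == after.st_ino)
        ret = fchmod(fd, mode);           /* same file: safe to change */

    if (close(fd) < 0)
        ret = -1;

    return ret;
}
```

The same pattern applies to ownership: replace the fchmod() call with fchown().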
These system calls, as well as lchown(), are of relatively recent vintage;¹¹ older Unix
systems won't have them, although modern, POSIX-compliant systems do.
There are no corresponding futime() or lutime() functions. In the case of
futime(), this is (apparently) because the file timestamps are not critical to system
security in the same way that ownership and permissions are. There is no lutime(),
since the timestamps are irrelevant for symbolic links.
11 fchown() and fchmod() were introduced in 4.2 BSD but not picked up for System V until System V Release 4.
5.6 Summary
• The file and directory hierarchy as seen by the user is one logical tree, rooted at
/. It is made up of one or more storage partitions, each of which contains a
filesystem. Within a filesystem, inodes store information about files (metadata),
including the location of file data blocks.
• Directories make the association between filenames and inodes. Conceptually,
directory contents are just sequences of (inode, name) pairs. Each directory entry
for a file is called a (hard) link, and files can have many links. Hard links, because
they work only by inode number, must all be on the same filesystem. Symbolic
(soft) links are pointers to files or directories that work based on filename, not inode
number, and thus are not restricted to being on the same filesystem.
• Hard links are created with link(), symbolic links are created with symlink(),
links are removed with unlink(), and files are renamed (possibly being moved
to another directory) with rename(). A file's data blocks are not reclaimed until
the link count goes to zero and the last open file descriptor for the file is closed.
• Directories are created with mkdir() and removed with rmdir(); a directory
must be empty (nothing left but '.' and '..') before it can be removed. The
GNU/Linux version of the ISO C remove() function calls unlink() or rmdir()
as appropriate.
• Directories are processed with the opendir(), readdir(), rewinddir(), and
closedir() functions. A struct dirent contains the inode number and the
file's name. Maximally portable code uses only the filename in the d_name member.
The BSD telldir() and seekdir() functions for saving and restoring the
current position in a directory are widely available but are not as fully portable as
the other directory processing functions.
• File metadata are retrieved with the stat() family of system calls; the struct
stat structure contains all the information about a file except the filename. (Indeed,
since a file may have many names or may even be completely unlinked, it's not
possible to make the name available.)
• The S_ISxxx() macros in <sys/stat.h> make it possible to determine a file's
type. The major() and minor() functions from <sys/sysmacros.h> make it
possible to decode the dev_t values that represent block and character devices.
• Symbolic links can be checked for using lstat(), and the st_size field of the
struct stat for a symbolic link returns the number of bytes needed to hold the
name of the pointed-to file. The contents of a symbolic link are read with
readlink(). Care must be taken to get the buffer size correct and to terminate
the retrieved filename with a trailing zero byte so that it can be used as a C string.
• Several miscellaneous system calls update other information: the chown() family
for the owner and group, the chmod() routines for the file permissions, and
utime() to change file access and modification times.
Exercises
1. Write a routine 'const char *fmt_mode(mode_t mode)'. The input is a
mode_t value as provided by the st_mode field in the struct stat; that is,
it contains both the permission bits and the file type.
The output should be a 10-character string identical to the first field of output
from 'ls -l'. In other words, the first character identifies the file type, and
the other nine the permissions.
When the S_ISUID and S_IXUSR bits are set, use an s instead of an x; if only
the S_ISUID bit is set, use an S. Similarly for the S_ISGID and S_IXGRP bits.
If both the S_ISVTX and S_IXOTH bits are set, use t; for S_ISVTX alone, use T.
For simplicity, you may use a static buffer whose contents are overwritten
each time the routine is called.
2. Extend ch05-catdir.c to call stat() on each file name found. Then print
the inode number, the result of fmt_mode(), the link count, and the file's name.
3. Extend ch05-catdir.c further such that if a file is a symbolic link, it will also
print the name of the pointed-to file.
4. Add an option such that if a filename is that of a subdirectory, the program
recursively enters the subdirectory and prints information about the
subdirectory's files (and directories). Only one level of recursion is needed.
5. If you're not using a GNU/Linux system, run ch05-trymkdir (see Section 5.2,
"Creating and Removing Directories," page 130) on your system and compare
the results to those we showed.
6. Write the mkdir program. See your local mkdir(1) manpage and implement
all its options.
7. In the root directory, /, both the device and inode numbers for '.' and '..'
are the same. Using this bit of information, write the pwd program.
The program has to start by finding the name of the current directory by
reading the contents of the parent directory. It must then continue, working
its way up the filesystem hierarchy, until it reaches the root directory.
Printing the directory name backwards, from the current directory up to the
root, is easy. How will your version of pwd manage to print the directory name
in the correct way, from the root on down?
8. If you wrote pwd using recursion, write it again, using iteration. If you used
iteration, write it using recursion. Which is better? (Hint: consider very deeply
nested directory trees.)
9. Examine the rpl_utime() function (see Section 5.5.3.1, "Faking utime(file,
NULL)," page 159) closely. What resource is not recovered if one of the tests
in the middle of the if fails? (Thanks to Geoff Collyer.)
10. (Hard.) Read the chmod(1) manpage. Write code to parse the symbolic options
argument, which allows adding, removing, and setting permissions based on
user, group, other, and "all."
Once you believe it works, write your own version of chmod that applies the
permission specification to each file or directory named on the command line.
Which function did you use, chmod(), or open() and fchmod(), and why?
In this chapter
• 6.1 Times and Dates
• 6.2 Sorting and Searching Functions
• 6.3 User and Group Names
• 6.4 Terminals: isatty()
• 6.5 Suggested Reading
• 6.6 Summary
We saw in Chapter 5, "Directories and File Metadata," page 117, that directly
reading a directory returns filenames in the order in which they're kept in
the directory. We also saw that the struct stat contains all the information about
a file, except its name. However, some components of that structure are not directly
usable; they're just numeric values.
This chapter presents the rest of the APIs needed to make full use of the struct
stat component values. In order, we cover the following topics: time_t values for
representing times and the time formatting function; sorting and searching functions
(for sorting filenames, or any other data); the uid_t and gid_t types for representing
users and groups and the functions that map them to and from the corresponding
user and group names; and finally, a function to test whether a file descriptor
represents a terminal.
6.1 Times and Dates

Time values are kept in the type known as time_t. The ISO C standard guarantees
that this is a numeric type but does not otherwise specify what it is (integer or
floating-point), or the range or the precision of the values stored therein.
On GNU/Linux and Unix systems, time_t values represent "seconds since the
Epoch." The Epoch is the beginning of recorded time, which is Midnight, January 1,
1970, UTC. On most systems, a time_t is a C long int. For 32-bit systems, this
means that the time_t "overflows" sometime on January 19, 2038. By then, we hope,
the time_t type will be redefined to be at least 64 bits big.
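The 2038 date follows from simple arithmetic: a signed 32-bit value tops out at 2^31 - 1 = 2,147,483,647 seconds, which is about 24,855 days, or roughly 68 years after 1970. As a small sketch, the following function asks gmtime() (described in Section 6.1.2) to break down that last representable second. The name last_32bit_time() is hypothetical:

```c
#include <stdint.h>
#include <time.h>

/* last_32bit_time --- hypothetical helper: break down the largest value
   that a signed 32-bit time_t can hold. Returns 0 on success, -1 if
   gmtime() fails. */

int last_32bit_time(struct tm *result)
{
    time_t last = (time_t) INT32_MAX;     /* 2^31 - 1 = 2147483647 seconds */
    struct tm *tp = gmtime(& last);

    if (tp == NULL)
        return -1;

    *result = *tp;                        /* copy out of the static storage */
    return 0;
}
```

The resulting fields correspond to 03:14:07 UTC on January 19, 2038; one second later, a 32-bit time_t wraps around.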
Various functions exist to retrieve the current time, compute the difference between
two time_t values, convert time_t values into a more usable representation, and format
both representations as character strings. Additionally, a date and time representation
can be converted back into a time_t, and limited time-zone information is available.
A separate set of functions provides access to the current time with a higher resolution
than one second. The functions work by providing two discrete values: the time as
seconds since the Epoch, and the number of microseconds within the current second.
These functions are described later in the book, in Section 14.3.1, "Microsecond Times:
gettimeofday()," page 544.
6.1.1 Retrieving the Current Time: time() and difftime()

The time() system call retrieves the current date and time; difftime() computes
the difference between two time_t values:
#include <time.h>                                   /* ISO C */
time_t time(time_t *t);
double difftime(time_t time1, time_t time0);
time() returns the current time. If the t parameter is not NULL, then the value
pointed to by t is also filled in with the current time. It returns (time_t) -1 if there
was an error, and errno is set.
Although ISO C doesn't specify what's in a time_t value, POSIX does indicate that
it represents time in seconds. Thus, it's both common and portable to make this
assumption. For example, to see if a time value represents something that is six months or more
in the past, one might use code like this:
/* Error checking omitted for brevity */
time_t now, then, some_time;
time(& now);                                  /* Get current time */
then = now - (6L * 31 * 24 * 60 * 60);        /* Approximately six months ago */
/* ... set some_time, for example, via stat() ... */
if (some_time < then) {
    /* more than 6 months in the past */
} else {
    /* less than 6 months in the past */
}
However, since strictly portable code may need to run on non-POSIX systems, the
difftime() function exists to produce the difference between two times. The same
test, using difftime(), would be written this way:
time_t now, some_time;
const double six_months = 6.0 * 31 * 24 * 60 * 60;
time(& now);                                  /* Get current time */
/* ... set some_time, for example, via stat() ... */
if (difftime(now, some_time) >= six_months) {
    /* more than 6 months in the past */
} else {
    /* less than 6 months in the past */
}
The return type of difftime() is a double because a time_t could possibly
represent fractions of a second as well. On POSIX systems, it always represents whole seconds.
168 Chapter 6 • General Library Interfaces - Part 1
In both of the preceding examples, note the use of typed constants to force the
computation to be done with the right type of math: 6L in the first instance for long
integers, 6.0 in the second, for floating point.
6.1.2 Breaking Down Times: gmtime() and localtime()

In practice, the "seconds since the Epoch" form of a date and time isn't very useful
except for simple comparisons. Computing the components of a time yourself, such as
the month, day, year, and so on, is error prone, since the local time zone (possibly with
daylight-saving time) must be taken into account, leap years must be computed correctly,
and so forth. Fortunately, two standard routines do this job for you:
#include <time.h>                                   /* ISO C */
struct tm *gmtime(const time_t *timep);
struct tm *localtime(const time_t *timep);
gmtime() returns a pointer to a struct tm that represents UTC time. localtime()
returns a pointer to a struct tm representing the local time; that is, it takes the current
time zone and daylight-saving time into account. In effect, this is "wall-clock time,"
the date and time as it would be displayed on a wall clock or on a wristwatch. (How
this works is discussed later, see Section 6.1.5, "Getting Time-Zone Information,"
page 178.)
Both functions return a pointer to a struct tm, which looks like this:
struct tm {
    int tm_sec;      /* seconds */
    int tm_min;      /* minutes */
    int tm_hour;     /* hours */
    int tm_mday;     /* day of the month */
    int tm_mon;      /* month */
    int tm_year;     /* year */
    int tm_wday;     /* day of the week */
    int tm_yday;     /* day in the year */
    int tm_isdst;    /* daylight saving time */
};
The struct tm is referred to as a broken-down time, since the time_t value is
"broken down" into its component parts. The component parts, their ranges, and their
meanings are shown in Table 6.1.
TABLE 6.1
Fields in the struct tm
Member      Range          Meaning
tm_sec      0-60           Second within a minute. Second 60 allows for leap seconds.
                           (C89 had the range as 0-61.)
tm_min      0-59           Minute within an hour.
tm_hour     0-23           Hour within the day.
tm_mday     1-31           Day of the month.
tm_mon      0-11           Month of the year.
tm_year     0-N            Year, in years since 1900.
tm_wday     0-6            Day of week, Sunday = 0.
tm_yday     0-365          Day of year, January 1 = 0.
tm_isdst    < 0, 0, > 0    Daylight Savings Time flag.
The ISO C standard presents most of these values as "x since y." For example, tm_sec
is "seconds since the minute," tm_mon is "months since January," tm_wday is "days
since Sunday," and so on. This helps to understand why all the values start at 0. (The
single exception, logically enough, is tm_mday, the day of the month, which ranges
from 1-31.) Of course, having them start at zero is also practical; since C arrays are
zero-based, it makes using these values as indices trivial:
static const char *const days[] = {                 /* Array of day names */
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday",
};
time_t now;
struct tm *curtime;
time(& now);                                        /* Get current time */
curtime = gmtime(& now);                            /* Break it down */
printf("Day of the week: %s\n", days[curtime->tm_wday]);    /* Index and print */
Both gmtime() and localtime() return a pointer to a struct tm. The pointer
points to a static struct tm maintained by each routine, and it is likely that these
struct tm structures are overwritten each time the routines are called. Thus, it's a
good idea to make a copy of the returned struct. Reusing the previous example:
static const char *const days[] = { /* As before */ };
time_t now;
struct tm curtime;                                  /* Structure, not pointer */
time(& now);                                        /* Get current time */
curtime = *gmtime(& now);                           /* Break it down and copy data */
printf("Day of the week: %s\n", days[curtime.tm_wday]);     /* Index and print, use . not -> */
The tm_isdst field indicates whether or not daylight-saving time (DST) is currently
in effect. A value of 0 means DST is not in effect, a positive value means it is, and a
negative value means that no DST information is available. (The C standard is
purposely vague, indicating only zero, positive, or negative; this gives implementors
the most freedom.)
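Since only the sign of tm_isdst is meaningful, code should test for positive, zero, and negative instead of comparing against 1. A short sketch, where dst_status() is a name we invented for the purpose:

```c
#include <time.h>

/* dst_status --- hypothetical helper: describe the tm_isdst field */

const char *dst_status(const struct tm *tp)
{
    if (tp->tm_isdst > 0)
        return "DST in effect";
    else if (tp->tm_isdst == 0)
        return "DST not in effect";
    else
        return "DST information not available";
}
```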
6.1.3 Formatting Dates and Times

The examples in the previous section showed how the fields in a struct tm could
be used to index arrays of character strings for printing informative date and time values.
While you could write your own code to use such arrays for formatting dates and times,
standard routines alleviate the work.
6.1.3.1 Simple Time Formatting: asctime() and ctime()

The first two standard routines, listed below, produce output in a fixed format:
#include <time.h>                                   /* ISO C */
char *asctime(const struct tm *tm);
char *ctime(const time_t *timep);
As with gmtime() and localtime(), asctime() and ctime() return pointers to
static buffers that are likely to be overwritten upon each call. Furthermore, these two
routines return strings in the same format. They differ only in the kind of argument
they accept. asctime() and ctime() should be used when all you need is simple date
and time information:
#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t now;

    time(& now);
    printf("%s", ctime(& now));
    return 0;
}
When run, this program produces output of the form: 'Thu May 22 15:44:21
2003'. The terminating newline is included in the result. To be more precise, the return
value points to an array of 26 characters, as shown in Figure 6.1.
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
 T  h  u     M  a  y     2  2     1  5  :  4  4  :  2  1     2  0  0  3 \n \0
FIGURE 6.1
Return string from ctime() and asctime()
Much older Unix code relies on the fact that the values have a fixed position in the
returned string. When using these routines, remember that they include a trailing
newline. Thus, the small example program uses a simple "%s" format string for
printf(), and not "%s\n", as might be expected.
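When the newline is unwanted, for instance when embedding the time in a longer message, one option is to copy the string without it. The following is only a sketch; time_string() is a hypothetical name, and the caller supplies the buffer so that the result survives later calls to ctime():

```c
#include <string.h>
#include <time.h>

/* time_string --- hypothetical helper: copy the ctime() string for WHEN
   into BUF (of SIZE bytes) without the trailing newline. Returns BUF,
   or NULL if ctime() fails or BUF is too small. */

char *time_string(time_t when, char *buf, size_t size)
{
    const char *s = ctime(& when);
    size_t len;

    if (s == NULL)
        return NULL;

    len = strlen(s);
    if (len > 0 && s[len - 1] == '\n')
        len--;                      /* don't copy the newline */

    if (len + 1 > size)
        return NULL;

    memcpy(buf, s, len);
    buf[len] = '\0';
    return buf;
}
```

A buffer of 26 bytes is always enough for dates with four-digit years, matching the array size shown in Figure 6.1.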
ctime() saves you the step of calling localtime(); it's essentially equivalent to
time_t now;
char *curtime;
time(& now);
curtime = asctime(localtime(& now));
6.1.3.2 Complex Time Formatting: strftime()

While asctime() and ctime() are often adequate, they are also limited:
• The output format is fixed. There's no way to rearrange the order of the elements.
• The output does not include time-zone information.
• The output uses abbreviated month and day names.
• The output assumes English names for the months and days.
For these reasons, C89 introduced the strftime() standard library routine:
#include <time.h>                                   /* ISO C */
size_t strftime(char *s, size_t max, const char *format,
                const struct tm *tm);
strftime() is similar to sprintf(). The arguments are as follows:
char *s
A buffer to hold the formatted string.
size_t max
The size of the buffer.
const char *format
The format string.
const struct tm *tm
A struct tm pointer representing the broken-down time to be formatted.
The format string contains literal characters, intermixed with conversion specifiers
that indicate what is to be placed into the string, such as the full weekday name, the
hour according to a 24-hour or 12-hour clock, a.m. or p.m. designations, and so on.
(Examples coming shortly.)
If the entire string can be formatted within max characters, the return value is the
number of characters placed in s, not including the terminating zero byte. Otherwise,
the return value is 0. In the latter case, the contents of s are "indeterminate." The
following simple example gives the flavor of how strftime() is used:
    #include <stdio.h>
    #include <stdlib.h>     /* for exit() */
    #include <time.h>

    int main(void)
    {
        char buf[100];
        time_t now;
        struct tm *curtime;

        time(&now);
        curtime = localtime(&now);
        (void) strftime(buf, sizeof buf,
                "It is now %A, %B %d, %Y, %I:%M %p", curtime);

        printf("%s\n", buf);
        exit(0);
    }
When run, this program prints something like:

    It is now Thursday, May 22, 2003, 04:15 PM
Table 6.2 provides the full list of conversion specifiers, their possible alternative
representations, and their meanings. In addition, the C99 standard added more specifiers
to the list; those that are new in C99 are marked with a ✓ symbol.

TABLE 6.2
strftime() conversion format specifiers

Specifier(s)    C99   Meaning
%a                    The locale's abbreviated weekday name.
%A                    The locale's full weekday name.
%b                    The locale's abbreviated month name.
%B                    The locale's full month name.
%c, %Ec               The locale's "appropriate" date and time representation.
%C, %EC          ✓    The century (00-99).
%d, %Od               The day of the month (01-31).
%D               ✓    Same as %m/%d/%y.
%e, %Oe          ✓    The day of the month. A single digit is preceded with a space (1-31).
%F               ✓    Same as %Y-%m-%d (ISO 8601 date format).
%g               ✓    The last two digits of week-based year (00-99).
%G               ✓    The ISO 8601 week-based year.
%h               ✓    Same as %b.
%H, %OH               The hour in a 24-hour clock (00-23).
%I, %OI               The hour in a 12-hour clock (01-12).
%j                    The day of the year (001-366).
%m, %Om               The month as a number (01-12).
%M, %OM               The minute as a number (00-59).
%n               ✓    A newline character ('\n').
%p                    The locale's a.m./p.m. designation.
%r               ✓    The locale's 12-hour clock time.
%R               ✓    Same as %H:%M.
%S, %OS               The second as a number (00-60).
%t               ✓    A TAB character ('\t').
%T               ✓    Same as %H:%M:%S (ISO 8601 time format).
%u, %Ou          ✓    ISO 8601 weekday number, Monday = 1 (1-7).
%U, %OU               Week number, first Sunday is first day of week 1 (00-53).
%V, %OV          ✓    ISO 8601 week number (01-53).
%w, %Ow               The weekday as a number, Sunday = 0 (0-6).
%W, %OW               Week number, first Monday is first day of week 1 (00-53).
%x, %Ex               The locale's "appropriate" date representation.
%X, %EX               The locale's "appropriate" time representation.
%y, %Ey, %Oy          The last two digits of the year (00-99).
%Y, %EY               The year as a number.
%Z                    The locale's time zone, or no characters if no time-zone
                      information is available.
%%                    A single %.
A locale is a way of describing the current location, taking into account such things
as language, character set, and defaults for formatting dates, times, and monetary
amounts, and so on. We deal with them in Chapter 13, "Internationalization and
Localization," page 485. For now, it's enough to understand that the results from
strftime() for the same format string can vary, according to the current locale.
The versions starting with %E and %O are for "alternative representations." Some locales
have multiple ways of representing the same thing; these specifiers provide access to the
additional representations. If a particular locale does not support alternative
representations, then strftime() uses the regular version.
Many Unix versions of date allow you to provide, on the command line, a format
string that begins with a + character. date then formats the current date and time and
prints it according to the format string:

    $ date +'It is now %A, %B %d, %Y, %I:%M %p'
    It is now Sunday, May 25, 2003, 06:44 PM
Most of the new C99 specifiers come from such existing Unix date implementations.
The %n and %t formats are not strictly necessary in C, since the TAB and newline
characters can be directly embedded in the string. However, in the context of a date
format string on the command line, they make more sense. Thus, they're included in
the specification for strftime() as well.
The ISO 8601 standard defines (among other things) how weeks are numbered
within a year. According to this standard, weeks run Monday through Sunday, and
Monday is day 1 of the week, not day 0. If the week in which January 1 comes out
contains at least four days in the new year, then it is considered to be week 1. Otherwise,
that week is the last week of the previous year, numbered 52 or 53. These rules are used
for the computation of the %g, %G, and %V format specifiers. (While parochial Americans
such as the author may find these rules strange, they are commonly used
throughout Europe.)
Many of the format specifiers produce results that are specific to the current locale.
In addition, several indicate that they produce the "appropriate" representation for the
locale (for example, %x). The C99 standard defines the values for the "C" locale. These
values are listed in Table 6.3.
TABLE 6.3
"C" locale values for certain strftime() formats

Specifier   Meaning
%a          The first three characters of %A.
%A          One of Sunday, Monday, ..., Saturday.
%b          The first three characters of %B.
%B          One of January, February, ..., December.
%c          Same as %a %b %e %T %Y.
%p          One of AM or PM.
%r          Same as %I:%M:%S %p.
%x          Same as %m/%d/%y.
%X          Same as %T.
%Z          Implementation-defined.
It should be obvious that strftime() provides considerable flexibility and control
over date- and time-related output, in much the same way as printf() and sprintf()
do. Furthermore, strftime() cannot overflow its buffer, since it checks against the
passed-in size parameter, making it a safer routine than is sprintf().
As a simple example, consider the creation of program log files, when a new file is
created every hour. The filename should embed the date and time of its creation in
its name:
    /* Error checking omitted for brevity */
    char fname[PATH_MAX];       /* PATH_MAX is in <limits.h> */
    time_t now;
    struct tm *tm;
    int fd;

    time(&now);
    tm = localtime(&now);
    strftime(fname, sizeof fname, "/var/log/myapp.%Y-%m-%d-%H:%M", tm);
    fd = creat(fname, 0600);
The year-month-day-hour-minute format causes the filenames to sort in the order
they were created.
NOTE Some time formats are more useful than others. For example, 12-hour
times are ambiguous, as are any purely numeric date formats. (What does
'9/11' mean? It depends on where you live.) Similarly, two-digit years are also
a bad idea. Use strftime() judiciously.
6.1.4 Converting a Broken-Down Time to a time_t

Obtaining seconds-since-the-Epoch values from the system is easy; that's how dates
and times are stored in inodes and returned from time() and stat(). These values
are also easy to compare for equality or by < and > for simple earlier-than/later-than tests.
However, dates entered by humans are not so easy to work with. For example, many
versions of the touch command allow you to provide a date and time to which touch
should set a file's modification or access time (with utime(), as described in
Section 5.5.3, "Changing Timestamps: utime()," page 157).
Converting a date as entered by a person into a time_t value is difficult: Leap years
must be taken into account, time zones must be compensated for, and so on. Therefore,
the C89 standard introduced the mktime () function:
    #include <time.h>                                             ISO C

    time_t mktime(struct tm *tm);
To use mktime(), fill in a struct tm with appropriate values: year, month, day,
and so on. If you know whether daylight-saving time was in effect for the given date,
set the tm_isdst field appropriately: 0 for "no," and positive for "yes." Otherwise, use
a negative value for "don't know." The tm_wday and tm_yday fields are ignored.
mktime() assumes that the struct tm represents a local time, not UTC. It returns
a time_t value representing the passed-in date and time, or it returns (time_t) -1
if the given date/time cannot be represented correctly. Upon a successful return, all the
values in the struct tm are adjusted to be within the correct ranges, and tm_wday
and tm_yday are set correctly as well. Here is a simple example:
     1  /* ch06-echodate.c --- demonstrate mktime(). */
     2
     3  #include <stdio.h>
     4  #include <time.h>
     5
     6  int main(void)
     7  {
     8      struct tm tm;
     9      time_t then;
    10
    11      printf("Enter a Date/time as YYYY/MM/DD HH:MM:SS ");
    12      scanf("%d/%d/%d %d:%d:%d",
    13          &tm.tm_year, &tm.tm_mon, &tm.tm_mday,
    14          &tm.tm_hour, &tm.tm_min, &tm.tm_sec);
    15
    16      /* Error checking on values omitted for brevity. */
    17      tm.tm_year -= 1900;
    18      tm.tm_mon--;
    19
    20      tm.tm_isdst = -1;   /* Don't know about DST */
    21
    22      then = mktime(&tm);
    23
    24      printf("Got: %s", ctime(&then));
    25      return 0;
    26  }
Line 11 prompts for a date and time, and lines 12-14 read it in. (Production code
should check the return value from scanf().) Lines 17 and 18 compensate for the
different basing of years and months, respectively. Line 20 indicates that we don't know
whether or not the given date and time represent daylight-saving time. Line 22 calls
mktime(), and line 24 prints the result of the conversion. When compiled and run,
we see that it works:
    $ ch06-echodate
    Enter a Date/time as YYYY/MM/DD HH:MM:SS 2003/5/25 19:07:23
    Got: Sun May 25 19:07:23 2003
6.1.5 Getting Time-Zone Information

Early Unix systems embedded time-zone information into the kernel when it was
compiled. The rules for daylight-saving time conversions were generally hard-coded,
which was painful for users outside the United States or in places within the United
States that didn't observe DST.
Modern systems have abstracted that information into binary files read by the C
library when time-related functions are invoked. This technique avoids the need to
recompile libraries and system executables when the rules change and makes it much
easier to update the rules.
The C language interface to time-zone information evolved across different Unix
versions, both System V and Berkeley, until finally it was standardized by POSIX
as follows:
    #include <time.h>                                             POSIX

    extern char *tzname[2];
    extern long timezone;
    extern int daylight;

    void tzset(void);
The tzset() function examines the TZ environment variable to find time-zone and
daylight-saving time information.¹ If that variable isn't set, then tzset() uses an
"implementation-defined default time zone," which is most likely the time zone of the
machine you're running on.
After tzset() has been called, the local time-zone information is available in
several variables:
extern char *tzname[2]
    The standard and daylight-saving time names for the time zone. For example, for
    U.S. locations in the Eastern time zone, the time-zone names are 'EST' (Eastern
    Standard Time) and 'EDT' (Eastern Daylight Time).
1 Although POSIX standardizes TZ's format, it isn't all that interesting, so we haven't bothered to document it
here. After all, it is tzset() that has to understand the format, not user-level code. Implementations can, and
do, use formats that extend POSIX.
extern long timezone
    The difference, in seconds, between the current time zone and UTC. The standard
    does not explain how this difference works. In practice, negative values represent
    time zones east of (ahead of, or later than) UTC; positive values represent time
    zones west of (behind, or earlier than) UTC. If you look at this value as "how
    much to change the local time to make it be the same as UTC," then the sign of
    the value makes sense.
extern int daylight
    This variable is zero if daylight-saving time conversions should never be applied
    in the current time zone, and nonzero otherwise.
NOTE The daylight variable does not indicate whether daylight-saving time
is currently in effect! Instead, it merely states whether the current time zone can
even have daylight-saving time.
The POSIX standard indicates that ctime(), localtime(), mktime(), and
strftime() all act "as if" they call tzset(). This means that they need not actually
call tzset(), but they must behave as if it had been called. (The wording is intended
to provide a certain amount of flexibility for implementors while guaranteeing correct
behavior for user-level code.)
In practice, this means that you will almost never have to call tzset() yourself.
However, it's there if you need it.
6.1.5.1 BSD Systems Gotcha: timezone(), Not timezone

Instead of the POSIX timezone variable, a number of systems derived from 4.4 BSD
provide a timezone() function:

    #include <time.h>                                             BSD

    char *timezone(int zone, int dst);

The zone argument is the number of minutes west of GMT, and dst is true if daylight-saving
time is in effect. The return value is a string giving the name of the indicated
zone, or a value expressed relative to GMT. This function provides compatibility with
the V7 function of the same name and behavior.
Local Time: How Does It Know?

GNU/Linux systems store time zone information in files and directories underneath
/usr/share/zoneinfo:

    $ cd /usr/share/zoneinfo
    $ ls -FC
    Africa/      Canada/  Factory    Iceland    MST7MDT   Portugal    W-SU
    America/     Chile/   GB         Indian/    Mexico/   ROC         WET
    Antarctica/  Cuba     GB-Eire    Iran       Mideast/  ROK         Zulu
    Arctic/      EET      GMT        Israel     NZ        Singapore   iso3166.tab
    Asia/        EST      GMT+0      Jamaica    NZ-CHAT   SystemV/    posix/
    Atlantic/    EST5EDT  GMT-0      Japan      Navajo    Turkey      posixrules
    Australia/   Egypt    GMT0       Kwajalein  PRC       UCT         right/
    Brazil/      Eire     Greenwich  Libya      PST8PDT   US/         zone.tab
    CET          Etc/     HST        MET        Pacific/  UTC
    CST6CDT      Europe/  Hongkong   MST        Poland    Universal
When possible, this directory uses hard links to provide the same data by multiple names.
For example, the files EST5EDT and US/Eastern are really the same:

    $ ls -il EST5EDT US/Eastern
    724350 -rw-r--r--    5 root     root         1267 Sep  6  2002 EST5EDT
    724350 -rw-r--r--    5 root     root         1267 Sep  6  2002 US/Eastern
Part of the process of installing a system is to choose the time zone. The correct
time-zone data file is then placed in /etc/localtime:

    $ file /etc/localtime
    /etc/localtime: timezone data

On our system, this is a standalone copy of the time-zone file for our time zone. On
other systems, it may be a symbolic link to the file in /usr/share/zoneinfo. The
advantage of using a separate copy is that everything still works if /usr isn't mounted.
The TZ environment variable, if set, overrides the default time zone:

    $ date                          Date and time in default time zone
    Wed Nov 19 06:44:50 EST 2003
    $ export TZ=PST8PDT             Change time zone to US West Coast
    $ date                          Print date and time
    Wed Nov 19 03:45:09 PST 2003
This function's widespread existence makes portable use of the POSIX timezone
variable difficult. Fortunately, we don't see a huge need for it: strftime() should be
sufficient for all but the most unusual needs.
6.2 Sorting and Searching Functions

Sorting and searching are two fundamental operations, the need for which arises
continually in many applications. The C library provides a number of standard interfaces
for performing these tasks.
All the routines share a common theme: data are managed through void * pointers,
and user-provided functions supply ordering. Note also that these APIs apply to
in-memory data. Sorting and searching structures in files is considerably more involved
and beyond the scope of an introductory text such as this one. (However, the sort
command works well for text files; see the sort(1) manpage. Sorting binary files requires
that a special-purpose program be written.)
Because no one algorithm works well for all applications, there are several different
sets of library routines for maintaining searchable collections of data. This chapter
covers only one simple interface for searching. Another, more advanced, interface is
described in Section 14.4, "Advanced Searching with Binary Trees," page 551.
Furthermore, we purposely don't explain the underlying algorithms, since this is a book on
APIs, not algorithms and data structures. What's important to understand is that you
can treat the APIs as "black boxes" that do a particular job, without needing to
understand the details of how they do the job.
6.2.1 Sorting: qsort()

Sorting is accomplished with qsort():

    #include <stdlib.h>                                           ISO C

    void qsort(void *base, size_t nmemb, size_t size,
               int (*compare)(const void *, const void *));
The name qsort() comes from C.A.R. Hoare's Quicksort algorithm, which was
used in the initial Unix implementation. (Nothing in the POSIX standard dictates the
use of this algorithm for qsort(). The GLIBC implementation uses a highly optimized
combination of Quicksort and Insertion Sort.)
qsort() sorts arrays of arbitrary objects. It works by shuffling opaque chunks of
memory from one spot within the array to another and relies on you, the programmer,
to provide a comparison function that allows it to determine the ordering of one array
element relative to another. The arguments are as follows:
void *base
    The address of the beginning of the array.
size_t nmemb
    The total number of elements in the array.
size_t size
    The size of each element in the array. The best way to obtain this value is with
    the C sizeof operator.
int (*compare)(const void *, const void *)
    A possibly scary declaration for a function pointer. It says that "compare points to
    a function that takes two 'const void *' parameters, and returns an int."
Most of the work is in writing a proper comparison function. The return value should
mimic that of strcmp(): less than zero if the first value is "less than" the second, zero
if they are equal, and greater than zero if the first value is "greater than" the second. It
is the comparison function that defines the meaning of "less than" and "greater than"
for whatever it is you're sorting. For example, to compare two double values, we could
use this function:
    int dcomp(const void *d1p, const void *d2p)
    {
        const double *d1, *d2;

        d1 = (const double *) d1p;      /* Cast pointers to right type */
        d2 = (const double *) d2p;

        if (*d1 < *d2)                  /* Compare and return right value */
            return -1;
        else if (*d1 > *d2)
            return 1;
        else if (*d1 == *d2)
            return 0;
        else
            return -1;      /* NaN sorts before real numbers */
    }
This shows the general boilerplate for a comparison function: convert the arguments
from void * to pointers to the type being compared and then return a comparison value.
For floating-point values, a simple subtraction such as 'return *d1 - *d2' doesn't
work, particularly if one value is very small or if one or both values are special
"not a number" or "infinity" values. Thus, we have to do the comparison manually,
including taking into account the not-a-number value (which doesn't even compare
equal to itself!).
6.2.1.1 Example: Sorting Employees

For more complicated structures, a more involved function is necessary. For example,
consider the following (rather trivial) struct employee:

    struct employee {
        char lastname[30];
        char firstname[30];
        long emp_id;
        time_t start_date;
    };
We might write a function to sort employees by last name, first name, and ID number:
    int emp_name_id_compare(const void *e1p, const void *e2p)
    {
        const struct employee *e1, *e2;
        int last, first;

        e1 = (const struct employee *) e1p;     /* Convert pointers */
        e2 = (const struct employee *) e2p;

        /* Compare last names */
        if ((last = strcmp(e1->lastname, e2->lastname)) != 0)
            return last;                        /* Last names differ */

        /* same last name, check first name */
        if ((first = strcmp(e1->firstname, e2->firstname)) != 0)
            return first;                       /* First names differ */

        /* same first name, check ID numbers */
        if (e1->emp_id < e2->emp_id)            /* Compare employee ID */
            return -1;
        else if (e1->emp_id == e2->emp_id)
            return 0;
        else
            return 1;
    }
The logic here is straightforward, initially comparing on last names, then first names,
and then using the employee ID number if the two names are the same. By using
strcmp() on strings, we automatically get the right kind of negative/zero/positive
value to return.
The employee ID comparison can't just use subtraction: suppose long is 64 bits and
int is 32 bits, and the two values differ only in the upper 32 bits (say the lower 32 bits
are zero). In such a case, the subtraction result would automatically be cast to int,
throwing away the upper 32 bits and returning an incorrect value.
NOTE We could have stopped with the comparison on first names, in which
case all employees with the same last and first names would be grouped, but
without any other ordering. This is important: qsort() does not guarantee a
stable sort. A stable sort is one in which, if two elements compare equal based
on some key value(s), they will maintain their original ordering, relative to each
other, in the final sorted array. For example, consider three employees with the
same first and last names, with employee numbers 17, 42, and 81. Their order in
the original array might have been 42, 81, and 17. (Meaning, employee 42 is at a
lower index than employee 81, who, in turn, is at a lower index than employee 17.)
After sorting, the order might be 81, 42, and 17. If this is an issue, then the
comparison routine must take all important key values into consideration.
(Ours does.)
Simply by using a different function, we can sort employees by seniority:

    int emp_seniority_compare(const void *e1p, const void *e2p)
    {
        const struct employee *e1, *e2;
        double diff;

        e1 = (const struct employee *) e1p;     /* Cast pointers to correct type */
        e2 = (const struct employee *) e2p;

        diff = difftime(e1->start_date, e2->start_date);    /* Compare times */
        if (diff < 0)
            return -1;
        else if (diff > 0)
            return 1;
        else
            return 0;
    }
For maximum portability we have used difftime(), which returns the difference
in seconds between two time_t values. For this specific case, a cast such as

    return (int) difftime(e1->start_date, e2->start_date);

should do the trick, since time_t values are within reasonable ranges. Nevertheless,
we instead use a full three-way if statement, just to be safe.
Here is a sample data file, listing five U.S. presidents:

    $ cat presdata.txt
    Bush George 43 980013600        Last name, first name, president number, inauguration
    Clinton William 42 727552800
    Bush George 41 601322400
    Reagan Ronald 40 348861600
    Carter James 39 222631200
ch06-sortemp.c shows a simple program that reads this file into a struct
employee array and then sorts it, using the two different comparison functions
just presented.
      1  /* ch06-sortemp.c --- Demonstrate qsort() with two comparison functions. */
      2
      3  #include <stdio.h>
      4  #include <stdlib.h>
      5  #include <time.h>
      6  #include <string.h>     /* for strcmp() */
      7  struct employee {
      8      char lastname[30];
      9      char firstname[30];
     10      long emp_id;
     11      time_t start_date;
     12  };
     13
     14  /* emp_name_id_compare --- compare by name, then by ID */
     15
     16  int emp_name_id_compare(const void *e1p, const void *e2p)
     17  {
         ... as shown previously, omitted to save space ...
     39  }
     40
     41  /* emp_seniority_compare --- compare by seniority */
     42
     43  int emp_seniority_compare(const void *e1p, const void *e2p)
     44  {
         ... as shown previously, omitted to save space ...
     58  }
     59
     60  /* main --- demonstrate sorting */
     61
     62  int main(void)
     63  {
     64  #define NPRES 10
     65      struct employee presidents[NPRES];
     66      int i, npres;
     67      char buf[BUFSIZ];
     68
     69      /* Very simple code to read data: */
     70      for (npres = 0; npres < NPRES && fgets(buf, BUFSIZ, stdin) != NULL;
     71              npres++) {
     72          sscanf(buf, "%s %s %ld %ld\n",
     73              presidents[npres].lastname,
     74              presidents[npres].firstname,
     75              &presidents[npres].emp_id,
     76              &presidents[npres].start_date);
     77      }
     78
     79      /* npres is now number of actual lines read. */
     80
     81      /* First, sort by name */
     82      qsort(presidents, npres, sizeof(struct employee), emp_name_id_compare);
     83
     84      /* Print output */
     85      printf("Sorted by name:\n");
     86      for (i = 0; i < npres; i++)
     87          printf("\t%s %s\t%ld\t%s",
     88              presidents[i].lastname,
     89              presidents[i].firstname,
     90              presidents[i].emp_id,
     91              ctime(&presidents[i].start_date));
     92
     93      /* Now, sort by seniority */
     94      qsort(presidents, npres, sizeof(struct employee), emp_seniority_compare);
     95
     96      /* And print again */
     97      printf("Sorted by seniority:\n");
     98      for (i = 0; i < npres; i++)
     99          printf("\t%s %s\t%ld\t%s",
    100              presidents[i].lastname,
    101              presidents[i].firstname,
    102              presidents[i].emp_id,
    103              ctime(&presidents[i].start_date));
    104  }
Lines 70-77 read in the data. Note that any use of scanf () requires "well behaved"
input data. If, for example, any name is more than 29 characters, there's a problem. In
this case, we're safe, but production code must be considerably more careful.
Line 82 sorts the data by name and employee ID, and then lines 84-91 print the
sorted data. Similarly, line 94 re-sorts the data, this time by seniority, with lines 97-103
printing the results. When compiled and run, the program produces the following results:
    $ ch06-sortemp < presdata.txt
    Sorted by name:
            Bush George     41      Fri Jan 20 13:00:00 1989
            Bush George     43      Sat Jan 20 13:00:00 2001
            Carter James    39      Thu Jan 20 13:00:00 1977
            Clinton William 42      Wed Jan 20 13:00:00 1993
            Reagan Ronald   40      Tue Jan 20 13:00:00 1981
    Sorted by seniority:
            Carter James    39      Thu Jan 20 13:00:00 1977
            Reagan Ronald   40      Tue Jan 20 13:00:00 1981
            Bush George     41      Fri Jan 20 13:00:00 1989
            Clinton William 42      Wed Jan 20 13:00:00 1993
            Bush George     43      Sat Jan 20 13:00:00 2001
(We've used 1:00 p.m. as an approximation for the time when all of the presidents
started working.²)
One point is worth mentioning: qsort() rearranges the data in the array. If each
array element is a large structure, a lot of data will be copied back and forth as the array
is sorted. It may pay, instead, to set up a separate array of pointers, each of which points
at one element of the array. Then use qsort() to sort the pointer array, accessing the
unsorted data through the sorted pointers.
The price paid is the extra memory to hold the pointers and modification of the
comparison function to use an extra pointer indirection when comparing the structures.
The benefit returned can be a considerable speedup, since only a four- or eight-byte
pointer is moved around at each step, instead of a large structure. (Our struct
employee is at least 68 bytes in size. Swapping four-byte pointers moves 17 times less
data than does swapping structures.) For thousands of in-memory structures, the
difference can be significant.
NOTE If you're a C++ programmer, beware! qsort() may be dangerous to
use with arrays of objects! qsort() does raw memory moves, copying bytes.
It's completely unaware of C++ constructs such as copy constructors or
operator=() functions. Instead, use one of the STL sorting functions, or use
the separate-array-of-pointers technique.
2 The output shown here is for U.S. Eastern Standard Time. You will get different results for the same program
and data if you use a different time zone.
6.2.1.2 Example: Sorting Directory Contents

In Section 5.3, "Reading Directories," page 132, we demonstrated that directory
entries are returned in physical directory order. Most of the time, it's much more useful
to have directory contents sorted in some fashion, such as by name or by modification
time. While not standardized by POSIX, several routines make it easy to do this, using
qsort() as the underlying sorting agent:
    #include <dirent.h>                                           Common

    int scandir(const char *dir, struct dirent ***namelist,
                int (*select)(const struct dirent *),
                int (*compare)(const struct dirent **, const struct dirent **));
    int alphasort(const void *a, const void *b);
    int versionsort(const void *a, const void *b);                GLIBC
The scandir() and alphasort() functions were made available in 4.2 BSD and
are widely supported.³ versionsort() is a GNU extension.
scandir() reads the directory named by dir, creates an array of struct dirent
pointers by using malloc(), and sets *namelist to point to the beginning of that array.
Both the array of pointers and the pointed-to struct dirent structures are allocated
with malloc(); it is up to the calling code to use free() to avoid memory leaks.
Use the select function pointer to choose entries of interest. When this value is
NULL, all valid directory entries are included in the final array. Otherwise, (*select)()
is called for each entry, and those entries for which it returns nonzero (true) are included
in the array.
The compare function pointer compares two directory entries. It is passed to qsort()
for use in sorting.
alphasort() compares filenames lexicographically. It uses the strcoll() function
for comparison. strcoll() is similar to strcmp() but takes locale-related sorting rules
into consideration (see Section 13.4, "Can You Spell That for Me, Please?", page 521).
versionsort() is a GNU extension that uses the GNU strverscmp() function
to compare filenames (see strverscmp(3)). To make a long story short, this function
understands common filename versioning conventions and compares appropriately.
3 One notable exception is Sun's Solaris, where these two functions exist only in the hard-to-use BSD
compatibility library.
ch06-sortdir.c shows a program similar to ch04-catdir.c. However, it uses
scandir() and alphasort() to do the work.
     1  /* ch06-sortdir.c --- Demonstrate scandir(), alphasort(). */
     2  #include <stdlib.h>     /* for free() */
     3  #include <stdio.h>      /* for printf() etc. */
     4  #include <errno.h>      /* for errno */
     5  #include <sys/types.h>  /* for system types */
     6  #include <dirent.h>     /* for directory functions */
     7  #include <string.h>     /* for strerror() */
     8  char *myname;
     9  int process(const char *dir);
    10
    11  /* main --- loop over directory arguments */
    12
    13  int main(int argc, char **argv)
    14  {
    15      int i;
    16      int errs = 0;
    17
    18      myname = argv[0];
    19
    20      if (argc == 1)
    21          errs = process(".");    /* default to current directory */
    22      else
    23          for (i = 1; i < argc; i++)
    24              errs += process(argv[i]);
    25
    26      return (errs != 0);
    27  }
    28
    29  /* nodots --- ignore dot files, for use by scandir() */
    30
    31  int
    32  nodots(const struct dirent *dp)
    33  {
    34      return (dp->d_name[0] != '.');
    35  }
    36
    37  /*
    38   * process --- do something with the directory, in this case,
    39   *             print inode/name pairs on standard output.
    40   *             Return 0 if all OK, 1 otherwise.
    41   */
    42
    43  int
    44  process(const char *dir)
    45  {
    46      DIR *dp;
    47      struct dirent **entries;
    48      int nents, i;
    49
    50      nents = scandir(dir, &entries, nodots, alphasort);
    51      if (nents < 0) {
    52          fprintf(stderr, "%s: scandir failed: %s\n", myname,
    53                  strerror(errno));
    54          return 1;
    55      }
    56
    57      for (i = 0; i < nents; i++) {
    58          printf("%8ld %s\n", (long) entries[i]->d_ino, entries[i]->d_name);
    59          free(entries[i]);
    60      }
    61
    62      free(entries);
    63
    64      return 0;
    65  }
The main() program (lines 1-27) follows the standard boilerplate we've used before.
The nodots() function (lines 31-35) acts as the select parameter, choosing only
filenames that don't begin with a period.
The process() function (lines 43-65) is quite simple, with scandir() doing most
of the work. Note how each element is released separately with free() (line 59) and
how the entire array is also released (line 62).
When run, the directory contents do indeed come out in sorted order, without '.'
and '..':
$ ch06-sortdir                      Default action displays current directory
 2097176 00-preface.texi
 2097187 01-intro.texi
 2097330 02-cmdline.texi
 2097339 03-memory.texi
 2097183 03-memory.texi.save
 2097335 04-fileio.texi
 2097334 05-fileinfo.texi
 20973,2 06-general1.texi
6.2.2 Binary Searching: bsearch()

A linear search is pretty much what it sounds like: You start at the beginning, and
walk through an array being searched until you find what you need. For something
simple like finding integers, this usually takes the form of a for loop. Consider
this function:
/* ifind --- linear search, return index if found or -1 if not */

int ifind(int x, const int array[], size_t nelems)
{
    size_t i;

    for (i = 0; i < nelems; i++)
        if (array[i] == x)    /* found it */
            return i;

    return -1;
}
The advantage to linear searching is that it's simple; it's easy to write the code cor-
rectly the first time. Furthermore, it always works. Even if elements are added to the
end of the array or removed from the array, there's no need to sort the array.
The disadvantage to linear searching is that it's slow. On average, for an array
containing nelems elements, a linear search for a random element does 'nelems / 2'
comparisons before finding the desired element. This becomes prohibitively expensive,
even on modern high-performance systems, as nelems becomes large. Thus, you should
only use linear searching on small arrays.
Unlike a linear search, binary searching requires that the input array already be
sorted. The disadvantage here is that if elements are added, the array must be re-sorted
before it can be searched. (When elements are removed, the rest of the array contents
must still be shuffled down. This is not as expensive as re-sorting, but it can still involve
a lot of data motion.)
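
To make the contrast with ifind() concrete, here is a hand-written binary search over a
sorted array of integers. It is only a sketch for illustration: the name ibfind() and the
ncmp probe counter are inventions of this example, not a library interface. Because the
array is sorted, every probe discards half of the remaining range, so the number of probes
grows with the logarithm of the array size rather than with the size itself.

```c
#include <stddef.h>

/*
 * ibfind --- binary search; array must be sorted in ascending order.
 * Returns the index of x, or -1 if x is absent. If ncmp is not NULL,
 * it receives the number of elements that were examined.
 */
long ibfind(int x, const int array[], size_t nelems, size_t *ncmp)
{
    size_t lo = 0, hi = nelems;     /* search the half-open range [lo, hi) */
    size_t probes = 0;
    long result = -1;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        probes++;
        if (array[mid] == x) {
            result = (long) mid;    /* found it */
            break;
        } else if (array[mid] < x)
            lo = mid + 1;           /* x can only be in the upper half */
        else
            hi = mid;               /* x can only be in the lower half */
    }

    if (ncmp != NULL)
        *ncmp = probes;
    return result;
}
```

With an eight-element array, this sketch never examines more than four elements, whereas
ifind() may have to examine all eight.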
The advantage to binary searching, and it's a significant one, is that binary searching
is blindingly fast, requiring at most log2(N) comparisons, where N is the number of
elements in the array. The bsearch() function is declared as follows:
#include <stdlib.h>                                               ISO C

void *bsearch(const void *key, const void *base, size_t nmemb,
              size_t size, int (*compare)(const void *, const void *));
The parameters and their purposes are similar to those of qsort():
const void *key
The object being searched for in the array.
const void *base
The start of the array.
size_t nmemb
The number of elements in the array.
size_t size
The size of each element, obtained with sizeof.
int (*compare)(const void *, const void *)
The comparison function. It must work the same way as the qsort() comparison
function, returning negative/zero/positive according to whether the first parameter
is less than/equal to/greater than the second one.
bsearch() returns NULL if the object is not found. Otherwise, it returns a pointer
to the found object. If more than one array element matches key, it is unspecified which
one is returned. Thus, as with qsort(), make sure that the comparison function
accounts for all relevant parts of the searched data structure.
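
Before moving to structures, the calling convention can be seen in miniature on an array
of int. This is a sketch, not part of the book's example programs: int_compare() and
find_int() are hypothetical names, and the array is assumed to be sorted in ascending
order already. Note how the pointer that bsearch() returns is converted back to an index
by pointer subtraction.

```c
#include <stdlib.h>

/* int_compare --- comparison function in the qsort()/bsearch() style */
static int int_compare(const void *p1, const void *p2)
{
    int a = *(const int *) p1;
    int b = *(const int *) p2;

    return (a > b) - (a < b);   /* -1, 0, or 1, with no overflow risk */
}

/* find_int --- return the index of x in a sorted array, or -1 if absent */
long find_int(int x, const int *sorted, size_t nelems)
{
    const int *hit;

    hit = bsearch(&x, sorted, nelems, sizeof(sorted[0]), int_compare);
    if (hit == NULL)
        return -1;

    return (long) (hit - sorted);   /* pointer difference is the index */
}
```

The key is passed by address, just like the array elements, because the comparison
function receives two pointers and cannot tell which one is the key.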
ch06-searchemp.c shows bsearch() in practice, extending the struct employee
example used previously.
  1  /* ch06-searchemp.c --- Demonstrate bsearch(). */
  2
  3  #include <stdio.h>
  4  #include <errno.h>
  5  #include <stdlib.h>
  6
  7  struct employee {
  8      char lastname[30];
  9      char firstname[30];
 10      long emp_id;
 11      time_t start_date;
 12  };
 13
 14  /* emp_id_compare --- compare by ID */
 15
 16  int emp_id_compare(const void *e1p, const void *e2p)
 17  {
 18      const struct employee *e1, *e2;
 19
 20      e1 = (const struct employee *) e1p;
 21      e2 = (const struct employee *) e2p;
 22
 23      if (e1->emp_id < e2->emp_id)
 24          return -1;
 25      else if (e1->emp_id == e2->emp_id)
 26          return 0;
 27      else
 28          return 1;
 29  }
 30
 31  /* print_employee --- print an employee structure */
 32
 33  void print_employee(const struct employee *emp)
 34  {
 35      printf("%s %s\t%ld\t%s", emp->lastname, emp->firstname,
 36          emp->emp_id, ctime(& emp->start_date));
 37  }
Lines 7-12 define the struct employee; it's the same as before. Lines 16-29 serve
as the comparison function, for both qsort() and bsearch(). It compares on employee
ID number only. Lines 33-37 define print_employee(), which is a convenience
function for printing the structure since this is done from multiple places.
 39  /* main --- demonstrate sorting */
 40
 41  int main(int argc, char **argv)
 42  {
 43  #define NPRES 10
 44      struct employee presidents[NPRES];
 45      int i, npres;
 46      char buf[BUFSIZ];
 47      struct employee *the_pres;
 48      struct employee key;
 49      int id;
 50      FILE *fp;
 51
 52      if (argc != 2) {
 53          fprintf(stderr, "usage: %s datafile\n", argv[0]);
 54          exit(1);
 55      }
 56
 57      if ((fp = fopen(argv[1], "r")) == NULL) {
 58          fprintf(stderr, "%s: %s: could not open: %s\n", argv[0],
 59              argv[1], strerror(errno));
 60          exit(1);
 61      }
 62
 63      /* Very simple code to read data: */
 64      for (npres = 0; npres < NPRES && fgets(buf, BUFSIZ, fp) != NULL;
 65              npres++) {
 66          sscanf(buf, "%s %s %ld %ld",
 67              presidents[npres].lastname,
 68              presidents[npres].firstname,
 69              & presidents[npres].emp_id,
 70              & presidents[npres].start_date);
 71      }
 72      fclose(fp);
 73
 74      /* npres is now number of actual lines read. */
 75
 76      /* First, sort by id */
 77      qsort(presidents, npres, sizeof(struct employee), emp_id_compare);
 78
 79      /* Print output */
 80      printf("Sorted by ID:\n");
 81      for (i = 0; i < npres; i++) {
 82          putchar('\t');
 83          print_employee(& presidents[i]);
 84      }
 85
 86      for (;;) {
 87          printf("Enter ID number: ");
 88          if (fgets(buf, BUFSIZ, stdin) == NULL)
 89              break;
 90
 91          sscanf(buf, "%d\n", & id);
 92          key.emp_id = id;
 93          the_pres = (struct employee *) bsearch(& key, presidents, npres,
 94              sizeof(struct employee), emp_id_compare);
 95
 96          if (the_pres != NULL) {
 97              printf("Found: ");
 98              print_employee(the_pres);
 99          } else
100              printf("Employee with ID %d not found!\n", id);
101      }
102
103      putchar('\n');    /* Print a newline on EOF. */
104
105      exit(0);
106  }
The main() function starts with argument checking (lines 52-55). It then reads the
data from the named file (lines 57-72). Standard input cannot be used for the employee
data, since that is reserved for prompting the user for the employee ID to search for.
Lines 77-84 sort the data and then print them. The program then goes into a loop,
starting on line 86. It prompts for an employee ID number, exiting the loop upon
end-of-file. To search the array, we use the struct employee named key. It's enough
to set just its emp_id field to the entered ID number; none of the other fields are used
in the comparison (line 92).
If an entry is found with the matching key, bsearch() returns a pointer to it.
Otherwise it returns NULL. The return is tested on line 96, and appropriate action is
then taken. Finally, line 103 prints a newline character so that the system prompt will
come out on a fresh line. Here's a transcript of what happens when the program is
compiled and run:
$ ch06-searchemp presdata.txt                            Run the program
Sorted by ID:
        Carter James    39      Thu Jan 20 13:00:00 1977
        Reagan Ronald   40      Tue Jan 20 13:00:00 1981
        Bush George     41      Fri Jan 20 13:00:00 1989
        Clinton William 42      Wed Jan 20 13:00:00 1993
        Bush George     43      Sat Jan 20 13:00:00 2001
Enter ID number: 42                                      Enter a valid number
Found: Clinton William  42      Wed Jan 20 13:00:00 1993 It's found
Enter ID number: 29                                      Enter an invalid number
Employee with ID 29 not found!                           It's not found
Enter ID number: 40                                      Try another good one
Found: Reagan Ronald    40      Tue Jan 20 13:00:00 1981 This one is found too
Enter ID number: ^D                                      CTRL-D entered for EOF
$                                                        Ready for next command
Additional, more advanced, APIs for searching data collections are described in
Section 14.4, "Advanced Searching with Binary Trees," page 551.
6.3 User and Group Names

While the operating system works with user and group ID numbers for storage of
file ownership and for permission checking, humans prefer to work with user and
group names.
Early Unix systems kept the information that mapped names to ID numbers in
simple text files, /etc/passwd and /etc/group. These files still exist on modern
systems, and their format is unchanged from that of V7 Unix. However, they no longer
tell the complete story. Large installations with many networked hosts keep the
information in network databases: ways of storing the information on a small number of
servers that are then accessed over the network.4 However, this usage is transparent to
most applications since access to the information is done through the same API as was
used for retrieving the information from the text files. It is for this reason that POSIX
standardizes only the APIs; the /etc/passwd and /etc/group files need not exist, as
such, for a system to be POSIX compliant.
The APIs to the two databases are similar; most of our discussion focuses on the
user database.
6.3.1 User Database

The traditional /etc/passwd format maintains one line per user. Each line has
seven fields, each of which is separated from the next by a colon character:
$ grep arnold /etc/passwd
arnold:x:2076:10:Arnold D. Robbins:/home/arnold:/bin/bash
In order, the fields are as follows:

The user name
This is what the user types to log in, what shows up for 'ls -l' and in any other
context that displays users.
The password field
On older systems, this is the user's encrypted password. On newer systems, this
field is likely to be an x (as shown), meaning that the password information is
held in a different file. This separation is a security measure; if the encrypted
password isn't available to nonprivileged users, it is much harder to "crack."
The user ID number
This should be unique; one number per user.
The group ID number
This is the user's initial group ID number. As is discussed later, on modern systems
processes have multiple groups associated with them.
4 Common network databases include Sun Microsystems' Network Information Service (NIS) and NIS+, Kerberos
(Hesiod), MacOS X NetInfo (versions up to and including 10.2), and LDAP, the Lightweight Directory Access
Protocol. BSD systems keep user information in on-disk databases and generate the /etc/passwd and
/etc/group files automatically.
The user's real name

This is at least a first and last name. Some systems allow for comma-separated
fields, for office location, phone number, and so on, but this is not standardized.
The login directory
This directory becomes the home directory for users when they log in ($HOME, the
default for the cd command).
The login program
The program to run when the user logs in. This is usually a shell, but it need not
be. If this field is left empty, the default is /bin/sh.
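
The seven-field layout can be illustrated with a small parser. This is a sketch only:
split_passwd_line() and PW_NFIELDS are hypothetical names, and real programs should use
the library routines described next rather than reading the file themselves, since the
file may not hold the whole database. The sketch splits by hand with strchr() because
strtok() would silently skip empty fields.

```c
#include <string.h>

#define PW_NFIELDS 7

/*
 * split_passwd_line --- hypothetical helper: split line in place at each
 * ':' and store pointers to the fields. Returns the number of fields
 * found (at most PW_NFIELDS). Empty fields are kept as empty strings.
 */
int split_passwd_line(char *line, char *fields[PW_NFIELDS])
{
    int n = 0;
    char *p = line;
    char *colon;

    line[strcspn(line, "\n")] = '\0';   /* drop a trailing newline, if any */

    while (n < PW_NFIELDS) {
        fields[n++] = p;
        colon = strchr(p, ':');
        if (colon == NULL || n == PW_NFIELDS)
            break;                      /* last field: keep the rest as is */
        *colon = '\0';
        p = colon + 1;
    }

    return n;
}
```

For the sample line shown earlier, fields[0] is "arnold", fields[2] is "2076", and
fields[6] is "/bin/bash". The numeric fields are still strings at this point.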
Access to the user database is through the routines declared in <pwd.h>:
#include <sys/types.h>                                            XSI
#include <pwd.h>

struct passwd *getpwent(void);
void setpwent(void);
void endpwent(void);

struct passwd *getpwnam(const char *name);
struct passwd *getpwuid(uid_t uid);
The fields in the struct passwd used by the various API routines correspond
directly to the fields in the password file:
struct passwd {
    char    *pw_name;       /* user name */
    char    *pw_passwd;     /* user password */
    uid_t   pw_uid;         /* user id */
    gid_t   pw_gid;         /* group id */
    char    *pw_gecos;      /* real name */
    char    *pw_dir;        /* home directory */
    char    *pw_shell;      /* shell program */
};
(The name pw_gecos is historical; when the early Unix systems were being developed,
this field held the corresponding information for the user's account on the Bell Labs
Honeywell systems running the GECOS operating system.)
The purpose of each routine is described in the following list.
struct passwd *getpwent(void)
Returns a pointer to an internal static struct passwd structure containing
the "current" user's information. This routine reads through the entire password
database, one record at a time, returning a pointer to a structure for each user.
The same pointer is returned each time; that is, the internal struct passwd is
overwritten for each user's entry. When getpwent() reaches the end of the password
database, it returns NULL. Thus, it lets you step through the entire database,
one user at a time. The order in which records are returned is undefined.
void setpwent(void)
Resets the internal state such that the next call to getpwent() returns the first
record in the password database.
void endpwent(void)
"Closes the database," so to speak, be it a simple file, network connection, or
something else.
struct passwd *getpwnam(const char *name)
Looks up the user with a pw_name member equal to name, returning a pointer to
a static struct passwd describing the user or NULL if the user is not found.
struct passwd *getpwuid(uid_t uid)
Similarly, looks up the user with the user ID number given by uid, returning a
pointer to a static struct passwd describing the user or NULL if the user is
not found.
getpwuid() is what's needed when you have a user ID number (such as from a
struct stat) and you wish to print the corresponding user name. getpwnam()
converts a name to a user ID number, for example, if you wish to use chown() or fchown()
on a file. In theory, both of these routines do a linear search through the password
database to find the desired information. This is true in practice when a password file
is used; however, behind-the-scenes databases (network or otherwise, as on BSD systems)
tend to use more efficient methods of storage, so these calls are possibly not as expensive
in such a case.5
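
Both conversions fit in a few lines each. The following is a sketch under the assumption
that a caller wants a printable name even for IDs with no database entry; the helper
names uid_to_name() and name_to_uid() are invented for this illustration. Each helper
copies what it needs out of the returned structure right away.

```c
#include <stdio.h>
#include <sys/types.h>
#include <pwd.h>

/*
 * uid_to_name --- hypothetical helper: store the user name for uid in
 * buf, or the number itself in decimal if there is no such user.
 */
char *uid_to_name(uid_t uid, char *buf, size_t buflen)
{
    struct passwd *pw = getpwuid(uid);

    if (pw != NULL)
        snprintf(buf, buflen, "%s", pw->pw_name);   /* copy out of static data */
    else
        snprintf(buf, buflen, "%lu", (unsigned long) uid);

    return buf;
}

/* name_to_uid --- hypothetical helper: 0 and *uid set on success, -1 if unknown */
int name_to_uid(const char *name, uid_t *uid)
{
    struct passwd *pw = getpwnam(name);

    if (pw == NULL)
        return -1;

    *uid = pw->pw_uid;
    return 0;
}
```

Falling back to the number mirrors what a long directory listing usually shows for files
whose owner no longer has an account.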
getpwent() is useful when you need to go through the entire password database.
For instance, you might wish to read it all into memory, sort it, and then search it
quickly with bsearch(). This is very useful for avoiding the multiple linear searches
inherent in looking things up one at a time with getpwuid() or getpwnam().
5 Unfortunately, if performance is an issue, there's no standard way to know how your library does things, and
indeed, the way it works can vary at runtime! (See the nsswitch.conf(5) manpage on a GNU/Linux system.) On
the other hand, the point of the API is, after all, to hide the details.
NOTE The pointers returned by getpwent(), getpwnam(), and getpwuid()
all point to internal static data. Thus, you should make a copy of their
contents if you need to save the information.
Take a good look at the struct passwd definition. The members that
represent character strings are pointers; they too point at internal static data,
and if you're going to copy the structure, make sure to copy the data each
member points to as well.
6.3.2 Group Database

The format of the /etc/group group database is similar to that of /etc/passwd,
but with fewer fields:
$ grep arnold /etc/group
mail:x:12:mail,postfix,arnold
uucp:x:14:uucp,arnold
floppy:x:19:arnold
devel:x:42:miriam,arnold
arnold:x:2076:arnold
Again, there is one line per group, with fields separated by colons. The fields are
as follows:
The group name

This is the name of the group, as shown in 'ls -l' or in any other context in
which a group name is needed.
The group password
This field is historical. It is no longer used.
The group ID number
As with the user ID, this should be unique to each group.
The user list
This is a comma-separated list of users who are members of the group.
In the previous example, we see that user arnold is a member of multiple groups.
This membership is reflected in practice in what is termed the group set. Besides the
main user ID and group ID number that processes have, the group set is a set of addi-
tional group ID numbers that each process carries around with it. The system checks
all of these group ID numbers against a file's group ID number when performing
permission checking. This subject is discussed in more detail in Chapter 11, "Permissions
and User and Group ID Numbers," page 403.
The group database APIs are similar to those for the user database. The following
functions are declared in <grp.h>:
#include <sys/types.h>                                            XSI
#include <grp.h>

struct group *getgrent(void);
void setgrent(void);
void endgrent(void);

struct group *getgrnam(const char *name);
struct group *getgrgid(gid_t gid);
The struct group corresponds to the records in /etc/group:

struct group {
    char    *gr_name;       /* group name */
    char    *gr_passwd;     /* group password */
    gid_t   gr_gid;         /* group id */
    char    **gr_mem;       /* group members */
};
The gr_mem field bears some explanation. While declared as a pointer to a pointer
(char **), it is best thought of as an array of strings (like argv). The last element in
the array is set to NULL. When no members are listed, the first element in the array
is NULL.
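
Because gr_mem is NULL-terminated, it can be walked either by index or by pointer, just
like argv. The two helpers below are a sketch; count_members() and group_lists_user() are
hypothetical names, not part of <grp.h>. Both behave correctly for a group with no listed
members, since the loops stop immediately when the first element is NULL.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <grp.h>

/* count_members --- number of names in gr->gr_mem, not counting the NULL */
size_t count_members(const struct group *gr)
{
    size_t n = 0;

    while (gr->gr_mem[n] != NULL)
        n++;

    return n;
}

/* group_lists_user --- 1 if user appears in gr->gr_mem, 0 otherwise */
int group_lists_user(const struct group *gr, const char *user)
{
    char **mp;

    for (mp = gr->gr_mem; *mp != NULL; mp++)
        if (strcmp(*mp, user) == 0)
            return 1;

    return 0;
}
```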
ch06-groupinfo.c demonstrates how to use the struct group and the gr_mem
field. The program accepts a single user name on the command line and prints all group
records in which that user name appears:
  1  /* ch06-groupinfo.c --- Demonstrate getgrent() and struct group */
  2
  3  #include <stdio.h>
  4  #include <sys/types.h>
  5  #include <grp.h>
  6
  7  extern void print_group(const struct group *gr);
  8
  9  /* main --- print group lines for user named in argv[1] */
 10
 11  int
 12  main(int argc, char **argv)
 13  {
 14      struct group *gr;
 15      int i;
 16
 17      if (argc != 2) {                                     Check arguments
 18          fprintf(stderr, "usage: %s user\n", argv[0]);
 19          exit(1);
 20      }
 21
 22      while ((gr = getgrent()) != NULL)                    Get each group record
 23          for (i = 0; gr->gr_mem[i] != NULL; i++)          Look at each member
 24              if (strcmp(gr->gr_mem[i], argv[1]) == 0)     If found the user ...
 25                  print_group(gr);                         Print the record
 26
 27      endgrent();
 28
 29      exit(0);
 30  }
The main() routine first does error checking (lines 17-20). The heart of the program
is a nested loop. The outer loop (line 22) loops over all the group database records. The
inner loop (line 23) loops over the members of the gr_mem array. If one of the members
matches the name from the command line (line 24), then print_group() is called to
print the record (line 25).
 32  /* print_group --- print a group record */
 33
 34  void
 35  print_group(const struct group *gr)
 36  {
 37      int i;
 38
 39      printf("%s:%s:%ld:", gr->gr_name, gr->gr_passwd, (long) gr->gr_gid);
 40
 41      for (i = 0; gr->gr_mem[i] != NULL; i++) {
 42          printf("%s", gr->gr_mem[i]);
 43          if (gr->gr_mem[i+1] != NULL)
 44              putchar(',');
 45      }
 46
 47      putchar('\n');
 48  }
The print_group() function (lines 34-48) is straightforward, with logic similar
to that of main() for printing the member list. Group list members are comma
separated; thus, the loop body has to check that the next element in the array is not NULL
before printing a comma. This code works correctly, even if there are no members in
the group. However, for this program, we know there are members, or print_group()
wouldn't have been called! Here's what happens when the program is run:
$ ch06-groupinfo arnold
mail:x:12:mail,postfix,arnold
uucp:x:14:uucp,arnold
floppy:x:19:arnold
devel:x:42:miriam,arnold
arnold:x:2076:arnold
6.4 Terminals: isatty()

The Linux/Unix standard input, standard output, standard error model discourages
the special treatment of input and output devices. Programs generally should not need
to know, or care, whether their output is a terminal, a file, a pipe, a physical device,
or whatever.
However, there are times when a program really does need to know what kind of a
file a file descriptor is associated with. The stat () family of calls often provides enough
information: regular file, directory, device, and so on. Sometimes though, even that is
not enough, and for interactive programs in particular, you may need to know if a file
descriptor represents a tty.
A tty (short for Teletype, one of the early manufacturers of computer terminals) is
any device that represents a terminal, that is, something that a human would use to
interact with the computer. This may be either a hardware device, such as the keyboard
and monitor of a personal computer, an old-fashioned video display terminal connected
to a computer by a serial line or modem, or a software pseudoterminal, such as is used
for windowing systems and network logins.
The discrimination can be made with isatty():
#include <unistd.h>                                               POSIX

int isatty(int desc);
This function returns 1 if the file descriptor desc represents a terminal, 0 otherwise.
According to POSIX, isatty() may set errno to indicate an error; thus you should
set errno to 0 before calling isatty() and then check its value if the return is 0. (The
GNU/Linux isatty(3) manpage doesn't mention the use of errno.) The POSIX standard
also points out that just because isatty() returns 1 doesn't mean there's a human at
the other end of the file descriptor!
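
The errno protocol just described can be wrapped up as follows. This is a sketch:
is_terminal() is a hypothetical wrapper, and the three-way return value is a design choice
of this example rather than anything POSIX prescribes. It separates "not a terminal" from
"not even a valid file descriptor," which a bare isatty() call reports the same way.

```c
#include <errno.h>
#include <unistd.h>

/*
 * is_terminal --- hypothetical wrapper: returns 1 if fd is a terminal,
 * 0 if fd is open on something else, and -1 if fd is not a valid
 * file descriptor.
 */
int is_terminal(int fd)
{
    errno = 0;              /* so a stale value can't be misread below */
    if (isatty(fd))
        return 1;

    if (errno == EBADF)
        return -1;          /* fd isn't open at all */

    return 0;               /* open, but not a terminal (typically ENOTTY) */
}
```

A program might call is_terminal(STDOUT_FILENO) once at startup and pick its output
format from the result.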
One place where isatty() comes into use is in modern versions of ls, in which
the default is to print filenames in columns if the standard output is a terminal and to
print them one per line if not.

6.5 Suggested Reading

1. Mastering Algorithms With C, by Kyle Loudon. O'Reilly & Associates,
Sebastopol, California, USA, 1999. ISBN: 1-56592-453-3.
This book provides a practical, down-to-earth introduction to algorithms and
data structures using C, covering hash tables, trees, sorting, and searching,
among other things.
2. The Art of Computer Programming Volume 3: Sorting and Searching, 2nd edition,
by Donald E. Knuth. Addison-Wesley, Reading, Massachusetts, USA, 1998.
ISBN: 0-201-89685-0.
This book is usually cited as the final word on sorting and searching. Bear in
mind that it is considerably denser and harder to read than the Loudon book.
3. The GTK+ project6 consists of several libraries that work together. GTK+ is
the underlying toolkit used by the GNU GNOME Project.7 At the base of the
library hierarchy is Glib, a library of fundamental types and data structures and
functions for working with them. Glib includes facilities for all the basic
operations we've covered so far in this book, and many more, including linked lists
and hash tables. To see the online documentation, start at the GTK+
Documentation Project's web site,8 click on the "Download" link, and proceed to
the online version.
6 http://www.gtk.org
7 http://www.gnome.org
8 http://www.gtk.org/rdp

6.6 Summary
• Times are stored internally as time_t values, representing "seconds since the
Epoch." The Epoch is Midnight, January 1, 1970 UTC for GNU/Linux and Unix
systems. The current time is retrieved from the system by the time() system call,
and difftime() returns the difference, in seconds, between two time_t values.
• The struct tm structure represents a "broken-down time," which is a much
more usable representation of a date and time. gmtime() and localtime()
convert time_t values into struct tm values, and mktime() goes in the
opposite direction.
• asctime() and ctime() do simplistic formatting of time values, returning a
pointer to a fixed-size, fixed-format static character string. strftime() provides
much more flexible formatting, including locale-based values.
• Time-zone information is made available by a call to tzset(). Since the standard
routines act as if they call tzset() automatically, it is rare to need to call this
function directly.
• The standard routine for sorting arrays is qsort(). By using a user-provided
comparison function and being told the number of array elements and their size,
qsort() can sort any kind of data. This provides considerable flexibility.
• scandir() reads an entire directory into an array of struct dirent.
User-provided functions can be used to select which entries to include and can provide
ordering of elements within the array. alphasort() is a standard function for
sorting directory entries by name; scandir() passes the sorting function straight
through to qsort().
• The bsearch() function works similarly to qsort(). It does fast binary searching.
Use it if the cost of linear searching outweighs the cost of sorting your data. (An
additional API for searching data collections is described in Section 14.4,
"Advanced Searching with Binary Trees," page 551.)
• The user and group databases may be kept in local disk files or may be made
available over a network. The standard API purposely hides this distinction. Each
database provides both linear scanning of the entire database and direct queries
for a user/group name or user/group ID.
• Finally, for those times when stat() just isn't enough, isatty() can tell you
whether or not an open file represents a terminal device.
6.7 Exercises
1. Write a simple version of the date command that accepts a format string on
the command line and uses it to format and print the current time.
2. When a file is more than six months old, 'ls -l' uses a simpler format for
printing the modification time. The GNU version of ls.c uses this
computation:
3043    /* Consider a time to be recent if it is within the past six
3044       months. A Gregorian year has 365.2425 * 24 * 60 * 60 ==
3045       31556952 seconds on the average. Write this value as an
3046       integer constant to avoid floating point hassles. */
3047    six_months_ago = current_time - 31556952 / 2;
Compare this to our example computation for computing the time six months
in the past. What are the advantages and disadvantages of each method?
3. Write a simple version of the touch command that changes the modification
time of the files named on the command line to the current time.
4. Add an option to your touch command that accepts a date and time specifica-
tion on the command line and uses that value as the new modification time of
the files named on the command line.
5. Add another option to your version of touch that takes a filename and uses
the modification time of the given file as the new modification time for the
files named on the command line.
6. Enhance ch06-sortemp.c to sort a separate array of pointers that point into
the array of employees.
7. Add options to ch06-sortdir.c to sort by inode number, modification time,
access time, and size. Add a "reverse option" such that time-based sorts make
the most recent file first and other criteria (size, inode) sort by largest value first.
8. Write a simple version of the chown command. Its usage should be
chown user[ : group] files ...
Here, user and group are user and group names representing the new user
and group for the named files. The group is optional; if present it is separated
from the user by a colon.
To test your version on a GNU/Linux system, you will have to work as root.
Do so carefully!
9. Enhance your chown to allow numeric user or group numbers, as well as names.
10. Write functions to copy user and group structures, including pointed-to data.
Use malloc() to allocate storage as needed.
11. Write a specialized user-lookup library that reads the entire user database into
a dynamically allocated array. Provide fast lookup of users, by both user ID
number and name. Be sure to handle the case in which a requested user
isn't found.
12. Do the same thing for the group database.
13. Write a stat program that prints the contents of the struct stat for each
file named on the command line. It should print all the values in
human-readable format: time_t values as dates and times, uid_t and gid_t values
as the corresponding names (if available), and the contents of symbolic links.
Print the st_mode field the same way that ls would.
Compare your program to the GNU Coreutils stat program, both by
comparing outputs and by looking at the source code.
Chapter 7 • Putting It All Together: ls

In this chapter
• 7.1 V7 ls Options
• 7.2 V7 ls Code
The V7 ls command nicely ties together everything we've seen so far. It uses
almost all of the APIs we've covered, touching on many aspects of Unix
programming: memory allocation, file metadata, dates and times, user names, directory
reading, and sorting.
7.1 V7 ls Options
In comparison to modern versions of ls, the V7 ls accepted only a handful of options
and the meaning of some of them is different for V7 than for current ls. The options
are as follows:
-a Print all directory entries. Without this, don't print '.' and '..'. Interestingly
enough, V7 ls ignores only '.' and '..', while V1 through V6 ignore any file
whose name begins with a period. This latter behavior is the default in modern
versions of ls, as well.
-c Use the inode change time, instead of the modification time, with -t or -l.
-d For directory arguments, print information about the directory itself, not its
contents.
-f "Force" each argument to be read as a directory, and print the name found in
each slot. This option disables -l, -r, -s, and -t, and enables -a. (This option
apparently existed for filesystem debugging and repair.)
-g For 'ls -l', use the group name instead of the user name.
-i Print the inode number in the first column along with the filename or the long
listing.
-l Provide the familiar long format output. Note, however, that V7 'ls -l' printed
only the user name, not the user and group names together.
-r Reverse the sort order, be it alphabetic for filenames or by time.
-s Print the size of the file in 512-byte blocks. The V7 ls(1) manpage states that
indirect blocks (blocks used by the filesystem for locating the data blocks of large
files) are also included in the computation, but, as we shall see, this statement
was incorrect.
-t Sort the output by modification time, most recent first, instead of by name.
-u Use the access time instead of the modification time with -t and/or -l.
The biggest differences between V7 ls and modern ls concern the -a option and
the -l option. Modern systems omit all dot files unless -a is given, and they include
both user and group names in the -l long listing. On modern systems, -g is taken to
mean print only the group name, and -o means print only the user name. For what
it's worth, GNU ls has over 50 options!
7.2 V7 ls Code
The file /usr/src/cmd/ls.c in the V7 distribution contains the code. It is all of
425 lines long.
  1  /*
  2   * list file or directory
  3   */
  4
  5  #include <sys/param.h>
  6  #include <sys/stat.h>
  7  #include <sys/dir.h>
  8  #include <stdio.h>
  9
 10  #define NFILES  1024
 11  FILE    *pwdf, *dirf;
 12  char    stdbuf[BUFSIZ];
 13
 14  struct lbuf {                          Collects needed info
 15      union {
 16          char    lname[15];
 17          char    *namep;
 18      } ln;
 19      char    ltype;
 20      short   lnum;
 21      short   lflags;
 22      short   lnl;
 23      short   luid;
 24      short   lgid;
 25      long    lsize;
 26      long    lmtime;
 27  };
 28
 29  int aflg, dflg, lflg, sflg, tflg, uflg, iflg, fflg, gflg, cflg;
 30  int rflg    = 1;
 31  long    year;                          Global variables: auto init to 0
 32  int flags;
 33  int lastuid = -1;
 34  char    tbuf[16];
 35  long    tblocks;
 36  int statreq;
 37  struct  lbuf    *flist[NFILES];
 38  struct  lbuf    **lastp = flist;
210 Chapter 7 • Purring It All Toge(her: I s
39 struct lbuf **firstp flist;

40 char *dotp = " ".
41
42 char *makenarne ( ) ; char *makename(char *dir, char *file);
43 struct lbuf *gstat(); struct Ibuf *gstat(char *file, int argjl);
44 char *ctime ( ) ; char *ctime(time_t *t);
45 long nblock () ; long nblock(long size);
46
47 #define ISARG 0100000
The program starts with file inclusions (lines 5-8) and variable declarations. The
struct lbuf (lines 14-27) encapsulates the parts of the struct stat that are of
interest to ls. We see later how this structure is filled.
The variables aflg, dflg, and so on (lines 29 and 30) all indicate the presence of
the corresponding option. This variable naming style is typical of V7 code. The flist,
lastp, and firstp variables (lines 37-39) represent the files that ls reports information
about. Note that flist is a fixed-size array, allowing no more than 1024 files to be
processed. We see shortly how all these variables are used.
After the variable declarations come function declarations (lines 42-45), and then
the definition of ISARG, which distinguishes a file named on the command line from
a file found when a directory is read.
 49  main(argc, argv)                       int main(int argc, char **argv)
 50  char *argv[];
 51  {
 52      int i;
 53      register struct lbuf *ep, **ep1;   Variable and function declarations
 54      register struct lbuf **slastp;
 55      struct lbuf **epp;
 56      struct lbuf lb;
 57      char *t;
 58      int compar();
 59
 60      setbuf(stdout, stdbuf);
 61      time(&lb.lmtime);                  Get current time
 62      year = lb.lmtime - 6L*30L*24L*60L*60L; /* 6 months ago */
The main() function starts by declaring variables and functions (lines 52-58), setting
the buffer for standard output, retrieving the time of day (lines 60-61), and computing
the seconds-since-the-Epoch value for approximately six months ago (line 62). Note
that all the constants have the L suffix, indicating the use of long arithmetic.
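The arithmetic on line 62 can be tried in isolation. The following is a minimal sketch in modern C, not code from ls.c; the names SIX_MONTHS_SECS, six_months_before(), and is_old() are our own. It keeps the long-typed constants of the original, which mattered on a machine whose int was too small to hold the product.

```c
#include <time.h>

/* Approximate length of six months in seconds, computed the way line 62
 * does: six 30-day months, every constant typed long so that the product
 * is evaluated in long arithmetic. */
#define SIX_MONTHS_SECS (6L * 30L * 24L * 60L * 60L)

/* Return the seconds-since-the-Epoch value roughly six months before now.
 * (Hypothetical helper; V7 ls stores this value in the global 'year'.) */
time_t six_months_before(time_t now)
{
    return now - (time_t) SIX_MONTHS_SECS;
}

/* Nonzero if a timestamp is older than the six-month cutoff. */
int is_old(time_t stamp, time_t now)
{
    return stamp < six_months_before(now);
}
```

The product works out to 15,552,000 seconds, that is, 180 days.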
 63      if (--argc > 0 && *argv[1] == '-') {
 64          argv++;
 65          while (*++*argv) switch (**argv) {      Parse options
 66
 67          case 'a':                               All directory entries
 68              aflg++;
 69              continue;
 70
 71          case 's':                               Size in blocks
 72              sflg++;
 73              statreq++;
 74              continue;
 75
 76          case 'd':                               Directory info, not contents
 77              dflg++;
 78              continue;
 79
 80          case 'g':                               Group name instead of user name
 81              gflg++;
 82              continue;
 83
 84          case 'l':                               Long listing
 85              lflg++;
 86              statreq++;
 87              continue;
 88
 89          case 'r':                               Reverse sort order
 90              rflg = -1;
 91              continue;
 92
 93          case 't':                               Sort by time, not name
 94              tflg++;
 95              statreq++;
 96              continue;
 97
 98          case 'u':                               Access time, not modification time
 99              uflg++;
100              continue;
101
102          case 'c':                               Inode change time, not modification time
103              cflg++;
104              continue;
105
106          case 'i':                               Include inode number
107              iflg++;
108              continue;
109
110          case 'f':                               Force reading each arg as directory
111              fflg++;
112              continue;
113
114          default:                                Ignore unknown option letters
115              continue;
116          }
117          argc--;
118      }
Lines 63-118 parse the command-line options. Note the manual parsing code:
getopt() hadn't been invented yet. The statreq variable is set to true when an option
requires the use of the stat() system call.
Avoiding an unnecessary stat() call on each file is a big performance win. The
stat() call was particularly expensive, because it could involve a disk seek to the inode
location, a disk read to read the inode, and then a disk seek back to the location of the
directory contents (in order to continue reading directory entries).
Modern systems have the inodes in groups, spread out throughout a filesystem instead
of clustered together at the front. This makes a noticeable performance improvement.
Nevertheless, stat() calls are still not free; you should use them as needed, but not
any more than that.
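The idea of recording, during option parsing, whether any option will need stat() can be shown on a smaller scale. The sketch below is ours, not V7 code: struct ls_opts and parse_option_arg() are hypothetical names, only four option letters are handled, and the flags live in a structure instead of global variables.

```c
#include <stddef.h>

/* Hypothetical container for the flags; V7 ls uses separate globals. */
struct ls_opts {
    int aflg, lflg, sflg, tflg;
    int statreq;            /* nonzero when some option needs stat() */
};

/* Scan one argument of the form "-xyz", in the style of lines 63-118.
 * Unknown letters are silently ignored, as in the original.
 * Returns 1 if the argument was an option group, 0 otherwise. */
int parse_option_arg(const char *arg, struct ls_opts *o)
{
    if (arg == NULL || arg[0] != '-')
        return 0;
    for (arg++; *arg != '\0'; arg++) {
        switch (*arg) {
        case 'a': o->aflg++; break;                 /* names only, no stat() */
        case 'l': o->lflg++; o->statreq++; break;   /* needs file metadata */
        case 's': o->sflg++; o->statreq++; break;   /* needs the size */
        case 't': o->tflg++; o->statreq++; break;   /* needs a timestamp */
        default:  break;                            /* ignore unknown letters */
        }
    }
    return 1;
}
```

A caller would then test statreq once per file and skip the stat() call entirely when it is zero.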
119      if (fflg) {                        -f overrides -l, -s, -t, adds -a
120          aflg++;
121          lflg = 0;
122          sflg = 0;
123          tflg = 0;
124          statreq = 0;
125      }
126      if (lflg) {                        Open password or group file
127          t = "/etc/passwd";
128          if (gflg)
129              t = "/etc/group";
130          pwdf = fopen(t, "r");
131      }
132      if (argc==0) {                     Use current dir if no args
133          argc++;
134          argv = &dotp - 1;
135      }
Lines 119-125 handle the -f option, turning off -l, -s, -t, and statreq. Lines
126-131 handle -l, setting the file to be read for user or group information. Remember
that the V7 ls shows only one or the other, not both.
If no arguments are left, lines 132-135 set up argv such that it points at a string
representing the current directory. The assignment 'argv = &dotp - 1' is valid,
although unusual. The '- 1' compensates for the '++argv' on line 137. This avoids
special case code for 'argc == 1' in the main part of the program.
136      for (i=0; i < argc; i++) {         Get info about each file
137          if ((ep = gstat(*++argv, 1))==NULL)
138              continue;
139          ep->ln.namep = *argv;
140          ep->lflags |= ISARG;
141      }
142      qsort(firstp, lastp - firstp, sizeof *lastp, compar);
143      slastp = lastp;
144      for (epp=firstp; epp<slastp; epp++) {       Main code, see text
145          ep = *epp;
146          if (ep->ltype=='d' && dflg==0 || fflg) {
147              if (argc>1)
148                  printf("\n%s:\n", ep->ln.namep);
149              lastp = slastp;
150              readdir(ep->ln.namep);
151              if (fflg==0)
152                  qsort(slastp,lastp - slastp,sizeof *lastp,compar);
153              if (lflg || sflg)
154                  printf("total %D\n", tblocks);
155              for (ep1=slastp; ep1<lastp; ep1++)
156                  pentry(*ep1);
157          } else
158              pentry(ep);
159      }
160      exit(0);
161  }                                      End of main()
Lines 136-141 loop over the arguments, gathering information about each one. The
second argument to gstat() is a boolean: true if the name is a command-line argument,
false otherwise. Line 140 adds the ISARG flag to the lflags field for each
command-line argument.
The gstat() function adds each new struct lbuf into the global flist array
(line 137). It also updates the lastp global pointer to point into this array at the current
last element.
Lines 142-143 sort the array, using qsort(), and save the current value of lastp
in slastp. Lines 144-159 loop over each element in the array, printing file or directory
info, as appropriate.
The code for directories deserves further explication:
if (ep->ltype=='d' && dflg==0 || fflg)
Line 146. If the file type is directory and if -d was not provided or if -f was, then
ls has to read the directory instead of printing information about the directory
itself.
if (argc>1) printf("\n%s:\n", ep->ln.namep)
Lines 147-148. Print the directory name and a colon if multiple files were named
on the command line.
lastp = slastp; readdir(ep->ln.namep)
Lines 149-150. Reset lastp from slastp. The flist array acts as a two-level
stack of filenames. The command-line arguments are kept in firstp through
slastp - 1. When readdir() reads a directory, it puts the struct lbuf
structures for the directory contents onto the stack, starting at slastp and going
through lastp. This is illustrated in Figure 7.1.
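firstp                        slastp                        lastp
|                             |                             |
v                             v                             v
+---------+---------+---------+---------+---------+---------+
| struct  | struct  | struct  | struct  | struct  | struct  |
| lbuf *  | lbuf *  | lbuf *  | lbuf *  | lbuf *  | lbuf *  |   flist array
+---------+---------+---------+---------+---------+---------+
|<--- From command line ----->|<------ From readdir() ----->|
FIGURE 7.1
The flist array as a two-level stack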
if (fflg==0) qsort(slastp,lastp - slastp,sizeof *lastp,compar)
Lines 151-152. Sort the subdirectory entries if -f is not in effect.
if (lflg || sflg) printf("total %D\n", tblocks)
Lines 153-154. Print the total number of blocks used by files in the directory, for
-l or -s. This total is kept in the variable tblocks, which is reset for each
directory. The %D format string for printf() is equivalent to %ld on modern systems;
it means "print a long integer." (V7 also had %ld, see line 192.)
for (ep1=slastp; ep1<lastp; ep1++) pentry(*ep1)
Lines 155-156. Print the information about each file in the subdirectory. Note
that the V7 ls descends only one level in a directory tree. It lacks the modern -R
"recursive" option.
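The two-level stack used in main() is easy to model on its own. The following sketch is an illustration under our own assumptions, not the V7 code: it stores plain strings instead of struct lbuf pointers, uses a tiny array (NSLOTS) in place of NFILES, and the helper names push_name() and level_count() are hypothetical.

```c
#include <stddef.h>

#define NSLOTS 8                 /* tiny stand-in for NFILES */

const char *slots[NSLOTS];       /* stand-in for the flist array */
const char **firstp = slots;     /* bottom of the stack */
const char **lastp = slots;      /* one past the newest entry */

/* Push a name onto the stack; returns 0 on success, -1 if the array is full. */
int push_name(const char *name)
{
    if (lastp >= &slots[NSLOTS])
        return -1;
    *lastp++ = name;
    return 0;
}

/* Number of entries between two stack positions. */
size_t level_count(const char **from, const char **to)
{
    return (size_t) (to - from);
}
```

In use, the command-line names are pushed first and the position after them is saved in a variable playing the role of slastp. Each directory's entries are then pushed above that mark, printed, and discarded by assigning the saved mark back to lastp, so the same slots are reused for the next directory.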
163  pentry(ap)                             void pentry(struct lbuf *ap)
164  struct lbuf *ap;
165  {
166      struct { char dminor, dmajor;};    Unused historical artifact from V6 ls
167      register t;
168      register struct lbuf *p;
169      register char *cp;
170
171      p = ap;
172      if (p->lnum == -1)
173          return;
174      if (iflg)
175          printf("%5u ", p->lnum);       Inode number
176      if (sflg)
177          printf("%4D ", nblock(p->lsize));       Size in blocks
The pentry() routine prints information about a file. Lines 172-173 check whether
the lnum field is -1, and return if so. When 'p->lnum == -1' is true, the struct
lbuf is not valid. Otherwise, this field is the file's inode number.
Lines 174-175 print the inode number if -i is in effect. Lines 176-177 print the
total number of blocks if -s is in effect. (As we see below, this number may not be
accurate.)
178      if (lflg) {                        Long listing:
179          putchar(p->ltype);             - File type
180          pmode(p->lflags);              - Permissions
181          printf("%2d ", p->lnl);        - Link count
182          t = p->luid;
183          if (gflg)
184              t = p->lgid;
185          if (getname(t, tbuf)==0)
186              printf("%-6.6s", tbuf);    - User or group
187          else
188              printf("%-6d", t);
189          if (p->ltype=='b' || p->ltype=='c')     - Device: major and minor numbers
190              printf("%3d,%3d", major((int)p->lsize), minor((int)p->lsize));
191          else
192              printf("%7ld", p->lsize);  - Size in bytes
193          cp = ctime(&p->lmtime);
194          if (p->lmtime < year)          - Modification time
195              printf(" %-7.7s %-4.4s ", cp+4, cp+20); else
196              printf(" %-12.12s ", cp+4);
197      }
198      if (p->lflags&ISARG)               - Filename
199          printf("%s\n", p->ln.namep);
200      else
201          printf("%.14s\n", p->ln.lname);
202  }
Lines 178-197 handle the -l option. Lines 179-181 print the file's type, permissions,
and number of links. Lines 182-184 set t to the user ID or the group ID, based on
the -g option. Lines 185-188 retrieve the corresponding name and print it if available.
Otherwise, the program prints the numeric value.
Lines 189-192 check whether the file is a block or character device. If it is, they print
the major and minor device numbers, extracted with the major() and minor() macros.
Otherwise, they print the file's size.
Lines 193-196 print the time of interest. If it's older than six months, the code prints
the month, day, and year. Otherwise, it prints the month, day, and time (see
Section 6.1.3.1, "Simple Time Formatting: asctime() and ctime()," page 170, for the
format of ctime()'s result).
Finally, lines 198-201 print the filename. For a command-line argument, we know
it's a zero-terminated string, and %s can be used. For a file read from a directory, it may
not be zero-terminated, and thus an explicit precision, %.14s, must be used.
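The role of the precision can be demonstrated with a fixed-width field that is completely full. This is a small sketch of ours, not part of ls.c; format_dir_name() and V7_NAMELEN are hypothetical names, and snprintf() is a modern function that V7 did not have.

```c
#include <stdio.h>

#define V7_NAMELEN 14   /* a V7 directory entry holds at most 14 name bytes */

/* Copy a fixed-width, possibly unterminated name field into 'out' as a
 * proper C string. The precision given for %.*s stops the formatting code
 * from reading past the end of the field.
 * Returns the length of the resulting string. */
int format_dir_name(char *out, size_t outlen, const char field[V7_NAMELEN])
{
    return snprintf(out, outlen, "%.*s", V7_NAMELEN, field);
}
```

With a 14-byte name there is no room in the field for a terminating zero byte, so a plain %s would run on into whatever follows it in memory. A shorter name is zero-padded in the directory entry, and the precision simply has no effect.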
204  getname(uid, buf)                      int getname(int uid, char buf[])
205  int uid;
206  char buf[];
207  {
208      int j, c, n, i;
209
210      if (uid==lastuid)                  Simple caching, see text
211          return(0);
212      if (pwdf == NULL)                  Safety check
213          return(-1);
214      rewind(pwdf);                      Start at front of file
215      lastuid = -1;
216      do {
217          i = 0;                         Index in buf array
218          j = 0;                         Counts fields in line
219          n = 0;                         Converts numeric value
220          while ((c=fgetc(pwdf)) != '\n') {       Read lines
221              if (c==EOF)
222                  return(-1);
223              if (c==':') {              Count fields
224                  j++;
225                  c = '0';
226              }
227              if (j==0)                  First field is name
228                  buf[i++] = c;
229              if (j==2)                  Third field is numeric ID
230                  n = n*10 + c - '0';
231          }
232      } while (n != uid);                Keep searching until ID found
233      buf[i++] = '\0';
234      lastuid = uid;
235      return(0);
236  }
The getname() function converts a user or group ID number into the corresponding
name. It implements a simple caching scheme; if the passed-in uid is the same as the
global variable lastuid, then the function returns 0, for OK; the buffer will already
contain the name (lines 210-211). lastuid is initialized to -1 (line 33), so this test
fails the first time getname() is called.
pwdf is already open on either /etc/passwd or /etc/group (see lines 126-130).
The code here checks that the open succeeded and returns -1 if it didn't (lines 212-213).
Surprisingly, ls does not use getpwuid() or getgrgid(). Instead, it takes advantage
of the facts that the format of /etc/passwd and /etc/group is identical for the first
three fields (name, password, numeric ID) and that both use a colon as separator.
Lines 216-232 implement a linear search through the file. j counts the number of
colons seen so far: 0 for the name and 2 for the ID number. Thus, while scanning the
line, it fills in both the name and the ID number.
Lines 233-235 terminate the name buffer, set the global lastuid to the found ID
number, and return 0 for OK.
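The same linear search can be written with today's library routines. What follows is a sketch under our own assumptions, not a replacement for getname(): lookup_name() is a hypothetical helper, it reads whole lines with fgets() into a 512-byte buffer of our choosing, and it does no caching.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up numeric 'id' in an open passwd- or group-format stream and copy
 * the matching name into 'buf'. Only the first and third colon-separated
 * fields are examined, which is what lets one routine serve both files.
 * Returns 0 on success, -1 if no line matches. */
int lookup_name(FILE *fp, long id, char *buf, size_t buflen)
{
    char line[512];

    rewind(fp);                                 /* start at front of file */
    while (fgets(line, sizeof line, fp) != NULL) {
        char *name_end, *pw_end, *num_end;
        long value;

        name_end = strchr(line, ':');           /* end of field 1 (name) */
        if (name_end == NULL)
            continue;
        pw_end = strchr(name_end + 1, ':');     /* end of field 2 (password) */
        if (pw_end == NULL)
            continue;
        value = strtol(pw_end + 1, &num_end, 10);
        if (num_end == pw_end + 1)
            continue;                           /* third field has no digits */
        if (*num_end != ':' && *num_end != '\n' && *num_end != '\0')
            continue;                           /* junk after the number */
        if (value != id)
            continue;
        *name_end = '\0';                       /* cut the line after the name */
        snprintf(buf, buflen, "%s", line);
        return 0;
    }
    return -1;
}
```

In a real program, getpwuid() and getgrgid() remain the right tools; the point here is only to show how little of each line the search needs to look at.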
238  long                                   long nblock(long size)
239  nblock(size)
240  long size;
241  {
242      return((size+511)>>9);
243  }
The nblock() function reports how many disk blocks the file uses. This calculation
is based on the file's size as returned by stat(). The V7 block size was 512 bytes, the
size of a physical disk sector.
The calculation on line 242 looks a bit scary. The '>>9' is a right-shift by nine bits.
This divides by 512, to give the number of blocks. (On early hardware, a right-shift
was much faster than division.) So far, so good. Now, a file of even one byte still takes
up a whole disk block. However, '1 / 512' comes out as zero (integer division
truncates), which is incorrect. This explains the 'size+511'. By adding 511, the code ensures
that the sum produces the correct number of blocks when it is divided by 512.
This calculation is only approximate, however. Very large files also have indirect
blocks. Despite the claim in the V7 ls(1) manpage, this calculation does not account
for indirect blocks.
Furthermore, consider the case of a file with large holes (created by seeking way past
the end of the file with lseek()). Holes don't occupy disk blocks; however, this is not
reflected in the size value. Thus, the calculation produced by nblock(), while usually
correct, could produce results that are either smaller or larger than the real case.
For these reasons, the st_blocks member was added into the struct stat at 4.2
BSD, and then picked up for System V and POSIX.
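The rounding trick is worth trying on a few values. Here is the calculation in modern C, together with an equivalent written with division for comparison; nblock_div() is our own name, not something found in ls.c.

```c
/* Number of 512-byte blocks needed to hold 'size' bytes, rounding up.
 * Adding 511 before shifting right by 9 (dividing by 512) makes any
 * partial block count as a whole one. Same arithmetic as line 242. */
long nblock(long size)
{
    return (size + 511) >> 9;
}

/* The same rounding written with division, for comparison. */
long nblock_div(long size)
{
    return (size + 511) / 512;
}
```

For non-negative sizes the two agree: 0 bytes need 0 blocks, 1 through 512 bytes need 1 block, and 513 bytes need 2. Neither version knows anything about indirect blocks or holes, which is exactly the limitation described above.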
245  int m1[] = { 1, S_IREAD>>0, 'r', '-' };
246  int m2[] = { 1, S_IWRITE>>0, 'w', '-' };
247  int m3[] = { 2, S_ISUID, 's', S_IEXEC>>0, 'x', '-' };
248  int m4[] = { 1, S_IREAD>>3, 'r', '-' };
249  int m5[] = { 1, S_IWRITE>>3, 'w', '-' };
250  int m6[] = { 2, S_ISGID, 's', S_IEXEC>>3, 'x', '-' };
251  int m7[] = { 1, S_IREAD>>6, 'r', '-' };
252  int m8[] = { 1, S_IWRITE>>6, 'w', '-' };
253  int m9[] = { 2, S_ISVTX, 't', S_IEXEC>>6, 'x', '-' };
254
255  int *m[] = { m1, m2, m3, m4, m5, m6, m7, m8, m9};
256
257  pmode(aflag)                           void pmode(int aflag)
258  {
259      register int **mp;
260
261      flags = aflag;
262      for (mp = &m[0]; mp < &m[sizeof(m)/sizeof(m[0])];)
263          select(*mp++);
264  }
265
266  select(pairp)                          void select(register int *pairp)
267  register int *pairp;
268  {
269      register int n;
270
271      n = *pairp++;
272      while (--n>=0 && (flags&*pairp++)==0)
273          pairp++;
274      putchar(*pairp);
275  }
Lines 245-275 print the file's permissions. The code is compact and rather elegant;
it requires careful study.
• Lines 245-253: The arrays m1 through m9 encode the permission bits to check
for along with the corresponding characters to print. There is one array per
character to print in the file mode. The first element of each array is the number of
(permission, character) pairs encoded in that particular array. The final element
is the character to print in the event that none of the given permission bits
are found.
Note also how the permissions are specified as 'S_IREAD>>0', 'S_IREAD>>3',
'S_IREAD>>6', and so on. The individual constants for each bit (S_IRUSR, S_IRGRP,
etc.) had not been invented yet. (See Table 4.5 in Section 4.6.1, "Specifying Initial
File Permissions," page 106.)
• Line 255: The m array points to each of the m1 through m9 arrays.
• Lines 257-264: The pmode() function first sets the global variable flags to the
passed-in parameter aflag. It then loops through the m array, passing each element
to the select() function. The passed-in element represents one of the m1 to
m9 arrays.
• Lines 266-275: The select() function understands the layout of each m1 through
m9 array. n is the number of pairs in the array (the first element); line 271 sets it.
Lines 272-273 look for permission bits, checking the global variable flags set
previously on line 261.
Note the use of the ++ operator, both in the loop test and in the loop body. The effect
is to skip over pairs in the array as long as the permission bit in the first element of the
pair is not found in flags.
When the loop ends, either the permission bit has been found, in which case pairp
points at the second element of the pair, which is the correct character to print, or it
has not been found, in which case pairp points at the default character. In either case,
line 274 prints the character that pairp points to.
A final point worth noting is that in C, character constants (such as 'x') have type
int, not char.¹ So there's no problem putting such constants into an integer array;
everything works correctly.
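For comparison, here is one way the same table-driven idea might look with the modern per-bit constants. It is a sketch of ours and not a translation of the V7 code: struct mode_col and mode_string() are hypothetical names, the result is built in a caller-supplied buffer instead of being printed, and the feature-test macro at the top is an assumption about what a strict compiler mode needs in order to expose S_ISVTX.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for S_ISVTX under strict -std modes */
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* One entry per output column: an optional "special" bit that takes
 * priority, the ordinary permission bit, and the characters to print. */
struct mode_col {
    mode_t special;     /* S_ISUID, S_ISGID, S_ISVTX, or 0 for none */
    char   special_ch;
    mode_t bit;         /* ordinary permission bit */
    char   ch;
};

static const struct mode_col cols[9] = {
    { 0, 0, S_IRUSR, 'r' }, { 0, 0, S_IWUSR, 'w' }, { S_ISUID, 's', S_IXUSR, 'x' },
    { 0, 0, S_IRGRP, 'r' }, { 0, 0, S_IWGRP, 'w' }, { S_ISGID, 's', S_IXGRP, 'x' },
    { 0, 0, S_IROTH, 'r' }, { 0, 0, S_IWOTH, 'w' }, { S_ISVTX, 't', S_IXOTH, 'x' },
};

/* Fill out[0..8] with the permission characters and terminate the string. */
void mode_string(mode_t mode, char out[10])
{
    int i;

    memcpy(out, "---------", 10);           /* defaults plus terminator */
    for (i = 0; i < 9; i++) {
        if (cols[i].special != 0 && (mode & cols[i].special) != 0)
            out[i] = cols[i].special_ch;    /* checked first, as in V7 */
        else if ((mode & cols[i].bit) != 0)
            out[i] = cols[i].ch;
    }
}
```

As in the V7 arrays, the special bit in the third column of each group is tested before the execute bit, so a setuid file shows 's' whether or not it is executable.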
277  char *                                 char *makename(char *dir, char *file)
278  makename(dir, file)
279  char *dir, *file;
280  {
281      static char dfile[100];
282      register char *dp, *fp;
283      register int i;
284
285      dp = dfile;
286      fp = dir;
287      while (*fp)
288          *dp++ = *fp++;
289      *dp++ = '/';
290      fp = file;
291      for (i=0; i<DIRSIZ; i++)
292          *dp++ = *fp++;
293      *dp = 0;
294      return(dfile);
295  }
Lines 277-295 define the makename() function. Its job is to concatenate a directory
name and a filename, separated by a slash character, and produce a string. It does this
in the static buffer dfile. Note that dfile is only 100 characters long and that no
error checking is done.
The code itself is straightforward, copying characters one at a time. makename() is
used by the readdir() function.
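To see what the missing error checking might look like, consider the following sketch. It is our own illustration, not a drop-in replacement: join_path() and V7_DIRSIZ are hypothetical names, the caller supplies the buffer, and snprintf() is a modern function.

```c
#include <stdio.h>

#define V7_DIRSIZ 14   /* maximum filename length in a V7 directory entry */

/* Build "dir/file" in 'out', reading at most V7_DIRSIZ bytes of 'file'
 * (which need not be zero-terminated if it fills the whole field).
 * Returns 0 on success, -1 if the result did not fit in 'outlen' bytes. */
int join_path(char *out, size_t outlen, const char *dir, const char *file)
{
    int n = snprintf(out, outlen, "%s/%.*s", dir, V7_DIRSIZ, file);

    if (n < 0 || (size_t) n >= outlen)
        return -1;          /* output error, or the name was truncated */
    return 0;
}
```

Passing the buffer and its size in from the caller also avoids the static buffer, so that a second call does not overwrite the result of the first.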

¹ This is different in C++: There, character constants do have type char. This difference does not affect this
particular code.
297  readdir(dir)                           void readdir(char *dir)
298  char *dir;
299  {
300      static struct direct dentry;
301      register int j;
302      register struct lbuf *ep;
303
304      if ((dirf = fopen(dir, "r")) == NULL) {
305          printf("%s unreadable\n", dir);
306          return;
307      }
308      tblocks = 0;
309      for (;;) {
310          if (fread((char *)&dentry, sizeof(dentry), 1, dirf) != 1)
311              break;
312          if (dentry.d_ino==0
313           || aflg==0 && dentry.d_name[0]=='.' && (dentry.d_name[1]=='\0'
314              || dentry.d_name[1]=='.' && dentry.d_name[2]=='\0'))
315              continue;
316          ep = gstat(makename(dir, dentry.d_name), 0);
317          if (ep==NULL)
318              continue;
319          if (ep->lnum != -1)
320              ep->lnum = dentry.d_ino;
321          for (j=0; j<DIRSIZ; j++)
322              ep->ln.lname[j] = dentry.d_name[j];
323      }
324      fclose(dirf);
325  }
325
Lines 297-325 define the readdir () function , whose job is to read the contents of
directories named on the command line.
Lines 304-307 open the directory for reading, returning if fopen () fails. Line 308
initializes the global variable tbl oc ks to o. This was used earlier (lines 153-154) to
print the total number of blocks used by files in a directory.
Lines 309-323 are a loop that reads directory entries and adds them to the flis t
array. Lines 310-311 read one entry, exiting the loop upon end-of-file.
Lines 312- 315 skip uninteres tin g entries. If the inode number is zero, this slot isn't
used. Otherwise, if -a was not given and the filename is either ' . ' or ' .. ', skip it.
Lines 316-318 call gstat () with the full name of the file , and a second argument
of false, indicating that it's not from the command line. gs ta t () updates the global
las tp pointer and the fli s t array. A NULL return value indicates some sort of failure.
Lines 319-322 save the inode number and name in the struct lbu f. If ep ->lnum
comes back from g s tat () set to -1, it means that the stat () o peration on the file
failed. Finally, line 324 closes the directory.
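The condition on lines 312-315 packs several tests into one expression. Pulled out into a function of its own, it might read as follows; skip_entry() is a hypothetical helper of ours, and it takes the inode number, name, and -a flag as parameters instead of reading them from dentry and a global.

```c
/* Decide whether a directory entry should be skipped, following the test
 * on lines 312-315: unused slots (inode 0) are always skipped, and "."
 * and ".." are skipped unless -a was given. Returns 1 to skip, 0 to keep. */
int skip_entry(unsigned long ino, const char *name, int aflg)
{
    if (ino == 0)
        return 1;                               /* empty slot */
    if (aflg == 0 && name[0] == '.'
        && (name[1] == '\0' || (name[1] == '.' && name[2] == '\0')))
        return 1;                               /* "." or ".." */
    return 0;
}
```

Note that other names beginning with a dot, such as .profile, are kept even without -a. This is the V7 behavior; modern versions of ls hide every dot file unless -a is given.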
The following function, gstat() (lines 327-398), is the core function for the
operation of retrieving and storing file information.
327  struct lbuf *                          struct lbuf *gstat(char *file, int argfl)
328  gstat(file, argfl)
329  char *file;
330  {
331      extern char *malloc();
332      struct stat statb;
333      register struct lbuf *rep;
334      static int nomocore;
335
336      if (nomocore)                      Ran out of memory earlier
337          return(NULL);
338      rep = (struct lbuf *)malloc(sizeof(struct lbuf));
339      if (rep==NULL) {
340          fprintf(stderr, "ls: out of memory\n");
341          nomocore = 1;
342          return(NULL);
343      }
344      if (lastp >= &flist[NFILES]) {     Check whether too many files given
345          static int msg;
346          lastp--;
347          if (msg==0) {
348              fprintf(stderr, "ls: too many files\n");
349              msg++;
350          }
351      }
352      *lastp++ = rep;                    Fill in information
353      rep->lflags = 0;
354      rep->lnum = 0;
355      rep->ltype = '-';                  Default file type
The static variable nomocore [sic] indicates that malloc() failed upon an earlier
call. Since it's static, it's automatically initialized to 0 (that is, false). If it's true upon
entry, gstat() just returns NULL. Otherwise, if malloc() fails, ls prints an error
message, sets nomocore to true, and returns NULL (lines 334-343).
Lines 344-351 make sure that there is still room left in the flist array. If not, ls
prints a message (but only once; note the use of the static variable msg), and then
reuses the last slot in flist.
Line 352 makes the slot lastp points to point to the new struct lbuf (rep).
This also updates lastp, which is used for sorting in main() (lines 142 and 152).
Lines 353-355 set default values for the flags, inode number, and type fields in the
struct lbuf.
356      if (argfl || statreq) {
357          if (stat(file, &statb)<0) {    stat() failed
358              printf("%s not found\n", file);
359              statb.st_ino = -1;
360              statb.st_size = 0;
361              statb.st_mode = 0;
362              if (argfl) {
363                  lastp--;
364                  return(0);
365              }
366          }
367          rep->lnum = statb.st_ino;      stat() OK, copy info
368          rep->lsize = statb.st_size;
369          switch (statb.st_mode&S_IFMT) {
370
371          case S_IFDIR:
372              rep->ltype = 'd';
373              break;
374
375          case S_IFBLK:
376              rep->ltype = 'b';
377              rep->lsize = statb.st_rdev;
378              break;
379
380          case S_IFCHR:
381              rep->ltype = 'c';
382              rep->lsize = statb.st_rdev;
383              break;
384          }
385          rep->lflags = statb.st_mode & ~S_IFMT;
386          rep->luid = statb.st_uid;
387          rep->lgid = statb.st_gid;
388          rep->lnl = statb.st_nlink;
389          if (uflg)
390              rep->lmtime = statb.st_atime;
391          else if (cflg)
392              rep->lmtime = statb.st_ctime;
393          else
394              rep->lmtime = statb.st_mtime;
395          tblocks += nblock(statb.st_size);
396      }
397      return(rep);
398  }
Lines 356-396 handle the call to stat(). If this is a command-line argument or if
statreq is true because of an option, the code fills in the struct lbuf as follows:
• Lines 357-366: Call stat() and, if it fails, print an error message and set
placeholder values. For a command-line argument, also back lastp up by one slot and
return NULL (expressed as 0).
• Lines 367-368: Set the inode number and size fields from the struct stat if
the stat() succeeded.
• Lines 369-384: Handle the special cases of directory, block device, and character
device. In all cases the code updates the ltype field. For devices, the lsize value
is replaced with the st_rdev value.
• Lines 385-388: Fill in the lflags, luid, lgid, and lnl fields from the
corresponding fields in the struct stat. Line 385 removes the file-type bits, leaving
the 12 permissions bits (read/write/execute for user/group/other, and setuid, setgid,
and save-text).
• Lines 389-394: Based on command-line options, use one of the three time fields
from the struct stat for the lmtime field in the struct lbuf.
• Line 395: Update the global variable tblocks with the number of blocks in the file.
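The two operations that gstat() performs on st_mode, classifying the file type and stripping the type bits, can be sketched as small functions. These are illustrations of ours: type_char() and perm_bits() are hypothetical names, and the feature-test macro is an assumption about what a strict compiler mode needs in order to expose S_IFMT and its relatives.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for S_IFMT and friends under strict -std modes */
#endif
#include <sys/types.h>
#include <sys/stat.h>

/* Map the file-type bits of st_mode to the character ls prints. Only the
 * three types V7 ls distinguishes are handled; everything else gets the
 * default '-', as on line 355. */
char type_char(mode_t mode)
{
    switch (mode & S_IFMT) {
    case S_IFDIR: return 'd';
    case S_IFBLK: return 'b';
    case S_IFCHR: return 'c';
    default:      return '-';
    }
}

/* Strip the file-type bits, leaving the permission, setuid, setgid, and
 * sticky bits, as line 385 does. */
mode_t perm_bits(mode_t mode)
{
    return mode & ~(mode_t) S_IFMT;
}
```

A modern ls recognizes more types than these three (symbolic links, FIFOs, and sockets, for example), which is one of the things a port of the V7 code would have to add.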
400  compar(pp1, pp2)                       int compar(struct lbuf **pp1,
401  struct lbuf **pp1, **pp2;                         struct lbuf **pp2)
402  {
403      register struct lbuf *p1, *p2;
404
405      p1 = *pp1;
406      p2 = *pp2;
407      if (dflg==0) {
408          if (p1->lflags&ISARG && p1->ltype=='d') {
409              if (!(p2->lflags&ISARG && p2->ltype=='d'))
410                  return(1);
411          } else {
412              if (p2->lflags&ISARG && p2->ltype=='d')
413                  return(-1);
414          }
415      }
416      if (tflg) {
417          if (p2->lmtime == p1->lmtime)
418              return(0);
419          if (p2->lmtime > p1->lmtime)
420              return(rflg);
421          return(-rflg);
422      }
423      return(rflg * strcmp(p1->lflags&ISARG? p1->ln.namep: p1->ln.lname,
424                  p2->lflags&ISARG? p2->ln.namep: p2->ln.lname));
425  }
The compar() function is dense: There's a lot happening in little space. The first
thing to remember is the meaning of the return value: A negative value means that the
first file should sort to an earlier spot in the array than the second, zero means the files
are equal, and a positive value means that the second file should sort to an earlier spot
than the first.
The next thing to understand is that ls prints the contents of directories after it
prints information about files. Thus the result of sorting should be that all directories
named on the command line follow all files named on the command line.
Finally, the rflg variable helps implement the -r option, which reverses the sorting
order. It is initialized to 1 (line 30). If -r is used, rflg is set to -1 (lines 89-91).
The following pseudocode describes the logic of compar(); the line numbers in the
left margin correspond to those of ls.c:
407  if ls has to read directories                       # dflg == 0
408      if p1 is a command-line arg and p1 is a directory
409          if p2 is not both a command-line arg and a directory
410              return 1                                # first comes after second
             else
                 fall through to time test
411      else
             # p1 is not a command-line directory
412          if p2 is a command-line arg and is a directory
413              return -1                               # first comes before second
             else
                 fall through to time test
416  if sorting is based on time                         # tflg is true
         # compare times:
417      if p2's time is equal to p1's time
418          return 0
419      if p2's time > p1's time
420          return the value of rflg (positive or negative)
         # p2's time < p1's time
421      return opposite of rflg (negative or positive)
423  Multiply rflg by the result of strcmp()
424  on the two names and return the result
The arguments to strcmp() on lines 423-424 look messy. What's going on is that
different members of the ln union in the struct lbuf must be used, depending on
whether the filename is a command-line argument or was read from a directory.
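The interplay of rflg with the time and name comparisons is easier to experiment with in a reduced form. The sketch below is ours: struct entry and compare_entries() are hypothetical, the command-line directory handling is left out, and the comparator uses the const void * parameters that the modern qsort() prototype expects.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down record: just a name and a timestamp. */
struct entry {
    const char *name;
    long mtime;
};

int rflg = 1;    /* 1 for normal order, -1 for reversed (-r) */
int tflg = 0;    /* nonzero to sort by time, newest first (-t) */

/* qsort() comparator over an array of pointers to struct entry. */
int compare_entries(const void *a, const void *b)
{
    const struct entry *p1 = *(const struct entry *const *) a;
    const struct entry *p2 = *(const struct entry *const *) b;

    if (tflg) {
        if (p2->mtime == p1->mtime)
            return 0;
        return p2->mtime > p1->mtime ? rflg : -rflg;
    }
    return rflg * strcmp(p1->name, p2->name);
}
```

A call such as 'qsort(v, n, sizeof v[0], compare_entries)' then sorts an array v of n pointers. Because rflg is either 1 or -1, it either preserves or flips the sign of each result, which is all that reversing the order requires.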
7.3 Summary
• The V7 ls is a relatively small program, yet it touches on many of the fundamental
aspects of Unix programming: file I/O, file metadata, directory contents, users
and groups, time and date values, sorting, and dynamic memory management.
• The most notable external difference between V7 ls and modern ls is the
treatment of the -a and -l options. The V7 version has many fewer options than do
modern versions; a noticeable lack is the -R recursive option.
• The management of flist is a clean way to use the limited memory of the PDP-11
architecture yet still provide as much information as possible. The struct lbuf
nicely abstracts the information of interest from the struct stat; this simplifies
the code considerably. The code for printing the nine permission bits is compact
and elegant.
• Some parts of ls use surprisingly small limits, such as the upper bound of 1024
on the number of files or the buffer size of 100 in makename().
Exercises
1. Consider the getname() function. What happens if the requested ID number
is 216 and the following two lines exist in /etc/passwd, in this order:
joe:xyzzy:2160:10:Joe User:/usr/joe:/bin/sh
jane:zzyxx:216:12:Jane User:/usr/jane:/bin/sh
2. Consider the makename() function. Could it use sprintf() to make the
concatenated name? Why or why not?
3. Are lines 319-320 in readdir() really necessary?
4. Take the stat program you wrote for the exercises in "Exercises" for Chapter 6,
page 205. Add the nblock() function from the V7 ls, and print the results
along with the st_blocks field from the struct stat. Add a visible marker
when they're different.
5. How would you grade the V7 ls on its use of malloc()? (Hint: how often is
free() called? Where should it be called?)
6. How would you grade the V7 ls for code clarity? (Hint: how many comments
are there?)
7. Outline the steps you would take to adapt the V7 ls for modern systems.
In this chapter
• 8.1 Mounting and Unmounting Filesystems page 228
• 8.2 Files for Filesystem Administration page 238
• 8.3 Retrieving Per-Filesystem Information page 244
• 8.4 Moving Around in the File Hierarchy page 256
• 8.5 Walking a File Tree: GNU du page 269
• 8.6 Changing the Root Directory: chroot() page 276
This chapter completes the discussion of Linux (and Unix) filesystems and
directories. We first describe how a disk partition containing a filesystem is
added to (and removed from) the logical filesystem namespace, such that in general
a user need neither know nor care where a file is physically located, along with the
APIs for working with filesystems.
We then describe how to move around within the hierarchical file namespace, how
to retrieve the full pathname of the current working directory, and how to easily
process arbitrary directory hierarchies (trees), using the nftw() function. Finally,
we describe the specialized but important chroot() system call.
8.1 Mounting and Unmounting Filesystems

The unified hierarchical file namespace is a great strength of the Linux/Unix design.
This section looks at how administrative files, commands, and the operating system
cooperate to build the namespace from separate physical devices that contain file data
and metadata.
8.1.1 Reviewing the Background

Chapter 5, "Directories and File Metadata," page 117, introduced inodes for file
metadata and described how directory entries link filenames to inodes. It also described
partitions and filesystems, and you saw that hard links are restricted to working within
a single filesystem because directories contain only inode numbers and inode numbers
are not unique across the entire set of in-use filesystems.
Besides inodes and data blocks, filesystems also contain one or more copies of the
superblock. This is a special disk block that describes the filesystem; its information is
updated as the filesystem itself changes. For example, it contains counts of free and
used inodes, free and used blocks, and other information. It also includes a magic
number: a unique special value in a special location that identifies the type of the
filesystem. (We'll see how this is relevant, shortly.)
Making a partition that contains a filesystem available for use is called mounting the
filesystem. Removing a filesystem from use is called, not surprisingly, unmounting
the filesystem.
These two jobs are accomplished with the mount and umount [sic] programs, named
for the corresponding system calls. Every Unix system's mount() system call has a
different interface. Because mounting and unmounting are considered implementation
issues, POSIX purposely does not standardize these system calls.
You mount a filesystem onto a directory; such a directory is referred to as the
filesystem's mount point. By convention the directory should be empty, but nothing
enforces this. However, if the mount point is not empty, all of its contents become
completely inaccessible while a filesystem is mounted on it.¹
The kernel maintains a unique number, known as the device number, that identifies
each mounted partition. For this reason, it is the (device, inode) pair that together
uniquely identifies a file; when the struct stat structures for two filenames indicate
that both numbers are the same, you can be sure that they do refer to the same file.
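The (device, inode) comparison can be sketched in a few lines. This is a minimal example, not code from a real application; the function name same_file() is our own.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* same_file --- return 1 if both names refer to the same file,
   0 if they don't, and -1 if either name can't be examined.
   (Hypothetical helper, for illustration only.) */

int same_file(const char *name1, const char *name2)
{
    struct stat sb1, sb2;

    if (stat(name1, &sb1) < 0 || stat(name2, &sb2) < 0)
        return -1;

    /* Both the device and the inode number must match. */
    return sb1.st_dev == sb2.st_dev && sb1.st_ino == sb2.st_ino;
}
```

Because stat() follows symbolic links, same_file() also reports a match when one name is a symbolic link to the other; use lstat() instead if that isn't what you want.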
As mentioned earlier, user-level software places the inode structures and other
metadata onto a disk partition, thereby creating the filesystem. This same software
creates an initial root directory for the filesystem. Thus, we have to make a distinction
between "the root directory named /," which is the topmost directory in the hierarchical
filename namespace, and "the root directory of a filesystem," which is each filesystem's
individual topmost directory. The / directory is also the "root directory" of the "root
filesystem."
For reasons described in the sidebar, a filesystem's root directory always has inode
number 2 (although this is not formally standardized). Since there can be multiple
filesystems, each one's root directory has the same inode number, 2 . When resolving a
pathname, the kernel knows where each filesystem is mounted and arranges for the
mount point's name to refer to the root directory of the mounted filesystem. Furthermore,
'..' in the root of a mounted filesystem is made to refer to the parent directory
of the mount point.
Figure 8.1 shows two filesystems: one for the root directory, and one for /usr, before
/usr is mounted. Figure 8.2 shows the situation after /usr is mounted.
1 GNU/Linux and Solaris allow you to mount one file on top of another; this has advanced uses, which we don't
otherwise discuss.
[Figure 8.1 (diagram not reproduced): the root filesystem, labeled /dev/hda1 (device 302h/770d), and the /usr filesystem, labeled /dev/hda3 (device 305h/773d), drawn as separate trees. In the root directory of each filesystem, both '.' and '..' are inode 2. In the root filesystem, the usr directory that serves as the mount point for /usr is empty. The /usr filesystem's root directory contains bin, tmp, and share.]
FIGURE 8.1
Separate filesystems, before mounting
The / directory, the root of the entire logical hierarchy, is special in an additional
way: /. and /.. refer to the same directory; this is not true of any other directory on
the system. (Thus, after something like 'cd /../../../..', you're still in /.) This
behavior is implemented in a simple fashion: Both /. and /.. are hard links to the
filesystem's root directory. (You can see this in both Figure 8.1 and Figure 8.2.) Every
filesystem works this way, but the kernel treats / specially and does not treat as a special
case the '..' directory for the filesystem mounted on /.
[Figure 8.2 (diagram not reproduced): the same two filesystems after /usr is mounted. The usr entry in the root filesystem now leads to the root directory of the /usr filesystem (device 305h/773d), which contains bin, tmp, and share. The '..' entry of that directory is on the root filesystem (device 302h/770d).]
FIGURE 8.2
Separate filesystems, after mounting
Root Inode Numbers

The inode number for the root directory of a filesystem is always 2. Why is that? The
answer has to do with both technology and history.
As mentioned in Section 5.3, "Reading Directories," page 132, a directory entry with
an inode number of zero indicates an unused, or empty, slot. So inode 0 cannot be used
for a real file or directory.
OK, so what about inode 1? Well, particularly in the 1970s and 1980s, disks were not
as well made as they are now. When you bought a disk, it came with a (paper) list of
bad blocks: known locations on the disk that were not usable. Each operating system
had to have a way to keep track of those bad blocks and avoid using them.
Under Unix, you did this by creating a special-purpose file, whose data blocks were the
ones known to be bad. This file was attached to inode 1, leaving inode 2 as the first inode
usable for regular files or directories.
Modern disk drives have considerable built-in electronics and handle bad blocks on their
own. Thus, technically, it would be feasible to use inode 1 for a file. However, since so
much Unix software assumes that inode 2 is the inode for filesystem root directories,
Linux follows this convention as well. (However, Linux sometimes uses inode 1 for
nonnative filesystems, such as vfat or /proc.)
8.1.2 Looking at Different Filesystem Types
NOTE The discussion in this section is specific to Linux. However, most modern
Unix systems have similar features. We encourage you to explore your system's
documentation.
Historically, V7 Unix supported only a single filesystem type; every partition's
metadata and directory organization were structured the same way. 4.1 BSD used a
filesystem with the same structure as that of V7, but with a 1024-byte block size instead
of a 512-byte one. 4.2 BSD introduced the "BSD Fast Filesystem," which dramatically
changed the layout of inodes and data on disk and enabled the use of much larger block
sizes. (In general, using larger contiguous blocks of data provides better throughput,
especially for file reads.)
Through 4.3 BSD and System V Release 2 in the early and mid-1980s, Unix systems
continued to support just one filesystem type. To switch a computer from one filesystem
to another, 2 you had to first back up each filesystem to archival media (9-track tape),
upgrade the system, and then restore the data.
In the mid-1980s, Sun Microsystems developed a kernel architecture that made it
possible to use multiple filesystem architectures at the same time. This design was
implemented for their SunOS operating system, primarily to support Sun's Network File
System (NFS). However, as a consequence it was also possible to support multiple
on-disk architectures. System V Release 3 used a similar architecture to support the Remote
File System (RFS), but it continued to support only one on-disk architecture. 3 (RFS
was never widely used and is now only a historical footnote.)
Sun's general design became popular and widely implemented in commercial Unix
systems, including System V Release 4. Linux and BSD systems use a variant of this
design to support multiple on-disk filesystem formats. In particular, it's common for
all Unix variants on Intel x86 hardware to be able to mount MS-DOS/Windows FAT
filesystems, including those supplying long filenames, as well as ISO 9660-formatted
CD-ROMs.
2 For example, consider upgrading a VAX 11/780 from 4.1 BSD to 4.2 BSD.
3 System V Release 3 supported two different block sizes: 512 bytes and 1024 bytes, but otherwise the disk
organization was the same.
Linux has several native (that is, on-disk) filesystems. The most popular are the ext2
and ext3 filesystems. Many more filesystem types are available, however. You can
find information about most of them in the /usr/src/linux/Documentation/filesystems/
directory (if you have kernel source installed). Table 8.1 lists the various
filesystem names, with brief descriptions of each. The abbreviation "RW" means
"read/write" and "RO" means "read only."
TABLE 8.1
Supported in-kernel Linux filesystems (kernel 2.4.x)
Name         Support  Description

afs          RW       The Andrew File System.
adfs         RW       Acorn Advanced Disc Filing System.
affs         RO, RW   Amiga Fast File System. Read only vs. read/write depends upon
                      the version of the filesystem.
autofs       RW       Filesystem for interacting with the automounter daemon.
befs         RO       BeOS Filesystem. Marked as alpha software.
bfs          RW       SCO UnixWare Boot Filesystem.
binfmt_misc  RW       Special filesystem for running interpreters on compiled files (for
                      example, Java files).
efs          RW       A filesystem developed for SGI's Unix variant named Irix.
coda         RW       An experimental distributed filesystem developed at CMU.
cramfs       RO       A small filesystem for storing files in ROM.
devfs        RW       A way to dynamically provide device files for /dev (obsolete).
devpts       RW       Special filesystem for pseudo-ttys.
ext2         RW       The Second Extended Filesystem. This is the default GNU/Linux
                      filesystem, although some distributions now use ext3.
ext3         RW       The ext2 filesystem with journaling.
hfs          RW       Apple Mac OS Hierarchical File System.
hpfs         RW       The OS/2 High Performance File System.
intermezzo   RW       An experimental distributed filesystem for working while
                      disconnected. See the InterMezzo web site
                      (http://www.inter-mezzo.org).
jffs         RW       Journaled Flash Filesystem (for embedded systems).
jffs2        RW       Journaled Flash Filesystem 2 (also for embedded systems).
iso9660      RO       The ISO 9660 CD-ROM filesystem. The Rock Ridge extensions
                      are also supported, making a CD-ROM that uses them look like
                      a normal (but read-only) filesystem.
jfs          RW       IBM's Journaled File System for Linux.
ncp          RW       Novell's NCP protocol for NetWare; a remote filesystem client.
ntfs         RO       Support for Windows NTFS filesystem.
openpromfs   RO       A /proc filesystem for the PROM on SPARC systems.
proc         RW       Access to per-process and kernel information.
qnx4         RW       The QNX4 (a small, real-time operating system) filesystem.
ramfs        RW       A filesystem for creating RAM disks.
reiserfs     RW       An advanced journaling filesystem.
romfs        RO       A filesystem for creating simple read-only RAM disks.
smbfs        RW       Client support for SMB filesystems (Windows file shares).
sysv         RW       The System V Release 2, Xenix, Minix, and Coherent filesystems.
                      coherent, minix, and xenix are aliases.
tmpfs        RW       A ramdisk filesystem, supporting dynamic growth.
udf          RO       The UDF filesystem format used by DVD-ROMs.
ufs          RO, RW   The BSD Fast Filesystem; read/write for modern systems.
umsdos       RW       An extension to vfat making it look more like a Unix filesystem.
usbfs        RW       A special filesystem for working with USB devices. The original
                      name was usbdevfs and this name still appears, for example, in
                      the output of mount.
vfat         RW       All variants of MS-DOS/Windows FAT filesystems. msdos and
                      fat are components.
vxfs         RW       The Veritas VxFS journaling filesystem.
xfs          RW       A high-performance journaling filesystem developed by SGI for
                      Linux. See the XFS web site
                      (http://oss.sgi.com/projects/xfs/).
Not all of these filesystems are supported by the mount command; see mount(8) for
the list of those that are supported.
Journaling is a technique, pioneered in database systems, for improving the
performance of file updates, in such a way that filesystem recovery in the event of a crash can
be done both correctly and quickly. As of this writing, several different journaling
filesystems are available and competing for prominence in the GNU/Linux world. ext3
is one such; it has the advantage of being upwardly compatible with existing ext2
filesystems, and it's easy to convert a filesystem back and forth between the two types.
(See tune2fs(8).) ReiserFS and XFS also have strong followings.
The fat, msdos, umsdos, and vfat filesystems all share common code. In general,
you should use vfat to mount Windows FAT-32 (or other FAT-xx) partitions,
and umsdos if you wish to use a FAT partition as the root filesystem for your
GNU/Linux system.
The Coherent, MINIX, original System V, and Xenix filesystems all have similar
on-disk structures. The sysv filesystem type supports all of them; the four names
coherent, minix, sysv, and xenix are aliases one for the other. The coherent and
xenix names will eventually be removed.
The BSD Fast Filesystem has evolved somewhat over the years. The ufs filesystem
supports read/write operation for the version from 4.4 BSD, which is the basis for the
three widely used BSD operating systems: FreeBSD, NetBSD, and OpenBSD. It also
supports read/write operation for Sun's Solaris filesystem, for both SPARC and Intel
x86 systems. The original BSD format and that from the NeXTStep operating system
are supported read-only.
The "RO" designations for befs and ntfs mean that filesystems of those types can
be mounted and read but files cannot be written on them or removed from them. (This
may change with time; check your system's documentation.) The cramfs, iso9660,
romfs, and udf filesystems are marked "RO" because the underlying media are
inherently read-only.
Two filesystem types no longer exist: ext, which was the original Extended Filesystem,
and xiafs , which extended the original MINIX filesystem for longer names and larger
file sizes. xiafs and ext2 came out approximately simultaneously, but ext2 eventually
became the dominant filesystem. 4
8.1.3 Mounting Filesystems: mount

The mount command mounts filesystems, splicing their contents into the system file
hierarchy at their mount points. Under GNU/Linux, it is somewhat complicated since
it has to deal with all the known filesystem types and their options. Normally, only
root can run mount, although it's possible to make exceptions for certain cases, as is
discussed later in the chapter.
You specify the filesystem type with the -t option:
mount [ options ] device mount-point
For example (# is the root prompt):

# mount -t iso9660 /dev/cdrom /mnt/cdrom          Mount CD-ROM
# mount -t vfat /dev/fd0 /mnt/floppy              Mount MS-DOS floppy
# mount -t nfs files.example.com:/ /mnt/files     Mount NFS filesystem
You can use '-t auto' to force mount to guess the filesystem type. This usually
works, although if you know for sure what kind of filesystem you have, it helps to
supply the type and avoid the chance that mount will guess incorrectly. mount does
this guessing by default, so '-t auto' isn't strictly necessary.
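To give a feel for what lies underneath the mount command, here is a minimal sketch that uses the Linux mount() system call directly to do roughly what 'mount -t fstype -o ro device dir' does. The function name mount_readonly() is ours. The system call is Linux specific and normally requires root privileges. Note also that the system call itself does no guessing: the filesystem type must be supplied, and nothing is recorded in /etc/mtab. Those conveniences come from the mount program.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mount.h>  /* mount(), MS_RDONLY; Linux-specific */

/* mount_readonly --- mount device on dir, read-only, as type fstype.
   Returns 0 on success, -1 on failure.
   (Hypothetical helper, for illustration only.) */

int mount_readonly(const char *device, const char *dir, const char *fstype)
{
    if (mount(device, dir, fstype, MS_RDONLY, NULL) < 0) {
        fprintf(stderr, "mount %s on %s: %s\n",
                device, dir, strerror(errno));
        return -1;
    }

    return 0;
}
```

A call such as mount_readonly("/dev/cdrom", "/mnt/cdrom", "iso9660") corresponds to the first example above, with the addition of the read-only flag.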
GNU/Linux systems provide a special kind of mounting by means of the loopback
device. In this way, a filesystem image contained in a regular file can be mounted as if
it were an actual disk device. This capability is very useful, for example, with CD-ROM
images. It allows you to create one and try it out, without having to burn it to a writable
CD and mount the CD. The following example uses the first CD image from the Red
Hat 9 distribution of GNU/Linux:
# ls -l shrike-i386-disc1.iso                                   Examine CD image file
-rw-r--r--    1 arnold   devel    668991488 Apr 11 05:13 shrike-i386-disc1.iso
# mount -t iso9660 -o ro,loop shrike-i386-disc1.iso /mnt/cdrom  Mount it on /mnt/cdrom
# cd /mnt/cdrom                                                 Go there
# ls                                                            Look at files
autorun               README.it              RELEASE-NOTES-fr.html
dosutils              README.ja              RELEASE-NOTES.html
EULA                  README.ko              RELEASE-NOTES-it.html
GPL                   README.pt              RELEASE-NOTES-ja.html
images                README.pt_BR           RELEASE-NOTES-ko.html
isolinux              README.zh_CN           RELEASE-NOTES-pt_BR.html
README                README.zh_TW           RELEASE-NOTES-pt.html
README-Accessibility  RedHat                 RELEASE-NOTES-zh_CN.html
README.de             RELEASE-NOTES          RELEASE-NOTES-zh_TW.html
README.es             RELEASE-NOTES-de.html  RPM-GPG-KEY
README.fr             RELEASE-NOTES-es.html  TRANS.TBL
# cd                                                            Change out
# umount /mnt/cdrom                                             Unmount
4 Source: http://www.ife.ee.ethz.ch/music/software/sag/subsection2_5_4_3.html.
Being able to mount an ISO 9660 image this way is particularly helpful when you
are testing scripts that make CD images. You can create an image in a regular file,
mount it, and verify that it's arranged correctly. Then, once you're sure it's correct, you
can copy the image to a writable CD ("burn" the CD). The loopback facility is useful
for mounting floppy disk images, too.
8.1.4 Unmounting Filesystems: umount

The umount command unmounts a filesystem, removing its contents from the system
file hierarchy. The usage is as follows:
umount file-or-device
The filesystem being unmounted must not be busy. This means that there aren't any
processes with open files on the filesystem and that no process has a directory on the
filesystem as its current working directory:
$ mount                                                 Show what's mounted
/dev/hda2 on / type ext3 (rw)                           / is on a real device
none on /proc type proc (rw)
usbdevfs on /proc/bus/usb type usbdevfs (rw)
/dev/hda5 on /d type ext3 (rw)                          So is /d
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
$ su                                                    Switch to superuser
Password:                                               Password does not echo
# cd /d                                                 Make /d the current directory
# umount /d                                             Try to unmount /d
umount: /d: device is busy                              Doesn't work; it's still in use
# cd /                                                  Change out of /d
# umount /d                                             Try to unmount /d again
#                                                       Silence is golden: unmount worked
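The "device is busy" message comes from the kernel: the umount() system call fails with errno set to EBUSY. The following sketch shows how a program might report this case. It assumes the Linux umount() call declared in <sys/mount.h>; the function name try_umount() is ours, and like mounting, unmounting normally requires root privileges.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/mount.h>  /* umount(); Linux-specific */

/* try_umount --- attempt to unmount the filesystem mounted on dir.
   Returns 0 on success, -1 on failure.
   (Hypothetical helper, for illustration only.) */

int try_umount(const char *dir)
{
    if (umount(dir) == 0)
        return 0;

    if (errno == EBUSY)     /* open files or a current directory there */
        fprintf(stderr, "umount: %s: device is busy\n", dir);
    else
        fprintf(stderr, "umount: %s: %s\n", dir, strerror(errno));

    return -1;
}
```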
8.2 Files for Filesystem Administration

The /etc/fstab file 5 lists filesystems that can be mounted. Most are automatically
mounted when the system boots. The format is as follows:
device mount-point fs-type options dump-freq fsck-pass
(The dump-freq and fsck-pass are administrative features that aren't relevant to the
current discussion.) For example, on our system, the file looks like this:
$ cat /etc/fstab
# device    mount-point  type    options                               freq passno
/dev/hda3   /            ext3    defaults                              1    1       Root filesystem
/dev/hda5   /d           ext3    defaults                              1    2
none        /dev/pts     devpts  gid=5,mode=620                        0    0
none        /proc        proc    defaults                              0    0
            /dev/shm     tmpfs
partition:
/dev/hda1   /win         vfat    noauto,defaults,user,uid=2076,gid=10  0    0
$ cat /etc/mtab
/dev/hda2 / ext3 rw 0 0
none /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/hda5 /d ext3 rw 0 0
none /dev/pts devpts rw,gid=5,mode=620 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
/dev/hda1 /win vfat rw,noexec,nosuid,nodev,uid=2076,gid=10,user=arnold 0 0
The kernel makes (almost) the same information available in /proc/mounts, in the
same format:
$ cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/hda5 /d ext3 rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
/dev/hda1 /win vfat rw,nosuid,nodev,noexec 0 0
5 On GNU/Linux and most systems. Solaris and some systems based on System V Release 4 use /etc/vfstab,
possibly with a different format.
Note that /etc/mtab has some information that /proc/mounts doesn't. (For
example, see the line for the /win mount point.) On the flip side, it's possible (using
'mount -f') to put entries into /etc/mtab that aren't real (this practice has its uses,
see mount(8)). To sum up, /proc/mounts always describes what is really mounted;
however, /etc/mtab contains information about mount options that /proc/mounts
doesn't. Thus, to get the full picture, you may have to read both files.
8.2.1 Using Mount Options

The mount command supports options that control what operations the kernel will
or will not allow for the filesystem. There are a fair number of these. Only two are really
useful on the command line:
ro
Mount the filesystem read-only. This is necessary for read-only media such as
CD-ROMs and DVDs.
loop
Use the loopback device for treating a regular file as a filesystem. We showed
an example of this earlier (see Section 8.1.3, "Mounting Filesystems: mount,"
page 236).
Options are passed with the -o command-line option and can be grouped, separated
by commas. For example, here is the command line used earlier:
mount -t iso9660 -o ro,loop shrike-i386-disc1.iso /mnt/cdrom
The rest of the options are intended for use in /etc/fstab (although they can also
be used on the command line). The following list provides the ones we think are most
important for day-to-day use.
auto, noauto
Filesystems marked auto are to be mounted when the system boots through
'mount -a' (mount all filesystems). noauto filesystems must be mounted manually.
Such filesystems still appear in /etc/fstab along with the other filesystems. (See,
for example, the entry for /win in our /etc/fstab file, shown previously.)
defaults
Use the default options rw, suid, dev, exec, auto, nouser, and async. (async
is an advanced option that increases I/O throughput.)
dev, nodev
Allow (don't allow) the use of character or block device files on the filesystem.
exec, noexec
Allow (don't allow) execution of binary executables on the filesystem.
user, nouser
Allow (don't allow) any user to mount this filesystem. This is useful for CD-ROMs;
even if you're on a single-user workstation, it's convenient to not have to switch
to root just to mount a CD. Only the user who mounted the filesystem can
unmount it. user implies the noexec, nosuid, and nodev options.
suid, nosuid
Support (don't support) the setuid and setgid bits on executables on the filesystem.
rw
Mount the filesystem read-write.
The nodev, noexec, and nosuid options are particularly valuable for security on
floppy-disk and CD-ROM filesystems. Consider a student environment in which
students are allowed to mount their own floppies or CDs. It's trivial to craft a filesystem
with a setuid-root shell or a world-writable device file for the hard disk that could let
an enterprising user change permissions on system files.
Each filesystem has additional options specific to it. One important option for ext2
and ext3 is the grpid option. We defer discussion of this option until Section 11.5.1,
"Default Group for New Files and Directories," page 412. The details for all supported
filesystems can be found in the mount(8) manpage.
As a concrete example, reconsider the line for the Windows partition on our system:
# device    mount-point  type  options                               freq passno
/dev/hda1   /win         vfat  noauto,defaults,user,uid=2076,gid=10  0    0
The noauto option prevents the Windows partition from being mounted at boot
time. The defaults option is the same as rw,suid,dev,exec,async. The user
option allows us to mount the filesystem without being root. The uid= and gid=
options force the files in /win to belong to us as a regular user so that we don't need
to be root when working on that partition.
8.2.2 Working with Mounted Filesystems: getmntent()

Any of /etc/fstab, /etc/mtab, and /proc/mounts can be read programmatically
with the getmntent() suite of routines:
#include <stdio.h>                                                    GLIBC
#include <mntent.h>

FILE *setmntent(const char *filename, const char *type);
struct mntent *getmntent(FILE *filep);
int addmntent(FILE *filep, const struct mntent *mnt);
int endmntent(FILE *filep);
char *hasmntopt(const struct mntent *mnt, const char *opt);
setmntent() opens the file containing mount point entries. The filename argument
is the file to open. The type argument is like the second argument to fopen(),
indicating read, write, or read/write access. (Consider the mount command, which has
to add an entry to /etc/mtab for each filesystem it mounts, and umount, which
has to remove one.) The returned value of type FILE * is then used with the rest of
the routines.
getmntent() reads through the file, returning a pointer to a static struct
mntent, which is filled in with the appropriate values. This static storage is overwritten
on each call. It returns NULL when there are no more entries. (This is similar to the
routines for reading the password and group files; see Section 6.3, "User and Group
Names," page 195.)
addmntent() is called to add more information to the end of the open file; it's
intended for use by mount.
endmntent() closes the open file; call it when you're done processing. Don't just
call fclose(); other internal data structures associated with the FILE * variable may
need to be cleaned up.
hasmntopt() is a more specialized function. It scans the struct mntent passed
as the first parameter for a mount option matching the second argument. If the option
is found, it returns the address of the matching substring. Otherwise, it returns NULL.
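As a brief illustration of hasmntopt(), the following sketch tests an entry for the ro option and extracts the value of a 'name=value' option such as uid=2076. The function names entry_is_read_only() and option_value() are ours; MNTOPT_RO is the GLIBC macro for the string "ro". Treat this as a sketch under those assumptions, not as production code.

```c
#include <stdio.h>
#include <string.h>
#include <mntent.h>

/* entry_is_read_only --- true if the option list contains "ro".
   (Hypothetical helper, for illustration only.) */

int entry_is_read_only(const struct mntent *fs)
{
    return hasmntopt(fs, MNTOPT_RO) != NULL;
}

/* option_value --- copy the value of a name=value option into buf.
   Returns 0 on success, -1 if the option is absent or has no value.
   (Hypothetical helper, for illustration only.) */

int option_value(const struct mntent *fs, const char *name,
                 char *buf, size_t buflen)
{
    const char *p = hasmntopt(fs, name);
    size_t len;

    if (p == NULL || buflen == 0)
        return -1;

    p += strlen(name);          /* skip over the option's name */
    if (*p != '=')
        return -1;              /* present, but no value */
    p++;

    len = strcspn(p, ",");      /* value ends at next comma or at end */
    if (len >= buflen)
        len = buflen - 1;
    memcpy(buf, p, len);
    buf[len] = '\0';

    return 0;
}
```

The pointer that hasmntopt() returns points into the entry's own option string, which is why option_value() copies the value out instead of returning the pointer directly.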
The fields in the struct mntent correspond directly to the fields in the /etc/fstab
file. It looks like this:
struct mntent {
    char *mnt_fsname;    /* Device or server for filesystem. */
    char *mnt_dir;       /* Directory mounted on. */
    char *mnt_type;      /* Type of filesystem: ufs, nfs, etc. */
    char *mnt_opts;      /* Comma-separated options for fs. */
    int mnt_freq;        /* Dump frequency (in days). */
    int mnt_passno;      /* Pass number for 'fsck'. */
};
The normal paradigm for working with mounted filesystems is to write an outer
loop that reads /etc/mtab, processing one struct mntent at a time. Our first example,
ch08-mounted.c, does exactly that:
 1  /* ch08-mounted.c --- print a list of mounted filesystems */
 2
 3  /* NOTE: GNU/Linux specific! */
 4  #include <stdlib.h>   /* for exit() */
 5  #include <stdio.h>
 6  #include <errno.h>
 7  #include <mntent.h>   /* for getmntent(), et al. */
 8  #include <unistd.h>   /* for getopt() */
 9  #include <string.h>   /* for strerror() */
10  void process(const char *filename);
11  void print_mount(const struct mntent *fs);
12
13  char *myname;
14
15  /* main --- process options */
16
17  int main(int argc, char **argv)
18  {
19      int c;
20      char *file = "/etc/mtab";    /* default file to read */
21
22      myname = argv[0];
23      while ((c = getopt(argc, argv, "f:")) != -1) {
24          switch (c) {
25          case 'f':
26              file = optarg;
27              break;
28          default:
29              fprintf(stderr, "usage: %s [-f fstab-file]\n", argv[0]);
30              exit(1);
31          }
32      }
33
34      process(file);
35      return 0;
36  }
37
38  /* process --- read struct mntent structures from file */
39
40  void process(const char *filename)
41  {
42      FILE *fp;
43      struct mntent *fs;
44
45      fp = setmntent(filename, "r");    /* read only */
46      if (fp == NULL) {
47          fprintf(stderr, "%s: %s: could not open: %s\n",
48                  myname, filename, strerror(errno));
49          exit(1);
50      }
51
52      while ((fs = getmntent(fp)) != NULL)
53          print_mount(fs);
54
55      endmntent(fp);
56  }
57
58  /* print_mount --- print a single mount entry */
59
60  void print_mount(const struct mntent *fs)
61  {
62      printf("%s %s %s %s %d %d\n",
63             fs->mnt_fsname,
64             fs->mnt_dir,
65             fs->mnt_type,
66             fs->mnt_opts,
67             fs->mnt_freq,
68             fs->mnt_passno);
69  }
Unlike most of the programs that we've seen up to now, this one is Linux specific.
Many Unix systems have similar routines, but they're not guaranteed to be identical.
By default, ch08-mounted reads /etc/mtab, printing the information about each
mounted filesystem. The -f option allows you to specify a different file to read, such
as /proc/mounts or even /etc/fstab.
The main() function processes the command line (lines 23-32) and calls process()
on the named file. (This program follows our standard boilerplate.)
process(), in turn, opens the file (line 45), and loops over each returned filesystem
(lines 52-53). When done, it closes the file (line 55).
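The same open, loop, and close pattern works for lookups as well as for printing. The following sketch searches a mount table for a given directory and reports the filesystem type mounted there. The function name mount_type() is ours. Note that it deliberately keeps scanning after a match: as the /proc/mounts listing shown earlier illustrates for /, a directory can appear more than once, and the later entry describes what is mounted on top.

```c
#include <stdio.h>
#include <string.h>
#include <mntent.h>

/* mount_type --- look up dir in the mount table file named table and
   copy the type of the filesystem mounted there into buf.
   Returns 0 if found, -1 otherwise.
   (Hypothetical helper, for illustration only.) */

int mount_type(const char *table, const char *dir, char *buf, size_t buflen)
{
    FILE *fp;
    struct mntent *fs;
    int found = -1;

    if (buflen == 0 || (fp = setmntent(table, "r")) == NULL)
        return -1;

    /* Don't stop at the first match; a later entry wins. */
    while ((fs = getmntent(fp)) != NULL) {
        if (strcmp(fs->mnt_dir, dir) == 0) {
            strncpy(buf, fs->mnt_type, buflen - 1);
            buf[buflen - 1] = '\0';
            found = 0;
        }
    }

    endmntent(fp);
    return found;
}
```

The result is copied into the caller's buffer because the struct mntent returned by getmntent() is static storage, overwritten on each call.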
The print_mount() function prints the information in the struct mntent. The
output ends up being much the same as that of 'cat /etc/mtab':
$ ch08-mounted                                    Run the program
/dev/hda2 / ext3 rw 0 0
none /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/hda5 /d ext3 rw 0 0
none /dev/pts devpts rw,gid=5,mode=620 0 0
none /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
/dev/hda1 /win vfat rw,noexec,nosuid,nodev,uid=2076,gid=10,user=arnold 0 0
8.3 Retrieving Per-Filesystem Information

Printing per-filesystem information is all fine and good, but it's not exciting. Once
we know that a particular mount point represents a filesystem, we want information
about the filesystem. This allows us to do things like print the information retrieved by
df and 'df -i':
$ df                                                   Show free/used space
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda2              6198436   4940316    943248  84% /
/dev/hda5             61431520  27618536  30692360  48% /d
none                    256616         0    256616   0% /dev/shm
/dev/hda1              8369532   2784700   5584832  34% /win
$ df -i                                                Show free/used inodes
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/hda2             788704  233216  555488   30% /
/dev/hda5            7815168  503243 7311925    7% /d
none                   64154       1   64153    1% /dev/shm
/dev/hda1                  0       0       0       /win
8.3.1 POSIX Style: statvfs() and fstatvfs()

Early Unix systems had only one kind of filesystem. For them, it was sufficient if df
read the superblock of each mounted filesystem, extracted the relevant statistics, and
formatted them nicely for printing. (The superblock was typically the second block in
the filesystem; the first was the boot block, to hold bootstrapping code.)
However, in the modern world, such an approach would be untenable. POSIX
provides an XSI extension to access this information. The main function is called
statvfs(). (The "vfs" part comes from the underlying SunOS technology, later used
in System V Release 4, called a virtual filesystem.) There are two functions:
#include <sys/types.h>                                                XSI
#include <sys/statvfs.h>

int statvfs(const char *path, struct statvfs *buf);
int fstatvfs(int fd, struct statvfs *buf);
statvfs() uses a pathname for any file; it returns information about the filesystem
containing the file. fstatvfs() accepts an open file descriptor as its first argument;
here too, the information returned is about the filesystem containing the open file. The
struct statvfs contains the following members:
struct statvfs {
    unsigned long int f_bsize;      Block size
    unsigned long int f_frsize;     Fragment size ("fundamental block size")
    fsblkcnt_t f_blocks;            Total number of blocks
    fsblkcnt_t f_bfree;             Total number of free blocks
    fsblkcnt_t f_bavail;            Number of available blocks (≤ f_bfree)
    fsfilcnt_t f_files;             Total number of inodes
    fsfilcnt_t f_ffree;             Total number of free inodes
    fsfilcnt_t f_favail;            Number of available inodes (≤ f_files)
    unsigned long int f_fsid;       Filesystem ID
    unsigned long int f_flag;       Flags: ST_RDONLY and/or ST_NOSUID
    unsigned long int f_namemax;    Maximum filename length
};
The information it contains is enough to write df:
unsigned long int f_bsize

The block size is the preferred size for doing I/O. The filesystem attempts to keep
at least f_bsize bytes worth of data in contiguous sectors on disk. (A sector is the
smallest amount of addressable data on the disk. Typically, a disk sector is 512
bytes.)
unsigned long int f_frsize
Some filesystems (such as the BSD Fast Filesystem) distinguish between blocks
and fragments of blocks. Small files whose total size is smaller than a block reside
in some number of fragments. This avoids wasting disk space (at the admitted
cost of more complexity in the kernel code). The fragment size is chosen at the
time the filesystem is created.
fsblkcnt_t f_blocks
The total number of blocks in the filesystem. (POSIX defines this count in units
of f_frsize.)
fsblkcnt_t f_bfree
The total number of free blocks in the filesystem.
fsblkcnt_t f_bavail
The number of blocks that may actually be used. Some filesystems reserve a
percentage of the filesystem's blocks for use by the superuser, in case the filesystem
fills up. Modern systems reserve around 5 percent, although this number can be
changed by an administrator. (See tune2fs(8) on a GNU/Linux system, and
tunefs(8) on Unix systems.)
fsfilcnt_t f_files
The total number of inodes ("file serial numbers," in POSIX parlance) on the
filesystem. This number is usually initialized and made permanent when the
filesystem is created.
fsfilcnt_t f_ffree
The total number of free inodes.
fsfilcnt_t f_favail
The number of inodes that may actually be used. Some percentage of the inodes
are reserved for the superuser, just as for blocks.
unsigned long int f_fsid
The filesystem ID. POSIX doesn't specify what this represents, and it's not used
under Linux.
unsigned long int f_flag
Flags giving information about the filesystem. POSIX specifies two: ST_RDONLY,
for a read-only filesystem (such as a CD-ROM), and ST_NOSUID, which disallows
the use of the setuid and setgid permission bits on executables. GNU/Linux systems
provide additional flags: They are listed in Table 8.2.
unsigned long int f_namemax
The maximum length of a filename. This refers to each individual component in
a pathname; in other words, the maximum length for a directory entry.
TABLE 8.2
GLIBC values for f_flag

Flag              POSIX   Meaning
ST_MANDLOCK               Enforce mandatory locking (see Section 14.2, page 531).
ST_NOATIME                Don't update the access time field on each access.
ST_NODEV                  Disallow access through device files.
ST_NODIRATIME             Don't update the access time field of directories.
ST_NOEXEC                 Disallow execution of binaries.
ST_NOSUID         ✓       Filesystem disallows the use of setuid and setgid bits.
ST_RDONLY         ✓       Filesystem is read-only.
ST_SYNCHRONOUS            All writes are synchronous (see Section 4.6.3, page 110).
The fsblkcnt_t and fsfilcnt_t types are defined in <sys/types.h>. They are
typically unsigned long, but on modern systems, they may even be a 64-bit type,
since disks have gotten very large. The following program, ch08-statvfs.c, shows
how to use statvfs():
 1  /* ch08-statvfs.c --- demonstrate statvfs */
 2
 3  /* NOTE: GNU/Linux specific! */
 4
 5  #include <stdio.h>
 6  #include <errno.h>
 7  #include <mntent.h>    /* for getmntent(), et al. */
 8  #include <unistd.h>    /* for getopt() */
 9  #include <stdlib.h>    /* for exit() */
10  #include <sys/statvfs.h>
11  #include <string.h>    /* for strerror() */
12  void process(const char *filename);
13  void do_statvfs(const struct mntent *fs);
14
15  int errors = 0;
16  char *myname;
17
18  /* main --- process options */
19
20  int main(int argc, char **argv)
21  {
22      int c;
23      char *file = "/etc/mtab";    /* default file to read */
24
25      myname = argv[0];
26      while ((c = getopt(argc, argv, "f:")) != -1) {
27          switch (c) {
28          case 'f':
29              file = optarg;
30              break;
31          default:
32              fprintf(stderr, "usage: %s [-f fstab-file]\n", argv[0]);
33              exit(1);
34          }
35      }
36
37      process(file);
38      return (errors != 0);
39  }
40
41  /* process --- read struct mntent structures from file */
42
43  void process(const char *filename)
44  {
45      FILE *fp;
46      struct mntent *fs;
47
48      fp = setmntent(filename, "r");    /* read only */
49      if (fp == NULL) {
50          fprintf(stderr, "%s: %s: could not open: %s\n",
51              myname, filename, strerror(errno));
52          exit(1);
53      }
54
55      while ((fs = getmntent(fp)) != NULL)
56          do_statvfs(fs);
57
58      endmntent(fp);
59  }
Lines 1-59 are essentially the same as ch08-mounted.c. main() handles the
command line, and process() loops over each mounted filesystem. do_statvfs() does
the real work, printing the struct statvfs for each interesting filesystem.
 61  /* do_statvfs --- Use statvfs and print info */
 62
 63  void do_statvfs(const struct mntent *fs)
 64  {
 65      struct statvfs vfs;
 66
 67      if (fs->mnt_fsname[0] != '/')    /* skip nonreal filesystems */
 68          return;
 69
 70      if (statvfs(fs->mnt_dir, & vfs) != 0) {
 71          fprintf(stderr, "%s: %s: statvfs failed: %s\n",
 72              myname, fs->mnt_dir, strerror(errno));
 73          errors++;
 74          return;
 75      }
 76
 77      printf("%s, mounted on %s:\n", fs->mnt_dir, fs->mnt_fsname);
 78      printf("\tf_bsize: %ld\n", (long) vfs.f_bsize);
 79      printf("\tf_frsize: %ld\n", (long) vfs.f_frsize);
 80      printf("\tf_blocks: %lu\n", (unsigned long) vfs.f_blocks);
 81      printf("\tf_bfree: %lu\n", (unsigned long) vfs.f_bfree);
 82      printf("\tf_bavail: %lu\n", (unsigned long) vfs.f_bavail);
 83      printf("\tf_files: %lu\n", (unsigned long) vfs.f_files);
 84      printf("\tf_ffree: %lu\n", (unsigned long) vfs.f_ffree);
 85      printf("\tf_favail: %lu\n", (unsigned long) vfs.f_favail);
 86      printf("\tf_fsid: %#lx\n", (unsigned long) vfs.f_fsid);
 87
 88      printf("\tf_flag: ");
 89      if (vfs.f_flag == 0)
 90          printf("(none)\n");
 91      else {
 92          if ((vfs.f_flag & ST_RDONLY) != 0)
 93              printf("ST_RDONLY ");
 94          if ((vfs.f_flag & ST_NOSUID) != 0)
 95              printf("ST_NOSUID");
 96          printf("\n");
 97      }
 98
 99      printf("\tf_namemax: %ld\n", (long) vfs.f_namemax);
100  }
Lines 67-68 skip filesystems that are not based on a real disk device. This means
that filesystems like /proc or /dev/pts are ignored. (Admittedly, this check is a
heuristic, but it works: In /etc/mtab mounted devices are listed by the full device
pathname: for example, /dev/hda1.) Line 70 calls statvfs() with appropriate error
checking, and lines 77-99 print the information.
Lines 89-96 deal with flags: single bits of information that are or are not present.
See the sidebar for a discussion of how flag bits are used in C code. Here is the output
of ch08-statvfs:
$ ch08-statvfs                          Run the program
/, mounted on /dev/hda2:                Results for ext2 filesystem
        f_bsize: 4096
        f_frsize: 4096
        f_blocks: 1549609
        f_bfree: 316663
        f_bavail: 237945
        f_files: 788704
        f_ffree: 555482
        f_favail: 555482
        f_fsid: 0
        f_flag: (none)
        f_namemax: 255
/win, mounted on /dev/hda1:             Results for vfat filesystem
        f_bsize: 4096
        f_frsize: 4096
        f_blocks: 2092383
        f_bfree: 1391952
        f_bavail: 1391952
        f_files: 0
        f_ffree: 0
        f_favail: 0
        f_fsid: 0
        f_flag: ST_NOSUID
        f_namemax: 260
As of this writing, for GLIBC 2.3.2 and earlier, GNU df doesn't use statvfs().
This is because the GLIBC statvfs() code reads /etc/mtab and calls stat() for each
mounted filesystem, to find the one on which the device number matches that of the
file (or file descriptor) argument. It needs to find the filesystem in order to read the
mount options so that it can set the f_flag bits. The problem is that stat() on a
remotely mounted filesystem whose server is not available can hang indefinitely, thus
causing df to hang as well. This problem has since been fixed in GLIBC, but df won't
change for a while so that it can continue to work on older systems.
NOTE Although POSIX specifies statvfs() and fstatvfs(), not all systems
support them or support them correctly. Many systems (including Linux, as
described shortly), have their own system calls that provide similar information.
GNU df uses a library routine to acquire filesystem information; the source file
for that routine is full of #ifdefs for a plethora of different systems. With time,
the portability situation should improve.
Bit Flags
A common technique, applicable in many cases, is to have a set of flag values; when a
flag is set (that is, true), a certain fact is true or a certain condition applies. Flag values
are defined with either #defined symbolic constants or enums. In this chapter, the
nftw() API (described later) also uses flags. POSIX defines only two flags for the struct
statvfs field f_flag:
#define ST_RDONLY 1    /* read-only filesystem */           Sample definitions
#define ST_NOSUID 2    /* setuid/setgid not allowed */
Physically, each symbolic constant represents a different bit position within the f_flag
value. Logically, each value represents a separate bit of state information; that is, some
fact or condition that is or isn't true for this particular instance of a struct statvfs.
Flags are set, tested, and cleared with the C bitwise operators. For example, statvfs()
would set these flags, using the bitwise OR operator:
int statvfs(const char *path, struct statvfs *vfs)
{
    ... fill in most of *vfs ...

    vfs->f_flag = 0;                        Make sure it starts out as zero
    if (filesystem is read-only)
        vfs->f_flag |= ST_RDONLY;           Add the ST_RDONLY flag
    if (filesystem disallows setuid)
        vfs->f_flag |= ST_NOSUID;           Add the ST_NOSUID flag
    ... rest of routine ...
}
The bitwise AND operator tests whether a flag is set, and a combination of the bitwise
AND and COMPLEMENT operators clears a flag:
if ((vfs.f_flag & ST_RDONLY) != 0)                True if ST_RDONLY flag is set
vfs.f_flag &= ~(ST_RDONLY | ST_NOSUID);           Clear both flags

The bitwise operators are daunting if you've not used them before. However, the example
code just shown represents common C idioms. Study each operation carefully; perhaps
draw yourself a few pictures showing how these operators work. Once you understand
them, you can train yourself to recognize these operators as high-level operations for
managing flag values instead of treating them as low-level bit manipulations.
The reason to use flag values is that they provide considerable savings in data space. The
single unsigned long field lets you store at least 32 separate bits of information.
GLIBC (as of this writing) defines 11 different flags for the f_flag field. (See
/usr/include/bits/statvfs.h on a GNU/Linux system.) If you used
a separate char field for each flag, that would use 11 bytes instead of just the four used
by the unsigned long. If you had 32 flags, that would be 32 bytes instead of just four!

8.3.2 Linux Style: statfs() and fstatfs()

The statfs() and fstatfs() system calls are Linux specific. Their declarations
are as follows:
#include <sys/types.h>                                      GLIBC
#include <sys/vfs.h>

int statfs(const char *path, struct statfs *buf);
int fstatfs(int fd, struct statfs *buf);
As with statvfs() and fstatvfs(), the two versions work on a filename or an
open file descriptor, respectively. The struct statfs looks like this:
struct statfs {
    long f_type;        /* type of filesystem */
    long f_bsize;       /* optimal transfer block size */
    long f_blocks;      /* total data blocks in filesystem */
    long f_bfree;       /* free blocks in fs */
    long f_bavail;      /* free blocks avail to nonsuperuser */
    long f_files;       /* total file nodes in file system */
    long f_ffree;       /* free file nodes in fs */
    fsid_t f_fsid;      /* filesystem id */
    long f_namelen;     /* maximum length of filenames */
    long f_spare[6];    /* spare for later */
};
The fields are analogous to those in the struct statvfs. At least through
GLIBC 2.3.2, the POSIX statvfs() and fstatvfs() functions are wrappers around
statfs() and fstatfs(), respectively, copying the values from one kind of struct
to the other.
The advantage to using statfs() or fstatfs() is that they are system calls. The
kernel returns the information directly. Since there is no f_flag field with mount
options, it's not necessary to look at every mounted filesystem to find the right one.
(In other words, in order to fill in the mount options, statvfs() must examine each
mounted filesystem to find the one containing the file named by path or fd. statfs()
doesn't need to do that, since it doesn't provide information about the mount options.)
There are two disadvantages to using these calls. First, they are Linux specific. Second,
some of the information in the struct statvfs isn't in the struct statfs; most
noticeably, the mount flags (f_flag) and the number of available inodes (f_favail).
(Thus, the Linux statvfs() has to find mount options from other sources, such as
/etc/mtab, and it "fakes" the information for the struct statvfs fields for which
real information isn't available.)
One field in the struct statfs deserves special note. This is the f_type field,
which indicates the type of the filesystem. The value is the filesystem's magic number,
extracted from the superblock. The statfs(2) manpage provides a list of commonly used
filesystems and their magic numbers, which we use in ch08-statfs.c. (Alas, there is
no separate #include file.)
 1  /* ch08-statfs.c --- demonstrate Linux statfs */
 2
 3  /* NOTE: GNU/Linux specific! */
 4
 5  #include <stdio.h>
 6  #include <errno.h>
 7  #include <mntent.h>    /* for getmntent(), et al. */
 8  #include <unistd.h>    /* for getopt() */
 9  #include <stdlib.h>    /* for exit() */
10  #include <sys/vfs.h>
11  #include <string.h>    /* for strerror() */
12  /* Defines taken from statfs(2) man page: */
13  #define AFFS_SUPER_MAGIC      0xADFF
14  #define EFS_SUPER_MAGIC       0x00414A53
15  #define EXT_SUPER_MAGIC       0x137D
16  #define EXT2_OLD_SUPER_MAGIC  0xEF51
17  #define EXT2_SUPER_MAGIC      0xEF53
18  #define HPFS_SUPER_MAGIC      0xF995E849
19  #define ISOFS_SUPER_MAGIC     0x9660
20  #define MINIX_SUPER_MAGIC     0x137F /* orig. minix */
21  #define MINIX_SUPER_MAGIC2    0x138F /* 30-char minix */
22  #define MINIX2_SUPER_MAGIC    0x2468 /* minix V2 */
23  #define MINIX2_SUPER_MAGIC2   0x2478 /* minix V2, 30 char names */
24  #define MSDOS_SUPER_MAGIC     0x4d44
25  #define NCP_SUPER_MAGIC       0x564c
26  #define NFS_SUPER_MAGIC       0x6969
27  #define PROC_SUPER_MAGIC      0x9fa0
28  #define SMB_SUPER_MAGIC       0x517B
29  #define XENIX_SUPER_MAGIC     0x012FF7B4
30  #define SYSV4_SUPER_MAGIC     0x012FF7B5
31  #define SYSV2_SUPER_MAGIC     0x012FF7B6
32  #define COH_SUPER_MAGIC       0x012FF7B7
33  #define UFS_MAGIC             0x00011954
34  #define XFS_SUPER_MAGIC       0x58465342
35  #define _XIAFS_SUPER_MAGIC    0x012FD16D
36
37  void process(const char *filename);
38  void do_statfs(const struct mntent *fs);
39
40  int errors = 0;
41  char *myname;
42
    ... main() is unchanged, process() is almost identical ...
 85
 86  /* type2str --- convert fs type to printable string, from statfs(2) */
 87
 88  const char *type2str(long type)
 89  {
 90      static struct fsname {
 91          long type;
 92          const char *name;
 93      } table[] = {
 94          { AFFS_SUPER_MAGIC, "AFFS" },
 95          { COH_SUPER_MAGIC, "COH" },
 96          { EXT2_OLD_SUPER_MAGIC, "OLD EXT2" },
 97          { EXT2_SUPER_MAGIC, "EXT2" },
 98          { HPFS_SUPER_MAGIC, "HPFS" },
 99          { ISOFS_SUPER_MAGIC, "ISOFS" },
100          { MINIX2_SUPER_MAGIC, "MINIX V2" },
101          { MINIX2_SUPER_MAGIC2, "MINIX V2 30 char" },
102          { MINIX_SUPER_MAGIC, "MINIX" },
103          { MINIX_SUPER_MAGIC2, "MINIX 30 char" },
104          { MSDOS_SUPER_MAGIC, "MSDOS" },
105          { NCP_SUPER_MAGIC, "NCP" },
106          { NFS_SUPER_MAGIC, "NFS" },
107          { PROC_SUPER_MAGIC, "PROC" },
108          { SMB_SUPER_MAGIC, "SMB" },
109          { SYSV2_SUPER_MAGIC, "SYSV2" },
110          { SYSV4_SUPER_MAGIC, "SYSV4" },
111          { UFS_MAGIC, "UFS" },
112          { XENIX_SUPER_MAGIC, "XENIX" },
113          { _XIAFS_SUPER_MAGIC, "XIAFS" },
114          { 0, NULL },
115      };
116      static char unknown[100];
117      int i;
118
119      for (i = 0; table[i].type != 0; i++)
120          if (table[i].type == type)
121              return table[i].name;
122
123      sprintf(unknown, "unknown type: %#lx", (unsigned long) type);
124      return unknown;
125  }
126
127  /* do_statfs --- Use statfs and print info */
128
129  void do_statfs(const struct mntent *fs)
130  {
131      struct statfs vfs;
132
133      if (fs->mnt_fsname[0] != '/')    /* skip nonreal filesystems */
134          return;
135
136      if (statfs(fs->mnt_dir, & vfs) != 0) {
137          fprintf(stderr, "%s: %s: statfs failed: %s\n",
138              myname, fs->mnt_dir, strerror(errno));
139          errors++;
140          return;
141      }
142
143      printf("%s, mounted on %s:\n", fs->mnt_dir, fs->mnt_fsname);
144
145      printf("\tf_type: %s\n", type2str(vfs.f_type));
146      printf("\tf_bsize: %ld\n", (long) vfs.f_bsize);
147      printf("\tf_blocks: %ld\n", (long) vfs.f_blocks);
148      printf("\tf_bfree: %ld\n", (long) vfs.f_bfree);
149      printf("\tf_bavail: %ld\n", (long) vfs.f_bavail);
150      printf("\tf_files: %ld\n", (long) vfs.f_files);
151      printf("\tf_ffree: %ld\n", (long) vfs.f_ffree);
152      printf("\tf_namelen: %ld\n", (long) vfs.f_namelen);
153  }
To save space, we've omitted main(), which is unchanged from the other programs
presented earlier, and we've also omitted process(), which now calls do_statfs()
instead of do_statvfs().
Lines 13-35 contain the list of filesystem magic numbers from the statfs(2) manpage.
Although the numbers could be retrieved from kernel source code header files, such
retrieval is painful (we tried), and the presentation here is easier to follow. Lines 86-125
define type2str(), which converts the magic number to a printable string. It does a
simple linear search on a table of (value, string) pairs. In the (unlikely) event that the
magic number isn't in the table, type2str() creates an "unknown type" message
and returns that (lines 123-124).
do_statfs() (lines 129-153) prints the information from the struct statfs.
The f_fsid member is omitted since fsid_t is an opaque type. The code is
straightforward; line 145 uses type2str() to print the filesystem type. As for the similar
program using statvfs(), this function ignores filesystems that aren't on local devices
(lines 133-134). Here is the output on our system:
$ ch08-statfs                           Run the program
/, mounted on /dev/hda2:                Results for ext2 filesystem
        f_type: EXT2
        f_bsize: 4096
        f_blocks: 1549609
        f_bfree: 316664
        f_bavail: 237946
        f_files: 788704
        f_ffree: 555483
        f_namelen: 255
/win, mounted on /dev/hda1:             Results for vfat filesystem
        f_type: MSDOS
        f_bsize: 4096
        f_blocks: 2092383
        f_bfree: 1391952
        f_bavail: 1391952
        f_files: 0
        f_ffree: 0
        f_namelen: 260
In conclusion, whether to use statvfs() or statfs() in your own code depends
on your requirements. As described in the previous section, GNU df doesn't use
statvfs() under GNU/Linux and in general tends to use each Unix system's unique
"get filesystem info" system call. Although this works, it isn't pretty. On the other hand,
sometimes you have no choice: for example, the GLIBC problems we mentioned above.
In this case, there is no perfect solution.
8.4 Moving Around in the File Hierarchy

Several system calls and standard library functions let you change your current
directory and determine the full pathname of the current directory. More complicated
functions let you perform arbitrary actions for every filesystem object in a
directory hierarchy.
8.4.1 Changing Directory: chdir() and fchdir()

In Section 1.2, "The Linux/Unix Process Model," page 10, we said:
    The current working directory is the one to which relative pathnames (those
    that don't start with a /) are relative. This is the directory you are "in"
    whenever you issue a 'cd someplace' command to the shell.
Each process has a current working directory. Each new process inherits its current
directory from the process that started it (its parent). Two functions let you change to
another directory:
#include <unistd.h>

int chdir(const char *path);                                POSIX
int fchdir(int fd);                                         XSI
The chdir() function takes a string naming a directory, whereas fchdir() expects
a file descriptor that was opened on a directory with open().6 Both return 0 on success
and -1 on error (with errno set appropriately). Typically, if open() on a directory
succeeded, then fchdir() will also succeed, unless someone changed the permissions
on the directory between the calls. (fchdir() is a relatively new function; older Unix
systems won't have it.)
These functions are almost trivial to use. The following program, ch08-chdir.c,
demonstrates both functions. It also demonstrates that fchdir() can fail if the
permissions on the open directory don't include search (execute) permission:
 1  /* ch08-chdir.c --- demonstrate chdir() and fchdir().
 2     Error checking omitted for brevity */
 3
 4  #include <stdio.h>
 5  #include <fcntl.h>
 6  #include <unistd.h>
 7  #include <sys/types.h>
 8  #include <sys/stat.h>
 9
10  int main(void)
11  {
12      int fd;
13      struct stat sbuf;
14
15      fd = open(".", O_RDONLY);    /* open directory for reading */
16      fstat(fd, & sbuf);           /* obtain info, need original permissions */
17      chdir("..");                 /* 'cd ..' */
18      fchmod(fd, 0);               /* zap permissions on original directory */
19
20      if (fchdir(fd) < 0)          /* try to 'cd' back, should fail */
21          perror("fchdir back");
22
23      fchmod(fd, sbuf.st_mode & 07777);    /* restore original permissions */
24      close(fd);                   /* all done */
25
26      return 0;
27  }

6 On GNU/Linux and BSD systems, you can apply the dirfd() function to a DIR * pointer to obtain the
underlying file descriptor; see the GNU/Linux dirfd(3) manpage.
Line 15 opens the current directory. Line 16 calls fstat() on the open directory
so that we have a copy of its permissions. Line 17 uses chdir() to move up a level in
the file hierarchy. Line 18 does the dirty work, turning off all permissions on the original
directory.
Lines 20-21 attempt to change back to the original directory. It is expected to fail,
since the current permissions don't allow it. Line 23 restores the original permissions.
The 'sbuf.st_mode & 07777' retrieves the low-order 12 permission bits; these are
the regular 9 rwxrwxrwx bits, and the setuid, setgid, and "sticky" bits, which we discuss
in Chapter 11, "Permissions and User and Group ID Numbers," page 403. Finally,
line 24 cleans up by closing the open file descriptor. Here's what happens when the
program runs:
$ ls -ld .                              Show current permissions
drwxr-xr-x  2 arnold  devel  4096 Sep  9 16:42 .
$ ch08-chdir                            Run the program
fchdir back: Permission denied          Fails as expected
$ ls -ld .                              Look at permissions again
drwxr-xr-x  2 arnold  devel  4096 Sep  9 16:42 .     Everything is back as it was
8.4.2 Getting the Current Directory: getcwd()

The aptly named getcwd() function retrieves the absolute pathname of the current
working directory:
#include <unistd.h>                                         POSIX

char *getcwd(char *buf, size_t size);
The function fills in buf with the pathname; it expects buf to have size bytes.
Upon success, it returns its first argument. Otherwise, if it needs more than size bytes,
it returns NULL and sets errno to ERANGE. The intent is that if ERANGE happens, you
should try to allocate a larger buffer (with malloc() or realloc()) and try again.
If any of the directory components leading to the current directory are not readable
or searchable, then getcwd() can fail and errno will be EACCES. The following simple
program demonstrates its use:
/* ch08-getcwd.c --- demonstrate getcwd().
   Error checking omitted for brevity */

#include <stdio.h>
#include <limits.h>    /* for PATH_MAX */
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    char buf[PATH_MAX];
    char *cp;

    cp = getcwd(buf, sizeof(buf));
    printf("Current dir: %s\n", buf);

    printf("Changing to ..\n");
    chdir("..");    /* 'cd ..' */

    cp = getcwd(buf, sizeof(buf));
    printf("Current dir is now: %s\n", buf);

    return 0;
}
This simple program prints the current directory, changes to the parent directory,
and then prints the new current directory. (cp isn't really needed here, but in a real
program it would be used for error checking.) When run, it produces the
following output:
$ ch08-getcwd
Current dir: /home/arnold/work/prenhall/progex/code/ch08
Changing to ..
Current dir is now: /home/arnold/work/prenhall/progex/code
Formally, if the buf argument is NULL, the behavior of getcwd() is undefined. In
this case, the GLIBC version of getcwd() will call malloc() for you, allocating a
buffer of size size. Going even further out of its way to be helpful, if size is 0, then
the buffer it allocates will be "big enough" to hold the returned pathname. In either
case, you should call free() on the returned pointer when you're done with the buffer.
The GLIBC behavior is helpful, but it's not portable. For code that has to work
across platforms, you can write a replacement function that provides the same
functionality while having your replacement function call getcwd() directly if on a
GLIBC system.
GNU/Linux systems provide the file /proc/self/cwd. This file is a symbolic link
to the current directory:
$ cd /tmp                               Change directory someplace
$ ls -l /proc/self/cwd                  Look at the file
lrwxrwxrwx  1 arnold  devel  0 Sep  9 17:29 /proc/self/cwd -> /tmp
$ cd                                    Change to home directory
$ ls -l /proc/self/cwd                  Look at it again
lrwxrwxrwx  1 arnold  devel  0 Sep  9 17:30 /proc/self/cwd -> /home/arnold
This is convenient at the shell level but presents a problem at the programmatic level.
In particular, the size of the file is zero! (This is because it's a file in /proc, which the
kernel fakes; it's not a real file living on disk.)
Why is the zero size a problem? If you remember from Section 5.4.5, "Working with
Symbolic Links," page 151, lstat() on a symbolic link returns the number of characters
in the name of the linked-to file in the st_size field of the struct stat. This
number can then be used to allocate a buffer of the appropriate size for use with
readlink(). That won't work here, since the size is zero. You have to use (or allocate)
a buffer that you guess is big enough. However, since readlink() does not fill in any
more characters than you provide, you can't tell whether or not the buffer is big enough;
readlink() does not fail when there isn't enough room. (See the Coreutils
xreadlink() function in Section 5.4.5, "Working with Symbolic Links," page 151,
which solves the problem.)
In addition to getcwd(), GLIBC has several other nonportable routines. These save
you the trouble of managing buffers and provide compatibility with older BSD systems.
For the details, see getcwd(3).
8.4.3 Walking a Hierarchy: nftw()

A common programming task is to process entire directory hierarchies: doing
something for every file and every directory and subdirectory in an entire tree. Consider,
for example, du, which prints disk usage information, 'chown -R', which recursively
changes ownership, or the find program, which finds files matching certain criteria.
At this point, you know enough to write your own code to manually open and read
directories, call stat () (or lstat ()) for each entry, and recursively process subdirec-
tories. However, such code is challenging to get right; it's possible to run out of file
descriptors if you leave parent directories open while processing subdirectories; you
have to decide whether to process symbolic links as themselves or as the files they point
to; you have to be able to deal with directories that aren't readable or searchable, and
so on. It's also painful to have to write the same code over and over again if you need
it for multiple applications.
8.4.3.1 The nftw() Interface

To obviate the problems, System V introduced the ftw() ("file tree walk") function.
ftw() did all the work to "walk" a file tree (hierarchy). You supplied it with a pointer
to a function, and it called the function for every file object it encountered. Your
function could then process each filesystem object as it saw fit.
Over time, it became clear that the ftw() interface didn't quite do the full job;7 for
example, originally it didn't support symbolic links. For this reason, nftw() ("new
ftw()" [sic]) was added to the X/Open Portability Guide; it's now part of POSIX.
Here's the prototype:
#include <ftw.h>                                            XSI

int nftw(const char *dir,                                   Starting point
         int (*fn)(const char *file,                        Function pointer to
                   const struct stat *sb,                   function of four arguments
                   int flag, struct FTW *s),
         int depth, int flags);                             Max open fds, flags
And here are the arguments:
const char *dir
A string naming the starting point of the hierarchy to process.
int (*fn)(const char *file, const struct stat *sb, int flag, struct FTW *s)
A pointer to a function with the given arguments. This function is called for every
object in the hierarchy. Details below.
int depth
This argument is misnamed. To avoid running out of file descriptors, nftw()
keeps no more than depth simultaneously open directories. This does not prevent
nftw() from processing hierarchies that are more than depth levels deep; but
smaller values for depth mean that nftw() has to do more work.
int flags
A set of flags, bitwise OR'd, that direct how nftw() should process the hierarchy.

7 POSIX standardizes the ftw() interface to support existing code, and GNU/Linux and commercial Unix systems
continue to supply it. However, since it's underpowered, we don't otherwise discuss it. See ftw(3) if you're
interested.
The nftw() interface has two disjoint sets of flags. One set controls nftw() itself
(the flags argument to nftw()). The other set is passed to the user-supplied function
that nftw() calls (the flag argument to (*fn)()). However, the interface is confusing,
because both sets of flags use names starting with the prefix 'FTW_'. We'll do our best
to keep this clear as we go. Table 8.3 presents the flags that control nftw().
TABLE 8.3
Control flags for nftw()

Flag         Meaning
FTW_CHDIR    When set, change to each directory before opening it. This action is more
             efficient, but the calling application has to be prepared to be in a different
             directory when nftw() is done.
FTW_DEPTH    When set, do a "depth-first search." This means that all of the files and
             subdirectories in a directory are processed before the directory itself is processed.
FTW_MOUNT    When set, stay within the same mounted filesystem. This is a more specialized
             option.
FTW_PHYS     When set, do not follow symbolic links.
FTW_CHDIR provides greater efficiency; when processing deep file hierarchies, the
kernel doesn't have to process full pathnames over and over again when doing a stat()
or opening a directory. The time savings on large hierarchies can be quite noticeable.8
FTW_DEPTH may or may not be what you need; for some applications it's just right.
Consider 'chmod -R u-rx .'. This removes read and execute permission for the
owner of all files and subdirectories in the current directory. If this permission change
is applied to a directory before it's applied to the directory's contents, any subsequent
attempt to process the contents will fail! Thus, it should be applied after the contents
have been processed.9 The GNU/Linux nftw(3) manpage notes for FTW_PHYS that "this
is what you want." This lets you process symbolic links as themselves, which is usually
what's necessary. (Consider du; it should count the link's space separately from that of
the linked-to file.)

8 Some older GLIBC versions have problems with FTW_CHDIR. This is not true for GLIBC 2.3.2 and later, and
it's unlikely that you'll encounter problems.
8.4.3.2 The nftw() Callback Function

As nftw() runs, it calls a function to which you supply a pointer. (Such functions
are termed callback functions since they are "called back" from library code.) The callback
function receives four arguments:
const char *file
The name of the current file (directory, symbolic link, etc.) being processed.
const struct stat *sb
A pointer to a struct stat for the file.
int flag
One of several flag values (described below) indicating what kind of file this is or
whether an error was encountered for the object.
struct FTW *s
This structure provides two separate pieces of information:
struct FTW {
    int base;     /* Index in file of base part of filename */
    int level;    /* Depth of this item relative to starting point */
};
The flag parameter has one of the values listed in Table 8.4.

The struct FTW *s provides additional information that can be useful. s->base
acts as an index into file; file is the full pathname of the object being processed
(relative to the starting point). 'file + s->base' points to the first character of the
filename component of the file.

9 Why anyone would want to make such a change, we don't know, but the "you asked for it, you got it" philosophy
applies here too!
TABLE 8.4
Flag values for nftw() callback function

Flag       Meaning
FTW_F      Object is a regular file.
FTW_D      Object is a directory.
FTW_DNR    Object is a directory that wasn't readable.
FTW_SL     Object is a symbolic link.
FTW_NS     Object is not a symbolic link, and stat() failed.
FTW_DP     Object is a directory whose children have already been processed. This can
           only happen if FTW_DEPTH was used in the call to nftw().
FTW_SLN    Object is a symbolic link pointing to a nonexistent file. This can only happen
           if FTW_PHYS was not used in the call to nftw().
s->level indicates the current depth in the hierarchy; the original starting point is
considered to be at level 0.
The callback function should return 0 if all is well. Any nonzero return causes nftw()
to stop its processing and to return the same nonzero value. The manpage notes that
the callback function should stop processing only by using its return value so that
nftw() has a chance to clean up: that is, free any dynamic storage, close open file
descriptors, and so on. The callback function should not use longjmp() unless the
program will immediately exit. (longjmp() is an advanced function, which we describe
in Section 12.5, "Nonlocal Gotos," page 446.) The recommended technique for handling
errors is to set a global variable indicating that there were problems, return 0 from the
callback, and deal with the failures once nftw() has completed traversing the file
hierarchy. (GNU du does this, as we see shortly.)
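The error-handling convention just described can be sketched in a few lines. This is only an illustration, not code from any real program: the names tree_counts, tally(), and count_tree() are made up for the example, and the descriptor count of 16 is an arbitrary choice. The callback notes problems in a global structure and always returns 0, so the walk runs to completion and the caller inspects the results afterwards.

```c
#define _XOPEN_SOURCE 700   /* ask for nftw(); must precede all includes */

#include <assert.h>
#include <ftw.h>
#include <stddef.h>

/* Hypothetical bookkeeping for this sketch; not part of any library. */
struct tree_counts {
    long files;     /* FTW_F objects seen */
    long dirs;      /* FTW_D objects seen */
    long errors;    /* FTW_DNR and FTW_NS objects seen */
};

static struct tree_counts counts;   /* global: nftw() has no user-data argument */

/* tally --- nftw() callback: record what we saw, never stop the walk */
static int tally(const char *file, const struct stat *sb,
                 int flag, struct FTW *s)
{
    (void) file; (void) sb; (void) s;   /* unused in this sketch */

    switch (flag) {
    case FTW_F:
        counts.files++;
        break;
    case FTW_D:
        counts.dirs++;
        break;
    case FTW_DNR:
    case FTW_NS:
        counts.errors++;    /* remember the problem ... */
        break;
    default:
        break;              /* symbolic links and so on: ignored here */
    }
    return 0;               /* ... but keep walking */
}

/* count_tree --- walk root; return nftw()'s result, fill in *out */
int count_tree(const char *root, struct tree_counts *out)
{
    int ret;

    assert(root != NULL && out != NULL);
    counts.files = counts.dirs = counts.errors = 0;
    ret = nftw(root, tally, 16, FTW_PHYS);   /* 16 fds: arbitrary choice */
    *out = counts;
    return ret;
}
```

A caller would check both the return value of count_tree() and the errors member: the first reports that the walk itself could not proceed, the second that individual objects could not be examined.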
Let's tie all this together with an example program. ch08-nftw.c processes each file
or directory named on the command line, running nftw() on it. The function that
processes each file prints the filename and type with indentation, showing the hierarchical
position of each file. For a change, we show the results first, and then we show and
discuss the program:
$ pwd                                      Where we are
/home/arnold/work/prenhall/progex
$ code/ch08/ch08-nftw code                 Walk the 'code' directory
code (directory)                           Top-level directory
    ch02 (directory)                       Subdirectories one level indented
        ch02-printenv.c (file)             Files in subdirs two levels indented
    ch03 (directory)
        ch03-memaddr.c (file)
    ch04 (directory)
        ch04-holes.c (file)
        ch04-cat.c (file)
        ch04-maxfds.c (file)
        v7cat.c (file)
Here's the program itself:

 1  /* ch08-nftw.c --- demonstrate nftw() */
 2
 3  #define _XOPEN_SOURCE 1             /* Required under GLIBC for nftw() */
 4  #define _XOPEN_SOURCE_EXTENDED 1    /* Same */
 5
 6  #include <stdio.h>
 7  #include <errno.h>
 8  #include <getopt.h>
 9  #include <ftw.h>      /* gets <sys/types.h> and <sys/stat.h> for us */
10  #include <limits.h>   /* for PATH_MAX */
11  #include <unistd.h>   /* for getdtablesize(), getcwd() declarations */
12
13  #define SPARE_FDS 5   /* fds for use by other functions, see text */
14
15  extern int process(const char *file, const struct stat *sb,
16                     int flag, struct FTW *s);
17
18  /* usage --- print message and die */
19
20  void usage(const char *name)
21  {
22      fprintf(stderr, "usage: %s [-c] directory ...\n", name);
23      exit(1);
24  }
25
26  /* main --- call nftw() on each command-line argument */
27
28  int main(int argc, char **argv)
29  {
30      int i, c, nfds;
31      int errors = 0;
32      int flags = FTW_PHYS;
33      char start[PATH_MAX], finish[PATH_MAX];
34
35      while ((c = getopt(argc, argv, "c")) != -1) {
36          switch (c) {
37          case 'c':
38              flags |= FTW_CHDIR;
39              break;
40          default:
41              usage(argv[0]);
42              break;
43          }
44      }
45
46      if (optind == argc)
47          usage(argv[0]);
48
49      getcwd(start, sizeof start);
50
51      nfds = getdtablesize() - SPARE_FDS;  /* leave some spare descriptors */
52      for (i = optind; i < argc; i++) {
53          if (nftw(argv[i], process, nfds, flags) != 0) {
54              fprintf(stderr, "%s: %s: stopped early\n",
55                      argv[0], argv[i]);
56              errors++;
57          }
58      }
59
60      if ((flags & FTW_CHDIR) != 0) {
61          getcwd(finish, sizeof finish);
62          printf("Starting dir: %s\n", start);
63          printf("Finishing dir: %s\n", finish);
64      }
65
66      return (errors != 0);
67  }
Lines 3-11 include header files. Through at least GLIBC 2.3.2, the #defines for
_XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED are necessary before any header file
inclusion. They make it possible to get the declarations and flag values that nftw()
provides over and above those of ftw(). This is specific to GLIBC. The need for it will
eventually disappear as GLIBC becomes fully compliant with the 2001 POSIX standard.
Lines 35-44 process options. The -c option adds the FTW_CHDIR flag to the nftw()
flags. This is an experiment to see if you can end up somewhere different from where
you started. It seems that if nftw() fails, you can; otherwise, you end up back where
you were. (POSIX doesn't document this explicitly, but the intent seems to be that you
do end up back where you started. The standard does say that the callback function
should not change the current directory.)
Line 49 saves the starting directory for later use, using getcwd().
Line 51 computes the number of file descriptors nftw() can use. We don't want it
to use all available file descriptors in case the callback function wants to open files too.
The computation uses getdtablesize() (see Section 4.4.1, "Understanding File
Descriptors," page 92) to retrieve the maximum available number and subtracts
SPARE_FDS, which was defined earlier, on line 13.
This procedure warrants more explanation. In the normal case, at least three
descriptors are already used for standard input, standard output, and standard error. nftw()
needs some number of file descriptors for opening and reading directories; under the
hood, opendir() uses open() to open a directory for reading. If the callback function
also needs to open files, we have to prevent nftw() from using up all available file
descriptors with open directories. We do this by subtracting some number from the
maximum available. For this example, we chose five, but if the callback function needs
to open files, a larger number should be used. (nftw() knows how to recover when it
runs out of file descriptors; we don't have to worry about that case.)
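The descriptor budget can also be computed in a small helper. The following is a sketch under stated assumptions rather than the program's own code: nftw_fd_budget() is a hypothetical name, it uses the POSIX sysconf(_SC_OPEN_MAX) call in place of getdtablesize(), and the fallback value of 20 for an indeterminate limit is an arbitrary guess.

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

/* nftw_fd_budget --- how many descriptors to let nftw() use,
   keeping `spare' of them back for the callback's own needs */
int nftw_fd_budget(int spare)
{
    long max, budget;

    assert(spare >= 0);

    max = sysconf(_SC_OPEN_MAX);    /* POSIX counterpart of getdtablesize() */
    if (max < 0)
        max = 20;                   /* assumed fallback if the limit is indeterminate */

    budget = max - spare;
    if (budget < 1)
        budget = 1;                 /* always allow at least one descriptor */
    if (budget > INT_MAX)
        budget = INT_MAX;           /* nftw() takes an int */
    return (int) budget;
}
```

The result would be passed as the third argument to nftw(). Clamping to at least 1 keeps the call meaningful even when the requested reserve exceeds the limit.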
Lines 52-58 are the main loop over the arguments; lines 53-57 check for errors;
when they occur, the code prints a diagnostic and increments the errors variable.
Lines 60-64 are part of the experiment for FTW_CHDIR, printing the starting and
finishing directories if -c was used.
The function of real interest is process(); this is the callback function that processes
each file. It uses the basic template for an nftw() callback function, which is a switch
statement on the flag value:
 69  /* process --- print out each file at the right level */
 70
 71  int process(const char *file, const struct stat *sb,
 72              int flag, struct FTW *s)
 73  {
 74      int retval = 0;
 75      const char *name = file + s->base;
 76
 77      printf("%*s", s->level * 4, "");    /* indent over */
 78
 79      switch (flag) {
 80      case FTW_F:
 81          printf("%s (file)\n", name);
 82          break;
 83      case FTW_D:
 84          printf("%s (directory)\n", name);
 85          break;
 86      case FTW_DNR:
 87          printf("%s (unreadable directory)\n", name);
 88          break;
 89      case FTW_SL:
 90          printf("%s (symbolic link)\n", name);
 91          break;
 92      case FTW_NS:
 93          printf("%s (stat failed): %s\n", name, strerror(errno));
 94          break;
 95      case FTW_DP:
 96      case FTW_SLN:
 97          printf("%s: FTW_DP or FTW_SLN: can't happen!\n", name);
 98          retval = 1;
 99          break;
100      default:
101          printf("%s: unknown flag %d: can't happen!\n", name, flag);
102          retval = 1;
103          break;
104      }
105
106      return retval;
107  }
Line 75 uses 'file + s->base' to get at the name part of the full pathname. This
pointer value is saved in the name variable for reuse throughout the function.
Line 77 produces the right amount of indentation, using a nice trick. Using %*s,
printf() takes the field width from the first argument. This is computed dynamically
as 'level * 4'. The string to be printed is "", the null string. The end result is that
printf() produces the right amount of space for us, without our having to run a loop.
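The %*s trick works the same way with snprintf(), which makes it easy to try in isolation. The helper below is a hypothetical illustration (format_indented() is not part of the example program); it builds the indented name in a caller-supplied buffer instead of printing it.

```c
#include <assert.h>
#include <stdio.h>

/* format_indented --- write `name' into buf, indented four spaces
   per level; returns what snprintf() returns */
int format_indented(char *buf, size_t size, int level, const char *name)
{
    assert(buf != NULL && name != NULL && level >= 0);

    /* "%*s" takes its field width from the argument list; padding an
       empty string to that width yields exactly that many spaces. */
    return snprintf(buf, size, "%*s%s", level * 4, "", name);
}
```

At level 0 the field width is zero, so no padding is produced and the name starts in the first column.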
Lines 79-104 are the switch statement. In this case, it doesn't do anything terribly
interesting except print the file's name and its type (file, directory, etc.).
Although this program doesn't use the struct stat, it should be clear that you
could do anything you need to in the callback function.
NOTE Jim Meyering, the maintainer of the GNU Coreutils, notes that the
nftw() design isn't perfect, because of its recursive nature. (It calls itself
recursively when processing subdirectories.) If a directory hierarchy gets really
deep, in the 20,000-40,000 level range (!), nftw() can run out of stack space,
killing the program. There are other problems related to nftw()'s design as
well. The post-5.0 version of the GNU Coreutils fixes this by using the BSD
fts() suite of routines (see fts(3)).
8.5 Walking a File Tree: GNU du

The GNU version of du in the GNU Coreutils uses nftw() to traverse one or more
file hierarchies, gathering and producing statistics concerning the amount of disk space
used. It has a large number of options that control its behavior with respect to symbolic
links, output format of numbers, and so on. This makes the code harder to decipher
than a simpler version would be. (However, we're not going to let that stop us.) Here
is a summary of du's options, which will be helpful shortly when we look at the code:
$ du --help
Usage: du [OPTION]... [FILE]...
Summarize disk usage of each FILE, recursively for directories.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all               write counts for all files, not just directories
      --apparent-size     print apparent sizes, rather than disk usage; although
                            the apparent size is usually smaller, it may be
                            larger due to holes in ('sparse') files, internal
                            fragmentation, indirect blocks, and the like
  -B, --block-size=SIZE   use SIZE-byte blocks
  -b, --bytes             equivalent to '--apparent-size --block-size=1'
  -c, --total             produce a grand total
  -D, --dereference-args  dereference FILEs that are symbolic links
  -h, --human-readable    print sizes in human readable format (e.g., 1K 234M 2G)
  -H, --si                likewise, but use powers of 1000 not 1024
  -k                      like --block-size=1K
  -l, --count-links       count sizes many times if hard linked
  -L, --dereference       dereference all symbolic links
  -S, --separate-dirs     do not include size of subdirectories
  -s, --summarize         display only a total for each argument
  -x, --one-file-system   skip directories on different filesystems
  -X FILE, --exclude-from=FILE  Exclude files that match any pattern in FILE.
      --exclude=PATTERN   Exclude files that match PATTERN.
      --max-depth=N       print the total for a directory (or file, with --all)
                            only if it is N or fewer levels below the command
                            line argument; --max-depth=0 is the same as
                            --summarize
      --help              display this help and exit
      --version           output version information and exit

SIZE may be (or may be an integer optionally followed by) one of following:
kB 1000, K 1024, MB 1,000,000, M 1,048,576, and so on for G, T, P, E, Z, Y.

Report bugs to <bug-coreutils@gnu.org>.

To complicate matters further, du uses a private version of nftw() that has some
extensions. First, there are additional flag values for the callback function:
FTW_DCHP
This value signifies that nftw() could not execute 'chdir("..")'.
FTW_DCH
This value signifies that nftw() could not use chdir() to change into a directory
itself.
FTW_DPRE
The private nftw() calls the callback function for directories twice. This value is
used the first time a directory is encountered. The standard FTW_DP value is used
after all the directory's children have been processed.
The private nftw() also adds a new member, int skip, to the struct FTW. If the
current object is a directory and the callback function sets the skip field to nonzero,
nftw() will not process that directory any further. (The callback function should set
skip this way when the flag is FTW_DPRE; doing it for FTW_DP is too late.)
With that explanation under our belt, here is the process_file() function from
du.c. Line numbers are relative to the start of the function:
  1  /* This function is called once for every file system object that nftw
  2     encounters.  nftw does a depth-first traversal.  This function knows
  3     that and accumulates per-directory totals based on changes in
  4     the depth of the current entry.  */
  5
  6  static int
  7  process_file (const char *file, const struct stat *sb, int file_type,
  8                struct FTW *info)
  9  {
 10    uintmax_t size;
 11    uintmax_t size_to_print;
 12    static int first_call = 1;
 13    static size_t prev_level;
 14    static size_t n_alloc;
 15    static uintmax_t *sum_ent;
 16    static uintmax_t *sum_subdir;
 17    int print = 1;
 18
 19    /* Always define info->skip before returning.  */
 20    info->skip = excluded_filename (exclude, file + info->base);     For --exclude
This function does a lot since it has to implement all of du's options. Line 17 sets
print to true (1); the default is to print information about each file. Later code sets it
to false (0) if necessary.
Line 20 sets info->skip based on the --exclude option. Note that this excludes
subdirectories if a directory matches the pattern for --exclude.
 22    switch (file_type)
 23      {
 24      case FTW_NS:
 25        error (0, errno, _("cannot access %s"), quote (file));
 26        G_fail = 1;                                   Set global var for later
 27        return 0;                                     Return 0 to keep going
 28
 29      case FTW_DCHP:
 30        error (0, errno, _("cannot change to parent of directory %s"),
 31               quote (file));
 32        G_fail = 1;
 33        return 0;
 34
 35      case FTW_DCH:
 36        /* Don't return just yet, since although nftw couldn't chdir into the
 37           directory, it was able to stat it, so we do have a size.  */
 38        error (0, errno, _("cannot change to directory %s"), quote (file));
 39        G_fail = 1;
 40        break;
 41
 42      case FTW_DNR:
 43        /* Don't return just yet, since although nftw couldn't read the
 44           directory, it was able to stat it, so we do have a size.  */
 45        error (0, errno, _("cannot read directory %s"), quote (file));
 46        G_fail = 1;
 47        break;
 48
 49      default:
 50        break;
 51      }
 52
 53    /* If this is the first (pre-order) encounter with a directory,
 54       return right away.  */
 55    if (file_type == FTW_DPRE)
 56      return 0;
Lines 22-51 are the standard switch statement. Errors for which there's no size
information set the global variable G_fail to 1 and return 0 to keep going (see lines
24-27 and 29-33). Errors for which there is a size also set G_fail but then break out
of the switch in order to handle the statistics (see lines 35-40 and 42-47).
Lines 55-56 return early if this is the first time a directory is encountered.
 58    /* If the file is being excluded or if it has already been counted
 59       via a hard link, then don't let it contribute to the sums.  */
 60    if (info->skip
 61        || (!opt_count_all
 62            && 1 < sb->st_nlink
 63            && hash_ins (sb->st_ino, sb->st_dev)))
 64      {
 65        /* Note that we must not simply return here.
 66           We still have to update prev_level and maybe propagate
 67           some sums up the hierarchy.  */
 68        size = 0;
 69        print = 0;
 70      }
 71    else
 72      {
 73        size = (apparent_size
 74                ? sb->st_size
 75                : ST_NBLOCKS (*sb) * ST_NBLOCKSIZE);
 76      }
Now it starts to get interesting. By default, du counts the space used by hard-linked
files just once. The --count-links option causes it to count each link's space; the
variable opt_count_all is true when --count-links is supplied. To keep track of
links, du maintains a hash table 10 of already seen (device, inode) pairs.
Lines 60-63 test whether a file should not be counted, either because it was excluded
(info->skip is true, line 60) or because --count-links was not supplied (line 61)
and the file has multiple links (line 62) and the file is already in the hash table (line 63).
In this case, the size is set to 0, so that it doesn't add to the running totals, and print
is also set to false (lines 68-69).
If none of those conditions hold, the size is computed either according to the size in
the struct stat or the number of disk blocks (lines 73-75). This decision is based
on the apparent_size variable, which is set if the --apparent-size option is used.
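To make the idea of remembering (device, inode) pairs concrete, here is a minimal sketch of such a set. It is not du's hash_ins(); the names seen_set, seen_node, and seen_before(), the bucket count, and the hash formula are all assumptions made for this illustration, and a simple chained table stands in for whatever du actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

#define NBUCKETS 1021   /* arbitrary prime, chosen for this sketch */

struct seen_node {
    dev_t dev;
    ino_t ino;
    struct seen_node *next;
};

struct seen_set {
    struct seen_node *buckets[NBUCKETS];   /* must start out all NULL */
};

/* seen_before --- return 1 if (dev, ino) is already in the set;
   otherwise add it and return 0.  Returns -1 if memory runs out. */
int seen_before(struct seen_set *set, dev_t dev, ino_t ino)
{
    size_t h;
    struct seen_node *p;

    assert(set != NULL);

    h = (size_t) (((unsigned long long) dev * 31u
                   + (unsigned long long) ino) % NBUCKETS);

    for (p = set->buckets[h]; p != NULL; p = p->next)
        if (p->dev == dev && p->ino == ino)
            return 1;               /* an earlier link to the same file */

    p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->dev = dev;
    p->ino = ino;
    p->next = set->buckets[h];      /* push onto the front of the chain */
    set->buckets[h] = p;
    return 0;
}
```

A callback could call this only for files whose st_nlink is greater than 1, passing sb->st_dev and sb->st_ino, and treat a return of 1 as "already counted." Both numbers are needed because inode numbers are unique only within a single filesystem.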
10 A hash table is a data structure that allows quick retrieval of stored information; the details are beyond the scope
of this book.
 78    if (first_call)
 79      {
 80        n_alloc = info->level + 10;                               Allocate arrays
 81        sum_ent = XCALLOC (uintmax_t, n_alloc);                   to hold sums
 82        sum_subdir = XCALLOC (uintmax_t, n_alloc);
 83      }
 84    else
 85      {
 86        /* FIXME: it's a shame that we need these 'size_t' casts to avoid
 87           warnings from gcc about 'comparison between signed and unsigned'.
 88           Probably unavoidable, assuming that the members of struct FTW
 89           are of type 'int' (historical), since I want variables like
 90           n_alloc and prev_level to have types that make sense.  */
 91        if (n_alloc <= (size_t) info->level)
 92          {
 93            n_alloc = info->level * 2;                            Double amount
 94            sum_ent = XREALLOC (sum_ent, uintmax_t, n_alloc);     And reallocate
 95            sum_subdir = XREALLOC (sum_subdir, uintmax_t, n_alloc);
 96          }
 97      }
 98
 99    size_to_print = size;
Lines 78-97 manage the dynamic memory used to hold file size statistics. first_call
is a static variable (line 12) that is true the first time process_file() is called. In
this case, calloc() is called (through a wrapper macro on lines 81-82; this was
discussed in Section 3.2.1.8, "Example: Reading Arbitrarily Long Lines," page 67). The
rest of the time, first_call is false, and realloc() is used (again, through a wrapper
macro, lines 91-96).
Line 99 sets size_to_print to size; this variable may be updated depending on
whether it has to include the sizes of any children. Although size could have been
reused, the separate variable makes the code easier to read.
101    if (! first_call)
102      {
103        if ((size_t) info->level == prev_level)
104          {
105            /* This is usually the most common case.  Do nothing.  */
106          }
107        else if ((size_t) info->level > prev_level)
108          {
109            /* Descending the hierarchy.
110               Clear the accumulators for *all* levels between prev_level
111               and the current one.  The depth may change dramatically,
112               e.g., from 1 to 10.  */
113            int i;
114            for (i = prev_level + 1; i <= info->level; i++)
115              sum_ent[i] = sum_subdir[i] = 0;
116          }
117        else /* info->level < prev_level */
118          {
119            /* Ascending the hierarchy.
120               nftw processes a directory only after all entries in that
121               directory have been processed.  When the depth decreases,
122               propagate sums from the children (prev_level) to the parent.
123               Here, the current level is always one smaller than the
124               previous one.  */
125            assert ((size_t) info->level == prev_level - 1);
126            size_to_print += sum_ent[prev_level];
127            if (!opt_separate_dirs)
128              size_to_print += sum_subdir[prev_level];
129            sum_subdir[info->level] += (sum_ent[prev_level]
130                                        + sum_subdir[prev_level]);
131          }
132      }
Lines 101-132 compare the current level to the previous one. There are three
possible cases.
The levels are the same.
In this case, there's no need to worry about child statistics. (Lines 103-106.)
The current level is higher than the previous level.
In this case, we've gone down the hierarchy, and the statistics must be reset (lines
107-116). The term "accumulator" in the comment is apt: each element
accumulates the total disk space used at that level. (In the early days of computing, CPU
registers were often termed "accumulators.")
The current level is lower than the previous level.
In this case, we've finished processing all the children in a directory and have just
moved back up to the parent directory (lines 117-131). The code updates the
totals, including size_to_print.
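The three cases are easier to follow in a stripped-down form. The sketch below is an illustration only, not du's code: account() and MAX_LEVELS are hypothetical names, fixed-size arrays replace the dynamically grown ones, and the --separate-dirs handling is left out. It expects entries in the order a post-order (children first) walk delivers them.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEVELS 64            /* fixed depth limit: an assumption of this sketch */

static uintmax_t sum_ent[MAX_LEVELS];      /* sizes of entries at each level */
static uintmax_t sum_subdir[MAX_LEVELS];   /* sizes passed up from deeper levels */
static int prev_level = -1;                /* -1: no entry seen yet */

/* account --- feed one entry from a post-order walk; return the
   total to report for it (own size plus everything beneath it) */
uintmax_t account(int level, uintmax_t size)
{
    uintmax_t to_print = size;
    int i;

    assert(level >= 0 && level < MAX_LEVELS);

    if (prev_level >= 0 && level > prev_level) {
        /* Descending: start the deeper levels from zero. */
        for (i = prev_level + 1; i <= level; i++)
            sum_ent[i] = sum_subdir[i] = 0;
    } else if (prev_level >= 0 && level < prev_level) {
        /* Ascending: this entry is the parent of what came before. */
        assert(level == prev_level - 1);
        to_print += sum_ent[prev_level] + sum_subdir[prev_level];
        sum_subdir[level] += sum_ent[prev_level] + sum_subdir[prev_level];
    }
    /* Same level: nothing to propagate. */

    prev_level = level;
    sum_ent[level] += size;
    return to_print;
}
```

For a tree holding a 10-unit file and a 4-unit subdirectory at level 1, a 20-unit file inside that subdirectory at level 2, and a 4-unit top-level directory, feeding the entries as (1, 10), (2, 20), (1, 4), (0, 4) reports 24 for the subdirectory and 38 for the top.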
134    prev_level = info->level;                                   Set static variables
135    first_call = 0;
136
137    /* Let the size of a directory entry contribute to the total for the
138       containing directory, unless --separate-dirs (-S) is specified.  */
139    if (!(opt_separate_dirs && IS_FTW_DIR_TYPE (file_type)))
140      sum_ent[info->level] += size;
141
142    /* Even if this directory is unreadable or we can't chdir into it,
143       do let its size contribute to the total, ... */
144    tot_size += size;
145
146    /* ... but don't print out a total for it, since without the size(s)
147       of any potential entries, it could be very misleading.  */
148    if (file_type == FTW_DNR || file_type == FTW_DCH)
149      return 0;
150
151    /* If we're not counting an entry, e.g., because it's a hard link
152       to a file we've already counted (and --count-links), then don't
153       print a line for it.  */
154    if (!print)
155      return 0;
Lines 134-135 set the static variables prev_level and first_call so that they'll
have the correct values for a subsequent call to process_file(), ensuring that all the
previous code works correctly.
Lines 137-144 adjust statistics on the basis of options and the file type. The comments
and code are fairly straightforward. Lines 146-155 quit early if the information should
not be printed.
157    /* FIXME: This looks suspiciously like it could be simplified.  */
158    if ((IS_FTW_DIR_TYPE (file_type) &&
159         (info->level <= max_depth || info->level == 0))
160        || ((opt_all && info->level <= max_depth) || info->level == 0))
161      {
162        print_only_size (size_to_print);
163        fputc ('\t', stdout);
164        if (arg_length)
165          {
166            /* Print the file name, but without the '.' or '/.'
167               directory suffix that we may have added in main.  */
168            /* Print everything before the part we appended.  */
169            fwrite (file, arg_length, 1, stdout);
170            /* Print everything after what we appended.  */
171            fputs (file + arg_length + suffix_length
172                   + (file[arg_length + suffix_length] == '/'), stdout);
173          }
174        else
175          {
176            fputs (file, stdout);
177          }
178        fputc ('\n', stdout);
179        fflush (stdout);
180      }
181
182    return 0;
183  }
The condition on lines 158-160 is confusing, and the comment on line 157 notes
this. The condition states: "If (1a) the file is a directory and (1b) the level is less than
the maximum to print (the --max-depth option and max_depth variable) or the level is zero,
or (2a) all files should be printed and the level is less than the maximum to print, or
(2b) the level is zero," then print the file. (Yow! The post-5.0 version of du uses a
slightly less complicated condition for this case.)
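As the FIXME comment suspects, the condition reduces to something shorter. The sketch below is our own illustration, not the form used by any version of du: the function names are hypothetical, and plain int parameters stand in for IS_FTW_DIR_TYPE(file_type), opt_all, info->level, and max_depth. It states both forms and checks that they agree over a small grid of inputs.

```c
#include <assert.h>

/* The condition as written on lines 158-160. */
int print_as_written(int is_dir, int opt_all, int level, int max_depth)
{
    return (is_dir && (level <= max_depth || level == 0))
           || ((opt_all && level <= max_depth) || level == 0);
}

/* A shorter form with the same truth value: always print level 0;
   otherwise print directories (or anything, with --all) that are
   within the depth limit. */
int print_simplified(int is_dir, int opt_all, int level, int max_depth)
{
    return level == 0 || ((is_dir || opt_all) && level <= max_depth);
}

/* check_conditions --- assert that both forms agree on a grid of inputs */
void check_conditions(void)
{
    int d, a, lev, max;

    for (d = 0; d <= 1; d++)
        for (a = 0; a <= 1; a++)
            for (lev = 0; lev <= 4; lev++)
                for (max = 0; max <= 4; max++)
                    assert(!!print_as_written(d, a, lev, max)
                           == !!print_simplified(d, a, lev, max));
}
```

The reduction follows from distributing the terms: "directory and within depth," "directory and level zero," "--all and within depth," and "level zero" collapse because any term that includes "level zero" is absorbed by the bare "level zero" test.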
Lines 162-179 do the printing. Lines 162-163 print the size and a TAB character.
Lines 164-173 handle a special case. This is explained later on in du.c, on lines 524-529
of the file:
524  /* When dereferencing only command line arguments, we're using
525     nftw's FTW_PHYS flag, so a symlink-to-directory specified on
526     the command line wouldn't normally be dereferenced.  To work
527     around that, we incur the overhead of appending '/.' (or '.')
528     now, and later removing it each time we output the name of
529     a derived file or directory name.  */
In this case, arg_length is true, so lines 164-173 have to print out the original
name, not the modified one. Otherwise, lines 174-177 can print the name as it is.
Whew! That's a lot of code. We find this to be on the upper end of the complexity
spectrum, at least as far as what can be easily presented in a book of this nature.
However, it demonstrates that real-world code is often complex. The best way to manage
such complexity is with clearly named variables and detailed comments. du.c is good
in that respect; we were able to extract the code and examine it fairly easily, without
having to show all 735 lines of the program!
8.6 Changing the Root Directory: chroot()

The current working directory, set with chdir() (see Section 8.4.1, "Changing
Directory: chdir() and fchdir()," page 256), is an attribute of the process, just like
the set of open files. It is also inherited by new processes.
Less well known is that every process also has a current root directory. It is this
directory to which the pathname / refers. Most of the time, a process's root and the system
root directories are identical. However, the superuser can change the root directory,
with the (you guessed it) chroot() system call:
#include <unistd.h>                                  Common
int chroot(const char *path);
The return value is 0 upon success and -1 upon error.

As the GNU/Linux chroot(2) manpage points out, changing the root directory does
not change the current directory: programs that must make sure that they stay
underneath the new root directory must also execute chdir() afterwards:
if (chroot("/new/root") < 0)            Set new root directory
    /* handle error */
if (chdir("/some/dir") < 0)             Pathnames now relative to new root
    /* handle error */
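The two calls are often wrapped together so that neither step is forgotten. The following is a minimal sketch, assuming a GLIBC system where defining _DEFAULT_SOURCE exposes the chroot() declaration; enter_new_root() is a hypothetical helper name, and it changes to "/" rather than to some subdirectory.

```c
#define _DEFAULT_SOURCE   /* GLIBC: expose chroot(), which POSIX doesn't standardize */

#include <assert.h>
#include <unistd.h>

/* enter_new_root --- confine the process to `root'.
   Returns 0 on success, -1 on failure with errno set. */
int enter_new_root(const char *root)
{
    assert(root != NULL);

    if (chroot(root) < 0)     /* needs superuser privileges */
        return -1;
    if (chdir("/") < 0)       /* "/" now names the new root */
        return -1;
    return 0;
}
```

When run without superuser privileges, or with a directory that does not exist, the helper returns -1 and errno describes the reason.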
The chroot() system call is used most often for daemons: background programs
that must run in a special, contained environment. For example, consider an Internet
FTP daemon that allows anonymous FTP (connection by anyone, from anywhere,
without a regular username and password). Obviously, such a connection should not
be able to see all the files on the whole system. Instead, the FTP daemon does a
chroot() to a special directory with just enough structure to allow it to function. (For
example, its own /bin/ls for listing files, its own copy of the C runtime library if it's
shared, and possibly its own copy of /etc/passwd and /etc/group to show a limited
set of user and group names.)
POSIX doesn't standardize this system call, although GNU/Linux and all Unix
systems support it. (It's been around since V7.) It is specialized, but when you need it, it's
very handy.
8.7 Summary
• Filesystems are collections of free, inode, metadata, and data blocks, organized in
a specific fashion. Filesystems correspond one-to-one with the (physical or logical)
partitions in which they are made. Each filesystem has its own root directory; by
convention the root directory always has inode number 2.
• The mount command mounts a filesystem, grafting it onto the logical hierarchical
file namespace. The umount command detaches a filesystem. The kernel arranges
for /. and /.. to be the same; the root directory of the entire namespace is its
own parent. In all other cases, the kernel arranges for '..' in the root of a
mounted filesystem to point to the parent directory of the mount point.
• Modern Unix systems support multiple types of filesystems. In particular, Sun's
Network File System (NFS) is universally supported, as is the ISO 9660 standard
format for CD-ROMs, and MS-DOS FAT partitions are supported on all Unix
systems that run on Intel x86 hardware. To our knowledge, Linux supports the
largest number of different filesystems, well over 30! Many are specialized,
but many others are for general use, including at least four different journaling
filesystems.
• The /etc/fstab file lists each system's partitions, their mount points, and any
relevant mount options. /etc/mtab lists those filesystems that are currently
mounted, as does /proc/mounts on GNU/Linux systems. The loop option to
mount is particularly useful under GNU/Linux for mounting filesystem images
contained in regular files, such as CD-ROM images. Other options are useful for
security and for mounting foreign filesystems, such as Windows vfat filesystems.
• The /etc/fstab-format files can be read with the getmntent() suite of routines.
The GNU/Linux format is shared with several other commercial Unix variants,
most notably Sun's Solaris.
• The statvfs() and fstatvfs() functions are standardized by POSIX for
retrieving filesystem information, such as the number of free and used disk blocks,
the number of free and used inodes, and so on. Linux has its own system calls for
retrieving similar information: statfs() and fstatfs().
• chdir() and fchdir() let a process change its current directory. getcwd()
retrieves the absolute pathname of the current directory. These three functions are
straightforward to use.
• The nftw() function centralizes the task of "walking a file tree," that is, visiting
every filesystem object (file, device, symbolic link, directory) in an entire directory
hierarchy. Different flags control its behavior. The programmer then has to provide
a callback function that receives each file's name, a struct stat for the file, the
file's type, and information about the file's name and level in the hierarchy. This
function can then do whatever is necessary for each file. The Coreutils 5.0 version
of GNU du uses an extended version of nftw() to do its job.
• Finally, the chroot() system call changes a process's current root directory. This
is a specialized but important facility, which is particularly useful for certain
daemon-style programs.
Exercises
1. Examine the mount(2) manpage under GNU/Linux and on as many other
different Unix systems as you have access to. How do the system calls differ?
2. Enhance ch08-statvfs.c to take an option giving an open integer file
descriptor; it should use fstatvfs() to retrieve filesystem information.
3. Enhance ch08-statvfs.c to not ignore NFS-mounted filesystems. Such
filesystems have a device of the form server.example.com:/big/disk.
4. Modify ch08-statfs.c (the one that uses the Linux-specific statfs() call)
to produce output that looks like that from df.
5. Add a -i option to the program you wrote for the previous exercise to produce
output like that of 'df -i'.
6. Using opendir(), readdir(), stat() or fstat(), dirfd(), and fchdir(),
write your own version of getcwd(). How will you compute the total size the
buffer needs to be? How will you move through the directory hierarchy?
7. Enhance your version of getcwd() to allocate a buffer for the caller if the first
argument is NULL.
8. Can you use nftw() to write getcwd()? If not, why not?
9. Using nftw(), write your own version of chown that accepts a -R option to
recursively process entire directory trees. Make sure that without -R, 'chown
user directory' does not recurse. How will you test it?
10. The BSD fts() ("file tree stream") suite of routines provides a different way
to process directory hierarchies. It has a somewhat heftier API, in terms of both
the number of functions and the struct it makes available to the user-level
function that calls it. These functions are available as a standard part of GLIBC.
Read the fts(3) manpage. (It may help you to print it and have it handy.) Rewrite
your private version of chown to use fts().
11. Look at the find(1) manpage. If you were to try to write find from scratch,
which file tree suite would you prefer, nftw() or fts()? Why?
Chapter 9 Process Management and Pipes
Chapter 10 Signals
Chapter 11 Permissions and User and Group ID Numbers
Chapter 12 General Library Interfaces - Part 2
Chapter 13 Internationalization and Localization
Chapter 14 Extended Interfaces
In this chapter
• 9.1 Process Creation and Management
• 9.2 Process Groups
• 9.3 Basic Interprocess Communication: Pipes and FIFOs
• 9.4 File Descriptor Management
• 9.5 Example: Two-Way Pipes in gawk
As we said in Chapter 1, "Introduction," page 3, if you were to summarize Unix
(and thus Linux) in three words, they would have to be "files and processes."
Now that we've seen how to work with files and directories, it's time to look at the
rest of the story: processes. In particular, we examine how processes are created and
managed, how they interact with open files, and how they can communicate with
each other. Subsequent chapters examine signals, a coarse way for one process
(or the kernel) to let another know that some event has occurred, and
permission checking.
In this chapter the picture begins to get more complicated. In particular, to be
fairly complete, we must mention things that aren't covered until later in the
chapter or later in the book. In such cases, we provide forward references, but you
should be able to get the gist of each section without looking ahead.
9.1 Process Creation and Management

Unlike many predecessor and successor operating systems, process creation in Unix
was intended to be (and is) cheap. Furthermore, Unix separated the idea of "create a
new process" from that of "run a given program in a process." This was an elegant design
decision, one that simplifies many operations.
9.1.1 Creating a Process: fork()

The first step in starting a new program is calling f ork ( ) :
#include <sys /types.h> POSIX
pid_t fork(void ) ;
Using f o r k () is simple. Before the call, one process, which we term the parent, is
running. When f o rk () returns, there are two processes: the parent and the child.
Here is the key: The two processes both run the same program. The two processes can
distinguish themselves based on the return value from f o rk ( ) :
Negative
If there is an error, fork() returns -1, and no new process is created. The original
process continues running.
Zero
In the child, fork() returns 0.
Positive
In the parent, fork() returns the positive process identification number (PID)
of the child.
Boilerplate code for creating a child process looks like this :
    if ((child = fork()) < 0)
        /* handle error */
    else if (child == 0)
        /* this is the new process */
    else
        /* this is the original parent process */
The pid_t is a signed integer type for holding PID values. It is most likely a plain
int, but it makes code more self-documenting and should be used instead of int.
In Unix parlance, besides being the name of a system call, the word "fork" is both a
verb and a noun. We might say that "one process forks another," and that "after the
fork, two processes are running." (Think "fork in a road" and not "fork, knife
and spoon.")
9.1.1.1 After the fork(): Shared and Distinct Attributes

The child "inherits" identical copies of a large number of attributes from the parent.
Many of these attributes are specialized and irrelevant here. Thus, the following list is
purposely incomplete. The following attributes are the relevant ones:
• The environment; see Section 2.4, "The Environment," page 40.
• All open files and open directories; see Section 4.4.1, "Understanding File Descriptors,"
page 92, and see Section 5.3.1, "Basic Directory Reading," page 133.
• The umask setting; see Section 4.6, "Creating Files," page 106.
• The current working directory; see Section 8.4.1, "Changing Directory: chdir()
and fchdir()," page 256.
• The root directory; see Section 8.6, "Changing the Root Directory: chroot(),"
page 276.
• The current priority (a.k.a. "nice value"; we discuss this shortly; see Section 9.1.3,
"Setting Process Priority: nice()," page 291).
• The controlling terminal. This is the terminal device (physical console or terminal-
emulator window) that is allowed to send signals to a process (such as CTRL-Z
to stop running jobs). This is discussed later, in Section 9.2.1, "Job Control
Overview," page 312.
• The process signal mask and all current signal dispositions (not discussed yet; see
Chapter 10, "Signals," page 347).
• The real, effective, and saved set-user and set-group IDs and the supplemental
group set (not discussed yet; see Chapter 11, "Permissions and User and Group
ID Numbers," page 403).
Besides the fork() return value, the two processes differ in the following ways:
• Each one has a unique process ID and parent process ID (PID and PPID). These
are described in Section 9.1.2, "Identifying a Process: getpid() and getppid(),"
page 289.
• The child's PID will not equal that of any existing process group ID (see Sec-
tion 9.2, "Process Groups," page 312).
• The accumulated CPU times for the child process and its future children are initialized
to zero. (This makes sense; after all, it is a brand-new process.)
• Any signals that were pending in the parent are cleared in the child, as are any
pending alarms or timers. (We haven't covered these topics yet; see Chapter 10,
"Signals," page 347, and see Section 14.3.3, "Interval Timers: setitimer() and
getitimer()," page 546.)
• File locks held by the parent are not duplicated in the child (also not discussed
yet; see Section 14.2, "Locking Files," page 531).
9.1.1.2 File Descriptor Sharing

The attributes that the child inherits from the parent are all set to the same values
they had in the parent at the time of the fork(). From then on, though, the two processes
proceed on their merry ways, (mostly) independent of each other. For example,
if the child changes directory, the parent's directory is not affected. Similarly, if the
child changes its environment, the parent's environment is not changed.
Open files are a significant exception to this rule. Open file descriptors are shared,
and an action by one process on a shared file descriptor affects the state of the file for
the other process as well. This is best understood after study of Figure 9.1.
[Diagram: the per-process descriptor tables of PID 42 and PID 45 (descriptors 0, 1, and 2) point into the kernel's file table; each file table entry holds bookkeeping data and an offset and refers to the file contents.]
FIGURE 9.1
File descriptor sharing
The figure displays the kernel's internal data structures. The key data structure is the
file table. Each element refers to an open file. Besides other bookkeeping data, the file
table maintains the current position (read/write offset) in the file. This is adjusted either
automatically each time a file is read or written or directly with lseek() (see Section 4.5,
"Random Access: Moving Around within a File," page 102).
The file descriptor returned by open() or creat() acts as an index into a per-process
array of pointers into the file table. This per-process array won't be any larger than the
value returned by getdtablesize() (see Section 4.4.1, "Understanding File Descriptors,"
page 92).
Figure 9.1 shows two processes sharing standard input and standard output; for each,
both point to the same entries in the file table. Thus, when process 45 (the child) does
a read(), the shared offset is updated; the next time process 42 (the parent) does a
read(), it starts at the position where process 45's read() finished.
This can be seen easily at the shell level:
    $ cat data                              Show demo data file contents
    line 1
    line 2
    line 3
    line 4
    $ ls -l test1 ; cat test1               Mode and contents of test program
    -rwxr-xr-x 1 arnold devel 93 Oct 20 22:11 test1
    #! /bin/sh

    read line ; echo p: $line               Read a line in parent shell, print it
    ( read line ; echo c: $line )           Read a line in child shell, print it
    read line ; echo p: $line               Read a line in parent shell, print it
    $ test1 < data                          Run the program
    p: line 1                               Parent starts at beginning
    c: line 2                               Child picks up where parent left off
    p: line 3                               Parent picks up where child left off
The first executable line of test1 reads a line from standard input, changing the
offset in the file. The second line of test1 runs the commands enclosed between the
parentheses in a subshell. This is a separate shell process created-you guessed it-with
fork ( ). The child subshell inherits standard input from the parent, including the
current file offset. This process reads a line and updates the shared offset into the file.
When the third line, back in the parent shell, reads the file, it starts where the child
left off.
Although the read command is built into the shell, things work the same way with
external commands. Some early Unix systems had a line command that read one line
of input (one character at a time!) for use within shell scripts; if the file offset weren't
shared, it would be impossible to use such a command in a loop.
File descriptor sharing and inheritance play a pivotal role in shell I/O redirection;
the system calls and their semantics make the shell-level primitives straightforward to
implement in C, as we see later in the chapter.
9.1.1.3 File Descriptor Sharing and close()

The fact that multiple file descriptors can point at the same open file has an important
consequence: A file is not closed until all its file descriptors are closed.
We see later in the chapter that multiple descriptors for the same file can exist not
only across processes but even within the same process; this rule is particularly important
for working with pipes.
If you need to know if two descriptors are open on the same file, you can use fstat()
(see Section 5.4.2, "Retrieving File Information," page 141) on the two descriptors
with two different struct stat structures. If the corresponding st_dev and st_ino
fields are equal, they're the same file.
We complete the discussion of file descriptor manipulation and the file descriptor
table later in the chapter.
9.1.2 Identifying a Process: getpid() and getppid()

Each process has a unique process ID number (the PID). Two system calls provide
the current PID and the PID of the parent process:
    #include <sys/types.h>                    POSIX
    #include <unistd.h>

    pid_t getpid(void);
    pid_t getppid(void);
The functions are about as simple as they come:

pid_t getpid(void) Returns the PID of the current process.
pid_t getppid(void) Returns the parent's PID.
PID values are unique; by definition there cannot be two running processes with the
same PID. PIDs usually increase in value, in that a child process generally has a higher
PID than its parent. On many systems, however, PID values wrap around; when the
system maximum value for PIDs is exceeded, the next process created will have the
lowest unused PID number. (Nothing in POSIX requires this behavior, and some systems
assign unused PID numbers randomly.)
If the parent dies or exits, the child is given a new parent, init. In this case, the new
parent PID will be 1, which is init's PID. Such a child is termed an orphan. The following
program, ch09-reparent.c, demonstrates this. This is also the first example
we've seen of fork() in action:
     1  /* ch09-reparent.c --- show that getppid() can change values */
     2
     3  #include <stdio.h>
     4  #include <errno.h>
     5  #include <sys/types.h>
     6  #include <unistd.h>
     7
     8  /* main --- do the work */
     9
    10  int main(int argc, char **argv)
    11  {
    12      pid_t pid, old_ppid, new_ppid;
    13      pid_t child, parent;
    14
    15      parent = getpid();      /* before fork() */
    16
    17      if ((child = fork()) < 0) {
    18          fprintf(stderr, "%s: fork of child failed: %s\n",
    19              argv[0], strerror(errno));
    20          exit(1);
    21      } else if (child == 0) {
    22          old_ppid = getppid();
    23          sleep(2);       /* see Chapter 10 */
    24          new_ppid = getppid();
    25      } else {
    26          sleep(1);
    27          exit(0);        /* parent exits after fork() */
    28      }
    29
    30      /* only the child executes this */
    31      printf("Original parent: %d\n", parent);
    32      printf("Child: %d\n", getpid());
    33      printf("Child's old ppid: %d\n", old_ppid);
    34      printf("Child's new ppid: %d\n", new_ppid);
    35
    36      exit(0);
    37  }
Line 15 retrieves the PID of the initial process, using getpid(). Lines 17-20 fork
the child, checking for an error return.
Lines 21-24 are executed by the child: Line 22 retrieves the PPID. Line 23 suspends
the process for two seconds (see Section 10.8.1, "Alarm Clocks: sleep(), alarm(),
and SIGALRM," page 382, for information about sleep()), and then line 24 retrieves
the PPID again.
Lines 25-27 run in the parent. Line 26 delays the parent for one second, giving the
child enough time to make the first getppid() call. Line 27 then exits the parent.
Lines 31-34 print the values. Note that the parent variable, which was set before
the fork, still maintains its value in the child. After forking, the two processes have
identical but independent copies of their address spaces. Here's what happens when
the program runs:
    $ ch09-reparent                         Run the program
    $ Original parent: 6582                 Program finishes: shell prompts and child prints
    Child: 6583
    Child's old ppid: 6582
    Child's new ppid: 1
Remember that the two programs execute in parallel. This is depicted graphically in
Figure 9.2.
Time runs down the page:
PID 6582             PID 6583
                                              Initially, only one process
child = fork();                               Create child
sleep(1);            old_ppid = getppid();    Parent sleeps, child calls getppid()
exit(0);             sleep(2);                Parent exits, child sleeps
6583 reparented      Continues sleeping       Reparent child while it's asleep
                     new_ppid = getppid();    Orphan child calls getppid()
FIGURE 9.2
Two processes running in parallel after forking
NOTE The use of sleep() to have one process outlive another works most
of the time. However, occasionally it fails, leading to hard-to-reproduce and
hard-to-find bugs. The only way to guarantee correct behavior is explicit
synchronization with wait() or waitpid(), which are described further on
in the chapter (see Section 9.1.6.1, "Using POSIX Functions: wait() and
waitpid()," page 306).
9.1.3 Setting Process Priority: nice()

As processes run, the kernel dynamically changes each process's priority. As in life,
higher-priority items get attention before lower-priority ones. In brief, each process is
allotted a small amount of time in which to run, called its time slice. When the time
slice finishes, if the current process is still the one with the highest priority, it is allowed
to continue running.
Linux, like Unix, provides preemptive multitasking. This means that the kernel can
preempt a process (pause it) if it's time to let another process run. Processes that have
been running a lot (for example, compute-intensive processes) have their priority lowered
at the end of their time slice, to let other processes have a chance at the processor.
Similarly, processes that have been idle while waiting for I/O (such as an interactive
text editor) are given a higher priority so that they can respond to the I/O when it
happens. In short, the kernel makes sure that all processes, averaged over time, get their
"fair share" of the CPU. Raising and lowering priorities are part of this process.
Designing a good process scheduler for the kernel is an art; the nitty-gritty details
are beyond the scope of this book. However, a process can influence the kernel's priority
assignment algorithm by way of its nice value.
The nice value is an indication of "how nice" the process is willing to be toward
other processes. Thus, higher nice values indicate increasingly more patient processes;
that is, ones that are increasingly nice toward others, lowering their priority with respect
to that of other processes.
A negative nice value, on the other hand, indicates that a process wishes to be "less
nice" towards others. Such a process is more selfish, wanting more CPU time for itself.¹
Fortunately, while users can increase their nice value (be more nice), only root can
decrease the nice value (be less nice).
The nice value is only one factor in the equation used by the kernel to compute the
priority; the nice value is not the priority itself, which varies over time, based on the
process's behavior and the state of other processes in the system. To change the nice
value, use the nice() system call:
    #include <unistd.h>                       XSI

    int nice(int inc);
The default nice value is 0. The allowed range for nice values is -20 to 19. This takes
some getting used to. The more negative the value, the higher the process's priority:
-20 is the highest priority (least nice), and 19 is the lowest priority (most nice).
The inc argument is the increment by which to change the nice value. Use 'nice(0)'
to retrieve the current value without changing it. If the result of 'current_nice_value
+ inc' would be outside the range -20 to 19, the system forces the result to be inside
the range.
1 Such processes often display childlike behavior.

The return value is the new nice value or -1 if there was an error. Since -1 is also a
valid nice value, when calling nice() you must explicitly set errno to zero first, and
then check it afterwards to see if there was a problem:
    int niceval;
    int inc = /* whatever */;

    errno = 0;
    if ((niceval = nice(inc)) < 0 && errno != 0) {
        fprintf(stderr, "nice(%d) failed: %s\n", inc, strerror(errno));
        /* other recovery */
    }
This example can fail if inc has a negative value and the process is not running as root.
9.1.3.1 POSIX vs. Reality

The nice value range of -20 to 19 that Linux uses is historical; it dates back at least
as far as V7. POSIX expresses the situation in more indirect language, which allows for
implementation flexibility while maintaining historical compatibility. It also makes the
standard harder to read and understand, but then, that's why you're reading this book.
So, here's how POSIX describes it.
First, the process's nice value as maintained by the system ranges from 0 to '(2 *
NZERO) - 1'. The constant NZERO is defined in <limits.h> and must be at least 20.
This gives us the range 0-39.
Second, as we described, the sum of the current nice value and the incr increment
is forced into this range.
Finally, the return value from nice() is the process nice value minus NZERO. With
an NZERO value of 20, this gives us the original -20 to 19 range that we initially
described.
The upshot is that nice()'s return value actually ranges from '-NZERO' to 'NZERO-1',
and it's best to write your code in terms of that symbolic constant. However, practically
speaking, you're unlikely to find a system where NZERO is not 20.
9.1.4 Starting New Programs: The exec() Family

Once a new process is running (through fork()), the next step is to start a different
program running in the process. There are multiple functions that serve different
purposes:
    int execve(const char *filename, char *const argv[],        System call
               char *const envp[]);

    int execl(const char *path, const char *arg, ...);          Wrappers
    int execlp(const char *file, const char *arg, ...);
    int execle(const char *path, const char *arg, ..., char *const envp[]);
    int execv(const char *path, char *const argv[]);
    int execvp(const char *file, char *const argv[]);
We refer to these functions as the "exec() family." There is no function named
exec(); instead we use this function name to mean any of the above listed functions.
As with fork(), "exec" is used in Unix parlance as a verb, meaning to execute (run) a
program, and as a noun.
9.1.4.1 The execve() System Call

The simplest function to explain is execve(). It is also the underlying system call.
The others are wrapper functions, as is explained shortly.
    int execve(const char *filename, char *const argv[],
               char *const envp[])
filename is the name of the program to execute. It may be a full or relative
pathname. The file must be in an executable format that the kernel understands.
Modern systems use the ELF (Extensible Linking Format) executable format.
GNU/Linux understands ELF and several others. Interpreted scripts can be executed
with execve() if they use the '#!' special first line that names the interpreter
to use. (Scripts that don't start with '#!' will fail.) Section 1.1.3, "Executable
Files," page 7, provides an example use of '#!'.
argv is a standard C argument list: an array of character pointers to argument
strings, including the value to use for argv[0], terminated with a NULL pointer.
envp is the environment to use for the new process, with the same layout as the
environ global variable (see Section 2.4, "The Environment," page 40). In the
new program, this environment becomes the initial value of environ.
A call to exec() should not return. If it does, there was a problem. Most commonly,
either the requested program doesn't exist, or it exists but it isn't executable (ENOENT
and EACCES for errno, respectively). Many more things can go wrong; see the
execve(2) manpage.
Assuming that the call succeeds, the current contents of the process's address space
are thrown away. (The kernel does arrange to save the argv and envp data in a safe
place first.) The kernel loads the executable code for the new program, along with any
global and static variables. Next, the kernel initializes the environment with that
passed to execve(), and then it calls the new program's main() routine with the argv
array passed to execve(). It counts the number of arguments and passes that value to
main() in argc.
At that point, the new program is running. It doesn't know (and can't find out)
what program was running in the process before it. Note that the process ID does not
change. Many other attributes remain in place across the exec; we cover this in more
detail shortly.
In a loose analogy, exec() is to a process what life roles are to a person. At different
times during the day, a single person might function as parent, spouse, friend, student
or worker, store customer, and so on. Yet it is the same underlying person performing
the different roles. So too, the process (its PID, open files, current directory,
etc.) doesn't change, while the particular job it's doing (the program run with
exec()) can.
9.1.4.2 Wrapper Functions: execl() et al.

Five additional functions, acting as wrappers, provide more convenient interfaces to
execve(). The first group all take a list of arguments, each one passed as an explicit
function parameter:
    int execl(const char *path, const char *arg, ...)
The first argument, path, is the pathname of the file to execute. Subsequent arguments,
starting with arg, are the individual elements to be placed in argv. As
before, argv[0] must be explicitly included. You must pass a terminating NULL
pointer as the final argument so that execl() can tell where the argument list
ends. The new program inherits whatever environment is in the current program's
environ variable.
    int execlp(const char *file, const char *arg, ...)
This function is like execl(), but it simulates the shell's command searching
mechanism, looking for file in each directory named in the PATH environment
variable. If file contains a / character, this search is not done. If PATH isn't present
in the environment, execlp() uses a default path. On GNU/Linux, the default
is ":/bin:/usr/bin" but it may be different on other systems. (Note that the
leading colon in PATH means that the current directory is searched first.)
Furthermore, if the file is found and has execute permission but cannot be exec'd
because it isn't in a known executable format, execlp() assumes that the program
is a shell script, and execs the shell with the filename as an argument.
    int execle(const char *path, const char *arg, ...,
               char *const envp[])
This function is also like execl(), but it accepts an additional argument, envp,
which becomes the new program's environment. As with execl(), you must
supply the terminating NULL pointer to end the argument list, before envp.
The second group of wrapper functions accepts an argv style array:
    int execv(const char *path, char *const argv[])
This function is like execve(), but the new program inherits whatever environment
is in the current program's environ variable.
    int execvp(const char *file, char *const argv[])
This function is like execv(), but it does the same PATH search that execlp()
does. It also does the same falling back to exec'ing the shell if the found file cannot
be executed directly.
Table 9.1 summarizes the six exec() functions.
TABLE 9.1
Alphabetical exec() family summary
Function    Path search    Uses environ    Purpose
execl()     no             yes             Execute arg list.
execle()    no             no              Execute arg list with environment.
execlp()    yes            yes             Execute arg list by path search.
execv()     no             yes             Execute with argv.
execve()    no             no              Execute with argv and environment (system call).
execvp()    yes            yes             Execute with argv by path search.
The execlp () and execvp () functions are best avoided unless you know that the
PATH environment variable contains a reasonable list of directories.
9.1.4.3 Program Names and argv[0]

Until now, we have always treated argv[0] as the program name. We know that it
may or may not contain a / character, depending on how the program is invoked; if it
does, then that's usually a good clue as to the pathname used to invoke the program.
However, as should be clear by now, argv[0] being the filename is only a convention.
There's nothing stopping you from passing an arbitrary string to the exec'd program
for argv[0]. The following program, ch09-run.c, demonstrates passing an arbitrary
string:
     1  /* ch09-run.c --- run a program with a different name and any arguments */
     2
     3  #include <stdio.h>
     4  #include <errno.h>
     5  #include <unistd.h>
     6
     7  /* main --- adjust argv and run named program */
     8
     9  int main(int argc, char **argv)
    10  {
    11      char *path;
    12
    13      if (argc < 3) {
    14          fprintf(stderr, "usage: %s path arg0 [ arg ... ]\n", argv[0]);
    15          exit(1);
    16      }
    17
    18      path = argv[1];
    19
    20      execv(path, argv + 2);  /* skip argv[0] and argv[1] */
    21
    22      fprintf(stderr, "%s: execv() failed: %s\n", argv[0],
    23          strerror(errno));
    24      exit(1);
    25  }
The first argument is the path name of the program to run and the second is the new
name for the program (which most utilities ignore, other than for error messages); any
other arguments are passed on to the program being exec'd.
Lines 13-16 do error checking. Line 18 saves the path in path. Line 20 does the
exec; if lines 22-23 run, it's because there was a problem. Here's what happens when
we run the program:
    $ ch09-run /bin/grep whoami foo          Run grep
    a line                                   Input line doesn't match
    a line with foo in it                    Input line that does match
    a line with foo in it                    It's printed
    ^D                                       EOF
    $ ch09-run nonexistent-program foo bar   Demonstrate failure
    ch09-run: execv() failed: No such file or directory
This next example is a bit bizarre: we have ch09-run run itself, passing 'foo' as the
program name. Since there aren't enough arguments for the second run, it prints the
usage message and exits:
    $ ch09-run ./ch09-run foo
    usage: foo path arg0 [ arg ... ]
While not very useful, ch09-run clearly shows that argv[0] need not have any relationship
to the file that is actually run.
In System III (circa 1980), the cp, ln, and mv commands were one executable file,
with three links by those names in /bin. The program would examine argv[0] and
decide what it should do. This saved a modest amount of disk space, at the expense of
complicating the source code and forcing the program to choose a default action if
invoked by an unrecognized name. (Some current commercial Unix systems continue
this practice!) Without stating an explicit reason, the GNU Coding Standards recommends
that a program not base its behavior upon its name. One reason we see is that
administrators often install the GNU version of a utility alongside the standard ones
on commercial Unix systems, using a g prefix: gmake, gawk, and so on. If such programs
expect only the standard names, they'll fail when run with a different name.
Also, today, disk space is cheap; if two almost identical programs can be built from
the same source code, it's better to do it that way, using #ifdef or what-have-you. For
example, grep and egrep share considerable code, but the GNU version builds two
separate executables.
9.1.4.4 Attributes Inherited across exec()

As with fork(), a number of attributes remain in place after a program does an exec:
• All open files and open directories; see Section 4.4.1, "Understanding File Descriptors,"
page 92, and see Section 5.3.1, "Basic Directory Reading," page 133. (This
doesn't include files marked close-on-exec, as described later in the chapter; see
Section 9.4.3.1, "The Close-on-exec Flag," page 329.)
• The umask setting; see Section 4.6, "Creating Files," page 106.
• The current working directory; see Section 8.4.1, "Changing Directory: chdir()
and fchdir()," page 256.
• The root directory; see Section 8.6, "Changing the Root Directory: chroot(),"
page 276.
• The current nice value.
• The process ID and parent process ID.
• The process group ID; see Section 9.2, "Process Groups," page 312.
• The session ID and the controlling terminal; for both, see Section 9.2.1, "Job
Control Overview," page 312.
• The process signal mask and any pending signals, as well as any unexpired alarms
or timers (not discussed yet; see Chapter 10, "Signals," page 347).
• The real user ID and group IDs and the supplemental group set. The effective
user and group IDs (and thus the saved set-user and set-group IDs) can be set by
the setuid and setgid bits on the file being exec'd. (None of this has been discussed
yet; see Chapter 11, "Permissions and User and Group ID Numbers," page 403.)
• File locks remain in place (also not discussed yet; see Section 14.2, "Locking Files,"
page 531).
• Accumulated CPU times for the process and its children don't change.
After an exec, signal disposition changes; see Section 10.9, "Signals Across fork()
and exec()," page 398, for more information.
All open files and directories remain open and available after the exec. This is how
programs inherit standard input, output, and error: They're in place when the program
starts up.
Most of the time, when you fork and exec a separate program, you don't want it to
inherit anything but file descriptors 0, 1, and 2. In this case, you can manually close
all other open files in the child, after the fork but before the exec. Alternatively, you
can mark a file descriptor to be automatically closed by the system upon an exec; this
latter option is discussed later in the chapter (see Section 9.4.3.1, "The Close-on-exec
Flag," page 329).
9.1.5 Terminating a Process

Process termination involves two steps: The process exits, passing an exit status to
the system, and the parent process recovers the information.
9.1.5.1 Defining Process Exit Status

The exit status (also known variously as the exit value, return code, and return value)
is an 8-bit value that the parent can recover when the child exits (in Unix parlance,
"when the child dies"). By convention, an exit status of 0 means that the program ran
with no problems. Any nonzero exit status indicates some sort of failure; the program
determines the values to use and their meanings, if any. (For example, grep uses 0 to
mean that it matched the pattern at least once, 1 to mean that it did not match the
pattern at all, and 2 to mean that an error occurred.) This exit status is available at the
shell level (for Bourne-style shells) in the special variable $?
The C standard defines two constants, which are all you should use for strict portability
to non-POSIX systems:
EXIT_SUCCESS
The program exited with no problems. Zero can also be used to mean success.
EXIT_FAILURE
The program had some kind of problem.
In practice, using only these values is rather constraining. Instead, you should pick
a small set of return codes, document their meanings, and use them. (For example, 1
for command-line option and argument errors, 2 for I/O errors, 3 for bad data errors,
and so on.) For readability, it pays to use #defined constants or an enum for them.
Having too large a list of errors makes using them cumbersome; most of the time the
invoking program (or user) only cares about zero vs. nonzero.
When the binary success/failure distinction is adequate, the pedantic programmer
uses EXIT_SUCCESS and EXIT_FAILURE. Our own style is more idiomatic, using the
explicit constants 0 or 1 with return and exit(). This is so common that it is learned
early on and quickly becomes second nature. However, you should make your own
decision for your own projects.
9.1 Process Crearion and M anagemenr 301
NOTE Only the least-significant eight bits of the value are available to the
parent process. Thus, you should use values in the range 0-255. As we'll see
shortly, 126 and 127 have a conventional meaning (above and beyond plain
"unsuccessful"), to which your programs should adhere.
Since only the least-significant eight bits matter, you should never use a
negative exit status. When the last eight bits are retrieved from small negative
numbers, they become large positive values! (For example, -1 becomes 255,
and -5 becomes 251.) We have seen C programming books that get this
wrong; don't be misled.
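The truncation to eight bits is easy to observe. The following sketch forks a child that exits with a given value and reports what the parent actually sees. The function name observed_exit_status() is our own invention for illustration; fork() was covered earlier in the chapter, and waitpid() and the status macros are described in Section 9.1.6.1.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* observed_exit_status --- fork a child that exits with VALUE and return
 * the exit status the parent actually sees, or -1 if something failed.
 * (Hypothetical helper, for illustration only.) */
int observed_exit_status(int value)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(value);           /* child: exit at once with VALUE */

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status); /* only the low eight bits survive */
}
```

Calling 'observed_exit_status(-1)' yields 255 and 'observed_exit_status(-5)' yields 251, matching the values given in the note. A value of 256 comes back as 0, which would look like success to the parent.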
9.1.5.2 Returning from main()

A program can terminate voluntarily in one of two ways: by using one of the functions
described next or by returning from main(). (A third, more drastic, way is described
later, in Section 12.4, "Committing Suicide: abort()," page 445.) In the latter case,
you should use an explicit return value instead of falling off the end of the function:
/* Good: */                            /* Bad: */
int main(int argc, char **argv)        int main(int argc, char **argv)
{                                      {
    /* code here */                        /* code here */

    return 0;                              /* ?? What does main() return ?? */
}                                      }
The 1999 C standard indicates that when main() returns by falling off the end, the
behavior is as if it had returned 0. (This is also true for C++; however, the 1989 C
standard leaves this case purposely undefined.) In all cases, it's poor practice to rely on
this behavior; one day you may be programming for a system with meager C runtime
support or an embedded system, or somewhere else where it will make a difference. (In
general, falling off the end of any non-void function is a bad idea; it can only lead to
buggy code.)
The value returned from main() is automatically passed back to the system, from
which the parent can recover it later. We describe how in Section 9.1.6.1, "Using
POSIX Functions: wait() and waitpid()," page 306.
NOTE On GNU/Linux systems, the c99 compiler-driver command runs the
compiler with the appropriate options such that the return value when falling
off the end is 0. Plain gcc doesn't do this.

9.1.5.3 Exiting Functions

The other way to voluntarily terminate a program is by calling an exiting function.
The C standard defines the following functions:
#include <stdlib.h>                                   ISO C

void exit(int status);
void _Exit(int status);
int atexit(void (*function)(void));
The functions work as follows:
void exit(int status)
This function terminates the program. status is passed to the system for recovery
by the parent. Before the program exits, exit() calls all functions registered with
atexit(), flushes and closes all open <stdio.h> FILE * streams, and removes
any temporary files created with tmpfile() (see Section 12.3.2, "Creating and
Opening Temporary Files (Good)," page 441). When the process exits, the kernel
closes any remaining open files (those opened by open(), creat(), or file descriptor
inheritance), frees up its address space, and releases any other resources it may
have been using. exit() never returns.
void _Exit(int status)
This function is essentially identical to the POSIX _exit() function; we delay
discussion of it for a short while.
int atexit(void (*function)(void))
function is a pointer to a callback function to be called at program exit. exit()
invokes the callback function before it closes files and terminates. The idea is that
an application can provide one or more cleanup functions to be run before finally
shutting down. Providing a function is called registering it. (Callback functions
for nftw() were described in Section 8.4.3.2, "The nftw() Callback Function,"
page 263; it's the same idea here, although atexit() invokes each registered
function only once.)
atexit() returns 0 on success or a nonzero value on error.
The following program does no useful work, but it does demonstrate how
atexit() works:
/* ch09-atexit.c --- demonstrate atexit().
   Error checking omitted for brevity. */

#include <stdio.h>
#include <stdlib.h>

/*
 * The callback functions here just answer roll call.
 * In a real application, they would do more.
 */

void callback1(void) { printf("callback1 called\n"); }
void callback2(void) { printf("callback2 called\n"); }
void callback3(void) { printf("callback3 called\n"); }

/* main --- register functions and then exit */

int main(int argc, char **argv)
{
    printf("registering callback1\n");  atexit(callback1);
    printf("registering callback2\n");  atexit(callback2);
    printf("registering callback3\n");  atexit(callback3);

    printf("exiting now\n");

    exit(0);
}
Here's what happens when it's run:

$ ch09-atexit
registering callback1             Main program runs
registering callback2
registering callback3
exiting now
callback3 called                  Callback functions run in reverse order
callback2 called
callback1 called
As the example demonstrates, functions registered with atexit() run in the reverse
order in which they were registered: most recent one first. (This is also termed last-in
first-out, abbreviated LIFO.)
POSIX defines the _exit() function. Unlike exit(), which invokes callback
functions and does <stdio.h> cleanup, _exit() is the "die immediately" function:
void _exit(int status);
The status is given to the system, just as for exit(), but the process terminates
immediately. The kernel still does the usual cleanup: All open files are closed, the
memory used by the address space is released, and any other resources the process was
using are also released.
In practice, the ISO C _Exit() function is identical to _exit(). The C standard
says that _Exit() does not call functions registered with atexit(); whether it flushes
and closes open streams is implementation defined. For GLIBC systems, it does not,
behaving like _exit().
The time to use _exit() is when an exec fails in a forked child. In this case, you
don't want to use regular exit(), since that flushes any buffered data held by FILE *
streams. When the parent later flushes its copies of the buffers, the buffered data ends
up being written twice; obviously this is not good.
For example, suppose you wish to run a shell command and do the fork and exec
yourself. Such code would look like this:
char *shellcommand = "...";
pid_t child;

if ((child = fork()) == 0) {    /* child */
    execl("/bin/sh", "sh", "-c", shellcommand, NULL);
    _exit(errno == ENOENT ? 127 : 126);
}
/* parent continues */
The errno test and exit values follow conventions used by the POSIX shell. If a
requested program doesn't exist (ENOENT: no entry for it in a directory), then the exit
value is 127. Otherwise, the file exists but couldn't be exec'd for some other reason, so
the exit status is 126. It's a good idea to follow this convention in your own
programs too.
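To see the convention in action, here is a small sketch that runs an arbitrary program and hands back its exit status. The function name run_program() is hypothetical, and we use execvp() rather than execl() only so that the caller can supply the argument vector. The parent side uses waitpid(), which is described in Section 9.1.6.1.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* run_program --- run ARGV[0] with arguments ARGV and return its exit status,
 * or -1 if fork/wait failed or the child was killed by a signal.
 * (Hypothetical helper, for illustration only.) */
int run_program(char *const argv[])
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {                       /* child */
        execvp(argv[0], argv);
        /* only reached if the exec failed */
        _exit(errno == ENOENT ? 127 : 126);
    }

    /* parent */
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With this function, a nonexistent program yields 127, and something that exists but cannot be executed (such as a directory) yields 126. Note that a program that runs and itself exits with 126 or 127 is indistinguishable from an exec failure; this is one reason to avoid those values for your own purposes.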
Briefly, to make good use of exit() and atexit(), you should do the following:
• Define a small set of exit status values that your program will use to communicate
information to its caller. Use #defined constants or an enum for them in
your code.
• Decide if having callback functions for use with atexit() makes sense. If it does,
register them in main() at the appropriate point; for example, after parsing options,
and after initializing whatever data structures the callback functions are supposed
to clean up. Remember that the functions are called in LIFO (last-in first-out)
order.
• Use exit() everywhere to exit from the program when something goes wrong,
and exiting is the correct action to take. Use the error codes that you defined.
• An exception is main(), for which you can use return if you wish. Our own
style is generally to use exit() when there are problems and 'return 0' at the
end of main() if everything has gone well.
• Use _exit() or _Exit() in a child process if exec() fails.
9.1.6 Recovering a Child's Exit Status

When a process dies, the normal course of action is for the kernel to release all its
resources. The kernel does retain the dead process's exit status, as well as information
about the resources it used during its lifetime, and the PID continues to be counted as
being in use. Such a dead process is termed a zombie.
The parent process, be it the original parent or init, can recover the child's exit
status. Or, by use of BSD functions that aren't standardized by POSIX, the exit status
together with the resource usage information can be recovered. Status recovery is done
by waiting for the process to die: This is also known as reaping the process. 2
There is considerable interaction between the mechanisms that wait for children to
die and the signal mechanisms we haven't described yet. Which one to describe first is
a bit of a chicken-and-egg problem; we've chosen to talk about the child-waiting
mechanisms first, and Chapter 10, "Signals," page 347, provides the full story on signals.
For now, it's enough to understand that a signal is a way to notify a process that
some event has occurred. Processes can generate signals that get sent to themselves, or
signals can be sent externally by other processes or by a user at a terminal. For example,
CTRL-C sends an "interrupt" signal, and CTRL-Z sends a job control "stop" signal.
By default, many signals, such as the interrupt signal, cause the receiving process to
die. Others, such as the job control signals, cause it to change state. The child waiting
mechanisms can determine whether a process suffered death-by-signal, and, if so, which
signal it was. The same is true for processes stopping and, on some systems, when a
process continues.
2 We are not making this up. The terminology is indeed rather morbid, but such was the original Unix designers'
sense of humor.
9.1.6.1 Using POSIX Functions: wait() and waitpid()

The original V7 system call was wait(). The newer POSIX call, based on BSD
functionality, is waitpid(). The function declarations are:
#include <sys/types.h>                                POSIX
#include <sys/wait.h>

pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);
wait() waits for any child process to die; the information as to how it died is returned
in *status. (We discuss how to interpret *status shortly.) The return value is the
PID of the process that died or -1 if an error occurred.
If there is no child process, wait() returns -1 with errno set to ECHILD (no child
process). Otherwise, it waits for the first child to die or for a signal to come in.
The waitpid() function lets you wait for a specific child process to exit. It provides
considerable flexibility and is the preferred function to use. It too returns the PID of
the process that died or -1 if an error occurred. The arguments are as follows:
pid_t pid
The value specifies which child to wait for, both by real pid and by process group.
The pid value has the following meanings:
pid < -1    Wait for any child process with a process group ID equal to the
            absolute value of pid.
pid = -1    Wait for any child process. This is the way wait() works.
pid = 0     Wait for any child process with a process group ID equal to that
            of the parent process's process group.
pid > 0     Wait for the specific process with the PID equal to pid.
int *status
This is the same as for wait(). <sys/wait.h> defines various macros that interpret
the value in *status, which we describe soon.
int options
This should be either 0 or the bitwise OR of one or more of the following flags:
WNOHANG
If no child has exited, return immediately. That way you can check periodically
to see if any children have died. (Such periodic checking is known as
polling for an event.)
WUNTRACED
Return information about a child process that has stopped but that hasn't
exited yet. (For example, with job control.)
WCONTINUED
(XSI.) Return information about a child process that has continued if the
status of the child has not been reported since it changed. This too is for job
control. This flag is an XSI extension and is not available under GNU/Linux.
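Polling with WNOHANG might be sketched as follows. When WNOHANG is given and the child hasn't exited yet, waitpid() returns 0 instead of blocking. The helper names start_child() and reap_by_polling() are hypothetical, and the 10-millisecond delay is an arbitrary choice for the example; a real program would do useful work between checks.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* start_child --- fork a child that exits immediately with CODE;
 * returns the child's PID, or -1 if fork() failed. */
pid_t start_child(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}

/* reap_by_polling --- check on PID every 10 ms until it has exited;
 * returns 0 with *STATUS filled in, or -1 on error. */
int reap_by_polling(pid_t pid, int *status)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 };   /* 10 milliseconds */

    for (;;) {
        pid_t ret = waitpid(pid, status, WNOHANG);

        if (ret == pid)
            return 0;           /* child has been reaped */
        if (ret < 0)
            return -1;          /* e.g., ECHILD: no such child */
        /* ret == 0: child still running; other work could go here */
        nanosleep(&delay, NULL);
    }
}
```

The value left in *status is interpreted with the macros described next.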
Multiple macros work on the filled-in *status value to determine what happened.
They tend to come in pairs: one macro to determine if something occurred, and if that
macro is true, one or more macros that retrieve the details. The macros are as follows:
WIFEXITED(status)
This macro is nonzero (true) if the process exited (as opposed to changing state).
WEXITSTATUS(status)
This macro gives the exit status; it equals the least-significant eight bits of the
value passed to exit() or returned from main(). You should use this macro only
if WIFEXITED(status) is true.
WIFSIGNALED(status)
This macro is nonzero if the process suffered death-by-signal.
WTERMSIG(status)
This macro provides the signal number that terminated the process. You should
use this macro only if WIFSIGNALED(status) is true.
WIFSTOPPED(status)
This macro is nonzero if the process was stopped.
WSTOPSIG(status)
This macro provides the signal number that stopped the process. (Several signals
can stop a process.) You should use this macro only if WIFSTOPPED(status) is
true. Job control signals are discussed in Section 10.8.2, "Job Control Signals,"
page 383.
WIFCONTINUED(status)
(XSI.) This macro is nonzero if the process was continued. There is no corresponding
WCONTSIG() macro, since only one signal can cause a process to continue.
Note that this macro is an XSI extension. In particular, it is not available on
GNU/Linux. Therefore, if you wish to use it, bracket your code inside
'#ifdef WIFCONTINUED ... #endif'.
WCOREDUMP(status)
(Common.) This macro is nonzero if the process dumped core. A core dump is
the memory image of a running process created when the process terminates. It
is intended for use later for debugging. Unix systems name the file core, whereas
GNU/Linux systems use core.pid, where pid is the process ID of the process
that died. Certain signals terminate a process and produce a core dump
automatically.
Note that this macro is nonstandard. GNU/Linux, Solaris, and BSD systems
support it, but some other Unix systems do not. Thus, here too, if you wish to
use it, bracket your code inside '#ifdef WCOREDUMP ... #endif'.
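The pairing of the macros suggests a natural coding pattern: test with a WIFxxx() macro, and only then extract the detail. Here is one way it might look. The enum, classify_status(), and status_of_child() are hypothetical names of our own; the latter exists only to manufacture status values to look at, and it uses kill(), which isn't covered until Chapter 10.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_fate { FATE_EXITED, FATE_SIGNALED, FATE_STOPPED, FATE_UNKNOWN };

/* classify_status --- decode STATUS as filled in by wait() or waitpid();
 * *DETAIL receives the exit status or the signal number. */
enum child_fate classify_status(int status, int *detail)
{
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);  /* valid only if WIFEXITED() */
        return FATE_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);     /* valid only if WIFSIGNALED() */
        return FATE_SIGNALED;
    }
    if (WIFSTOPPED(status)) {
        *detail = WSTOPSIG(status);     /* valid only if WIFSTOPPED() */
        return FATE_STOPPED;
    }
    *detail = 0;
    return FATE_UNKNOWN;
}

/* status_of_child --- fork a child that dies from signal SIG (if nonzero)
 * or else exits with CODE; return the raw status, or -1 on failure. */
int status_of_child(int code, int sig)
{
    int status = -1;
    pid_t pid = fork();

    if (pid == 0) {
        if (sig != 0)
            kill(getpid(), sig);        /* die by signal, if asked to */
        _exit(code);
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

FATE_STOPPED can be returned only if the status came from a waitpid() call that used WUNTRACED; with options of 0, as here, a child is reported only when it terminates.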
Most programs don't care why a child process died; they merely care that it died,
perhaps noting if it exited successfully or not. The GNU Coreutils install program
demonstrates such straightforward use of fork(), execlp(), and wait(). The -s
option causes install to run the strip program on the binary executable being installed.
(strip removes debugging and other information from an executable file. This
can save considerable space, relatively speaking. On modern systems with multi-gigabyte
disk drives, it's rarely necessary to strip executables upon installation.) Here is the
strip() function from install.c:
513  /* Strip the symbol table from the file PATH.
514     We could dig the magic number out of the file first to
515     determine whether to strip it, but the header files and
516     magic numbers vary so much from system to system that making
517     it portable would be very difficult.  Not worth the effort.  */
518
519  static void
520  strip (const char *path)
521  {
522    int status;
523    pid_t pid = fork ();
524
525    switch (pid)
526      {
527      case -1:
528        error (EXIT_FAILURE, errno, _("fork system call failed"));
529        break;
530      case 0:                   /* Child. */
531        execlp ("strip", "strip", path, NULL);
532        error (EXIT_FAILURE, errno, _("cannot run strip"));
533        break;
534      default:                  /* Parent. */
535        /* Parent process. */
536        while (pid != wait (&status))   /* Wait for kid to finish. */
537          /* Do nothing. */ ;
538        if (status)
539          error (EXIT_FAILURE, 0, _("strip failed"));
540        break;
541      }
542  }
Line 523 calls fork(). The switch statement then takes the correct action for error
return (lines 527-529), child process (lines 530-533), and parent process (lines
534-539).
The idiom on lines 536-537 is common; it waits until the specific child of interest
exits. wait()'s return value is the PID of the reaped child. This is compared with that
of the forked child. status is unused other than to see if it's nonzero (line 538), in
which case the child exited unsuccessfully. (The test, while correct, is coarse but simple.
A test like 'if (WIFEXITED(status) && WEXITSTATUS(status) != 0)' would be
more pedantically correct.)
From the description and code presented so far, it may appear that parent programs
must choose a specific point to wait for any child processes to die, possibly polling in
a loop (as install.c does), waiting for all children. In Section 10.8.3, "Parental
Supervision: Three Different Strategies," page 385, we'll see that this is not necessarily
the case. Rather, signals provide a range of mechanisms to use for managing parent
notification when a child process dies.
9.1.6.2 Using BSD Functions: wait3() and wait4()

The BSD wait3() and wait4() system calls are useful if you're interested in the
resources used by a child process. They are nonstandard (meaning not part of POSIX)
but widely available, including on GNU/Linux. The declarations are as follows:
#include <sys/types.h>                                Common
#include <sys/time.h>       Not needed under GNU/Linux, but improves portability
#include <sys/resource.h>
#include <sys/wait.h>

pid_t wait3(int *status, int options, struct rusage *rusage);
pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage);
The status variable is the same as for wait() and waitpid(). All the macros described
earlier (WIFEXITED(), etc.) can also be used with it.
The options value is also the same as for waitpid(): either 0 or the bitwise OR
of one or both of WNOHANG and WUNTRACED.
wait3() behaves like wait(), retrieving information about the first available zombie
child, and wait4() is like waitpid(), retrieving information about a particular process.
Both return the PID of the reaped child, -1 on error, or 0 if no process is available and
WNOHANG was used. The pid argument can take on the same values as the pid argument
for waitpid().
The key difference is the struct rusage pointer. If not NULL, the system fills it in
with information about the process. This structure is described in POSIX and in the
getrusage(2) manpage:
struct rusage {
    struct timeval ru_utime;    /* user time used */
    struct timeval ru_stime;    /* system time used */
    long ru_maxrss;             /* maximum resident set size */
    long ru_ixrss;              /* integral shared memory size */
    long ru_idrss;              /* integral unshared data size */
    long ru_isrss;              /* integral unshared stack size */
    long ru_minflt;             /* page reclaims */
    long ru_majflt;             /* page faults */
    long ru_nswap;              /* swaps */
    long ru_inblock;            /* block input operations */
    long ru_oublock;            /* block output operations */
    long ru_msgsnd;             /* messages sent */
    long ru_msgrcv;             /* messages received */
    long ru_nsignals;           /* signals received */
    long ru_nvcsw;              /* voluntary context switches */
    long ru_nivcsw;             /* involuntary context switches */
};
Pure BSD systems (4.3 Reno and later) support all of the fields. Table 9.2 describes
the availability of the various fields in the struct rusage for POSIX and Linux.
TABLE 9.2
Availability of struct rusage fields
Field       POSIX   Linux     Field       POSIX   Linux
ru_utime    ✓       ≥ 2.4     ru_nswap            ≥ 2.4
ru_stime    ✓       ≥ 2.4     ru_nvcsw            ≥ 2.6
ru_minflt           ≥ 2.4     ru_nivcsw           ≥ 2.6
ru_majflt           ≥ 2.4
Only the fields marked "POSIX" are defined by the standard. While Linux defines
the full structure, the 2.4 kernel maintains only the user-time and system-time fields.
The 2.6 kernel also maintains the fields related to context switching. 3
The fields of most interest are ru_utime and ru_stime, the user and system CPU
times, respectively. (User CPU time is time spent executing user-level code. System
CPU time is time spent in the kernel on behalf of the process.)
These two fields use a struct timeval, which maintains time values down to
microsecond intervals. See Section 14.3.1, "Microsecond Times: gettimeofday(),"
page 544, for more information on this structure.
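As a sketch of how the CPU-time fields might be used, the following function runs a busy loop in a child and uses wait4() to find out how much CPU time the child consumed. The function name child_cpu_seconds() is hypothetical, and the busy loop merely stands in for real work. On GLIBC systems, wait4() is declared only when BSD features are enabled, which is the default for gcc; the feature-test macro at the top is there in case stricter options are in use.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* ask GLIBC to declare wait4() */
#endif
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* child_cpu_seconds --- fork a child that spins through LOOPS iterations,
 * reap it with wait4(), and return its user + system CPU time in seconds;
 * returns -1.0 on failure.  (Hypothetical helper, for illustration only.) */
double child_cpu_seconds(unsigned long loops)
{
    struct rusage ru;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1.0;
    if (pid == 0) {
        volatile unsigned long i;   /* volatile: keep the loop from being optimized away */

        for (i = 0; i < loops; i++)
            continue;
        _exit(0);
    }

    if (wait4(pid, &status, 0, &ru) != pid)
        return -1.0;

    /* Each struct timeval holds whole seconds plus microseconds. */
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}
```

The value returned is CPU time, not elapsed time; a child that spends most of its life sleeping or waiting for I/O reports a number close to zero.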
In 4.2 and 4.3 BSD, the status argument to wait() and wait3() was a union
wait. It fit into an int and provided access to the same information as the modern
WIFEXITED() etc. macros do, but through the union's members. Not all members
were valid in all situations. The members and their uses are described in Table 9.3.
POSIX doesn't standardize the union wait, and 4.4 BSD doesn't document it,
instead using the POSIX macros. GLIBC jumps through several hoops to make old
code using it continue to work. We describe it here primarily so that you'll recognize
it if you see it; new code should use the macros described in Section 9.1.6.1, "Using
POSIX Functions: wait() and waitpid()," page 306.
3 Double-check the getrusage(2) manpage if your kernel is newer, because this behavior may have changed.
TABLE 9.3
The 4.2 and 4.3 BSD union wait
POSIX macro     Union member   Usage                      Meaning
WIFEXITED()     w_termsig      w.w_termsig == 0           True if normal exit.
WEXITSTATUS()   w_retcode      code = w.w_retcode         Exit status if not by signal.
WIFSIGNALED()   w_termsig      w.w_termsig != 0           True if death by signal.
WTERMSIG()      w_termsig      sig = w.w_termsig          Signal that caused termination.
WIFSTOPPED()    w_stopval      w.w_stopval == WSTOPPED    True if stopped.
WSTOPSIG()      w_stopsig      sig = w.w_stopsig          Signal that caused stopping.
WCOREDUMP()     w_coredump     w.w_coredump != 0          True if child dumped core.
9.2 Process Groups

A process group is a group of related processes that should be treated together for job
control purposes. Processes with the same process group ID are members of the process
group, and the process whose PID is the same as the process group ID is the process
group leader. New processes inherit the process group ID of their parent process.
We have already seen that waitpid() allows you to wait for any process in a given
process group. In Section 10.6.7, "Sending Signals: kill() and killpg()," page 376,
we'll also see that you can send a signal to all the processes in a particular process group
as well. (Permission checking always applies; you can't send a signal to a process you
don't own.)
9.2.1 Job Control Overview

Job control is an involved topic, one that we've chosen not to delve into for this
volume. However, here's a quick conceptual overview.
The terminal device (physical or otherwise) with a user working at it is called the
controlling terminal.
A session is a collection of process groups associated with the controlling terminal.
There is only one session per terminal, with multiple process groups in the session. One
process is designated the session leader; this is normally a shell that can do job control,
such as Bash, pdksh, zsh, or ksh93. 4 We refer to such a shell as a job control shell.
Each job started by a job control shell, be it a single program or a pipeline, receives
a separate process group identifier. That way, the shell can manipulate the job as a single
entity, although it may have multiple processes.
The controlling terminal also has a process group identifier associated with it. When
a user types a special character such as CTRL-C for "interrupt" or CTRL-Z for "stop,"
the kernel sends the given signal to the processes in the terminal's process group.
The process group whose process group ID is the same as that of the controlling
terminal is allowed to read from and write to the terminal. This is called the foreground
process group. (It also receives the keyboard-generated signals.) Any other process groups
in the session are background process groups and cannot read from or write to the terminal;
they receive special signals that stop them if they try.
Jobs move in and out of the foreground, not by a change to an attribute of the job,
but rather by a change to the controlling terminal's process group. It is the job control
shell that makes this change, and if the new process group was stopped, the shell
continues it by sending a "continue" signal to all members of the process group.
In days of yore, users often used serial terminals connected to modems to dial in to
centralized minicomputer Unix systems. When the user closed the connection (hung
up the phone), the serial line detected the disconnection and the kernel sent a "hangup"
signal to all processes connected to the terminal.
This concept remains: If a hangup occurs (serial hardware does still exist and is still
in use), the kernel sends the hangup signal to the foreground process group. If the session
leader exits, the same thing happens.
An orphaned process group is one where, for every process in the group, that process's
parent is also in the group or the parent is in a different session. (This can happen if a
job control shell exits with background jobs running.) Running processes in an orphaned
process group are allowed to run to completion. If there are any already stopped
processes in an orphaned process group when it becomes orphaned, the kernel sends those
processes a hangup signal and then a continue signal. This causes them to wake up so
that they can exit instead of remaining stopped forever.
4 Well, csh and tcsh can be included in this category too, but we prefer Bourne-style shells.
9.2.2 Process Group Identification: getpgrp() and getpgid()

For compatibility with older systems, POSIX provides multiple ways to retrieve
process group information:
#include <unistd.h>

pid_t getpgrp(void);                                  POSIX
pid_t getpgid(pid_t pid);                             XSI
The getpgrp() function returns the current process's process group ID. getpgid()
is an XSI extension. It returns the process group ID of the given process pid. A pid of
0 means "the current process's process group." Thus 'getpgid(0)' is the same as
'getpgrp()'. For general programming, getpgrp() should be used.
4.2 and 4.3 BSD also have a getpgrp() function, but it acts like the POSIX
getpgid() function, requiring a pid argument. Since modern systems support POSIX,
you should use the POSIX version in new code. (If you think this is confusing, you're
right. Multiple ways to do the same thing are a normal result of design-by-committee,
since the committee feels that it must please everyone.)
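The equivalence of the two calls can be checked directly. This is only a small sketch, and the function name own_pgid_agrees() is our own invention:

```c
#include <sys/types.h>
#include <unistd.h>

/* own_pgid_agrees --- return 1 if getpgrp(), getpgid(0), and
 * getpgid(getpid()) all report the same process group ID, 0 otherwise.
 * (Hypothetical helper, for illustration only.) */
int own_pgid_agrees(void)
{
    pid_t by_getpgrp = getpgrp();           /* POSIX: no argument */
    pid_t by_zero    = getpgid(0);          /* XSI: 0 means "this process" */
    pid_t by_pid     = getpgid(getpid());   /* XSI: explicit PID */

    return by_getpgrp == by_zero && by_zero == by_pid;
}
```

All three values should agree for any process, so the function should always return 1.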
9.2.3 Process Group Setting: setpgid() and setpgrp()

Two functions set the process group:
#include <unistd.h>

int setpgid(pid_t pid, pid_t pgid);                   POSIX
int setpgrp(void);                                    XSI
The setpgrp ( ) function is simple: It sets the process group ID to be the same as
the process ID. Doing so creates a new process group in the same session, and the calling
process becomes the process group leader.
The setpgid() function is intended for job control use. It allows one process to set
the process group of another. A process may change only its own process group ID or
the process group ID of a child process, and then only if that child process has not yet
done an exec. Job control shells make this call after the fork, in both the parent and the
child. For one of them the call succeeds, and the process group ID is changed. (Other-
wise, there's no way to guarantee the ordering, such that the parent could change the
child's process group ID before the child execs. If the parent's call succeeds first, it can
move on to the next task, such as manipulating other jobs or the terminal.)
With setpgid(), pgid must be an existing process group that is part of the current
session, effectively joining pid to that process group. Otherwise, pgid must be equal
to pid, creating a new process group.
There are some special case values for both pid and pgid:
pid = 0     In this case, setpgid() changes the process group of the calling process
            to pgid. It's equivalent to 'setpgid(getpid(), pgid)'.
pgid = 0    This sets the process group ID for the given process to be the same as its
            PID. Thus, 'setpgid(pid, 0)' is the same as 'setpgid(pid, pid)'.
            This causes the process with PID pid to become a process group leader.
In all cases, session leaders are special; their PID, process group ID, and session ID
values are all identical, and the process group ID of a session leader cannot be changed.
(Session IDs are set with setsid() and retrieved with getsid(). These are specialized
calls: see the setsid(2) and getsid(2) manpages.)
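The practice of calling setpgid() on both sides of the fork might be sketched as shown here. This is an illustration under our own assumptions, not code from any real shell: the function name child_leads_own_group() is hypothetical, and the pipe is used only as a gate to keep the child alive until the parent has looked at its process group. (Pipes are covered in Section 9.3.1.)

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* child_leads_own_group --- fork a child, put it in a new process group
 * from both sides of the fork, and return 1 if the child ended up as the
 * leader of its own group, 0 if not, -1 on error. */
int child_leads_own_group(void)
{
    int gate[2];        /* pipe used only to hold the child until we're done */
    char c;
    pid_t child, pgid;

    if (pipe(gate) < 0)
        return -1;

    if ((child = fork()) < 0) {
        close(gate[0]);
        close(gate[1]);
        return -1;
    }

    if (child == 0) {                   /* child */
        close(gate[1]);
        setpgid(0, 0);                  /* same as setpgid(getpid(), getpid()) */
        if (read(gate[0], &c, 1) < 0)   /* blocks until the parent closes its end */
            _exit(1);
        _exit(0);
    }

    /* parent */
    close(gate[0]);
    setpgid(child, child);              /* harmless if the child got there first */
    pgid = getpgid(child);
    close(gate[1]);                     /* release the child */
    (void) waitpid(child, NULL, 0);

    if (pgid < 0)
        return -1;
    return pgid == child;
}
```

Whichever process runs first after the fork, the child is in its own process group by the time the parent calls getpgid(). A real job control shell would go on to exec the requested program in the child.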
9.3 Basic Interprocess Communication: Pipes and FIFOs

Interprocess communication (IPC) is what it sounds like: a way for two separate pro-
cesses to communicate. The oldest IPC mechanism on Unix systems is the pipe: a one-
way communication channel. Data written into one end of the channel come out the
other end.
9.3.1 Pipes
Pipes manifest themselves as regular file descriptors. Without going to special lengths,
you can't tell if a file descriptor is a file or a pipe. This is a feature; programs that read
standard input and write standard output don't have to know or care that they may be
communicating with another process. Should you need to know, the canonical way to
check is to attempt 'lseek(fd, 0L, SEEK_CUR)' on the file descriptor; this call attempts
to seek zero bytes from the current position, that is, a do-nothing operation. 5
This operation fails for pipes and does no damage for other files.
5 Such an operation is often referred to as a no-op, short for "no operation."
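The check might be packaged up as follows. The function name fd_is_seekable() is hypothetical. When lseek() is applied to a pipe, it fails with errno set to ESPIPE; POSIX specifies the same error for FIFOs and sockets, so the test really answers "can this descriptor seek?" and not "is this precisely a pipe?"

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* fd_is_seekable --- return 1 if FD supports seeking (e.g., a regular file),
 * 0 if it is a pipe, FIFO, or socket, and -1 on any other error
 * (such as an invalid descriptor). */
int fd_is_seekable(int fd)
{
    if (lseek(fd, 0L, SEEK_CUR) != (off_t) -1)
        return 1;               /* the no-op seek worked */
    return (errno == ESPIPE) ? 0 : -1;
}
```

A filter could use 'fd_is_seekable(0)' to decide whether it may rewind its standard input or must process the data in a single pass.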

9.3.1.1 Creating Pipes

The pipe() system call creates a pipe:
#include <unistd.h>                                   POSIX

int pipe(int filedes[2]);
The argument value is the address of a two-element integer array. pipe() returns 0
upon success and -1 if there was an error.
If the call was successful, the process now has two additional open file descriptors.
The value in filedes[0] is the read end of the pipe, and filedes[1] is the write end.
(A handy mnemonic device is that the read end uses index 0, analogous to standard
input being file descriptor 0, and the write end uses index 1, analogous to standard
output being file descriptor 1.)
As mentioned, data written into the write end are read from the read end. When
you're done with a pipe, you close both ends with a call to close(). The following
simple program, ch09-pipedemo.c, demonstrates pipes by creating one, writing data
to it, and then reading the data back from it:
 1  /* ch09-pipedemo.c --- demonstrate I/O with a pipe. */
 2  #include <stdlib.h>     /* for exit() */
 3  #include <stdio.h>
 4  #include <errno.h>
 5  #include <unistd.h>
 6  #include <string.h>     /* for strlen() and strerror() */
 7  /* main --- create a pipe, write to it, and read from it. */
 8
 9  int main(int argc, char **argv)
10  {
11      static const char mesg[] = "Don't Panic!";  /* a famous message */
12      char buf[BUFSIZ];
13      ssize_t rcount, wcount;
14      int pipefd[2];
15      size_t l;
16
17      if (pipe(pipefd) < 0) {
18          fprintf(stderr, "%s: pipe failed: %s\n", argv[0],
19              strerror(errno));
20          exit(1);
21      }
22
23      printf("Read end = fd %d, write end = fd %d\n",
24          pipefd[0], pipefd[1]);
25
26      l = strlen(mesg);
27      if ((wcount = write(pipefd[1], mesg, l)) != l) {
28          fprintf(stderr, "%s: write failed: %s\n", argv[0],
29              strerror(errno));
30          exit(1);
31      }
32
33      if ((rcount = read(pipefd[0], buf, BUFSIZ)) != wcount) {
34          fprintf(stderr, "%s: read failed: %s\n", argv[0],
35              strerror(errno));
36          exit(1);
37      }
38
39      buf[rcount] = '\0';
40
41      printf("Read <%s> from pipe\n", buf);
42      (void) close(pipefd[0]);
43      (void) close(pipefd[1]);
44
45      return 0;
46  }
Lines 11-15 declare local variables; of most interest is mesg , which is the text that
will traverse the pipe.
Lines 17-21 create the pipe, with error checking; lines 23-24 print the values of the
new file descripcors (just to prove that they won't be 0,1, or 2 ).
Line 26 gets the length of the message, to use with wri te ( ) . Lines 27-31 write the
message down the pipe, again with error checking.
Lines 33-37 read the contents of the pipe, again with error checking.
Line 39 supplies a terminating zero byte, so that the read data can be used as a regular
string. Line 41 prints the data, and lines 42-43 close both ends of the pipe. Here's what
happens when the program runs:
$ ch09-pipedemo
Read end = fd 3 , write end = fd 4
Read <Don't Panic!> from pipe
This program doesn't do anything useful, but it does demonstrate the basics. Note
that there are no calls to open() or creat() and that the program isn't using its three
inherited file descriptors. Yet the write() and read() succeed, proving that the file
descriptors are valid and that data that go into the pipe do come out of it.6 Of course,
had the message been too big, our program wouldn't have worked. This is because pipes
have only so much room in them, a fact we discuss in the next section.

6 We're sure you weren't worried. After all, you probably use pipelines from the shell dozens of times a day.

Like other file descriptors, those for a pipe are inherited by a child after a fork and
if not closed, are still available after an exec. We see shortly how to make use of this
fact and do something interesting with pipes.
9.3.1.2 Pipe Buffering

Pipes buffer their data, meaning that data written to the pipe are held by the kernel
until they are read. However, a pipe can hold only so much written but not yet read
data. We can call the writing process the producer, and the reading process the consumer.
How does the system manage full and empty pipes?
When the pipe is full, the system automatically blocks the producer the next time it
attempts to write() data into the pipe. Once the pipe empties out, the system copies
the data into the pipe and then allows the write() system call to return to the producer.
Similarly, if the pipe is empty, the consumer blocks in the read() until there is more
data in the pipe to be read. (The blocking behavior can be turned off; this is discussed
in Section 9.4.3.4, "Nonblocking I/O for Pipes and FIFOs," page 333.)
When the producer does a close() on the pipe's write end, the consumer can
successfully read any data still buffered in the pipe. After that, further calls to read() return
0, indicating end of file.
Conversely, if the consumer closes the read end, a write() to the write end
fails drastically. In particular, the kernel sends the producer a "broken pipe" signal (SIGPIPE),
whose default action is to terminate the process.
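
Both halves of this behavior can be observed within a single process. The following is a minimal sketch, not part of the book's example programs; the function names are our own. The second function ignores SIGPIPE first, an assumption we make so that the failed write() can report its error (EPIPE) instead of killing the process:

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* read_after_writer_close --- hypothetical helper: write a message, close the
   write end, and drain the pipe. Returns the result of the final read(),
   which should be 0 (end of file), or -1 on error. */

ssize_t read_after_writer_close(void)
{
    static const char mesg[] = "last words";
    char buf[64];
    int pipefd[2];
    ssize_t n;

    if (pipe(pipefd) < 0)
        return -1;

    if (write(pipefd[1], mesg, strlen(mesg)) < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    close(pipefd[1]);                       /* the producer is done */

    n = read(pipefd[0], buf, sizeof buf);   /* buffered data still arrive */
    if (n != (ssize_t) strlen(mesg)) {
        close(pipefd[0]);
        return -1;
    }

    n = read(pipefd[0], buf, sizeof buf);   /* nothing left: end of file */
    close(pipefd[0]);
    return n;
}

/* write_without_reader --- hypothetical helper: close the read end, then
   write. Returns the errno value from the failed write(), or 0 if the
   write unexpectedly succeeded. */

int write_without_reader(void)
{
    int pipefd[2];
    int err = 0;
    void (*old)(int);

    if (pipe(pipefd) < 0)
        return errno;

    old = signal(SIGPIPE, SIG_IGN);   /* otherwise the default action kills us */
    close(pipefd[0]);                 /* the consumer goes away */

    if (write(pipefd[1], "x", 1) < 0)
        err = errno;

    close(pipefd[1]);
    if (old != SIG_ERR)
        (void) signal(SIGPIPE, old);  /* restore the previous disposition */
    return err;
}
```

Calling read_after_writer_close() from main() should yield 0, and write_without_reader() should yield EPIPE. Programs that leave SIGPIPE at its default never see that error; they are simply terminated.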
Our favorite analogy for pipes is that of a husband and wife washing and drying
dishes together. One spouse washes the dishes, placing the clean but wet plates into a
dish drainer by the sink. The other spouse takes the dishes from the drainer and dries
them. The dish washer is the producer, the dish drainer is the pipe, and the dish dryer
is the consumer.7
If the drying spouse is faster than the washing one, the drainer becomes empty, and
the dryer has to wait until more dishes are available. Conversely, if the washing spouse
is faster, then the drainer becomes full, and the washer has to wait until it empties out
before putting more clean dishes into it. This is depicted in Figure 9.3.

7 What they ate for dinner is left unspecified.

FIGURE 9.3
Synchronization of pipe processes
9.3.2 FIFOs
With traditional pipes, the only way for two separate programs to have access to the
same pipe is through file descriptor inheritance. This means that the processes must be
the children of a common parent or one must be an ancestor of the other.
This can be a severe limitation. Many system services run as daemons, disconnected
long-running processes. There needs to be an easy way to send data to such processes
(and possibly receive data from them). Files are inappropriate for this; synchronization
is difficult or impossible, and pipes can't be created to do the job, since there are no
common ancestors.
To solve this problem, System III invented the notion of a FIFO. A FIFO,8 or named
pipe, is a file in the filesystem that acts like a pipe. In other words, one process opens
the FIFO for writing, while another opens it for reading. Data then written to the FIFO
are read by the reader. The data are buffered by the kernel, not stored on disk.
Consider a line printer spooler. The spooler daemon controls the physical printers,
creating print jobs that print one by one. To add a job to the queue, user-level
line-printer software has to communicate with the spooler daemon. One way to do this is
for the spooler to create a FIFO with a well-known filename. The user software can
then open the FIFO, write a request to it, and close it. The spooler sits in a loop,
reading requests from the FIFO and processing them.

8 FIFO is an acronym for "first in first out." This is the way pipes work.

The mkfifo() function creates FIFO files:
    #include <sys/types.h>                                  POSIX
    #include <sys/stat.h>

    int mkfifo(const char *pathname, mode_t mode);
The pathname argument is the name of the FIFO file to create, and mode is the
permissions to give it, analogous to the second argument to creat() or the third
argument to open() (see Section 4.6, "Creating Files," page 106). FIFO files are removed
like any other, with remove() or unlink() (see Section 5.1.5.1, "Removing Open
Files," page 127).
The GNU/Linux mkfifo(3) manpage points out that the FIFO must be open both
for reading and writing at the same time, before I/O can be done: "Opening a FIFO
for reading normally blocks until some other process opens the same FIFO for writing,
and vice versa." Once a FIFO file is opened, it acts like a regular pipe; that is, it's just
another file descriptor.
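
To see the rendezvous between the two open() calls from C, here is a small sketch of our own (it is not one of the book's example programs). The function name fifo_roundtrip and the path passed to it are hypothetical. A child opens the FIFO for writing while the parent opens it for reading; each open() blocks until the other side arrives:

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fifo_roundtrip --- hypothetical helper: create a FIFO at path, have a
   child write mesg into it, and read it back in the parent.
   Returns the number of bytes read, or -1 on error. */

ssize_t fifo_roundtrip(const char *path, const char *mesg,
                       char *buf, size_t bufsize)
{
    pid_t pid;
    int fd;
    ssize_t n = -1;

    if (bufsize == 0)
        return -1;

    (void) unlink(path);              /* remove a stale FIFO from an earlier run */
    if (mkfifo(path, 0600) < 0)
        return -1;

    if ((pid = fork()) < 0) {
        (void) unlink(path);
        return -1;
    } else if (pid == 0) {            /* child: the writer */
        fd = open(path, O_WRONLY);    /* blocks until the parent opens for reading */
        if (fd < 0 || write(fd, mesg, strlen(mesg)) < 0)
            _exit(1);
        close(fd);
        _exit(0);
    }

    fd = open(path, O_RDONLY);        /* parent: blocks until the child opens for writing */
    if (fd < 0) {
        (void) kill(pid, SIGTERM);    /* don't leave the child stuck in open() */
    } else {
        n = read(fd, buf, bufsize - 1);
        if (n >= 0)
            buf[n] = '\0';
        close(fd);
    }

    (void) waitpid(pid, NULL, 0);
    (void) unlink(path);              /* FIFO files are removed like any other */
    return n;
}
```

A call such as 'fifo_roundtrip("/tmp/afifo-demo", "It was a Blustery Day", buf, sizeof buf)' should leave the message in buf. A single read() suffices here only because the message is short; real code would loop until end of file.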
The mkfifo command brings this system call to the command level. This makes it
easy to show a FIFO file in action:
$ mkfifo afifo                               Create a FIFO file
$ ls -l afifo                                Show type and permissions, note leading 'p'
prw-r--r--    1 arnold   devel           0 Oct 23 15:49 afifo
$ cat < afifo &                              Start a reader in the background
[1] 22100
$ echo It was a Blustery Day > afifo         Send data to FIFO
$ It was a Blustery Day                      Shell prompts, cat prints data
                                             Press ENTER to see job exit status
[1]+  Done                    cat <afifo     cat exited
9.4 File Descriptor Management

At this point, the pieces of the puzzle are almost complete. fork() and exec()
create processes and run programs in them. pipe() creates a pipe that can be used for
IPC. What's still missing is a way to move the pipe's file descriptors into place as standard
output and standard input for a pipeline's producer and consumer.
The dup() and dup2() system calls, together with close(), let you move (well,
copy) an open file descriptor to another number. The fcntl() system call lets you do
the same thing and manipulate several important attributes of open files.
9.4.1 Duplicating Open Files: dup() and dup2()

Two system calls create a copy of an open file descriptor:
    #include <unistd.h>                                     POSIX

    int dup(int oldfd);
    int dup2(int oldfd, int newfd);
The functions are as follows:
int dup(int oldfd)
    Returns the lowest unused file descriptor value; it is a copy of oldfd. dup() returns
    a nonnegative integer on success or -1 on failure.
int dup2(int oldfd, int newfd)
    Makes newfd be a copy of oldfd; if newfd is open, it's closed first, as if by
    close(). dup2() returns the new descriptor or -1 if there was a problem.
Remember Figure 9.1, in which two processes shared pointers to the same file entry
in the kernel's file table? Well, dup() and dup2() create the same situation, within a
single process. See Figure 9.4.
FIGURE 9.4
File descriptor sharing after 'dup2(1, 3)'
[Figure: the per-process descriptor table of PID 42, the file table entries (bookkeeping and offset) that the descriptors point to, and the file contents.]
In this figure, the process executed 'dup2(1, 3)' to make file descriptor 3 a copy
of standard output, file descriptor 1. Exactly as described before, the two descriptors
share the file offset for the open file.
In Section 4.4.2, "Opening and Closing Files," page 93, we mentioned that open()
(and creat()) always returns the lowest unused integer file descriptor value for the file
being opened. Almost all system calls that return new file descriptors follow this rule,
not just open() and creat(). (dup2() is an exception since it provides a way to get
a particular new file descriptor, even if it's not the lowest unused one.)
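
The shared file offset is easy to demonstrate. The sketch below is ours, not the book's; the function name and the scratch-file template are hypothetical. It writes through one descriptor and then asks for the offset through its duplicate:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* shared_offset_after_dup --- hypothetical helper: write five bytes through
   one descriptor and report the file offset seen through its duplicate.
   Returns -1 on error. */

off_t shared_offset_after_dup(void)
{
    char template[] = "/tmp/dupdemoXXXXXX";   /* hypothetical scratch file */
    int fd, copy;
    off_t pos;

    if ((fd = mkstemp(template)) < 0)
        return -1;
    (void) unlink(template);      /* storage is freed when the last descriptor closes */

    if ((copy = dup(fd)) < 0) {   /* lowest unused descriptor, same file table entry */
        close(fd);
        return -1;
    }

    if (write(fd, "hello", 5) != 5) {
        close(copy);
        close(fd);
        return -1;
    }

    pos = lseek(copy, 0, SEEK_CUR);   /* ask for the offset through the copy */

    close(copy);
    close(fd);
    return pos;
}
```

Because both descriptors refer to one file table entry, the function should return 5, not 0. Had the file been opened a second time with open() instead, each descriptor would have had its own offset.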
Given the "return lowest unused number" rule combined with dup(), it's now easy
to move a pipe's file descriptors into place as standard input and output. Assuming that
the current process is a shell and that it needs to fork two children to set up a simple
two-stage pipeline, here are the steps:
1. Create the pipe with pipe ( ) . This must be done first so that the two children
can inherit the open file descriptors.
2. Fork what we'll call the "left-hand child." This is the one whose standard output
goes down the pipe. In this child, do the following:
   a. Use 'close(pipefd[0])' since the read end of the pipe isn't needed in
      the left-hand child.
   b. Use 'close(1)' to close the original standard output.
   c. Use 'dup(pipefd[1])' to copy the write end of the pipe to file descriptor 1.
   d. Use 'close(pipefd[1])' since we don't need two copies of the open
      descriptor.
   e. Exec the program to be run.
3. Fork what we'll call the "right-hand child." This is the one whose standard input
   comes from the pipe. The steps in this child are the mirror image of those in
   the left-hand child:
   a. Use 'close(pipefd[1])' since the write end of the pipe isn't needed in
      the right-hand child.
   b. Use 'close(0)' to close the original standard input.
   c. Use 'dup(pipefd[0])' to copy the read end of the pipe to file descriptor 0.
   d. Use 'close(pipefd[0])' since we don't need two copies of the open
      descriptor.
   e. Exec the program to be run.
4. In the parent, close both ends of the pipe: 'close(pipefd[0]);
   close(pipefd[1])'.
5. Finally, use wait() in the parent to wait for both children to finish.
Note how important it is to close the unused copies of the pipe's file descriptors. As
we pointed out earlier, a file isn't closed until the last open file descriptor for it is closed.
This is true even though multiple processes share the file descriptors. Closing unused
file descriptors matters because the process reading from the pipe won't get an
end-of-file indication until all the copies of the write end have been closed.
In our case, after the two children are forked, there are three processes, each of which
has copies of the two pipe file descriptors: the parent and the two children. The parent
closes both ends since it doesn't need the pipe. The left-hand child is writing down the
pipe, so it has to close the read end. The right-hand child is reading from the pipe, so
it has to close the write end. This leaves exactly one copy of each file descriptor open.
When the left-hand child finishes, it exits. The system then closes all of its file
descriptors. When that happens, the right-hand child finally receives the end-of-file
notification, and it too can then finish up and exit.
The following program, ch09-pipeline.c, creates the equivalent of the following
shell pipeline:
$ echo hi there | sed s/hi/hello/g
hello there
Here's the program:

 1  /* ch09-pipeline.c --- fork two processes into their own pipeline.
 2                         Minimal error checking for brevity. */
 3
 4  #include <stdio.h>
 5  #include <errno.h>
 6  #include <sys/types.h>
 7  #include <sys/wait.h>
 8  #include <unistd.h>
 9
10  int pipefd[2];
11
12  extern void left_child(void), right_child(void);
13
14  /* main --- fork children, wait for them to finish */
15
16  int main(int argc, char **argv)
17  {
18      pid_t left_pid, right_pid;
19      pid_t ret;
20      int status;
21
22      if (pipe(pipefd) < 0) {     /* create pipe, very first thing */
23          perror("pipe");
24          exit(1);
25      }
26
27      if ((left_pid = fork()) < 0) {      /* fork left-hand child */
28          perror("fork");
29          exit(1);
30      } else if (left_pid == 0)
31          left_child();
32
33      if ((right_pid = fork()) < 0) {     /* fork right-hand child */
34          perror("fork");
35          exit(1);
36      } else if (right_pid == 0)
37          right_child();
38
39      close(pipefd[0]);       /* close parent's copy of pipe */
40      close(pipefd[1]);
41
42      while ((ret = wait(& status)) > 0) {    /* wait for children */
43          if (ret == left_pid)
44              printf("left child terminated, status: %x\n", status);
45          else if (ret == right_pid)
46              printf("right child terminated, status: %x\n", status);
47          else
48              printf("yow! unknown child %d terminated, status %x\n",
49                  ret, status);
50      }
51
52      return 0;
53  }
Lines 22-25 create the pipe. This has to be done first.

Lines 27-31 create the left-hand child, and lines 33-37 create the right-hand child.
In both instances, the parent continues a linear execution path through main() while
the child calls the appropriate function to manipulate file descriptors and do the exec.
Lines 39-40 close the parent's copy of the pipe.
Lines 42-50 loop, reaping children, until wait() returns an error.
55  /* left_child --- do the work for the left child */
56
57  void left_child(void)
58  {
59      static char *left_argv[] = { "echo", "hi", "there", NULL };
60
61      close(pipefd[0]);
62      close(1);
63      dup(pipefd[1]);
64      close(pipefd[1]);
65
66      execvp("echo", left_argv);
67      _exit(errno == ENOENT ? 127 : 126);
68  }
69
70  /* right_child --- do the work for the right child */
71
72  void right_child(void)
73  {
74      static char *right_argv[] = { "sed", "s/hi/hello/g", NULL };
75
76      close(pipefd[1]);
77      close(0);
78      dup(pipefd[0]);
79      close(pipefd[0]);
80
81      execvp("sed", right_argv);
82      _exit(errno == ENOENT ? 127 : 126);
83  }
Lines 57-68 are the code for the left-hand child. The procedure follows the steps
given above to close the unneeded end of the pipe, close the original standard output,
dup() the pipe's write end to 1, and then close the original write end. At that point,
line 66 calls execvp(), and if it fails, line 67 calls _exit(). (Remember that line 67
is never executed if execvp() succeeds.)
Lines 72-83 do the similar steps for the right-hand child. Here's what happens when
it runs:
$ ch09-pipeline                              Run the program
left child terminated, status: 0             Left child finishes before output (!)
hello there                                  Output from right child
right child terminated, status: 0
$ ch09-pipeline                              Run the program again
hello there                                  Output from right child and ...
right child terminated, status: 0            Right child finishes before left one
left child terminated, status: 0
Note that the order in which the children finish isn't deterministic. It depends on
the system load and many other factors that can influence process scheduling. You
should be careful to avoid making ordering assumptions when you write code that
creates multiple processes, particularly the code that calls one of the wait() family
of functions.
The whole process is illustrated in Figure 9.5.
Figure 9.5 (a) depicts the situation after the parent has created the pipe (lines 22-25)
and the two children (lines 27-37).
Figure 9.5 (b) shows the situation after the parent has closed the pipe (lines 39-40)
and started to wait for the children (lines 42-50). Each child has moved the pipe into
place as standard output (left child, lines 61-63) and standard input (right child, lines 76-78).
Finally, Figure 9.5 (c) depicts the situation after the children have closed off the
original pipe (lines 64 and 79) and called execvp() (lines 66 and 81).
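
The children in ch09-pipeline.c use close() followed by dup(). The same effect can be had with dup2(), which closes the target descriptor and copies onto it in one call. What follows is a sketch of that variation under our own assumptions, not a replacement for the program above: the function name capture_echo is hypothetical, there is only one child, and the parent reads the pipe itself instead of forking a second child.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* capture_echo --- hypothetical helper: run "echo hi there" with its standard
   output connected to a pipe, and collect what it prints into buf.
   Returns the number of bytes collected, or -1 on error. */

ssize_t capture_echo(char *buf, size_t bufsize)
{
    static char *child_argv[] = { "echo", "hi", "there", NULL };
    int pipefd[2];
    pid_t pid;
    ssize_t n, total = 0;

    if (bufsize == 0 || pipe(pipefd) < 0)
        return -1;

    if ((pid = fork()) < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    } else if (pid == 0) {                /* child: the producer */
        close(pipefd[0]);                 /* read end isn't needed here */
        if (pipefd[1] != 1) {
            if (dup2(pipefd[1], 1) < 0)   /* closes old fd 1, copies write end onto it */
                _exit(126);
            close(pipefd[1]);             /* one copy of the write end is enough */
        }
        execvp("echo", child_argv);
        _exit(127);                       /* reached only if execvp() fails */
    }

    close(pipefd[1]);     /* parent must close its write end, or it never sees EOF */

    while ((size_t) total < bufsize - 1
           && (n = read(pipefd[0], buf + total, bufsize - 1 - total)) > 0)
        total += n;
    buf[total] = '\0';

    close(pipefd[0]);
    (void) waitpid(pid, NULL, 0);
    return total;
}
```

The check 'pipefd[1] != 1' guards the unusual case in which the pipe's write end already is descriptor 1; dup2() would then do nothing and the following close() would shut standard output by mistake.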
9.4.2 Creating Nonlinear Pipelines: /dev/fd/xx

Many modern Unix systems, including GNU/Linux, support special files in the
/dev/fd directory.9 These files represent open file descriptors, with names such as
/dev/fd/0, /dev/fd/1, and so on. Passing such a name to open() returns a new file
descriptor that is effectively the same as calling dup() on the given file descriptor
number.
These special files find their use at the shell level: The Bash, ksh88 (some versions)
and ksh93 shells supply a feature called process substitution that makes it possible to
create nonlinear pipelines. The notation at the shell level is '<(...)' for input pipelines,
and '>(...)' for output pipelines. For example, suppose you wish to apply the
diff command to the output of two commands. You would normally have to use
temporary files:
    command1 > /tmp/out.$$.1
    command2 > /tmp/out.$$.2
    diff /tmp/out.$$.1 /tmp/out.$$.2
    rm /tmp/out.$$.1 /tmp/out.$$.2

9 On GNU/Linux systems, /dev/fd is a symbolic link to /proc/self/fd, but since /dev/fd is the common
place, that's what you should use in your code.

FIGURE 9.5
Parent creating a pipeline
[Figure, three panels: (a) ch09-pipeline calls pipe() and then fork() twice; the pipe (fd 3, fd 4) is shared among the parent and both children. (b) The parent closes the pipe and waits for the children; the children move the pipe to stdout (left-hand child) and stdin (right-hand child) with close()/dup(), then close the original pipe fds. (c) The children call exec() and the programs run: 'echo hi there' writes into the pipe, and 'sed s/hi/hello/g' reads from it and prints 'hello there'.]

With process substitution, it looks like this:
    diff <(command1) <(command2)
No messy temporary files to remember to clean up. For example, the following command
shows that our home directory is a symbolic link to a different directory:
$ diff <(pwd) <(/bin/pwd)
1c1
< /home/arnold/work/prenhall/progex
---
> /d/home/arnold/work/prenhall/progex
The plain pwd is the one built in to the shell: It prints the current logical pathname
as managed by the shell with cd. The /bin/pwd program does a physical filesystem
walk to print the pathname.
How does process substitution work? The shell creates the subsidiary commands10
('pwd' and '/bin/pwd'). Each one's output is connected to a pipe, with the read end
open on a new file descriptor for the main process ('diff'). The shell then passes the
names of files in /dev/fd to the main process as the command-line argument. We can
see this by turning on execution tracing in the shell:
$ set -x                                     Turn on execution tracing
$ diff <(pwd) <(/bin/pwd)                    Run command
+ diff /dev/fd/63 /dev/fd/62                 Shell trace: main program, note arguments
++ pwd                                       Shell trace: subsidiary programs
++ /bin/pwd
1c1                                          Output from diff
< /home/arnold/work/prenhall/progex
---
> /d/home/arnold/work/prenhall/progex
This is illustrated in Figure 9.6.

If your system has /dev/fd, you may be able to take advantage of this facility as
well. Do be careful, though, to document what you're doing. The file descriptor
manipulation at the C level is considerably less transparent than the corresponding
shell notations!

10 Although we've shown simple commands, arbitrary pipelines are allowed.

FIGURE 9.6
Process substitution
[Figure: pwd writes on its fd 1 into a pipe whose read end is fd 63 of diff; /bin/pwd writes on its fd 1 into a second pipe whose read end is fd 62 of diff; diff runs as 'diff /dev/fd/63 /dev/fd/62'.]

9.4.3 Managing File Attributes: fcntl()

The fcntl() ("file control") system call provides control over miscellaneous attributes
of either the file descriptor itself or the underlying open file. The GNU/Linux fcntl(2)
manpage describes it this way:
    int fcntl(int fd, int cmd);
    int fcntl(int fd, int cmd, long arg);
    int fcntl(int fd, int cmd, struct flock *lock);
In other words, it takes at least two arguments; based on the second argument, it
may take a third one.
The last form, in which the third argument is a pointer to a struct flock, is for
doing file locking. File locking is a large topic in its own right; we delay discussion until
Section 14.2, "Locking Files," page 531.
9.4.3.1 The Close-on-exec Flag

After a fork() and before an exec(), you should make sure that the new program
inherits only the open files it needs. You don't want a child process messing with the
parent's open files unless it's supposed to. On the flip side, if a parent has lots of files
open, that will artificially limit the number of new files the child can open. (See the
accompanying sidebar.)
Organizationally, this behavior may present a problem. The part of your program
that starts a new child shouldn't particularly need access to the other part(s) of your
program that manipulate open files. And a loop like the following is painful, since there
may not be any open files:
    int j;

    for (j = getdtablesize(); j >= 3; j--)     /* close all but 0, 1, 2 */
        (void) close(j);
The solution is the close-on-exec flag. This is an attribute of the file descriptor itself,
not the underlying open file. When this flag is set, the system automatically closes the
file when the process does an exec. By setting this flag as soon as you open a file, you
don't have to worry about any child processes accidentally inheriting it. (The shell
automatically sets this flag for all file descriptors it opens numbered 3 and above.)
The cmd argument has two values related to the close-on-exec flag:
F_GETFD
    Retrieves the file descriptor flags. The return value is the setting of all the file
    descriptor flags or -1 on error.
F_SETFD
    Sets the file descriptor flags to the value in arg (the third argument). The return
    value is 0 on success or -1 on error.
At the moment, only one "file descriptor flag" is defined: FD_CLOEXEC. This symbolic
constant is a POSIX invention,11 and most code uses a straight 1 or 0:
    if (fcntl(fd, F_SETFD, 1) < 0) ...     /* set close-on-exec, handle any errors */
    if (fcntl(fd, F_GETFD) == 1) ...       /* close-on-exec bit is already set */
However, the POSIX definition allows for future extension, and thus the correct
way to write such code is more along these lines:
    int fd;
    long fd_flags;

    if ((fd_flags = fcntl(fd, F_GETFD)) < 0)      Retrieve flags
        /* handle error */

    fd_flags |= FD_CLOEXEC;                        Add close-on-exec flag

    if (fcntl(fd, F_SETFD, fd_flags) < 0)          Set flags
        /* handle error */

11 The POSIX standard purposely does not give it a value. However, for old code to continue to work, the only
value any implementation could sensibly use is 1.

NOTE The close-on-exec flag is a property of the descriptor, not the underlying
file. Thus, the new descriptor returned by dup() or dup2() (or by fcntl()
with F_DUPFD, as we're about to see) does not inherit the close-on-exec flag
setting of the original descriptor. If you want it set for the new file descriptor
also, you must remember to do it yourself. This behavior makes sense: If you've
just called dup(), copying one end of a pipe to 0 or 1, you don't want the
system to close it for you as soon as the process does an exec!
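
The point made in the NOTE can be checked directly. The following sketch is ours; the three function names are hypothetical. It sets the flag on one end of a pipe using the get-then-set idiom, duplicates the descriptor, and compares the two settings:

```c
#include <fcntl.h>
#include <unistd.h>

/* set_cloexec --- hypothetical helper: add the close-on-exec flag,
   preserving any other descriptor flags. Returns 0 or -1. */

int set_cloexec(int fd)
{
    int fd_flags;

    if ((fd_flags = fcntl(fd, F_GETFD)) < 0)
        return -1;

    return fcntl(fd, F_SETFD, fd_flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* has_cloexec --- hypothetical helper: 1 if the flag is set, 0 if clear,
   -1 on error. */

int has_cloexec(int fd)
{
    int fd_flags = fcntl(fd, F_GETFD);

    if (fd_flags < 0)
        return -1;

    return (fd_flags & FD_CLOEXEC) != 0;
}

/* dup_drops_cloexec --- returns 1 if the original descriptor has the flag
   and its dup() copy does not, 0 otherwise, -1 on error. */

int dup_drops_cloexec(void)
{
    int pipefd[2];
    int copy, result;

    if (pipe(pipefd) < 0)
        return -1;

    if (set_cloexec(pipefd[0]) < 0 || (copy = dup(pipefd[0])) < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    result = (has_cloexec(pipefd[0]) == 1 && has_cloexec(copy) == 0);

    close(copy);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

On a POSIX system dup_drops_cloexec() should return 1: the flag stays with the descriptor it was set on and does not follow the copy.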
A Close-on-exec War Story from gawk

Within the awk language, I/O statements use a redirection notation similar to that of
the shell. This includes one-way pipes to and from subprocesses:
    print "something brilliant" > "/some/file"             Output to file
    getline my_record < "/some/other/file"                 Input from file
    print "more words of wisdom" | "a_reader process"      Output to subprocess
    "a_writer process" | getline some_input                Input from subprocess
The awk interpreter has an open file descriptor for all file redirections, and for the pipe
notations that create a subprocess, the awk interpreter creates a pipe and then does a
fork and exec of a shell to run the command as given by the string.
Now, on modern systems, part of the C runtime startup code (that runs before main ( )
is called) needs to temporarily open files in order to manage the use of shared libraries.
This means that there must be at least one or two unused file descriptors available to a
brand-new program after an exec, or the program just won't run.
One day, a user reported that when the awk program had the maximum number of files
open, any child process that it tried to fork and exec for a pipeline would fail to start!
You can probably guess what was happening. The child shell inherited all the open file
descriptors that gawk itself was using for its redirections. We modified gawk to set the
close-on-exec flag for all file and pipe redirections, and that fixed the problem.
9.4.3.2 File Descriptor Duplication

When fcntl()'s cmd argument is F_DUPFD, the behavior is similar, but not quite
identical, to dup2(). In this case, arg is a file descriptor representing the lowest acceptable
value for the new file descriptor:
    int new_fd = fcntl(old_fd, F_DUPFD, 7);     Return value is between 7 and maximum, or failure
    int new_fd = dup2(old_fd, 7);               Return value is 7, or failure
You can simulate the behavior of dup(), which returns the lowest free file descriptor,
by using 'fcntl(old_fd, F_DUPFD, 0)'.
If you remember that file descriptors are just indexes into an internal table,
understanding how this function works should be clear. The third argument merely provides
the index at which the kernel should start its search for an unused file descriptor.
Whether to use fcntl() with F_DUPFD or dup() or dup2() in your own code is
largely a matter of taste. All three APIs are part of POSIX and widely supported. We
have a mild preference for dup() and dup2() since those are more specific in their
action, and thus are more self-documenting. But because all of them are pretty simple,
this reasoning may not convince you.
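
Here is a short sketch of the "lowest acceptable value" search, written by us for illustration; the function name dupfd_demo is hypothetical. It duplicates a pipe descriptor twice with the same floor of 7 and hands back both results:

```c
#include <fcntl.h>
#include <unistd.h>

/* dupfd_demo --- hypothetical helper: duplicate a descriptor twice with
   F_DUPFD and a floor of 7. Stores the two new descriptor numbers in
   *first and *second (they are closed again before returning).
   Returns 0 on success, -1 on error. */

int dupfd_demo(int *first, int *second)
{
    int pipefd[2];
    int result = 0;

    if (pipe(pipefd) < 0)
        return -1;

    *first = fcntl(pipefd[0], F_DUPFD, 7);    /* lowest free descriptor >= 7 */
    *second = fcntl(pipefd[0], F_DUPFD, 7);   /* same floor, next free slot */

    if (*first < 0 || *second < 0)
        result = -1;

    if (*first >= 0)
        close(*first);
    if (*second >= 0)
        close(*second);
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

In a process with nothing open at 7 and above, the two values would typically be 7 and 8. Either way, the second is always larger than the first, because the first call's result is still open when the second search starts.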
9.4.3.3 Manipulation of File Status Flags and Access Modes

In Section 4.6.3, "Revisiting open()," page 110, we provided the full list of O_xx
flags that open() accepts. POSIX breaks these down by function, classifying them as
described in Table 9.4.
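TABLE 9.4
O_xx flags for open(), creat() and fcntl()
Category         Functions          Flags
File access      open(), fcntl()    O_RDONLY, O_RDWR, O_WRONLY
File creation    open()             O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC
File status      open(), fcntl()    O_APPEND, O_DSYNC, O_NONBLOCK, O_RSYNC, O_SYNC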
Besides setting the various flags initially with open(), you can use fcntl() to retrieve
the current settings, as well as to change them. This is done with the F_GETFL and
F_SETFL values for cmd, respectively. For example, you might use these commands to
change the setting of the nonblocking flag, O_NONBLOCK, like so:
    int fd_flags;

    if ((fd_flags = fcntl(fd, F_GETFL)) < 0)
        /* handle error */

    if ((fd_flags & O_NONBLOCK) != 0) {         /* Nonblocking flag is set */
        fd_flags &= ~O_NONBLOCK;                /* Clear it */
        if (fcntl(fd, F_SETFL, fd_flags) != 0)  /* Give kernel new value */
            /* handle error */
    }
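
The fragment above can be fleshed out into a pair of small functions. This is a sketch of one way to do it; the names set_nonblock and is_nonblock are ours. The important point is the read-modify-write pattern, which leaves the other status flags untouched:

```c
#include <fcntl.h>
#include <unistd.h>

/* set_nonblock --- hypothetical helper: turn O_NONBLOCK on (on != 0) or
   off (on == 0) for fd, leaving the other file status flags alone.
   Returns 0 on success, -1 on error. */

int set_nonblock(int fd, int on)
{
    int fd_flags;

    if ((fd_flags = fcntl(fd, F_GETFL)) < 0)    /* retrieve current flags */
        return -1;

    if (on)
        fd_flags |= O_NONBLOCK;
    else
        fd_flags &= ~O_NONBLOCK;

    return fcntl(fd, F_SETFL, fd_flags) < 0 ? -1 : 0;   /* give kernel new value */
}

/* is_nonblock --- hypothetical helper: 1 if O_NONBLOCK is set, 0 if clear,
   -1 on error. */

int is_nonblock(int fd)
{
    int fd_flags = fcntl(fd, F_GETFL);

    if (fd_flags < 0)
        return -1;

    return (fd_flags & O_NONBLOCK) != 0;
}
```

A freshly created pipe starts out blocking, so is_nonblock() on either end should report 0 until set_nonblock(fd, 1) has been called.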
Besides the modes themselves, the O_ACCMODE symbolic constant is a mask you can
use to retrieve the file access modes from the return value:
    fd_flags = fcntl(fd, F_GETFL);

    switch (fd_flags & O_ACCMODE) {
    case O_RDONLY:
        ... action for read-only ...
        break;
    case O_WRONLY:
        ... action for write-only ...
        break;
    case O_RDWR:
        ... action for read-write ...
        break;
    }
POSIX requires that O_RDONLY, O_RDWR, and O_WRONLY be bitwise distinct; thus,
code such as just shown is guaranteed to work and is an easy way to determine how an
arbitrary file descriptor was opened.
By using F_SETFL, you can change these modes as well, although permission
checking still applies. According to the GNU/Linux fcntl(2) manpage, the O_APPEND
flag cannot be cleared if it was used when the file was opened.
9.4.3.4 Nonblocking I/O for Pipes and FIFOs

Earlier, we used the metaphor of two people washing and drying dishes, and using
a dish drainer to describe the way a pipe works; when the drainer fills up, the dishwasher
stops, and when it empties out, the dishdryer stops. This is blocking behavior: The
producer or consumer blocks in the call to write() or read(), waiting either for more
room in the pipe or for more data to come into it.
In the real world, a human being waiting for the dish drainer to empty out or fill up
would not just stand by, immobile.12 Rather, the idle one would go and find some
other kitchen task to do (such as sweeping up all the kids' crumbs on the floor) until
the dish drainer was ready again.
In Unix/POSIX parlance, this concept is termed nonblocking I/O. That is, the
requested I/O either completes or returns an error value indicating no data (for the reader) or
no room (for the writer). Nonblocking I/O applies to pipes and FIFOs, not to regular
disk files. It can also apply to certain devices, such as terminals, and to network
connections, both of which are beyond the scope of this volume.

12 Well, we're ignoring the idea that two spouses might want to talk and enjoy each other's company.

The O_NONBLOCK flag can be used with open() to specify nonblocking I/O, and it
can be set or cleared with fcntl(). For open() and read(), nonblocking I/O is
straightforward.
Opening a FIFO with O_NONBLOCK set or clear displays the following behavior:
open("/fifo/file", O_RDONLY, mode)
    Blocks until the FIFO is opened for writing.
open("/fifo/file", O_RDONLY|O_NONBLOCK, mode)
    Opens the file, returning immediately.
open("/fifo/file", O_WRONLY, mode)
    Blocks until the FIFO is opened for reading.
open("/fifo/file", O_WRONLY|O_NONBLOCK, mode)
    If the FIFO has been opened for reading, opens the FIFO and returns immediately.
    Otherwise, returns an error (return value of -1 and errno set to ENXIO).
As described for regular pipes, a read() of a FIFO that is no longer open for writing
returns end-of-file (a return value of 0). The O_NONBLOCK flag is irrelevant in this case.
Things get more interesting for an empty pipe or FIFO: one that is still open for writing
but that has no data in it:
read(fd, buf, count), and O_NONBLOCK clear
    The read() blocks until more data come into the pipe or FIFO.
read(fd, buf, count), and O_NONBLOCK set
    The read() returns -1 immediately, with errno set to EAGAIN.
Finally, write() behavior is more complicated. To discuss it we have to first
introduce the concept of an atomic write. An atomic write is one in which all the requested
data are written together, without being interleaved with data from other writes. POSIX
defines the constant PIPE_BUF in <limits.h>. Writes of amounts less than or equal
to PIPE_BUF bytes to a pipe or FIFO either succeed or block, according to the details
we get into shortly. The minimum value for PIPE_BUF is _POSIX_PIPE_BUF, which
is 512. PIPE_BUF itself can be larger; current GLIBC systems define it to be 4096, but
in any case you should use the symbolic constant and not expect PIPE_BUF to be the
same value across different systems.
In all cases, for pipes and FIFOs, a write() appends data to the end of the pipe.
This derives from the fact that pipes don't have file offsets: They aren't seekable.
Also in all cases, as mentioned, writes of up to PIPE_BUF are atomic: The data are
not interleaved with the data from other writes. Data from a write of more than
PIPE_BUF bytes can be interleaved with the data from other writes on arbitrary
boundaries. This last means that you cannot expect every PIPE_BUF subchunk of a large
amount of data to be written atomically. The O_NONBLOCK setting does not affect
this rule.
As with read(), when O_NONBLOCK is not set, write() blocks until all the data
are written.
Things are most complicated with O_NONBLOCK set. For a pipe or FIFO, the behavior
is as follows:
                        space ≥ nbytes               space < nbytes
nbytes ≤ PIPE_BUF       write() succeeds             write() returns -1/EAGAIN

                        space > 0                    space = 0
nbytes > PIPE_BUF       write() writes what it can   write() returns -1/EAGAIN
For nonpipe and non-FIFO files to which O_NONBLOCK can be applied, the behavior
is as follows:
space > 0       write() writes what it can.
space = 0       write() returns -1/EAGAIN.
Although there is a bewildering array of behavior changes based on pipe/nonpipe,
O_NONBLOCK set or clear, the space available in the pipe, and the size of the attempted
write, the rules are intended to make programming straightforward:
• End-of-file is always distinguishable: read() returns zero bytes.
• If no data are available to be read, read() either succeeds or returns a "nothing
to read" indication: EAGAIN, which means "try again later."
• If there's no room to write data, write() either blocks until it can succeed
(O_NONBLOCK clear) or it fails with a "no room right now" error: EAGAIN.
• When there's room, as much data will be written as can be, so that eventually all
the data can be written out.
In summary, if you intend to use nonblocking I/O, any code that uses write() has
to be able to handle a short write, where less than the requested amount is successfully
written. Robust code should be written this way anyway: Even for a regular file it's
possible that a disk could become full and that a write() will only partially succeed.
Furthermore, you should be prepared to handle EAGAIN, understanding that in this
case write() failing isn't necessarily a fatal error. The same is true of code that uses
nonblocking I/O for reading: recognize that EAGAIN isn't fatal here either. (It may pay,
though, to count such occurrences, giving up after too many.)
Nonblocking I/O does complicate your life, no doubt about it. But for many
applications, it's a necessity that lets you get your job done. Consider the print spooler again.
The spooler daemon can't afford to sit in a blocking read() on the FIFO file to which
incoming jobs are submitted. It has to be able to monitor running jobs as well and
possibly periodically check the status of the printer devices (for example, to make sure
they have paper or aren't jammed).
9.4.3.5 fcntl() Summary

The fcntl() system call is summarized in Table 9.5.
TABLE 9.5
fcntl() summary
    cmd value    arg value                Returns
    F_DUPFD      Lowest new descriptor    Duplicate of the fd argument.
    F_GETFD                               Retrieve file descriptor flags (close-on-exec).
    F_SETFD      New flag value           Set file descriptor flags (close-on-exec).
    F_GETFL                               Retrieve flags on underlying file.
    F_SETFL      New flag value           Set flags on underlying file.
The file creation, status, and access flags are copied when a file descriptor is duplicated.
The close-on-exec flag is not.
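The commands in Table 9.5 can be sketched as three small wrappers. The names set_nonblock(), set_cloexec(), and dup_at_least() are hypothetical, invented for this illustration. Each "set" wrapper first retrieves the current flags so that it changes only the one bit of interest.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* set_nonblock --- hypothetical helper: turn on O_NONBLOCK, keeping other flags */

int set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* set_cloexec --- hypothetical helper: turn on the close-on-exec flag */

int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* dup_at_least --- hypothetical helper: duplicate fd onto the lowest
   unused descriptor numbered min_fd or higher */

int dup_at_least(int fd, int min_fd)
{
    return fcntl(fd, F_DUPFD, min_fd);
}
```

Calling set_nonblock() and set_cloexec() on a descriptor and then dup_at_least() on it shows the rule just stated: the duplicate reports O_NONBLOCK from F_GETFL, but its F_GETFD value does not have FD_CLOEXEC set.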
9.5 Example: Two-Way Pipes in gawk

A two-way pipe connects two processes bidirectionally. Typically, for at least one of
the processes, both standard input and standard output are set up on pipes to the other
process. The Korn shell (ksh) introduced two-way pipes at the language level, with
what it terms a coprocess:
    database engine command and arguments |&     Start coprocess in background
    print -p "database command"                  Write to coprocess
    read -p db_response                          Read from coprocess
Here, database engine represents any back-end program that can be driven by a
front end, in this case the ksh script. database engine has standard input and standard
output connected to the shell by way of two separate one-way pipes. 13 This is illustrated
in Figure 9.7.
FIGURE 9.7
Korn shell coprocess
[Diagram: in the parent shell, 'print -p' writes to fd N, which feeds a pipe to fd 0 of the database engine; fd 1 of the database engine feeds a second pipe to fd M in the parent shell, which 'read -p' reads.]
13 There is only one default coprocess (accessible with 'read -p' and 'print -p') at a time. Shell scripts can use
the exec command with a special redirection notation to move the coprocess's file descriptors to specific numbers.
Once this is done, another coprocess can be started.
In regular awk, pipes to or from subprocesses are one-way: There's no way to send
data to a program and read a response back from it; you have to use a temporary file.
GNU awk (gawk) borrows the '|&' notation from ksh to extend the awk language:
    print "a command" |& "database engine"       Start coprocess, write to it
    "database engine" |& getline db_response     Read from coprocess
gawk also uses the '|&' notation for TCP/IP sockets and BSD portals, which aren't
covered in this volume. The following code from io.c in the gawk 3.1.3 distribution
is the part of the two_way_open() function that sets up a simple coprocess: It creates
two pipes, forks the child process, and does all the file descriptor manipulation. We
have omitted a number of irrelevant pieces of code (this function is bigger than it
should be):
1561  static int
1562  two_way_open(const char *str, struct redirect *rp)
1563  {
...
1827      /* case 3: two way pipe to a child process */
1828      {
1829      int ptoc[2], ctop[2];
1830      int pid;
1831      int save_errno;
...
1835
1836      if (pipe(ptoc) < 0)
1837          return FALSE;   /* errno set, diagnostic from caller */
1838
1839      if (pipe(ctop) < 0) {
1840          save_errno = errno;
1841          close(ptoc[0]);
1842          close(ptoc[1]);
1843          errno = save_errno;
1844          return FALSE;
1845      }
The first step is to create the two pipes. ptoc is "parent to child," and ctop is "child
to parent." Bear in mind as you read the code that index 0 is the read end and that index
1 is the write end.
Lines 1836-1837 create the first pipe, ptoc. Lines 1839-1845 create the second
one, closing the first one if this fails. This is important. Failure to close an open but
unused pipe leads to file descriptor leaks. Like memory, file descriptors are a finite
resource, and once you run out of them, they're gone. 14 The same is true of open files:
Make sure that all your error-handling code always closes any open files or pipes that
you won't need when a failure happens.
save_errno saves the errno value as set by pipe(), on the off chance that close()
might fail (line 1840). errno is then restored on line 1843.
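The save-and-restore idiom is easy to package for reuse. The following is only a sketch; the function name close_pipe_keep_errno() is hypothetical and does not appear in gawk.

```c
#include <errno.h>
#include <unistd.h>

/* close_pipe_keep_errno --- hypothetical helper: close both ends of a pipe
   without disturbing the errno value that the caller will report */

void close_pipe_keep_errno(int fds[2])
{
    int save_errno = errno;     /* close() may overwrite errno if it fails */

    (void) close(fds[0]);
    (void) close(fds[1]);
    errno = save_errno;
}
```

An error path can then call the helper and return, knowing that the diagnostic printed later describes the original failure and not a failed close().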
14 Well, you can close them, obviously. But if you don't know they're open, then they're lost just as effectively as
memory through a memory leak.
1906      if ((pid = fork()) < 0) {
1907          save_errno = errno;
1908          close(ptoc[0]); close(ptoc[1]);
1909          close(ctop[0]); close(ctop[1]);
1910          errno = save_errno;
1911          return FALSE;
1912      }
Lines 1906-1912 fork the child, this time closing both pipes if fork() failed. Here
too, the original errno value is saved and restored for later use in producing a diagnostic.
1914      if (pid == 0) {     /* child */
1915          if (close(1) == -1)
1916              fatal(_("close of stdout in child failed (%s)"),
1917                  strerror(errno));
1918          if (dup(ctop[1]) != 1)
1919              fatal(_("moving pipe to stdout in child failed (dup: %s)"),
                      strerror(errno));
1920          if (close(0) == -1)
1921              fatal(_("close of stdin in child failed (%s)"),
1922                  strerror(errno));
1923          if (dup(ptoc[0]) != 0)
1924              fatal(_("moving pipe to stdin in child failed (dup: %s)"),
                      strerror(errno));
1925          if (   close(ptoc[0]) == -1 || close(ptoc[1]) == -1
1926              || close(ctop[0]) == -1 || close(ctop[1]) == -1)
1927              fatal(_("close of pipe failed (%s)"), strerror(errno));
1928          /* stderr does NOT get dup'ed onto child's stdout */
1929          execl("/bin/sh", "sh", "-c", str, NULL);
1930          _exit(errno == ENOENT ? 127 : 126);
1931      }
Lines 1914-1931 handle the child's code, with appropriate error checking and
messages at each step. Line 1915 closes standard output. Line 1918 copies the
child-to-parent pipe write end to 1. Line 1920 closes standard input, and line 1923 copies
the parent-to-child read end to 0. If this all works, the child's standard input and output
are now in place, connected to the parent.
Lines 1925-1926 close all four original pipe file descriptors since they're no longer
needed. Line 1928 reminds us that standard error remains in place. This is the best
decision, since the user will see errors from the coprocess. An awk program that must
capture standard error can use the '2>&1' shell notation in the command to redirect
the coprocess's standard error or send it to a separate file.
Finally, lines 1929-1930 attempt to run execl() on the shell and exit appropriately
if that fails.
1934      /* parent */
1935      rp->pid = pid;
1936      rp->iop = iop_alloc(ctop[0], str, NULL);
1937      if (rp->iop == NULL) {
1938          (void) close(ctop[0]);
1939          (void) close(ctop[1]);
1940          (void) close(ptoc[0]);
1941          (void) close(ptoc[1]);
1942          (void) kill(pid, SIGKILL);  /* overkill? (pardon pun) */
1943
1944          return FALSE;
1945      }
The first step in the parent is to manage the input end, from the coprocess. The rp
pointer points to a struct redirect, which maintains a field to hold the child's PID,
a FILE * for output, and an IOBUF * pointer named iop. The IOBUF is a gawk internal
data structure for doing input. It, in turn, keeps a copy of the underlying file descriptor.
Line 1935 saves the process ID value. Line 1936 allocates a new IOBUF for the given
file descriptor and command string. The third argument here is NULL: It allows the use
of a preallocated IOBUF if necessary.
If the allocation fails, lines 1937-1942 clean up by closing the pipes and sending a
"kill" signal to the child process to cause it to terminate. (The kill() function is
described in Section 10.6.7, "Sending Signals: kill() and killpg()," page 376.)
1946      rp->fp = fdopen(ptoc[1], "w");
1947      if (rp->fp == NULL) {
1948          iop_close(rp->iop);
1949          rp->iop = NULL;
1950          (void) close(ctop[0]);
1951          (void) close(ctop[1]);
1952          (void) close(ptoc[0]);
1953          (void) close(ptoc[1]);
1954          (void) kill(pid, SIGKILL);  /* overkill? (pardon pun) */
1955
1956          return FALSE;
1957      }
Lines 1946-1957 are analogous. They set up the parent's output to the child, saving
the file descriptor for the parent-to-child pipe write end in a FILE * by means of
fdopen(). If this fails, lines 1947-1957 take the same action as before: closing all the
pipe descriptors and sending a signal to the child.
From this point on, the write end of the parent-to-child pipe and the read end of
the child-to-parent pipe are held down in the larger structures: the FILE * and IOBUF,
respectively. They are closed automatically by the regular routines that close these
structures. However, two tasks remain:
1960      os_close_on_exec(ctop[0], str, "pipe", "from");
1961      os_close_on_exec(ptoc[1], str, "pipe", "from");
1962
1963      (void) close(ptoc[0]);
1964      (void) close(ctop[1]);
1966
1967      return TRUE;
1968      }
...
1977  }
Lines 1960-1961 set the close-on-exec flag for the two descriptors that will remain
open. os_close_on_exec() is a simple wrapper routine that does the job on Unix
and POSIX-compatible systems, but does nothing on systems that don't have a
close-on-exec flag. This buries the portability issue in a single place and avoids lots of messy
#ifdefs throughout the code here and elsewhere in io.c.
Finally, lines 1963-1964 close the ends of the pipes that the parent doesn't need,
and line 1967 returns TRUE, for success.

9.6 Suggested Reading

Job control is complicated, involving process groups, sessions, the wait mechanisms,
signals, and manipulation of the terminal's process group. As such, we've chosen not
to get into the details. However, you may wish to look at these books:
1. Advanced Programming in the UNIX Environment, 2nd edition, by W. Richard
Stevens and Stephen Rago. Addison-Wesley, Reading, Massachusetts, USA,
2004. ISBN: 0-201-43307-9.
This book is both complete and thorough, covering elementary and advanced
Unix programming. It does an excellent job of covering process groups, sessions,
job control, and signals.
2. The Design and Implementation of the 4.4 BSD Operating System, by Marshall
Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman.
Addison-Wesley, Reading, Massachusetts, USA, 1996. ISBN: 0-201-54979-4.
This book gives a good overview of the same material, including a discussion
of kernel data structures, which can be found in section 4.8 of that book.
9.7 Summary
• New processes are created with fork(). After a fork, both processes run the same
code, the only difference being the return value: 0 in the child and a positive PID
number in the parent. The child process inherits copies of almost all the parent's
attributes, of which the open files are perhaps the most important.
• Inherited shared file descriptors make possible much of the higher-level Unix
semantics and elegant shell control structures. This is one of the most fundamental
parts of the original Unix design. Because of descriptor sharing, a file isn't really
closed until the last open file descriptor is closed. This particularly affects pipes,
but it also affects the release of disk blocks for unlinked but still open files.
• The getpid() and getppid() calls return the current and parent process ID
numbers, respectively. A process whose parent dies is reparented to the special
init process, PID 1. Thus, it's possible for the PPID to change, and applications
should be prepared for this.
• The nice() system call lets you adjust your process's priority. The nicer you are
to other processes, the lower your priority, and vice versa. Only the superuser can
be less nice to other processes. On modern systems, especially single-user ones,
there's no real reason to change the nice value.
• The exec() system calls start a new program running in an existing process. Six
different versions of the call provide flexibility in the setup of argument and
environment lists, at the cost of initial confusion as to which one is best to use. Two
variants simulate the shell's path searching mechanism and fall back to the use of
the shell to interpret the file in case it isn't a binary executable; these variants
should be used with care.
• The new program's value for argv[0] normally comes from the filename being
executed, but this is only convention. As with fork(), a significant but not
identical set of attributes is inherited across an exec. Other attributes are reset to
reasonable default values.
• The atexit() function registers callback functions to run in LIFO order when
a program terminates. The exit(), _exit(), and _Exit() functions all terminate
the program, passing an exit status back to the parent. exit() cleans up open
FILE * streams and runs functions registered with atexit(). The other two
functions exit immediately and should be used only when an exec has failed in a
forked child. Returning from main() is like calling exit() with the given return
value. In C99 and C++, falling off the end of main() is the same as 'exit(0)'
but is bad practice.
• wait() and waitpid() are the POSIX functions for recovering a child's exit
status. Various macros let you determine whether the child exited normally, and
if so, to determine its exit status, or whether the child suffered death-by-signal
and if so, which signal committed the crime. With specific options, waitpid()
also provides information about children that haven't died but that have
changed state.
• GNU/Linux and most Unix systems support the BSD wait3() and wait4()
functions. GNU/Linux also supports the obsolescent union wait. The BSD
functions provide a struct rusage, allowing access to CPU time information,
which can be handy. If waitpid() will suffice though, it's the most portable
way to go.
• Process groups are part of the larger job control mechanism, which includes signals,
sessions, and manipulation of the terminal's state. getpgrp() returns the current
process's process group ID, and getpgid() returns the PGID of a specific process.
Similarly, setpgrp() sets the current process's PGID to its PID, making it a
process group leader; setpgid() lets a parent process set the PGID of a child
that hasn't yet exec'd.
• Pipes and FIFOs provide a one-way communications channel between two
processes. Pipes must be set up by a common ancestor, whereas a FIFO can be used
by any two processes. Pipes are created with pipe(), and FIFO files are created
with mkfifo(). Pipes and FIFOs buffer their data, stopping the producer or
consumer as the pipe fills up or empties out.
• dup() and dup2() create copies of open file descriptors. In combination with
close(), they enable pipe file descriptors to be put in place as standard input and
output for pipelines. For pipes to work correctly, all copies of unused ends of the
pipes must be closed before exec'ing the target program(s). /dev/fd can be used
to create nonlinear pipelines, as demonstrated by the Bash and Korn shells' process
substitution capability.
• fcntl() is a catchall function for doing miscellaneous jobs. It manages attributes
of both the file descriptor itself and the file underlying the descriptor. In this
chapter, we saw that fcntl() is used for the following:
• Duplicating a file descriptor, simulating dup() and almost simulating dup2().
• Retrieving and setting the close-on-exec flag. The close-on-exec flag is the only
current file descriptor attribute, but it's an important one. It is not copied by
a dup() action but should be explicitly set on any file descriptors that should
not remain open after an exec. In practice, this should be done for most
file descriptors.
• Retrieving and setting flags controlling the underlying file. Of these,
O_NONBLOCK is perhaps the most useful, at least for FIFOs and pipes. It is
definitely the most complicated flag.
Exercises
1. Write a program that prints as much information as possible about the current
process: PID, PPID, open files, current directory, nice value, and so on. How
can you tell which files are open? If multiple file descriptors reference the same
file, so indicate. (Again, how can you tell?)
2. How do you think atexit() stores the pointers to the callback functions?
Implement atexit(), keeping the GNU "no arbitrary limits" principle in
mind. Sketch an outline (pseudocode) for exit(). What information
(<stdio.h> library internals) are you missing, the absence of which prevents
you from writing exit()?
3. The xargs program is designed to run a command and arguments multiple
times, when there would be too many arguments to pass directly on the
command line. It does this by reading lines from standard input, treating each line
as a separate argument for the named command, and bundling arguments until
there are just enough to still be below the system maximum. For example:
    $ grep ARG_MAX /usr/include/*.h /usr/include/*/*.h       Command line
    bash: /bin/grep: Argument list too long                  Shell's error message
    $ find /usr/include -name '*.h' | xargs grep ARG_MAX     find and xargs works
    /usr/include/sys/param.h:#define NCARGS ARG_MAX
The constant ARG_MAX in <limits.h> represents the combined total memory
used by the environment and the command-line arguments. The POSIX
standard doesn't say whether this includes the pointer arrays or just the strings
themselves.
Write a simple version of xargs that works as described. Don't forget the
environment when calculating how much space you have. Be sure to manage your
memory carefully.
4. The layout of the status value filled in by wait() and waitpid() isn't
defined by POSIX. Historically though, it's a 16-bit value that looks as shown
in Figure 9.8.
FIGURE 9.8
Layout of status value from wait()
[Diagram: a 16-bit value with bits numbered 15 down to 0, divided into fields labeled "Exit status or stopping signal," "CD," and "Terminating signal."]
• A nonzero value in bits 0-7 indicates death-by-signal.
• All 1-bits in the signal field indicates that the child process stopped. In this
case, bits 9-15 contain the signal number.
• A 1-bit in bit 8 indicates death with core dump.
• If bits 0-7 are zero, the process exited normally. In this case, bits 9-15 are
the exit status.
Given this information, write the POSIX WIFEXITED() et al. macros.
5. Remembering that dup2() closes the requested file descriptor first, implement
dup2() using close() and fcntl(). How will you handle the case that
fcntl() returns a value lower than the one requested?
6. Does your system have a /dev/fd directory? If so, how is it implemented?
7. Write a new version of ch09-pipeline.c that forks only one process. After
forking, the parent should rearrange its file descriptors and exec one of the new
programs itself.
8. (Hard.) How can you tell if your process ever called chroot()? Write a program
that checks and prints a message indicating yes or no. Can your program be
fooled? If so, how?
9. Does your system have a /proc directory? If so, what kind of per-process
information does it make available?
In this chapter
• 10.1 Introduction
• 10.2 Signal Actions
• 10.3 Standard C Signals: signal() and raise()
• 10.4 Signal Handlers in Action
• 10.5 The System V Release 3 Signal APIs: sigset() et al.
• 10.6 POSIX Signals
• 10.7 Signals for Interprocess Communication
• 10.8 Important Special-Purpose Signals
• 10.9 Signals Across fork() and exec()
This chapter covers the ins and outs of signals, an important but complicated
part of the GNU/Linux API.
10.1 Introduction
A signal is an indication that some event has happened, for example, an attempt to
reference a memory address that isn't part of your program's address space, or when a
user presses CTRL-C to stop your program (called generating an interrupt).
Your program can tell only that a particular signal has happened at least once. Gen-
erally, you can't tell if the same signal has happened multiple times. You can distinguish
one signal from another, and control the way in which your program reacts to
different signals.
Signal handling mechanisms have evolved over time. As is the case with almost all
such mechanisms, both the original and the newer APIs are standardized and available.
However, of the fundamental APIs, signal handling displays possibly the broadest
change; there's a lot to get a handle on to be able to use the most capable APIs. As a
result, this is perhaps the most difficult chapter in the book. We'll do our best to make
a coherent presentation, but it'll help if you work your way through this chapter more
carefully than usual.
Unlike most of the chapters in this book, our presentation here is historical, covering
the APIs as they evolved, including some APIs that you should never use in new code. We
do this because it simplifies the presentation, making it straightforward to understand
why the POSIX sigaction() API supports all the facilities that it does.
10.2 Signal Actions

Every signal (we provide a full list shortly) has a default action associated with it.
POSIX terms this the signal's disposition. This action is what the kernel does for the
process when a particular signal arrives. The default actions vary:
Termination
The process is terminated.
Ignored
The signal is ignored. The program is never aware that anything happened.
Core dump
The process is terminated, and the kernel creates a core file (in the process's current
directory) containing the image of the running program at the time the signal
arrived. The core dump can be used later with a debugger for examination of the
state of the program (see Chapter 15, "Debugging," page 567).
By default, GNU/Linux systems create files named core.pid, where pid is the
process ID of the killed process. (This can be changed; see sysctl(8).) This naming
lets you store multiple core files in the same directory, at the expense of the disk
space involved. 1 Traditional Unix systems name the file core, and it's up to you
to save any core files for later reexamination if there's a chance that more will be
created in the same directory.
Stopped
The process is stopped. It may be continued later. (If you've used shell job control
with CTRL-Z, fg, and bg, you understand stopping a process.)
1 At least one vendor of GNU/Linux distributions disables the creation of core files "out of the box." To reenable
them, put the line 'ulimit -S -c unlimited' into your ~/.profile file.
10.3 Standard C Signals: signal() and raise()

The ISO C standard defines the original V7 signal management API and a new API
for sending signals. You should use them for programs that have to work on non-POSIX
systems, or for cases in which the functionality provided by the ISO C APIs is adequate.
10.3.1 The signal() Function

You change a signal's action with the signal() function. You can change the action
to one of "ignore this signal," "restore the system's default action for this signal," or
"call my function with the signal number as a parameter when the signal occurs."
A function you provide to deal with the signal is called a signal handler (or just a
handler), and putting a handler in place is arranging to catch the signal.
With that introduction, let's proceed to the APIs. The <signal.h> header file
provides macro definitions for supported signals and declares the signal management
function provided by Standard C:
    #include <signal.h>                                       ISO C
    void (*signal(int signum, void (*func)(int)))(int);
This declaration for signal() is almost impossible to read. Thus, the GNU/Linux
signal(2) manpage defines it this way:
    typedef void (*sighandler_t)(int);
    sighandler_t signal(int signum, sighandler_t handler);
Now it's more intelligible. The type sighandler_t is a pointer to a function
returning void, which accepts a single integer argument. This integer is the number of the
arriving signal.
The signal() function accepts a signal number as its first parameter and a pointer
to a function (the new handler) as its second argument. If not a function pointer, the
second argument may be either SIG_DFL, which means "restore the default action," or
SIG_IGN, which means "ignore the signal."
signal() changes the action for signum and returns the previous action. (This allows
you to later restore the previous action if you so desire.) The return value may also be
SIG_ERR, which indicates that something went wrong. (Some signals can't be caught
or ignored; supplying a signal handler for them, or an invalid signum, generates this
error return.) Table 10.1 lists the signals available under GNU/Linux, their numeric
values, each one's default action, the formal standard or modern operating system that
defines them, and each one's meaning.
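A minimal sketch of the calling pattern follows: install a handler, keep the previous action, and put it back later. All of the names here (handler_t, note_signal(), catch_signal(), restore_signal(), last_signal_seen()) are hypothetical, chosen for this illustration. The handler does nothing but store the signal number in a volatile sig_atomic_t variable, which is the one thing Standard C allows a handler to do safely.

```c
#include <signal.h>

typedef void (*handler_t)(int);     /* hypothetical name for the handler type */

static volatile sig_atomic_t last_signal = 0;   /* set by the handler */

/* note_signal --- hypothetical handler: just record the signal number */

static void note_signal(int signum)
{
    last_signal = signum;
}

/* catch_signal --- install note_signal(); return previous action or SIG_ERR */

handler_t catch_signal(int signum)
{
    return signal(signum, note_signal);
}

/* restore_signal --- put back an action saved from catch_signal() */

int restore_signal(int signum, handler_t old_action)
{
    return signal(signum, old_action) == SIG_ERR ? -1 : 0;
}

/* last_signal_seen --- number of the most recent signal caught, 0 if none */

int last_signal_seen(void)
{
    return last_signal;
}
```

A caller compares the result of catch_signal() against SIG_ERR before relying on the handler. On GNU/Linux, for example, asking to catch SIGKILL produces that error return.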
Older versions of the Bourne shell (/bin/sh) associated traps, which are shell-level
signal handlers, directly with signal numbers. Thus, the well-rounded Unix programmer
needed to know not only the signal names for use from C code but also the
corresponding signal numbers! POSIX requires the trap command to understand symbolic signal
names (without the 'SIG' prefix), so this is no longer necessary. However (mostly against
our better judgment), we have provided the numbers in the interest of completeness
and because you may one day have to deal with a pre-POSIX shell script or ancient C
code that uses signal numbers directly.
NOTE For some of the newer signals, from 16 on up, the association between
signal number and signal name isn't necessarily the same across platforms!
Check your system header files and manpages. Table 10.1 is correct for
GNU/Linux.
TABLE 10.1
GNU/Linux signals
    Name        Value   Default   Source     Meaning
    SIGHUP        1     Term      POSIX      Hangup.
    SIGINT        2     Term      ISO C      Interrupt.
    SIGQUIT       3     Core      POSIX      Quit.
    SIGILL        4     Core      ISO C      Illegal instruction.
    SIGTRAP       5     Core      POSIX      Trace trap.
    SIGABRT       6     Core      ISO C      Abort.
    SIGIOT        6     Core      BSD        IOT trap.
    SIGBUS        7     Core      BSD        Bus error.
    SIGFPE        8     Core      ISO C      Floating-point exception.
    SIGKILL       9     Term      POSIX      Kill, unblockable.
    SIGUSR1      10     Term      POSIX      User-defined signal 1.
    SIGSEGV      11     Core      ISO C      Segmentation violation.
    SIGUSR2      12     Term      POSIX      User-defined signal 2.
    SIGPIPE      13     Term      POSIX      Broken pipe.
    SIGALRM      14     Term      POSIX      Alarm clock.
    SIGTERM      15     Term      ISO C      Termination.
    SIGSTKFLT    16     Term      Linux      Stack fault on a processor (unused).
    SIGCHLD      17     Ignr      POSIX      Child process status changed.
    SIGCLD       17     Ignr      System V   Same as SIGCHLD (for compatibility only).
    SIGCONT      18               POSIX      Continue if stopped.
    SIGSTOP      19     Stop      POSIX      Stop, unblockable.
    SIGTSTP      20     Stop      POSIX      Keyboard stop.
    SIGTTIN      21     Stop      POSIX      Background read from tty.
    SIGTTOU      22     Stop      POSIX      Background write to tty.
    SIGURG       23     Ignr      BSD        Urgent condition on socket.
    SIGXCPU      24     Core      BSD        CPU limit exceeded.
    SIGXFSZ      25     Core      BSD        File size limit exceeded.
    SIGVTALRM    26     Term      BSD        Virtual alarm clock.
    SIGPROF      27     Term      BSD        Profiling alarm clock.
    SIGWINCH     28     Ignr      BSD        Window size change.
    SIGIO        29     Term      BSD        I/O now possible.
    SIGPOLL      29     Term      System V   Pollable event occurred: same as SIGIO (for compatibility only).
    SIGPWR       30     Term      System V   Power failure restart.
    SIGSYS       31     Core      POSIX      Bad system call.
Key: Core: Terminate the process and produce a core file.
     Ignr: Ignore the signal.
     Stop: Stop the process.
     Term: Terminate the process.
Some systems also define other signals, such as SIGEMT, SIGLOST, and SIGINFO. The
GNU/Linux signal(7) manpage provides a complete listing; if your program needs to
handle signals not supported by GNU/Linux, the way to do it is with an #ifdef:
    #ifdef SIGLOST
    ... handle SIGLOST here ...
    #endif
With the exception of SIGSTKFLT, the signals listed in Table 10.1 are widely available
and don't need to be bracketed with #ifdef.
SIGKILL and SIGSTOP cannot be caught or ignored (or blocked, as described later
in the chapter). They always perform the default action listed in Table 10.1.
You can use 'kill -l' to see a list of supported signals. From one of our
GNU/Linux systems:
    $ kill -l
     1) SIGHUP        2) SIGINT        3) SIGQUIT       4) SIGILL
     5) SIGTRAP       6) SIGABRT       7) SIGBUS        8) SIGFPE
     9) SIGKILL      10) SIGUSR1      11) SIGSEGV      12) SIGUSR2
    13) SIGPIPE      14) SIGALRM      15) SIGTERM      17) SIGCHLD
    18) SIGCONT      19) SIGSTOP      20) SIGTSTP      21) SIGTTIN
    22) SIGTTOU      23) SIGURG       24) SIGXCPU      25) SIGXFSZ
    26) SIGVTALRM    27) SIGPROF      28) SIGWINCH     29) SIGIO
    30) SIGPWR       31) SIGSYS       32) SIGRTMIN     33) SIGRTMIN+1
    34) SIGRTMIN+2   35) SIGRTMIN+3   36) SIGRTMIN+4   37) SIGRTMIN+5
    38) SIGRTMIN+6   39) SIGRTMIN+7   40) SIGRTMIN+8   41) SIGRTMIN+9
    42) SIGRTMIN+10  43) SIGRTMIN+11  44) SIGRTMIN+12  45) SIGRTMIN+13
    46) SIGRTMIN+14  47) SIGRTMIN+15  48) SIGRTMAX-15  49) SIGRTMAX-14
    50) SIGRTMAX-13  51) SIGRTMAX-12  52) SIGRTMAX-11  53) SIGRTMAX-10
    54) SIGRTMAX-9   55) SIGRTMAX-8   56) SIGRTMAX-7   57) SIGRTMAX-6
    58) SIGRTMAX-5   59) SIGRTMAX-4   60) SIGRTMAX-3   61) SIGRTMAX-2
    62) SIGRTMAX-1   63) SIGRTMAX
The SIGRTXXX signals are real-time signals, an advanced topic that we don't cover.
10.3.2 Sending Signals Programmatically: raise()

Besides being generated externally, a program can send itself a signal directly, using
the Standard C function raise():
    #include <signal.h>                                       ISO C
    int raise(int sig);
This function sends the signal sig to the calling process. (This action has its uses;
we show an example shortly.)
Because raise() is defined by Standard C, it is the most portable way for a process
to send itself a signal. There are other ways, which we discuss further on in the chapter.
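One modest use for raise() is checking that a handler really is wired up. The sketch below assumes nothing beyond Standard C; the names on_term() and self_test_sigterm() are hypothetical. It relies on the Standard C guarantee that when a handler is called, raise() does not return until the handler has returned.

```c
#include <signal.h>

static volatile sig_atomic_t term_seen = 0;     /* set by the handler */

/* on_term --- hypothetical handler: note that SIGTERM arrived */

static void on_term(int signum)
{
    (void) signum;      /* unused */
    term_seen = 1;
}

/* self_test_sigterm --- hypothetical check: send ourselves SIGTERM.
   Returns 1 if the handler ran, 0 if it did not, -1 on error. */

int self_test_sigterm(void)
{
    void (*old_action)(int) = signal(SIGTERM, on_term);
    int result;

    if (old_action == SIG_ERR)
        return -1;

    term_seen = 0;
    if (raise(SIGTERM) != 0)
        result = -1;
    else
        result = term_seen;     /* the handler has finished by now */

    (void) signal(SIGTERM, old_action);     /* restore previous action */
    return result;
}
```

With the handler in place, the process survives its own SIGTERM and self_test_sigterm() reports whether the handler ran.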
10.4 Signal Handlers in Action

Much of the complication and variation shows up once a signal handler is in place,
as it is invoked, and after it returns.
10.4.1 Traditional Systems

After putting a signal handler in place, your program proceeds on its merry way.
Things don't get interesting until a signal comes in (for example, the user pressed
CTRL-C to interrupt your program or a call to raise () was made).
Upon receipt of the signal, the kernel stops the process wherever it may be. It then
simulates a procedure call to the signal handler, passing it the signal number as its sole
argument. The kernel arranges things such that a normal return from the signal handler
function (either through return or by falling off the end of the function) returns to
the point in the program at which the signal happened.
Once a signal has been handled, what happens the next time the same signal comes
in? Does the handler remain in place? Or is the signal's action reset to its default? The
answer, for historical reasons, is "it depends ." In particular, the C standard leaves it as
implementation defined.
In practice, V7 and traditional System V systems, such as Solaris, reset the signal's
action to the default.
Let's see a simple signal handler 1n action under Solaris. The following program,
chlO-c atchint. c , catches S I GINT . You normally generate this signal by typing
CTRL-C at the keyboard.
 1  /* ch10-catchint.c --- catch a SIGINT, at least once. */
 2
 3  #include <signal.h>
 4  #include <string.h>
 5  #include <unistd.h>
 6
 7  /* handler --- simple signal handler. */
 8
 9  void handler(int signum)
10  {
11      char buf[200], *cp;
12      int offset;
13
14      /* Jump through hoops to avoid fprintf(). */
15      strcpy(buf, "handler: caught signal ");
16      cp = buf + strlen(buf);    /* cp points at terminating '\0' */
17      if (signum >= 100)         /* unlikely */
18          offset = 3;
19      else if (signum >= 10)
20          offset = 2;
21      else
22          offset = 1;
23      cp += offset;
24
25      *cp-- = '\0';              /* terminate string */
26      while (signum > 0) {       /* work backwards, filling in digits */
27          *cp-- = (signum % 10) + '0';
28          signum /= 10;
29      }
30      strcat(buf, "\n");
31      (void) write(2, buf, strlen(buf));
32  }
33
34  /* main --- set up signal handling and go into infinite loop */
35
36  int main(void)
37  {
38      (void) signal(SIGINT, handler);
39
40      for (;;)
41          pause();    /* wait for a signal, see later in the chapter */
42
43      return 0;
44  }
Lines 9-32 define the signal handling function (cleverly named handler()). All
this function does is print the caught signal's number and return. It does a lot of manual
labor to generate the message, since fprintf() is not "safe" for calling from within
a signal handler. (This is described shortly, in Section 10.4.6, "Additional Caveats,"
page 363.)
The main() function sets up the signal handler (line 38) and then goes into an infinite
loop (lines 40-41). Here's what happens when it's run:
$ ssh solaris.example.com                 Log in to a handy Solaris system
Last login: Fri Sep 19 04:33:25 2003 from 4.3.2.1.
Sun Microsystems Inc.   SunOS 5.9       Generic May 2002
$ gcc ch10-catchint.c                     Compile the program
$ a.out                                   Run it
^Chandler: caught signal 2                Type ^C, handler is called
^C                                        Try again, but this time ...
$                                         The program dies
Because V7 and other traditional systems reset the signal's action to the default, when
you wish to receive the signal again in the future, the handler function should immediately
reinstall itself:
void handler(int signum)
{
    char buf[200], *cp;
    int offset;

    (void) signal(signum, handler);    /* reinstall handler */
    ... rest of function as before ...
}
10.4.2 BSD and GNU/Linux

4.2 BSD changed the way signal() worked.2 On BSD systems, the signal handler
remains in place after the handler returns. GNU/Linux systems follow the BSD behavior.
Here's what happens under GNU/Linux:
$ ch10-catchint                           Run the program
handler: caught signal 2                  Type ^C, handler is called
handler: caught signal 2                  And again ...
handler: caught signal 2                  And again!
handler: caught signal 2                  Help!
handler: caught signal 2                  How do we stop this?!
Quit (core dumped)                        ^\, generate SIGQUIT. Whew
On a BSD or GNU/Linux system, a signal handler doesn't need the extra
'signal(signum, handler)' call to reinstall the handler. However, the extra call also
doesn't hurt anything, since it maintains the status quo.
In fact, POSIX provides a bsd_signal() function, which is identical to signal(),
except that it guarantees that the signal handler stays installed:
#include <signal.h>                                       XSI, Obsolescent
void (*bsd_signal(int sig, void (*func)(int)))(int);
This eliminates the portability issues. If you know your program will run only on POSIX
systems, you may wish to use bsd_signal() instead of signal().
One caveat is that this function is also marked "obsolescent," meaning that it can be
withdrawn from a future standard. In practice, even if it's withdrawn, vendors will
likely continue to support it for a long time. (As we'll see, the POSIX sigaction()
API provides enough facilities to let you write a workalike version, should you need to.)
10.4.3 Ignoring Signals

More practically, when a signal handler is invoked, it usually means that the program
should finish up and exit. It would be annoying if most programs, upon receipt
of a SIGINT, printed a message and continued; the point of the signal is that they
should stop!
2 Changing the behavior was a bad idea, thoroughly criticized at the time, but it was too late. Changing the semantics
of a defined interface always leads to trouble, as it did here. While especially true for operating system designers,
anyone designing a general-purpose library should keep this lesson in mind as well.
For example, consider the sort program. sort may have created any number of
temporary files for use in intermediate stages of the sorting process. Upon receipt of a
SIGINT, sort should remove the temporary files and then exit. Here is a simplified
version of the signal handler from the GNU Coreutils sort.c:
/* Handle interrupts and hangups. */          Simplified for presentation
static void
sighandler(int sig)
{
    signal(sig, SIG_IGN);                     Ignore this signal from now on
    cleanup();                                Clean up after ourselves
    signal(sig, SIG_DFL);                     Restore default action
    raise(sig);                               Now resend the signal
}
Setting the action to SIG_IGN ensures that any further SIGINT signals that come in
won't affect the clean-up action in progress. Once cleanup() is done, resetting the
action to SIG_DFL allows the system to dump core if the signal that came in would do
so. Calling raise() regenerates the signal. The regenerated signal then invokes the
default action, which most likely terminates the program. (We show the full sort.c
signal handler later in this chapter.)
10.4.4 Restartable System Calls

The EINTR value for errno (see Section 4.3, "Determining What Went Wrong,"
page 86) indicates that a system call was interrupted. While a large number of system
calls can fail with this error value, the two most important ones are read() and
write(). Consider the following code:
void handler(int signal) { /* handle signals */ }

int main(int argc, char **argv)
{
    signal(SIGINT, handler);
    while ((count = read(fd, buf, sizeof buf)) > 0) {
        /* process the buffer */
    }
    if (count == 0)
        /* end of file, clean up etc. */
    else if (count == -1)
        /* failure */
}
Suppose that the system has successfully read (and filled in) part of the buffer when
a SIGINT occurs. The read() system call has not yet returned from the kernel to the
program, but the kernel decides that it can deliver the signal. handler() is called, runs,
and returns into the middle of the read(). What does read() return?
In days of yore (V7, earlier System V systems), read() would return -1 and set errno
to EINTR. There was no way to tell that data had been transferred. In this case, V7 and
System V act as if nothing happened: No data are transferred to or from the user's
buffer, and the file offset isn't changed.
4.2 BSD changed this. There were two cases:
Slow devices
A "slow device" is essentially a terminal or almost anything but a regular file. In
this case, read() could fail with EINTR only if no data were transferred when the
signal arrived. Otherwise, the system call would be restarted, and read() would
return normally.
Regular files
The system call would be restarted. In this case, read() would return normally;
the return value would be either as many bytes as were requested or the number
of bytes actually readable (such as when reading close to the end of the file).
The BSD behavior is clearly valuable; you can always tell how much data you've read.
The POSIX behavior is similar, but not identical, to the original BSD behavior.
POSIX indicates that read()3 fails with EINTR only if a signal occurred before any
data were transferred. Although POSIX doesn't say anything about "slow devices," in
practice, this condition only occurs on such devices.
Otherwise, if a signal interrupts a partially successful read(), the return is the
number of bytes read so far. For this reason (as well as being able to handle short files),
you should always check the return value from read() and never assume that it read
the full number of bytes requested. (The POSIX sigaction() API, described later,
allows you to get the behavior of BSD restartable system calls if you want it.)
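The advice about checking read()'s return value can be sketched as a loop that copes with both short reads and EINTR. The helper name read_all() is hypothetical; this is one plausible way to structure such a loop, not code from any library discussed in this chapter.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* read_all --- hypothetical helper: keep calling read() until count bytes
   have arrived, end of file is reached, or a real error occurs.
   Returns the number of bytes placed in buf, or -1 on error. */
ssize_t read_all(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);

        if (n > 0)
            total += (size_t) n;    /* short read: go around for the rest */
        else if (n == 0)
            break;                  /* end of file */
        else if (errno != EINTR)
            return -1;              /* real failure (data read so far is not reported) */
        /* else: interrupted before any data were transferred; try again */
    }

    return (ssize_t) total;
}
```

A caller still has to compare the result against the amount requested, since end of file can legitimately produce fewer bytes than were asked for.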
3 Although we are describing read(), the rules apply to all system calls that can fail with EINTR, such as those of
the wait() family.
10.4.4.1 Example: GNU Coreutils safe_read() and safe_write()

The GNU Coreutils use two routines, safe_read() and safe_write(), to handle
the EINTR case on traditional systems. The code is a bit complicated by the fact that
the same file, by means of #include and macros, implements both functions. From
lib/safe-read.c in the Coreutils distribution:
 1  /* An interface to read and write that retries after interrupts.
 2     Copyright (C) 1993, 1994, 1998, 2002 Free Software Foundation, Inc.
... lots of boilerplate stuff omitted ...
56
57  #ifdef SAFE_WRITE
58  # include "safe-write.h"
59  # define safe_rw safe_write              Create safe_write()
60  # define rw write                        Use write() system call
61  #else
62  # include "safe-read.h"
63  # define safe_rw safe_read               Create safe_read()
64  # define rw read                         Use read() system call
65  # undef const
66  # define const /* empty */
67  #endif
68
69  /* Read(write) up to COUNT bytes at BUF from(to) descriptor FD, retrying if
70     interrupted.  Return the actual number of bytes read(written), zero for EOF,
71     or SAFE_READ_ERROR(SAFE_WRITE_ERROR) upon error.  */
72  size_t
73  safe_rw (int fd, void const *buf, size_t count)
74  {
75    ssize_t result;
76
77    /* POSIX limits COUNT to SSIZE_MAX, but we limit it further, requiring
78       that COUNT <= INT_MAX, to avoid triggering a bug in Tru64 5.1.
79       When decreasing COUNT, keep the file pointer block-aligned.
80       Note that in any case, read(write) may succeed, yet read(write)
81       fewer than COUNT bytes, so the caller must be prepared to handle
82       partial results.  */
83    if (count > INT_MAX)
84      count = INT_MAX & ~8191;
85
86    do
87      {
88        result = rw (fd, buf, count);
89      }
90    while (result < 0 && IS_EINTR (errno));
91
92    return (size_t) result;
93  }
Lines 57-67 handle the definitions, creating safe_read() and safe_write(), as
appropriate (see safe-write.c, below).
Lines 77-84 are indicative of the kinds of complications found in the real world.
Here, one particular Unix variant can't handle count values greater than INT_MAX, so
lines 83-84 perform two operations at once: reducing the count to below INT_MAX and
keeping the amount a multiple of 8192. The latter operation maintains the efficiency
of the I/O operations: Doing I/O in multiples of the fundamental disk block size is always
more efficient than doing it in odd amounts. As the comment notes, the code
maintains the semantics of read() and write(), where the returned count may be
less than the requested count.
Note that the count parameter can indeed be greater than INT_MAX, since count is
a size_t, which is unsigned. INT_MAX is a plain int, which on all modern systems
is signed.
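The arithmetic on lines 83-84 can be tried in isolation. In the following sketch, clamp_count() is a hypothetical name of ours that wraps just those two lines: since 8191 is 2^13 - 1, ANDing with ~8191 clears the low 13 bits, which rounds INT_MAX down to a multiple of 8192.

```c
#include <limits.h>
#include <stddef.h>

/* clamp_count --- hypothetical wrapper around the test on lines 83-84:
   counts above INT_MAX are cut back to the largest multiple of 8192
   that still fits in an int; smaller counts pass through unchanged. */
size_t clamp_count(size_t count)
{
    if (count > INT_MAX)
        count = INT_MAX & ~8191;    /* clear the low 13 bits of INT_MAX */

    return count;
}
```

With a 32-bit int, the clamped value works out to 2,147,475,456, which is 262,143 blocks of 8192 bytes.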
Lines 86-90 are the actual loop, performing the operation repeatedly, as long as it
fails with EINTR. The IS_EINTR() macro isn't shown, but it handles the case for systems
on which EINTR isn't defined. (There must be at least one out there or the code wouldn't
bother setting up the macro; it was probably done for a Unix or POSIX emulation on
top of a non-Unix system.)
Here is safe-write.c:
 1  /* An interface to write that retries after interrupts.
 2     Copyright (C) 2002 Free Software Foundation, Inc.
... lots of boilerplate stuff omitted ...
17
18  #define SAFE_WRITE
19  #include "safe-read.c"
The #define on line 18 defines SAFE_WRITE; this ties in to lines 57-60 in
safe-read.c.
10.4.4.2 GLIBC Only: TEMP_FAILURE_RETRY()

The GLIBC <unistd.h> file defines a macro, TEMP_FAILURE_RETRY(), that you
can use to encapsulate any system call that can fail and set errno to EINTR. Its "declaration"
is as follows:
#include <unistd.h>                                       GLIBC
long int TEMP_FAILURE_RETRY(expression);
Here is the macro's definition:

/* Evaluate EXPRESSION, and repeat as long as it returns -1 with `errno'
   set to EINTR.  */
# define TEMP_FAILURE_RETRY(expression) \
  (__extension__ \
    ({ long int __result; \
       do __result = (long int) (expression); \
       while (__result == -1L && errno == EINTR); \
       __result; }))
The macro uses a GCC extension to the C language (as marked by the
__extension__ keyword) which allows brace-enclosed statements inside parentheses
to return a value, thus acting like a simple expression.
Using this macro, we might rewrite safe_read() as follows:

size_t safe_read(int fd, void *buf, size_t count)
{
    ssize_t result;

    /* Limit count as per comment earlier. */
    if (count > INT_MAX)
        count = INT_MAX & ~8191;

    result = TEMP_FAILURE_RETRY(read(fd, buf, count));

    return (size_t) result;
}
10.4.5 Race Conditions and sig_atomic_t (ISO C)

So far, handling one signal at a time looks straightforward: install a signal handler
in main() and (optionally) have the signal handler reinstall itself (or set the action to
SIG_IGN) as the first thing it does.
What happens though if two identical signals come in, one right after the other? In
particular, what if your system resets the signal's action to the default, and the second one
comes in after the signal handler is called but before it can reinstall itself?
Or, suppose you're using bsd_signal(), so the handler stays installed, but the
second signal is different from the first one? Usually, the first signal handler needs to
complete its job before the second one runs, and every signal handler shouldn't have
to temporarily ignore all other possible signals!
Both of these are race conditions. One workaround for these problems is to make
signal handlers as simple as possible. You can do this by creating flag variables that
indicate that a signal occurred. The signal handler sets the variable to true and returns.
Then the main logic checks the flag variable at strategic points:
int sig_int_flag = 0;        /* signal handler sets to true */

void int_handler(int signum)
{
    sig_int_flag = 1;
}

int main(int argc, char **argv)
{
    bsd_signal(SIGINT, int_handler);
    ... program proceeds on ...
    if (sig_int_flag) {
        /* SIGINT occurred, handle it */
    }
    ... rest of logic ...
}
(Note that this strategy reduces the window of vulnerability but does not eliminate it.)
Standard C introduces a special type, sig_atomic_t, for use with such flag variables.
The idea behind the name is that assignments to variables of this type are atomic:
That is, they happen in one indivisible action. For example, on most machines,
assignment to an int value happens atomically, whereas a structure assignment is likely to
be done either by copying all the bytes with a (compiler-generated) loop, or by issuing
a "block move" instruction that can be interrupted. Since assignment to a sig_atomic_t
value is atomic, once started, it completes before another signal can come in and
interrupt it.
Having a special type is only part of the story. sig_atomic_t variables should also
be declared volatile:
volatile sig_atomic_t sig_int_flag = 0;        /* signal handler sets to true */
... rest of code as before ...
The volatile keyword tells the compiler that the variable can be changed externally,
behind the compiler's back, so to speak. This keeps the compiler from doing optimizations
that might otherwise affect the code's correctness.
Structuring an application exclusively around sig_atomic_t variables is not reliable.
The correct way to deal with signals is shown later, in Section 10.7, "Signals for
Interprocess Communication," page 379.
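For concreteness, here is one way the flag-variable approach might look as compilable code. The function count_until_interrupted() and its interrupt_at parameter are hypothetical; the raise() call merely stands in for a user typing CTRL-C partway through the work. As just noted, this narrows the race window without closing it.

```c
#include <signal.h>

static volatile sig_atomic_t sig_int_flag = 0;   /* signal handler sets to true */

static void int_handler(int signum)
{
    (void) signum;
    sig_int_flag = 1;      /* do the bare minimum and return */
}

/* count_until_interrupted --- hypothetical work loop: perform up to limit
   units of "work", checking the flag before each one. After interrupt_at
   units, simulate CTRL-C with raise(). Returns the units completed. */
long count_until_interrupted(long limit, long interrupt_at)
{
    long done = 0;

    sig_int_flag = 0;
    if (signal(SIGINT, int_handler) == SIG_ERR)
        return -1;

    while (done < limit) {
        if (sig_int_flag)          /* strategic point: top of each iteration */
            break;

        done++;                    /* stand-in for a unit of real work */

        if (done == interrupt_at)
            raise(SIGINT);         /* stand-in for the user pressing CTRL-C */
    }

    return done;
}
```

The loop notices the flag at the top of the next iteration, so the unit of work in progress when the signal arrives is always completed first.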
10.4.6 Additional Caveats

The POSIX standard provides several caveats for signal handlers:
• It is undefined what happens when handlers for SIGFPE, SIGILL, SIGSEGV, or
any other signals that represent "computation exceptions" return.
• If a handler was invoked as a result of calls to abort(), raise(), or kill(), the
handler cannot call raise(). abort() is described in Section 12.4, "Committing
Suicide: abort()," page 445, and kill() is described later in this chapter. (The
sigaction() API, with the three-argument signal handler described later, makes
it possible to tell if this is the case.)
• Signal handlers can only call the functions in Table 10.2. In particular, they should
avoid <stdio.h> functions. The problem is that an interrupt may come in while
a <stdio.h> function is running, when the internal state of the library is in the
middle of being updated. Further calls to <stdio.h> functions could corrupt the
internal state.
The list in Table 10.2 comes from Section 2.4 of the System Interfaces volume of the
2001 POSIX standard. Many of these functions are advanced APIs not otherwise covered
in this volume.
TABLE 10.2
Functions that can be called from a signal handler
_Exit()            fpathconf()            raise()          sigqueue()
_exit()            fstat()                read()           sigset()
accept()           fsync()                readlink()       sigsuspend()
access()           ftruncate()            recv()           sleep()
aio_error()        getegid()              recvfrom()       socket()
aio_return()       geteuid()              recvmsg()        socketpair()
aio_suspend()      getgid()               rename()         stat()
alarm()            getgroups()            rmdir()          symlink()
bind()             getpeername()          select()         sysconf()
cfgetispeed()      getpgrp()              sem_post()       tcdrain()
cfgetospeed()      getpid()               send()           tcflow()
cfsetispeed()      getppid()              sendmsg()        tcflush()
cfsetospeed()      getsockname()          sendto()         tcgetattr()
chdir()            getsockopt()           setgid()         tcgetpgrp()
chmod()            getuid()               setpgid()        tcsendbreak()
chown()            kill()                 setsid()         tcsetattr()
clock_gettime()    link()                 setsockopt()     tcsetpgrp()
close()            listen()               setuid()         time()
connect()          lseek()                shutdown()       timer_getoverrun()
creat()            lstat()                sigaction()      timer_gettime()
dup()              mkdir()                sigaddset()      timer_settime()
dup2()             mkfifo()               sigdelset()      times()
execle()           open()                 sigemptyset()    umask()
execve()           pathconf()             sigfillset()     uname()
fchmod()           pause()                sigismember()    unlink()
fchown()           pipe()                 signal()         utime()
fcntl()            poll()                 sigpause()       wait()
fdatasync()        posix_trace_event()    sigpending()     waitpid()
fork()             pselect()              sigprocmask()    write()
10.4.7 Our Story So Far, Episode I

Signals are a complicated topic, and it's about to get more confusing. So, let's pause
for a moment, take a step back, and summarize what we've discussed so far:
• Signals are an indication that some external event has occurred.
• raise() is the ISO C function for sending signals to the current process. We have
yet to describe how to send signals to other processes.
• signal() controls the disposition of a signal: that is, the process's reaction to the
signal when it comes in. The signal may be left set to the system default, ignored,
or caught.
• A handler function runs when a signal is caught. Here is where complexity starts
to rear its ugly head:
  • ISO C leaves as unspecified whether signal disposition is restored to its default
  before the handler runs or whether the disposition remains in place. The former
  is the behavior of V7 and modern System V systems such as Solaris. The latter
  is the BSD behavior also found on GNU/Linux. (The POSIX bsd_signal()
  function may be used to force BSD behavior.)
  • What happens when a system call is interrupted by a signal also varies along
  the traditional vs. BSD line. Traditional systems return -1 with errno set to
  EINTR. BSD systems restart the system call after the handler returns. The GLIBC
  TEMP_FAILURE_RETRY() macro can help you write code to handle system calls
  that return -1 with errno set to EINTR.
  POSIX requires that a system call that has partially completed return a success
  value indicating how much succeeded. A system call that hasn't started yet
  is restarted.
• The signal() mechanism provides fertile ground for growing race conditions.
The ISO C sig_atomic_t data type helps with this situation but doesn't solve
it, and the mechanism as defined can't be made safe from race conditions.
• A number of additional caveats apply, and in particular, only a subset of the
standard library functions can be safely called from within a signal handler.
Despite the problems, for simple programs, the signal() interface is adequate, and
it is still widely used.
10.5 The System V Release 3 Signal APIs: sigset() et al.

4.0 BSD (circa 1980) introduced additional APIs to provide "reliable" signals.4 In
particular, it became possible to block signals. In other words, a program could tell the
kernel, "hang on to these particular signals for the next little while, and then deliver
them to me when I'm ready to take them." A big advantage is that this feature simplifies
signal handlers, which automatically run with their own signal blocked (to avoid the
two-signals-in-a-row problem) and possibly with others blocked as well.
System V Release 3 (circa 1984) picked up these APIs and popularized them; in most
Unix-related documentation and books, you'll probably see these APIs referred to as
being from System V Release 3. The functions are as follows:
4 The APIs required linking with a separate library, -ljobs, in order to be used.
#include <signal.h>                                       XSI
int sighold(int sig);                      Add sig to process signal mask
int sigrelse(int sig);                     Remove sig from process signal mask
int sigignore(int sig);                    Short for sigset(sig, SIG_IGN)
int sigpause(int sig);                     Suspend process, allow sig to come in
void (*sigset(int sig, void (*disp)(int)))(int);     sighandler_t sigset(int sig, sighandler_t disp);
The POSIX standard for these functions describes their behavior in terms of each
process's process signal mask. The process signal mask tracks which signals (if any) a
process currently has blocked. This is described in more detail in Section 10.6.2, "Signal
Sets: sigset_t and Related Functions," page 368. In the System V Release 3 API there
is no way to retrieve or modify the process signal mask as a whole. The functions work
as follows:
int sighold(int sig)
Adds sig to the list of blocked signals (the process signal mask).
int sigrelse(int sig)
Removes (releases) sig from the process signal mask.
int sigignore(int sig)
Ignores sig. This is a convenience function.
int sigpause(int sig)
Removes sig from the process signal mask, and then suspends the process until
a signal comes in (see Section 10.7, "Signals for Interprocess Communication,"
page 379).
sighandler_t sigset(int sig, sighandler_t disp)
Is a replacement for signal(). (We've used the GNU/Linux manpage notation
here to make the declaration easier to read.)
For sigset(), the disp argument can be SIG_DFL, SIG_IGN, or a function
pointer, just as for signal(). However, it may also be SIG_HOLD. In this case, sig is
added to the process's process signal mask, but its associated action is otherwise
unchanged. (In other words, if it had a handler, the handler is still in place; if it was the
default action, that has not changed.)
When sigset() is used to install a signal handler and the signal comes in, the kernel
first adds the signal to the process signal mask, blocking any additional receipt of that
signal. The handler runs, and when it returns, the kernel restores the process signal
mask to what it was before the handler ran. (In the POSIX model, if a signal handler
changes the signal mask, that change is overridden by the restoration of the previous
mask when the handler returns.)
sighold() and sigrelse() may be used together to bracket so-called critical sections
of code: chunks of code that should not be interrupted by particular signals so that no
data structures are corrupted by code from a signal handler.
NOTE POSIX standardizes these APIs, since a major goal of POSIX is to
formalize existing practice, wherever possible. However, the sigaction() APIs
described shortly let you do everything that these APIs do, and more. You should
not use these APIs in new programs. Instead, use sigaction(). (We note that
there isn't even a sigset(2) GNU/Linux manpage!)
10.6 POSIX Signals

The POSIX API is based on the sigvec() API from 4.2 and 4.3 BSD. With minor
changes, this API was able to subsume the functionality of both the V7 and System V
Release 3 APIs. POSIX made these changes and renamed the API sigaction(). Because
the sigvec() interface was not widely used, we don't describe it. Instead, this section
describes only sigaction(), which is what you should use anyway. (Indeed, the
4.4 BSD manuals from 1994 mark sigvec() as obsolete, pointing the reader to
sigaction().)
10.6.1 Uncovering the Problem

What's wrong with the System V Release 3 APIs? After all, they provide signal
blocking, so signals aren't lost and any given signal can be handled reliably.
The answer is that the API works with only one signal at a time. Programs generally
handle more than one signal. And when you're in the middle of handling one signal,
you don't want to have to worry about handling another one. (Suppose you've just
answered your office phone when your cell phone starts ringing: You'd prefer to have
the phone system tell your caller you're on another line and you'll be there shortly,
instead of having to do it yourself.)
With the sigset() API, each signal handler would have to temporarily block all
the other signals, do its job, and then unblock them. The problem is that in the interval
between any two calls to sighold(), a not-yet-blocked signal could come up. The
scenario is rife, once again, with race conditions.
The solution is to make it possible to work with groups of signals atomically, that
is, with one system call. You effect this by working with signal sets and the process
signal mask.
10.6.2 Signal Sets: sigset_t and Related Functions

The process signal mask is a list of signals that a process currently has blocked. The
strength of the POSIX API is that the process signal mask can be manipulated atomi-
cally, as a whole.
The process signal mask is represented programmatically with a signal set. This is the
sigset_t type. Conceptually, it's just a bitmask, with 0 and 1 values in the mask
representing a particular signal's absence or presence in the mask:
/* Signal mask manipulated directly. DO NOT DO THIS! */
int mask = (1 << SIGHUP) | (1 << SIGINT);    /* bitmask for SIGHUP and SIGINT */
However, because a system can have more signals than can be held in a single int
or long and because heavy use of the bitwise operators is hard to read, several APIs
exist to manipulate signal sets:
#include <signal.h>                                       POSIX
int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signum);
int sigdelset(sigset_t *set, int signum);
int sigismember(const sigset_t *set, int signum);
The functions are as follows:
int sigemptyset(sigset_t *set)
Empties out a signal set. Upon return, *set has no signals in it. Returns 0 on
success or -1 on error.
int sigfillset(sigset_t *set)
Completely fills in a signal set. Upon return, *set contains all the signals defined
by the system. Returns 0 on success or -1 on error.
int sigaddset(sigset_t *set, int signum)
Adds signum to the process signal mask in *set. Returns 0 on success or -1
on error.
int sigdelset(sigset_t *set, int signum)
Removes signum from the process signal mask in *set. Returns 0 on success or
-1 on error.
int sigismember(const sigset_t *set, int signum)
Returns true/false if signum is or isn't present in *set.
You must always call one of sigemptyset() or sigfillset() before doing anything
else with a sigset_t variable. Both interfaces exist because sometimes you want to
start out with an empty set and then just work with one or two signals, and other times
you want to work with all signals, possibly taking away one or two.
10.6.3 Managing the Signal Mask: sigprocmask() et al.

The process signal mask starts out empty: initially, no signals are blocked. (This is
a simplification; see Section 10.9, "Signals Across fork() and exec()," page 398.)
Three functions let you work directly with the process signal mask:
#include <signal.h>                                       POSIX
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset);
int sigpending(sigset_t *set);
int sigsuspend(const sigset_t *set);
The functions are as follows:
int sigprocmask(int how, const sigset_t *set, sigset_t *oldset)
If oldset is not NULL, the current process signal mask is retrieved and placed in
*oldset. The process signal mask is then updated, according to the contents of
set and the value of how, which must be one of the following:
SIG_BLOCK     Merge the signals in *set with the current process signal mask.
              The new mask is the union of the current mask and *set.
SIG_UNBLOCK   Remove the signals in *set from the process signal mask. It
              is not a problem if *set contains a signal that is not currently
              in the process signal mask.
SIG_SETMASK   Replace the process signal mask with the contents of *set.
If set is NULL and oldset isn't, the value of how isn't important. This combination
retrieves the current process signal mask without changing it. (This is explicit in
the POSIX standard but isn't clear from the GNU/Linux manpage.)
int sigpending(sigset_t *set)
This function lets you see which signals are pending: That is, *set is filled in with
those signals that have been sent but that haven't yet been delivered since they're
blocked.
int sigsuspend(const sigset_t *set)
This function temporarily replaces the process's process signal mask with *set,
and then suspends the process until a signal is received. By definition, only a signal
not in *set can cause the function to return (see Section 10.7, "Signals for
Interprocess Communication," page 379).
10.6.4 Catching Signals: s igaction ( )

Finally, we' re ready to look at the sigaction () function. This function is compli-
cated, and we intentionally omit many details that are only for advanced uses. The
POSIX standard and the sigaction(2) manpage provide full details, although yo u must
carefully read both to fully absorb everything.
#include <signal.h> POSIX
int sigact i on ( int signum, const st r uct sigacti on *act, struct sigacti on *oldact ) ;
The arguments are as follows:
int s ignum
The signal of interes t, as with the other signal handling functions.
const st ruct sigaction *act
The new handler specification for signal signum.
st ru c t sigaction *oldact
The current handler specification. If not NULL, the system fills in *oldact before
installing *act. *act can be NULL, in which case *oldact is filled in, but n othing
else changes.
Thus, sigacti on () both sets the new handler and retrieves the old one, in one shot.
The struct sigacti on looks like this:
/* NOTE: Order in struct may vary. There may be other fields too! */
struct sigaction {
    sigset_t sa_mask;                                  Additional signals to block
    int sa_flags;                                      Control behavior
    void (*sa_handler)(int);                           May be union with sa_sigaction
    void (*sa_sigaction)(int, siginfo_t *, void *);    May be union with sa_handler
};
The fields are as follows:
sigset_t sa_mask
A set of additional signals to block when the signal handler function runs. Thus,
when the handler is invoked, the total set of blocked signals is the union of those
in the process signal mask, those in act->sa_mask, and, if SA_NODEFER is clear,
signum.
int sa_flags
Flags that control the kernel's handling of the signal. See the discussion further on.
void (*sa_handler)(int)
A pointer to a "traditional" handler function. It has the same signature (return
type and parameter list) as the handler functions for signal(), bsd_signal(),
and sigset().
void (*sa_sigaction)(int, siginfo_t *, void *)
A pointer to a "new style" handler function. The function takes three arguments,
as described shortly.
Which of act->sa_handler and act->sa_sigaction is used depends on the
SA_SIGINFO flag in act->sa_flags. When present, act->sa_sigaction is used;
otherwise, act->sa_handler is used. Both POSIX and the GNU/Linux manpage
point out that these two fields may overlap in storage (that is, be part of a union). Thus,
you should never use both fields in the same struct sigaction.
The sa_flags field is the bitwise OR of one or more of the flag values listed in
Table 10.3.

TABLE 10.3
Flag values for sa_flags

Flag            Meaning
SA_NOCLDSTOP    This flag is only meaningful for SIGCHLD. When set, the parent does not
                receive the signal when a child process is stopped by SIGSTOP, SIGTSTP,
                SIGTTIN, or SIGTTOU. These signals are discussed later, in Section 10.8.2,
                page 383.
SA_NOCLDWAIT    This flag is only meaningful for SIGCHLD. Its behavior is complicated.
                We delay explanation until later in the chapter; see Section 10.8.3,
                page 385.
SA_NODEFER      Normally, the given signal is blocked while the signal handler runs. When
                this flag is set, the given signal is not blocked while the signal
                handler runs. SA_NODEFER is the official POSIX name of the flag (which
                you should use).
SA_NOMASK       An alternative name for SA_NODEFER.
SA_SIGINFO      The signal handler takes three arguments. As mentioned, with this flag
                set, the sa_sigaction field should be used instead of sa_handler.
SA_ONSTACK      This is an advanced feature. Signal handlers can be called using
                user-provided memory as an "alternative signal stack." Such memory is given
                to the kernel for this use with sigaltstack() (see sigaltstack(2)). This
                feature is not otherwise described in this volume.
SA_RESETHAND    This flag provides the V7 behavior: The signal's action is reset to its default
                when the handler is called. SA_RESETHAND is the official POSIX name
                of the flag (which you should use).
SA_ONESHOT      An alternative name for SA_RESETHAND.
SA_RESTART      This flag provides BSD semantics: System calls that can fail with EINTR,
                and that receive this signal, are restarted.

As far as we could determine, the names SA_NOMASK and SA_ONESHOT are specific to GNU/Linux. If anyone
knows differently, please inform us!

When the SA_SIGINFO flag is set in act->sa_flags, then the act->sa_sigaction
field is a pointer to a function declared as follows:
void action_handler(int sig, siginfo_t *info, void *context)
{
    /* handler body here */
}
The siginfo_t structure provides a wealth of information about the signal:

/* POSIX 2001 definition. Actual contents likely to vary across systems. */

typedef struct {
    int si_signo;           /* signal number */
    int si_errno;           /* <errno.h> value if an error */
    int si_code;            /* signal code; see text */
    pid_t si_pid;           /* process ID of process that sent signal */
    uid_t si_uid;           /* real UID of sending process */
    void *si_addr;          /* address of instruction that faulted */
    int si_status;          /* exit value, may include death-by-signal */
    long si_band;           /* band event for SIGPOLL/SIGIO */
    union sigval si_value;  /* signal value (advanced) */
} siginfo_t;
The si_signo, si_code, and si_value fields are available for all signals. The
other fields can be members of a union and thus should be used only for the signals
for which they're defined. There may also be other fields in the siginfo_t structure.
Almost all the fields are for advanced uses. The full details are in the POSIX standard
and in the sigaction(2) manpage. However, we can describe a straightforward use of the
si_code field.
For SIGBUS, SIGCHLD, SIGFPE, SIGILL, SIGPOLL, SIGSEGV, and SIGTRAP, the
si_code field can take on any of a set of predefined values specific to each signal,
indicating the cause of the signal. Frankly, the details are a bit overwhelming; everyday code
doesn't really need to deal with them (although we'll look at the values for SIGCHLD
later on). For all other signals, the si_code member has one of the values in Table 10.4.
TABLE 10.4
Signal origin values for si_code

Value         GLIBC only   Meaning
SI_ASYNCIO                 Asynchronous I/O completed (advanced).
SI_KERNEL     yes          Kernel sent the signal.
SI_MESGQ                   Message queue state changed (advanced).
SI_QUEUE                   Signal sent from sigqueue() (advanced).
SI_SIGIO      yes          A SIGIO was queued (advanced).
SI_TIMER                   A timer expired.
SI_USER                    Signal sent by kill(). raise() and abort() are allowed
                           to produce this too, but are not required to.
In particular, the SI_USER value is useful; it allows a signal handler to tell if the signal
was sent by raise() or kill() (described later). You can use this information to avoid
calling raise() or kill() a second time.
The third argument to a three-argument signal handler, void *context, is an ad-
vanced feature, not otherwise discussed in this volume.
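To make the three-argument form concrete, here is a minimal sketch of a handler that records where a signal came from. All of the function and variable names are ours, chosen for illustration. The sketch sends the signal with kill() rather than raise(), since raise() is allowed, but not required, to report SI_USER. It also assumes that a process ID fits in a sig_atomic_t, which is true on GNU/Linux but is not promised by the standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Filled in by the handler; all names here are illustrative only. */
static volatile sig_atomic_t handler_ran = 0;
static volatile sig_atomic_t last_code = 0;    /* si_code of last signal */
static volatile sig_atomic_t last_pid = 0;     /* si_pid, if SI_USER */

/* info_handler --- three-argument handler; record the signal's origin */
static void info_handler(int sig, siginfo_t *info, void *context)
{
    (void) sig;
    (void) context;             /* advanced feature, unused here */

    last_code = info->si_code;
    if (info->si_code == SI_USER)
        last_pid = (sig_atomic_t) info->si_pid;   /* assumes PID fits */
    handler_ran = 1;
}

/* install_info_handler --- returns 0 on success, -1 on error */
int install_info_handler(int sig)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    act.sa_sigaction = info_handler;    /* never set sa_handler as well */
    act.sa_flags = SA_SIGINFO;          /* select the three-argument form */
    sigemptyset(&act.sa_mask);

    return sigaction(sig, &act, NULL);
}

/* send_to_self --- send sig to this process with kill() */
int send_to_self(int sig)
{
    return kill(getpid(), sig);
}

/* last_signal_from_user --- true if the last signal carried SI_USER */
int last_signal_from_user(void)
{
    return handler_ran && last_code == SI_USER;
}

/* last_sender --- PID recorded for the last SI_USER signal */
pid_t last_sender(void)
{
    return (pid_t) last_pid;
}
```

After 'install_info_handler(SIGUSR1)' and 'send_to_self(SIGUSR1)', we would expect last_sender() to report the process's own PID.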
Finally, to see sigaction() in use, examine the full text of the signal handler for
sort.c:
2074  static void
2075  sighandler (int sig)
2076  {
2077  #ifndef SA_NOCLDSTOP                      On old style system ...
2078    signal (sig, SIG_IGN);                  - Use signal() to ignore sig
2079  #endif                                    - Otherwise, sig automatically blocked
2080
2081    cleanup ();                             Run cleanup code
2082
2083  #ifdef SA_NOCLDSTOP                       On POSIX style system ...
2084    {
2085      struct sigaction sigact;
2086
2087      sigact.sa_handler = SIG_DFL;          - Set action to default
2088      sigemptyset (&sigact.sa_mask);        - No additional signals to block
2089      sigact.sa_flags = 0;                  - No special action to take
2090      sigaction (sig, &sigact, NULL);       - Put it in place
2091    }
2092  #else                                     On old style system ...
2093    signal (sig, SIG_DFL);                  - Set action to default
2094  #endif
2095
2096    raise (sig);                            Resend the signal
2097  }
Here is the code in main() that puts the handler in place:

2214  #ifdef SA_NOCLDSTOP                       On a POSIX system ...
2215    {
2216      unsigned i;
2217      sigemptyset (&caught_signals);
2218      for (i = 0; i < nsigs; i++)           - Block all signals
2219        sigaddset (&caught_signals, sigs[i]);
2220      newact.sa_handler = sighandler;       - Signal handling function
2221      newact.sa_mask = caught_signals;      - Set process signal mask for handler
2222      newact.sa_flags = 0;                  - No special flags
2223    }
2224  #endif
2225
2226    {
2227      unsigned i;
2228      for (i = 0; i < nsigs; i++)           For all signals ...
2229        {
2230          int sig = sigs[i];
2231  #ifdef SA_NOCLDSTOP
2232          sigaction (sig, NULL, &oldact);   - Retrieve old handler
2233          if (oldact.sa_handler != SIG_IGN) - If not ignoring this signal
2234            sigaction (sig, &newact, NULL); - Install our handler
2235  #else
2236          if (signal (sig, SIG_IGN) != SIG_IGN)
2237            signal (sig, sighandler);       - Same logic with old API
2238  #endif
2239        }
2240    }
We note that lines 2216-2219 and 2221 could be replaced with the single call:
sigfillset(& newact.sa_mask);
We don't know why the code is written the way it is.
Also of interest are lines 2233-2234 and 2236-2237, which show the correct way
to check whether a signal is being ignored and to install a handler only if it's not.
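The "install only if not ignored" idiom can be packaged as a small function. The following is a sketch of that idea using sigaction() only; the names install_unless_ignored() and sample_handler() are hypothetical and do not come from sort.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* sample_handler --- placeholder handler used only for illustration */
void sample_handler(int sig)
{
    (void) sig;
}

/*
 * install_unless_ignored --- install handler for sig only if sig is not
 * currently being ignored. Returns 1 if the handler was installed, 0 if
 * the signal was left ignored, and -1 on error.
 */
int install_unless_ignored(int sig, void (*handler)(int))
{
    struct sigaction oldact, newact;

    if (sigaction(sig, NULL, &oldact) < 0)   /* retrieve, change nothing */
        return -1;
    if (oldact.sa_handler == SIG_IGN)
        return 0;                            /* respect inherited setting */

    newact.sa_handler = handler;
    sigemptyset(&newact.sa_mask);
    newact.sa_flags = 0;
    if (sigaction(sig, &newact, NULL) < 0)
        return -1;

    return 1;
}
```

This matters most for programs started by a shell with certain signals already ignored (for example, background jobs under shells without job control); such a program should leave those signals alone.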
NOTE The sigaction() API and the signal() API should not be used
together for the same signal. Although POSIX goes to great lengths to make it
possible to use signal() initially, retrieve a struct sigaction representing
the disposition from signal(), and restore it, it's still a bad idea. Code will
be easier to read, write, and understand if you use one API or the other,
exclusively.
10.6.5 Retrieving Pending Signals: sigpending()

The sigpending() system call, described earlier, lets you retrieve the set of signals
that are pending, that is, those that have come in, but are not yet delivered because they
were blocked:
#include <signal.h>                                          POSIX
int sigpending(sigset_t *set);
Besides unblocking the pending signals so that they get delivered, you may choose
to ignore them. Setting the action for a pending signal to SIG_IGN causes the pending
signal to be discarded (even if it was blocked). Similarly, for those signals for which the
default action is to ignore the signal, setting the action to SIG_DFL causes such a
pending signal to also be discarded.
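The following sketch shows one way to observe this behavior: a blocked signal shows up in the pending set, and briefly setting its action to SIG_IGN makes it disappear. The helper names is_pending() and discard_pending() are ours, invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* is_pending --- return 1 if sig is pending, 0 if not, -1 on error */
int is_pending(int sig)
{
    sigset_t pending;

    if (sigpending(&pending) < 0)
        return -1;

    return sigismember(&pending, sig);
}

/*
 * discard_pending --- throw away a pending instance of sig by briefly
 * setting its action to SIG_IGN, then put the previous action back.
 * Returns 0 on success, -1 on error.
 */
int discard_pending(int sig)
{
    struct sigaction ignore, oldact;

    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;

    if (sigaction(sig, &ignore, &oldact) < 0)  /* pending sig is dropped */
        return -1;

    return sigaction(sig, &oldact, NULL);      /* restore old action */
}
```

To try it, block SIGUSR1 with sigprocmask(), send it with 'raise(SIGUSR1)', and call is_pending() before and after discard_pending().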
10.6.6 Making Functions Interruptible: siginterrupt()

As a convenience, the siginterrupt() function can be used to make functions
interruptible for a particular signal or to make them restartable, depending on the value
of the second argument. The declaration is:
#include <signal.h> XSI
int siginterrupt(int sig, int flag ) ;
According to the POSIX standard, the behavior of siginterrupt() is equivalent
to the following code:
int siginterrupt(int sig, int flag)
{
    int ret;
    struct sigaction act;

    (void) sigaction(sig, NULL, &act);      Retrieve old setting
    if (flag)                               If flag is true ...
        act.sa_flags &= ~SA_RESTART;        Disable restarting
    else                                    Otherwise ...
        act.sa_flags |= SA_RESTART;         Enable restarting
    ret = sigaction(sig, &act, NULL);       Put new setting in place

    return ret;                             Return result
}
The return value is 0 on success or -1 on error.
10.6.7 Sending Signals: kill() and killpg()

The traditional Unix function for sending a signal is named kill(). The name is
something of a misnomer; all it does is send a signal. (Often, the result is that the signal's
recipient dies, but that need not be true. However, it's way too late now to change the
name.) The killpg() function sends a signal to a specific process group. The
declarations are:
#include <sys/types.h>                                       POSIX
#include <signal.h>
int kill(pid_t pid, int sig);

int killpg(int pgrp, int sig);                               XSI
The sig argument is either a signal name or 0. In the latter case, no signal is sent,
but the kernel still performs error checking. In particular, this is the correct way to
verify that a given process or process group exists, as well as to verify that you have
permission to send signals to the process or process group. kill() returns 0 on success
and -1 on error; errno then indicates the problem.
The rules for the pid value are a bit complicated:
pid > 0     pid is a process number, and the signal is sent to that process.
pid = 0     The signal is sent to every process in the sending process's process group.
pid = -1    The signal is sent to every process on the system except for any special
            system processes. Permission checking still applies. On GNU/Linux
            systems, only the init process (PID 1) is excluded, but other systems may
            have other special processes.
pid < -1    The signal is sent to the process group represented by the absolute value
            of pid. Thus, you can send a signal to an entire process group, duplicating
            killpg()'s functionality. This nonorthogonality provides historical
            compatibility.
The meanings of pid for kill() are similar to those of waitpid() (see Section
9.1.6.1, "Using POSIX Functions: wait() and waitpid()," page 306).
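The use of a 0 signal to probe for a process can be wrapped up as follows. This is only a sketch; process_exists() is a name we made up, and it assumes that pid is greater than 0, so that it names a single process.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * process_exists --- hypothetical helper built on 'kill(pid, 0)'.
 * Returns 1 if the process exists (even if we may not signal it),
 * 0 if there is no such process, and -1 on any other error.
 */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;               /* exists, and we may signal it */
    if (errno == EPERM)
        return 1;               /* exists, but permission is denied */
    if (errno == ESRCH)
        return 0;               /* no such process */

    return -1;
}
```

Keep in mind that the answer can be out of date by the time the caller acts on it, since the process may exit right after the check.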
The Standard C function raise() is essentially equivalent to
int raise(int sig)
{
    return kill(getpid(), sig);
}
The C standards committee chose the name raise() because C also has to work in
non-Unix environments, and kill() was considered specific to Unix. It was also a
good opportunity to use a more descriptive name for the function.
killpg() sends a signal to a process group. As long as the pgrp value is greater than
1, it's equivalent to 'kill(-pgrp, sig)'. The GNU/Linux killpg(2) manpage states
that if pgrp is 0, the signal is sent to the sending process's process group. (This is the
same as kill().)
As you might imagine, you cannot send signals to arbitrary processes (unless you are
the superuser, root). For ordinary users, the real or effective UID of the sending process
must match the real or saved set-user-ID of the receiving process. (The different UIDs
are described in Section 11.1.1, "Real and Effective IDs," page 405.)
However, SIGCONT is a special case: As long as the receiving process is a member of
the same session as the sender, the signal will go through. (Sessions were described
briefly in Section 9.2.1, "Job Control Overview," page 312.) This special rule allows a
job control shell to continue a stopped descendant process, even if that stopped process
is running with a different user ID.
10.6.8 Our Story So Far, Episode II

The System V Release 3 API was intended to remedy the various problems presented
by the original V7 signal APIs. The notion of signal blocking, in particular, is an impor-
tant additional concept.
However, those APIs didn't go far enough, since they worked on only one signal at
a time, leaving wide open plenty of windows through which undesired signals could
arrive. The POSIX APIs, by working atomically on multiple signals (the process signal
mask, represented programmatically by the sigset_t type), solve this problem, closing
the windows.
The first set of functions we examined manipulate sigset_t values: sigfillset(),
sigemptyset(), sigaddset(), sigdelset(), and sigismember().
The next set works with the process signal mask: sigprocmask() sets and retrieves
the process signal mask, sigpending() retrieves the set of pending signals, and
sigsuspend() puts a process to sleep, temporarily replacing the process signal mask
with the one in its parameter.
The POSIX sigaction() API is (severely) complicated by the need to supply
• Backward-compatible behavior: SA_RESETHAND and SA_RESTART in the sa_flags
  field.
• A choice as to whether or not the received signal is also blocked: SA_NODEFER for
  sa_flags.
• The ability to have two different kinds of signal handlers: one-argument or
  three-argument.
• A choice of behaviors for managing SIGCHLD: SA_NOCLDSTOP and SA_NOCLDWAIT
  for sa_flags.
The siginterrupt() function is a convenience API for enabling or disabling
restartable system calls for a given signal.
Finally, kill() and killpg() can be used to send signals, not just to the current
process but to other processes as well (permissions permitting, of course).
10.7 Signals for Interprocess Communication

"THIS IS A TERRIBLE IDEA! SIGNALS ARE NOT MEANT
FOR THIS! Just say NO."
-Geoff Collyer-
One of the primary mechanisms for interprocess communication (IPC) is the pipe,
which is described in Section 9.3, "Basic Interprocess Communication: Pipes and
FIFOs," page 315. It is possible to use signals for very simple IPC as well.5 Doing so is
rather clumsy; the recipient can only tell that a particular signal came in. While the
sigaction() API does allow the recipient to learn the PID and owner of the process
that sent the signal, such information usually isn't terribly helpful.
NOTE As the opening quote indicates, using signals for IPC is almost always
a bad idea. We recommend avoiding it if possible. But our goal is to teach you
how to use the Linux/Unix facilities, including their negative points, leaving it
to you to make an informed decision about what to use.
Signals as IPC may sometimes be the only choice for many programs. In particular,
pipes are not an option if two communicating programs were not started by a common
parent, and FIFO files may not be an option if one of the communicating programs
only works with standard input and output. (One instance in which signals are
commonly used is with certain system daemon programs, such as xinetd, which accepts
several signals advising that it should reread its control file, do a consistency check, and
so on. See xinetd(8) on a GNU/Linux system, and inetd(8) on a Unix system.)
The typical high-level structure of a signal-based application looks like this:
for (;;) {
    Wait for signal
    Process signal
}
The original V7 interface to wait for a signal is pause():
5 Our thanks to Ulrich Drepper for helping us understand the issues involved.

#include <unistd.h>                                          POSIX
int pause(void);
pause() suspends a process; it only returns after both a signal has been delivered
and the signal handler has returned. pause(), by definition, is only useful with caught
signals: ignored signals are ignored when they come in, and signals with a default action
that terminates the process (with or without a core file) still do so.
The problem with the high-level application structure just described is the Process
signal part. When that code is running, you don't want to have to handle another
signal; you want to finish processing the current signal before going on to the next one.
One solution is to structure the signal handler to set a flag and check for that flag
within the main loop:
volatile sig_atomic_t signal_waiting = 0;    /* true if undealt-with signals */

void handler(int sig)
{
    signal_waiting = 1;
    Set up any other data indicating which signal
}
In the mainline code, check the flag:

for (;;) {
    if (! signal_waiting) {             If another signal came in,
        pause();                        this code is skipped
        signal_waiting = 1;
    }
    Determine which signal came in

    signal_waiting = 0;
    Process the signal
}
Unfortunately, this code is rife with race conditions:

for (;;) {
    if (! signal_waiting) {
        <--- Signal could arrive here, after condition checked!
        pause();                        pause() would be called anyway
        signal_waiting = 1;
    }
    Determine which signal came in      <--- A signal here could overwrite global data
    signal_waiting = 0;
    Process the signal                  <--- Same here, especially if multiple signals
}
The solution is to keep the signal of interest blocked at all times, except when waiting
for it to arrive. For example, suppose SIGINT is the signal of interest:
void handler(int sig)
{
    /* sig is automatically blocked with sigaction() */
    Set any global data about this signal
}

int main(int argc, char **argv)
{
    sigset_t set;
    struct sigaction act;

    ... usual setup, process options, etc. ...

    sigemptyset(& set);                     Initialize set to empty
    sigaddset(& set, SIGINT);               Add SIGINT to set
    sigprocmask(SIG_BLOCK, & set, NULL);    Block it

    act.sa_mask = set;                      Set up handler
    act.sa_handler = handler;
    act.sa_flags = 0;
    sigaction(SIGINT, & act, NULL);         Install it

    ...                                     Possibly install separate handlers
    ...                                     for other signals

    sigemptyset(& set);                     Reset to empty, allows SIGINT to arrive
    for (;;) {
        sigsuspend(& set);                  Wait for SIGINT to arrive
        Process signal                      SIGINT is again blocked here
    }

    ... any other code ...
    return 0;
}
The key to this working is that sigsuspend() temporarily replaces the process signal
mask with the one passed in as its argument. This allows SIGINT to arrive. Once it
does, it's handled; the signal handler returns and then sigsuspend() returns as well.
By the time sigsuspend() returns, the original process signal mask is back in place.
You can easily extend this paradigm to multiple signals by blocking all signals of
interest during main() and during the signal handlers, and unblocking them only in the
call to sigsuspend().
Given all this, you should not use pause() in new code. pause() is standardized
by POSIX primarily to support old code. The same is true of the System V Release 3
sigpause(). Rather, if you need to structure your application to use signals for IPC,
use the sigsuspend() and sigaction() APIs exclusively.
NOTE The example code above presumes that the process signal mask starts
out empty. Production code should instead work with whatever signal mask is
in place when the program starts.
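One way to follow that advice is to save the mask that sigprocmask() hands back and wait with that mask, minus the signal of interest. The sketch below does so; the names wait_for_signal(), note_signal(), and got_signal are ours. With a nonzero send_to_self argument, the function sends the signal to itself while the signal is blocked, purely so that the example finishes on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;    /* set by the handler */

static void note_signal(int sig)
{
    (void) sig;
    got_signal = 1;
}

/*
 * wait_for_signal --- block sig, install a handler, and wait for sig
 * with sigsuspend(). Returns 1 once the handler has run, -1 on error.
 */
int wait_for_signal(int sig, int send_to_self)
{
    sigset_t block, startmask, waitmask;
    struct sigaction act;

    sigemptyset(&block);
    sigaddset(&block, sig);
    if (sigprocmask(SIG_BLOCK, &block, &startmask) < 0)  /* save old mask */
        return -1;

    act.sa_handler = note_signal;
    act.sa_mask = block;
    act.sa_flags = 0;
    if (sigaction(sig, &act, NULL) < 0)
        return -1;

    got_signal = 0;
    if (send_to_self)
        kill(getpid(), sig);        /* stays pending: sig is blocked */

    /* Wait with the mask the program started with, minus sig. */
    waitmask = startmask;
    sigdelset(&waitmask, sig);
    while (! got_signal)
        sigsuspend(&waitmask);      /* sig can be delivered only here */

    sigprocmask(SIG_SETMASK, &startmask, NULL);   /* restore on exit */
    return 1;
}
```

The loop around sigsuspend() guards against being awakened by some other caught signal before the one we care about has arrived.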
10.8 Important Special-Purpose Signals

Several signals serve special purposes. We describe the most important ones here.
10.8.1 Alarm Clocks: sleep(), alarm(), and SIGALRM

It is often necessary to write programs of the form
while (some condition isn't true) {
    wait for a while
}
This need comes up frequently in shell scripting, for example, to wait until a particular
user has logged in:
until who | grep '^arnold' > /dev/null
do
    sleep 10
done
Two mechanisms, one lower level and one higher level, let a running process know
when a given number of seconds have passed.
10.8.1.1 Harder but with More Control: alarm() and SIGALRM

The most basic building block is the alarm() system call:
#include <unistd.h>                                          POSIX
unsigned int alarm(unsigned int seconds);
After alarm() returns, the program keeps running. However, when seconds seconds
have elapsed, the kernel sends a SIGALRM to the process. The default action is to
terminate the process, but most likely, you will instead have installed a signal handler
for SIGALRM.
The return value is either 0, or if a previous alarm had been set, the number of seconds
remaining before it would have gone off. However, there is only one such alarm for a
process; the previous alarm is canceled and the new one is put in place.
The advantage here is that with your own handler in place, you can do anything you
wish when the signal comes in. The disadvantage is that you have to be prepared to
work in multiple contexts: that of the mainline program and that of the signal handler.
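Here is a minimal sketch of alarm() combined with a handler and sigsuspend(), following the blocking discipline from Section 10.7. The names wait_for_alarm(), alarm_handler(), and alarm_fired are hypothetical. The function rejects a value of 0, since 'alarm(0)' cancels any pending alarm instead of scheduling one.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;   /* set by the handler */

static void alarm_handler(int sig)
{
    (void) sig;
    alarm_fired = 1;
}

/*
 * wait_for_alarm --- arrange for SIGALRM after the given number of
 * seconds and wait for it. Returns 1 once the alarm has gone off,
 * -1 on error or if seconds is 0.
 */
int wait_for_alarm(unsigned int seconds)
{
    sigset_t block, oldmask, waitmask;
    struct sigaction act, oldact;

    if (seconds == 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &oldmask) < 0)
        return -1;

    act.sa_handler = alarm_handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGALRM, &act, &oldact) < 0)
        return -1;

    alarm_fired = 0;
    (void) alarm(seconds);          /* replaces any earlier alarm */

    waitmask = oldmask;
    sigdelset(&waitmask, SIGALRM);
    while (! alarm_fired)
        sigsuspend(&waitmask);      /* SIGALRM is unblocked only here */

    sigaction(SIGALRM, &oldact, NULL);          /* put things back */
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return 1;
}
```

A real program would do useful work in the handler or in the main loop once alarm_fired is set; here the handler only records that the alarm arrived.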
10.8.1.2 Simple and Easy: sleep()

An easier way to wait a fixed amount of time is with sleep():
#include <unistd.h>                                          POSIX
unsigned int sleep(unsigned int seconds);
The return value is 0 if the process slept for the full amount of time. Otherwise, the
return value is the remaining time left to sleep. This latter return value can occur if a
signal came in while the process was napping.
NOTE The sleep() function is often implemented with a combination of
signal(), alarm(), and pause(). This approach makes it dangerous to mix
sleep() with your own calls to alarm() (or the setitimer() advanced
function, described in Section 14.3.3, "Interval Timers: setitimer() and
getitimer()," page 546). To learn about the nanosleep() function now,
see Section 14.3.4, "More Exact Pauses: nanosleep()," page 550.
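Because sleep() can return early, code that really needs the whole interval can use the return value to go back to sleep. The following sketch shows the idea; sleep_fully() is our own name for it.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/*
 * sleep_fully --- keep calling sleep() until the whole interval has
 * elapsed, even if caught signals cut individual calls short.
 * Returns the number of times sleep() had to be restarted.
 */
unsigned int sleep_fully(unsigned int seconds)
{
    unsigned int restarts = 0;

    /* A nonzero return is the unslept time; sleep that much more. */
    while ((seconds = sleep(seconds)) > 0)
        restarts++;

    return restarts;
}
```

Since the remaining time is reported in whole seconds, the total time slept this way is only approximate.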
10.8.2 Job Control Signals

Several signals are used to implement job control: the ability to start and stop jobs,
and move them to and from the background and foreground. At the user level, you have
undoubtedly done this: using CTRL-Z to stop a job, bg to put it in the background,
and occasionally using fg to move a background or stopped job into the foreground.
Section 9.2.1, "Job Control Overview," page 312, describes generally how job control
works. This section completes the overview by describing the job control signals, since
you may occasionally wish to catch them directly:
SIGTSTP
This signal effects a "terminal stop." It is the signal the kernel sends to the process
when the user at the terminal (or window emulating a terminal) types a particular
key. Normally, this is CTRL-Z, just as CTRL-C normally sends a SIGINT.
The default action for SIGTSTP is to stop (suspend) the process. However, you
can catch this signal, just like any other. It is a good idea to do so if your program
changes the state of the terminal. For example, consider the vi or Emacs screen
editors, which put the terminal into character-at-a-time mode. Upon receipt of
SIGTSTP, they should restore the terminal to its normal line-at-a-time mode, and
then suspend themselves.
SIGSTOP
This signal also stops a process, but it cannot be caught, blocked, or ignored. It
can be used manually (with the kill command) as a last resort, or programmatically.
For example, the SIGTSTP handler just discussed, after restoring the terminal's
state, could then use 'raise(SIGSTOP)' to stop the process.
SIGTTIN, SIGTTOU
These signals were defined earlier as "background read from tty" and "background
write to tty." A tty is a terminal device. On job control systems, processes running
in the background are blocked from reading from or writing to the terminal. When
a process attempts either operation, the kernel sends it the appropriate signal. For
both of them, the default action is to stop the process. You may catch these signals
if you wish, but there is rarely a reason to do so.
SIGCONT
This signal continues a stopped process. It is ignored if the process is not stopped.
You can catch it if you wish, but again, for most programs, there's little reason to
do so. Continuing our example, the SIGCONT handler for a screen editor should
put the terminal back into character-at-a-time mode before returning.
When a process is stopped, any other signals sent to it become pending. The exception
to this is SIGKILL, which is always delivered to the process and which cannot be caught,
blocked, or ignored. Assuming that signals besides SIGKILL have been sent, upon receipt
of a SIGCONT, the pending signals are delivered and the process then continues execution
after they've been handled.
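The screen editor example can be sketched in code as a pair of handlers. In this sketch, restore_terminal() and setup_terminal() are hypothetical stand-ins that merely flip a flag; a real editor would change the terminal settings there instead. None of these names come from any actual editor.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Stand-in for real terminal state: 1 means character-at-a-time mode. */
static volatile sig_atomic_t screen_mode = 1;

void restore_terminal(void) { screen_mode = 0; }   /* hypothetical */
void setup_terminal(void)   { screen_mode = 1; }   /* hypothetical */

int in_screen_mode(void) { return screen_mode; }

/* tstp_handler --- put the terminal back, then really stop */
static void tstp_handler(int sig)
{
    (void) sig;
    restore_terminal();
    raise(SIGSTOP);             /* cannot be caught; we stop here */
}

/* cont_handler --- we have been continued; reclaim the terminal */
static void cont_handler(int sig)
{
    (void) sig;
    setup_terminal();
}

/* install_job_control_handlers --- returns 0 on success, -1 on error */
int install_job_control_handlers(void)
{
    struct sigaction act;

    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;

    act.sa_handler = tstp_handler;
    if (sigaction(SIGTSTP, &act, NULL) < 0)
        return -1;

    act.sa_handler = cont_handler;
    return sigaction(SIGCONT, &act, NULL);
}
```

Note that a caught SIGCONT is delivered to the handler whether or not the process was actually stopped, so cont_handler() should be safe to run at any time.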
10.8.3 Parental Supervision: Three Different Strategies

As described in Section 9.1.1, "Creating a Process: fork()," page 284, one side effect
of calling fork() is the creation of parent-child relationships among processes. A parent
process can wait for one or more of its children to die and recover the child's exit status
by one of the wait() family of system calls.
Dead child processes that haven't been waited for are termed zombies. Normally,
every time a child process dies, the kernel sends a SIGCHLD signal to the parent process.6
The default action is to ignore this signal. In this case, zombie processes accrue until
the parent does a wait() or until the parent itself dies. In the latter case, the zombie
children are reparented to the init system process (PID 1), which reaps them as part
of its normal work. Similarly, active children are also reparented to init and will be
reaped when they exit.
SIGCHLD is used for more than death-of-children notification. Any time a child is
stopped (by one of the job control signals discussed earlier), SIGCHLD is also sent to the
parent. The POSIX standard indicates that SIGCHLD "may be sent" when a child is
continued as well; apparently there are differences among historical Unix systems.
A combination of flags for the sa_flags field in the struct sigaction, and the
use of SIG_IGN as the action for SIGCHLD allows you to change the way the kernel deals
with children stopping, continuing, or dying.
As with signals in general, the interfaces and mechanisms described here are complicated
because they have evolved over time.
10.8.3.1 Poor Parenting: Ignoring Children Completely

The simplest thing you can do is to change the action for SIGCHLD to SIG_IGN. In
this case, children that terminate do not become zombies. Instead, their exit status is
thrown away, and they are removed from the system entirely. Another option that
produces the same effect is use of the SA_NOCLDWAIT flag. In code:
/* Old style: */                    /* New style: */

                                    struct sigaction sa;
                                    sa.sa_handler = SIG_IGN;
signal(SIGCHLD, SIG_IGN);           sa.sa_flags = SA_NOCLDWAIT;
                                    sigemptyset(& sa.sa_mask);
                                    sigaction(SIGCHLD, & sa, NULL);

6 Historically, BSD systems used the name SIGCHLD, and this is what POSIX uses. System V had a similar signal
named SIGCLD. GNU/Linux #defines the latter to be the former; see Table 10.1.
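A quick way to see the effect is to create a child under this regime and then try to wait for it. The sketch below does that; child_leaves_no_zombie() is a name of our own choosing. It relies on waitpid() failing with ECHILD once the child's status has been thrown away, which is the behavior we expect on GNU/Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * child_leaves_no_zombie --- with SIGCHLD ignored and SA_NOCLDWAIT set,
 * create a child and try to wait for it. Returns 1 if waitpid() failed
 * with ECHILD (no status was kept), 0 if it unexpectedly succeeded,
 * and -1 on any other error.
 */
int child_leaves_no_zombie(void)
{
    struct sigaction sa, oldsa;
    pid_t pid;
    int result = -1;

    sa.sa_handler = SIG_IGN;
    sa.sa_flags = SA_NOCLDWAIT;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &oldsa) < 0)
        return -1;

    pid = fork();
    if (pid == 0)
        _exit(0);                           /* child: exit at once */
    else if (pid > 0) {
        /* Blocks until the child is gone, then has nothing to report. */
        if (waitpid(pid, NULL, 0) == -1)
            result = (errno == ECHILD) ? 1 : -1;
        else
            result = 0;
    }

    sigaction(SIGCHLD, &oldsa, NULL);       /* restore previous action */
    return result;
}
```

The previous SIGCHLD action is restored before returning, so the rest of the program is not affected by the experiment.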
10.8.3.2 Permissive Parenting: Supervising Minimally

Alternatively, you may only care about child termination and not be interested in
simple state changes (stopped, and continued). In this case, use the SA_NOCLDSTOP
flag, and set up a signal handler that calls wait() (or one of its siblings) to reap
the process.
In general, you cannot expect to get one SIGCHLD per child that dies. You should
treat SIGCHLD as meaning "at least one child has died" and be prepared to reap as many
children as possible whenever you process SIGCHLD.
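One common way to "reap as many children as possible" is to loop on 'waitpid(-1, ..., WNOHANG)' until nothing more is available. The sketch below takes that approach, which differs from tracking each child's PID individually. The names reap_all(), spawn_and_reap(), and reaped are ours.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;    /* children collected so far */

/* reap_all --- SIGCHLD handler: collect every child that is ready */
static void reap_all(int sig)
{
    int saved_errno = errno;    /* waitpid() may change errno */

    (void) sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;               /* one signal may stand for several deaths */

    errno = saved_errno;
}

/*
 * spawn_and_reap --- create n children that exit immediately and wait
 * until the handler has reaped all that were started. Returns the
 * number reaped, or -1 on error.
 */
int spawn_and_reap(int n)
{
    struct sigaction sa, oldsa;
    sigset_t block, oldmask, waitmask;
    int i, started = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &oldmask) < 0)
        return -1;

    sa.sa_handler = reap_all;
    sa.sa_flags = SA_NOCLDSTOP;         /* terminations only */
    sigfillset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &oldsa) < 0)
        return -1;

    reaped = 0;
    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(0);                   /* child */
        if (pid > 0)
            started++;
    }

    waitmask = oldmask;
    sigdelset(&waitmask, SIGCHLD);
    while (reaped < started)
        sigsuspend(&waitmask);          /* SIGCHLD delivered only here */

    sigaction(SIGCHLD, &oldsa, NULL);
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return reaped;
}
```

Because the handler uses -1 as the PID, it collects any child of the process, including ones created elsewhere in the program; code that must keep children separate needs to wait for specific PIDs instead.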
The following program, ch10-reap1.c, blocks SIGCHLD until it's ready to recover
the children.
 1  /* ch10-reap1.c --- demonstrate SIGCHLD management, using a loop */
 2
 3  #include <stdio.h>
 4  #include <errno.h>
 5  #include <signal.h>
 6  #include <string.h>
 7  #include <sys/types.h>
 8  #include <sys/wait.h>
 9
10  #define MAX_KIDS 42
11  #define NOT_USED -1
12
13  pid_t kids[MAX_KIDS];
14  size_t nkids = 0;
The kids array tracks the process IDs of children processes. If an element is
NOT_USED, then it doesn't represent an unreaped child. (Lines 89-90, below, initialize
it.) nkids indicates how many values in kids should be checked.
16  /* format_num --- helper function since can't use [sf]printf() */
17
18  const char *format_num(int num)
19  {
20  #define NUMSIZ 30
21      static char buf[NUMSIZ];
22      int i;
23
24      if (num <= 0) {
25          strcpy(buf, "0");
26          return buf;
27      }
28
29      i = NUMSIZ - 1;
30      buf[i--] = '\0';
31
32      /* Generate digits backwards into string. */
33      do {
34          buf[i--] = (num % 10) + '0';
35          num /= 10;
36      } while (num > 0);
37
38      return & buf[i+1];
39  }
Because signal handlers should not call any member of the printf() family, we
provide a simple "helper" function, format_num(), to turn a decimal signal or PID
number into a string. This is primitive, but it works.
41  /* childhandler --- catch SIGCHLD, reap all available children */
42
43  void childhandler(int sig)
44  {
45      int status, ret;
46      int i;
47      char buf[100];
48      static const char entered[] = "Entered childhandler\n";
49      static const char exited[] = "Exited childhandler\n";
50
51      write(1, entered, strlen(entered));
52      for (i = 0; i < nkids; i++) {
53          if (kids[i] == NOT_USED)
54              continue;
55
56      retry:
57          if ((ret = waitpid(kids[i], & status, WNOHANG)) == kids[i]) {
58              strcpy(buf, "\treaped process ");
59              strcat(buf, format_num(ret));
60              strcat(buf, "\n");
61              write(1, buf, strlen(buf));
62              kids[i] = NOT_USED;
63          } else if (ret == 0) {
64              strcpy(buf, "\tpid ");
65              strcat(buf, format_num(kids[i]));
66              strcat(buf, " not available yet\n");
67              write(1, buf, strlen(buf));
68          } else if (ret == -1 && errno == EINTR) {
69              write(1, "\tretrying\n", 10);
70              goto retry;
71          } else {
72              strcpy(buf, "\twaitpid() failed: ");
73              strcat(buf, strerror(errno));
74              strcat(buf, "\n");
75              write(1, buf, strlen(buf));
76          }
77      }
78      write(1, exited, strlen(exited));
79  }
Lines 51 and 78 print "entered" and "exited" messages, so that we can clearly see
when the signal handler is invoked. Other messages start with a leading TAB character.
The main part of the signal handler is a large loop, lines 52-77. Lines 53-54 check
for NOT_USED and continue the loop if the current slot isn't in use.
Line 57 calls waitpid() on the PID in the current element of kids. We supply the
WNOHANG option, which causes waitpid() to return immediately if the requested child
isn't available. This call is necessary since it's possible that not all of the children
have exited.
Based on the return value, the code takes the appropriate action. Lines 57-62 handle
the case in which the child is found, by printing a message and marking the appropriate
slot in kids as NOT_USED.
Lines 63-67 handle the case in which the requested child is not available. The return
value is 0 in this case, so we print a message and keep going.
Lines 68-70 handle the case in which the system call was interrupted. In this case,
a goto back to the waitpid() call is the cleanest way to handle things. (Since main()
causes all signals to be blocked when the signal handler runs [line 96], this interruption
shouldn't happen. But this example shows you how to deal with all the cases.)
Lines 71-76 handle any other error, printing an appropriate error message.
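The handler above asks about each recorded PID in turn. A common alternative, shown here only as a sketch and not as code from ch10-reap1.c, passes -1 to waitpid() to mean "any child" and loops until nothing more is ready. The name reap_available() is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* reap_available --- hypothetical helper: reap every child that has
   already terminated, without blocking; return how many were reaped */

int reap_available(void)
{
    int status;
    int count = 0;
    pid_t ret;

    for (;;) {
        ret = waitpid(-1, & status, WNOHANG);   /* -1 means "any child" */
        if (ret > 0)
            count++;            /* reaped one child, look for more */
        else if (ret == -1 && errno == EINTR)
            continue;           /* interrupted, try again */
        else
            break;              /* 0: none ready; -1: error such as ECHILD */
    }

    return count;
}
```

The trade-off is that this version does not know which slot in an array such as kids to mark as unused; a real program would have to look up each returned PID.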
81  /* main --- set up child-related information and signals, create children */
82
83  int main(int argc, char **argv)
84  {
85      struct sigaction sa;
86      sigset_t childset, emptyset;
87      int i;
88
89      for (i = 0; i < nkids; i++)
90          kids[i] = NOT_USED;
91
92      sigemptyset(& emptyset);
93
94      sa.sa_flags = SA_NOCLDSTOP;
95      sa.sa_handler = childhandler;
96      sigfillset(& sa.sa_mask);   /* block everything when handler runs */
97      sigaction(SIGCHLD, & sa, NULL);
98
99      sigemptyset(& childset);
100     sigaddset(& childset, SIGCHLD);
101
102     sigprocmask(SIG_SETMASK, & childset, NULL);  /* block it in main code */
103
104     for (nkids = 0; nkids < 5; nkids++) {
105         if ((kids[nkids] = fork()) == 0) {
106             sleep(3);
107             _exit(0);
108         }
109     }
110
111     sleep(5);   /* give the kids a chance to terminate */
112
113     printf("waiting for signal\n");
114     sigsuspend(& emptyset);
115
116     return 0;
117 }
Lines 89-90 initialize kids. Line 92 initializes emptyset. Lines 94-97 set up and
install the signal handler for SIGCHLD. Note the use of SA_NOCLDSTOP on line 94, while
line 96 blocks all signals when the handler is running.
Lines 99-100 create a signal set representing just SIGCHLD, and line 102 installs it
as the process signal mask for the program.
Lines 104-109 create five child processes, each of which sleeps for three seconds.
Along the way, the loop updates the kids array and the nkids variable.
Line 111 then gives the children a chance to terminate by sleeping longer than they
did. (This doesn't guarantee that the children will terminate, but the chances are
pretty good.)
Finally, lines 113-114 print a message and then pause, replacing the process signal
mask that blocks SIGCHLD with an empty one. This allows the SIGCHLD signal to come
through, in turn causing the signal handler to run. Here's what happens:
$ ch10-reap1                          Run the program
waiting for signal
Entered childhandler
        reaped process 23937
        reaped process 23938
        reaped process 23939
        reaped process 23940
        reaped process 23941
Exited childhandler
The signal handler reaps all of the children in one go.

The following program, ch10-reap2.c, is similar to ch10-reap1.c. The difference
is that it allows SIGCHLD to arrive at any time. This behavior increases the chance of
receiving more than one SIGCHLD but does not guarantee it. As a result, the signal
handler still has to be prepared to reap multiple children in a loop.
1   /* ch10-reap2.c --- demonstrate SIGCHLD management, one signal per child */
2
    ... unchanged code omitted ...
12
13  pid_t kids[MAX_KIDS];
14  size_t nkids = 0;
15  size_t kidsleft = 0;    /* <<< Added */
16
    ... unchanged code for format_num() omitted ...
41
42  /* childhandler --- catch SIGCHLD, reap all available children */
43
44  void childhandler(int sig)
45  {
46      int status, ret;
47      int i;
48      char buf[100];
49      static const char entered[] = "Entered childhandler\n";
50      static const char exited[] = "Exited childhandler\n";
51
52      write(1, entered, strlen(entered));
53      for (i = 0; i < nkids; i++) {
54          if (kids[i] == NOT_USED)
55              continue;
56
57      retry:
58          if ((ret = waitpid(kids[i], & status, WNOHANG)) == kids[i]) {
59              strcpy(buf, "\treaped process ");
60              strcat(buf, format_num(ret));
61              strcat(buf, "\n");
62              write(1, buf, strlen(buf));
63              kids[i] = NOT_USED;
64              kidsleft--;     /* <<< Added */
65          } else if (ret == 0) {
    ... unchanged code omitted ...
80      write(1, exited, strlen(exited));
81  }
This is identical to the previous version, except we have a new variable, kidsleft,
indicating how many unreaped children there are. Lines 15 and 64 flag the new code.
83  /* main --- set up child-related information and signals, create children */
84
85  int main(int argc, char **argv)
86  {
    ... unchanged code omitted ...
100
101     sigemptyset(& childset);
102     sigaddset(& childset, SIGCHLD);
103
104  /* sigprocmask(SIG_SETMASK, & childset, NULL);  /* block it in main code */
105
106     for (nkids = 0; nkids < 5; nkids++) {
107         if ((kids[nkids] = fork()) == 0) {
108             sleep(3);
109             _exit(0);
110         }
111         kidsleft++;     /* <<< Added */
112     }
113
114  /* sleep(5);   /* give the kids a chance to terminate */
115
116     while (kidsleft > 0) {      /* <<< Added */
117         printf("waiting for signals\n");
118         sigsuspend(& emptyset);
119     }                           /* <<< Added */
120
121     return 0;
122 }
Here too, the code is almost identical. Lines 104 and 114 are commented out from
the earlier version, and lines 111, 116, and 119 were added. Surprisingly, when run,
the behavior varies by kernel version!
$ uname -a                            Display system version
Linux example1 2.4.20-8 #1 Thu Mar 13 17:54:28 EST 2003 i686 i686 i386 GNU/Linux
$ ch10-reap2                          Run the program
waiting for signals
Entered childhandler                  Reap one child
        reaped process 2702
        pid 2703 not available yet
        pid 2705 not available yet
        pid 2706 not available yet
Exited childhandler
waiting for signals
Entered childhandler                  And the next
        reaped process 2703
        pid 2704 not available yet
        pid 2705 not available yet
        pid 2706 not available yet
Exited childhandler
waiting for signals
Entered childhandler                  And so on
        reaped process 2704
        pid 2706 not available yet
Exited childhandler
waiting for signals
Entered childhandler
        reaped process 2705
        pid 2706 not available yet
Exited childhandler
waiting for signals
Entered childhandler
        reaped process 2706
Exited childhandler
In this example, exactly one SIGCHLD is delivered per child process! While this is lovely,
and completely reproducible on this system, it's also unusual. On both an earlier and
a later kernel and on Solaris, the program receives one signal for more than one child:
$ uname -a                            Display system version
Linux example2 2.4.22-1.2115.nptl #1 Wed Oct 29 15:42:51 EST 2003 i686 i686 i386 GNU/Linux
$ ch10-reap2                          Run the program
waiting for signals
Entered childhandler                  Signal handler only called once
        reaped process 9564
        reaped process 9565
        reaped process 9566
        reaped process 9567
        reaped process 9568
Exited childhandler
NOTE The code for ch10-reap2.c has one important flaw: a race condition.
Take another look at the loop in main() that creates the children in ch10-reap2.c.
What happens if a SIGCHLD comes in while this code is running? It's possible for the
kids array and the nkids and kidsleft variables to become corrupted: The main code adds
in a new process, but the signal handler takes one away.
This piece of code is an excellent example of a critical section; it must run
uninterrupted. The correct way to manage this code is to bracket it with calls
that first block, and then unblock, SIGCHLD.
10.8.3.3 Strict Parental Control

The siginfo_t structure and three-argument signal catcher make it possible to learn
what happened to a child. For SIGCHLD, the si_code field of the siginfo_t indicates
the reason the signal was sent (child stopped, continued, exited, etc.). Table 10.5 presents
the full list of values. All of these are defined as an XSI extension in the POSIX standard.
The following program, ch10-status.c, demonstrates the use of the siginfo_t
structure.
1   /* ch10-status.c --- demonstrate SIGCHLD management, use 3 argument handler */
2
3   #include <stdio.h>
4   #include <errno.h>
5   #include <signal.h>
6   #include <string.h>
7   #include <sys/types.h>
8   #include <sys/wait.h>
9
10  void manage(siginfo_t *si);
11
    ... unchanged code for format_num() omitted ...
TABLE 10.5
XSI si_code values for SIGCHLD

Value            Meaning
CLD_CONTINUED    A stopped child has been continued.
CLD_DUMPED       Child terminated abnormally and dumped core.
CLD_EXITED       Child exited normally.
CLD_KILLED       Child was killed by a signal.
CLD_STOPPED      The child process was stopped.
CLD_TRAPPED      A child being traced has stopped. (This condition occurs if a program is
                 being traced, either from a debugger or for real-time monitoring. In any
                 case, you're not likely to see it in run-of-the-mill situations.)
Lines 3-8 include standard header files, line 10 declares manage(), which deals with
the child's status changes, and the format_num() function is unchanged from before.
37  /* childhandler --- catch SIGCHLD, reap just one child */
38
39  void childhandler(int sig, siginfo_t *si, void *context)
40  {
41      int status, ret;
42      int i;
43      char buf[100];
44      static const char entered[] = "Entered childhandler\n";
45      static const char exited[] = "Exited childhandler\n";
46
47      write(1, entered, strlen(entered));
48  retry:
49      if ((ret = waitpid(si->si_pid, & status, WNOHANG)) == si->si_pid) {
50          strcpy(buf, "\treaped process ");
51          strcat(buf, format_num(si->si_pid));
52          strcat(buf, "\n");
53          write(1, buf, strlen(buf));
54          manage(si);     /* deal with what happened to it */
55      } else if (ret > 0) {
56          strcpy(buf, "\treaped unexpected pid ");
57          strcat(buf, format_num(ret));
58          strcat(buf, "\n");
59          write(1, buf, strlen(buf));
60          goto retry;     /* why not? */
61      } else if (ret == 0) {
62          strcpy(buf, "\tpid ");
63          strcat(buf, format_num(si->si_pid));
64          strcat(buf, " changed status\n");
65          write(1, buf, strlen(buf));
66          manage(si);     /* deal with what happened to it */
67      } else if (ret == -1 && errno == EINTR) {
68          write(1, "\tretrying\n", 10);
69          goto retry;
70      } else {
71          strcpy(buf, "\twaitpid() failed: ");
72          strcat(buf, strerror(errno));
73          strcat(buf, "\n");
74          write(1, buf, strlen(buf));
75      }
76
77      write(1, exited, strlen(exited));
78  }
The signal handler is similar to those shown earlier. Note the argument list (line 39),
and that there is no loop.
Lines 49-54 handle process termination, including calling manage() to print
the status.
Lines 55-60 handle the case of an unexpected child dying. This case shouldn't happen,
since this signal handler is passed information specific to a particular child process.
Lines 61-66 are what interest us: The return value is 0 for status changes. manage()
deals with the details (line 66).
Lines 67-69 handle interrupts, and lines 70-75 deal with errors.
80  /* child --- what to do in the child */
81
82  void child(void)
83  {
84      raise(SIGCONT);     /* should be ignored */
85      raise(SIGSTOP);     /* go to sleep, parent wakes us back up */
86      printf("\t---> child restarted <---\n");
87      exit(42);           /* normal exit, let parent get value */
88  }
The child() function handles the child's behavior, taking actions of the sort to
cause the parent to be notified.⁷ Line 84 sends SIGCONT, which might cause the parent
to get a CLD_CONTINUED event. Line 85 sends a SIGSTOP, which stops the process (the
signal is uncatchable) and causes a CLD_STOPPED event for the parent. Once the parent
restarts the child, the child prints a message to show it's active again and then exits with
a distinguished exit status.
⁷ Perhaps child_at_school() would be a better function name.

90  /* main --- set up child-related information and signals, create child */
91
92  int main(int argc, char **argv)
93  {
94      pid_t kid;
95      struct sigaction sa;
96      sigset_t childset, emptyset;
97
98      sigemptyset(& emptyset);
99
100     sa.sa_flags = SA_SIGINFO;
101     sa.sa_sigaction = childhandler;
102     sigfillset(& sa.sa_mask);   /* block everything when handler runs */
103     sigaction(SIGCHLD, & sa, NULL);
104
105     sigemptyset(& childset);
106     sigaddset(& childset, SIGCHLD);
107
108     sigprocmask(SIG_SETMASK, & childset, NULL);  /* block it in main code */
109
110     if ((kid = fork()) == 0)
111         child();
112
113     /* parent executes here */
114     for (;;) {
115         printf("waiting for signals\n");
116         sigsuspend(& emptyset);
117     }
118
119     return 0;
120 }
The main() program sets everything up. Lines 100-103 put the handler in place.
Line 100 sets the SA_SIGINFO flag so that the three-argument handler is used. Lines
105-108 block SIGCHLD.
Line 110 creates the child process. Lines 113-117 continue in the parent, using
sigsuspend() to wait for signals to come in.
123 /* manage --- deal with different things that could happen to child */
124
125 void manage(siginfo_t *si)
126 {
127     char buf[100];
128
129     switch (si->si_code) {
130     case CLD_STOPPED:
131         write(1, "\tchild stopped, restarting\n", 27);
132         kill(si->si_pid, SIGCONT);
133         break;
134
135     case CLD_CONTINUED:     /* not sent on Linux */
136         write(1, "\tchild continued\n", 17);
137         break;
138
139     case CLD_EXITED:
140         strcpy(buf, "\tchild exited with status ");
141         strcat(buf, format_num(si->si_status));
142         strcat(buf, "\n");
143         write(1, buf, strlen(buf));
144         exit(0);    /* we're done */
145         break;
146
147     case CLD_DUMPED:
148         write(1, "\tchild dumped\n", 14);
149         break;
150
151     case CLD_KILLED:
152         write(1, "\tchild killed\n", 14);
153         break;
154
155     case CLD_TRAPPED:
156         write(1, "\tchild trapped\n", 15);
157         break;
158     }
159 }
Through the manage() function, the parent deals with the status change in the child.
manage() is called when the status changes and when the child has exited.
Lines 130-133 handle the case in which the child stopped; the parent restarts the
child by sending SIGCONT.
Lines 135-137 print a notification that the child continued. This event doesn't
happen on GNU/Linux systems, and the POSIX standard uses wishy-washy language
about it, merely saying that this event can occur, not that it will.
Lines 139-145 handle the case in which the child exits, printing the exit status. For
this program, the parent is done too, so the code exits, although in a larger program,
that's not the right action to take.
The other cases are more specialized. In the event of CLD_KILLED, the status value
filled in by waitpid() would be useful in determining more details.
Here is what happens when the program runs:
$ ch10-status                         Run the program
waiting for signals
Entered childhandler                  Signal handler entered
        pid 24279 changed status
        child stopped, restarting     Handler takes action
Exited childhandler
waiting for signals
        ---> child restarted <---     From the child
Entered childhandler
        reaped process 24279          Parent's handler reaps child
        child exited with status 42
Unfortunately, because there is no way to guarantee the delivery of one SIGCHLD
per process, your program has to be prepared to recover multiple children at one shot.
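For the CLD_KILLED case mentioned earlier, the status value from waitpid() can be examined with the WIFSIGNALED() and WTERMSIG() macros. The following sketch is our own illustration, not part of ch10-status.c; the helper name signal_that_killed() is hypothetical. It creates a child that sends itself a signal and reports what waitpid() says about it.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* signal_that_killed --- hypothetical helper: run a child that sends
   itself sig; return the signal that terminated it, 0 if it was not
   killed by a signal, or -1 on error */

int signal_that_killed(int sig)
{
    int status;
    pid_t pid, ret;

    if ((pid = fork()) < 0)
        return -1;
    else if (pid == 0) {
        raise(sig);
        _exit(0);       /* reached only if sig did not terminate us */
    }

    /* blocking wait for this one child, retrying if interrupted */
    while ((ret = waitpid(pid, & status, 0)) == -1 && errno == EINTR)
        continue;
    if (ret != pid)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);    /* which signal did it */
    return 0;
}
```

Because this helper uses a blocking waitpid(), it belongs in main-line code. Inside a SIGCHLD handler, the same macros would be applied to the status filled in by the WNOHANG call.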
10.9 Signals Across fork() and exec()

When a program calls fork(), the signal situation in the child is almost identical
to that of the parent. Installed handlers remain in place, blocked signals remain blocked,
and so on. However, any signals pending for the parent are cleared for the child, including
time left as set by alarm(). This is straightforward, and it makes sense.
When a process calls one of the exec() functions, the disposition in the new program
is as follows:
• Signals set to their default action stay set to their default.
• Any caught signals are reset to their default action.
• Signals that are ignored stay ignored. SIGCHLD is a special case. If SIGCHLD is ignored
before the exec(), it may stay ignored after it. Alternatively, it may be reset
to the default action. What actually happens is purposely unspecified by POSIX.
(The GNU/Linux manpages don't state what Linux does, and because POSIX
leaves it as unspecified, any code you write that uses SIGCHLD should be prepared
to handle either case.)
• Signals that are blocked before the exec() remain blocked after it. In other words,
the new program inherits the process's existing process signal mask.
• Any pending signals (those that have arrived but that were blocked) are cleared.
The new program won't get them.
• The time remaining for an alarm() remains in place. (In other words, if a process
sets an alarm and then calls exec() directly, the new image will eventually get
the SIGALRM. If it does a fork() first, the parent keeps the alarm setting, while
the child, which does the exec(), does not.)
NOTE Many, if not most, programs assume that signal actions are initialized
to their defaults and that no signals are blocked. Thus, particularly if you didn't
write the program being run with exec(), it's a good idea to unblock all signals
before doing the exec().
10.10 Summary
" Our story so far, Episode III."
-Arnold Robbins-
• Signal handling interfaces have evolved from simple but prone to race conditions
to complicated but reliable. Unfortunately, the multiplicity of interfaces makes
them harder to learn than many other Linux/Unix APIs.
• Each signal has an action associated with it. The action is one of the following:
ignore the signal; perform the system default action; or call a user-provided handler.
The system default action, in turn, is one of the following: ignore the signal; kill
the process; kill the process and dump core; stop the process; or continue the
process if stopped.
• signal() and raise() are standardized by ISO C. signal() manages actions
for particular signals; raise() sends a signal to the current process. Whether
signal handlers stay installed upon invocation, or are reset to their default values
is up to the implementation. signal() and raise() are the simplest interfaces,
and they suffice for many applications.
• POSIX defines the bsd_signal() function, which is like signal() but guarantees
that the handler stays installed.
• What happens after a signal handler returns varies according to the type of system.
Traditional systems (V7, Solaris, and likely others) reset signal dispositions to their
default. On those systems, interrupted system calls return -1, setting errno to
EINTR. BSD systems leave the handler installed and only return -1 with errno
set to EINTR when no data were transferred; otherwise, they restart the system call.
• GNU/Linux follows POSIX, which is similar but not identical to BSD. If no data
were transferred, the system call returns -1/EINTR. Otherwise, it returns a count
of the amount of data transferred. The BSD "always restart" behavior is available
in the sigaction() interface but is not the default.
• Signal handlers used with signal() are prone to race conditions. Variables of
type volatile sig_atomic_t should be used exclusively inside signal handlers.
(For expositional purposes, we did not follow this rule in some of our examples.)
Similarly, only the functions in Table 10.2 are safe to call from within a
signal handler.
• The System V Release 3 signal API (lifted from 4.0 BSD) was an initial attempt
at reliable signals. Don't use it in new code.
• The POSIX API has multiple components:
  • the process signal mask, which lists the currently blocked signals,
  • the sigset_t type to represent signal masks, and the sigfillset(),
    sigemptyset(), sigaddset(), sigdelset(), and sigismember() functions
    for working with it,
  • the sigprocmask() function to set and retrieve the process signal mask,
  • the sigpending() function to retrieve the set of pending signals,
  • the sigaction() API and struct sigaction in all their glory.
These facilities together use signal blocking and the process signal mask to provide
reliable signals. Furthermore, through various flags, it's possible to get restartable
system calls and a more capable signal handler that receives more information
about the reason for a particular signal (the siginfo_t structure).
• kill() and killpg() are the POSIX mechanisms for sending signals. These
differ from raise() in two ways: (1) one process may send a signal to another
process or an entire process group (permissions permitting, of course), and
(2) sending signal 0 does not send anything but does do the checking. Thus, these
functions provide a way to verify the existence of a particular process or process
group, and the ability to send it (them) a signal.
• Signals can be used as an IPC mechanism, although such use is a poor way to
structure your application and is prone to race conditions. If someone holds a gun
to your head to make you work that way, use careful signal blocking and the
sigaction() interface to do it correctly.
• SIGALRM and the alarm() system call provide a low-level mechanism for notification
after a certain number of seconds have passed. pause() suspends a process
until any signal comes in. sleep() uses these to put a process to sleep for a given
amount of time: sleep() and alarm() should not be used together. pause()
itself opens up race conditions; signal blocking and sigsuspend() should be
used instead.
• Job control signals implement job control for shells. Most of the time you should
leave them set to their default, but it's helpful to understand that occasionally it
makes sense to catch them.
• Catching SIGCHLD lets a parent know what its children processes are doing. Using
'signal(SIGCHLD, SIG_IGN)' (or sigaction() with SA_NOCLDWAIT) ignores
children altogether. Using sigaction() with SA_NOCLDSTOP provides notification
only about termination. In the latter case, whether or not SIGCHLD is blocked,
signal handlers for SIGCHLD should be prepared to reap multiple children at once.
Finally, using sigaction() without SA_NOCLDSTOP with a three-argument signal
handler gives you the reason for receipt for the signal. (Whew!)
• After a fork(), signal disposition in the child remains the same, except that
pending signals and alarms are cleared. After an exec(), it's a little more complicated:
essentially everything that can be left alone is; anything else is reset to
its defaults.
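As a reminder of the "signal 0" point in the summary, here is a minimal sketch of an existence check built on kill(). The function name process_exists() is hypothetical, and treating EPERM as "exists" is our reading of the errors that kill() can report.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* process_exists --- hypothetical helper: return 1 if pid names an
   existing process, 0 if it does not, -1 on any other error */

int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;       /* exists, and we may send it signals */
    if (errno == EPERM)
        return 1;       /* exists, but we lack permission to signal it */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;
}
```

The answer is only a snapshot: the process may exit, and its PID may even be reused, between the check and whatever the caller does next.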
Exercises
1. Implement bsd_signal() by using sigaction().
2. If you're not running GNU/Linux, run ch10-catchint on your system. Is
your system traditional or BSD?
3. Implement the System V Release 3 functions sighold(), sigrelse(),
sigignore(), sigpause(), and sigset() by using sigaction() and the
other related functions in the POSIX API.
4. Practice your bit-bashing skills. Assuming that there is no signal 0 and that
there are no more than 31 signals, provide a typedef for sigset_t and
write sigemptyset(), sigfillset(), sigaddset(), sigdelset(), and
sigismember().
5. Practice your bit-bashing skills some more. Repeat the previous exercise, this
time assuming that the highest signal is 42.
6. Now that you've done the previous two exercises, find sigemptyset() et al.
in your <signal.h> header file. (You may have to search for them; they could
be in files #included by <signal.h>.) Are they macros or functions?
7. In Section 10.7, "Signals for Interprocess Communication," page 379, we
mentioned that production code should work with the initial process signal
mask, adding signals to be blocked and removing them except in the call to
sigsuspend(). Rewrite the example, using the appropriate calls to do this.
8. Write your own version of the kill command. The interface should be
kill [ -s signal-name ] pid ...
Without a specific signal, the program should send SIGTERM.
9. Why do you think modern shells such as Bash and ksh93 have kill as a built-in
command?
10. (Hard). Implement sleep(), using alarm(), signal(), and pause(). What
if a signal handler for SIGALRM is already in place?
11. Experiment with ch10-reap.c, changing the amount of time each child sleeps
and arranging to call sigsuspend() enough times to reap all the children.
12. See if you can get ch10-reap2.c to corrupt the information in kids, nkids,
and kidsleft. Now add blocking/unblocking around the critical section and
see if it makes a difference.
In this chapter
• 11.1 Checking Permissions page 404
• 11.2 Retrieving User and Group IDs page 407
• 11.3 Checking as the Real User: access() page 410
• 11.4 Checking as the Effective User: euidaccess() (GLIBC) page 412
• 11.5 Setting Extra Permission Bits for Directories page 412
• 11.6 Setting Real and Effective IDs page 415
• 11.7 Working with All Three IDs: getresuid() and setresuid() (Linux) page 421
• 11.8 Crossing a Security Minefield: Setuid root page 422
• 11.10 Summary page 424
Linux, following Unix, is a multiuser system. Unlike most operating systems for
personal computers,¹ in which there is only one user and whoever is physically
in front of the computer has complete control, Linux and Unix separate files and
processes by the owners and groups to which they belong. In this chapter, we examine
permission checking and look at the APIs for retrieving and setting the owner and
group identifiers.
11.1 Checking Permissions

As we saw in Section 5.4.2, "Retrieving File Information," page 141, the filesystem
stores a file's user identifier and group identifier as numeric values; these are the types
uid_t and gid_t, respectively. For brevity, we use the abbreviations UID and GID
for "user identifier" and "group identifier."
Every process has several user and group identifiers associated with it. As a simplification,
one particular UID and GID are used for permission checking; when the UID
of a process matches the UID of a file, the file's user permission bits dictate what the
process can do with the file. If they don't match, the system checks the GID of the
process against the GID of the file; if they match, the group permissions apply; otherwise
the "other" permissions apply.
Besides files, the UID controls how one process can affect another by sending it a
signal. Signals are described in Chapter 10, "Signals," page 347.
Finally, the superuser, root, is a special case. root is identified by a UID of 0. When
a process has UID 0, the kernel lets it do whatever it wants to: read, write, or remove
files, send signals to arbitrary processes, and so on. (POSIX is more obtuse about this,
referring to processes with "appropriate privilege." This language in turn has filtered
down into the GNU/Linux manpages and the GLIBC online Info manual. Some operating
systems do separate privilege by user, and Linux is moving in this direction as
well. Nevertheless, in current practice, "appropriate privilege" just means processes with
UID 0.)
¹ MacOS X and Windows XP are both multiuser systems, but this is a rather recent development.
11.1.1 Real and Effective IDs

UID and GID numbers are like personal identification. Sometimes you need to
carry more than one bit of identification around with you. For instance, you may have
a driver's license or government identity card.² In addition, your university or company
may have issued you an identification card. Such is the case with processes too; they
carry multiple UID and GID numbers around with them, as follows:
Real user ID
    The UID of the user that forked the process.
Effective user ID
    The UID used for most permission checking. Most of the time, the effective and
    real UIDs are the same. The effective UID can be different from the real one at
    startup if the setuid bit of the executable program's file is set and the file is owned
    by someone other than the user running the program. (More details soon.)
Saved set-user ID
    The original effective UID at program startup (after the exec). This plays a role
    in permission checking when a process needs to swap its real and effective UIDs
    back and forth. This concept came from System V.
Real group ID
    The GID of the user that created the process, analogous to the real UID.
Effective group ID
    The GID used for permission checking, analogous to the effective UID.
Saved set-group ID
    The original effective GID at program startup, analogous to the saved set-user
    ID.
Supplemental group set
    4.2 BSD introduced the idea of a group set. Besides the real and effective GIDs,
    each process has some set of additional groups to which it simultaneously belongs.
    Thus, when permission checking is done for a file's group permissions, not only
    does the kernel check the effective GID, but it also checks all of the GIDs in the
    group set.
² Although the United States doesn't have official identity cards, many countries do.
406 Chapter 11 • Permissions and User and Group ID Numbers
Any process can retrieve all of these values. A regular (non-superuser) process can
switch its real and effective user and group IDs back and forth. A root process (one
with an effective UID of 0) can also set the values however it needs to (although this
can be a one-way operation).
11.1.2 Setuid and Setgid Bits

The setuid and setgid bits³ in the file permissions cause a process to acquire an effective
UID or GID that is different from the real one. These bits are applied manually to a
file with the chmod command:
$ chmod u+s myprogram                 Add setuid bit
$ chmod g+s myprogram                 Add setgid bit
$ ls -l myprogram
-rwsr-sr-x    1 arnold   devel        4573 Oct  9 18:17 myprogram
The s character where an x character usually appears indicates the presence of the
setuid/setgid bits.
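A program can make the same observation that ls does by examining the st_mode field returned by stat(). The following is a minimal sketch; the function name special_bits() is hypothetical, while S_ISUID and S_ISGID are the standard mode-bit constants from <sys/stat.h>.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* special_bits --- hypothetical helper: return the setuid and setgid
   bits of a file (S_ISUID, S_ISGID, both OR'ed together, or 0),
   or -1 if the file can't be examined */

int special_bits(const char *file)
{
    struct stat sbuf;

    if (stat(file, & sbuf) < 0)
        return -1;

    return sbuf.st_mode & (S_ISUID | S_ISGID);
}
```

A caller would test the result with expressions such as '(bits & S_ISUID) != 0' after first checking for -1.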
As mentioned in Section 8.2.1, "Using Mount Options," page 239, the nosuid
option to mount for a filesystem prevents the kernel from honoring both the setuid and
setgid bits. This is a security feature; for example, a user with a home GNU/Linux system
might handcraft a floppy with a copy of the shell executable made setuid to root. But
if the GNU/Linux system in the office or the lab will only mount floppy filesystems
with the nosuid option, then running this shell won't provide root access.⁴
The canonical (and probably overused) motivating example of a setuid program is
a game program. Suppose you've written a really cool game, and you wish to allow
users on the system to play it. The game keeps a score file, listing the highest scores.
If you're not the system administrator, you can't create a separate group of just those
users who are allowed to play the game and thus write to the score file. But if you make
the file world-writable so that anyone can play the game, then anyone can also cheat
and put any name at the top.
³ Dennis Ritchie, the inventor of C and a cocreator of Unix, received a patent for the setuid bit: Protection of Data
File Contents, US Patent number 4,135,240. See http://www.delphion.com/details?pn=US04135240__
and also http://www.uspto.gov. AT&T assigned the patent to the public, allowing anyone to use
its technology.
⁴ Security for GNU/Linux and Unix systems is a deep topic in and of itself. This is just an example; see Section 11.9,
"Suggested Reading," page 423.
However, by making the game program setuid to yourself, users running the game
have your UID as their effective UID. The game program can then open and update
the file as needed, but arbitrary users can't come along and edit it. (You also open
yourself up to most of the dangers of setuid programming; for example, if the game
program has a hole that can be exploited to produce a shell running as you, all your
files are available for deletion or change. This is a justifiably scary thought.)
The same logic applies to setgid programs, although in practice setgid programs are
much less used than setuid ones. (This is too bad; many things that are done with setuid
root programs could easily be done with setgid programs or programs that are setuid
to a regular user, instead.⁵)
11.2 Retrieving User and Group IDs

Getting the UID and GID information from the system is straightforward. The
functions are as follows:
#include <unistd.h>                                    POSIX
uid_t getuid(void);                                    Real and effective UID
uid_t geteuid(void);
gid_t getgid(void);                                    Real and effective GID
gid_t getegid(void);
int getgroups(int size, gid_t list[]);                 Supplemental group list
The functions are:

uid_t getuid(void)
Returns the real UID.
uid_t geteuid(void)
Returns the effective UID.
gid_t getgid(void)
Returns the real GID.
gid_t getegid(void)
Returns the effective GID.
5 One program designed for this purpose is GNU userv (ftp://ftp.gnu.org/gnu/userv/).

int getgroups(int size, gid_t list[])
Fills in up to size elements of list from the process's supplemental group set.
The return value is the number of elements filled in or -1 if there's an error. It is
implementation defined whether the effective GID is also included in the set.
On POSIX-compliant systems, you can pass in a size value of zero; in this case,
getgroups() returns the number of groups in the process's group set. You can
then use that value to dynamically allocate an array that's big enough.
On non-POSIX systems, the constant NGROUPS_MAX defines the maximum necessary
size for the list array. This constant can be found in <limits.h> on modern
systems or in <sys/param.h> on older ones. We present an example shortly.
You may have noticed that there are no calls to get the saved set-user ID or saved
set-group ID values. These are just the original values of the effective UID and effective
GID. Thus, you can use code like this at program startup to obtain the six values:
uid_t ruid, euid, saved_uid;
gid_t rgid, egid, saved_gid;
int main(int argc, char **argv)
{
    ruid = getuid();
    euid = saved_uid = geteuid();
    rgid = getgid();
    egid = saved_gid = getegid();
    ... rest of program ...
}
Here is an example of retrieving the group set. As an extension, gawk provides
awk-level access to the real and effective UID and GID values and the supplemental
group set. To do this, it has to retrieve the group set. The following function is from
main.c in the gawk 3.1.3 distribution:
1080 /* init_groupset --- initialize groupset */
1081
1082 static void
1083 init_groupset()
1084 {
1085 #if defined(HAVE_GETGROUPS) && defined(NGROUPS_MAX) && NGROUPS_MAX > 0
1086 #ifdef GETGROUPS_NOT_STANDARD
1087     /* For systems that aren't standards conformant, use old way. */
1088     ngroups = NGROUPS_MAX;
1089 #else
1090     /*
1091      * If called with 0 for both args, return value is
1092      * total number of groups.
1093      */
1094     ngroups = getgroups(0, NULL);
1095 #endif
1096     if (ngroups == -1)
1097         fatal(_("could not find groups: %s"), strerror(errno));
1098     else if (ngroups == 0)
1099         return;
1100
1101     /* fill in groups */
1102     emalloc(groupset, GETGROUPS_T *, ngroups * sizeof(GETGROUPS_T), "init_groupset");
1103
1104     ngroups = getgroups(ngroups, groupset);
1105     if (ngroups == -1)
1106         fatal(_("could not find groups: %s"), strerror(errno));
1107 #endif
1108 }
The ngroups and groupset variables are global; their declaration isn't shown. The
GETGROUPS_T macro (line 1102) is the type to use for the second argument; it's gid_t
on a POSIX system, int otherwise.
Lines 1085 and 1107 bracket the entire function body; on ancient systems that don't
have group sets at all, the function has an empty body.
Lines 1086-1088 handle non-POSIX systems; GETGROUPS_NOT_STANDARD is defined
by the configuration mechanism before the program is compiled. In this case, the code
uses NGROUPS_MAX, as described earlier. (As late as 2004, such systems still exist and
are in use; thankfully though, they are diminishing in number.)
Lines 1089-1094 are for POSIX systems, using a size parameter of zero to retrieve
the number of groups.
Lines 1096-1099 do error checking. If the return value was 0, there aren't any
supplemental groups, so init_groupset() merely returns early.
Finally, line 1102 uses malloc() (through an error-checking wrapper macro, see
Section 3.2.1.8, "Example: Reading Arbitrarily Long Lines," page 67) to allocate an
array that's large enough. Line 1104 then fills in the array.
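For readers who want to try the two-call getgroups() idiom outside of gawk, here is a minimal standalone sketch. It assumes a POSIX system where a size of zero is accepted; the helper names get_group_set() and print_group_set() are ours, not part of gawk or of any library.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return a malloc()ed copy of the supplemental group set and store the
 * number of entries in *count. Returns NULL with *count == 0 when the
 * set is empty, and NULL with *count == -1 on error.
 * get_group_set() is a hypothetical helper name.
 */
gid_t *get_group_set(int *count)
{
    int n = getgroups(0, NULL);      /* size 0: just report the count */
    gid_t *list;

    *count = n;
    if (n <= 0)
        return NULL;

    list = malloc(n * sizeof(gid_t));
    if (list == NULL) {
        *count = -1;
        return NULL;
    }

    n = getgroups(n, list);          /* second call fills the array */
    if (n < 0) {
        free(list);
        *count = -1;
        return NULL;
    }
    *count = n;
    return list;
}

/* Print the group set, one GID per line, to the given stream. */
void print_group_set(FILE *fp)
{
    int i, n;
    gid_t *list = get_group_set(&n);

    for (i = 0; i < n; i++)          /* loop is skipped if n is 0 or -1 */
        fprintf(fp, "%ld\n", (long) list[i]);
    free(list);                      /* free(NULL) is harmless */
}
```

Calling print_group_set(stdout) from main() prints the same GIDs that the id command lists under "groups", except possibly the effective GID, whose inclusion is implementation defined.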
11.3 Checking As the Real User: access()

Most of the time, the effective and real UID and GID values are the same. Thus, it
doesn't matter that file-permission checking is performed against the effective ID and
not the real one.
However, when writing a setuid or setgid application, you sometimes want to check
whether a file operation that's OK for the effective UID and GID is also OK for the
real UID and GID. This is the job of the access() function:
#include <unistd.h>                                    POSIX
int access(const char *path, int amode);
The path argument is the pathname of the file to check the real UID and GID
against. amode is the bitwise OR of one or more of the following values:
R_OK The real UID/GID can read the file.
W_OK The real UID/GID can write the file.
X_OK The real UID/GID can execute the file, or if a directory, search through the
directory.
F_OK Check whether the file exists.
Each component in the pathname is checked, and on some implementations, when
checking for root, access() might act as if X_OK is true, even if no execute bits are
set in the file's permissions. (Strange but true: In this case, forewarned is forearmed.)
Linux doesn't have this problem.
If path is a symbolic link, access() checks the file that the symbolic link points to.
The return value is 0 if the operation is permitted to the real UID and GID or -1
otherwise. Thus, if access() returns -1, a setuid program can deny access to a file
that the effective UID/GID would otherwise be able to work with:
if (access("/some/special/file", R_OK|W_OK) < 0) {
    fprintf(stderr, "Sorry: /some/special/file: %s\n", strerror(errno));
    exit(1);
}
At least with the 2.4 series of Linux kernels, when the X_OK test is applied to a
filesystem mounted with the noexec option (see Section 8.2.1, "Using Mount Options,"
page 239), the test succeeds if the file's permissions indicate execute permission. This
is true even though an attempt to execute the file will fail. Caveat emptor.
NOTE While using access() before opening a file is proper practice, a race
condition exists: The file being opened could be swapped out in between the
check with access() and the call to open(). Careful programming is required,
such as checking owner and permission with stat() and fstat() before and
after the calls to access() and open().
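One way to act on this advice is sketched below. It is only an illustration under stated assumptions: open_as_real_user() is a hypothetical helper, the choice of which struct stat fields to compare is ours, and the comparison narrows the race window rather than closing it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path for reading on behalf of the real user. Returns a file
 * descriptor, or -1 with errno set. open_as_real_user() is a
 * hypothetical helper, not a library function.
 */
int open_as_real_user(const char *path)
{
    struct stat before, after;
    int fd;

    if (stat(path, &before) < 0)          /* snapshot before the check */
        return -1;
    if (access(path, R_OK) < 0)           /* may the real UID/GID read it? */
        return -1;
    if ((fd = open(path, O_RDONLY)) < 0)
        return -1;

    /* fstat() describes the file actually opened; compare to snapshot. */
    if (fstat(fd, &after) < 0
        || before.st_dev != after.st_dev
        || before.st_ino != after.st_ino
        || before.st_uid != after.st_uid
        || before.st_mode != after.st_mode) {
        close(fd);
        errno = EACCES;                   /* treat any mismatch as denial */
        return -1;
    }
    return fd;
}
```

If the device, inode, owner, or mode changed between the stat() and the fstat(), the file that was checked is not the file that was opened, and the sketch refuses to use it.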
For example, the pathchk program checks pathnames for validity. The GNU version
uses access() to check that the directory components of given pathnames are valid.
From the Coreutils pathchk.c:
244 /* Return 1 if PATH is a usable leading directory, 0 if not,
245    2 if it doesn't exist. */
246
247 static int
248 dir_ok (const char *path)
249 {
250   struct stat stats;
251
252   if (stat (path, &stats))                Nonzero return = failure
253     return 2;
254
255   if (!S_ISDIR (stats.st_mode))
256     {
257       error (0, 0, _("'%s' is not a directory"), path);
258       return 0;
259     }
260
261   /* Use access to test for search permission because
262      testing permission bits of st_mode can lose with new
263      access control mechanisms. Of course, access loses if you're
264      running setuid. */
265   if (access (path, X_OK) != 0)
266     {
267       if (errno == EACCES)
268         error (0, 0, _("directory '%s' is not searchable"), path);
269       else
270         error (0, errno, "%s", path);
271       return 0;
272     }
273
274   return 1;
275 }
The code is straightforward. Lines 252-253 check whether the file exists. If stat()
fails, then the file doesn't exist. Lines 255-259 verify that the file is indeed a directory.
The comment on lines 261-264 explains the use of access(). Checking the st_mode
bits isn't enough: The file could be on a filesystem that was mounted read-only, on a
remote filesystem, or on a non-Linux or non-Unix filesystem, or the file could have file
attributes that prevent access. Thus, only the kernel can really tell if the access would
work. Lines 265-272 do the check, with the error message being determined by the
value of errno (lines 267-270).
11.4 Checking as the Effective User: euidaccess() (GLIBC)

GLIBC provides an additional function that works like access() but that checks
according to the effective UID, GID and group set:
#include <unistd.h>                                    GLIBC
int euidaccess(const char *path, int amode);
The arguments and return value have the same meaning as for access(). When
the effective and real UIDs are equal and the effective and real GIDs are equal,
euidaccess() calls access() to do the test. This has the advantage that the kernel
can test for read-only filesystems or other conditions that are not reflected in the file's
ownership and permissions.
Otherwise, euidaccess() checks the file's owner and group values against those of
the effective UID and GID and group set, using the appropriate permission bits. This
test is based on the file's stat() information.
If you're writing a portable program but prefer to use this interface, it's easy enough
to extract the source file from the GLIBC archive and adapt it for general use.
11.5 Setting Extra Permission Bits for Directories

On modern systems, the setgid and "sticky" bits each have special meaning when
applied to directories.
11.5.1 Default Group for New Files and Directories

In the original Unix system, when open() or creat() created a new file, the file
received the effective UID and GID of the process creating it.
V7, BSD through 4.1 BSD, and System V through Release 3 all treated directories
like files. However, with the addition of the supplemental group set in 4.2 BSD, the
way new directories were created changed: new directories inherited the group of the
parent directory. Furthermore, new files also inherited the group ID of the parent
directory and not the effective GID of the creating process.
The idea behind having multiple groups and directories that work this way is to
facilitate group cooperation. Each organizational project using a system would have a
separate group assigned to it. The top-level directory for each project would be in that
project's group, and files for the project would all have group read and write (and if
necessary, execute) permission. In addition, new files automatically get the group of
the parent directory. By being simultaneously in multiple groups (the group set), a user
could move among projects at will with a simple cd command, and all files and
directories would maintain their correct group.
What happens on modern systems? Well, this is another of the few cases where it's
possible to have our cake and eat it too. SunOS 4.0 invented a mechanism that was
included in System V Release 4; it is used today by at least Solaris and GNU/Linux.
These systems give meaning to the setgid bit on the parent directory of the new file or
directory, as follows:
Setgid bit on parent directory clear
New files and directories receive the creating process's effective GID.
Setgid bit on parent directory set
New files and directories receive the parent directory's GID. New directories also
inherit the setgid bit being on.
(Until SunOS 4.0, the setgid bit on a directory had no defined meaning.) The fol-
lowing session shows the setgid bit in action:
$ cd /tmp                                          Move to /tmp
$ ls -ld .                                         Check its permissions
drwxrwxrwt    8 root     root       4096 Oct 16 17:40 .
$ id                                               Check out current groups
uid=2076(arnold) gid=42(devel) groups=19(floppy),42(devel),2076(arnold)
$ mkdir d1 ; ls -ld d1                             Make a new directory
drwxr-xr-x    2 arnold   devel      4096 Oct 16 17:40 d1   Effective group ID inherited
$ chgrp arnold d1                                  Change the group
$ chmod g+s d1                                     Add setgid bit
$ ls -ld d1                                        Verify change
drwxr-sr-x    2 arnold   arnold     4096 Oct 16 17:40 d1
$ cd d1                                            Change into it
$ echo this should have group arnold on it > f1    Create a new file
$ ls -l f1                                         Check permissions
-rw-r--r--    1 arnold   arnold       36 Oct 16 17:41 f1   Inherited from parent
$ mkdir d2                                         Make a directory
$ ls -ld d2                                        Check permissions
drwxr-sr-x    2 arnold   arnold     4096 Oct 16 17:51 d2   Group and setgid inherited
The ext2 and ext3 filesystems for GNU/Linux work as just shown. In addition
they support special mount options, grpid and bsdgroups, which make the "use
parent directory group" semantics the default. (The two names mean the same thing.)
In other words, when these mount options are used, then parent directories do not
have to have their setgid bits set.
The opposite mount options are nogrpid and sysvgroups. This is the default
behavior; however, the setgid bit is still honored if it's present. (Here, too, the two names
mean the same thing.)
POSIX specifies that new files and directories inherit either the effective GID of the
creating process or the group of the parent directory. However, implementations have
to provide a way to make new directories inherit the group of the parent directory.
Furthermore, the standard recommends that applications not rely on one behavior or
the other; in cases where it matters, applications should use chown() to force the
group of the new file or directory to the desired GID.
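The recommendation to set the group explicitly can be sketched as follows. This is an illustration, not code from any real program: create_with_group() is a hypothetical name, and it uses fchown(), the file-descriptor variant of chown(), so that the group is set on the very file that was just created.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create path (mode 0644 before the umask) and force its group to gid,
 * leaving the owner alone. Returns an open descriptor or -1.
 * create_with_group() is a hypothetical name.
 */
int create_with_group(const char *path, gid_t gid)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd < 0)
        return -1;

    /* An owner of (uid_t) -1 means "leave the owner unchanged". */
    if (fchown(fd, (uid_t) -1, gid) < 0) {
        close(fd);
        unlink(path);                /* don't leave a half-made file */
        return -1;
    }
    return fd;
}
```

A non-root caller can only pass a gid that is its effective GID or a member of its group set; with that restriction the new file ends up in the desired group whichever inheritance rule the filesystem applies.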
11.5.2 Directories and the Sticky Bit

"Sherman, set the wayback machine for 1976."
- Mr. Peabody -
The sticky bit originated in the PDP-11 versions of Unix and was applied to regular
executable files. 6 This bit was applied to programs that were expected to be heavily
used, such as the shell and the editor. When a program had this bit set, the kernel would
keep a copy of the program's executable code on the swap device, from which it could
be quickly loaded into memory for reuse. (Loading from the filesystem took longer:
The image on the swap device was stored in contiguous disk blocks, whereas the image
in the filesystem might be spread all over the disk.) The executable images "stuck" to
the swap device, hence the name.
Thus, even if the program was not currently in use, it was expected that it would be
in use again shortly when another user went to run it, so it would be loaded quickly.
6 Images come to mind of happy youthful programs, their faces and hands covered in chocolate.
Modern systems have considerably faster disk and memory hardware than the
PDP-11s of yore. They also use a technique called demand paging to load into memory
only those parts of an executable program that are being executed. Thus, today, the
sticky bit on a regular executable file serves no purpose, and indeed it has no effect.
However, in Section 1.1.2, "Directories and Filenames," page 6, we mentioned that
the sticky bit on an otherwise writable directory prevents file removal from that
directory, or file renaming within it, by anyone except the file's owner, or root. Here is an
example:
$ ls -ld /tmp                                      Show /tmp's permissions
drwxrwxrwt   19 root     root       4096 Oct 20 14:04 /tmp
$ cd /tmp                                          Change there
$ echo this is my file > arnolds-file              Create a file
$ ls -l arnolds-file                               Show its permissions
-rw-r--r--    1 arnold   devel        16 Oct 20 14:14 arnolds-file
$ su - miriam                                      Change to another user
Password:
$ cd /tmp                                          Change to /tmp
$ rm arnolds-file                                  Attempt to remove file
rm: remove write-protected regular file 'arnolds-file'? y       rm is cautious
rm: cannot remove 'arnolds-file': Operation not permitted       Kernel disallows removal
The primary purpose of this feature is exactly for directories such as /tmp, where
multiple users wish to place their files. On the one hand, the directory needs to be
world-writable so that anyone can create files in it. On the other hand, once it's
world-writable, any user can remove any other user's files! The directory sticky bit solves this
problem nicely. Use 'chmod +t' to add the sticky bit to a file or directory:
$ mkdir mytmp                                      Create directory
$ chmod a+wxt mytmp                                Add all-write, sticky bits
$ ls -ld mytmp                                     Verify result
drwxrwxrwt    2 arnold   devel      4096 Oct 20 14:23 mytmp
Finally, note that the directory's owner can also remove files, even if they don't belong
to him.
11.6 Setting Real and Effective IDs

Things get interesting once a process has to change its UID and GID values. Setting
the group set is straightforward. Changing real and effective UID and GID values
around is more involved.
11.6.1 Changing the Group Set

The setgroups() function installs a new group set:
#include <sys/types.h>                                 Common
#include <unistd.h>
#include <grp.h>
int setgroups(size_t size, const gid_t *list);
The size parameter indicates how many items there are in the list array. The return
value is 0 if all went well, -1 with errno set otherwise.
Unlike the functions for manipulating the real and effective UID and GID values,
this function may only be called by a process running as root. This is one example of
what POSIX terms a privileged operation; as such it's not formally standardized
by POSIX.
setgroups() is used by any program that does a login to a system, such as
/bin/login for console logins or /bin/sshd for remote logins with ssh.
11.6.2 Changing the Real and Effective IDs

Running with two different user IDs presents a challenge to the application
programmer. There are things that a program may need to do when working with the effective
UID, and other things that it may need to do when working using the real UID.
For example, before Unix systems had job control, many programs provided shell
escapes, that is, a way to run a command or interactive shell from within the current
program. The ed editor is a good example of this: Typing a command line beginning
with ! ran the rest of the line as a shell command. Typing '!sh' gave you an interactive
shell. (This still works; try it!) Suppose the hypothetical game program described
earlier also provides a shell escape: the shell should be run as the real user, not the effective
one. Otherwise, it again becomes trivial for the game player to directly edit the score
file or do far worse things!
Thus, there is a clear need to be able to change the effective UID to be the real UID.
Furthermore, it's helpful to be able to switch the effective UID back to what it was
originally. (This is the reason for having a saved set-user ID in the first place; it becomes
possible to regain the original privileges that the process had when it started out.)
As with many Unix APIs, different systems solved the problem in different ways,
sometimes by using the same API but with different semantics and sometimes by
introducing different APIs. Delving into the historic details is only good for producing
headaches, so we don't bother. Instead, we look at what POSIX provides and how each
API works. Furthermore, our discussion focuses on the real and effective UID values;
the GID values work analogously, so we don't bother to repeat the details for those
system calls. The functions are as follows:
#include <sys/types.h>                                 POSIX
#include <unistd.h>
int seteuid(uid_t euid);                               Set effective ID
int setegid(gid_t egid);
int setuid(uid_t uid);                                 Set effective ID, if root, set all
int setgid(gid_t gid);
int setreuid(uid_t ruid, uid_t euid);                  BSD compatibility, set both
int setregid(gid_t rgid, gid_t egid);
There are three sets of functions. The first two were created by POSIX:
int seteuid(uid_t euid)
This function sets only the effective UID. A regular (non-root) user can only set
the ID to one of the real, effective, or saved set-user ID values. Applications that
will switch the effective UID around should use this function exclusively.
A process with an effective UID of zero can set the effective UID to any value.
Since it is also possible to set the effective UID to the saved set-user ID, the process
can regain its root privileges with another call to seteuid().
int setegid(gid_t egid)
This function does for the effective group ID what seteuid() does for the effective
user ID.
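As a sketch of how seteuid() supports a temporary switch, consider a setuid program that wants to run one piece of work (a shell escape, say) as the real user and then resume with its original effective UID. The function and callback names below are hypothetical, and the sketch assumes it is called before the program has changed any of its IDs, so that the effective UID at entry equals the saved set-user ID.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Run fn(arg) with the effective UID temporarily set to the real UID,
 * then restore the original effective UID. Returns fn's result, or -1
 * if the IDs could not be switched. The names here are hypothetical.
 */
int run_as_real_user(int (*fn)(void *), void *arg)
{
    uid_t privileged = geteuid();   /* equals the saved set-user ID at startup */
    int result;

    if (seteuid(getuid()) < 0)      /* drop: effective = real */
        return -1;

    result = fn(arg);

    if (seteuid(privileged) < 0)    /* regain via the saved set-user ID */
        return -1;
    return result;
}

/* Example callback: reports whether the effective UID is the real UID. */
int effective_is_real(void *unused)
{
    (void) unused;
    return geteuid() == getuid();
}
```

While fn runs, file-permission checks use the real user's UID; the second seteuid() call succeeds only because the saved set-user ID still holds the original value. A callback that itself returns -1 is indistinguishable from a failed switch in this sketch, so a fuller version would report the two separately.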
The next set of functions offers the original Unix API for changing the real and
effective UID and GID. Under the POSIX model, these functions are what a setuid-root
program should use to make a permanent change of real and effective UID:
int setuid(uid_t uid)
For a regular user, this function also sets only the effective UID. As with
seteuid(), the effective UID may be set to any of the current real, effective, or
saved set-user ID values. The change is not permanent; the effective UID can be
changed to another value (from the same source set) with a subsequent call.
However, for root, this function sets all three of the real, effective, and saved
set-user IDs to the given value. Furthermore, the change is permanent; the IDs cannot
be changed back. (This makes sense: Once the saved set-user ID is changed, there
isn't a different ID to change back to.)
int setgid(gid_t gid)
This function does for the effective group ID what setuid() does for the effective
user ID. The same distinction between regular users and root applies.
NOTE The ability to change the group ID hinges on the effective user ID. An
effective GID of 0 has no special privileges.
Finally, POSIX provides two functions from 4.2 BSD for historical compatibility.
It is best not to use these in new code. However, since you are likely to see older code
which does use these functions, we describe them here.
int setreuid(uid_t ruid, uid_t euid)
Sets the real and effective UIDs to the given values. A value of -1 for ruid or
euid leaves the respective ID unchanged. (This is similar to chown(); see
Section 5.5.1, "Changing File Ownership: chown(), fchown(), and lchown(),"
page 155.)
root is allowed to set both the real and the effective IDs to any value. According
to POSIX, non-root users may only change the effective ID; it is "unspecified"
what happens if a regular user attempts to change the real UID. However, the
GNU/Linux setreuid(2) manpage spells out the Linux behavior: The real UID
may be set to either the real or effective UID, and the effective UID may be set
to any of the real, effective, or saved set-user IDs. (For other systems, see the
setreuid(2) manpage.)
int setregid(gid_t rgid, gid_t egid)
Does for the real and effective group IDs what setreuid() does for the real and
effective user ID. The same distinction between regular users and root applies.
The saved set-user ID didn't exist in the BSD model, so the idea behind setreuid()
and setregid() was to make it simple to swap the real and effective IDs:
setreuid(geteuid(), getuid()); /* swap real and effective */
However, given POSIX's adoption of the saved set-user ID model and the seteuid()
and setegid() functions, the BSD functions should not be used in new code. Even
the 4.4 BSD documentation marks these functions as obsolete, recommending
seteuid()/setuid() and setegid()/setgid() instead.
11.6.3 Using the Setuid and Setgid Bits

There are important cases in which a program running as root must irrevocably
change all three of the real, effective, and saved set-user IDs to that of a regular user.
The most obvious is the login program, which you use every time you log in to a
GNU/Linux or Unix system (either directly, or remotely). There is a hierarchy of
programs, as outlined in Figure 11.1.
init                      PID 1
  fork()/exec()             fork()/exec()
getty         getty       PID 523   ruid: 0    euid: 0
  exec()
login                     PID 523   ruid: 0    euid: 0
  open()/dup()
  setgroups()/setgid()/setuid()
  exec()
shell                     PID 523   ruid: 42   euid: 42
FIGURE 11.1
From init to getty to login to shell
The code for login is too complicated to be shown here, since it deals with a number
of tasks that aren't relevant to the current discussion. But we can outline the steps that
happen at login time, as follows:
1. init is the primordial process. It has PID 1. All other processes are descended
from it. The kernel handcrafts process 1 at boot time and runs init in it. It
runs with both the real and effective UID set to zero, that is, as root.
2. init reads /etc/inittab, which, among other things, tells init on which
hardware devices it should start a getty process. For each such device (such
as the console, serial terminals, or virtual consoles on a GNU/Linux system),
init forks a new process. This new process then uses exec() to run getty
("get tty," that is, a terminal). On many GNU/Linux systems, this command
is named mingetty. The program opens the device, resets its state, and prints
the 'login:' prompt.
3. Upon reading a login name, getty execs login. The login program looks
up the username in the password file, prompts for a password, and verifies the
password. If they match, the login process continues.
4. login changes to the user's home directory, sets up the initial environment,
and then sets up the initial set of open files. It closes all file descriptors, opens
the terminal, and uses dup() to copy the terminal's file descriptor to 0, 1, and 2.
This is where the already opened standard input, output, and error file
descriptors come from.
5. login then uses setgroups() to set the supplemental group set, setgid()
to set the real, effective, and saved set-group IDs to those of the user, and
finally setuid() to set all three of the real, effective, and saved set-user IDs to
those of the logging-in user. Note that the call to setuid() must come last so
that the other two calls succeed.
6. Finally, login execs the user's login shell. Bourne-style shells then read
/etc/profile and $HOME/.profile, if those files exist. Finally, the shell
prints a prompt.
Note how one process changes its nature from system process to user process. Each
child of init starts out as a copy of init. By using exec(), the same process does
different jobs. By calling setuid() to change from root to a regular user, the process
finally goes directly to work for the user. When you exit the shell (by CTRL-D or exit),
the process simply dies. init then restarts the cycle, spawning a fresh getty, which
prints a fresh 'login:' prompt.
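The ID-changing step of the login sequence (step 5) can be sketched in a few lines. This is not the code from any real login program: become_user() and its parameters are hypothetical, and a real program would obtain the values from the password and group databases.

```c
#include <errno.h>
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Irrevocably become the given user, in the order login uses.
 * Must be called as root. Returns 0 on success, -1 with errno set.
 * become_user() and its parameters are hypothetical.
 */
int become_user(uid_t uid, gid_t gid, const gid_t *groups, size_t ngroups)
{
    if (setgroups(ngroups, groups) < 0)   /* needs root: do it first */
        return -1;
    if (setgid(gid) < 0)                  /* real, effective, saved GID */
        return -1;
    if (setuid(uid) < 0)                  /* last: gives up root for good */
        return -1;

    /* Paranoia: root must not be regainable (unless the target is root). */
    if (uid != 0 && setuid(0) == 0) {
        errno = EPERM;
        return -1;
    }
    return 0;
}
```

The order matters for the reason given in step 5: once setuid() has replaced all three user IDs, the process no longer has the privilege that setgroups() and setgid() require. When called by a regular user, the sketch fails at the first step with errno set to EPERM.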
NOTE Open files remain open and usable, even after a process has changed
any or all of its UIDs or GIDs. Thus, setuid programs should open any necessary
files up front, change their IDs to those of the real user, and continue with the
rest of their work without extra privilege.
Table 11.1 summarizes the six standard functions for manipulating UID and
GID values.
TABLE 11.1
API summary for setting real and effective IDs
Function     Sets              Permanent    Regular user    Root
seteuid()    E                 No           From R, E, S    Any value
setegid()    E                 No           From R, E, S    Any value
setuid()     Root: R, E, S     Root: yes    From R, E       Any value
             Other: E          Other: no
setgid()     Root: R, E, S     Root: yes    From R, E       Any value
             Other: E          Other: no
setreuid()   E, may set R      No           From R, E       Any value
setregid()   E, may set R      No           From R, E       Any value
11.7 Working with All Three IDs: getresuid() and setresuid() (Linux)
Linux provides additional system calls by which you can work directly with the real,
effective, and saved user and group IDs:
#include <sys/types.h>                                 Linux
#include <unistd.h>
int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid);
int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid);
int setresuid(uid_t ruid, uid_t euid, uid_t suid);
int setresgid(gid_t rgid, gid_t egid, gid_t sgid);
The functions are as follows:

int getresuid(uid_t *ruid, uid_t *euid, uid_t *suid)
Retrieves the real, effective, and saved set-user ID values. The return value is 0 on
success or -1 if an error, with errno indicating the problem.
int getresgid(gid_t *rgid, gid_t *egid, gid_t *sgid)
Retrieves the real, effective, and saved set-group ID values. The return value is 0
on success or -1 if an error, with errno indicating the problem.
int setresuid(uid_t ruid, uid_t euid, uid_t suid)
Sets the real, effective, and saved set-user ID values respectively. When a parameter
value is -1, the corresponding UID is left unchanged.
When the process is running as root, the parameters can be any arbitrary values.
(However, using a nonzero value for euid causes a permanent, irrevocable loss of
root privilege.) Otherwise, the parameters must be one of the current real,
effective, or saved set-user ID values.
int setresgid(gid_t rgid, gid_t egid, gid_t sgid)
Sets the real, effective, and saved set-group ID values respectively. When a
parameter value is -1, the corresponding GID is left unchanged.
This function is analogous to setresuid().
The setresuid() and setresgid() functions are particularly valuable because
the semantics are clearly defined. A programmer knows exactly what the effect of the
call will be.
Furthermore, the calls are "all or nothing" operations: They either succeed completely,
making the desired change, or fail completely, leaving the current situation as it was.
This improves reliability since, again, it's possible to be sure of exactly what happened.
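The following sketch shows both calls in use on a GNU/Linux system. The function names show_uids() and drop_to_real_uid() are hypothetical; defining _GNU_SOURCE before the first #include is what makes GLIBC declare getresuid() and setresuid().

```c
#define _GNU_SOURCE             /* for getresuid() and setresuid() */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print the real, effective, and saved set-user IDs. Returns 0 or -1. */
int show_uids(FILE *fp)
{
    uid_t ruid, euid, suid;

    if (getresuid(&ruid, &euid, &suid) < 0)
        return -1;
    fprintf(fp, "real %ld, effective %ld, saved %ld\n",
            (long) ruid, (long) euid, (long) suid);
    return 0;
}

/*
 * Set all three user IDs to the real UID, then read them back to
 * confirm the change. Returns 0 on success, -1 on failure.
 */
int drop_to_real_uid(void)
{
    uid_t ruid = getuid();
    uid_t r, e, s;

    if (setresuid(ruid, ruid, ruid) < 0)
        return -1;
    if (getresuid(&r, &e, &s) < 0 || r != ruid || e != ruid || s != ruid)
        return -1;
    return 0;
}
```

Reading the IDs back with getresuid() is cheap insurance: it confirms the result directly instead of inferring it from the return value alone. A complete program would make the analogous setresgid() call first, while it still has the privilege to do so.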
11.8 Crossing a Security Minefield: Setuid root

Real minefields are difficult, but not impossible, to cross. However, it's not something
to attempt lightly, without training or experience.
So, too, writing programs that run setuid to root is a difficult task. There are many,
many issues to be aware of, and almost anything can have unexpected security
consequences. Such an endeavor should be undertaken carefully.
In particular, it pays to read up on Linux/Unix security issues and to invest time in
learning how to write setuid root programs. If you dive straight into such a challenge
having read this book and nothing else, rest assured that your system will be broken
into, easily and immediately. It's unlikely that either you or your customers will be happy.
Here are a few guiding principles:
• Do as little as possible as root. Use your super powers sparingly, only where
they're absolutely needed.
• Design your program properly. Compartmentalize your program so that all of the
root operations can be done up front, with the rest of the program running as a
regular user.
• When changing or dropping privileges, use setresuid() if you have it. Otherwise
use setreuid(), since those two functions have the cleanest semantics. Use
setuid() only when you want the change to be permanent.
• Change from root to regular user in the proper order: set the group set and GID
values first, and then the UID values.
• Be especially careful with fork() and exec(); the real and effective UIDs are
not changed across them unless you explicitly change them.
• Consider using setgid permissions and a special group for your application. If that
will work, it'll save you much headache.
• Consider throwing out the inherited environment. If you must keep some
environment variables around, keep as few as possible. Be sure to provide reasonable
values for the PATH and IFS environment variables.
• Avoid execlp() and execvp(), which depend upon the value of the PATH
environment variable (although this is less problematic if you've reset PATH yourself).
These are just a few of the many tactics for traversing a danger zone notable for
pitfalls, booby-traps, and landmines. See the next section for pointers to other sources
of information.
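The last two principles can be combined in a small wrapper around execve(). What follows is a sketch under stated assumptions: exec_clean() and clean_env are hypothetical names, and the PATH and IFS values shown are plausible defaults that should be adjusted for the system at hand.

```c
#include <errno.h>
#include <unistd.h>

/* A minimal, fixed environment; these values are assumptions. */
static char *const clean_env[] = {
    "PATH=/bin:/usr/bin",
    "IFS= \t\n",
    NULL
};

/*
 * Replace the current process with the program at an absolute path,
 * passing only clean_env as its environment. On success this does
 * not return. Returns -1 with errno set if path is not absolute or
 * if the exec fails. exec_clean() is a hypothetical name.
 */
int exec_clean(const char *path, char *const argv[])
{
    if (path == NULL || path[0] != '/') {
        errno = EINVAL;              /* refuse anything needing a PATH search */
        return -1;
    }
    return execve(path, argv, clean_env);
}
```

Because execve() takes a full pathname and an explicit environment, nothing inherited from the invoking user influences which program runs or what variables it sees.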

11.9 Suggested Reading

Unix (and thus GNU/Linux) security is a topic that requires knowledge and
experience to handle properly. It has gotten only harder in the Internet Age, not easier.
1. Practical UNIX & Internet Security, 3rd edition, by Simson Garfinkel, Gene
Spafford, and Alan Schwartz, O'Reilly & Associates, Sebastopol, CA, USA,
2003. ISBN: 0-596-00323-4.
This is the standard book on Unix security.
2. Building Secure Software: How to Avoid Security Problems the Right Way, by
John Viega and Gary McGraw. Addison-Wesley, Reading, Massachusetts,
USA, 2001. ISBN: 0-201-72152-X.
This is a good book on writing secure software and it includes how to deal with
setuid issues. It assumes you are familiar with the basic Linux/Unix APIs; by
the time you finish reading our book, you should be ready to read it.
3. "Setuid Demystified," by Hao Chen, David Wagner, and Drew Dean.
Proceedings of the 11th USENIX Security Symposium, August 5-9, 2002.
http://www.cs.berkeley.edu/~daw/papers/setuid-usenix02.pdf.
Garfinkel, Spafford, and Schwartz recommend reading this paper "before you
even think about writing code that tries to save and restore privileges." We
most heartily agree with them.
11.10 Summary
• The use of user and group ID values (UIDs and GIDs) to identify files and
processes is what makes Linux and Unix into multiuser systems. Processes carry both
real and effective UID and GID values, as well as a supplemental group set. It is
generally the effective UID that determines how one process might affect another,
and the effective UID, GID, and group set that are checked against a file's
permissions. Users with an effective UID of zero, known as root or the superuser, are
allowed to do what they like; the system doesn't apply permission checks to such
a user.
• The saved set-user ID and saved set-group ID concepts came from System V and
have been adopted by POSIX with full support in GNU/Linux. Having these
separate ID values makes it possible to easily and correctly swap real and effective
UIDs (and GIDs) as necessary.
• Setuid and setgid programs create processes in which the effective and real IDs
differ. The programs are marked as such with additional bits in the file permissions.
The setuid and setgid bits must be added to a file after it is created.
• getuid() and geteuid() retrieve the real and effective UID values, respectively,
and getgid() and getegid() retrieve the real and effective GID values,
respectively. getgroups() retrieves the supplemental group set and, in a POSIX
environment, can query the system as to how many members the group set contains.
• The access() function does file permission checking as the real user, making it
possible for setuid programs to check the real user's permissions. Note that, often,
examining the information as retrieved by stat() may not provide the full picture,
given that the file may reside on a nonnative or network filesystem.
• The GLIBC euidaccess() function is similar to access() but does the checking
on the basis of the effective UID and GID values.
• The setgid and sticky bits, when applied to directories, introduce extra semantics.
When a directory has its setgid bit on, new files in that directory inherit the
directory's group. New directories do also, and they automatically inherit the setting
of the setgid bit. Without the setgid bit, new files and directories receive the
effective GID of the creating process. The sticky bit on otherwise writable directories
restricts file removal to the file's owner, the directory's owner, and to root.
• The group set is changed with setgroups(). This function isn't standardized by
POSIX, but it exists on all modern Unix systems. Only root may use it.
• Changing UIDs and GIDs is considerably involved. The semantics of various
system calls have changed over the years. New applications that will change only
their effective UID/GID should use seteuid() and setegid(). Non-root
applications can also set their effective IDs with setuid() and setgid(). The
setreuid() and setregid() calls from BSD were intended for swapping the
UID and GID values; their use in new programs is discouraged.
• Applications running as root can permanently change the real, effective, and
saved ID values with setuid() and setgid(). One example of this is login,
which has to change from a system program running as root to a nonprivileged
login shell running as a regular user.
• The Linux setresuid() and setresgid() functions should be used when
they're available, since they provide the cleanest and most reliable behavior.
• Writing setuid-root applications is not a task for a novice. If you need to do such
a thing, read up on security issues first; the sources cited previously are excellent.
Exercises
1. Write a simple version of the id command. Its action is to print the user and
group IDs, with the group names, to standard output. When the effective and
real IDs are different, both are printed. For example:
    $ id
    uid=2076(arnold) gid=42(devel) groups=19(floppy),42(devel),2076(arnold)
Its usage is:

    id [ user ]
    id -G [ -nr ] [ user ]
    id -g [ -nr ] [ user ]
    id -u [ -nr ] [ user ]
With user, that user's information is displayed; otherwise, id prints the
invoking user's information. The options are as follows:
-G Print all the group values as numeric values only, no names.
-n Print the name only, no numeric values. Applies to user and group
values.
-g Print just the effective GID.
-u Print just the effective UID.
2. Write a simple program, named sume, that is setuid to yourself. It should
prompt for a password (see getpass(3)), which for the purposes of this exercise,
can be hardwired into the program's source code. If the person running the
program correctly enters the password, sume should exec a shell. Get another
user to help you test it.
3. How do you feel about making sume available to your friends? To your fellow
students or coworkers? To every user on your system?
In this chapter
• 12.1 Assertion Statements: assert() page 428
• 12.2 Low-Level Memory: The memXXX() Functions page 432
• 12.3 Temporary Files page 436
• 12.4 Committing Suicide: abort() page 445
• 12.5 Nonlocal Gotos page 446
• 12.6 Pseudorandom Numbers page 454
• 12.7 Metacharacter Expansions page 461
• 12.8 Regular Expressions page 471
• 12.10 Summary page 481
Chapter 6, "General Library Interfaces - Part 1," page 165, presented the first
set of general-purpose library APIs. In a sense, those APIs support working
with the fundamental objects that Linux and Unix systems manage: the time of
day, users and groups for files, and sorting and searching.
This chapter is more eclectic; the APIs covered here are not particularly related to
each other. However, all are useful for day-to-day Linux/Unix programming. Our
presentation moves from simpler, more general APIs to more complicated and more
specialized ones.
12.1 Assertion Statements: assert()

An assertion is a statement you make about the state of your program at certain points
in time during its execution. The use of assertions for programming was originally
developed by C.A.R. Hoare.¹ The general idea is part of "program verification": That as
you design and develop a program, you can show that it's correct by making carefully
reasoned statements about the effects of your program's code. Often, such statements
are made about invariants: facts about the program's state that are supposed to remain
true throughout the execution of a chunk of code.
Assertions are particularly useful for describing two kinds of invariants, preconditions
and postconditions: conditions that must hold true before and after, respectively, the
execution of a code segment. A simple example of preconditions and postconditions is
linear search:
    /* lsearch --- return index in array of value, or -1 if not found */

    int lsearch(int *array, size_t size, int value)
    {
        size_t i;

        /* precondition: array != NULL */
        /* precondition: size > 0 */
        for (i = 0; i < size; i++)
            if (array[i] == value)
                return i;

        /* postcondition: i == size */

        return -1;
    }
¹ In his 1981 ACM Turing Award lecture, however, Dr. Hoare states that Alan Turing himself promoted this idea.
This example states the conditions using comments. But wouldn't it be better to be
able to test the conditions by using code? This is the job of the assert() macro:
    #include <assert.h>                                         ISO C

    void assert(scalar expression);
When the scalar expression is false, the assert() macro prints a diagnostic
message and exits the program (with the abort() function; see Section 12.4,
"Committing Suicide: abort()," page 445). ch12-assert.c provides the lsearch()
function again, this time with assertions and a main() function:
     1  /* ch12-assert.c --- demonstrate assertions */
     2
     3  #include <stdio.h>
     4  #include <assert.h>
     5
     6  /* lsearch --- return index in array of value, or -1 if not found */
     7
     8  int lsearch(int *array, size_t size, int value)
     9  {
    10      size_t i;
    11
    12      assert(array != NULL);
    13      assert(size > 0);
    14      for (i = 0; i < size; i++)
    15          if (array[i] == value)
    16              return i;
    17
    18      assert(i == size);
    19
    20      return -1;
    21  }
    22
    23  /* main --- test out assertions */
    24
    25  int main(void)
    26  {
    27  #define NELEMS 4
    28      static int array[NELEMS] = { 1, 17, 42, 91 };
    29      int index;
    30
    31      index = lsearch(array, NELEMS, 21);
    32      assert(index == -1);
    33
    34      index = lsearch(array, NELEMS, 17);
    35      assert(index == 1);
    36
    37      index = lsearch(NULL, NELEMS, 10);    /* won't return */
    38
    39      printf("index = %d\n", index);
    40
    41      return 0;
    42  }
When compiled and run, the assertion on line 12 "fires":

    $ ch12-assert                                    Run the program
    ch12-assert: ch12-assert.c:12: lsearch: Assertion 'array != ((void *)0)' failed.
    Aborted (core dumped)
The message from assert() varies from system to system. For GLIBC on
GNU/Linux, the message includes the program name, the source code filename and
line number, the function name, and then the text of the failed assertion. (In this case,
the symbolic constant NULL shows up as its macro expansion, '((void *)0)'.)
The 'Aborted (core dumped)' message means that ch12-assert created a core
file; that is, a snapshot of the process's address space right before it died.² This file can
be used later, with a debugger; see Section 15.3, "GDB Basics," page 570. Core file
creation is a purposeful side effect of assert(); the assumption is that something
went drastically wrong, and you'll want to examine the process with a debugger to
determine what.
You can disable assertions by compiling your program with the command-line option
'-DNDEBUG'. When this macro is defined before <assert.h> is included, the assert()
macro expands into code that does nothing. For example:
    $ gcc -DNDEBUG=1 ch12-assert.c -o ch12-assert    Compile with -DNDEBUG
    $ ch12-assert                                    Run it
    Segmentation fault (core dumped)                 What happened?
Here, we got a real core dump! We know that assertions were disabled; there's no
"failed assertion" message. So what happened? Consider line 15 of lsearch(), when
called from line 37 of main(). In this case, the array variable is NULL. Accessing
memory through a NULL pointer is an error. (Technically, the various standards leave
as "undefined" what happens when you dereference a NULL pointer. Most modern
systems do what GNU/Linux does; they kill the process by sending it a SIGSEGV signal;
this in turn produces a core dump. This process is described in Chapter 10, "Signals,"
page 347.)
² As mentioned in Section 10.2, "Signal Actions," page 348, some GNU/Linux distributions disable creation of
core files. To reenable them, put the line 'ulimit -S -c unlimited' into your ~/.profile file.
This case raises an important point about assertions. Frequently, programmers
mistakenly use assertions instead of runtime error checking. In our case, the test for
'array != NULL' should be a runtime check:
    if (array == NULL)
        return -1;
The test for 'size > 0' (line 13) is less problematic; if size is 0 or less than 0, the
loop never executes and lsearch() (correctly) returns -1. (In truth, this assertion isn't
needed because the code correctly handles the case in which 'size <= 0'.)
The logic behind turning off assertions is that the extra checking can slow program
performance and that therefore they should be disabled for the production version of
a program. C.A.R. Hoare³ made this observation, however:
    Finally, it is absurd to make elaborate security checks on debugging runs,
    when no trust is put in the results, and then remove them in production
    runs, when an erroneous result could be expensive or disastrous. What would
    we think of a sailing enthusiast who wears his lifejacket when training on
    dry land, but takes it off as soon as he goes to sea?
Given these sentiments, our recommendation is to use assertions thoughtfully: First,
for any given assertion, consider whether it should instead be a runtime check. Second,
place your assertions carefully so that you won't mind leaving assertion checking enabled,
even in the production version of your program.
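To make the distinction concrete, here is one way lsearch() might look with that advice applied. This is only a sketch: the name lsearch_checked() is ours, and we've chosen to treat a NULL array as "not found" rather than as some other kind of error. The argument test is an ordinary runtime check that stays in place even with -DNDEBUG, while the one remaining assertion states an internal invariant that is cheap enough to leave enabled.

```c
#include <assert.h>
#include <stddef.h>

/* lsearch_checked --- like lsearch(), but a bad argument is a runtime error */

int lsearch_checked(const int *array, size_t size, int value)
{
    size_t i;

    if (array == NULL)      /* caller's mistake: always checked */
        return -1;

    for (i = 0; i < size; i++)
        if (array[i] == value)
            return (int) i;

    assert(i == size);      /* internal invariant: loop ran to completion */

    return -1;
}
```

With this version, 'lsearch_checked(NULL, NELEMS, 10)' returns -1 whether or not assertions are compiled in, instead of aborting in one build and crashing in the other.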
Finally, we'll note the following, from the "BUGS" section of the GNU/Linux
assert(3) manpage:
    assert() is implemented as a macro; if the expression tested has side effects,
    program behavior will be different depending on whether NDEBUG is defined.
    This may create Heisenbugs which go away when debugging is turned on.
Heisenberg's famous Uncertainty Principle from physics indicates that the more
precisely you can determine a particle's velocity, the less precisely you can determine
its position, and vice versa. In layman's terms, it states that the mere act of observing
the particle affects it.
³ Hints On Programming Language Design, C.A.R. Hoare. Stanford University Computer Science Technical Report
CS-73-403 (ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/403/CS-TR-73-403.pdf),
December, 1973.
A similar phenomenon occurs in programming, not related to particle physics: The
act of compiling a program for debugging, or running it with debugging enabled can
change the program's behavior. In particular, the original bug can disappear. Such a
bug is known colloquially as a heisenbug.
The manpage is warning us against putting expressions with side effects into
assert() calls:
    assert(*p++ == '\n');
The side effect here is that the p pointer is incremented as part of the test. When
NDEBUG is defined, the expression argument disappears from the source code; it's never
executed. This can lead to an unexpected failure. However, as soon as assertions are
reenabled in preparation for debugging, things start working again! Such problems are
painful to track down.
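One way to stay out of this trap is to perform the side effect in a statement of its own and to assert only on the result. The following sketch shows the idea; skip_newline() is a hypothetical helper of ours, not something from a library.

```c
#include <assert.h>

/* skip_newline --- advance *pp past one character, which must be a newline.
   The increment is outside assert(), so it happens with or without NDEBUG. */

void skip_newline(const char **pp)
{
    char c = *(*pp)++;      /* side effect: always executed */

    assert(c == '\n');      /* pure test: harmless to compile away */
    (void) c;               /* avoid an "unused variable" warning under NDEBUG */
}
```

Whether or not the program is compiled with -DNDEBUG, the pointer ends up in the same place; only the checking differs.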
12.2 Low-Level Memory: The memXXX() Functions

Several functions provide low-level services for working with arbitrary blocks of
memory. Their names all start with the prefix 'mem':
    #include <string.h>                                         ISO C

    void *memset(void *buf, int val, size_t count);
    void *memcpy(void *dest, const void *src, size_t count);
    void *memmove(void *dest, const void *src, size_t count);
    void *memccpy(void *dest, const void *src, int val, size_t count);
    int memcmp(const void *buf1, const void *buf2, size_t count);
    void *memchr(const void *buf, int val, size_t count);
12.2.1 Setting Memory: memset()

The memset() function copies the value val (treated as an unsigned char) into
the first count bytes of buf. It is particularly useful for zeroing out blocks of
dynamic memory:
    void *p = malloc(count);
    if (p != NULL)
        memset(p, 0, count);
However, memset() can be used on any kind of memory, not just dynamic memory.
The return value is the first argument: buf.
12.2.2 Copying Memory: memcpy(), memmove(), and memccpy()

Three functions copy one block of memory to another. The first two differ in their
handling of overlapping memory areas; the third copies memory but stops upon seeing
a particular value.
void *memcpy(void *dest, const void *src, size_t count)
    This is the simplest function. It copies count bytes from src to dest. It does not
    handle overlapping memory areas. It returns dest.
void *memmove(void *dest, const void *src, size_t count)
    Similar to memcpy(), it also copies count bytes from src to dest. However, it
    does handle overlapping memory areas. It returns dest.
void *memccpy(void *dest, const void *src, int val, size_t count)
    This copies bytes from src to dest, stopping either after copying val into dest
    or after copying count bytes. If it found val, it returns a pointer to the position
    in dest just beyond where val was placed. Otherwise, it returns NULL.
Now, what's the issue with overlapping memory? Consider Figure 12.1.
    struct xyz { ... } data[8];

    memcpy(&data[3], data, sizeof(data[0]) * 4);
        vs.
    memmove(&data[3], data, sizeof(data[0]) * 4);

    Index:        0   1   2   3   4   5   6   7   (each element is a struct xyz)
    Source:       data[0] through data[3]
    Destination:  data[3] through data[6]

FIGURE 12.1
Overlapping copies
The goal is to copy the four instances of struct xyz in data[0] through data[3]
into data[3] through data[6]. data[3] is the problem here; a byte-by-byte copy
moving forward in memory from data[0] will clobber data[3] before it can be safely
copied into data[6]! (It's also possible to come up with a scenario where a backwards
copy through memory destroys overlapping data.)
The memcpy() function was the original System V API for copying blocks of
memory; its behavior for overlapping blocks of memory wasn't particularly defined one way
or the other. For the 1989 C standard, the committee felt that this lack of defined
behavior was a problem; thus they invented memmove(). For historical compatibility,
memcpy() was left alone, with the behavior for overlapping memory specifically stated
as undefined, and memmove() was invented to provide a routine that would correctly
deal with problem cases.
Which one should you use in your own code? For a library function that has no
knowledge of the memory areas being passed into it, you should use memmove(). That
way, you're guaranteed that there won't be any problems with overlapping areas.
For application-level code that "knows" that two areas don't overlap, it's safe to
use memcpy().
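The overlapping copy of Figure 12.1 is easy to try with an array of int standing in for the array of structures. The helper below is our own sketch (shift_right() is not a library function); because the source and destination ranges overlap whenever by is less than n, it has to use memmove().

```c
#include <string.h>

/* shift_right --- copy the first n elements of data to data[by] and up.
   The caller must ensure that the array has at least n + by elements. */

void shift_right(int *data, size_t n, size_t by)
{
    memmove(data + by, data, n * sizeof(data[0]));
}
```

Starting from the eight values 0 through 7, the call 'shift_right(data, 4, 3)' leaves the array holding 0, 1, 2, 0, 1, 2, 3, 7: the original data[3] was copied into data[6] before being overwritten. With memcpy() in place of memmove(), the result would be undefined.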
For both memcpy() and memmove() (as for strcpy()), the destination buffer is the
first argument and the source is the second one. To remember this, note that the order
is the same as for an assignment statement:
    dest = src;
(Many systems have manpages that don't help, providing the prototype as 'void
*memcpy(void *buf1, void *buf2, size_t n)' and relying on the prose to explain
which is which. Fortunately, the GNU/Linux manpage uses better names.)
12.2.3 Comparing Memory Blocks: memcmp()

The memcmp() function compares count bytes from two arbitrary buffers of data.
Its return value is like strcmp(): negative, zero, or positive if the first buffer is less
than, equal to, or greater than the second one.
You may be wondering "Why not use strcmp() for such comparisons?" The
difference between the two functions is that memcmp() doesn't care about zero bytes (the
'\0' string terminator). Thus, memcmp() is the function to use when you need to
compare arbitrary binary data.
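A small experiment shows the difference. In this sketch (the record contents and the function names are ours), two 7-byte records are identical up to an embedded zero byte and differ after it:

```c
#include <string.h>

/* Two 7-byte records that differ only after an embedded zero byte. */
static const char rec1[7] = { 'a', 'b', 'c', '\0', 'd', 'e', 'f' };
static const char rec2[7] = { 'a', 'b', 'c', '\0', 'x', 'y', 'z' };

/* same_as_strings --- true if strcmp() considers the records equal */

int same_as_strings(void)
{
    return strcmp(rec1, rec2) == 0;     /* stops at the '\0' */
}

/* same_as_bytes --- true if memcmp() considers the records equal */

int same_as_bytes(void)
{
    return memcmp(rec1, rec2, sizeof rec1) == 0;    /* looks at all 7 bytes */
}
```

Here same_as_strings() returns 1, since strcmp() never looks past the zero byte, whereas same_as_bytes() returns 0.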
Another advantage to memcmp() is that it's faster than the typical C implementation:
    /* memcmp --- example C implementation, NOT for real use */

    int memcmp(const void *buf1, const void *buf2, size_t count)
    {
        const unsigned char *cp1 = (const unsigned char *) buf1;
        const unsigned char *cp2 = (const unsigned char *) buf2;
        int diff;

        while (count-- != 0) {
            diff = *cp1++ - *cp2++;
            if (diff != 0)
                return diff;
        }

        return 0;
    }
The speed can be due to special "block memory compare" instructions that many
architectures support or to comparisons in units larger than bytes. (This latter operation is
tricky and is best left to the library's author.)
For these reasons, you should always use your library's version of memcmp() instead
of rolling your own. Chances are excellent that the library author knows the machine
better than you do.
12.2.4 Searching for a Byte Value: memchr()

The memchr() function is similar to the strchr() function: It returns the location
of a particular value within an arbitrary buffer. As for memcmp() vs. strcmp(), the
principal reason to use memchr() is that you have arbitrary binary data.
GNU wc uses memchr() when counting only lines and bytes,⁴ and this allows wc to
be quite fast. From wc.c in the GNU Coreutils:
⁴ See wc(1). wc counts lines, words, and characters.

    257    else if (!count_chars && !count_complicated)
    258      {
    259        /* Use a separate loop when counting only lines or lines and bytes
    260           but not chars or words.  */
    261        while ((bytes_read = safe_read (fd, buf, BUFFER_SIZE)) > 0)
    262          {
    263            register char *p = buf;
    264
    265            if (bytes_read == SAFE_READ_ERROR)
    266              {
    267                error (0, errno, "%s", file);
    268                exit_status = 1;
    269                break;
    270              }
    271
    272            while ((p = memchr (p, '\n', (buf + bytes_read) - p)))
    273              {
    274                ++p;
    275                ++lines;
    276              }
    277            bytes += bytes_read;
    278          }
    279      }
The outer loop (lines 261-278) reads data blocks from the input file. The inner loop
(lines 272-276) uses memchr() to find and count newline characters. The complicated
expression '(buf + bytes_read) - p' resolves to the number of bytes left between
the current value of p and the end of the buffer.
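The inner loop is easier to study in isolation. The following standalone sketch applies the same technique to a single buffer; count_newlines() is our own name and is not part of wc.

```c
#include <stddef.h>
#include <string.h>

/* count_newlines --- count '\n' bytes in buf[0..len-1] using memchr() */

size_t count_newlines(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;
    size_t lines = 0;

    /* (end - p) is the number of bytes not yet searched */
    while ((p = memchr(p, '\n', (size_t) (end - p))) != NULL) {
        ++p;            /* resume the search just past this newline */
        ++lines;
    }

    return lines;
}
```

Because memchr() takes an explicit length, the buffer may contain zero bytes; they're treated like any other data.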
The comment on lines 259-260 needs some explanation. Briefly, modern systems
can use characters that occupy more than one byte in memory and on disk. (This is
discussed in a little more detail in Section 13.4, "Can You Spell That for Me, Please?",
page 521.) Thus, wc has to use different code if it's distinguishing characters from bytes:
This code deals with the counting-bytes case.
12.3 Temporary Files

A temporary file is exactly what it sounds like: a file that, while a program runs, holds
data that isn't needed once the program exits. An excellent example is the sort program.
sort reads standard input if no files are named on the command line or if you use '-'
as the filename. Yet sort has to read all of its input data before it can output the sorted
results. (Think about this a bit and you'll see that it's true.) While standard input is
being read, the data must be stored somewhere until sort can sort it; this is the perfect
use for a temporary file. sort also uses temporary files for storing intermediate
sorted results.
Amazingly, there are five different functions for creating temporary files. Three of
them work by creating strings representing (supposedly) unique filenames. As we'll see,
these should generally be avoided. The other two work by creating and opening the
temporary file; these functions are preferred.
12.3.1 Generating Temporary Filenames (Bad)

There are three functions whose purpose is to create the name of a unique, nonexistent
file. Once you have such a filename, you can use it to create a temporary file. Since the
name is unique, you're "guaranteed" exclusive use of the file. Here are the function
declarations:
    char *tmpnam(char *s);                                      ISO C
    char *tempnam(const char *dir, const char *pfx);            XSI
    char *mktemp(char *template);                               ISO C
The functions all provide different variations of the same theme: They fill in or create
a buffer with the path of a unique temporary filename. The file is unique in that the
created name doesn't exist as of the time the functions create the name and return it.
The functions work as follows:
char *tmpnam(char *s)
    Generates a unique filename. If s is not NULL, it should be at least L_tmpnam bytes
    in size and the unique name is copied into it. If s is NULL, the name is generated
    in an internal static buffer that can be overwritten on subsequent calls. The
    directory prefix of the path will be P_tmpdir. Both P_tmpdir and L_tmpnam are
    defined in <stdio.h>.
char *tempnam(const char *dir, const char *pfx)
    Like tmpnam(), but lets you specify the directory prefix. If dir is NULL, P_tmpdir is
    used. The pfx argument, if not NULL, specifies up to five characters to use as the
    leading characters of the filename.
    tempnam() allocates storage for the filenames it generates. The returned pointer
    can be used later with free() (and should be if you wish to avoid memory leaks).
char *mktemp(char *template)
    Generates unique filenames based on a template. The last six characters of
    template must be 'XXXXXX'; these characters are replaced with a unique suffix.
NOTE The template argument to mktemp() is overwritten in place. Thus, it
should not be a string constant. Many pre-Standard C compilers put string
constants into the data segment, along with regular global variables. Although
defined as constants in the source code, they were writable; thus, code like the
following was not uncommon:
    /* Old-style code: don't do this. */
    char *tfile = mktemp("/tmp/myprogXXXXXX");

    ... use tfile ...
On modern systems, such code will likely fail; string constants nowadays
find themselves in read-only segments of memory.
Using these functions is quite straightforward. The file ch12-mktemp.c demonstrates
mktemp(); changes to use the other functions are not difficult:
     1  /* ch12-mktemp.c --- demonstrate naive use of mktemp().
     2     Error checking omitted for brevity */
     3
     4  #include <stdio.h>
     5  #include <fcntl.h>      /* for open flags */
     6  #include <limits.h>     /* for PATH_MAX */
     7
     8  int main(void)
     9  {
    10      static char template[] = "/tmp/myfileXXXXXX";
    11      char fname[PATH_MAX];
    12      static char mesg[] =
    13          "Here's lookin' at you, kid!\n";    /* beats "hello, world" */
    14      int fd;
    15
    16      strcpy(fname, template);
    17      mktemp(fname);
    18
    19      /* RACE CONDITION WINDOW OPENS */
    20
    21      printf("Filename is %s\n", fname);
    22
    23      /* RACE CONDITION WINDOW LASTS TO HERE */
    24
    25      fd = open(fname, O_CREAT|O_RDWR|O_TRUNC, 0600);
    26      write(fd, mesg, strlen(mesg));
    27      close(fd);
    28
    29      /* unlink(fname); */
    30
    31      return 0;
    32  }
The template variable (line 10) defines the filename template; the 'XXXXXX' will be
replaced with a unique value. Line 16 copies the template into fname, which isn't
const: It can be modified. Line 17 calls mktemp() to generate the filename, and line
21 prints it so we can see what it is. (We explain the comments on lines 19 and
23 shortly.)
Line 25 opens the file, creating it if necessary. Line 26 writes the message in mesg,
and line 27 closes the file. In a program in which the file should be removed when we're
done with it, line 29 would not be commented out. (Sometimes, a temporary file should
not be unlinked; for example, if the file will be renamed once it's completely written.)
We've commented it out so that we can run this program and look at the file afterwards.
Here's what happens when the program runs:
    $ ch12-mktemp                               Run the program
    Filename is /tmp/myfileQES4WA               Filename printed
    $ cat /tmp/myfileQES4WA
    Here's lookin' at you, kid!                 Contents are what we expect
    $ ls -l /tmp/myfileQES4WA                   So are owner and permissions
    -rw-------    1 arnold   devel          28 Sep 18 09:27 /tmp/myfileQES4WA
    $ rm /tmp/myfileQES4WA                      Remove it
    $ ch12-mktemp                               Is same filename reused?
    Filename is /tmp/myfileic7xCy               No. That's good
    $ cat /tmp/myfileic7xCy                     Check contents again
    Here's lookin' at you, kid!
    $ ls -l /tmp/myfileic7xCy                   Check owner and permissions again
    -rw-------    1 arnold   devel          28 Sep 18 09:28 /tmp/myfileic7xCy
Everything seems to be working fine. mktemp() gives back a unique name,
ch12-mktemp creates the file with the right permissions, and the contents are what's
expected. So what's the problem with all of these functions?
Historically, mktemp() used a simple, predictable algorithm to generate the
replacement characters for the 'XXXXXX' in the template. Furthermore, the interval between
the time the filename is generated and the time the file itself is created creates a
race condition.
How? Well, Linux and Unix systems use time slicing, a technique that shares the
processor among all the executing processes. This means that, although a program appears
to be running all the time, in actuality, there are times when processes are sleeping, that
is, waiting to run on the processor.
Now, consider a professor's program for tracking student grades. Both the professor
and a malicious student are using a heavily loaded, multiuser system at the same time.
The professor's program uses mktemp() to create temporary files, and the student,
having in the past watched the grading program create and remove temporary files, has
figured out the algorithm that mktemp() uses. (The GLIBC version doesn't have this
problem, but not all systems use GLIBC!) Figure 12.2 illustrates the race condition
and how the student takes advantage of it.
    Time:     1                    2                       3                  4
    Grader:   mktemp() gives name  OS stops program        Opens file,
                                                           writes data
    Student:                       Creates file with the                      Saves copy
                                   same name and extra link                   of grades
FIGURE 12.2
Race condition with mktemp()
Here's what happened.
1. The grading program uses mktemp() to generate a filename. Upon return
from mktemp(), the race condition window is now open (line 19 in
ch12-mktemp.c).
2. The kernel stops the grader so that other programs on the system can run. This
happens before the call to open().
While the grader is stopped, the student creates the file with the same name
mktemp() returned to the grader program. (Remember, the algorithm was easy
to figure out.) The student creates the file with an extra link to it, so that when
the grading program unlinks the file, it will still be available for perusal.
3. The grader program now opens the file and writes data to it. The student
created the file with -rw-rw-rw- permissions, so this isn't a problem.
4. When the grader program is finished, it unlinks the temporary file. However,
the student still has a copy. For example, there may be a profit opportunity to
sell classmates their grades in advance.
Our example is simplistic; besides just stealing the grade data, a clever (if immoral)
student might be able to change the data in place. If the professor doesn't double-check
his program's results, no one would be the wiser.
NOTE We do not recommend doing any of this! If you are a student, don't try
any of this. First and foremost, it is unethical. Second, it's likely to get you kicked
out of school. Third, your professors are probably not so naive as to have used
mktemp() to code their programs. The example is for illustration only!
For the reasons given, and others, all three functions described in this section should
never be used. They exist in POSIX and in GLIBC only to support old programs that
were written before the dangers of these routines were understood. To this end,
GNU/Linux systems generate a warning at link time:
    $ cc ch12-mktemp.c -o ch12-mktemp                Compile the program
    /tmp/cclXCvD9.o(.text+0x35): In function 'main':
    : the use of 'mktemp' is dangerous, better use 'mkstemp'
(We cover mkstemp() in the next subsection.)
Should your system not have mkstemp(), think about how you might use these
interfaces to simulate it. (See also "Exercises" for Chapter 12, page 482, at the end of
the chapter.)
12.3.2 Creating and Opening Temporary Files (Good)

There are two functions that don't have race condition problems. One is intended
for use with the <stdio.h> library:
    #include <stdio.h>                                          ISO C

    FILE *tmpfile(void);
The second function is for use with the file-descriptor-based system calls:
    #include <stdlib.h>                                         XSI

    int mkstemp(char *template);
tmpfile() returns a FILE * value representing a unique, open temporary file. The
file is opened in "w+b" mode. The w+ means "open for reading and writing, truncate
the file first," and the b means binary mode, not text mode. (There's no difference on
a GNU/Linux or Unix system, but there is on other systems.) The file is automatically
deleted when the file pointer is closed; there is no way to get to the file's name to save
its contents. The program in ch12-tmpfile.c demonstrates tmpfile():
/* ch12-tmpfile.c --- demonstrate tmpfile().
   Error checking omitted for brevity */

#include <stdio.h>

int main(void)
{
    static char mesg[] =
        "Here's lookin' at you, kid!";  /* beats "hello, world" */
    FILE *fp;
    char buf[BUFSIZ];

    fp = tmpfile();                     /* Get temp file */
    fprintf(fp, "%s", mesg);            /* Write to it */
    fflush(fp);                         /* Force it out */

    rewind(fp);                         /* Move to front */
    fgets(buf, sizeof buf, fp);         /* Read contents */
    printf("Got back <%s>\n", buf);     /* Print retrieved data */

    fclose(fp);                         /* Close file, goes away */
    return 0;                           /* All done */
}
The returned FILE * value is no different from any other FILE * returned by
fopen(). When run, the results are what's expected:
$ ch12-tmpfile
Got back <Here's lookin' at you, kid!>
We saw earlier that the GLIBC authors recommend the use of the mkstemp() function:
$ cc ch12-mktemp.c -o ch12-mktemp                 Compile the program
/tmp/cclXCvD9.o(.text+0x35): In function 'main':
: the use of 'mktemp' is dangerous, better use 'mkstemp'
This function is similar to mktemp() in that it takes a filename ending in 'XXXXXX'
and replaces those characters with a unique suffix to create a unique filename. However,
it goes one step further. It creates and opens the file. The file is created with mode 0600
(that is, -rw-------). Thus, only the user running the program can access the file.
Furthermore, and this is what makes mkstemp() more secure, the file is created using
the O_EXCL flag, which guarantees that the file doesn't exist and keeps anyone else from
opening the file.
The return value is an open file descriptor that can be used for reading and writing.
The pathname now stored in the buffer passed to mkstemp() should be used to remove
the file when you're done. This is all demonstrated in ch12-mkstemp.c, which is a
straightforward modification of ch12-tmpfile.c:
/* ch12-mkstemp.c --- demonstrate mkstemp().
   Error checking omitted for brevity */

#include <stdio.h>
#include <stdlib.h>     /* for mkstemp() */
#include <string.h>     /* for strcpy(), strlen() */
#include <unistd.h>     /* for write(), lseek(), read(), close(), unlink() */
#include <fcntl.h>      /* for open flags */
#include <limits.h>     /* for PATH_MAX */

int main(void)
{
    static char template[] = "/tmp/myfileXXXXXX";
    char fname[PATH_MAX];
    static char mesg[] =
        "Here's lookin' at you, kid!\n";    /* beats "hello, world" */
    int fd;
    char buf[BUFSIZ];
    int n;

    strcpy(fname, template);            /* Copy template */
    fd = mkstemp(fname);                /* Create and open temp file */
    printf("Filename is %s\n", fname);  /* Print it for information */

    write(fd, mesg, strlen(mesg));      /* Write something to file */
    lseek(fd, 0L, SEEK_SET);            /* Rewind to front */

    n = read(fd, buf, sizeof(buf));     /* Read data back; NOT '\0' terminated! */
    printf("Got back: %.*s", n, buf);   /* Print it out for verification */

    close(fd);                          /* Close file */
    unlink(fname);                      /* Remove it */
    return 0;
}
When run, the results are as expected:
$ ch12-mkstemp
Filename is /tmp/myfileuXFW1N
Got back: Here's lookin' at you, kid!
12.3.3 Using the TMPDIR Environment Variable

Many standard utilities pay attention to the TMPDIR environment variable, using the
directory it names as the place in which to put their temporary files. If TMPDIR isn't
set, then the default directory for temporary files is usually /tmp, although most modern
systems have a /var/tmp directory as well. /tmp is usually cleared of all files and
directories by administrative shell scripts at system startup.
Many GNU/Linux systems provide a directory /dev/shm that uses the tmpfs
filesystem type:
$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda2              6198436   5136020    747544  88% /
/dev/hda5             61431520  27720248  30590648  48% /d
none                    256616         0    256616   0% /dev/shm
The tmpfs filesystem type provides a RAM disk: a portion of memory used as if it
were a disk. Furthermore, the tmpfs filesystem uses the Linux kernel's virtual memory
mechanisms to allow it to grow beyond a fixed size limit. If you have lots of RAM in
your system, this approach can provide a noticeable speedup. To test performance we
started with the /usr/share/dict/linux.words file, which is a sorted list of correctly
spelled words, one per line. We then randomized this file so that it was no longer sorted
and created a larger file containing 500 copies of the random version of the file:
$ ls -l /tmp/randwords.big                        Show size
-rw-r--r--    1 arnold   devel    204652500 Sep 18 16:02 /tmp/randwords.big
$ wc -l /tmp/randwords.big                        How many words?
22713500 /tmp/randwords.big                       Over 22 million!
We then sorted the file, first using the default /tmp directory, and then with TMPDIR
set to /dev/shm:⁵
$ time sort /tmp/randwords.big > /dev/null        Use real files
real    1m32.566s
user    1m23.137s
sys     0m1.740s
$ time TMPDIR=/dev/shm sort /tmp/randwords.big > /dev/null    Use RAM disk
real    1m28.257s
user    1m18.469s
sys     0m1.602s
Interestingly, using the RAM disk was only marginally faster than using regular files.
(On some further tests, it was actually slower!) We conjecture that the kernel's buffer
cache (see Section 4.6.2, "Creating Files with creat()," page 109) comes into the
picture, quite effectively speeding up file I/O.⁶
5 This use of /dev/shm is really an abuse of it; it's intended for use in implementing shared memory, not for use
as a RAM disk. Nevertheless, it's helpful for illustrating our point.
6 Our system has 512 megabytes of RAM, which to old fogies like the author seems like a lot. However, RAM prices
have fallen, and systems with one or more gigabytes of RAM are not uncommon, at least for software developers.
The RAM disk has a significant disadvantage: It is limited by the amount of swap
space configured on your system.⁷ When we tried to sort a file containing 1,000 copies
of the randomized words file, the RAM disk ran out of room, whereas the regular sort
finished successfully.
Using TMPDIR for your own programs is straightforward. We offer the following
outline:
const char template[] = "myprog.XXXXXX";
char *tmpdir, *tfile;
size_t count;
int fd;

if ((tmpdir = getenv("TMPDIR")) == NULL)          Use TMPDIR value if there
    tmpdir = "/tmp";                              Otherwise, default to /tmp

count = strlen(tmpdir) + strlen(template) + 2;    Compute size of filename
tfile = (char *) malloc(count);                   Allocate space for it
if (tfile == NULL)                                Check for error
    /* recover */

sprintf(tfile, "%s/%s", tmpdir, template);        Create final template
fd = mkstemp(tfile);                              Create and open file

... use tempfile via fd ...

close(fd);                                        Clean up
unlink(tfile);
free(tfile);
Depending on your application's needs, you may wish to unlink the file immediately
after opening it instead of doing so as part of the cleanup.
12.4 Committing Suicide: abort()

There are times when a program just can't continue. Generally, the best thing to do
is to generate an error message and call exit(). However, particularly for errors that
are likely to be programming problems, it's helpful to not just exit but also to produce
a core dump, which saves the state of the running program in a file for later examination
with a debugger. This is the job of the abort() function:
#include <stdlib.h>                               ISO C
void abort(void);
7 Swap space consists of one or more dedicated chunks of disk, used to hold the parts of executing processes that
are not currently in memory.
The abort() function sends a SIGABRT signal to the process itself. This happens
even if SIGABRT is blocked or ignored. The normal action for SIGABRT, which is to
produce a core dump, then takes place.
An example of abort() in action is the assert() macro described at the beginning
of this chapter. When assert() finds that its expression is false, it prints an error
message and then calls abort() to dump core.
According to the C standard, it is implementation defined whether or not abort()
does any cleanup actions. Under GNU/Linux it does: All <stdio.h> FILE * streams
are closed before the program exits. Note, however, that nothing is done for open files
that use the file descriptor-based system calls. (Nothing needs to be done if all that are
open are files or pipes. Although we don't discuss it, file descriptors are also used for
network connections, and not closing them cleanly is poor practice.)
12.5 Nonlocal Gotos

"Go directly to jail. Do not pass GO. Do not collect $200 ."
-Monopoly-
You undoubtedly know what a goto is: a transfer of control flow to a label somewhere
else within the current function. goto statements, when used sparingly, can contribute
to the readability and correctness of a function. (For example, when all error checks
use a goto to a label at the end of a function, such as clean_up, the code at that label
then cleans things up [closing files, etc.] and returns.) When used poorly, goto
statements can lead to so-called spaghetti code, the logic of which becomes impossible
to follow.
The goto statement is constrained by the C language to jump to a label within the
current function. Many languages in the Algol family, such as Pascal, allow a goto to
"jump out" of a nested function into an earlier calling function. However in C, there
is no way, within the syntax of the language itself, to jump to a location in a different
function, even a calling one. Such a jump is termed a nonlocal goto.
Why is a nonlocal goto useful? Consider an interactive program that reads commands
and processes them. Suppose the user starts a long-running task, gets frustrated or
changes his mind about doing the task, and then presses CTRL-C to generate a SIGINT
signal. When the signal handler runs, it can jump back to the start of the main
read-commands-and-process-them loop. The ed line editor provides a straightforward
example of this:
$ ed -p '> ' sayings                Start ed, use '> ' as prompt
sayings: No such file or directory
> a                                 Append text
Hello, world
Don't panic
^C                                  Generate SIGINT
?                                   The "one size fits all" error message
> 1,$p                              ed returns to command loop
Hello, world                        '1,$p' prints all the lines
Don't panic
> w                                 Save file
25
> q                                 All done
Internally, ed sets up a return point before the command loop, and the signal handler
then does a nonlocal goto back to the return point.
12.5.1 Using Standard Functions: setjmp() and longjmp()

Nonlocal gotos are accomplished with the setjmp() and longjmp() functions.
These functions come in two flavors. The traditional routines are defined by the ISO
C standard:
#include <setjmp.h>                               ISO C
int setjmp(jmp_buf env);
void longjmp(jmp_buf env, int val);
The jmp_buf type is typedef'd in <setjmp.h>. setjmp() saves the current
"environment" in env. env is typically a global or file-level static variable so that it can
be used from a called function. This environment includes whatever information is
necessary for jumping to the location at which setjmp() is called. The contents of a
jmp_buf are by nature machine dependent; thus, jmp_buf is an opaque type: something
you use without knowing what's inside it.
setjmp() returns 0 when it is called to save the current environment in a jmp_buf.
It returns nonzero when a nonlocal jump is made using the environment:
jmp_buf command_loop;                             At global level
... then in main() ...
if (setjmp(command_loop) == 0)                    State saved OK, proceed on
    ;
else                                              We get here via nonlocal goto
    printf("?\n");                                ed's famous message
... now start command loop ...
longjmp() makes the jump. The first parameter is a jmp_buf that must have been
initialized by setjmp(). The second is an integer nonzero value that setjmp() returns
in the original environment. This is so that code such as that just shown can distinguish
between setting the environment and arriving by way of a nonlocal jump.
The C standard states that even if longjmp() is called with a second argument of
0, setjmp() still returns nonzero. In such a case, it instead returns 1.
The ability to pass an integer value and have that come back from the return of
setjmp() is useful; it lets user-level code distinguish the reason for the jump. For
instance, gawk uses this capability to handle the break and continue statements inside
loops. (The awk language is deliberately similar to C in its syntax for loops, with while,
do-while, and for loops, and break and continue.) The use of setjmp() looks like
this (from eval.c in the gawk 3.1.3 distribution):
507     case Node_K_while:
508         PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
509
510         stable_tree = tree;
511         while (eval_condition(stable_tree->lnode)) {
512             INCREMENT(stable_tree->exec_count);
513             switch (setjmp(loop_tag)) {
514             case 0:     /* normal non-jump */
515                 (void) interpret(stable_tree->rnode);
516                 break;
517             case TAG_CONTINUE:  /* continue statement */
518                 break;
519             case TAG_BREAK:     /* break statement */
520                 RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
521                 return 1;
522             default:
523                 cant_happen();
524             }
525         }
526         RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
527         break;
This code fragment represents a while loop. Line 508 manages nested loops by
means of a stack of saved jmp_buf variables. Lines 511-524 run the while loop (using
a C while loop!). Line 511 tests the loop's condition. If it's true, line 513 does a switch
on the setjmp() return value. If it's 0 (lines 514-516), then line 515 runs the
statement's body. However, when setjmp() returns either TAG_CONTINUE or TAG_BREAK,
the switch statement handles them appropriately (lines 517-518 and 519-521,
respectively).
An awk-level break statement passes TAG_BREAK to longjmp(), and the awk-level
continue passes TAG_CONTINUE. Again, from eval.c, with some irrelevant details
omitted:
657     case Node_K_break:
658         INCREMENT(tree->exec_count);
...
675         longjmp(loop_tag, TAG_BREAK);
676         break;
677
678     case Node_K_continue:
679         INCREMENT(tree->exec_count);
...
696         longjmp(loop_tag, TAG_CONTINUE);
697         break;
You can think of setjmp() as placing the label, and longjmp() as the goto,
with the extra advantage of being able to tell where the code "came from" (by the
return value).
12.5.2 Handling Signal Masks: sigsetjmp() and siglongjmp()

For historical reasons that would most likely bore you to tears, the 1999 C standard
is silent about the effect of setjmp() and longjmp() on the state of a process's signals,
and POSIX states explicitly that their effect on the process signal mask (see Section 10.6,
"POSIX Signals," page 367) is undefined.
In other words, if a program changes its process signal mask between the first call to
setjmp() and a call to longjmp(), what is the state of the process signal mask after
the longjmp()? Is it the mask in effect when setjmp() was first called? Or is it the
current mask? POSIX says explicitly "there's no way to know."
To make handling of the process signal mask explicit, POSIX introduced two
additional functions and one typedef:
#include <setjmp.h>                               POSIX
int sigsetjmp(sigjmp_buf env, int savesigs);
void siglongjmp(sigjmp_buf env, int val);
The main difference is the savesigs argument to sigsetjmp(). If nonzero, then
the current set of blocked signals is saved in env, along with the rest of the environment
that would be saved by setjmp(). A siglongjmp() with an env where savesigs
was true restores the saved process signal mask.
NOTE POSIX is also clear that if savesigs is zero (false), it's undefined whether
the process signal mask is saved and restored, just like setjmp()/longjmp().
This in turn implies that if you're going to use 'sigsetjmp(env, 0)' you may
as well not bother: The whole point is to have control over saving and restoring
the process signal mask!
12.5.3 Observing Important Caveats

There are several technical caveats to be aware of:
First, because the environment saving and restoring can be messy, machine-dependent
tasks, setjmp() and longjmp() are allowed to be macros.
Second, the C standard limits the use of setjmp() to the following situations:
• As the sole controlling expression of a loop or condition statement (if, switch).
• As one operand of a comparison expression (==, <, etc.), with the other operand
  as an integer constant. The comparison expression can be the sole controlling
  expression of a loop or condition statement.
• As the operand of the unary ! operator, with the resulting expression being the
  sole controlling expression of a loop or condition statement.
• As the entire expression of an expression statement, possibly cast to void. For
  example:
      (void) setjmp(buf);
Third, if you wish to change a local variable in the function that calls setjmp(),
after the call, and you want that variable to maintain its most recently assigned value
after a longjmp(), you must declare the variable to be volatile. Otherwise, any
non-volatile local variables changed after setjmp() was initially called have
indeterminate values. (Note that the jmp_buf variable itself need not be declared volatile.)
For example:
 1  /* ch12-setjmp.c --- demonstrate setjmp()/longjmp() and volatile. */
 2
 3  #include <stdio.h>
 4  #include <setjmp.h>
 5
 6  jmp_buf env;
 7
 8  /* comeback --- do a longjmp */
 9
10  void comeback(void)
11  {
12      longjmp(env, 1);
13      printf("This line is never printed\n");
14  }
15
16  /* main --- call setjmp, fiddle with vars, print values */
17
18  int main(void)
19  {
20      int i = 5;
21      volatile int j = 6;
22
23      if (setjmp(env) == 0) {     /* first time */
24          i++;
25          j++;
26          printf("first time: i = %d, j = %d\n", i, j);
27          comeback();
28      } else                      /* second time */
29          printf("second time: i = %d, j = %d\n", i, j);
30
31      return 0;
32  }
In this example, only j (line 21) is guaranteed to maintain its value for the second
call to printf(). The value of i (line 20), according to the 1999 C standard, is
indeterminate. It may be 6, it may be 5, or it may even be something else!
Fourth, as described in Section 12.5.2, "Handling Signal Masks: sigsetjmp() and
siglongjmp()," page 449, the 1999 C standard makes no statement about the effect,
if any, of setjmp() and longjmp() on the state of the program's signals. If that's
important, you have to use sigsetjmp() and siglongjmp() instead.
Fifth, these routines provide amazing potential for memory leaks! Consider a program
in which main() calls setjmp() and then calls several nested functions, each of which
allocates dynamic memory with malloc(). If the most deeply nested function does a
longjmp() back into main(), the pointers to the dynamic memory are lost. Consider
ch12-memleak.c:
 1  /* ch12-memleak.c --- demonstrate memory leaks with setjmp()/longjmp(). */
 2
 3  #include <stdio.h>
 4  #include <malloc.h>     /* for definition of ptrdiff_t on GLIBC */
 5  #include <setjmp.h>
 6  #include <unistd.h>
 7
 8  jmp_buf env;
 9
10  void f1(void), f2(void);
11
12  /* main --- leak memory with setjmp() and longjmp() */
13
14  int main(void)
15  {
16      char *start_break;
17      char *current_break;
18      ptrdiff_t diff;
19
20      start_break = sbrk((ptrdiff_t) 0);
21
22      if (setjmp(env) == 0)       /* first time */
23          printf("setjmp called\n");
24
25      current_break = sbrk((ptrdiff_t) 0);
26
27      diff = current_break - start_break;
28      printf("memsize = %ld\n", (long) diff);
29
30      f1();
31
32      return 0;
33  }
34
35  /* f1 --- allocate some memory, make a nested call */
36
37  void f1(void)
38  {
39      char *p = malloc(1024);
40
41      f2();
42  }
43
44  /* f2 --- allocate some memory, make longjmp */
45
46  void f2(void)
47  {
48      char *p = malloc(1024);
49
50      longjmp(env, 1);
51  }
This program sets up an infinite loop, using setjmp() and longjmp(). Line 20
uses sbrk() (see Section 3.2.3, "System Calls: brk() and sbrk()," page 75) to find
the current start of the heap, and then line 22 calls setjmp(). Line 25 gets the current
start of the heap; this location changes each time through since the code is entered
repeatedly by longjmp(). Lines 27-28 compute how much memory has been allocated
and print the amount. Here's what happens when it runs:
$ ch12-memleak                                    Run the program
setjmp called
memsize = 0
memsize = 6372
memsize = 6372
memsize = 6372
memsize = 10468
memsize = 10468
memsize = 14564
memsize = 14564
memsize = 18660
memsize = 18660
The program leaks memory like a sieve. It runs until interrupted from the keyboard
or until it runs out of memory (at which point it produces a massive core dump).
Functions f1() and f2() each allocate memory, and f2() does the longjmp()
back to main() (line 50). Once that happens, the local pointers (lines 39 and 48) to
the allocated memory are gone! Such memory leaks can be difficult to track down because
they are often for small amounts of memory, and as such, they can go unnoticed
literally for years.⁸
This code is clearly pathological, but it's intended to illustrate our point: setjmp()
and longjmp() can lead to hard-to-find memory leaks. Suppose that f1() called
free() correctly. It would then be far from obvious that the memory would never be
freed. In a larger, more realistic program, in which longjmp() might be called only
by an if, such a leak becomes even harder to find.
In the presence of setjmp() and longjmp(), dynamic memory must thus be
managed by global variables, and you must have code that detects entry with longjmp()
(by checking the setjmp() return value). Such code should then clean up any
dynamically allocated memory that's no longer needed.
8 We had such a leak in gawk. Fortunately, it's fixed.

Sixth, longjmp() and siglongjmp() should not be used from any functions
registered with atexit() (see Section 9.1.5.3, "Exiting Functions," page 302).
Seventh, setjmp() and longjmp() can be costly operations on machines with lots
of registers.
Given all of these issues, you should take a hard look at your program's design. If
you don't need to use setjmp() and longjmp(), then you're probably better off not
doing so. However, if their use is the best way to structure your program, then go ahead
and use them, but do so carefully.
12.6 Pseudorandom Numbers

Many applications need sequences of random numbers. For example, game programs
that simulate rolling a die, dealing cards, or turning the wheels on a slot machine need
to be able to pick one of a set of possible values at random. (Consider the fortune
program, which has a large collection of pithy sayings; it prints a different one "at
random" each time it's called.) Many cryptographic algorithms also require "high quality"
random numbers. This section describes different ways to get sequences of
random numbers.
NOTE The nature of randomness, generation of random numbers, and the
"quality" of random numbers are all broad topics, beyond the scope of this
book. We provide an introduction to the available APIs, but that's about all
we can do. See Section 12.9, "Suggested Reading," page 480, for other sources
of more detailed information.
Computers, by design, are deterministic. The same calculation, with the same inputs,
should produce the same outputs, every time. Thus, they are not good at generating
truly random numbers, that is, sequences of numbers in which each number in the
sequence is completely independent of the number (or numbers) that came before it.
Instead, the kinds of numbers usually dealt with at the programmatic level are called
pseudorandom numbers. That is, within any given sequence, the numbers appear to be
independent of each other, but the sequence as a whole is repeatable. (This repeatability
can be an asset; it provides determinism for the program as a whole.)
Many methods of producing pseudorandom number sequences work by performing
the same calculation each time on a starting, or seed, value. The stored seed value is
then updated for use next time. APIs provide a way to specify a new seed. Each initial
seed produces the same sequence of pseudorandom numbers, although different seeds
(should) produce different sequences.
12.6.1 Standard C: rand() and srand()

Standard C defines two related functions for pseudorandom numbers:
#include <stdlib.h>                               ISO C
int rand(void);
void srand(unsigned int seed);
rand() returns a pseudorandom number between 0 and RAND_MAX (inclusive, as far
as we can tell from the C99 standard) each time it's called. The constant RAND_MAX
must be at least 32,767; it can be larger.
srand() seeds the random number generator with seed. If srand() is never called
by the application, rand() behaves as if the initial seed were 1.
The following program, ch12-rand.c, uses rand() to print die faces.
 1  /* ch12-rand.c --- generate die rolls, using rand(). */
 2
 3  #include <stdio.h>
 4  #include <stdlib.h>
 5
 6  char *die_faces[] = {   /* ASCII graphics rule! */
 7      "       ",
 8      "   *   ",      /* 1 */
 9      "       ",
10
11      "       ",
12      " *   * ",      /* 2 */
13      "       ",
14
15      "       ",
16      " * * * ",      /* 3 */
17      "       ",
18
19      " *   * ",
20      "       ",      /* 4 */
21      " *   * ",
22
23      " *   * ",
24      "   *   ",      /* 5 */
25      " *   * ",
26
27      " * * * ",
28      "       ",      /* 6 */
29      " * * * ",
30  };
31
32  /* main --- print N different die faces */
33
34  int main(int argc, char **argv)
35  {
36      int nfaces;
37      int i, j, k;
38
39      if (argc != 2) {
40          fprintf(stderr, "usage: %s number-die-faces\n", argv[0]);
41          exit(1);
42      }
43
44      nfaces = atoi(argv[1]);
45
46      if (nfaces <= 0) {
47          fprintf(stderr, "usage: %s number-die-faces\n", argv[0]);
48          fprintf(stderr, "\tUse a positive number!\n");
49          exit(1);
50      }
51
52      for (i = 1; i <= nfaces; i++) {
53          j = rand() % 6;     /* force to range 0 <= j <= 5 */
54          printf("+-------+\n");
55          for (k = 0; k < 3; k++)
56              printf("|%s|\n", die_faces[(j * 3) + k]);
57          printf("+-------+\n\n");
58      }
59
60      return 0;
61  }
This program uses simple ASCII graphics to print out the semblance of a die face.
You call it with the number of die faces to print. This is computed on line 44 with
atoi(). (In general, atoi() should be avoided for production code since it does no
error or overflow checking nor does it do any input validation.)
The key line is line 53, which converts the rand() return value into a number
between zero and five, using the remainder operator, %. The value 'j * 3' acts as a starting
index into the die_faces array for the three strings that make up each die's face. Lines
54 and 57 print out surrounding top and bottom lines, and the loop on lines 55 and
56 prints the face itself. When run, it produces output like the following:
$ ch12-rand 2                                     Print two dice
+-------+
|       |
| *   * |
|       |
+-------+

+-------+
| *   * |
|       |
| *   * |
+-------+
The rand() interface dates back to V7 and the PDP-11. In particular, on many
systems the result is only a 16-bit number, which severely limits the range of numbers
that can be returned. Furthermore, the algorithm it used is considered "weak" by
modern standards. (The GLIBC version of rand() doesn't have these problems, but
portable code needs to be written with the awareness that rand() isn't the best API
to use.)
ch12-rand.c uses a simple technique to obtain a value within a certain range: the
% operator. This technique uses the low bits of the returned value (just as in decimal
division, where the remainder of dividing by 10 or 100 uses the lowest one or two
decimal digits). It turns out that the historical rand() generator did a better job of
producing random values in the middle and higher-order bits than in the lower bits.
Thus, if you must use rand(), try to avoid the lower bits. The GNU/Linux rand(3)
manpage cites Numerical Recipes in C,⁹ which recommends this technique:
j = 1+(int) (10.0*rand()/(RAND_MAX+1.0));    /* for a number between 1 and 10 */
12.6.2 POSIX Functions: random() and srandom()

4.3 BSD introduced random() and its partner functions. These functions use a
much better random number generator, which returns a 31-bit value. They are now
an XSI extension standardized by POSIX:
#include <stdlib.h>                               XSI
long random(void);
void srandom(unsigned int seed);
char *initstate(unsigned int seed, char *state, size_t n);
char *setstate(char *state);
9 Numerical Recipes in C: The Art of Scientific Computing, 2nd edition, by William H. Press, Brian P. Flannery,
Saul A. Teukolsky, and William T. Vetterling. Cambridge University Press, USA, 1993. ISBN: 0-521-43108-5.
The first two functions correspond closely to rand() and srand() and can be used
similarly. However, instead of a single seed value that produces the sequence of
pseudorandom numbers, these functions use a seed value along with a state array: an array of
bytes that holds state information for calculating the pseudorandom numbers. The last
two functions let you manage the state array.
long random(void);
    Returns a number between 0 and 2^31 - 1. (Although the GNU/Linux random(3)
    manpage says between 0 and RAND_MAX, this is only true for GLIBC systems where
    RAND_MAX equals 2^31 - 1. On other systems, RAND_MAX might be smaller. POSIX
    is explicit that the range is 0 to 2^31 - 1.)
void srandom(unsigned int seed);
    Sets the seed. If srandom() is never called, the default seed is 1.
char *initstate(unsigned int seed, char *state, size_t n);
    Initializes the state array with information for use in generating random numbers.
    seed is the seed value to use, as for srandom(), and n is the number of bytes in
    the state array.
    n should be one of the values 8, 32, 64, 128, or 256. Larger values produce better
    sequences of random numbers. Values less than 8 cause random() to use a simple
    random number generator similar to that of rand(). Values larger than 8 that are
    not equal to a value in the list are rounded down to the nearest appropriate value.
char *setstate(char *state);
    Sets the internal state to the state array, which must have been initialized by
    initstate(). This lets you switch back and forth between different states at will,
    providing multiple random number generators.
If initstate() and setstate() are never called, random() uses an internal state
array of size 128.
The state array is opaque; you initialize it with initstate() and pass it to the
random() function with setstate(), but you don't otherwise need to look inside it.
If you use initstate() and setstate(), you don't have to also call srandom(),
since the seed is included in the state information. ch12-random.c uses these routines
instead of rand(). It also uses a common technique, which is to seed the random
number generator with the time of day, added to the PID.
/* ch12-random.c --- generate die rolls, using random(). */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

char *die_faces[] = {   /* ASCII graphics rule! */
    ... as before ...
};

/* main --- print N different die faces */

int main(int argc, char **argv)
{
    int nfaces;
    int i, j, k;
    char state[256];
    time_t now;

    ... check args, compute nfaces, as before ...

    (void) time(&now);  /* seed with time of day and PID */
    (void) initstate((unsigned int) (now + getpid()), state, sizeof state);
    (void) setstate(state);

    for (i = 1; i <= nfaces; i++) {
        j = random() % 6;       /* force to range 0 <= j <= 5 */
        printf("+-------+\n");
        for (k = 0; k < 3; k++)
            printf("|%s|\n", die_faces[(j * 3) + k]);
        printf("+-------+\n\n");
    }

    return 0;
}
Including the PID as part of the seed value guarantees that you'll get different results,
even when two programs are started within the same second.
Because it produces a higher-quality sequence of random numbers, random() is
preferred over rand(), and GNU/Linux and all modern Unix systems support it.
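The remark that setstate() provides "multiple random number generators" can be made concrete with a small sketch. The names state_a, state_b, and streams_are_independent() are ours, chosen for illustration; the sketch assumes that each state array remembers its own position in its sequence when you switch away from it, which is how GLIBC behaves.

```c
#define _XOPEN_SOURCE 600    /* random(), initstate(), setstate() are XSI */
#include <stdlib.h>

/* Two independent state arrays; the names are illustrative only. */
static char state_a[128];
static char state_b[128];

/*
 * streams_are_independent --- draw two values from generator A while
 * also using generator B in between, then replay A from the same seed.
 * Returns 1 if A produced the same two values both times.
 */
int streams_are_independent(void)
{
    long a1, a2, a1_again, a2_again;

    (void) initstate(1, state_a, sizeof state_a);
    (void) initstate(2, state_b, sizeof state_b);    /* state_b is now current */

    (void) setstate(state_a);
    a1 = random();

    (void) setstate(state_b);    /* use B between A's two values */
    (void) random();
    (void) random();

    (void) setstate(state_a);    /* back to A, where it left off */
    a2 = random();

    (void) initstate(1, state_a, sizeof state_a);    /* replay A alone */
    a1_again = random();
    a2_again = random();

    return a1 == a1_again && a2 == a2_again;
}
```

If the function returns 1, the values drawn from state_b did not disturb the sequence coming from state_a. That property is what lets a simulation keep separate, reproducible streams for separate purposes.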
12.6.3 The /dev/random and /dev/urandom Special Files

Both rand() and random() are pseudorandom number generators. Their output,
for the same seed, is a reproducible sequence of numbers. Some applications, like
cryptography, require their random numbers to be (more) truly random. To this end,
the Linux kernel, as well as various BSD and commercial Unix systems, provide special
device files that provide access to an "entropy pool" of random bits that the kernel
collects from physical devices and other sources. From the random(4) manpage:
/dev/random
[Bytes read from this file are] within the estimated number of bits of noise in the
entropy pool. /dev/random should be suitable for uses that need high quality
randomness such as one-time pad or key generation. When the entropy pool is
empty, reads to /dev/random will block until additional environmental noise
is gathered.
/dev/urandom
[This device will] return as many bytes as are requested. As a result, if there is not
sufficient entropy in the entropy pool, the returned values are theoretically vulnerable
to a cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current non-classified literature, but it is
theoretically possible that such an attack may exist. If this is a concern in your
application, use /dev/random instead.
For most applications, reading from /dev/urandom should be good enough. If
you're going to be writing high-quality cryptographic algorithms, you should read up
on cryptography and randomness first; don't rely on the cursory presentation here!
Here's our die rolling program, once more, using /dev/urandom:
 1  /* ch12-devrandom.c --- generate die rolls, using /dev/urandom. */
 2
 3  #include <stdio.h>
 4  #include <fcntl.h>
 5  #include <stdlib.h>
 6
 7  char *die_faces[] = {    /* ASCII graphics rule! */
    ... as before ...
31  };
32
33  /* myrandom --- return data from /dev/urandom as unsigned long */
34
35  unsigned long myrandom(void)
36  {
37      static int fd = -1;
38      unsigned long data;
39
40      if (fd == -1)
41          fd = open("/dev/urandom", O_RDONLY);
42
43      if (fd == -1 || read(fd, & data, sizeof data) <= 0)
44          return random();    /* fall back */
45
46      return data;
47  }
48
49  /* main --- print N different die faces */
50
51  int main(int argc, char **argv)
52  {
53      int nfaces;
54      int i, j, k;
55
    ... check args, compute nfaces, as before ...
68
69      for (i = 1; i <= nfaces; i++) {
70          j = myrandom() % 6;    /* force to range 0 <= j <= 5 */
71          printf("+-------+\n");
72          for (k = 0; k < 3; k++)
73              printf("|%s|\n", die_faces[(j * 3) + k]);
74          printf("+-------+\n");
75          putchar('\n');
76      }
77
78      return 0;
79  }
Lines 35-47 provide a function-call interface to /dev/urandom, reading an unsigned
long's worth of data each time. The cost is one file descriptor that remains open
throughout the program's life.
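A function such as myrandom() treats any positive return from read() as success, even if fewer bytes arrived than were requested. One possible refinement, sketched here under our own name fill_random() (it is not part of the book's program), loops until the whole buffer is filled and retries when a signal interrupts the read:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * fill_random --- fill buf with len bytes read from /dev/urandom.
 * Returns 0 on success, -1 if the device can't be opened or read.
 * (Hypothetical helper, shown only as a sketch.)
 */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;
    int fd = open("/dev/urandom", O_RDONLY);

    if (fd == -1)
        return -1;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n > 0) {                /* got some bytes, maybe fewer than asked */
            p += n;
            len -= (size_t) n;
        } else if (n == -1 && errno == EINTR) {
            continue;               /* interrupted by a signal; try again */
        } else {
            (void) close(fd);       /* end of file or a real error */
            return -1;
        }
    }

    (void) close(fd);
    return 0;
}
```

This version opens and closes the device on every call, trading the permanently open file descriptor for extra system calls. A program that needs many random values would likely keep the descriptor open, as myrandom() does.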
12.7 Metacharacter Expansions

Three sets of functions, of increasing complexity, provide the ability to match shell
wildcard patterns. Many programs need such library functions. One example is find:
'find . -name '*.c' -print'. Another is the --exclude option in many programs
that accepts a wildcard pattern of files to exclude from some action or other. This section
looks at each set of functions in turn.
12.7.1 Simple Pattern Matching: fnmatch()

We start with the fnmatch() ("filename match") function:
#include <fnmatch.h>                                                  POSIX
int fnmatch(const char *pattern, const char *string, int flags);
This function matches string against pattern, which is a regular shell wildcard
pattern. The flags value (described shortly) modifies the function's behavior. The return
value is 0 if string matches pattern, FNM_NOMATCH if it doesn't, and some other
nonzero value if an error occurred. Unfortunately, POSIX doesn't define any specific errors;
thus, you can only tell that something went wrong, but not what.
The flags variable is either 0 or the bitwise-OR of one or more of the flags listed in
Table 12.1.
TABLE 12.1
Flag values for fnmatch()

Flag name          GLIBC only   Meaning
FNM_CASEFOLD       ✓            Do case-insensitive matching.
FNM_FILE_NAME      ✓            This is a GNU synonym for FNM_PATHNAME.
FNM_LEADING_DIR    ✓            This is a flag for internal use by GLIBC; don't use it in
                                your programs. See fnmatch(3) for the details.
FNM_NOESCAPE                    Backslash is an ordinary character, not an escape
                                character.
FNM_PATHNAME                    Slash in string must match slash in pattern; it cannot
                                be matched by *, ?, or '[...]'.
FNM_PERIOD                      A leading period in string is matched only if pattern
                                also has a leading period. The period must be the first
                                character in string. However, if FNM_PATHNAME is
                                also set, a period following a slash is treated as a leading
                                period.
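To see what the POSIX flags in Table 12.1 do, it can help to wrap fnmatch() in a tiny predicate and try a few patterns. The function name matches() is ours; the sketch uses only the three flags that POSIX defines, so it does not need any GLIBC extensions.

```c
#include <fnmatch.h>

/*
 * matches --- return 1 if string matches pattern under flags, 0 otherwise.
 * (Errors are lumped in with "no match" to keep the sketch short.)
 */
int matches(const char *pattern, const char *string, int flags)
{
    return fnmatch(pattern, string, flags) == 0;
}
```

With no flags, matches("*.c", ".hidden.c", 0) and matches("*.c", "src/main.c", 0) are both true, since * may match a leading period and a slash. Adding FNM_PERIOD makes the first call false, and adding FNM_PATHNAME makes the second one false. Similarly, the pattern "\\*" (a backslash followed by a star) matches only a literal "*" by default. With FNM_NOESCAPE, the backslash must itself appear in the string.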
fnmatch() works with strings from any source; strings to be matched need not be
actual filenames. In practice though, you would use fnmatch() from code that reads
a directory with readdir() (see Section 5.3.1, "Basic Directory Reading," page 133):
struct dirent *dp;
DIR *dir;
char pattern[100];
... fill pattern, open directory, check for errors ...
while ((dp = readdir(dir)) != NULL) {
    if (fnmatch(pattern, dp->d_name, FNM_PERIOD) == 0)
        /* filename matches pattern */
    else
        continue;    /* doesn't match */
}
GNU ls uses fnmatch() to implement its --ignore option. You can provide
multiple patterns to ignore (with multiple options); ls tests each filename against all
the patterns. It does this with the file_interesting() function in ls.c:
2269  /* Return nonzero if the file in 'next' should be listed. */
2270
2271  static int
2272  file_interesting (const struct dirent *next)
2273  {
2274    register struct ignore_pattern *ignore;
2275
2276    for (ignore = ignore_patterns; ignore; ignore = ignore->next)
2277      if (fnmatch (ignore->pattern, next->d_name, FNM_PERIOD) == 0)
2278        return 0;
2279
2280    if (really_all_files
2281        || next->d_name[0] != '.'
2282        || (all_files
2283            && next->d_name[1] != '\0'
2284            && (next->d_name[1] != '.' || next->d_name[2] != '\0')))
2285      return 1;
2286
2287    return 0;
2288  }
The loop on lines 2276-2278 tests the filename against the list of patterns for files
to ignore. If any of the patterns matches, the file is not interesting and
file_interesting() returns false (that is, 0).
The all_files variable corresponds to the -A option, which shows files whose
names begin with a period but that aren't '.' and '..'. The really_all_files variable
corresponds to the -a option, which implies -A, and also shows '.' and '..'. Given this
information, the condition on lines 2280-2284 can be represented with the following
pseudocode:
if (show every file, no matter what its name (-a)
    OR the first character of the name isn't a period
    OR (show dot files (-A)
        AND there are multiple characters in the filename
        AND (the second character isn't a period
             OR the third character doesn't end the name)))
    return TRUE;
NOTE  fnmatch() can be an expensive function if it's used in a locale that
uses a multibyte character set. We discuss multibyte character sets in
Section 13.4, "Can You Spell That for Me, Please?", page 521.
12.7.2 Filename Expansion: glob() and globfree()

The glob() and globfree() functions are more elaborate than fnmatch():
#include <glob.h>                                                     POSIX
int glob(const char *pattern, int flags,
         int (*errfunc)(const char *epath, int eerrno),
         glob_t *pglob);
void globfree(glob_t *pglob);
The glob() function does directory scanning and wildcard matching, returning a
list of all pathnames that match the pattern. Wildcards can be included at multiple
points in the pathname, not just for the last component (for example, '/usr/*/*.so').
const char *pattern
The pattern to expand.
int flags
Flags that control glob()'s behavior, described shortly.
int (*errfunc)(const char *epath, int eerrno)
A pointer to a function to use for reporting errors. This value may be NULL. If it's
not, and if (*errfunc)() returns nonzero or if GLOB_ERR is set in flags, then
glob() stops processing.
The arguments to (*errfunc)() are the pathname that caused a problem and
the value of errno set by opendir(), readdir(), or stat().
glob_t *pglob
A pointer to a glob_t structure used to hold the results.
The glob_t structure holds the list of pathnames that glob() produces:
typedef struct {                                                      POSIX
    size_t gl_pathc;      Count of paths matched so far
    char **gl_pathv;      List of matched pathnames
    size_t gl_offs;       Slots to reserve in gl_pathv
} glob_t;
size_t gl_pathc
The number of paths that were matched.
char **gl_pathv
An array of matched pathnames. gl_pathv[gl_pathc] is always NULL. (When slots
are reserved with gl_offs, the terminating NULL is at gl_pathv[gl_offs + gl_pathc].)
size_t gl_offs
"Reserved slots" in gl_pathv. The idea is to reserve slots at the front of gl_pathv
for the application to fill in later, such as with a command name and options. The
list can then be passed directly to execv() or execvp() (see Section 9.1.4,
"Starting New Programs: The exec() Family," page 293). Reserved slots are set
to NULL. For all this to work, GLOB_DOOFFS must be set in flags.
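The reserved-slot mechanism is easier to follow with a small sketch. The function name build_argv() and the choice of 'ls -l' as the command are ours, purely for illustration. The sketch reserves two slots, lets glob() fill in the rest, and then stores the command name and option in the reserved slots:

```c
#include <glob.h>
#include <stddef.h>

/*
 * build_argv --- expand pattern into results, leaving two slots at the
 * front of gl_pathv for a command name and one option.
 * Returns glob()'s return value.
 */
int build_argv(const char *pattern, glob_t *results)
{
    int ret;

    results->gl_offs = 2;    /* must be set before calling glob() */
    ret = glob(pattern, GLOB_DOOFFS, NULL, results);
    if (ret != 0)
        return ret;

    results->gl_pathv[0] = "ls";    /* fill in the reserved slots */
    results->gl_pathv[1] = "-l";
    return 0;
}
```

After a successful call, results->gl_pathv has the shape of an argv array: the two filled-in slots, then the gl_pathc matched names, then a terminating NULL. It could be passed to execvp() as is. The reserved slots point at string literals here, so there is nothing of ours to free; globfree() releases what glob() allocated.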
Table 12.2 lists the standard flags for glob().
TABLE 12.2
Flags for glob()

Flag name        Meaning
GLOB_APPEND      Append current call's results to those of a previous call.
GLOB_DOOFFS      Reserve gl_offs spots at the front of gl_pathv.
GLOB_MARK        Append a / character to the end of each pathname that is a directory.
GLOB_NOCHECK     If the pattern doesn't match any filename, return it unchanged.
GLOB_NOESCAPE    Treat backslash as a literal character. This makes it impossible to escape
                 wildcard metacharacters.
GLOB_NOSORT      Don't sort the results; the default is to sort them.
The GLIBC version of the glob_t structure contains additional members:

typedef struct {                                                      GLIBC
    /* POSIX components: */
    size_t gl_pathc;                                Count of paths matched so far
    char **gl_pathv;                                List of matched pathnames
    size_t gl_offs;                                 Slots to reserve in gl_pathv
    /* GLIBC components: */
    int gl_flags;                                   Copy of flags, additional GLIBC flags
    void (*gl_closedir)(DIR *);                     Private version of closedir()
    struct dirent *(*gl_readdir)(DIR *);            Private version of readdir()
    DIR *(*gl_opendir)(const char *);               Private version of opendir()
    int (*gl_lstat)(const char *, struct stat *);   Private version of lstat()
    int (*gl_stat)(const char *, struct stat *);    Private version of stat()
} glob_t;
The members are as follows:
int gl_flags
Copy of flags. Also includes GLOB_MAGCHAR if pattern included any
metacharacters.
void (*gl_closedir)(DIR *)
Pointer to alternative version of closedir().
struct dirent *(*gl_readdir)(DIR *)
Pointer to alternative version of readdir().
DIR *(*gl_opendir)(const char *)
Pointer to alternative version of opendir().
int (*gl_lstat)(const char *, struct stat *)
Pointer to alternative version of lstat().
int (*gl_stat)(const char *, struct stat *)
Pointer to alternative version of stat().
The pointers to private versions of the standard functions are mainly for use in
implementing GLIBC; it is highly unlikely that you will ever need to use them. Because
GLIBC provides the gl_flags field and additional flag values, the manpage and Info
manual document the rest of the GLIBC glob_t structure. Table 12.3 lists the
additional flags.
The GLOB_ONLYDIR flag functions as a hint to the implementation that the caller is
only interested in directories. Its primary use is by other functions within GLIBC, and
a caller still has to be prepared to handle nondirectory files. You should not use it in
your programs.
TABLE 12.3
Additional GLIBC flags for glob()

Flag name           Meaning
GLOB_ALTDIRFUNC     Use alternative functions for directory access (see text).
GLOB_BRACE          Perform csh- and Bash-style brace expansions.
GLOB_MAGCHAR        Set in gl_flags if metacharacters were found.
GLOB_NOMAGIC        Return the pattern if it doesn't contain metacharacters.
GLOB_ONLYDIR        If possible, only match directories. See text.
GLOB_PERIOD         Allow metacharacters like * and ? to match a leading period.
GLOB_TILDE          Do shell-style tilde expansions.
GLOB_TILDE_CHECK    Like GLOB_TILDE, but if there are problems with the named home
                    directory, return GLOB_NOMATCH instead of placing pattern into
                    the list.

glob() can be called more than once: The first call should not have the GLOB_APPEND
flag set, and all subsequent calls must have it set. You cannot change gl_offs between
calls, and if you modify any values in gl_pathv or gl_pathc, you must restore them
before making a subsequent call to glob().
The ability to call glob() multiple times allows you to build up the results in a single
list. This is quite useful; it approaches the power of the shell's wildcard expansion
facility, but at the C programming level.
glob() returns 0 if there were no problems or one of the values in Table 12.4 if
there were.
TABLE 12.4
glob() return values

Constant        Meaning
GLOB_ABORTED    Scanning stopped early because GLOB_ERR was set or because
                (*errfunc)() returned nonzero.
GLOB_NOMATCH    No filenames matched pattern, and GLOB_NOCHECK was not set in
                the flags.
GLOB_NOSPACE    There was a problem allocating dynamic memory.
globfree() releases all the memory that glob() dynamically allocated. The following
program, ch12-glob.c, demonstrates glob():
 1  /* ch12-glob.c --- demonstrate glob(). */
 2
 3  #include <stdio.h>
 4  #include <string.h>    /* for strerror() */
 5  #include <glob.h>
 6
 7  char *myname;
 8
 9  /* globerr --- print error message for glob() */
10
11  int globerr(const char *path, int eerrno)
12  {
13      fprintf(stderr, "%s: %s: %s\n", myname, path, strerror(eerrno));
14      return 0;    /* let glob() keep going */
15  }
16
17  /* main() --- expand command-line wildcards and print results */
18
19  int main(int argc, char **argv)
20  {
21      int i;
22      int flags = 0;
23      glob_t results;
24      int ret;
25
26      if (argc == 1) {
27          fprintf(stderr, "usage: %s wildcard ...\n", argv[0]);
28          exit(1);
29      }
30
31      myname = argv[0];    /* for globerr() */
32
33      for (i = 1; i < argc; i++) {
34          flags |= (i > 1 ? GLOB_APPEND : 0);
35          ret = glob(argv[i], flags, globerr, & results);
36          if (ret != 0) {
37              fprintf(stderr, "%s: problem with %s (%s), stopping early\n",
38                  myname, argv[i],
39      /* ugly: */ (ret == GLOB_ABORTED ? "filesystem problem" :
40                   ret == GLOB_NOMATCH ? "no match of pattern" :
41                   ret == GLOB_NOSPACE ? "no dynamic memory" :
42                   "unknown problem"));
43              break;
44          }
45      }
46
47      for (i = 0; i < results.gl_pathc; i++)
48          printf("%s\n", results.gl_pathv[i]);
49
50      globfree(& results);
51      return 0;
52  }
Line 7 defines myname, which points to the program's name; this variable is for error
messages from globerr(), defined on lines 11-15.
Lines 33-45 are the heart of the program. They loop over the patterns given on the
command line, calling glob() on each one to append its results to the list. Most of
the loop is error handling (lines 36-44). Lines 47-48 print the resulting list, and lines
50-51 clean up and return.
Lines 39-41 aren't pretty; a separate function that converts the integer constant to
a string should be used; we've done it this way primarily to save space. Code like this
is tolerable in a small program, but a larger program should use a function.
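The separate function suggested above might look like the following sketch. The name glob_errstr() is ours; the messages are the same ones that ch12-glob.c prints, plus one for a return value of 0.

```c
#include <glob.h>

/* glob_errstr --- convert a glob() return value to a printable message */
const char *glob_errstr(int ret)
{
    switch (ret) {
    case 0:
        return "no error";
    case GLOB_ABORTED:
        return "filesystem problem";
    case GLOB_NOMATCH:
        return "no match of pattern";
    case GLOB_NOSPACE:
        return "no dynamic memory";
    default:
        return "unknown problem";
    }
}
```

With such a function in place, the fprintf() call in the error-handling code could pass glob_errstr(ret) as its last argument instead of the nested conditional expressions.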
If you think about all the work going on under the hood (opening and reading
directories, matching patterns, dynamic allocation to grow the list, sorting the list), you can
start to appreciate how much glob() does for you! Here are some results:
$ ch12-glob '/usr/lib/x*.so' '../../*.texi'
/usr/lib/xchat-autob5.so
/usr/lib/xchat-autogb.so
../../00-preface.texi
../../01-intro.texi
../../02-cmdline.texi
../../03-memory.texi
Note that we have to quote the arguments to keep the shell from doing the expansion!
Globbing? What's That?

In days of yore, circa V6 Unix, the shell used a separate program to perform wildcard
expansion behind the scenes. This program was named /etc/glob, and according to
the V6 source code,* the name "glob" is short for "global."
The verb "to glob" thus passed into the Unix lexicon, with the meaning "to perform
wildcard expansion." This in turn gives us the function names glob() and globfree().
The usually understated sense of humor that occasionally peeked through from the Unix
manual therefore lives on, formally enshrined in the POSIX standard. (Can you imagine
anyone at IBM, in the 1970s or 1980s, naming a system routine glob()?)
* See /usr/source/s1/glob.c in the V6 distribution.

12.7.3 Shell Word Expansion: wordexp() and wordfree()

Many members of the POSIX committee felt that glob() didn't do enough: They
wanted a library routine capable of doing everything the shell can do: tilde expansion
('echo ~arnold'), shell variable expansion ('echo $HOME'), and command substitution
('echo $(cd ; pwd)'). Many others felt that glob() wasn't the right function for
this purpose. To "satisfy" everyone, POSIX supplies an additional two functions that
do everything:
#include <wordexp.h>                                                  POSIX
int wordexp(const char *words, wordexp_t *pwordexp, int flags);
void wordfree(wordexp_t *wordexp);
These functions work similarly to glob() and globfree(), but on a wordexp_t
structure:
typedef struct {
    size_t we_wordc;      Count of words matched
    char **we_wordv;      List of expanded words
    size_t we_offs;       Slots to reserve in we_wordv
} wordexp_t;
The members are completely analogous to those of the glob_t described earlier; we
won't repeat the whole description here.
As with glob(), several flags control wordexp()'s behavior. The flags are listed in
Table 12.5.
TABLE 12.5
Flags for wordexp()

Constant        Meaning
WRDE_APPEND     Append current call's results to those of a previous call.
WRDE_DOOFFS     Reserve we_offs spots at the front of we_wordv.
WRDE_NOCMD      Don't allow command substitution.
WRDE_REUSE      Reuse the storage already pointed to by we_wordv.
WRDE_SHOWERR    Don't be silent about errors during expansion.
WRDE_UNDEF      Cause undefined shell variables to produce an error.
The return value is 0 if everything went well or one of the values in Table 12.6 if not.
TABLE 12.6
wordexp() error return values

Constant        Meaning
WRDE_BADCHAR    A metacharacter (one of newline, |, &, ;, <, >, (, ), {, or }) appeared
                in an invalid location.
WRDE_BADVAL     A variable was undefined and WRDE_UNDEF is set.
WRDE_CMDSUB     Command substitution was attempted and WRDE_NOCMD was set.
WRDE_NOSPACE    There was a problem allocating dynamic memory.
WRDE_SYNTAX     There was a shell syntax error.
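A short sketch shows how the flags and return values fit together. The function name expand_words() is ours, and it is not the ch12-wordexp program. It always passes WRDE_NOCMD and WRDE_UNDEF, on the assumption that the words come from an untrusted source and should not be allowed to run commands:

```c
#include <stdio.h>
#include <wordexp.h>

/*
 * expand_words --- expand words as the shell would, but refuse command
 * substitution and undefined variables. Prints one expanded word per
 * line and returns wordexp()'s return value.
 */
int expand_words(const char *words)
{
    wordexp_t result;
    size_t i;
    int ret;

    ret = wordexp(words, &result, WRDE_NOCMD | WRDE_UNDEF);
    if (ret != 0)
        return ret;    /* one of the values in Table 12.6 */

    for (i = 0; i < result.we_wordc; i++)
        printf("%s\n", result.we_wordv[i]);

    wordfree(&result);
    return 0;
}
```

Given 'echo $(pwd)', this function returns WRDE_CMDSUB without running anything. Given a reference to a variable that isn't set, it returns WRDE_BADVAL.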
We leave it to you as an exercise (see later) to modify ch12-glob.c to use wordexp()
and wordfree(). Here's our version in action:
$ ch12-wordexp 'echo $HOME'                       Shell variable expansion
echo
/home/arnold
$ ch12-wordexp 'echo $HOME/*.gz'                  Variables and wildcards
echo
/home/arnold/48000.wav.gz
/home/arnold/ipmasq-HOWTO.tar.gz
/home/arnold/rc.firewall-examples.tar.gz
$ ch12-wordexp 'echo ~arnold'                     Tilde expansion
echo
/home/arnold
$ ch12-wordexp 'echo ~arnold/.p*'                 Tilde and wildcards
echo
/home/arnold/.postitnotes
/home/arnold/.procmailrc
/home/arnold/.profile
$ ch12-wordexp "echo '~arnold/.p*'"               Quoting works
echo
~arnold/.p*
12.8 Regular Expressions

A regular expression is a way to describe patterns of text to be matched. If you've used
GNU/Linux or Unix for any time at all, you're undoubtedly familiar with regular
expressions: They are a fundamental part of the Unix programmer's toolbox. They are
integral to such everyday programs as grep, egrep, sed, awk, Perl, and the ed, vi,
vim, and Emacs editors. If you're not at all familiar with regular expressions, we suggest
you take a detour to some of the books or URLs named in Section 12.9, "Suggested
Reading," page 480.
POSIX defines two flavors of regular expressions: basic and extended. Programs such
as grep, sed, and the ed line editor use basic regular expressions. Programs such as
egrep and awk use extended regular expressions. The following functions give you the
ability to use either flavor in your programs:
#include <sys/types.h>                                                POSIX
#include <regex.h>
int regcomp(regex_t *preg, const char *regex, int cflags);
int regexec(const regex_t *preg, const char *string, size_t nmatch,
            regmatch_t pmatch[], int eflags);
size_t regerror(int errcode, const regex_t *preg,
                char *errbuf, size_t errbuf_size);
void regfree(regex_t *preg);
To do regular expression matching, you must first compile a string version of the
regular expression. Compilation converts the regular expression into an internal form.
The compiled form is then executed against a string to see whether it matches the
original regular expression. The functions are as follows:
int regcomp(regex_t *preg, const char *regex, int cflags)
Compiles the regular expression regex into the internal form, storing it in the
regex_t structure pointed to by preg. cflags controls how the compilation is
done; its value is 0 or the bitwise OR of one or more of the flags in Table 12.7.
int regexec(const regex_t *preg, const char *string, size_t nmatch,
            regmatch_t pmatch[], int eflags)
Executes the compiled regular expression in *preg against the string string.
eflags controls how the execution is done; its value is 0 or the bitwise OR of
one or more of the flags in Table 12.8. We discuss the other arguments shortly.
size_t regerror(int errcode, const regex_t *preg, char *errbuf,
                size_t errbuf_size)
Converts an error returned by either regcomp() or regexec() into a string that
can be printed for a human to read.
void regfree(regex_t *preg)
Frees dynamic memory used by the compiled regular expression in *preg.
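The compile-then-report pattern can be sketched in a few lines. The function name try_compile() is ours. The sketch relies on a POSIX-specified property of regerror(): when errbuf_size is 0, it ignores errbuf and returns the buffer size needed for the message, so the message can be allocated to fit instead of going into a fixed-size buffer.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * try_compile --- compile pat into *preg; on failure, print the message
 * that regerror() supplies. Returns regcomp()'s return value. On
 * success, the caller must eventually call regfree() on *preg.
 */
int try_compile(regex_t *preg, const char *pat, int cflags)
{
    int ret = regcomp(preg, pat, cflags);

    if (ret != 0) {
        /* A zero-length buffer asks only for the size needed. */
        size_t need = regerror(ret, preg, NULL, 0);
        char *msg = malloc(need);

        if (msg != NULL) {
            (void) regerror(ret, preg, msg, need);
            fprintf(stderr, "pattern '%s': %s\n", pat, msg);
            free(msg);
        }
    }

    return ret;
}
```

A caller would follow a successful try_compile() with one or more calls to regexec() and a final regfree(). The exact wording of the error messages is up to the implementation.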
The <regex.h> header file defines a number of flags. Some are for use with
regcomp(); others are for use with regexec(). However, they all start with the prefix
'REG_'. Table 12.7 lists the flags for regular expression compilation with regcomp().
TABLE 12.7
Flags for regcomp()

Constant        Meaning
REG_EXTENDED    Use extended regular expressions. The default is basic regular expressions.
REG_ICASE       Matches with regexec() ignore case.
REG_NEWLINE     Operators that can match any character don't match newline.
REG_NOSUB       Subpattern start and end information isn't needed (see text).
The flags for regular expression matching with regexec() are given in Table 12.8.
TABLE 12.8
Flags for regexec()

Constant        Meaning
REG_NOTBOL      Don't allow the ^ (beginning of line) operator to match.
REG_NOTEOL      Don't allow the $ (end of line) operator to match.
The REG_NEWLINE, REG_NOTBOL, and REG_NOTEOL flags interact with each other.
It's a little complicated, so we take it one step at a time.
• When REG_NEWLINE is not included in cflags, the newline character acts like
  an ordinary character. The '.' (match any character) metacharacter can match it,
  as can complemented character lists ('[^...]'). Also, $ does not match immediately
  before an embedded newline, and ^ does not match immediately after one.
• When REG_NOTBOL is set in eflags, the ^ operator does not match the beginning
  of the string. This is useful when the string parameter is the address of a character
  in the middle of the text being matched.
• Similarly, when REG_NOTEOL is set in eflags, the $ operator does not match the
  end of the string.
• When REG_NEWLINE is included in cflags, then:
    • Newline is not matched by '.' or by a complemented character list.
    • The ^ operator always matches immediately following an embedded newline,
      no matter the setting of REG_NOTBOL.
    • The $ operator always matches immediately before an embedded newline, no
      matter the setting of REG_NOTEOL.
When you're doing line-at-a-time I/O, such as by grep, you can leave REG_NEWLINE
out of cflags. If you have multiple lines in a buffer and want to treat each one as a
separate string, with ^ and $ matching within them, then you should include
REG_NEWLINE.
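The effect of REG_NEWLINE on the anchors and on '.' can be checked with a small helper. The name re_matches() is ours; it compiles a basic regular expression with whatever cflags the caller supplies (plus REG_NOSUB, since only a yes-or-no answer is wanted) and runs it once.

```c
#include <regex.h>
#include <stddef.h>

/*
 * re_matches --- return 1 if pat matches somewhere in text when compiled
 * with cflags, 0 if it doesn't, and -1 if pat fails to compile.
 */
int re_matches(const char *pat, const char *text, int cflags)
{
    regex_t re;
    int ret;

    if (regcomp(&re, pat, cflags | REG_NOSUB) != 0)
        return -1;

    ret = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return ret == 0;
}
```

For the two-line buffer "one\ntwo\n", re_matches("^two", buf, 0) is 0, because ^ matches only at the start of the whole string, whereas re_matches("^two", buf, REG_NEWLINE) is 1. The same holds for "one$". Conversely, "one.two" matches "one\ntwo" only when REG_NEWLINE is left out, since that is when '.' may match the newline.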
The regex_t structure is mostly opaque. It has one member that user-level code
can examine; the rest is for internal use by the regular expression routines:
typedef struct {
    ... internal stuff here ...
    size_t re_nsub;
    ... internal stuff here ...
} regex_t;
The regmatch_t structure has at least two members for use by user-level code:
typedef struct {
    ... possible internal stuff here ...
    regoff_t rm_so;      Byte offset to start of substring
    regoff_t rm_eo;      Byte offset to first character after substring end
    ... possible internal stuff here ...
} regmatch_t;
Both the re_nsub field and the regmatch_t structure are for subexpression matching.
Consider an extended regular expression such as:
[[:space:]]+([[:digit:]]+)[[:space:]]+([[:alpha:]])+
The two parenthesized subexpressions can each match one or more characters.
Furthermore, the text matching each subexpression can start and end at arbitrary positions
within the string.
regcomp() sets the re_nsub field to the number of parenthesized subexpressions
in the regular expression. regexec() fills in the pmatch array of regmatch_t structures
with the start and ending byte offsets of the text that matched the corresponding
subexpressions. Together, these data allow you to do text substitution: deletion of
matched text or replacement of matched text with other text, just as in your favorite
text editor.
pmatch[0] describes the portion of string that matched the entire regular
expression. pmatch[1] through pmatch[preg->re_nsub] describe the portions that matched
each parenthesized subexpression. (Thus, subexpressions are numbered from 1.) Unused
elements in the pmatch array have their rm_so and rm_eo elements set to -1.
regexec() fills in no more than nmatch elements of pmatch (indices 0 through
nmatch - 1); you should thus ensure that the array has at least preg->re_nsub + 1
elements.
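Here is a sketch of subexpression matching in action. The pattern is a simplified variation of our own on the one shown earlier (no leading whitespace, and the + placed inside the second pair of parentheses), and the names find_count_and_name() and print_match() are ours as well.

```c
#include <regex.h>
#include <stdio.h>

#define NSUBS 2    /* parenthesized subexpressions in the pattern below */

/*
 * find_count_and_name --- look for "<digits> <letters>" in text. On a
 * match, m[0] describes the whole match and m[1] and m[2] the two
 * subexpressions. Returns regexec()'s value, or regcomp()'s if the
 * pattern fails to compile.
 */
int find_count_and_name(const char *text, regmatch_t m[NSUBS + 1])
{
    regex_t re;
    int ret;

    ret = regcomp(&re, "([[:digit:]]+)[[:space:]]+([[:alpha:]]+)",
                  REG_EXTENDED);
    if (ret != 0)
        return ret;

    ret = regexec(&re, text, NSUBS + 1, m, 0);    /* room for m[0..2] */
    regfree(&re);
    return ret;
}

/* print_match --- print the part of text that one regmatch_t describes */
void print_match(const char *text, const regmatch_t *m)
{
    if (m->rm_so != -1)    /* -1 marks an unused element */
        printf("%.*s\n", (int) (m->rm_eo - m->rm_so), text + m->rm_so);
}
```

For the string "  42 apples", the whole match runs from offset 2 to offset 11, the digits from 2 to 4, and the letters from 5 to 11. Each rm_eo is the offset of the first character after the matched text, so rm_eo - rm_so is the length of the match.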
The REG_NOSUB flag for regcomp() indicates that starting and ending
information isn't necessary. You should use this flag when you don't need the information;
it can improve the performance of regexec(), sometimes significantly.
In other words, if all you need to know is "did it match?" then include REG_NOSUB.
However, if you also need to know "where is the matching text?" then omit it.
Finally, both regcomp() and regexec() return 0 if they were successful or a specific
error code if not. The error codes are listed in Table 12.9.
TABLE 12.9
Error codes for regcomp() and regexec()

Constant        Meaning
REG_BADBR       The contents of '\{...\}' are invalid.
REG_BADPAT      The regular expression is invalid.
REG_BADRPT      A ?, +, or * is not preceded by valid regular expression.
REG_EBRACE      Braces ('\{...\}') are not balanced correctly.
REG_EBRACK      Square brackets ('[...]') are not balanced correctly.
REG_ECOLLATE    The pattern used an invalid collating element.
REG_ECTYPE      The pattern used an invalid character class.
REG_EESCAPE     The pattern has a trailing \ character.
REG_EPAREN      Grouping parentheses ('(...)' or '\(...\)') are not balanced correctly.
REG_ERANGE      The endpoint in a range expression is invalid.
REG_ESPACE      The function ran out of memory.
REG_ESUBREG     The digit in '\digit' is invalid.
REG_NOMATCH     regexec() did not match the string to the pattern.
To demonstrate the regular expression routines, ch12-grep.c provides a basic
reimplementation of the standard grep program, which searches files for a pattern.
Our version uses basic regular expressions by default. It accepts a -E option to use
extended regular expressions instead and a -i option to ignore case. Like the real grep,
if no files are provided on the command line, our grep reads standard input, and as in
the real grep, a filename of '-' can be used to mean standard input. (This technique is
useful for searching standard input along with other files.) Here's the program:
 1  /* ch12-grep.c --- Simple version of grep using POSIX R.E. functions. */
 2
 3  #define _GNU_SOURCE 1    /* for getline() */
 4  #include <stdio.h>
 5  #include <errno.h>
 6  #include <regex.h>
 7  #include <unistd.h>
 8  #include <sys/types.h>
 9
10  char *myname;           /* for error messages */
11  int ignore_case = 0;    /* -i option: ignore case */
12  int extended = 0;       /* -E option: use extended RE's */
13  int errors = 0;         /* number of errors */
14
15  regex_t pattern;        /* pattern to match */
16
17  void compile_pattern(const char *pat);
18  void process(const char *name, FILE *fp);
19  void usage(void);
Lines 10-15 declare the program's global variables. The first set (lines 10-13) are
for options and error messages. Line 15 declares pattern, which holds the compiled
pattern. Lines 17-19 declare the program's other functions.
21  /* main --- process options, open files */
22
23  int main(int argc, char **argv)
24  {
25      int c;
26      int i;
27      FILE *fp;
28
29      myname = argv[0];
30      while ((c = getopt(argc, argv, ":iE")) != -1) {
31          switch (c) {
32          case 'i':
33              ignore_case = 1;
34              break;
35          case 'E':
36              extended = 1;
37              break;
38          case '?':
39              usage();
40              break;
41          }
42      }
43
44      if (optind == argc)    /* sanity check */
45          usage();
46
47      compile_pattern(argv[optind]);    /* compile the pattern */
48      if (errors)    /* compile failed */
49          return 1;
50      else
51          optind++;
Line 29 sets myname, and lines 30-45 parse the options. Lines 47-51 compile the
regular expression, placing the results into pattern. compile_pattern() increments
errors if there was a problem. (Coupling the functions by means of a global variable
like this is generally considered bad form. It's OK for a small program such as this one,
but such coupling can become a problem in larger programs.) If there was no problem,
line 51 increments optind so that the remaining arguments are the files to be processed.
53      if (optind == argc)    /* no files, default to stdin */
54          process("standard input", stdin);
55      else {
56          /* loop over files */
57          for (i = optind; i < argc; i++) {
58              if (strcmp(argv[i], "-") == 0)
59                  process("standard input", stdin);
60              else if ((fp = fopen(argv[i], "r")) != NULL) {
61                  process(argv[i], fp);
62                  fclose(fp);
63              } else {
64                  fprintf(stderr, "%s: %s: could not open: %s\n",
65                      argv[0], argv[i], strerror(errno));
66                  errors++;
67              }
68          }
69      }
70
71      regfree(& pattern);
72      return errors != 0;
73  }
Lines 53-69 process the files, searching for lines that match the pattern. Lines 53-54
handle the case in which no files are provided: The program reads standard input.
Otherwise, lines 57-68 loop over the files. Line 58 handles the special casing of '-' to
mean standard input, lines 60-62 handle regular files, and lines 63-67 handle problems.
75  /* compile_pattern --- compile the pattern */
76
77  void compile_pattern(const char *pat)
78  {
79      int flags = REG_NOSUB;    /* don't need where-matched info */
80      int ret;
81  #define MSGBUFSIZE 512    /* arbitrary */
82      char error[MSGBUFSIZE];
83
84      if (ignore_case)
85          flags |= REG_ICASE;
86      if (extended)
87          flags |= REG_EXTENDED;
88
89      ret = regcomp(& pattern, pat, flags);
90      if (ret != 0) {
91          (void) regerror(ret, & pattern, error, sizeof error);
92          fprintf(stderr, "%s: pattern '%s': %s\n", myname, pat, error);
93          errors++;
94      }
95  }
Lines 75-95 define the compile_pattern() function. It first sets flags to
REG_NOSUB since all we need to know is "did a line match?" and not "where in the line
is the matching text?".
Lines 84-87 add additional flags in accordance with the command-line options.
Line 89 compiles the pattern, and lines 90-94 report any problems.
97  /* process --- read lines of text and match against the pattern */
98
99  void process(const char *name, FILE *fp)
100 {
101     char *buf = NULL;
102     size_t size = 0;
103     char error[MSGBUFSIZE];
104     int ret;
105
106     while (getline(& buf, &size, fp) != -1) {
107         ret = regexec(& pattern, buf, 0, NULL, 0);
108         if (ret != 0) {
109             if (ret != REG_NOMATCH) {
110                 (void) regerror(ret, & pattern, error, sizeof error);
111                 fprintf(stderr, "%s: file %s: %s\n", myname, name, error);
112                 free(buf);
113                 errors++;
114                 return;
115             }
116         } else
117             printf("%s: %s", name, buf);    /* print matching lines */
118     }
119     free(buf);
120 }
Lines 97-120 define process(), which reads the file and does the regular expression
match. The outer loop (lines 106-118) reads input lines. We use getline() (see
Section 3.2.1.9, "GLIBC Only: Reading Entire Lines: getline() and getdelim(),"
page 73) to avoid line-length problems. Line 107 calls regexec(). A nonzero return
indicates either failure to match or some other error. Thus, lines 109-115 check for
REG_NOMATCH and print an error only if some other problem occurred; failure to match
isn't an error.
If the return value was 0, the line matched the pattern and thus line 117 prints the
filename and matching line.
122 /* usage --- print usage message and exit */
123
124 void usage(void)
125 {
126     fprintf(stderr, "usage: %s [-i] [-E] pattern [ files ... ]\n", myname);
127     exit(1);
128 }
The usage() function prints a usage message and exits. It's called when invalid
options are provided or if no pattern is provided (lines 38-40 and 44-45).
That's it! A modest, yet useful version of grep, in under 130 lines of code.

1. Programming Pearls, 2nd edition, by Jon Louis Bentley. Addison-Wesley,
   Reading, Massachusetts, USA, 2000. ISBN: 0-201-65788-0. See also this book's
   web site.10
   Program design with assertions is one of the fundamental themes in the book.
2. Building Secure Software: How to Avoid Security Problems the Right Way, by
   John Viega and Gary McGraw. Addison-Wesley, Reading, Massachusetts,
   USA, 2001. ISBN: 0-201-72152-X.
   Race conditions are only one of many issues to worry about when you are
   writing secure software. Random numbers are another. This book covers both,
   among other things. (We mentioned it in the previous chapter.)
3. The Art of Computer Programming: Volume 2: Seminumerical Algorithms, 3rd
   edition, by Donald E. Knuth. Addison-Wesley, Reading, Massachusetts, USA,
   1998. ISBN: 0-201-89684-2. See also the book's web site.11
   This is the classic reference on random number generation.
4. Random Number Generation and Monte Carlo Methods, 2nd edition, by James
   E. Gentle. Springer-Verlag, Berlin, Germany, 2003. ISBN: 0-387-00178-6.
   This book has wide coverage of the methods for generating and testing
   pseudorandom numbers. While it still requires background in mathematics and
   statistics, the level is not as high as that in Knuth's book. (Thanks to Nelson
   H.F. Beebe for the pointer to this reference.)
5. sed & awk, 2nd edition, by Dale Dougherty and Arnold Robbins. O'Reilly and
   Associates, Sebastopol, California, USA, 1997. ISBN: 1-56592-225-5.
   This book gently introduces regular expressions and text processing, starting
   with grep, and moving on to the more powerful sed and awk tools.
6. Mastering Regular Expressions, 2nd edition, by Jeffrey E.F. Friedl. O'Reilly and
   Associates, Sebastopol, California, USA, 2002. ISBN: 0-59600-289-0.
   Regular expressions are an important part of Unix. For learning how to chop,
   slice, and dice text using regular expressions, we recommend this book.
7. The online manual for GNU grep also explains regular expressions. On a
   GNU/Linux system, you can use 'info grep' to look at the local copy. Or
   use a web browser to read the GNU Project's online documentation for grep.12

10 http://www.cs.bell-labs.com/cm/cs/pearls/
11 http://www-cs-faculty.stanford.edu/~knuth/taocp.html
12.10 Summary
• Assertions provide a way to make statements about the expected state of a program.
They are a useful design and debugging tool and should generally be left in
production code. Be careful, however, not to confuse assertions with runtime checks
for possible failure conditions.
• The memXXX() functions provide analogues to the better-known strXXX()
functions. Their greatest value is that they can work on binary data; zero bytes are
no different from other bytes. Of particular note is memcpy() vs. memmove() and
the handling of overlapping copies.
• Temporary files are useful in many applications. The tmpfile() and mkstemp()
APIs are the preferred way to create temporary files while avoiding race conditions
and their security implications. Many programs use the TMPDIR environment
variable to specify the location for their temporary files, with a meaningful default
(usually /tmp) if that variable isn't defined. This is a good convention, one you
should adopt for your own programs.
• The abort() function sends a SIGABRT to the calling process. The effect is to
kill the process and create a core dump, presumably for debugging.
• setjmp() and longjmp() provide a nonlocal goto. This is a powerful facility
that must be used with care. sigsetjmp() and siglongjmp() save and restore
the process signal mask when a program does a nonlocal jump. The problems with
nonlocal gotos sometimes outweigh their benefits; thus, use these routines only if
there isn't a better way to structure your application.
• Random numbers are useful in a variety of applications. Most software uses
pseudorandom numbers: sequences of numbers that appear random but that can
be reproduced by starting with the same seed each time. rand() and srand()
are the original API, standardized by the C language. On many systems, rand()
uses a subpar algorithm. random() and srandom() use a better algorithm, are
included in the POSIX standard, and are preferred over rand() and srand().
Use the /dev/random and /dev/urandom special files (a) if they're available and
(b) if you need high-quality random numbers.

12 http://www.gnu.org/software/grep/doc/grep.html

• Three APIs provide increasingly powerful facilities for metacharacter expansion
(wildcarding).
  • fnmatch() is the simplest, returning true/false as a given string does or doesn't
  match a shell wildcard pattern.
  • glob() works its way through the filesystem, returning a list of pathnames that
  match a given wildcard. When the standard glob() functionality is all that's
  needed, it should be used. While the GLIBC version of glob() has some
  extensions, portable programs needing the extra power should use wordexp()
  instead. (Programs that will only run on GNU/Linux systems should feel free
  to use the full power of the GLIBC glob().)
  • wordexp() not only does what glob() does, but it also does full shell word
  expansion, including tilde expansion, shell variable expansion, and command
  substitution.
• The regcomp() and regexec() functions give you access to POSIX basic and
extended regular expressions. By using one or the other, you can make your
program behave identically to the standard utilities, making it much easier for
programmers familiar with GNU/Linux and Unix to use your program.
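
As a reminder of how little code the simplest of the three wildcard APIs needs, here is a minimal sketch. The helper name matches_wildcard() is ours and purely illustrative; the only library call is POSIX fnmatch() from <fnmatch.h>, called with no flags.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* matches_wildcard --- hypothetical helper: does name match the shell pattern? */

bool matches_wildcard(const char *pattern, const char *name)
{
    /* fnmatch() returns 0 on a match and a nonzero value
       (FNM_NOMATCH, or an error code) otherwise */
    return fnmatch(pattern, name, 0) == 0;
}
```

A caller might use it to filter names already in hand, for example matches_wildcard("*.c", "ch12-grep.c"). Unlike glob(), nothing here looks at the filesystem.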
Exercises
1. Use read() and memcmp() to write a simple version of the cmp program that
compares two files. Your version need not support any options.
2. Use the <stdio.h> getc() macro and direct comparison of each read character
to write another version of cmp that compares two files. Compare the
performance of this version against the one you wrote for the previous exercise.
3. (Medium.) Consider the <stdio.h> fgets() and GLIBC getline() functions.
Would memccpy() be useful for implementing them? Sketch a possible
implementation of fgets() using it.
4. (Hard.) Find the source to the GLIBC version of memcmp(). This should be
on one of the source code CD-ROMs in your GNU/Linux distribution, or
you can find it by a Web search. Examine the code, and explain it.
5. Test your memory. How does tmpfile() arrange for the file to be deleted
when the file pointer is closed?
6. Using mkstemp() and fdopen() and any other functions or system calls you
think necessary, write your own version of tmpfile(). Test it too.
7. Describe the advantages and disadvantages of using unlink() on the filename
created by mkstemp() immediately after mkstemp() returns.
8. Write your own version of mkstemp(), using mktemp() and open(). How
can you make the same guarantees about uniqueness that mkstemp() does?
9. Programs using mkstemp() should arrange to clean up the file when they exit.
(Assume that the file is not immediately unlinked after opening, for whatever
reason.) This includes the case in which a terminating signal could arrive. So,
as part of a signal catcher, the file should be removed. How do you do this?
10. (Hard.) Even with the first-cut signal-handling cleanup, there's still a race
condition. There's a small window between the time mkstemp() creates the
temporary file and the time its name is returned and recorded (for use by the
signal handling function) in a variable. If an uncaught signal is delivered in
that window, the program dies and leaves behind the temporary file. How do
you close that window? (Thanks to Jim Meyering.)
11. Try compiling and running ch12-setjmp.c on as many different systems
with as many different compilers as you have access to. Try compiling with
and without different levels of optimizations. What variations in behavior, if
any, did you see?
12. Look at the file /usr/src/libc/gen/sleep.c in the V7 Unix source
distribution. It implements the sleep() function described in Section 10.8.1,
"Alarm Clocks: sleep(), alarm(), and SIGALRM," page 382. Print it, and
annotate it in the style of our examples to explain how it works.
13. On a GNU/Linux or System V Unix system, look at the lrand48(3) manpage.
Does this interface look easier or harder to use than random()?
14. Take ch08-nftw.c from Section 8.4.3, "Walking a Hierarchy: nftw(),"
page 260, and add a --exclude=pat option. Files matching the pattern should
not be printed.
15. (Hard.) Why would GLIBC need pointers to private versions of the standard
directory and stat() calls? Can't it just call them directly?
16. Modify ch12-glob.c to use the wordexp() API. Experiment with it by doing
some of the extra things it provides. Be sure to quote your command-line
arguments so that wordexp() is really doing all the work!
17. The standard grep prints the filename only when more than one file is provided
on the command line. Make ch12-grep.c perform the same way.
18. Look at the grep(1) manpage. Add the standard -e, -s, and -v options to
ch12-grep.c.
19. Write a simple substitution program:
    subst [-g] pattern replacement [ files ... ]
It should read lines of text from the named files or from standard input if
no files are given. It should search each line for a match of pattern. If it finds
one, it should replace it with replacement.
With -g, it should replace not just the first match but all matches on the line.
In this chapter
• 13.1 Introduction page 486
• 13.2 Locales and the C Library page 487
• 13.3 Dynamic Translation of Program Messages page 507
• 13.4 Can You Spell That for Me, Please? page 521
• 13.5 Suggested Reading page 526
• Exercises page 527
Early computing systems generally used English for their output (prompts, error
messages) and input (responses to queries, such as "yes" and "no"). This was
true of Unix systems, even into the mid-1980s. In the late 1980s, beginning with
the first ISO standard for C and continuing with the POSIX standards of the 1990s
and the current POSIX standard, facilities were developed to make it possible for
programs to work in multiple languages, without a requirement to maintain multiple
versions of the same program. This chapter describes how modern programs should
deal with multiple-language issues.
13.1 Introduction
The central concept is the locale, the place in which a program is run. Locales
encapsulate information about the following: the local character set; how to display date and
time information; how to format and display monetary amounts; and how to format
and display numeric values (with or without a thousands separator, what character to
use as the decimal point, and so on).
Internationalization is the process of writing (or modifying) a program so that it can
function in multiple locales. Localization is the process of tailoring an internationalized
program for a specific locale. These terms are often abbreviated i18n and l10n,
respectively. (The numeric values indicate how many characters appear in the middle of the
word, and these abbreviations bear a minor visual resemblance to the full terms. They're
also considerably easier to type.) Another term that appears frequently is native language
support, abbreviated NLS; NLS refers to the programmatic support for doing i18n
and l10n.
Additionally, some people use the term globalization (abbreviated g11n) to mean the
process of preparing all possible localizations for an internationalized program. In other
words, making it ready for global use.
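
The counting rule behind these abbreviations is easy to state in code. The following is only a sketch; numeronym() is a name we made up for illustration, and it assumes a plain single-byte word.

```c
#include <stdio.h>
#include <string.h>

/* numeronym --- hypothetical helper: first letter, count of the letters
   in the middle, last letter; the result is placed in buf, which is returned */

char *numeronym(const char *word, char *buf, size_t bufsize)
{
    size_t len = strlen(word);

    if (len < 3)    /* nothing in the middle to count */
        snprintf(buf, bufsize, "%s", word);
    else
        snprintf(buf, bufsize, "%c%zu%c", word[0], len - 2, word[len - 1]);

    return buf;
}
```

Given "internationalization" (20 letters, so 18 in the middle), the function produces "i18n"; given "localization" (12 letters), it produces "l10n".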
NLS facilities exist at two levels. The first level is the C library. It provides information
about the locale; routines to handle much of the low-level detail work for formatting
date/time, numeric and monetary values; and routines for locale-correct regular
expression matching and character classification and comparison. It is the library facilities
that appear in the C and POSIX standards.
At the application level, GNU gettext provides commands and a library for
localizing a program: that is, making all output messages available in one or more natural
languages. GNU gettext is based on a design originally done by Sun Microsystems
for Solaris;1 however it was implemented from scratch and now provides extensions to
the original Solaris gettext. GNU gettext is a de facto standard for program
localization, particularly in the GNU world.
In addition to locales and gettext, Standard C provides facilities for working with
multiple character sets and their encodings: ways to represent large character sets with
fewer bytes. We touch on these issues, briefly, at the end of the chapter.
13.2 Locales and the C Library

You control locale-specific behavior by setting environment variables to describe
which locale(s) to use for particular kinds of information. The number of available locales
offered by any particular operating system ranges from fewer than ten on some
commercial Unix systems to hundreds of locales on GNU/Linux systems. ('locale -a' prints
the full list of available locales.)
Two locales, "C" and "POSIX", are guaranteed to exist. They act as the default locale,
providing a 7-bit ASCII environment whose behavior is the same as traditional,
non-locale-aware Unix systems. Otherwise, locales specify a language, country, and,
optionally, character set information. For example, "it_IT" is for Italian in Italy using the
system's default character set, and "it_IT.UTF-8" uses the UTF-8 character encoding
for the Unicode character set.
More details on locale names can be found in the GNU/Linux setlocale(3) manpage.
Typically, GNU/Linux distributions set the default locale for a system when it's installed,
based on the language chosen by the installer, and users don't need to worry about it
anymore.
13.2.1 Locale Categories and Environment Variables

The <locale.h> header file defines the locale functions and structures. Locale
categories define the kinds of information about which a program will be locale-aware. The
categories are available as a set of symbolic constants. They are listed in Table 13.1.
1 An earlier design, known as catgets(), exists. Although this design is standardized by POSIX, it is much
harder to use, and we don't recommend it.
Chapter 13 • Internationalization and Localization
TABLE 13.1
ISO C locale category constants defined in <locale.h>

Category      Meaning
LC_ALL        This category includes all possible locale information. This consists of
              the rest of the items in this table.
LC_COLLATE    The category for string collation (discussed below) and regular expression
              ranges.
LC_CTYPE      The category for classifying characters (upper case, lower case, etc.). This
              affects regular expression matching and the isXXX() functions in
              <ctype.h>.
LC_MESSAGES   The category for locale-specific messages. This category comes into play
              with GNU gettext, discussed later in the chapter.
LC_MONETARY   The category for formatting monetary information, such as the local and
              international symbols for the local currency (for example, $ vs. USD for
              U.S. dollars), how to format negative values, and so on.
LC_NUMERIC    The category for formatting numeric values.
LC_TIME       The category for formatting dates and times.

These categories are the ones defined by the various standards. Some systems may
support additional categories, such as LC_TELEPHONE or LC_ADDRESS. However, these
are not standardized; any program that needs to use them but that still needs to be
portable should use #ifdef to enclose the relevant sections.
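
A minimal sketch of such a guarded section follows. It relies on the nonstandard categories being preprocessor macros, which we believe is the case in GLIBC's <locale.h>; treat that as an assumption about your system. The function name telephone_locale() is hypothetical.

```c
#include <locale.h>
#include <stddef.h>

/* telephone_locale --- hypothetical helper: current LC_TELEPHONE setting,
   or NULL if this system has no such category */

const char *telephone_locale(void)
{
#ifdef LC_TELEPHONE
    return setlocale(LC_TELEPHONE, NULL);    /* query only, no change */
#else
    return NULL;                             /* category not available */
#endif
}
```

On a system without LC_TELEPHONE the code still compiles; callers simply see NULL and can fall back to whatever default formatting they prefer.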
By default, C programs and the C library act as if they are in the "C" or "POSIX"
locale, to provide compatibility with historical systems and behavior. However, by
calling setlocale() (as described below), a program can enable locale awareness.
Once a program does this, the user can, by setting environment variables, enable and
disable the degree of locale functionality that the program will have.
The environment variables have the same names as the locale categories listed in
Table 13.1. Thus, the command
    export LC_NUMERIC=en_DK LC_TIME=C
specifies that numbers should be printed according to the "en_DK" (English in
Denmark) locale, but that date and time values should be printed according to the
regular "C" locale. (This example merely illustrates that you can specify different locales
for different categories; it's not necessarily something that you should do.)
The environment variable LC_ALL overrides all other LC_xxx variables. If LC_ALL
isn't set, then the library looks for the specific variables (LC_CTYPE, LC_MONETARY, and
so on). Finally, if none of those is set, the library looks for the variable LANG. Here is a
small demonstration, using gawk:
$ unset LC_ALL LANG                                    Remove default variables
$ export LC_NUMERIC=en_DK LC_TIME=C                    European numbers, default date, time
$ gawk 'BEGIN { print 1.234 ; print strftime() }'      Print a number, current date, time
1,234
Wed Jul 09 09:32:18 PDT 2003
$ export LC_NUMERIC=it_IT LC_TIME=it_IT                Italian numbers, date, time
$ gawk 'BEGIN { print 1.234 ; print strftime() }'      Print a number, current date, time
1,234
mer lug 09 09:32:40 PDT 2003
$ export LC_ALL=C                                      Set overriding variable
$ gawk 'BEGIN { print 1.234 ; print strftime() }'      Print a number, current date, time
1.234
Wed Jul 09 09:33:00 PDT 2003
(For awk, the POSIX standard states that numeric constants in the source code always
use '.' as the decimal point, whereas numeric output follows the rules of the locale.)
Almost all GNU versions of the standard Unix utilities are locale-aware. Thus,
particularly on GNU/Linux systems, setting these variables gives you control over the
system's behavior.2
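
The lookup order among LC_ALL, the specific LC_xxx variable, and LANG can be sketched in C. The function effective_locale_name() is a hypothetical helper of ours, not a library routine. It assumes that an empty value counts the same as an unset variable, and that "C" is the fallback when nothing is set; we believe that is what GLIBC does, but other systems may choose a different default.

```c
#include <stdlib.h>

/* effective_locale_name --- hypothetical helper: which locale name would the
   library pick for the category whose variable is category_var? */

const char *effective_locale_name(const char *category_var)
{
    const char *val;

    if ((val = getenv("LC_ALL")) != NULL && *val != '\0')
        return val;    /* LC_ALL overrides everything */
    if ((val = getenv(category_var)) != NULL && *val != '\0')
        return val;    /* then the specific LC_xxx variable */
    if ((val = getenv("LANG")) != NULL && *val != '\0')
        return val;    /* then LANG */

    return "C";        /* assumed default when nothing is set */
}
```

With the environment from the last step of the demonstration (LC_ALL=C), effective_locale_name("LC_NUMERIC") yields "C" even though LC_NUMERIC is still set to it_IT. Real programs should simply call setlocale() with an empty string and let the library do this work.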
13.2.2 Setting the Locale: setlocale()

As mentioned, if you do nothing, C programs and the C library act as if they're in
the "C" locale. The setlocale() function enables locale awareness:
#include <locale.h>                                        ISO C
char *setlocale(int category, const char *locale);
The category argument is one of the locale categories described in Section 13.2.1,
"Locale Categories and Environment Variables," page 487. The locale argument is
a string naming the locale to use for that category. When locale is the empty string
(""), setlocale() inspects the appropriate environment variables.
If locale is NULL, the locale information is not changed. Instead, the function returns
a string representing the current locale for the given category.
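
A short sketch of the query form follows. The helper locale_is_default() is hypothetical; it only wraps the standard setlocale() call with a NULL second argument and compares the result against the two guaranteed locale names.

```c
#include <locale.h>
#include <string.h>

/* locale_is_default --- hypothetical helper: nonzero if the given category
   is currently set to the "C" (or "POSIX") locale */

int locale_is_default(int category)
{
    /* a NULL locale argument queries the setting without changing it */
    const char *cur = setlocale(category, NULL);

    return cur != NULL
        && (strcmp(cur, "C") == 0 || strcmp(cur, "POSIX") == 0);
}
```

Called before any other setlocale() call, locale_is_default(LC_ALL) returns nonzero, since every C program starts out in the "C" locale.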
2 Long-time C and Unix programmers may prefer to use the "C" locale, even if they are native English speakers;
the English locales produce different results from what grizzled, battle-scarred Unix veterans expect.
Because each category can be set individually, the application's author decides how
locale-aware the program will be. For example, if main() only does this:
setlocale(LC_TIME, "");    /* Be locale-aware for time, but that's it. */
then, no matter what other LC_xxx variables are set in the environment, only the
time and date functions obey the locale. All others act as if the program is still in the
"C" locale. Similarly, the call:
setlocale(LC_TIME, "it_IT");    /* For the time, we're always in Italy. */
overrides the LC_TIME environment variable (as well as LC_ALL), forcing the program
to be Italian for time/date computations. (Although Italy may be a great place to be,
programs are better off using "" so that they work correctly everywhere; this example
is here just to explain how setlocale() works.)
You can call setlocale() individually for each category, but the simplest thing to
do is set everything in one fell swoop:
/* When in Rome, do as the Romans do, for *everything*. :-) */
setlocale(LC_ALL, "");
setlocale()'s return value is the current setting of the locale. This is either a string
value passed in from an earlier call or an opaque value representing the locale in use at
startup. This same value can then later be passed back to setlocale(). For later use,
the return value should be copied into local storage since it is a pointer to internal data:
char *initial_locale;
initial_locale = strdup(setlocale(LC_ALL, ""));    /* save copy */
(void) setlocale(LC_ALL, initial_locale);    /* restore it */
Here, we've saved a copy by using the POSIX strdup() function (see Section 3.2.2,
"String Copying: strdup()," page 74).
13.2.3 String Collation: strcoll() and strxfrm()

The familiar strcmp() function compares two strings, returning negative, zero, or
positive values if the first string is less than, equal to, or greater than the second one.
This comparison is based on the numeric values of characters in the machine's character
set. Because of this, strcmp()'s result never varies.
However, in a locale-aware world, simple numeric comparison isn't enough. Each
locale defines the collating sequence for characters within it, in other words, the relative
order of characters within the locale. For example, in simple 7-bit ASCII, the two
characters A and a have the decimal numeric values 65 and 97, respectively. Thus, in
the fragment
int i = strcmp("A", "a");
i has a negative value. However, in the "en_US.UTF-8" locale, A comes after a, not
before it. Thus, using strcmp() for applications that need to be locale-aware is a bad
idea; we might say it returns a locale-ignorant answer.
The strcoll() (string collate) function exists to compare strings in a locale-aware
fashion:
#include <string.h>                                        ISO C
int strcoll(const char *s1, const char *s2);
Its return value is the same negative/zero/positive as strcmp(). The following
program, ch13-compare.c, interactively demonstrates the difference:
 1  /* ch13-compare.c --- demonstrate strcmp() vs. strcoll() */
 2
 3  #include <stdio.h>
 4  #include <locale.h>
 5  #include <string.h>
 6
 7  int main(void)
 8  {
 9  #define STRBUFSIZE 1024
10      char locale[STRBUFSIZE], curloc[STRBUFSIZE];
11      char left[STRBUFSIZE], right[STRBUFSIZE];
12      char buf[BUFSIZ];
13      int count;
14
15      setlocale(LC_ALL, "");    /* set to env locale */
16      strcpy(curloc, setlocale(LC_ALL, NULL));    /* save it */
17
18      printf("--> "); fflush(stdout);
19      while (fgets(buf, sizeof buf, stdin) != NULL) {
20          locale[0] = '\0';
21          count = sscanf(buf, "%s %s %s", left, right, locale);
22          if (count < 2)
23              break;
24
25          if (*locale) {
26              setlocale(LC_ALL, locale);
27              strcpy(curloc, locale);
28          }
29
30          printf("%s: strcmp(\"%s\", \"%s\") is %d\n", curloc, left,
31              right, strcmp(left, right));
32          printf("%s: strcoll(\"%s\", \"%s\") is %d\n", curloc, left,
33              right, strcoll(left, right));
34
35          printf("\n--> "); fflush(stdout);
36      }
37
38      exit(0);
39  }
The program reads input lines, which consist of two words to compare and,
optionally, a locale to use for the comparison. If the locale is given, that becomes the locale
for subsequent entries. It starts out with whatever locale is set in the environment.
The curloc array saves the current locale for printing results; left and right are
the left- and right-hand words to compare (lines 10-11). The main part of the program
is a loop (lines 19-36) that reads lines and does the work. Lines 20-23 split up the input
line. locale is initialized to the empty string, in case a third value isn't provided.
Lines 25-28 set the new locale if there is one. Lines 30-33 print the comparison
results, and line 35 prompts for more input. Here's a demonstration:
$ ch13-compare                                      Run the program
--> ABC abc                                         Enter two words
C: strcmp("ABC", "abc") is -1                       Program started in "C" locale
C: strcoll("ABC", "abc") is -1                      Identical results in "C" locale

--> ABC abc en_US                                   Same words, "en_US" locale
en_US: strcmp("ABC", "abc") is -1                   strcmp() results don't change
en_US: strcoll("ABC", "abc") is 2                   strcoll() results do!

--> ABC abc en_US.UTF-8                             Same words, "en_US.UTF-8" locale
en_US.UTF-8: strcmp("ABC", "abc") is -1
en_US.UTF-8: strcoll("ABC", "abc") is 6             Different value, still positive

--> junk JUNK                                       New words
en_US.UTF-8: strcmp("junk", "JUNK") is 1            Previous locale used
en_US.UTF-8: strcoll("junk", "JUNK") is -6
This program clearly demonstrates the difference between strcmp() and strcoll().
Since strcmp() works in accordance with the numeric character values, it always returns
the same result. strcoll() understands collation issues, and its result varies according
to the locale. We see that in both en_US locales, the uppercase letters come after the
lowercase ones.
NOTE Locale-specific string collation is also an issue in regular-expression
matching. Regular expressions allow character ranges within bracket expressions,
such as '[a-z]' or '["-/]'. The exact meaning of such a construct (the
characters numerically between the start and end points, inclusive) is defined
only for the "C" and "POSIX" locales.

For non-ASCII locales, a range such as '[a-z]' can also match uppercase
letters, not just lowercase ones! The range '["-/]' is valid in ASCII, but not in
"en_US.UTF-8".

The long-term most portable solution is to use POSIX character classes, such
as '[[:lower:]]' and '[[:punct:]]'. If you find yourself needing to use
range expressions on systems that are locale-aware and on older systems that
are not, but without having to change your program, the solution is to use
brute force and list each character individually within the brackets. It isn't pretty,
but it works.

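To make the character-class advice concrete, here is a minimal sketch that uses '[[:lower:]]' in place of a range. The function all_lowercase() is a hypothetical helper; the regcomp(), regexec(), and regfree() calls are the POSIX ones used in ch12-grep.c.

```c
#include <regex.h>
#include <stddef.h>

/* all_lowercase --- hypothetical helper: 1 if str consists solely of
   lowercase letters in the current locale, 0 if not, -1 on error */

int all_lowercase(const char *str)
{
    regex_t re;
    int ret;

    /* [[:lower:]] is a character class, not a range, so its meaning
       doesn't depend on the locale's collating sequence */
    if (regcomp(& re, "^[[:lower:]]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    ret = regexec(& re, str, 0, NULL, 0);
    regfree(& re);

    if (ret == 0)
        return 1;
    return ret == REG_NOMATCH ? 0 : -1;
}
```

In the "C" locale, all_lowercase("junk") returns 1 and all_lowercase("JUNK") returns 0. A production version would compile the pattern once rather than on every call.
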
Locale-based collation is potentially expensive. If you expect to be doing lots of
comparisons, where at least one of the strings will not change or where string values
will be compared against each other multiple times (such as in sorting a list), then you
should consider using the strxfrm() function to convert your strings to versions that
can be used with strcmp(). The strxfrm() function is declared as follows:
#include <string.h>                                        ISO C
size_t strxfrm(char *dest, const char *src, size_t n);
The idea is that strxfrm() transforms src, placing no more than n characters of the
result (including the terminating zero byte) into dest. The return value is the number
of characters necessary to hold the transformed characters. If this is n or more, then
the contents of dest are "indeterminate."
The POSIX standard explicitly allows n to be zero and dest to be NULL. In this case,
strxfrm() returns the size of the array needed to hold the transformed version of src
(not including the final '\0' character). Presumably, this value would then be used
with malloc() for creating the dest array or for checking the size against a predefined
array bound. (When doing this, obviously, src must have a terminating zero byte.)
This fragment illustrates how to use strxfrm():
#define STRBUFSIZE ...
char s1[STRBUFSIZE], s2[STRBUFSIZE];            Original strings
char s1x[STRBUFSIZE], s2x[STRBUFSIZE];          Transformed copies
int cmp;

... fill in s1 and s2 ...

/* the third argument is the size of the destination array */
if (strxfrm(s1x, s1, STRBUFSIZE) >= STRBUFSIZE || strxfrm(s2x, s2, STRBUFSIZE) >= STRBUFSIZE)
    /* too big, recover */

cmp = strcmp(s1x, s2x);
if (cmp == 0)
    /* equal */
else if (cmp < 0)
    /* s1 < s2 */
else
    /* s1 > s2 */
For one-time comparisons, it is probably faster to use strcoll() directly. But if
strings will be compared multiple times, then using strxfrm() once and strcmp()
on the transformed values will be faster.
There are no locale-aware collation functions that correspond to strncmp() or
strcasecmp().
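
When string lengths aren't known in advance, the size query described earlier (a NULL destination with n equal to zero) can be combined with malloc(). The following is a sketch only; xfrm_dup() is a hypothetical helper name, and the caller is responsible for calling free() on the result.

```c
#include <stdlib.h>
#include <string.h>

/* xfrm_dup --- hypothetical helper: return a malloc()ed transformed copy
   of src, suitable for comparing with strcmp(), or NULL on failure */

char *xfrm_dup(const char *src)
{
    size_t need = strxfrm(NULL, src, 0) + 1;    /* + 1 for the '\0' */
    char *dest = malloc(need);

    /* a result of need or more means that the transformed string didn't fit */
    if (dest != NULL && strxfrm(dest, src, need) >= need) {
        free(dest);
        dest = NULL;
    }

    return dest;
}
```

A sorting routine could call xfrm_dup() once per string, sort on the transformed copies with strcmp(), and free the copies afterward. The transformed strings are valid only for the locale that was in effect when they were created.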
13.2.4 Low-Level Numeric and Monetary Formatting: localeconv()

Correctly formatting numeric and monetary values requires a fair amount of
low-level information. Said information is available in the struct lconv, which is retrieved
with the localeconv() function:
#include <locale.h>                                        ISO C
struct lconv *localeconv(void);
Similarly to the ctime() function, this function returns a pointer to internal static
data. You should make a copy of the returned data since subsequent calls could return
different values if the locale has been changed. Here is the struct lconv (condensed
slightly), direct from GLIBC's <locale.h>:
struct lconv {
    /* Numeric (non-monetary) information. */
    char *decimal_point;        /* Decimal point character. */
    char *thousands_sep;        /* Thousands separator. */
    /* Each element is the number of digits in each group;
       elements with higher indices are farther left.
       An element with value CHAR_MAX means that no further grouping is done.
       An element with value 0 means that the previous element is used
       for all groups farther left. */
    char *grouping;

    /* Monetary information. */
    /* First three chars are a currency symbol from ISO 4217.
       Fourth char is the separator. Fifth char is '\0'. */
    char *int_curr_symbol;
    char *currency_symbol;      /* Local currency symbol. */
    char *mon_decimal_point;    /* Decimal point character. */
    char *mon_thousands_sep;    /* Thousands separator. */
    char *mon_grouping;         /* Like 'grouping' element (above). */
    char *positive_sign;        /* Sign for positive values. */
    char *negative_sign;        /* Sign for negative values. */
    char int_frac_digits;       /* Int'l fractional digits. */
    char frac_digits;           /* Local fractional digits. */
    /* 1 if currency_symbol precedes a positive value, 0 if succeeds. */
    char p_cs_precedes;
    /* 1 iff a space separates currency_symbol from a positive value. */
    char p_sep_by_space;
    /* 1 if currency_symbol precedes a negative value, 0 if succeeds. */
    char n_cs_precedes;
    /* 1 iff a space separates currency_symbol from a negative value. */
    char n_sep_by_space;
    /* Positive and negative sign positions:
       0 Parentheses surround the quantity and currency_symbol.
       1 The sign string precedes the quantity and currency_symbol.
       2 The sign string follows the quantity and currency_symbol.
       3 The sign string immediately precedes the currency_symbol.
       4 The sign string immediately follows the currency_symbol. */
    char p_sign_posn;
    char n_sign_posn;
    /* 1 if int_curr_symbol precedes a positive value, 0 if succeeds. */
    char int_p_cs_precedes;
    /* 1 iff a space separates int_curr_symbol from a positive value. */
    char int_p_sep_by_space;
    /* 1 if int_curr_symbol precedes a negative value, 0 if succeeds. */
    char int_n_cs_precedes;
    /* 1 iff a space separates int_curr_symbol from a negative value. */
    char int_n_sep_by_space;
    /* Positive and negative sign positions:
       0 Parentheses surround the quantity and int_curr_symbol.
       1 The sign string precedes the quantity and int_curr_symbol.
       2 The sign string follows the quantity and int_curr_symbol.
       3 The sign string immediately precedes the int_curr_symbol.
       4 The sign string immediately follows the int_curr_symbol. */
    char int_p_sign_posn;
    char int_n_sign_posn;
};
The comments make it fairly clear what's going on. Let's look at the first several
fields in the struct lconv:
decimal_point
The decimal point character to use. In the United States and other English-
speaking countries, it's a period, but many countries use a comma.
thousands_sep
The character used to separate groups of digits in a value. (The groups are often
three digits long, but the actual sizes come from the grouping field.)
grouping
An array of single-byte integer values. Each element indicates how many digits to
group. As the comment says, CHAR_MAX means no further grouping should be
done, and 0 means reuse the last element. (We show some sample code later in
the chapter.)
int_curr_symbol
This is the international symbol for the local currency. For example, 'USD' for U.S.
dollars.
currency_symbol
This is the local symbol for the local currency. For example, $ for U.S. dollars.
mon_decimal_point, mon_thousands_sep, mon_grouping
These correspond to the earlier fields, providing the same information, but for
monetary amounts.
Most of the rest of the values are not useful for day-to-day programming. The
following program, ch13-lconv.c, prints some of these values, to give you a feel for what
kind of information is available:
/* ch13-lconv.c --- show some of the components of the struct lconv */

#include <stdio.h>
#include <limits.h>
#include <locale.h>

int main(void)
{
    struct lconv l;
    int i;

    setlocale(LC_ALL, "");
    l = *localeconv();          /* copy the struct; the data is static */

    printf("decimal_point = [%s]\n", l.decimal_point);
    printf("thousands_sep = [%s]\n", l.thousands_sep);

    /* stop at the terminating 0 or at CHAR_MAX (no further grouping) */
    for (i = 0; l.grouping[i] != 0 && l.grouping[i] != CHAR_MAX; i++)
        printf("grouping[%d] = [%d]\n", i, l.grouping[i]);

    printf("int_curr_symbol = [%s]\n", l.int_curr_symbol);
    printf("currency_symbol = [%s]\n", l.currency_symbol);
    printf("mon_decimal_point = [%s]\n", l.mon_decimal_point);
    printf("mon_thousands_sep = [%s]\n", l.mon_thousands_sep);
    printf("positive_sign = [%s]\n", l.positive_sign);
    printf("negative_sign = [%s]\n", l.negative_sign);

    return 0;
}
When run with different locales, not surprisingly we get different results:
$ LC_ALL=en_US ch13-lconv                    Results for the United States
decimal_point = [.]
thousands_sep = [,]
grouping[0] = [3]
grouping[1] = [3]
int_curr_symbol = [USD ]
currency_symbol = [$]
mon_decimal_point = [.]
mon_thousands_sep = [,]
positive_sign = []
negative_sign = [-]
$ LC_ALL=it_IT ch13-lconv                    Results for Italy
decimal_point = [.]
thousands_sep = []
int_curr_symbol = []
currency_symbol = []
mon_decimal_point = []
mon_thousands_sep = []
positive_sign = []
negative_sign = []
Note how the value for int_curr_symbol in the "en_US" locale includes a trailing
space character that acts to separate the symbol from the following monetary value.
13.2.5 High-Level Numeric and Monetary Formatting: strfmon() and printf()
After looking at all the fields in the struct lconv, you may be wondering, "Do I
really have to figure out how to use all that information just to format a monetary value?"
Fortunately, the answer is no.3 The strfmon() function does all the work for you:
#include <monetary.h>                                POSIX
ssize_t strfmon(char *s, size_t max, const char *format, ...);
This routine is much like strftime() (see Section 6.1.3.2, "Complex Time Format-
ting: strftime()," page 171), using format to copy literal characters and formatted
numeric values into s, placing no more than max characters into it. The following
simple program, ch13-strfmon.c, demonstrates how strfmon() works:
/* ch13-strfmon.c --- demonstrate strfmon() */

#include <stdio.h>
#include <locale.h>
#include <monetary.h>

int main(void)
{
    char buf[BUFSIZ];
    double val = 1234.567;

    setlocale(LC_ALL, "");
    strfmon(buf, sizeof buf, "You owe me %n (%i)\n", val, val);

    fputs(buf, stdout);
    return 0;
}
When run in two different locales, it produces this output:
$ LC_ALL=en_US ch13-strfmon                  In the United States
You owe me $1,234.57 (USD 1,234.57)
$ LC_ALL=it_IT ch13-strfmon                  In Italy
You owe me EUR 1.235 (EUR 1.235)
3 We're as happy as you are, since we don't have to provide example code that uses this, er, full-featured struct.
As you can see, strfmon() is like strftime(), copying regular characters unchanged
into the destination buffer and formatting arguments according to its own formatting
specifications. There are only three:
%n    Print the national (that is, local) form of the currency value.
%i    Print the international form of the currency value.
%%    Print a literal % character.
The values to be formatted must be of type double. We see the difference between
%n and %i in the "en_US" locale: %n uses a $ character, whereas %i uses USD, which
stands for "U.S. Dollars."
Flexibility, and thus a certain amount of complexity, comes along with many of
the APIs that were developed for POSIX, and strfmon() is no exception. As with
printf(), several optional items that can appear between the % and the i or n provide
increased control. The full forms are as follows:
%[flags][field width][#left-prec][.right-prec]i
%[flags][field width][#left-prec][.right-prec]n
%%                                           No flag, field width, etc., allowed
The flags are listed in Table 13.2.
TABLE 13.2
Flags for strfmon()
Flag  Meaning
=c    Use the character c for the numeric fill character, for use with the left precision.
      The default fill character is a space. A common alternative fill character is 0.
^     Disable the use of the grouping character (for example, a comma in the United
      States).
(     Enclose negative amounts in parentheses. Mutually exclusive with the + flag.
+     Handle positive/negative values normally. Use the locale's positive and negative
      signs. Mutually exclusive with the ( flag.
!     Do not include the currency symbol. This flag is useful if you wish to use
      strfmon() to get more flexible formatting of regular numbers than what
      sprintf() provides.
-     Left-justify the result. The default is right justification. This flag has no effect
      without a field width.
The field width is a decimal digit string, providing a minimum width. The default
is to use as many characters as necessary based on the rest of the specification. Values
smaller than the field width are padded with spaces on the left (or on the right, if the
'-' flag was given).
The left precision consists of a # character and a decimal digit string. It indicates the
minimum number of digits to appear to the left of the decimal point character;4 if the
converted value is smaller than this, the result is padded with the numeric fill character.
The default is a space, but the = flag can be used to change it. Grouping characters are
not included in the count.
Finally, the right precision consists of a '.' character and a decimal digit string. This
indicates how many digits to round the value to before it is formatted. The default is
provided by the frac_digits and int_frac_digits fields in the struct lconv.
If this value is 0, no decimal point character is printed.
strfmon() returns the number of characters placed into the buffer, not including
the terminating zero byte. If there's not enough room, it returns -1 and sets errno
to E2BIG.
Besides strfmon(), POSIX (but not ISO C) provides a special flag, the single-quote
character ('), for the printf() formats %i, %d, %u, %f, %F, %g, and %G. In locales that
supply a thousands separator, this flag adds the locale's thousands separator. The
following simple program, ch13-quoteflag.c, demonstrates the output:
/* ch13-quoteflag.c --- demonstrate printf's quote flag */

#include <stdio.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "");      /* Have to do this, or it won't work */
    printf("%'d\n", 1234567);
    return 0;
}
Here's what happens for two different locales: one that does not supply a thousands
separator and one that does:
4 The technical term used in the standards is radix point, since numbers in different bases may have fractional parts
as well. However, for monetary values, it seems pretty safe to use the term "decimal point."
$ LC_ALL=C ch13-quoteflag                    Traditional environment, no separator
1234567
$ LC_ALL=en_US ch13-quoteflag                English in United States locale, has separator
1,234,567
As of this writing, only GNU/Linux and Solaris support the ' flag. Double-check
your system's printf(3) manpage.
13.2.6 Example: Formatting Numeric Values in gawk

gawk implements its own version of the printf() and sprintf() functions. For
full locale awareness, gawk must support the ' flag, as in C. The following fragment,
from the file builtin.c in gawk 3.1.4, shows how gawk uses the struct lconv for
numeric formatting:
 1  case 'd':
 2  case 'i':
 3
 4      tmpval = force_number(arg);
 5
 6
 7      uval = (uintmax_t) tmpval;
 8
 9      ii = jj = 0;
10      do {
11          *--cp = (char) ('0' + uval % 10);
12  #ifdef HAVE_LOCALE_H
13          if (quote_flag && loc.grouping[ii] && ++jj == loc.grouping[ii]) {
14              *--cp = loc.thousands_sep[0];   /* XXX - assumption it's one char */
15              if (loc.grouping[ii+1] == 0)
16                  jj = 0;     /* keep using current val in loc.grouping[ii] */
17              else if (loc.grouping[ii+1] == CHAR_MAX)
18                  quote_flag = FALSE;
19              else {
20                  ii++;
21                  jj = 0;
22              }
23          }
24  #endif
25          uval /= 10;
26      } while (uval > 0);
(The line numbers are relative to the start of the fragment.) Some parts of the code
that aren't relevant to the discussion have been omitted to make it easier to focus on
the parts that are important.
The variable loc, used in lines 13-17, is a struct lconv. It's initialized in main().
Of interest to us here are loc.thousands_sep, which is the thousands-separator
character, and loc.grouping, which is an array describing how many digits appear between
separators. A zero element means "use the value in the previous element for all subse-
quent digits," and a value of CHAR_MAX means "stop inserting thousands separators."
With that introduction, let's look at the code. Line 7 sets uval, which is an unsigned
version of the value to be formatted. ii and jj keep track of the position in
loc.grouping and the number of digits in the current group that have been converted,
respectively.5 quote_flag is true when a ' character has been seen in a conversion
specification.
The do-while loop generates digit characters in reverse, filling in a buffer from the
back end toward the front end. Each digit is generated on line 11. Line 25 then divides
by 10, shifting the value right by one decimal digit.
Lines 12-24 are what interest us. The work is done only on a system that supports
locales, as indicated by the presence of the <locale.h> header file. The symbolic
constant HAVE_LOCALE_H will be true on such a system.6
When the condition on line 13 is true, it's time to add in a thousands-separator
character. This condition can be read in English as "if grouping is requested, and the
current position in loc.grouping indicates an amount for grouping, and the current
count of digits equals the grouping amount." If this condition is true, line 14 adds the
thousands-separator character. The comment notes an assumption that is probably true
but that might come back to haunt the maintainer at some later time. (The 'XXX' is a
traditional way of marking dangerous or doubtful code. It's easy to search for and very
noticeable to readers of the code.)
Once the current position in loc.grouping has been used, lines 15-22 look ahead
at the value in the next position. If it's 0, then the current position's value should con-
tinue to be used. We specify this by resetting jj to 0 (line 16). On the other hand, if
the next position is CHAR_MAX, no more grouping should be done, and line 18 turns it
off entirely by setting quote_flag to false. Otherwise, the next value is a grouping
value, so line 20 increments ii, and line 21 resets jj to 0.
5 We probably should have chosen more descriptive names than just ii and jj. Since the code that uses them is
short, our lack of imagination is not a significant problem.
6 This is set by the Autoconf and Automake machinery. Autoconf and Automake are powerful software suites that
make it possible to support a wide range of Unix systems in a systematic fashion.
This is low-level, detailed code. However, once you understand how the information
in the struct lconv is presented, the code is straightforward to read (and it was
straightforward to write).
13.2.7 Formatting Date and Time Values: ctime() and strftime()

Section 6.1, "Times and Dates," page 166, described the functions for retrieving and
formatting time and date values. The strftime() function is also locale-aware
if setlocale() has been called appropriately. The following simple program,
ch13-times.c, demonstrates this:
/* ch13-times.c --- demonstrate locale-based times */

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <time.h>

int main(void)
{
    char buf[100];
    time_t now;
    struct tm *curtime;

    setlocale(LC_ALL, "");
    time(& now);
    curtime = localtime(& now);
    (void) strftime(buf, sizeof buf,
            "It is now %A, %B %d, %Y, %I:%M %p", curtime);

    printf("%s\n", buf);
    printf("ctime() says: %s", ctime(& now));
    exit(0);
}
When the program is run, we see that indeed the strftime() results vary while the
ctime() results do not:
$ LC_ALL=en_US ch13-times                    Time in the United States
It is now Friday, July 11, 2003, 10:35 AM
ctime() says: Fri Jul 11 10:35:55 2003
$ LC_ALL=it_IT ch13-times                    Time in Italy
It is now venerdi, luglio 11, 2003, 10:36
ctime() says: Fri Jul 11 10:36:00 2003
$ LC_ALL=fr_FR ch13-times                    Time in France
It is now vendredi, juillet 11, 2003, 10:36
ctime() says: Fri Jul 11 10:36:05 2003
The reason for the lack of variation is that ctime() (and asctime(), upon which
ctime() is based) are legacy interfaces; they exist to support old code. strftime(),
being a newer interface (developed initially for C89), is free to be locale-aware.
13.2.8 Other Locale Information: nl_langinfo()

Although we said earlier that the catgets() API is hard to use, one part of that API
is generally useful: nl_langinfo(). It provides additional locale-related information,
above and beyond that which is available from the struct lconv:
#include <nl_types.h>                                XSI
#include <langinfo.h>
char *nl_langinfo(nl_item item);
The <nl_types.h> header file defines the nl_item type. (This is most likely an
int or an enum.) The item parameter is one of the symbolic constants defined in
<langinfo.h>. The return value is a string that can be used as needed, either directly
or as a format string for strftime().
The available information comes from several locale categories. Table 13.3 lists the
item constants, the corresponding locale category, and the item's meaning.
An era is a particular time in history. As it relates to dates and times, it makes the
most sense in countries ruled by emperors or dynasties.7
POSIX era specifications can describe eras before A.D. 1. In such a case, the start date
has a higher absolute numeric value than the end date. For example, Alexander the
Great ruled from 336 B.C. to 323 B.C.
The value returned by 'nl_langinfo(ERA)', if not NULL, consists of one or more
era specifications. Each specification is separated from the next by a ; character. Com-
ponents of each era specification are separated from each other by a : character. The
components are described in Table 13.4.
7 Although Americans often refer to the eras of particular presidents, these are not a formal part of the national
calendar in the same sense as in pre-World War II Japan or pre-Communist China.
TABLE 13.3
Item values for nl_langinfo()
Item name                 Category      Meaning
ABDAY_1, ..., ABDAY_7     LC_TIME       The abbreviated names of the days of the week.
                                        Sunday is Day 1.
ABMON_1, ..., ABMON_12    LC_TIME       The abbreviated names of the months.
ALT_DIGITS                LC_TIME       Alternative symbols for digits; see text.
AM_STR, PM_STR            LC_TIME       The a.m./p.m. notations for the locale.
CODESET                   LC_CTYPE      The name of the locale's codeset; that is, the
                                        character set and encoding in use.
CRNCYSTR                  LC_MONETARY   The local currency symbol, described below.
DAY_1, ..., DAY_7         LC_TIME       The names of the days of the week. Sunday is
                                        Day 1.
D_FMT                     LC_TIME       The date format.
D_T_FMT                   LC_TIME       The date and time format.
ERA_D_FMT                 LC_TIME       The era date format.
ERA_D_T_FMT               LC_TIME       The era date and time format.
ERA_T_FMT                 LC_TIME       The era time format.
ERA                       LC_TIME       Era description segments; see text.
MON_1, ..., MON_12        LC_TIME       The names of the months.
RADIXCHAR                 LC_NUMERIC    The radix character. For base 10, this is the
                                        decimal point character.
THOUSEP                   LC_NUMERIC    The thousands-separator character.
T_FMT_AMPM                LC_TIME       The time format with a.m./p.m. notation.
T_FMT                     LC_TIME       The time format.
YESEXPR, NOEXPR           LC_MESSAGES   Strings representing positive and negative
                                        responses.
TABLE 13.4
Era specification components
Component    Meaning
Direction    A + or '-' character. A + indicates that the era runs from a numerically lower
             year to a numerically higher one, and a '-' indicates the opposite.
Offset       The year closest to the start date of the era.
Start date   The date when the era began, in the form 'yyyy/mm/dd'. These are the year,
             month, and day, respectively. Years before A.D. 1 use a negative value for yyyy.
End date     The date when the era ended, in the same form. Two additional special forms
             are allowed: -* means the "beginning of time," and +* means the "end
             of time."
Era name     The name of the era, corresponding to strftime()'s %EC conversion
             specification.
Era format   The format of the year within the era, corresponding to strftime()'s %EY
             conversion specification.
The ALT_DIGITS value also needs some explanation. Some locales provide for "alter-
native digits." (Consider Arabic, which uses the decimal numbering system but different
glyphs for the digits 0-9. Or consider a hypothetical "Ancient Rome" locale using roman
numerals.) These come up, for example, in strftime()'s various %Oc conversion
specifications. The return value for 'nl_langinfo(ALT_DIGITS)' is a semicolon-
separated list of character strings for the alternative digits. The first should be used for
0, the next for 1, and so on. POSIX states that up to 100 alternative symbols may be
provided. The point is to avoid restricting locales to the use of the ASCII digit characters
when a locale has its own numbering system.
Finally, 'nl_langinfo(CRNCYSTR)' returns the local currency symbol. The first
character of the return value, if it's a '-', '+', or '.', indicates how the symbol should
be used:
-    The symbol should appear before the value.
+    The symbol should appear after the value.
.    The symbol should replace the radix character (decimal point).
13.3 Dynamic Translation of Program Messages

The standard C library interfaces just covered solve the easy parts of the localization
problem. Monetary, numeric, and time and date values, as well as string collation issues,
all lend themselves to management through tables of locale-specific data (such as lists
of month and day names).
However, most user interaction with a text-based program occurs in the form of the
messages it outputs, such as prompts or error messages. The problem is to avoid having
multiple versions of the same program that differ only in the contents of the message
strings. The de facto solution in the GNU world is GNU gettext. (GUI programs
face similar issues with the items in menus and menu bars; typically, each major user
interface toolkit has its own way to solve that problem.)
GNU gettext enables translation of program messages into different languages at
runtime. Within the code for a program, this translation involves several steps, each of
which uses different library functions. Once the program itself has been properly pre-
pared, several shell-level utilities facilitate the preparation of translations into different
languages. Each such translation is referred to as a message catalog.
13.3.1 Setting the Text Domain: textdomain()

A complete application may contain multiple components: individual executables
written in C or C++ or in scripting languages that can also access gettext facilities,
such as gawk or the Bash shell. The components of the application all share the same
text domain, which is a string that uniquely identifies the application. (Examples might
be "gawk" or "coreutils"; the former is a single program, and the latter is a whole
suite of programs.) The text domain is set with textdomain():
#include <libintl.h>                                 GLIBC
char *textdomain(const char *domainname);
Each component should call this function with a string naming the text domain as
part of the initial startup activity in main(). The return value is the current text domain.
If the domainname argument is NULL, then the current domain is returned; otherwise,
it is set to the new value and that value is then returned. A return value of NULL indicates
an error of some sort.
If the text domain is not set with textdomain(), the default domain is "messages".
13.3.2 Translating Messages: gettext()

The next step after setting the text domain is to use the gettext() function (or a
variant) for every string that should be translated. Several functions provide translation
services:
#include <libintl.h>                                 GLIBC
char *gettext(const char *msgid);
char *dgettext(const char *domainname, const char *msgid);
char *dcgettext(const char *domainname, const char *msgid, int category);
The arguments used in these functions are as follows:
const char *msgid
The string to be translated. It acts as a key into a database of translations.
const char *domainname
The text domain from which to retrieve the translation. Thus, even though main()
has called textdomain() to set the application's own domain, messages can be
retrieved from other text domains. (This is most applicable to messages that might
be in the text domain for a third-party library, for example.)
int category
One of the locale categories described earlier (LC_TIME, etc.).
The default text domain is whatever was set with textdomain() ("messages" if
textdomain() was never called). The default category is LC_MESSAGES. Assume that
main() makes the following call:
textdomain("killerapp");
Then, 'gettext("my message")' is equivalent to 'dgettext("killerapp", "my
message")'. Both of these, in turn, are equivalent to 'dcgettext("killerapp",
"my message", LC_MESSAGES)'.
You will want to use gettext() 99.9 percent of the time. However, the other
functions give you the flexibility to work with other text domains or locale categories.
You are most likely to need this flexibility when doing library programming, since a
standalone library will almost certainly be in its own text domain.
All the functions return a string. The string is either the translation of the given msgid
or, if no translation exists, the original string. Thus, there is always some output, even
if it's just the original (presumably English) message. For example:
/* The canonical first program, localized version. */

#include <stdio.h>
#include <locale.h>
#include <libintl.h>

int main(void)
{
    setlocale(LC_ALL, "");
    printf("%s\n", gettext("hello, world"));
    return 0;
}
Although the message is a simple string, we don't use it directly as the printf() control
string, since in general, translations can contain % characters.
Shortly, in Section 13.3.4, "Making gettext() Easy to Use," page 510, we'll see
how to make gettext() easier to use in large-scale, real-world programs.
13.3.3 Working with Plurals: ngettext()

Translating plurals presents special difficulties. Naive code might look like this:
printf("%d word%s misspelled\n", nwords, nwords > 1 ? "s" : "");
/* or */
printf("%d %s misspelled\n", nwords, nwords == 1 ? "word" : "words");
This is reasonable for English, but translation becomes difficult. First of all, many
languages don't use as simple a plural form as English (adding an s suffix for most
words). Second, many languages, particularly in Eastern Europe, have multiple plural
forms, each indicating how many objects the form designates. Thus, even code like this
isn't enough:
if (nwords == 1)
    printf("one word misspelled\n");
else
    printf("%d words misspelled\n", nwords);
The solution is a parallel set of routines specifically for formatting plural values:
#include <libintl.h>                                 GLIBC
char *ngettext(const char *msgid, const char *msgid_plural,
               unsigned long int n);
char *dngettext(const char *domainname, const char *msgid,
                const char *msgid_plural, unsigned long int n);
char *dcngettext(const char *domainname, const char *msgid,
                 const char *msgid_plural, unsigned long int n, int category);
Besides the original msgid argument, these functions accept additional arguments:
const char *msgid_plural
The default string to use for plural values. Examples shortly.
unsigned long int n
The number of items there are.
Each locale's message catalog specifies how to translate plurals.8 The ngettext()
function (and its variants) examines n and, based on the specification in the message
catalog, returns the appropriate translation of msgid. If the catalog does not have a
translation for msgid, or in the "C" locale, ngettext() returns msgid if 'n == 1';
otherwise, it returns msgid_plural. Thus, our misspelled words example looks like this:
printf(ngettext("%d word misspelled\n", "%d words misspelled\n", nwords),
       nwords);
Note that nwords must be passed to ngettext() to select a format string, and then
to printf() for formatting. In addition, be careful not to use a macro or expression
whose value changes each time, like 'n++'! Such a thing could happen if you're doing
global editing to add calls to ngettext() and you don't pay attention.
13.3.4 Making gettext() Easy to Use

The call to gettext() in program source code serves two purposes. First, it does
the translation at runtime, which is the main point, after all. However, it also serves to
mark the strings that need translating. The xgettext utility reads program source code
and extracts all the original strings that need translation. (We briefly cover the mechanics
of this later in the chapter.)
Consider the case, though, of static strings that aren't used directly:
static char *copyrights[] = {
    "Copyright 2004, Jane Programmer",
    "Permission is granted ...",
    /* ... LOTS of legalese here ... */
    NULL
};

void copyright(void)
{
    int i;

    for (i = 0; copyrights[i] != NULL; i++)
        printf("%s\n", gettext(copyrights[i]));
}
8 The details are given in the GNU gettext documentation. Here, we're focusing on the developer's needs, not
the translator's.
Here, we'd like to be able to print the translations of the copyright strings if they're
available. However, how is the xgettext extractor supposed to find these strings? We
can't enclose them in calls to gettext() because that won't work at compile time:
/* BAD CODE: won't compile */
static char *copyrights[] = {
    gettext("Copyright 2004, Jane Programmer"),
    gettext("Permission is granted ..."),
    /* ... LOTS of legalese here ... */
    NULL
};
13.3.4.1 Portable Programs: "gettext.h"

We assume here that you wish to write a program that can be used along with the
GNU gettext library on any Unix system, not just GNU/Linux systems. The next
section describes what to do for GNU/Linux-only programs.
The solution to marking strings involves two steps. The first is the use of the
gettext.h convenience header that comes in the GNU gettext distribution. This
file handles several portability and compilation issues, making it easier to use gettext()
in your own programs:
#define ENABLE_NLS 1                 ENABLE_NLS must be true for gettext() to work
#include "gettext.h"                 Instead of <libintl.h>
If the ENABLE_NLS macro is not defined9 or it's set to zero, then gettext.h expands
calls to gettext() into the first argument. This makes it possible to port code using
gettext() to systems that have neither GNU gettext installed nor their own version.
Among other things, this header file defines the following macro:
9 This macro is usually automatically defined by the configure program, either in a special header or on the
compiler command line. configure is generated with Autoconf and Automake.
/* A pseudo function call that serves as a marker for the automated
   extraction of messages, but does not call gettext(). The run-time
   translation is done at a different place in the code.
   The argument, String, should be a literal string. Concatenated strings
   and other string expressions won't work.
   The macro's expansion is not parenthesized, so that it is suitable as
   initializer for static 'char[]' or 'const char[]' variables. */
#define gettext_noop(String) String
The comment is self-explanatory. With this macro, we can now proceed to the second
step. We rewrite the code as follows:
#define ENABLE_NLS 1
#include <stdio.h>
#include "gettext.h"

static char copyrights[] =
    gettext_noop("Copyright 2004, Jane Programmer\n"
    "Permission is granted ...\n"
    /* ... LOTS of legalese here ... */
    "So there.");

void copyright(void)
{
    printf("%s\n", gettext(copyrights));
}
Note that we made two changes. First, copyrights is now one long string, built
up by using the Standard C string constant concatenation feature. This single string is
then enclosed in the call to gettext_noop(). We need a single string so that the
legalese can be translated as a single entity.
The second change is to print the translation directly, as one string in copyright().
By now, you may be thinking, "Gee, having to type 'gettext(...)' each time is
pretty painful." Well, you're right. Not only is it extra work to type, it makes program
source code harder to read as well. Thus, once you are using the gettext.h header
file, the GNU gettext manual recommends the introduction of two more macros,
named _() and N_(), as follows:
#define ENABLE_NLS 1
#include "gettext.h"
#define _(msgid) gettext(msgid)
#define N_(msgid) msgid
This approach reduces the burden of using gettext() to just three extra characters
per translatable string constant and only four extra characters for static strings:

#define ENABLE_NLS 1
#include <stdio.h>
#include <stdlib.h>
#include "gettext.h"
#define _(msgid) gettext(msgid)
#define N_(msgid) msgid

static char copyrights[] =
    N_("Copyright 2004, Jane Programmer\n"
    "Permission is granted ...\n"
    /* ... LOTS of legalese here ... */
    "So there.");

void copyright(void)
{
    printf("%s\n", gettext(copyrights));
}

int main(void)
{
    setlocale(LC_ALL, "");      /* gettext.h gets <locale.h> for us too */
    printf("%s\n", _("hello, world"));
    copyright();
    exit(0);
}
These macros are unobtrusive, and in practice, all GNU programs that use GNU
gettext use this convention. If you intend to use GNU gettext, you too should
follow this convention.
13.3.4.2 GLIBC Only: <libintl.h>

For a program that will only be used on systems with GLIBC, the header file usage
and macros are similar, but simpler:
#include <stdio.h>
#include <libintl.h>
#define _(msgid) gettext(msgid)
#define N_(msgid) msgid
... everything else is the same ...
As we saw earlier, the <libintl.h> header file declares gettext() and the other
functions. You still have to define _() and N_(), but you don't have to worry about
ENABLE_NLS, or distributing gettext.h with your program's source code.
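To see the two macros working together, here is a minimal sketch that assumes a GLIBC system and <libintl.h>. The menu_labels table and the menu_label() function are hypothetical names invented for this illustration. The strings are marked with N_() where they are defined, and translated with _() only at the point of use:

```c
#include <stddef.h>
#include <libintl.h>

#define _(msgid) gettext(msgid)
#define N_(msgid) msgid

/* Marked for extraction by xgettext, but stored untranslated:
   a static initializer cannot contain a function call. */
static const char *const menu_labels[] = {
    N_("Open"),
    N_("Save"),
    N_("Quit"),
};

/* Look up the translation at run time. Returns NULL for a bad index.
   With no catalog installed, gettext() returns its argument unchanged. */
const char *menu_label(size_t i)
{
    if (i >= sizeof(menu_labels) / sizeof(menu_labels[0]))
        return NULL;
    return _(menu_labels[i]);
}
```

A caller would print a label with something like 'printf("%s\n", menu_label(0))' after the usual setlocale() and textdomain() calls.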
13.3.5 Rearranging Word Order with printf()

When translations are produced, sometimes the word order that is natural in English
is incorrect for other languages. For instance, while in English an adjective appears before
the noun it modifies, in many languages it appears after the noun. Thus, code like the
following presents a problem:
char *animal_color, *animal;

if (...) {
    animal_color = _("brown");
    animal = _("cat");
} else if (...) {
    ...
} else {
    ...
}
printf(_("The %s %s looks at you enquiringly.\n"), animal_color, animal);
Here, the format string, animal_color and animal are all properly enclosed in
calls to gettext(). However, the statement will still be incorrect when translated,
since the order of the arguments cannot be changed at runtime.
To get around this, the POSIX (but not ISO C) version of the printf() family allows
you to provide a positional specifier within a format specifier. This takes the form of a
decimal number followed by a $ character immediately after the initial % character. For
example:
printf("%2$s, %1$s\n", "world", "hello");
The positional specifier indicates which argument in the argument list to use; counts
begin at 1 and don't include the format string itself. This example prints the famous
'hello, world' message in the correct order.
GLIBC and Solaris implement this capability. As it's part of POSIX, if your Unix
vendor's printf() doesn't have it, it should be appearing soon.
Any of the regular printf() flags, a field width, and a precision may follow the
positional specifier. The rules for using positional specifiers are these:
• The positional specifier form may not be mixed with the nonpositional form. In
other words, either every format specifier includes a positional specifier or none
of them do. Of course, %% can always be used.
• If the N'th argument is used in the format string, all the arguments up to N must
also be used by the string. Thus, the following is invalid:
printf("%3$s %1$s\n", "hello", "cruel", "world");
• A particular argument may be referenced with a positional specifier multiple
times. Nonpositional format specifications always move through the argument
list sequentially.
This facility isn't intended for direct use by application programmers, but rather by
translators. For example, a French translation for the previous format string, "The %s
%s looks at you enquiringly.\n", might be:
"Le %2$s %1$s te regarde d'un air interrogateur.\n"
(Even this translation isn't perfect: the article "Le" is gender specific. Preparing a program
for translation is a hard job!)
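The effect is easy to try out. The following sketch assumes a POSIX printf() family such as GLIBC's; describe_animal() is a hypothetical helper, and the short format strings stand in for what gettext() would return from a message catalog. The calling code always passes the color first and the animal second, and only the format string decides the printed order:

```c
#include <stdio.h>

/* Format a description into buf. The format string fmt is expected to
   consume two string arguments: the color first, then the animal.
   A translated format may reorder them with %1$s and %2$s. */
int describe_animal(char *buf, size_t size, const char *fmt,
                    const char *color, const char *animal)
{
    return snprintf(buf, size, fmt, color, animal);
}
```

Called with "The %1$s %2$s" and the arguments "brown" and "cat", the buffer holds "The brown cat". Called with "Le %2$s %1$s" and the arguments "brun" and "chat", it holds "Le chat brun", even though the argument order in the C code did not change.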
13.3.6 Testing Translations in a Private Directory

The collection of messages in a program is referred to as the message catalog. This
term also applies to each translation of the messages into a different language. When a
program is installed, each translation is also installed in a standard location, where
gettext() can find the right one at runtime.
It can be useful to place translations in a directory other than the standard one,
particularly for program testing. Especially on larger systems, a regular developer probably
does not have the permissions necessary to install files in system directories. The
bindtextdomain() function gives gettext() an alternative place to look for
translations:
#include <libintl.h>                                                 GLIBC

char *bindtextdomain(const char *domainname, const char *dirname);
Useful directories include '.' for the current directory and /tmp. It might also be
handy to get the directory from an environment variable, like so:
char *td_dir;

setlocale(LC_ALL, "");
textdomain("killerapp");
if ((td_dir = getenv("KILLERAPP_TD_DIR")) != NULL)
    bindtextdomain("killerapp", td_dir);
bindtextdomain() should be called before any calls to the gettext() family of
functions. We see an example of how to use it in Section 13.3.8, "Creating Translations,"
page 517.
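The environment-variable idea can be packaged as a small start-up routine. This is only a sketch: init_messages() is a hypothetical name, and it relies on the GLIBC behavior that bindtextdomain() called with a NULL directory changes nothing and simply reports the current binding:

```c
#include <locale.h>
#include <stdlib.h>
#include <libintl.h>

/* Enable the locale, select the text domain, and bind the domain to the
   directory named by the environment variable env_name, if it is set.
   Returns the directory that gettext() will search for this domain. */
const char *init_messages(const char *domain, const char *env_name)
{
    const char *dir;

    setlocale(LC_ALL, "");
    textdomain(domain);
    dir = getenv(env_name);

    /* A NULL dirname leaves the binding alone and just reports it. */
    return bindtextdomain(domain, dir);
}
```

A program would call 'init_messages("killerapp", "KILLERAPP_TD_DIR")' first thing in main(), before any message is translated.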
13.3.7 Preparing Internationalized Programs

So far, we've looked at all the components that go into an internationalized program.
This section summarizes the process.
1. Adopt the gettext.h header file into your application, and add definitions
for the _() and N_() macros to a header file that is included by all your C
source files. Don't forget to define the ENABLE_NLS symbolic constant.
2. Call setlocale() as appropriate. It is easiest to call 'setlocale(LC_ALL,
"")', but occasionally an application may need to be more picky about which
locale categories it enables.
3. Pick a text domain for the application, and set it with textdomain().
4. If testing, bind the text domain to a particular directory with
bindtextdomain().
5. Use strfmon(), strftime(), and the ' flag for printf() as appropriate. If
other locale information is needed, use nl_langinfo(), particularly in
conjunction with strftime().
6. Mark all strings that should be translated with calls to _() or N_(), as
appropriate.
A few should not be so marked though. For example, if you use getopt_long()
(see Section 2.1.2, "GNU Long Options," page 27), you probably don't want
the long option names to be marked for translation. Also, simple format strings
like "%d %d\n" don't need to be translated, nor do debugging messages.
7. When appropriate, use ngettext() (or its variants) for dealing with values
that can be either 1 or greater than 1.
8. Make life easier for your translators by using multiple strings representing
complete sentences instead of doing word substitutions with %s and ?:. For
example:
if (an error occurred) {                      /* RIGHT */
    /* Use multiple strings to make translation easier. */
    if (input_type == INPUT_FILE)
        fprintf(stderr, _("%s: cannot read file: %s\n"),
            argv[0], strerror(errno));
    else
        fprintf(stderr, _("%s: cannot read pipe: %s\n"),
            argv[0], strerror(errno));
}
This is better than
if (an error occurred) {                      /* WRONG */
    fprintf(stderr, _("%s: cannot read %s: %s\n"), argv[0],
        input_type == INPUT_FILE ? _("file") : _("pipe"),
        strerror(errno));
}
As just shown, it's a good idea to include a comment stating that there are
multiple messages on purpose, to make it easier to translate the messages.
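Step 7 deserves a brief illustration. The sketch below assumes GLIBC's <libintl.h>; format_file_count() and its message text are hypothetical. With no translation installed, ngettext() returns the first string when the count is exactly 1 and the second string otherwise; with a catalog, the translator's plural rules pick the right form:

```c
#include <stdio.h>
#include <libintl.h>

/* Build a message such as "1 file removed" or "3 files removed".
   ngettext() chooses between the singular and plural forms based on n. */
int format_file_count(char *buf, size_t size, unsigned long n)
{
    return snprintf(buf, size,
                    ngettext("%lu file removed", "%lu files removed", n),
                    n);
}
```

Note that both forms are complete sentences, in keeping with step 8: the translator never has to guess how a fragment will be pasted into a larger message.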
13.3.8 Creating Translations

Once your program has been internationalized, it's necessary to prepare translations.
This is done with several shell-level tools. We start with an internationalized version of
ch06-echodate.c, from Section 6.1.4, "Converting a Broken-Down Time to a
time_t," page 176:
/* ch13-echodate.c --- demonstrate translations */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <locale.h>
#define ENABLE_NLS 1
#include "gettext.h"
#define _(msgid) gettext(msgid)

int main(void)
{
    struct tm tm;
    time_t then;

    setlocale(LC_ALL, "");
    bindtextdomain("echodate", ".");
    textdomain("echodate");

    printf("%s", _("Enter a Date/time as YYYY/MM/DD HH:MM:SS : "));
    scanf("%d/%d/%d %d:%d:%d",
        & tm.tm_year, & tm.tm_mon, & tm.tm_mday,
        & tm.tm_hour, & tm.tm_min, & tm.tm_sec);

    /* Error checking on values omitted for brevity. */
    tm.tm_year -= 1900;
    tm.tm_mon -= 1;

    tm.tm_isdst = -1;    /* Don't know about DST */

    then = mktime(& tm);

    printf(_("Got: %s"), ctime(& then));
    exit(0);
}
We have purposely used "gettext.h" and not <gettext.h>. If our application
ships with a private copy of the gettext library, then "gettext.h" will find it,
avoiding the system's copy. On the other hand, if there is only a system copy, it will
be found if there is no local copy. The situation is admittedly complicated by the fact
that Solaris systems also have a gettext library which is not as featureful as the
GNU version.
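The listing above leaves out error checking on the input for brevity. For the curious, here is one way that check might look. This is a sketch only: parse_datetime() is a hypothetical helper that is not part of ch13-echodate.c, and the range limits are our own choice (for instance, 60 is allowed for seconds to permit a leap second):

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parse "YYYY/MM/DD HH:MM:SS" into *tm, ready for mktime().
   Returns 0 on success, -1 if the text is malformed or out of range. */
int parse_datetime(const char *text, struct tm *tm)
{
    int year, mon, day, hour, min, sec;

    if (sscanf(text, "%d/%d/%d %d:%d:%d",
               &year, &mon, &day, &hour, &min, &sec) != 6)
        return -1;    /* not all six fields were present */

    if (year < 1900 || mon < 1 || mon > 12 || day < 1 || day > 31
        || hour < 0 || hour > 23 || min < 0 || min > 59
        || sec < 0 || sec > 60)
        return -1;

    memset(tm, 0, sizeof *tm);
    tm->tm_year = year - 1900;    /* years since 1900 */
    tm->tm_mon = mon - 1;         /* months count from 0 */
    tm->tm_mday = day;
    tm->tm_hour = hour;
    tm->tm_min = min;
    tm->tm_sec = sec;
    tm->tm_isdst = -1;            /* let mktime() work out DST */
    return 0;
}
```

The check does not catch impossible dates such as February 31; mktime() silently normalizes those.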
Moving on to creating translations, the first step is to extract the translatable strings.
This is done with the xgettext program:
$ xgettext --keyword=_ --keyword=N_ \
> --default-domain=echodate ch13-echodate.c
The --keyword options tell xgettext to look for the _() and N_() macros. It
already knows to extract strings from gettext() and its variants, as well as from
gettext_noop().
The output from xgettext is called a portable object file. The default filename
is messages.po, corresponding to the default text domain of "messages". The
--default-domain option indicates the text domain, for use in naming the output
file. In this case, the file is named echodate.po. Here are its contents:
# SOME DESCRIPTIVE TITLE.                               Boilerplate, to be edited
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""                                                Detailed information
msgstr ""                                               Each translator completes
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2003-07-14 18:46-0700\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: ch13-echodate.c:19                                   Message location
msgid "Enter a Date/time as YYYY/MM/DD HH:MM:SS : "     Original message
msgstr ""                                               Translation goes here

#: ch13-echodate.c:32                                   Same for each message
#, c-format
msgid "Got: %s"
msgstr ""
This original file is reused for each translation. It is thus a template for translations,
and by convention it should be renamed to reflect this fact, with a .pot (portable object
template) suffix:
$ mv echodate.po echodate.pot
Given that we aren't fluent in many languages, we have chosen to translate the
messages into pig Latin. Thus, the next step is to produce a translation. We do this by
copying the template file and adding translations to the new copy:
$ cp echodate.pot piglat.po
$ vi piglat.po                       Add translations, use your favorite editor
The filename convention is language.po where language is the two- or
three-character international standard abbreviation for the language. Occasionally the form
language_country.po is used: for example, pt_BR.po for Portuguese in Brazil. As
pig Latin isn't a real language, we've called the file piglat.po. Here are the contents,
after the translations have been added:
# echodate translations into pig Latin
# Copyright (C) 2004 Prentice-Hall
# This file is distributed under the same license as the echodate package.
# Arnold Robbins <arnold@example.com> 2004
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: echodate 1.0\n"
"Report-Msgid-Bugs-To: arnold@example.com\n"
"POT-Creation-Date: 2003-07-14 18:46-0700\n"
"PO-Revision-Date: 2003-07-14 19:00+8\n"
"Last-Translator: Arnold Robbins <arnold@example.com>\n"
"Language-Team: Pig Latin <piglat@li.example.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ASCII\n"
"Content-Transfer-Encoding: 8bit\n"

#: ch13-echodate.c:19
msgid "Enter a Date/time as YYYY/MM/DD HH:MM:SS : "
msgstr "Enteray A Ateday/imetay asay YYYY/MM/DD HH:MM:SS : "

#: ch13-echodate.c:32
#, c-format
msgid "Got: %s"
msgstr "Otgay: %s"
While it would be possible to do a linear search directly in the portable object file,
such a search would be slow. For example, gawk has approximately 350 separate
messages, and the GNU Coreutils have over 670. Linear searching a file with hundreds of
messages would be noticeably slow. Therefore, GNU gettext uses a binary format
for fast message lookup. msgfmt does the compilation, producing a message object file:
$ msgfmt piglat.po -o piglat.mo
As program maintenance is done, the strings used by a program change: new strings
are added, others are deleted or changed. At the very least, a string's location in the
source file may move around. Thus, translation .po files will likely get out of date. The
msgmerge program merges an old translation file with a new .pot file. The result can
then be updated. This example does a merge and then recompiles:
$ msgmerge piglat.po echodate.pot -o piglat.new.po      Merge files
$ mv piglat.new.po piglat.po                            Rename the result
$ vi piglat.po                                          Bring translations up to date
$ msgfmt piglat.po -o piglat.mo                         Recreate .mo file
Compiled .mo files are placed in the file base/locale/category/textdomain.mo.
On GNU/Linux systems, base is /usr/share/locale. locale is the language, such
as 'es', 'fr', and so on. category is a locale category; for messages, it is LC_MESSAGES.
textdomain is the text domain of the program: in our case, echodate. As a real
example, the GNU Coreutils Spanish translation is in
/usr/share/locale/es/LC_MESSAGES/coreutils.mo.
The bindtextdomain() function changes the base part of the location. In
ch13-echodate.c, we change it to '.'. Thus, it's necessary to make the appropriate
directories, and place the pig Latin translation there:
$ mkdir -p en_US/LC_MESSAGES                            Have to use a real locale
$ cp piglat.mo en_US/LC_MESSAGES/echodate.mo            Put the file in the right place
A real locale must be used;10 thus, we "pretend" by using "en_US". With the
translation in place, we set LC_ALL appropriately, cross our fingers, and run the program:
$ LC_ALL=en_US ch13-echodate                            Run the program
Enteray A Ateday/imetay asay YYYY/MM/DD HH:MM:SS : 2003/07/14 21:19:26
Otgay: Mon Jul 14 21:19:26 2003
The latest version of GNU gettext can be found in the GNU gettext distribution
directory.11
This section has necessarily only skimmed the surface of the localization process.
GNU gettext provides many tools for working with translations, and in particular
for making it easy to keep translations up to date as program source code evolves.
The manual process for updating translations is workable but tedious. This task is
easily automated with make; in particular GNU gettext integrates well with Autoconf
and Automake to provide this functionality, removing considerable development burden
from the programmer.
We recommend reading the GNU gettext documentation to learn more about
both of these issues in particular and about GNU gettext in general.
10 We spent a frustrating 30 or 45 minutes attempting to use a piglat/LC_MESSAGES directory and setting
'LC_ALL=piglat', all to no effect, until we figured this out.
11 ftp://ftp.gnu.org/gnu/gettext

13.4 Can You Spell That for Me, Please?

In the very early days of computing, different systems assigned different
correspondences between numeric values and glyphs (symbols such as letters, digits, and
punctuation used for communication with humans). Eventually, two widely used standards
emerged: the EBCDIC encoding used on IBM and workalike mainframes, and ASCII,
used on everything else. Today, except on mainframes, ASCII is the basis for all other
character sets currently in use.
The original seven-bit ASCII character set suffices for American English and most
punctuation and special characters (such as $, but there is no character for the "cent"
symbol). However, there are many languages and many countries with different char-
acter set needs. ASCII doesn't handle the accented versions of the roman characters
used in Europe, and many Asian languages have thousands of characters. New technolo-
gies have evolved to solve these deficiencies.
The i18n literature abounds with references to three fundamental terms. Once we
define them and their relationship to each other, we can present a general description
of the corresponding C APIs.
Character set
A definition of the meaning assigned to different integer values; for example, that
A is 65. Any character set that uses more than eight bits per character is termed a
multibyte character set.
Character set encoding
ASCII uses a single byte to represent characters. Thus, the integer value is stored
as itself, directly in disk files. More recent character sets, most notably different
versions of Unicode,12 use 16-bit or even 32-bit integer values for representing
characters. For most of the defined characters, one, two, or even three of the
higher bytes in the integer are zero, making direct storage of their values in disk
files expensive. The encoding describes a mechanism for converting 16- or 32-bit
values into one to six bytes for storage on disk, such that overall there is a significant
space savings.
Language
The rules for a given language dictate character set usage. In particular, the rules
affect the ordering of characters. For example, in French, e, é, and è should all
come between d and f, no matter what numerical values are assigned to those
characters. Different languages can (and do) assign different orderings to the
same glyphs.
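To make the idea of an encoding concrete, here is a sketch of one such mechanism: UTF-8, the most widely used Unicode encoding. The text above does not single out a particular encoding, so the choice of UTF-8 is ours, and utf8_encode() is a hypothetical helper. Early definitions of UTF-8 allowed up to six bytes per character, as described above; the current definition stops at four bytes because Unicode values end at 0x10FFFF, and that is what this sketch implements:

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode value as UTF-8. Returns the number of bytes stored
   in out (1 to 4), or 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* ASCII is stored as itself */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* reserved range, not a character */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

The letter A (65) still occupies one byte, while é (0xE9) takes two and the euro sign (0x20AC) takes three. Text that is mostly ASCII therefore costs far less than storing every character as a full 32-bit integer.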
12 http://www.unicode.org

Various technologies have evolved over time for supporting multibyte character sets.
Computing practice is slowly converging on Unicode and its encoding, but Standard
C and POSIX support both past and present techniques. This section provides a con-
ceptual overview of the various facilities. We have not had to use them ourselves, so we
prefer to merely introduce them and provide pointers to more information.
13.4.1 Wide Characters

We start with the concept of a wide character. A wide character is an integer type
that can hold any value of the particular multibyte character set being used.
Wide characters are represented in C with the type wchar_t. C99 provides a
corresponding wint_t type, which can hold any value that a wchar_t can hold, and the
special value WEOF, which is analogous to regular EOF from <stdio.h>. The various
types are defined in the <wchar.h> header file. A number of functions similar to those
of <ctype.h> are defined by the <wctype.h> header file, such as iswalnum(), and
many more.
Wide characters may be 16 to 32 bits in size, depending on the implementation. As
mentioned, they're intended for manipulating data in memory and are not usually
stored directly in files.
For wide characters, the C standard provides a large number of functions and macros
that correspond to the traditional functions that work on char data. For example,
wprintf(), iswlower(), and so on. These are documented in the GNU/Linux
man pages and in books on Standard C.
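As a small taste of these interfaces, the following sketch walks a wide-character string and classifies each element with iswalpha(), the wide counterpart of isalpha(). The function name count_wide_letters() is hypothetical:

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Count the alphabetic characters in a wide-character string.
   Which characters count as letters depends on the current locale. */
size_t count_wide_letters(const wchar_t *s)
{
    size_t n = 0;

    for (; *s != L'\0'; s++)
        if (iswalpha((wint_t) *s))
            n++;
    return n;
}
```

Wide string constants are written with an L prefix, so a call looks like 'count_wide_letters(L"ab1 c")', which finds three letters.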
13.4.2 Multibyte Character Encodings

Strings of wide characters are stored on disk by being converted to a multibyte
character set encoding in memory, and the converted data is then written to a disk file.
Similarly, such strings are read in from disk through low-level block I/O, and converted
in memory from the encoded version to the wide-character version.
Many defined encodings represent multibyte characters by using shift states. In other
words, given an input byte stream, byte values represent themselves until a special
control value is encountered. At that point, the interpretation changes according to the
current shift state. Thus, the same eight-bit value can have two meanings: one for the
normal, unshifted state, and another for the shifted state. Correctly encoded strings are
supposed to start and end in the same shift state.
A significant advantage to Unicode is that its encodings are self-correcting; the
encodings don't use shift states, so a loss of data in the middle does not corrupt the
subsequent encoded data.
The initial versions of the multibyte-to-wide-character and wide-character-to-multibyte
functions maintained a private copy of the state of the translation (for example,
the shift state, and anything else that might be necessary). This design limits the
functions' use to one kind of translation throughout the life of the program. Examples are
mblen() (multibyte-character length), mbtowc() (multibyte to wide character), and
wctomb() (wide character to multibyte), mbstowcs() (multibyte string to
wide-character string), and wcstombs() (wide-character string to multibyte string).
The newer versions of these routines are termed restartable. This means that the
user-level code maintains the state of the translation in a separate object, of type mbstate_t.
The corresponding examples are mbrlen(), mbrtowc(), and wcrtomb(), mbsrtowcs()
and wcsrtombs(). (Note the r, for "restartable," in their names.)
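The following sketch shows the restartable style in use. It counts the characters (not the bytes) in a multibyte string by calling mbrtowc() repeatedly, with the conversion state held in a caller-owned mbstate_t. The function name count_mb_chars() is hypothetical, and the encoding applied is whatever the current locale's LC_CTYPE category dictates:

```c
#include <string.h>
#include <wchar.h>

/* Count the characters in a multibyte string. Returns (size_t) -1 if the
   string contains an invalid or incomplete multibyte sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* all zeros: initial state */

    while (remaining > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, remaining, &state);

        if (len == (size_t) -1 || len == (size_t) -2)
            return (size_t) -1;         /* invalid or truncated sequence */
        if (len == 0)
            break;                      /* reached a null character */

        s += len;                       /* skip the bytes just consumed */
        remaining -= len;
        count++;
    }
    return count;
}
```

Because the state lives in the function's own variable, two such conversions can proceed at once without interfering with each other, which the older mbtowc() interface cannot promise.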
13.4.3 Languages
Language issues are controlled by the locale. We've already seen setlocale() earlier
in the chapter. POSIX provides an elaborate mechanism for defining the rules by which
a locale works; see the GNU/Linux locale(5) manpage for some of the details and the
POSIX standard itself for the full story.
The truth is, you really don't want to know the details. Nor should you, as an
application developer, need to worry about them; it is up to the library implementors to
make things work. All you need to do is understand the concepts and make your code
use the appropriate functions, such as strcoll() (see Section 13.2.3, "String Collation:
strcoll() and strxfrm()," page 490).
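As a reminder of how little code this takes, here is a sketch that sorts an array of strings in the current locale's collating order by handing strcoll() to qsort(). The names compare_collated() and sort_collated() are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparison function: order two strings by the collation
   rules of the current locale (category LC_COLLATE). */
static int compare_collated(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Sort an array of count strings in place. */
void sort_collated(const char **strings, size_t count)
{
    qsort(strings, count, sizeof strings[0], compare_collated);
}
```

In the "C" locale the result matches what strcmp() would give. After 'setlocale(LC_ALL, "")' in a French locale, the same call would place accented letters next to their unaccented relatives.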
Current GLIBC systems provide excellent locale support, including a
multibyte-aware suite of regular expression matching routines. For example, the POSIX extended
regular expression [[:alpha:]][[:alnum:]]+ matches a letter followed by one or
more letters or digits (an alphabetic character followed by one or more alphanumeric
ones). The definition of which characters match these classes depends on the locale.
For example, this regular expression would match the two characters 'eé', whereas the
traditional Unix, ASCII-oriented regular expression [a-zA-Z][a-zA-Z0-9]+ most
likely would not. The POSIX character classes are listed in Table 13.5.
TABLE 13.5
POSIX regular expression character classes

Class name   Matches
[:alnum:]    Alphanumeric characters.
[:alpha:]    Alphabetic characters.
[:blank:]    Space and TAB characters.
[:cntrl:]    Control characters.
[:digit:]    Numeric characters.
[:graph:]    Characters that are both printable and visible. (A newline is printable but not
             visible, whereas a $ is both.)
[:lower:]    Lowercase alphabetic characters.
[:print:]    Printable characters (not control characters).
[:punct:]    Punctuation characters (not letters, digits, control characters, or space
             characters).
[:space:]    Space characters (such as space, TAB, newline, and so on).
[:upper:]    Uppercase alphabetic characters.
[:xdigit:]   Characters from the set abcdefABCDEF0123456789.
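These classes can be used from C through the POSIX <regex.h> routines. The sketch below compiles the expression discussed above and tests a string against it. The function name is_word_like() is hypothetical, and the ^ and $ anchors are our addition so that the whole string must match rather than any part of it:

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if s consists of a letter followed by one or more letters or
   digits, 0 if it does not, and -1 if the pattern fails to compile.
   The character classes are interpreted according to the current locale. */
int is_word_like(const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "^[[:alpha:]][[:alnum:]]+$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

In the "C" locale, "x1" matches, while "9a" and the single letter "a" do not.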
13.4.4 Conclusion
You may never have to deal with different character sets and encodings. On the
other hand, the world is rapidly becoming a "global village," and software authors and
vendors can't afford to be parochial. It pays, therefore, to be aware of internationalization
issues and character set issues and the way in which they affect your system's behavior.
Already, at least one vendor of GNU/Linux distributions sets the default locale to be
en_US.UTF-8 for systems in the United States.

13.5 Suggested Reading
1. C, A Reference Manual, 5th edition, by Samuel P. Harbison III and Guy L.
Steele, Jr., Prentice-Hall, Upper Saddle River, New Jersey, USA, 2002. ISBN:
0-13-089592-X.
We have mentioned this book before. It provides a concise and comprehensible
description of the evolution and use of the multibyte and wide-character facil-
ities in the C standard library. This is particularly valuable on modern systems
supporting C99 because the library was significantly enhanced for the 1999 C
standard.
2. GNU gettext tools, by Ulrich Drepper, Jim Meyering, François Pinard, and
Bruno Haible. This is the manual for GNU gettext. On a GNU/Linux system,
you can see the local copy with 'info gettext'. Or download and print the
latest version (from ftp://ftp.gnu.org/gnu/gettext/).
13.6 Summary
• Program internationalization and localization fall under the general heading of
native language support. i18n, l10n, and NLS are popular acronyms. The central
concept is the locale, which customizes the character set, date, time, and monetary
and numeric information for the current language and country.
• Locale awareness must be enabled with setlocale(). Different locale categories
provide access to the different kinds of locale information. Locale-unaware
programs act as if they were in the "C" locale, which produces results typical of Unix
systems before NLS: 7-bit ASCII, English names for months and days, and so on.
The "POSIX" locale is equivalent to the "C" one.
• Locale-aware string comparisons are done with strcoll() or with the combination
of strxfrm() and strcmp(). Library facilities provide access to locale information
(localeconv() and nl_langinfo()) as well as locale-specific information
formatting (strfmon(), strftime(), and printf()).
• The flip side of retrieving locale-related information is producing messages in the
local language. The System V catgets() design, while standardized by POSIX,
is difficult to use and not recommended.13 Instead, GNU gettext implements
and extends the original Solaris design.
• With gettext(), the original English message string acts as a key into a binary
translation file from which to retrieve the string's translation. Each application
specifies a unique text domain so that gettext() can find the correct translation
file (known as a "message catalog"). The text domain is set with textdomain().
For testing, or as otherwise needed, the location for message catalogs can be
changed with bindtextdomain().
• Along with gettext(), variants provide access to translations in different text
domains or different locale categories. Additionally, the ngettext() function
and its variants enable correct plural translations without overburdening the
developer. The positional specifier within printf() format specifiers enables translation
of format strings where arguments need to be printed in a different order from
the one in which they're provided.
• In practice, GNU programs use the gettext.h header file and _() and N_()
macros for marking translatable strings in their source files. This practice keeps
program source code readable and maintainable while still providing the benefits
of i18n and l10n.
• GNU gettext provides numerous tools for the creation and management of
translation databases (portable object files) and their binary equivalents (message
object files).
• Finally, it pays to be aware of character set and encoding issues. Software vendors
can no longer afford to assume that their users are willing to work in only one
language.
13 GNU/Linux supports it, but only for compatibility.

13.7 Exercises
1. Does your system support locales? If so, what is the default locale?
2. Look at the locale(1) manpage if you have it. How many locales are there if
you count them with 'locale -a | wc -l'?
3. Experiment with ch13-strings.c, ch13-lconv.c, ch13-strfmon.c,
ch13-quoteflag.c, and ch13-times.c in different locales. What is the most
"unusual" locale you can find, and why?
4. Take one of your programs. Internationalize it to use GNU gettext. Try to
find someone who speaks another language to translate the messages for you.
Compile the translation, and test it by using bindtextdomain(). What was
your translator's reaction upon seeing the translations in use?
In this chapter
• 14.1 Allocating Aligned Memory: posix_memalign() and memalign()    page 530
• 14.2 Locking Files    page 531
• 14.3 More Precise Times    page 543
• 14.4 Advanced Searching with Binary Trees    page 551
This chapter describes several extended APIs. The APIs here are similar in nature
to those described earlier in the book or provide additional facilities. Some of
them could not be discussed easily until after the prerequisite topics were covered.
The presentation order here parallels the order of the chapters in the first half of
the book. The topics are not otherwise related to each other. We cover the following
topics: dynamically allocating aligned memory; file locking; a number of calls that
work with subsecond time values; and a more advanced suite of functions for storing
and retrieving arbitrary data values. Unless stated otherwise, all the APIs in this
chapter are included in the POSIX standard.
14.1 Allocating Aligned Memory: posix_memalign() and memalign()

For most tasks, the standard allocation routines, malloc(), realloc(), and so on,
are fine. Occasionally, though, you may need memory that is aligned a certain way. In
other words, the address of the first allocated byte is a multiple of some number. (For
example, on some systems, memory copies are significantly faster into and out of
word-aligned buffers.) Two functions offer this service:
#include <stdlib.h>

int posix_memalign(void **memptr, size_t alignment, size_t size);    POSIX ADV
void *memalign(size_t boundary, size_t size);                        Common
posix_memalign() is a newer function; it's part of yet another optional extension,
the "Advisory Information" (ADV) extension. The function works differently from
most other Linux allocation APIs. It does not return -1 when there's a problem.
Rather, the return value is 0 on success or an errno value on failure. The arguments
are as follows:
void **memptr
A pointer to a void * variable. The pointed-to variable will have the address of
the allocated storage placed into it. The allocated storage can be released with
free().
size_t alignment
The required alignment. It must be a multiple of sizeof(void *) and a power
of 2.
size_t size
The number of bytes to allocate.
memalign() is a nonstandard but widely available function that works similarly.
The return value is NULL on failure or the requested storage on success, with boundary
(a power of 2) indicating the alignment and size indicating the requested amount of
memory.
Traditionally, storage allocated with memalign() could not be released with free(),
since memalign() would allocate storage with malloc() and return a pointer to a
suitably-aligned byte somewhere within that storage. The GLIBC version does not have
this problem. Of the two, you should use posix_memalign() if you have it.
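Because posix_memalign() reports failure through its return value instead of through errno, a small wrapper can give it the familiar malloc()-style interface. The following is a sketch; alloc_aligned() is a hypothetical name, and copying the error code into errno is our own convention, not something the function does for you:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L    /* ask for the posix_memalign() declaration */
#endif
#include <errno.h>
#include <stdlib.h>

/* Allocate size bytes whose address is a multiple of alignment.
   Returns NULL on failure, with errno set to the reported error.
   Release the storage with free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    void *p = NULL;
    int err = posix_memalign(&p, alignment, size);

    if (err != 0) {
        errno = err;    /* for example, EINVAL or ENOMEM */
        return NULL;
    }
    return p;
}
```

A request such as 'alloc_aligned(64, 1000)' yields a pointer whose address is a multiple of 64. A request with an alignment of 3 fails with EINVAL, since 3 is not a power of 2.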
14.2 Locking Files

Modern Unix systems, including GNU/Linux, allow you to lock part or all of a file
for reading and writing. Like many parts of the Unix API that developed after V7, there
are multiple, conflicting ways to do file locking. This section covers the possibilities.
14.2.1 File Locking Concepts

Just as the lock in your front door prevents unwanted entry into your home, a lock
on a file prevents access to data within the file. File locking was added to Unix after the
development of V7 (from which all modern Unix systems are descended) and thus, for
a while, multiple, conflicting file-locking mechanisms were available and in use on
different Unix systems. Both BSD Unix and System V had their own incompatible locking
calls. Eventually, POSIX formalized the System V way of doing file locks. Fortunately,
the names of the calls were different between System V and BSD, so GNU/Linux, in
an effort to please everyone, supports both kinds of locks.
Table 14.1 summarizes the different kinds of locks.
There are multiple aspects to locking, as follows:
Record locking
A record lock is a lock on a portion of the file. Since Unix files are just byte streams,
it would be more correct to use the term range lock since the lock is on a range of
bytes. Nevertheless, the term "record lock" is in common use.
532 Chapter 14 • Extended Interfaces
TABLE 14.1
File locking functions

Source  Function  Record  Whole file  R/W  Advisory  Mandatory
BSD     flock()           ✓           ✓    ✓
POSIX   fcntl()   ✓       ✓           ✓    ✓         ✓
POSIX   lockf()   ✓       ✓           ✓    ✓         ✓
Whole file locking

A whole file lock, as the name implies, locks the entire file, even if its size changes
while the lock is held. The BSD interface provides only whole-file locking. To
lock the whole file using the POSIX interface, specify a length of zero. This is
treated specially to mean "the entire file."
Read locking
A read lock prevents writing on the area being read. There may be multiple read
locks on a file, and even on the same region of a file, without them interfering
with each other, since data are only being accessed and not changed.
Write locking
A write lock provides exclusive access to the area being written. If that area is
blocked with a read lock, the attempt to acquire a write lock either blocks or fails,
depending on the type of lock request. Once a write lock has been acquired, an
attempt to acquire a read lock fails.
Advisory locking
An advisory lock closely matches your front-door lock. It's been said that "locks
keep honest people honest," meaning that if someone really wishes to break into
your house, he will probably find a way to do so, despite the lock in your front
door. So too with an advisory lock; it only works when everyone attempting to
access the locked file first attempts to acquire the lock. However, it's possible for
a program to completely ignore any advisory locks and do what it pleases with the
file (as long as the file permissions allow it, of course).
Mandatory locking
A mandatory lock is a stronger form of lock: When a mandatory lock is in place,
no other process can access the locked file. Any process that attempts to ignore
the lock either blocks until the lock becomes available or will have its operation
fail. (Under GNU/Linux, at least, this includes root!)
Advisory locking is adequate for cooperating programs that share a private file, when
no other application is expected to use the file. Mandatory locking is advisable in
situations in which avoiding conflicting file use is critical, such as in commercial
database systems.
POSIX standardizes only advisory locking. Mandatory locking is available on
GNU/Linux, as well as on a number of commercial Unix systems, but the details vary.
We cover the GNU/Linux details later in this section.
14.2.2 POSIX Locking: fcntl() and lockf()

The fcntl() (file control) system call is used for file locking. (Other uses for fcntl()
were described in Section 9.4.3, "Managing File Attributes: fcntl()," page 328.) It
is declared as follows:
#include <unistd.h>                                        POSIX
#include <fcntl.h>

int fcntl(int fd, int cmd);                                Not relevant for file locking
int fcntl(int fd, int cmd, long arg);                      Not relevant for file locking
int fcntl(int fd, int cmd, struct flock *lock);

fd    The file descriptor for the open file.
cmd   One of the symbolic constants defined in <fcntl.h>. These are described
      in more detail below.
lock  A pointer to a struct flock describing the desired lock.
14.2.2.1 Describing a Lock

Before looking at how to get a lock, let's examine how to describe a lock to the
operating system. You do so with the struct flock structure, which describes the byte
range to be locked and the kind of lock being requested. The POSIX standard states
that a struct flock contains "at least" certain members. This allows implementations
to provide additional structure members if so desired. From the fcntl(3) manpage,
slightly edited:
struct flock {
    short l_type;     /* Type of lock: F_RDLCK, F_WRLCK, F_UNLCK */
    short l_whence;   /* How to interpret l_start: SEEK_SET, SEEK_CUR, SEEK_END */
    off_t l_start;    /* Starting offset for lock */
    off_t l_len;      /* Number of bytes to lock; 0 means from start to end-of-file */
    pid_t l_pid;      /* PID of process blocking our lock (F_GETLK only) */
};
The l_start field is the starting byte offset for the lock. l_len is the length of the
byte range, that is, the total number of bytes to lock. l_whence specifies the point in
the file that l_start is relative to; the values are the same as for the whence argument
to lseek() (see Section 4.5, "Random Access: Moving Around within a File,"
page 102), hence the name for the field. The structure is thus self-contained: The
l_start offset and l_whence value are not related to the current file offset for reading
and writing. Some example code might look like this:
struct employee { /* whatever */ };              /* Describe an employee */
struct flock lock;                               /* Lock structure */

/* Lock sixth struct employee */
lock.l_whence = SEEK_SET;                        /* Absolute position */
lock.l_start = 5 * sizeof(struct employee);      /* Start of 6th structure */
lock.l_len = sizeof(struct employee);            /* Lock one record */
Using SEEK_CUR or SEEK_END, you can lock ranges relative to the current position
in the file, or relative to the end of the file, respectively. For these two cases, l_start
may be negative, as long as the absolute starting position is not less than zero. Thus, to
lock the last record in a file (the cast keeps the unsigned result of sizeof from
turning the negative offset into a huge positive one):
/* Lock last struct employee */
lock.l_whence = SEEK_END;                        /* Relative to EOF */
lock.l_start = -1 * (off_t) sizeof(struct employee);  /* Start of last structure */
lock.l_len = sizeof(struct employee);            /* Lock one record */
Setting l_len to 0 is a special case. It means lock the file from the starting position
indicated by l_start and l_whence through the end of the file. This includes any
positions past the end of the file, as well. (In other words, if the file grows while the
lock is held, the lock is extended so that it continues to cover the entire file.) Thus,
locking the entire file is a degenerate case of locking a single record:
lock.l_whence = SEEK_SET;                        /* Absolute position */
lock.l_start = 0;                                /* Start of file */
lock.l_len = 0;                                  /* Through end of file */
The fcntl(3) manpage has this note:
POSIX 1003.1-2001 allows l_len to be negative. (And if it is, the interval
described by the lock covers bytes l_start + l_len up to and including
l_start - 1.) However, for current kernels the Linux system call returns
EINVAL in this situation.
(We note that the manpage refers to the 2.4.x series of kernels; it may pay to check the
current manpage if your system is newer.)
Now that we know how to describe where in the file to lock, we can describe the type
of the lock with l_type. The possible values are as follows:
F_RDLCK  A read lock. The file must have been opened for reading to apply a read
         lock.
F_WRLCK  A write lock. The file must have been opened for writing to apply a write
         lock.
F_UNLCK  Release a previously held lock.
Thus, the complete specification of a lock involves setting a total of four fields in the
struct flock structure: three to specify the byte range and the fourth to describe the
desired lock type.
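Since the same four assignments recur for every record, it can be convenient to gather them in one place. The following is only a sketch; record_lock() is a name we made up, and it assumes a file of fixed-size records numbered from zero:

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>     /* SEEK_SET */

/* record_lock --- hypothetical helper: describe a lock on record number
   `recno` (counting from 0) in a file of `recsize`-byte records */

struct flock record_lock(short type, off_t recno, off_t recsize)
{
    struct flock lock;

    memset(&lock, 0, sizeof(lock));   /* clear any extra members */
    lock.l_type = type;               /* F_RDLCK, F_WRLCK, or F_UNLCK */
    lock.l_whence = SEEK_SET;         /* absolute position */
    lock.l_start = recno * recsize;   /* first byte of the record */
    lock.l_len = recsize;             /* exactly one record */

    return lock;
}
```

The structure only describes a lock. Nothing is locked until the description is handed to fcntl().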
The F_UNLCK value for l_type releases locks. In general, it's easiest to release exactly
the same locks that you acquired earlier, but it is possible to "split" a lock by releasing
a range of bytes in the middle of a larger, previously locked range. For example:
struct employee { /* whatever */ };              /* Describe an employee */
struct flock lock;                               /* Lock structure */

/* Lock struct employees 6-8 */
lock.l_whence = SEEK_SET;                        /* Absolute position */
lock.l_start = 5 * sizeof(struct employee);      /* Start of 6th structure */
lock.l_len = sizeof(struct employee) * 3;        /* Lock three records */
... obtain lock (see next section) ...

/* Release record 7: this splits the previous lock into two: */
lock.l_whence = SEEK_SET;                        /* Absolute position */
lock.l_start = 6 * sizeof(struct employee);      /* Start of 7th structure */
lock.l_len = sizeof(struct employee) * 1;        /* Unlock one record */
... release lock (see next section) ...
14.2.2.2 Obtaining and Releasing Locks

Once the struct flock has been filled in, the next step is to request the lock. This
step is done with an appropriate value for the cmd argument to fcntl():
F_GETLK   Inquire if it's possible to obtain a lock.
F_SETLK   Obtain or release a lock.
F_SETLKW  Obtain a lock, waiting until it's available.
The F_GETLK command is the "Mother may I?" command. It inquires whether the
lock described by the struct flock is available. If it is, the lock is not placed; instead,
the operating system changes the l_type field to F_UNLCK. The other fields are
left unchanged.
If the lock is not available, then the operating system fills in the various fields with
information describing an already held lock that blocks the requested lock from being
obtained. In this case, l_pid contains the PID of the process holding the described
lock.¹ There's not a lot to be done if the lock is being held, other than to wait awhile
and try again to obtain the lock, or print an error message and give up.
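A minimal sketch of such an inquiry follows. The function name lock_holder() and its return convention are our own invention; only the fcntl() call and the struct flock fields come from the interface described above:

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* lock_holder --- hypothetical helper: ask whether a whole-file write lock
   could be placed on fd.  Returns 0 if the lock is available, the PID of a
   process holding a conflicting lock if not, or -1 if fcntl() fails. */

pid_t lock_holder(int fd)
{
    struct flock lock;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;                 /* through end of file */

    if (fcntl(fd, F_GETLK, &lock) < 0)
        return -1;

    if (lock.l_type == F_UNLCK)     /* nothing in the way */
        return 0;

    return lock.l_pid;              /* someone else holds a lock */
}
```

Keep in mind that the answer can be out of date by the time you act on it: another process may take the lock between the inquiry and your own F_SETLK request.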
The F_SETLK command attempts to acquire the specified lock. If fcntl() returns
0, then the lock has been successfully acquired. If it returns -1, then another process
holds a conflicting lock. In this case, errno is set to either EAGAIN (try again later) or
EACCES (access denied). Two values are possible, to cater to historical systems.
The F_SETLKW command also attempts to acquire the specified lock. It differs from
F_SETLK in that it will wait until the lock becomes available.
Once you've chosen the appropriate value for the cmd argument, pass it as the second
argument to fcntl(), with a pointer to a filled-in struct flock as the third argument:
struct flock lock;
int fd;
... open file, fill in struct flock ...
if (fcntl(fd, F_SETLK, &lock) < 0) {
    /* Could not acquire lock, attempt to recover */
}
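The recovery step usually starts by telling "somebody else has the lock" apart from a real failure. Here is one way that test might look; try_write_lock() is a hypothetical helper of ours, and its three-way return value is merely one reasonable convention:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* try_write_lock --- hypothetical helper: attempt a whole-file write lock
   without blocking.  Returns 1 if we got the lock, 0 if another process
   holds a conflicting lock, and -1 for any other error. */

int try_write_lock(int fd)
{
    struct flock lock;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 0;                 /* whole file */

    if (fcntl(fd, F_SETLK, &lock) == 0)
        return 1;

    if (errno == EAGAIN || errno == EACCES)   /* check both values */
        return 0;

    return -1;                      /* e.g., EBADF for a bad descriptor */
}
```

Replacing F_SETLK with F_SETLKW would turn this into a call that waits for the lock, in which case the test for EAGAIN and EACCES is no longer needed.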
¹ The GNU/Linux fcntl(3) manpage points out that this may not be enough information; the process could be
residing on another machine! There are other issues with locks held across a network; in general, using locks on
filesystems mounted from remote computers is not a good idea.
The lockf() function² provides an alternative way to acquire a lock at the current
file position:
#include <unistd.h>                                XSI

int lockf(int fd, int cmd, off_t len);
The file descriptor, fd, must have been opened for writing. len specifies the number
of bytes to lock: from the current position (call it pos) through pos + len - 1 if len is
positive, or from pos + len through pos - 1 if len is negative. The commands are
as follows:
F_LOCK   Sets an exclusive lock on the range. The call blocks until the lock becomes
         available.
F_TLOCK  Tries the lock. This is like F_LOCK, but if the lock isn't available, F_TLOCK
         returns an error.
F_ULOCK  Unlocks the indicated section. This can cause lock splitting, as described
         earlier.
F_TEST   Sees if the lock is available. If it is, returns 0 without acquiring the lock.
         Otherwise, it returns -1 and sets errno to EAGAIN (EACCES on some systems).
The return value is 0 on success and -1 on error, with errno set appropriately.
Possible error returns include:
EAGAIN   The file is locked, for F_TLOCK or F_TEST.
EDEADLK  For F_LOCK, this operation would cause a deadlock.³
ENOLCK   The operating system is unable to allocate a lock.
The choice between F_LOCK and F_TLOCK matters: If you know that there can
never be potential for deadlock, then use F_LOCK. Otherwise, it pays to be safe and use
F_TLOCK. If the lock is available, you'll get it, but if it's not, you have a chance to recover
instead of blocking, possibly forever, waiting for the lock.
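As a sketch of the cautious approach, the following pair of functions locks and later unlocks the first len bytes of a file with F_TLOCK. The names lock_prefix() and unlock_prefix() are hypothetical, as is the decision to have the first one open the file itself:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* lock_prefix --- hypothetical helper: open `path` for writing and try to
   lock its first `len` bytes without blocking.  Returns the descriptor
   with the lock held, or -1 (errno is EAGAIN or EACCES if another process
   holds the lock). */

int lock_prefix(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;

    /* lockf() works from the current offset, which is 0 right after open() */
    if (lockf(fd, F_TLOCK, len) < 0) {
        int saved = errno;          /* close() may change errno */

        (void) close(fd);
        errno = saved;
        return -1;
    }

    return fd;
}

/* unlock_prefix --- release the same range and close the file */

int unlock_prefix(int fd, off_t len)
{
    int ret;

    if (lseek(fd, (off_t) 0, SEEK_SET) < 0)   /* back to start of range */
        return -1;

    ret = lockf(fd, F_ULOCK, len);
    (void) close(fd);
    return ret;
}
```

Because lockf() always works relative to the current offset, the unlock function must seek back to where the lock began before releasing it.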
When you're done with a lock, you should release it. With fcntl(), take the original
struct flock used to acquire the lock, and change the l_type field to F_UNLCK. Then
use F_SETLK as the cmd argument:
lock.l_whence = ... ;      /* As before */
lock.l_start = ... ;       /* As before */
lock.l_len = ... ;         /* As before */
lock.l_type = F_UNLCK;     /* Unlock */
if (fcntl(fd, F_SETLK, &lock) < 0) {
    /* handle error */
}
/* Lock has been released */
² On GNU/Linux, lockf() is implemented as a "wrapper" around fcntl().
³ A deadlock is a situation in which two processes would both block, each waiting on the other to release a resource.
Code using lockf() is a bit simpler. For brevity, we've omitted the error checking:
off_t curpos, len;

curpos = lseek(fd, (off_t) 0, SEEK_CUR);    /* Retrieve current position */
len = ... ;                                 /* Set correct number of bytes to lock */
lockf(fd, F_LOCK, len);                     /* Acquire lock */
... use locked area here ...
lseek(fd, curpos, SEEK_SET);                /* Return to position of lock */
lockf(fd, F_ULOCK, len);                    /* Unlock file */
If you don't explicitly release a lock, the operating system will do it for you in two
cases. The first is when the process exits (either by main() returning or by the exit()
function, which was covered in Section 9.1.5.1, "Defining Process Exit Status,"
page 300). The other case is when you call close() on the file descriptor: more on
this in the next section.
14.2.2.3 Observing Locking Caveats

There are several caveats to be aware of when doing file locking:
• As described previously, advisory locking is just that. An uncooperative process
can do anything it wants behind the back (so to speak) of processes that are doing
locking.
• These calls should not be used in conjunction with the <stdio.h> library. This
library does its own buffering, and while you can retrieve the underlying file
descriptor with fileno(), the actual position in the file may not be where you think
it is. In general, the Standard I/O library doesn't understand file locks.
• Bear in mind that locks are not inherited by child processes after a fork but that
they do remain in place after an exec.
• A close() of any file descriptor open on the file removes all of the process's locks
on a file, even if other file descriptors remain open on it.
That close() works this way is unfortunate, but because this is how fcntl()
locking was originally implemented, POSIX standardizes it. Making this behavior the
standard avoids breaking existing Unix code.
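The close() caveat is easy to trip over, so here is a small experiment that makes it visible. It is a sketch only: all three function names are ours, and a forked child serves as the "other process" that probes whether the parent still holds its lock.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* write_lock --- request a whole-file write lock without blocking */

static int write_lock(int fd)
{
    struct flock lock;

    memset(&lock, 0, sizeof(lock));
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;       /* l_start and l_len stay 0: whole file */

    return fcntl(fd, F_SETLK, &lock);
}

/* child_can_lock --- fork a child that tries to lock path; returns 1 if
   the child got the lock, 0 if it was refused, -1 on error */

static int child_can_lock(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child: locks are not inherited */
        int fd = open(path, O_RDWR);

        _exit(fd >= 0 && write_lock(fd) == 0 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* lock_lost_on_close --- returns 1 if closing a second descriptor on the
   file dropped a lock obtained through the first, 0 if not, -1 on error */

int lock_lost_on_close(const char *path)
{
    int fd1, fd2, before, after;

    fd1 = open(path, O_RDWR | O_CREAT, 0600);
    if (fd1 < 0)
        return -1;
    if (write_lock(fd1) < 0) {
        (void) close(fd1);
        return -1;
    }

    before = child_can_lock(path);  /* we hold the lock: expect 0 */

    fd2 = open(path, O_RDONLY);     /* unrelated descriptor, same file */
    if (fd2 >= 0)
        (void) close(fd2);          /* this releases the lock taken via fd1 */

    after = child_can_lock(path);   /* expect 1: our lock is gone */

    (void) close(fd1);
    return before == 0 && after == 1;
}
```

The practical lesson is to be careful with library routines that open and close the same file behind your back while you hold a lock on it.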
14.2.3 BSD Locking: flock()

4.2 BSD Unix introduced its own file locking mechanism, flock().⁴ It is declared
as follows:
#include <sys/file.h>                        Common

int flock(int fd, int operation);
The file descriptor fd represents the open file. These are the operations:
LOCK_SH  Creates a shared lock. There can be multiple shared locks.
LOCK_EX  Creates an exclusive lock. There can be only one such lock.
LOCK_UN  Removes the previous lock.
LOCK_NB  When bitwise-OR'd with LOCK_SH or LOCK_EX avoids blocking if the
         lock isn't available.
By default, the lock requests will block (not return) if a competing lock exists. Once
the competing lock is removed and the requested lock is obtained, the call returns.
(This implies that, by default, there is potential for deadlock.) To attempt to obtain a
lock without blocking, perform a bitwise OR of LOCK_NB with one of the other values
for operation.
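A nonblocking request might be wrapped up as follows. This is a sketch, with try_flock() being a name of our own choosing; we assume the GNU/Linux behavior in which a refused LOCK_NB request fails with errno set to EWOULDBLOCK:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>      /* open(), used by callers */
#include <sys/file.h>
#include <unistd.h>

/* try_flock --- hypothetical helper: request a shared or exclusive lock
   without blocking.  Returns 1 if the lock was obtained, 0 if a competing
   lock exists, and -1 for any other error. */

int try_flock(int fd, int exclusive)
{
    int op = (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB;

    if (flock(fd, op) == 0)
        return 1;

    if (errno == EWOULDBLOCK)       /* someone else holds a competing lock */
        return 0;

    return -1;
}
```

A successful try_flock(fd, 1) is later undone with flock(fd, LOCK_UN) or by closing the descriptor.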
The salient points about flock() are as follows:
• flock() locking is also advisory locking; a program that does no locking can
come in and blast, with no errors, a file locked with flock().
• The whole file is locked. There is no mechanism for locking or unlocking just a
part of the file.
• How the file was opened has no effect on the type of lock that may be placed.
(Compare this to fcntl(), whereby the file must have been opened for reading
for a read lock, or opened for writing for a write lock.)
• Multiple file descriptors open on the same file share the lock. Any one of them
can be used to remove the lock. Unlike fcntl(), when there is no explicit unlock,
the lock is not removed until all open file descriptors for the file have been closed.
• Only one flock() lock can be held by a process on a file; calling flock()
successively with two different lock types changes the lock to the new type.
• On GNU/Linux systems, flock() locks are completely independent of fcntl()
locks. Many commercial Unix systems implement flock() as a "wrapper" on
top of fcntl(), but the semantics are not the same.
⁴ It is fortunate that flock() is a different name from lockf(), since the semantics are different. It is also terribly
confusing. Keep your manual handy.
We don't recommend using flock() in new programs, because the semantics are
not as flexible and because the call is not standardized by POSIX. Support for it in
GNU/Linux is primarily for compatibility with software written for older BSD
Unix systems.
NOTE The GNU/Linux flock(2) manpage warns that flock() locks do not
work for remotely mounted files. fcntl() locks do, provided you have a recent
enough version of Linux and an NFS server that supports locking.
14.2.4 Mandatory Locking

Most commercial Unix systems support mandatory file locking, in addition to
advisory file locking. Mandatory locking works only with fcntl() locks. Mandatory
locking for a file is controlled by the file's permission settings, in particular, by addition
of the setgid bit to a file with the chmod command:
$ echo hello, world > myfile                 Create file
$ ls -l myfile                               Show permissions
-rw-r--r--    1 arnold   devel          13 Apr  3 17:11 myfile
$ chmod g+s myfile                           Add setgid bit
$ ls -l myfile                               Show new permissions
-rw-r-Sr--    1 arnold   devel          13 Apr  3 17:11 myfile
The group execute bit should be left turned off. The S shows that the setgid bit is
turned on but that execute permission isn't; an s would be used if both were on.
The combination of setgid on and group execute off is generally meaningless. For
this reason, it was chosen by the System V developers to mean "enforce mandatory
locking." And indeed, adding this bit is enough to cause a commercial Unix system,
such as Solaris, to enforce file locks.
On GNU/Linux systems, the story is a little different. The setgid bit must be applied
to a file for mandatory locking, but that alone is not enough. The filesystem containing
the file must also be mounted with the mand option to the mount command.
We have already covered filesystems, disk partitions, mounting, and the mount
command, mostly in Section 8.1, "Mounting and Unmounting Filesystems," page 228.
We can demonstrate mandatory locking with a small program and a test filesystem on
a floppy disk. First, here's the program:
 1  /* ch14-lockall.c --- Demonstrate mandatory locking. */
 2  #include <stdlib.h>     /* for exit() */
 3  #include <stdio.h>      /* for fprintf(), stderr, BUFSIZ */
 4  #include <errno.h>      /* declare errno */
 5  #include <fcntl.h>      /* for flags for open() */
 6  #include <string.h>     /* declare strerror() */
 7  #include <unistd.h>     /* for ssize_t */
 8  #include <sys/types.h>
 9  #include <sys/stat.h>   /* for mode_t */
10
11  int
12  main(int argc, char **argv)
13  {
14      int fd;
15      int i, j;
16      mode_t rw_mode;
17      static char message[] = "hello, world\n";
18      struct flock lock;
19
20      if (argc != 2) {
21          fprintf(stderr, "usage: %s file\n", argv[0]);
22          exit(1);
23      }
24
25      rw_mode = S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH;     /* 0644 */
26      fd = open(argv[1], O_RDWR|O_TRUNC|O_CREAT|O_EXCL, rw_mode);
27      if (fd < 0) {
28          fprintf(stderr, "%s: %s: cannot open for read/write: %s\n",
29                  argv[0], argv[1], strerror(errno));
30          (void) close(fd);
31          return 1;
32      }
33
34      if (write(fd, message, strlen(message)) != strlen(message)) {
35          fprintf(stderr, "%s: %s: cannot write: %s\n",
36                  argv[0], argv[1], strerror(errno));
37          (void) close(fd);
38          return 1;
39      }
40
41      rw_mode |= S_ISGID;     /* add mandatory lock bit */
42
43      if (fchmod(fd, rw_mode) < 0) {
44          fprintf(stderr, "%s: %s: cannot change mode to %o: %s\n",
45                  argv[0], argv[1], rw_mode, strerror(errno));
46          (void) close(fd);
47          return 1;
48      }
49
50      /* lock the file */
51      memset(& lock, '\0', sizeof(lock));
52      lock.l_whence = SEEK_SET;
53      lock.l_start = 0;
54      lock.l_len = 0;             /* whole-file lock */
55      lock.l_type = F_WRLCK;      /* write lock */
56
57      if (fcntl(fd, F_SETLK, & lock) < 0) {
58          fprintf(stderr, "%s: %s: cannot lock the file: %s\n",
59                  argv[0], argv[1], strerror(errno));
60          (void) close(fd);
61          return 1;
62      }
63
64      pause();
65
66      (void) close(fd);
67
68      return 0;
69  }
The program sets the permissions and creates the file named on the command line
(lines 25 and 26). It then writes some data into the file (line 34). Line 41 adds the setgid
bit to the permissions, and line 43 changes them. (The fchmod() system call was
discussed in Section 5.5.2, "Changing Permissions: chmod() and fchmod()," page 156.)
Lines 51-55 set up the struct flock to lock the whole file, and then the lock is
actually requested on line 57. Once we have the lock, the program goes to sleep, using
the pause() system call (see Section 10.7, "Signals for Interprocess Communication,"
page 379). When done, the program closes the file descriptor and returns. Here is an
annotated transcript demonstrating the use of mandatory file locking:
$ fdformat /dev/fd0                                   Format floppy disk
Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
Formatting ... done
Verifying ... done
$ /sbin/mke2fs /dev/fd0                               Make a Linux filesystem
... lots of output, omitted ...
$ su                                                  Become root, to use mount
Password:                                             Password does not echo
# mount -t ext2 -o mand /dev/fd0 /mnt/floppy          Mount floppy, with locking
# suspend                                             Suspend root shell
[1]+  Stopped                 su
$ ch14-lockall /mnt/floppy/x &                        Background program
[2] 23311                                             holds lock
$ ls -l /mnt/floppy/x                                 Look at file
-rw-r-Sr--    1 arnold   devel          13 Apr  6 14:23 /mnt/floppy/x
$ echo something > /mnt/floppy/x                      Try to modify file
bash2: /mnt/floppy/x: Resource temporarily unavailable        Error returned
$ kill %2                                             Kill program holding lock
$                                                     Press ENTER
[2]-  Terminated              ch14-lockall /mnt/floppy/x      Program died
$ echo something > /mnt/floppy/x                      Retry modification, works
$ fg                                                  Return to root shell
su
# umount /mnt/floppy                                  Unmount floppy
# exit                                                Done with root shell
$
As long as ch14-lockall is running, it holds the lock. Since it's a mandatory lock,
the shell's I/O redirection fails. Once ch14-lockall exits, the lock is released, and the
I/O redirection succeeds. As mentioned earlier, under GNU/Linux, not even root can
override a mandatory file lock.
As an aside, floppy disks make excellent test beds for learning how to use tools that
manipulate filesystems. If you do something that destroys the data on a floppy, it's not
likely to be catastrophic, whereas experimenting with live partitions on regular hard
disks is much riskier.
14.3 More Precise Times

The time() system call and time_t type represent times in seconds-since-the-Epoch
format. A resolution of one second really isn't enough; today's machines are fast, and
it's often useful to distinguish subsecond time intervals. Starting with 4.2 BSD,
Berkeley Unix introduced a series of system calls that make it possible to retrieve and
use subsecond times. These calls are available on all modern Unix systems, including
GNU/Linux.
14.3.1 Microsecond Times: gettimeofday()

The first task is to retrieve the time of day:
#include <sys/time.h>                                  XSI

int gettimeofday(struct timeval *tv, void *tz);        POSIX definition, not GLIBC's
gettimeofday() gets the time of day.⁵ The return value is 0 on success, -1 for an
error. The arguments are as follows:
struct timeval *tv
This argument is a pointer to a struct timeval, described shortly, into which
the system places the current time.
void *tz
This argument is no longer used; thus, it's of type void * and you should always
pass NULL for it. (The manpage describes what it used to be and then proceeds to
state that it's obsolete. Read it if you're interested in the details.)
The time is represented by a struct timeval:
struct timeval {
    long tv_sec;     /* seconds */
    long tv_usec;    /* microseconds */
};
The tv_sec value represents seconds since the Epoch; tv_usec is the number of
microseconds within the second.
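A common use of these values is measuring how long something takes. The sketch below shows the arithmetic; elapsed_usec() and time_call() are names we invented for the purpose:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/time.h>

/* elapsed_usec --- microseconds from *start to *end (negative if *end
   is the earlier of the two) */

long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000000LL
            + (end->tv_usec - start->tv_usec);
}

/* time_call --- hypothetical helper: run func() and return how long it
   took in microseconds of wall-clock time, or -1 on error */

long long time_call(void (*func)(void))
{
    struct timeval before, after;

    if (gettimeofday(&before, NULL) < 0)
        return -1;

    func();

    if (gettimeofday(&after, NULL) < 0)
        return -1;

    return elapsed_usec(&before, &after);
}
```

Working in a 64-bit integer count of microseconds sidesteps the borrow that is otherwise needed when the ending tv_usec is smaller than the starting one. Remember that the time of day can be reset while a program runs, so a measured interval is not guaranteed to be positive.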
The GNU/Linux gettimeofday(2) manpage also documents the following macros:
#define timerisset(tvp)  ((tvp)->tv_sec || (tvp)->tv_usec)

#define timercmp(tvp, uvp, cmp) \
        ((tvp)->tv_sec cmp (uvp)->tv_sec || \
         (tvp)->tv_sec == (uvp)->tv_sec && \
         (tvp)->tv_usec cmp (uvp)->tv_usec)

#define timerclear(tvp)  ((tvp)->tv_sec = (tvp)->tv_usec = 0)
These macros work on struct timeval * values; that is, pointers to structures,
and their use should be obvious both from their names and the code. The timercmp()
macro is particularly interesting: The third argument is a comparison operator to indicate
the kind of comparison. For example, consider the determination of whether one
struct timeval is less than another:
struct timeval t1, t2;

if (timercmp(& t1, & t2, <))
    /* t1 is less than t2 */
The macro expands to
((& t1)->tv_sec < (& t2)->tv_sec ||
 (& t1)->tv_sec == (& t2)->tv_sec &&
 (& t1)->tv_usec < (& t2)->tv_usec)
This says "if t1.tv_sec is less than t2.tv_sec, OR if they are equal and t1.tv_usec
is less than t2.tv_usec, then ...."
⁵ The gettimeofday(2) manpage documents a corresponding settimeofday() function, for use by the superuser
(root) to set the time of day for the whole system.
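As a small illustration of passing an operator as a macro argument, here is a function that picks the later of two times. The name later_of() is ours; the macro is the one from <sys/time.h>, which on GLIBC is made visible by the feature-test macro on the first line:

```c
#define _DEFAULT_SOURCE         /* glibc: expose timercmp() and friends */
#include <sys/time.h>

/* later_of --- hypothetical helper: return whichever of two times is
   later (the second one if they are equal) */

const struct timeval *later_of(const struct timeval *a,
                               const struct timeval *b)
{
    return timercmp(a, b, >) ? a : b;
}
```

The arguments are already pointers here, so they are passed to timercmp() without the & operator used in the earlier fragment.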
14.3.2 Microsecond File Times: utimes()

Section 5.5.3, "Changing Timestamps: utime()," page 157, describes the utime()
system call for setting the access and modification times of a given file. Some filesystems
store these times with microsecond (or greater) resolution. Such systems provide the
utimes() system call (note the trailing s in the name) to set the access and modification
times with microsecond values:
#include <sys/time.h>                                  XSI

int utimes(char *filename, struct timeval tvp[2]);
The tvp argument should point to an array of two struct timeval structures; the
values are used for the access and modification times, respectively. If tvp is NULL, then
the system uses the current time of day.
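The following sketch fills in the two-element array and applies it. The function names stamp_file() and file_mtime() are hypothetical; the second exists only so that the result can be read back with stat():

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* stamp_file --- hypothetical helper: create path if necessary, then set
   both its access and modification times to `when` seconds plus `usec`
   microseconds.  Returns 0 on success, -1 on error. */

int stamp_file(const char *path, time_t when, long usec)
{
    struct timeval tv[2];
    int fd = open(path, O_WRONLY | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    (void) close(fd);

    tv[0].tv_sec = when;        /* element 0: access time */
    tv[0].tv_usec = usec;
    tv[1] = tv[0];              /* element 1: modification time */

    return utimes(path, tv);
}

/* file_mtime --- return the modification time in seconds, or -1 */

time_t file_mtime(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) < 0)
        return (time_t) -1;

    return sb.st_mtime;
}
```

Whether the microseconds part is actually preserved depends on the filesystem holding the file.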
POSIX marks this as a "legacy" function, meaning that it's standardized only to
support old code and should not be used for new applications. The primary reason
seems to be that there is no defined interface for retrieving file access and modification
times that includes the microseconds value; the struct stat contains only time_t
values, not struct timeval values.
However, as mentioned in Section 5.4.3, "Linux Only: Specifying Higher-Precision
File Times," page 143, Linux 2.6 (and later) does provide access to nanosecond
resolution timestamps with the stat() call. Some other systems (such as Solaris) do
as well.⁶ Thus, utimes() is more useful than it first appears, and despite its official
"legacy" status, there's no reason not to use it in your programs.
14.3.3 Interval Timers: setitimer() and getitimer()

The alarm() function (see Section 10.8.1, "Alarm Clocks: sleep(), alarm(), and
SIGALRM," page 382) arranges to send SIGALRM after the given number of seconds has
passed. Its smallest resolution is one second. Here too, 4.2 BSD introduced a function
and three different timers that accept subsecond times.
An interval timer is like a repeating alarm clock. You set the first time it should "go
off" as well as how frequently after that the timer should repeat. Both of these values
use struct timeval objects; that is, they (potentially) have microsecond resolution.
The timer "goes off" by delivering a signal; thus, you have to install a signal handler
for the timer, preferably before setting the timer itself.
Three different timers exist, as described in Table 14.2.
TABLE 14.2
Interval timers

Timer            Signal      Function
ITIMER_REAL      SIGALRM     Runs in real time.
ITIMER_VIRTUAL   SIGVTALRM   Runs when a process is executing in user mode.
ITIMER_PROF      SIGPROF     Runs when a process is in either user or system mode.
The use of the first timer, ITIMER_REAL, is straightforward. The timer runs down
in real time, sending SIGALRM after the given amount of time has passed. (Because
SIGALRM is sent, you cannot mix calls to setitimer() with calls to alarm(), and
mixing them with calls to sleep() is also dangerous; see Section 10.8.1, "Alarm Clocks:
sleep(), alarm(), and SIGALRM," page 382.)
The second timer, ITIMER_VIRTUAL, is also fairly straightforward. It runs down
when the process is running, but only in user-level (application) code. If a process is
blocked doing I/O, such as to a disk or, more importantly, to a terminal, the timer
is suspended.
⁶ Unfortunately, there seems to be no current standard for the names of the members in the struct stat,
making it an unportable operation.
The third timer, ITIMER_PROF, is more specialized. It runs down whenever the
process is running, even if the operating system is doing something on behalf of the
process (such as I/O). According to the POSIX standard, it is "designed to be used by
interpreters in statistically profiling the execution of interpreted programs." By setting
both ITIMER_VIRTUAL and ITIMER_PROF to identical intervals and computing the
difference between the times when the two timers go off, an interpreter can tell how
much time it's spending in system calls on behalf of the executing interpreted program.⁷
(As stated, this is quite specialized.) The two system calls are:
#include <sys/time.h>                                          XSI

int getitimer(int which, struct itimerval *value);
int setitimer(int which, const struct itimerval *value,
              struct itimerval *ovalue);
The which argument is one of the symbolic constants listed earlier naming a timer.
getitimer() fills in the struct itimerval pointed to by value with the given
timer's current settings. setitimer() sets the given timer with the value in value. If
ovalue is provided, the function fills it in with the timer's current value. Use an
ovalue of NULL if you don't care about the current value. Both functions return 0 on
success or -1 on error.
A struct itimerval consists of two struct timeval members:
struct itimerval {
    struct timeval it_interval;   /* next value */
    struct timeval it_value;      /* current value */
};
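To show how the two members work together, here is a sketch of a repeating timer: it_value gives the delay until the first expiration and it_interval the period after that. The names on_tick() and wait_for_ticks() are ours, and the function assumes a period of between 1 and 999 milliseconds so that everything fits in tv_usec:

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks;     /* bumped by the handler */

/* on_tick --- handle SIGALRM by counting it */

static void on_tick(int signo)
{
    (void) signo;
    ticks++;
}

/* wait_for_ticks --- hypothetical helper: start a repeating real-time
   timer with a period of `msec` milliseconds (assumed 1 to 999) and
   return once it has gone off at least n times.  Returns 0 or -1. */

int wait_for_ticks(int n, long msec)
{
    struct sigaction sa;
    struct itimerval tval;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)  /* handler first, then timer */
        return -1;

    tval.it_value.tv_sec = 0;               /* time to first expiration */
    tval.it_value.tv_usec = msec * 1000;
    tval.it_interval = tval.it_value;       /* then repeat at the same rate */

    ticks = 0;
    if (setitimer(ITIMER_REAL, &tval, NULL) < 0)
        return -1;

    /* A tick that slips in between the test and pause() only delays us
       by one period, since the timer keeps repeating. */
    while (ticks < n)
        pause();

    timerclear(&tval.it_value);             /* zero it_value disarms the timer */
    timerclear(&tval.it_interval);
    return setitimer(ITIMER_REAL, &tval, NULL);
}
```

Calling wait_for_ticks(3, 10) should take roughly 30 milliseconds. Note that the timer is turned off by setting it with a zeroed it_value.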
Application programs should not expect timers to be exact to the microsecond. The
getitimer(2) manpage provides this explanation:
Timers will never expire before the requested time, instead expiring some
short, constant time afterwards, dependent on the system timer resolution
(currently 10ms). Upon expiration, a signal will be generated and the timer
reset. If the timer expires while the process is active (always true for
ITIMER_VIRT) the signal will be delivered immediately when generated.
Otherwise the delivery will be offset by a small time dependent on the
system loading.
⁷ Doing profiling correctly is nontrivial; if you're thinking about writing an interpreter, it pays to do your
research first.
Of the three timers, ITIMER_REAL seems most useful. The following program,
ch14-timers.c, shows how to read data from a terminal, but with a timeout so that
the program won't hang forever waiting for input:
 1  /* ch14-timers.c ---- demonstrate interval timers */
 2  #include <stdlib.h>
 3  #include <stdio.h>
 4  #include <assert.h>
 5  #include <signal.h>
 6  #include <sys/time.h>
 7  #include <unistd.h>
 8  /* handler --- handle SIGALRM */
 9
10  void handler(int signo)
11  {
12      static const char msg[] = "\n*** Timer expired, you lose ***\n";
13
14      assert(signo == SIGALRM);
15
16      write(2, msg, sizeof(msg) - 1);
17      exit(1);
18  }
19
20  /* main --- set up timer, read data with timeout */
21
22  int main(void)
23  {
24      struct itimerval tval;
25      char string[BUFSIZ];
26
27      timerclear(& tval.it_interval);    /* zero interval means no reset of timer */
28      timerclear(& tval.it_value);
29
30      tval.it_value.tv_sec = 10;    /* 10 second timeout */
31
32      (void) signal(SIGALRM, handler);
33
34      printf("You have ten seconds to enter\nyour name, rank, and serial number: ");
35
36      (void) setitimer(ITIMER_REAL, & tval, NULL);
37      if (fgets(string, sizeof string, stdin) != NULL) {
38          timerclear(& tval.it_value); (void) setitimer(ITIMER_REAL, & tval, NULL);  /* turn off timer */
39          /* process rest of data, diagnostic print for illustration */
40          printf("I'm glad you are being cooperative.\n");
41      } else
42          printf("\nEOF, eh?  We won't give up so easily!\n");
43
44      exit(0);
45  }
Lines 10-18 are the signal handler for SIGALRM; the assert() call makes sure that
the signal handler was set up properly. The body of the handler prints a message and
exits, but it could do anything appropriate for a larger-scale program.
In the main() program, lines 27-28 clear out the two struct timeval members
of the struct itimerval structure, tval. Then line 30 sets the timeout to 10 seconds.
Having tval.it_interval set to 0 means there is no repeated alarm; it only goes off
once. Line 32 sets the signal handler, and line 34 prints the prompt.
Line 36 sets the timer, and lines 37-42 print appropriate messages based on the user's
action. A real program would do its work at this point. What's important to note is
line 38, which cancels the timer because valid data was entered.
NOTE There is a deliberate race condition between lines 37 and 38. The whole point
is that if the user doesn't enter a line within the timer's expiration period,
the signal will be delivered and the signal handler will print the "you lose"
message.
Here are three successive runs of the program:

$ ch14-timers                                      First run, enter nothing
You have ten seconds to enter
your name, rank, and serial number:
*** Timer expired, you lose ***
$ ch14-timers                                      Second run, enter data
You have ten seconds to enter
your name, rank, and serial number: James Kirk, Starfleet Captain, 1234
I'm glad you are being cooperative.
$ ch14-timers                                      Third run, enter EOF (^D)
You have ten seconds to enter
your name, rank, and serial number: ^D
EOF, eh?  We won't give up so easily!
POSIX leaves it undefined as to how the interval timers interact with the sleep()
function, if at all. GLIBC does not use alarm() to implement sleep(), so on
GNU/Linux systems, sleep() does not interact with the interval timer. However, for
portable programs, you cannot make this assumption.
14.3.4 More Exact Pauses: nanosleep()

The sleep() function (see Section 10.8.1, "Alarm Clocks: sleep(), alarm(), and
SIGALRM," page 382) lets a program sleep for a given number of seconds. But as we
saw, it only took an integral number of seconds, making it impossible to delay for a
short period, and it also can potentially interact with SIGALRM handlers. The
nanosleep() function makes up for these deficiencies:
#include <time.h>                                                     POSIX TMR

int nanosleep(const struct timespec *req, struct timespec *rem);
This function is part of the optional "Timers" (TMR) extension to POSIX. The two
arguments are the requested sleep time and the amount of time remaining should the
sleep return early (if rem is not NULL). Both of these are struct timespec values:
struct timespec {
    time_t tv_sec;     /* seconds */
    long   tv_nsec;    /* nanoseconds */
};
The tv_nsec value must be in the range 0-999,999,999. As with sleep(), the
amount of time slept can be more than the requested amount of time, depending on
when and how the kernel schedules processes for execution.
Unlike sleep(), nanosleep() is not implemented with signals and so does not interfere
with SIGALRM handlers or timers, making it generally safer and easier to use.
The return value is 0 if the process slept for the full time. Otherwise, it is -1, with
errno indicating the error. In particular, if errno is EINTR, then nanosleep() was
interrupted by a signal. In this case, if rem is not NULL, the struct timespec it points
to is filled in with the remaining sleep time. This facilitates calling nanosleep() again
to continue napping.
Although it looks a little strange, it's perfectly OK to use the same structure for both
parameters:
struct timespec sleeptime = /* whatever */ ;
int ret;

ret = nanosleep(& sleeptime, & sleeptime);
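The "continue napping" idea can be sketched as a small wrapper that keeps calling nanosleep() with the same structure until the whole interval has elapsed. The function name sleep_fully() is hypothetical; only nanosleep(), errno, and EINTR come from the text above.

```c
#include <errno.h>
#include <time.h>

/* sleep_fully --- sleep the whole interval, resuming after signals */
int sleep_fully(time_t sec, long nsec)
{
    struct timespec t;

    t.tv_sec = sec;
    t.tv_nsec = nsec;

    /* on EINTR, t has been updated to hold the time still remaining */
    while (nanosleep(& t, & t) < 0) {
        if (errno != EINTR)
            return -1;      /* for example, EINVAL for a bad tv_nsec */
    }
    return 0;
}
```

A program whose signal handlers should cut the sleep short would not want this loop; it suits the case in which signals are incidental and the full delay matters.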
The struct timeval and struct timespec are similar to each other, differing
only in the units of the second component. The GLIBC <sys/time.h> header file
defines two useful macros for converting between them:
#include <sys/time.h>                                                 GLIBC

void TIMEVAL_TO_TIMESPEC(struct timeval *tv, struct timespec *ts);
void TIMESPEC_TO_TIMEVAL(struct timeval *tv, struct timespec *ts);
Here they are:
# define TIMEVAL_TO_TIMESPEC(tv, ts) {                     \
        (ts)->tv_sec = (tv)->tv_sec;                      \
        (ts)->tv_nsec = (tv)->tv_usec * 1000;             \
}
# define TIMESPEC_TO_TIMEVAL(tv, ts) {                     \
        (tv)->tv_sec = (ts)->tv_sec;                      \
        (tv)->tv_usec = (ts)->tv_nsec / 1000;             \
}
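On systems without the GLIBC macros, the same conversions are easy to write as ordinary functions. The following is a sketch only; the names timeval_to_timespec() and timespec_to_timeval() are invented here and are not part of any standard header.

```c
#include <sys/time.h>
#include <time.h>

/* timeval_to_timespec --- microseconds become nanoseconds (exact) */
struct timespec timeval_to_timespec(struct timeval tv)
{
    struct timespec ts;

    ts.tv_sec = tv.tv_sec;
    ts.tv_nsec = tv.tv_usec * 1000L;
    return ts;
}

/* timespec_to_timeval --- nanoseconds become microseconds (truncating) */
struct timeval timespec_to_timeval(struct timespec ts)
{
    struct timeval tv;

    tv.tv_sec = ts.tv_sec;
    tv.tv_usec = ts.tv_nsec / 1000;     /* sub-microsecond part is lost */
    return tv;
}
```

Note that the second conversion discards anything finer than a microsecond, so a round trip through struct timeval is not lossless.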
NOTE It is indeed confusing that some system calls use microsecond resolution
and others use nanosecond resolution. The reason is historical: The
microsecond calls were developed on systems whose hardware clocks did not
have any higher resolution, whereas the nanosecond calls were developed more
recently, for systems with much higher resolution clocks. C'est la vie. About all
you can do is to keep your manual handy.
14.4 Advanced Searching with Binary Trees

In Section 6.2, "Sorting and Searching Functions," page 181, we presented functions
for searching and sorting arrays. In this section, we cover a more advanced facility.
14.4.1 Introduction to Binary Trees

Arrays are about the simplest kind of structured data. They are easy to understand
and use. They have a disadvantage, though, which is that their size is fixed at compile
time. Thus, if you have more data than will fit in the array, you're out of luck. If you
have considerably less data than the size of your array, you're wasting memory. (Although
modern systems have large memories, consider the constraints of programmers writing
software for embedded systems, such as microwave ovens or cell phones. On the other
end of the spectrum, consider the problems of programmers dealing with very large
amounts of inputs, such as weather simulations.)
The computer science field has invented numerous dynamic data structures, structures
that grow and shrink in size on demand, that are more flexible than simple arrays, even
arrays created and resized dynamically with malloc() and realloc(). Arrays also
require re-sorting should new elements be added or removed.
One such structure is the binary search tree, which we'll just call a "binary tree" for
short. A binary tree maintains items in sorted order, inserting them in the proper place
in the tree as they come in. Lookup in a binary tree is also fast, similar in time to binary
search on an array. Unlike arrays, binary trees do not have to be re-sorted from scratch
every time you add an item.
Binary trees have one disadvantage. In the case in which the input data is already
sorted, the lookup time of binary trees reduces to that of linear searching. The
technicalities of this have to do with how binary trees are managed internally, described
shortly.
Some more formal data-structure terminology is now unavoidable. Figure 14.1 shows
a binary tree. In computer science, trees are drawn starting at the top and growing
downwards. The further down the tree you go, the higher depth you have. Each object
within the tree is termed a node. At the top of the tree is the root node, with depth 0.
At the bottom are the leaf nodes, with varying depth. In between the root and the leaves
are zero or more internal nodes. Leaf nodes are distinguished by the fact that they have
no subtrees hanging off them, whereas internal nodes have at least one subtree. Nodes
with subtrees are sometimes referred to as parent nodes, with the subnodes being
called children.
Depth 0              5                Root node
                   /   \
Depth 1           3     9             Internal and
                 / \   / \            leaf nodes
Depth 2         2   4 7   10
                     /
Depth 3             6                 Leaf node
FIGURE 14.1
A binary tree
Plain binary trees are distinguished by the fact that nodes have no more than two
children. (Trees whose nodes have more than two children are useful but aren't relevant
here.) The children are referred to as the left and right children, respectively.
Binary search trees are further distinguished by the fact that the values stored in a
left subchild are always less than the value stored in the node itself, and the values stored
in the right subchild are always greater than the value in the node itself. This implies
that there are no duplicate values within the tree. This fact also explains why trees don't
handle presorted data well: Depending on the sort order, each new data item ends up
stored either to the left or to the right of the one before it, forming a simple linear list.
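To make the presorted-data problem concrete, here is a minimal sketch of a plain, unbalanced binary search tree of int values. Everything in it (struct bst, bst_insert(), bst_height(), height_after_inserting()) is invented for illustration and is not how the library routines described next are written; some implementations, including GLIBC's, rebalance their trees internally.

```c
#include <stdlib.h>

struct bst {
    int value;
    struct bst *left, *right;
};

/* bst_insert --- plain insertion: smaller values left, larger values right */
static struct bst *bst_insert(struct bst *root, int value)
{
    if (root == NULL) {
        root = malloc(sizeof *root);
        if (root != NULL) {
            root->value = value;
            root->left = root->right = NULL;
        }
    } else if (value < root->value)
        root->left = bst_insert(root->left, value);
    else if (value > root->value)
        root->right = bst_insert(root->right, value);
    /* equal values are ignored: no duplicates in the tree */
    return root;
}

/* bst_height --- number of levels in the tree (0 for an empty tree) */
static int bst_height(const struct bst *root)
{
    int l, r;

    if (root == NULL)
        return 0;
    l = bst_height(root->left);
    r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}

/* bst_free --- release every node */
static void bst_free(struct bst *root)
{
    if (root != NULL) {
        bst_free(root->left);
        bst_free(root->right);
        free(root);
    }
}

/* height_after_inserting --- build a tree from vals[] and report its height */
int height_after_inserting(const int *vals, size_t n)
{
    struct bst *root = NULL;
    size_t i;
    int height;

    for (i = 0; i < n; i++)
        root = bst_insert(root, vals[i]);
    height = bst_height(root);
    bst_free(root);
    return height;
}
```

Inserting 1 through 7 in ascending order yields a tree seven levels deep, which is effectively a linked list, whereas inserting the same values in the order 4, 2, 6, 1, 3, 5, 7 yields a tree only three levels deep.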
The operations on a binary tree are as follows:
Insertion
Adding a new item to the tree.
Lookup
Finding an item in the tree.
Removal
Removing an item from the tree.
Traversal
Doing something with every item that has been stored in the tree. Tree traversals
are also referred to as tree walks. There are multiple ways to "visit" the items stored
in a tree. The functions discussed here only implement one such way. We have
more to say about this, later.
14.4.2 Tree Management Functions

The operations just described correspond to the following functions:
#include <search.h>                                                   XSI

void *tsearch(const void *key, void **rootp,
              int (*compare)(const void *, const void *));
void *tfind(const void *key, void *const *rootp,
            int (*compare)(const void *, const void *));
void *tdelete(const void *key, void **rootp,
              int (*compare)(const void *, const void *));

typedef enum { preorder, postorder, endorder, leaf } VISIT;
void twalk(const void *root,
           void (*action)(const void *nodep, const VISIT which, const int depth));

void tdestroy(void *root, void (*free_node)(void *nodep));            GLIBC
These functions were first defined for System V and are now formally standardized
by POSIX. They follow the pattern of the others we saw in Section 6.2, "Sorting and
Searching Functions," page 181: using void * pointers for pointing at arbitrary data
types, and user-provided comparison functions to determine ordering. As for qsort()
and bsearch(), the comparison function must return a negative/zero/positive value
when the key is compared with a value in a tree node.
14.4.3 Tree Insertion: tsearch()

These routines allocate storage for the tree nodes. To use them with multiple trees,
you have to give them a pointer to a void * variable which they fill in with the address
of the root node. When creating a new tree, initialize this pointer to NULL:
void *root = NULL;                                    Root of new tree
void *val;                                            Pointer to returned data
extern int my_compare(const void *, const void *);    Comparison function
extern char key[], key2[];                            Values to insert in tree

val = tsearch(key, & root, my_compare);               Insert first item in tree
... fill key2 with a different value. DON'T modify root ...
val = tsearch(key2, & root, my_compare);              Insert subsequent item in tree
As shown, the root variable should be set to NULL only the first time and then left
alone after that. On each subsequent call, tsearch() uses it to manage the tree.
When the key being sought is found, both tsearch() and tfind() return pointers
to the node containing it. They differ when the key is not found: tfind() returns
NULL, and tsearch() inserts the new value into the tree and returns a pointer to it.
The pointers returned by tsearch() and tfind() are to the internal tree nodes. They
can be used as the value of root in subsequent calls in order to work on subtrees. As
we will see shortly, the key value can be a pointer to an arbitrary structure; it's not
restricted to a character string as the previous example might imply.
These routines store only pointers to the data used for keys. Thus, it is up to you to
manage the storage holding the data values, usually with malloc().
NOTE Since the tree functions keep pointers, be extra careful not to use
realloc() for values that have been used as keys! realloc() could move
the data around, returning a new pointer, but the tree routines would still be
maintaining dangling pointers into the old data.
14.4.4 Tree Lookup and Use of A Returned Pointer: tfind() and tsearch()

The tfind() and tsearch() functions search a binary tree for a given key. They
take the same list of arguments: a key to search for; a pointer to the root of the tree,
rootp; and compare, a pointer to a comparison function. Both functions return a
pointer to the node that matches key.
Just how do you use the pointer returned by tfind() or tsearch()? What exactly
does it point to, anyway? The answer is that it points to a node in the tree. This is an
internal type; you can't see how it's defined. However, POSIX guarantees that this
pointer can be cast to a pointer to a pointer to whatever you're using for a key. Here is
some fragmentary code to demonstrate, and then we show how this works:
struct employee {                                     From Chapter 6
    char lastname[30];
    char firstname[30];
};

/* emp_name_id_compare --- compare by name, then by ID */
int emp_name_id_compare(const void *e1p, const void *e2p)
... also from Chapter 6, reproduced in full later on ...

struct employee key = { ... };
void *vp, *root;
struct employee *e;
... fill tree with data ...
vp = tfind(& key, & root, emp_name_id_compare);       Pass the address of root

if (vp != NULL) {    /* it's there, use it */
    e = *((struct employee **) vp);                   Retrieve stored data from tree
    /* use info in *e ... */
}
How can a pointer to a node double as a pointer to a pointer to the data? Well,
consider how a binary tree's node would be implemented. Each node maintains at least
a pointer to the user's data item and pointers to potential left and right subchildren.
So, it has to look approximately like this:
struct binary_tree {
    void *user_data;                                  Pointer to user's data
    struct binary_tree *left;                         Left subchild or NULL
    struct binary_tree *right;                        Right subchild or NULL
    ... possibly other fields here ...
} node;
C and C++ guarantee that fields within a struct are laid out in increasing address
order. Thus, it's true that '& node.left < & node.right'. Furthermore, the address
of the struct is also the address of its first field (in other words, ignoring type issues,
'& node == & node.user_data').
Conceptually, then, here's what 'e = *((struct employee **) vp);' means:
1. vp is a void *, that is, a generic pointer. It is the address of the internal tree
   node, but it's also the address of the part of the node (most likely another
   void *) that points to the user's data.
2. '(struct employee **) vp' casts the address of the internal pointer to the
   correct type; it remains a pointer to a pointer, but now to a struct employee.
   Remember that casts from one pointer type to another don't change any values
   (bit patterns); they only change how the compiler treats the values for type
   considerations.
3. '*((struct employee **) vp)' indirects through the newly minted struct
   employee **, returning a usable struct employee * pointer.
4. 'e = *((struct employee **) vp)' stores this value in e for direct use later.
The concept is illustrated in Figure 14.2.

[Figure: vp, viewed as a void **, points at a struct binary_tree node (user data
pointer, left, right); the node's first field in turn points at the struct employee
(lastname, firstname, ..., startdate).]
FIGURE 14.2
Tree nodes and their pointers
You might consider defining a macro to simplify the use of the returned pointer:
#define tree_data(ptr, type)    (*(type **) (ptr))

struct employee *e;
void *vp;

vp = tfind(& key, & root, emp_name_id_compare);

if (vp != NULL) {    /* it's there, use it */
    e = tree_data(vp, struct employee);
    /* use info in *e ... */
}
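Here is a small, self-contained sketch of the lookup pattern, using int keys that live in a caller-supplied array. The tree_data() macro is the one suggested in the text; build_tree(), lookup(), and compare_ints() are names invented for this example.

```c
#include <search.h>
#include <stddef.h>

#define tree_data(ptr, type)    (*(type **) (ptr))

/* compare_ints --- order keys as ints */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;

    return (x > y) - (x < y);
}

/* build_tree --- insert a pointer to each array element; return the root */
void *build_tree(int *vals, size_t n)
{
    void *root = NULL;
    size_t i;

    for (i = 0; i < n; i++)
        (void) tsearch(& vals[i], & root, compare_ints);
    return root;
}

/* lookup --- return a pointer to the stored int equal to value, or NULL */
int *lookup(void *root, int value)
{
    /* tfind() wants the address of the root pointer, not the root itself */
    void *vp = tfind(& value, & root, compare_ints);

    return vp != NULL ? tree_data(vp, int) : NULL;
}
```

The pointer that lookup() returns is the one originally handed to tsearch(), so it points into the caller's array, which must therefore outlive the tree.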
14.4.5 Tree Traversal: twalk()

The twalk() function is declared as follows in <search.h>:
typedef enum { preorder, postorder, endorder, leaf } VISIT;
void twalk(const void *root,
           void (*action)(const void *nodep, const VISIT which, const int depth));
The first parameter is the root of the tree (not a pointer to the root). The second is
a pointer to a callback function, which is called with three arguments: a pointer to the
tree node being visited, an enumerated type indicating how the given node is being
visited, and an integer indicating the depth of the current node (the root is at depth 0,
as explained earlier).
The use here of a callback function is the same as for nftw() (see Section 8.4.3.2,
"The nftw() Callback Function," page 263). There, the callback function is called for
each object in the filesystem. Here, the callback function is called for each object stored
in the tree.
There are several ways to traverse, or "walk," a binary tree:
• Left child, node itself, right child.
• Node itself, left child, right child.
• Left child, right child, node itself.
The GLIBC twalk() function uses the second of these: the node first, then the left
child, then the right child. Each time a node is encountered, the node is said to be
visited.8 In the course of visiting a node's children, the function passes through the
node itself more than once. Thus the values of type VISIT indicate at what stage this
node is being encountered:
preorder     Before visiting any children.
postorder    After visiting the first child but before visiting the second child.
endorder     After visiting both children.
leaf         This node is a leaf node, without children.
NOTE The terminology used here does not exactly match that used in formal
data-structures texts. There, the terms used are inorder, preorder, and postorder,
referring respectively to the three ways listed earlier for traversing a tree. Thus,
twalk() uses a preorder traversal, but uses the preorder, etc., symbolic
constants to indicate at what stage a node is being visited. This can
be confusing.
The following program, ch14-tsearch.c, demonstrates building and traversing a
tree. It reuses the struct employee structure and emp_name_id_compare() function
from Section 6.2, "Sorting and Searching Functions," page 181.
 1  /* ch14-tsearch.c --- demonstrate tree management */
 2
 3  #include <stdio.h>
 4  #include <search.h>
 5  #include <time.h>
 6  #include <string.h>
 7  struct employee {
 8      char    lastname[30];
 9      char    firstname[30];
10      long    emp_id;
11      time_t  start_date;
12  };
13
14  /* emp_name_id_compare --- compare by name, then by ID */
15
16  int emp_name_id_compare(const void *e1p, const void *e2p)
17  {
18      const struct employee *e1, *e2;
19      int last, first;
20
21      e1 = (const struct employee *) e1p;
22      e2 = (const struct employee *) e2p;
23
24      if ((last = strcmp(e1->lastname, e2->lastname)) != 0)
25          return last;
26
27      /* same last name, check first name */
28      if ((first = strcmp(e1->firstname, e2->firstname)) != 0)
29          return first;
30
31      /* same first name, check ID numbers */
32      if (e1->emp_id < e2->emp_id)
33          return -1;
34      else if (e1->emp_id == e2->emp_id)
35          return 0;
36      else
37          return 1;
38  }
39
40  /* print_emp --- print an employee structure during a tree walk */
41
42  void print_emp(const void *nodep, const VISIT which, const int depth)
43  {
44      struct employee *e = *((struct employee **) nodep);
45
46      switch (which) {
47      case leaf:
48      case postorder:
49          printf("Depth: %d. Employee:\n", depth);
50          printf("\t%s, %s\t%ld\t%s\n", e->lastname, e->firstname,
51                  e->emp_id, ctime(& e->start_date));
52          break;
53      default:
54          break;
55      }
56  }
8 Images come to mind of little binary data structures sitting down with each other over tea and cookies. Or at
least that happens if you've been spending too much time in front of your computer ...
Lines 7-12 define the struct employee, and lines 14-38 define
emp_name_id_compare().
Lines 40-56 define print_emp(), the callback function that prints a struct
employee, along with the depth in the tree of the current node. Note the magic cast
on line 44 to retrieve the pointer to the stored data.
58  /* main --- demonstrate maintaining data in binary tree */
59
60  int main(void)
61  {
62  #define NPRES 10
63      struct employee presidents[NPRES];
64      int i, npres;
65      char buf[BUFSIZ];
66      void *root = NULL;
67
68      /* Very simple code to read data: */
69      for (npres = 0; npres < NPRES && fgets(buf, BUFSIZ, stdin) != NULL;
70              npres++) {
71          sscanf(buf, "%s %s %ld %ld\n",
72              presidents[npres].lastname,
73              presidents[npres].firstname,
74              & presidents[npres].emp_id,
75              & presidents[npres].start_date);
76      }
77
78      for (i = 0; i < npres; i++)
79          (void) tsearch(& presidents[i], & root, emp_name_id_compare);
80
81      twalk(root, print_emp);
82      return 0;
83  }
The goal of printing the tree is to print the contained elements in sorted order.
Remember that twalk() visits intermediate nodes three times and that the left child is
less than the node itself, while the right child is greater than the node. Thus, the switch
statement prints the node's information only if which is leaf, for a leaf node, or
postorder, indicating that the left child has been visited, but not yet the right child.
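The same leaf-or-postorder test can be used to collect values instead of printing them. The following sketch sorts an array of int values by way of a tree; sort_with_tree(), collect(), compare_ints(), and the two static variables are hypothetical names. Because the twalk() callback receives no user-supplied argument, the output buffer has to be reached through file-scope variables.

```c
#include <search.h>
#include <stddef.h>

static int *out_buf;        /* where collect() stores values */
static size_t out_len;      /* how many values stored so far */

/* compare_ints --- order keys as ints */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;

    return (x > y) - (x < y);
}

/* collect --- twalk() callback: record each node once, in sorted order */
static void collect(const void *nodep, const VISIT which, const int depth)
{
    (void) depth;
    if (which == postorder || which == leaf)
        out_buf[out_len++] = **(int *const *) nodep;
}

/*
 * sort_with_tree --- copy the distinct values of vals[] into out[] in
 * ascending order; out must have room for n ints. Returns the count stored.
 */
size_t sort_with_tree(int *vals, size_t n, int *out)
{
    void *root = NULL;
    size_t i;

    for (i = 0; i < n; i++)
        (void) tsearch(& vals[i], & root, compare_ints);

    out_buf = out;
    out_len = 0;
    twalk(root, collect);

    /* portable teardown: keep deleting the root's key until the tree is empty */
    while (root != NULL)
        (void) tdelete(*(int **) root, & root, compare_ints);

    return out_len;
}
```

Since tsearch() does not insert a key that compares equal to one already present, duplicates in the input appear only once in the output.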
The data used is the list of presidents, also from Section 6.2, "Sorting and Searching
Functions," page 181. To refresh your memory, the fields are last name, first name,
employee number, and start time as a seconds-since-the-Epoch timestamp:
$ cat presdata.txt
Bush George 43 980013600
Clinton William 42 727552800
Bush George 41 601322400
Reagan Ronald 40 348861600
Carter James 39 222631200
The data are sorted based on last name, then first name, and then seniority. When
run,9 the program produces this output:
$ ch14-tsearch < presdata.txt
Depth: 1. Employee:
        Bush, George    41      Fri Jan 20 13:00:00 1989
Depth: 0. Employee:
        Bush, George    43      Sat Jan 20 13:00:00 2001
Depth: 2. Employee:
        Carter, James   39      Thu Jan 20 13:00:00 1977
Depth: 1. Employee:
        Clinton, William        42      Wed Jan 20 13:00:00 1993
Depth: 2. Employee:
        Reagan, Ronald  40      Tue Jan 20 13:00:00 1981
9 This output is for the U.S. Eastern Time zone.
14.4.6 Tree Node Removal and Tree Deletion: tdelete() and tdestroy()

Finally, you can remove items from a tree and, on GLIBC systems, delete the entire
tree itself:
void *tdelete(const void *key, void **rootp,
              int (*compare)(const void *, const void *));

/* GLIBC extension, not in POSIX: */
void tdestroy(void *root, void (*free_node)(void *nodep));
The arguments to tdelete() are the same as for tsearch(): the key, the address
of the tree's root, and the comparison function. If the given item is found in the tree,
it is removed and tdelete() returns a pointer to the parent of the given node.
Otherwise, it returns NULL. This behavior has to be managed carefully in your code if you
need the original item being deleted, for example, to free its storage. Save the pointer
to the item first, and free the item only after the node is gone, because tdelete()
still passes the stored item to the comparison function:
struct employee *e, key;                              Variable declarations
void *vp, *root;
... fill in key for item to remove from tree ...
vp = tfind(& key, & root, emp_name_id_compare);       Find item to remove
if (vp != NULL) {
    e = *((struct employee **) vp);                   Convert pointer
    (void) tdelete(& key, & root, emp_name_id_compare);    Remove it from tree
    free(e);                                          Now free storage
}
Although not specified in the man pages or in the POSIX standard, under
GNU/Linux, if you delete the item stored in the root node, the returned value is that
of the new root node. For portable code, you should not necessarily rely on this behavior.
The tdestroy() function is a GLIBC extension. It allows you to destroy a whole
tree. The first argument is the root of the tree. The second is a pointer to a function
that releases the data pointed to by each node in the tree. If nothing needs to be done
with these data (for example, they're held in a regular array, as in our earlier example
program), then this function should do nothing. Do not pass in a NULL pointer! Doing
so results in a crash.
14.5 Summary
• Occasionally, it's necessary to allocate memory aligned at a certain boundary.
  posix_memalign() does this. Its return value is different from that of most of
  the functions covered in this book: Caveat emptor. memalign() also allocates
  aligned memory, but not all systems support releasing that memory with free().
• File locking with fcntl() provides record locks, down to the level of the ability
  to lock single bytes within a file. Read locks prevent writing of the locked area,
  and a write lock prevents other processes from reading and writing the locked
  area. Locking is advisory by default, and POSIX standardizes only advisory locking.
  Most modern Unix systems support mandatory locking, using the setgid permission
  bit on the file and possibly additional filesystem mount options.
• On GNU/Linux, the lockf() function acts as a wrapper around POSIX locking
  with fcntl(); the BSD flock() function's locks are (on GNU/Linux) completely
  independent of fcntl() locks. BSD flock() locks are whole-file locks only and
  do not work on remote filesystems. For these reasons, flock() locks are
  not recommended.
• gettimeofday() retrieves the time of day as a (seconds, microseconds) pair in a
  struct timeval. These values are used by utimes() to update a file's accessed
  and modification times. The getitimer() and setitimer() system calls use
  pairs of struct timevals in a struct itimerval to create interval
  timers: alarm clocks that "go off" at a set time and continue to go off at a set
  interval thereafter. Three different timers provide control over the states in which
  the timer continues to run down.
• The nanosleep() function uses a struct timespec, which specifies time in
  seconds and nanoseconds, to pause a process for a given amount of time. It has
  the happy trait of not interacting at all with the signal mechanism.
• The tree API is an additional set of data storage and search functions that maintains
  data in binary trees, the effect of which is to keep data sorted. The tree API is very
  flexible, allowing use of multiple trees and arbitrary data.
Exercises
1. Write the lockf() function, using fcntl() to do the locking.
2. The directory /usr/src/linux/Documentation contains a number of files
   that describe different aspects of the operating system's behavior. Read the files
   locks.txt and mandatory.txt for more information about Linux's handling
   of file locks.
3. Run the ch14-lockall program on your system, without mandatory locking,
   and see if you can change the operand file.
4. If you have a non-Linux system that supports mandatory locking, try the
   ch14-lockall program on it.
5. Write a function named strftimes(), with the following API:
   size_t strftimes(char *buf, size_t size, const char *format,
                    const struct timeval *tp);
   It should behave like the standard strftime() function, except that it allows
   %q to mean "the current number of microseconds."
6. Using the strftimes() function you just wrote, write an enhanced version
   of date that accepts a format string beginning with a leading + and formats
   the current date and time. (See date(1).)
7. The handling of the timeout in ch14-timers.c is rather primitive. Rewrite
   the program to use setjmp() after printing the prompt and longjmp() from
   within the signal handler. Does this improve the structure or clarity of
   the program?
8. We noted that ch14-timers.c contains a deliberate race condition. Suppose
   the user enters a response within the right time period, but ch14-timers is
   suspended before the alarm can be canceled. What call can you make to cut
   down the size of the problem window?
9. Draw the tree as shown by the output of ch14-tsearch in Section 14.4.5,
   "Tree Traversal: twalk()," page 557.
10. Examine the file /usr/share/dict/words on a GNU/Linux system. (This
    is the spelling dictionary for spell; it might be in a different place on different
    systems.) The words exist in the file, one per line, in sorted order.
    First, use this awk program to create a new list, in random order:
    $ awk '{ list[$0]++ }
    > END { for (i in list) print i }' /usr/share/dict/words > /tmp/wlist
    Next, write two programs. Each should read the new list and store each word
    read into a tree and an array, respectively. The second program should use
    qsort() to sort the array and bsearch() to search it.
    Retrieve the word 'gravy' from the tree or array. Time the two programs to
    see which one runs faster. You may need to put the word retrieval inside a loop
    that runs multiple times (say 1,000), to get running times that are long enough
    to show a difference.
    Use the output of ps to see how much memory each program uses.
11. Rerun the two programs, using the original sorted dictionary file, and see how,
    if at all, the timing results change.
Chapter 15 Debugging page 567
Chapter 16 A Project That Ties Everything Together page 641
In this chapter
• 15.1 First Things First page 568
• 15.2 Compilation for Debugging page 569
• 15.3 GDB Basics page 570
• 15.4 Programming for Debugging page 577
• 15.5 Debugging Tools page 605
• 15.6 Software Testing page 632
• 15.7 Debugging Rules page 633
• Exercises page 639
There are many practices, starting with program logic and data design, through
code breakdown and organization, and finally implementation, that can help
minimize errors and problems. We encourage you to study these; find good books
on software design and software engineering, and put their advice into practice! Any
program over a few hundred lines in size should be carefully thought out and
designed, instead of just hacked on until it appears to work.
However, since programmers are human, programming errors are unavoidable.
Debugging is the process of tracking down and removing errors in programs. Even
well-designed, well-implemented programs occasionally don't work; when
something's going wrong and you can't figure out why, it's a good idea to point a debugger
at the code, and watch it fail.
This chapter covers a range of topics, starting off with basic debugging advice and
techniques (compiling for debugging and elementary use of GDB, the GNU
debugger), moving on to a range of techniques for use during program development and
debugging that make debugging easier, and then looking at a number of tools that
help the debugging process. It then closes with a brief introduction to software
testing, and a wonderful set of "debugging rules," extracted from a book that we
highly recommend.
Most of our advice is based on our long-term experience as a volunteer for the GNU
project, maintaining gawk (GNU awk). Most, if not all, the specific examples we
present come from that program.
Throughout the chapter, specific recommendations are marked Recommendation.
15.1 First Things First

When a program misbehaves, you may be at a loss as to what to do first. Often,
strange behavior is due to misusing memory: using uninitialized values, reading or
writing outside the bounds of dynamic memory, and so on. Therefore, you may get
faster results by trying out a memory-debugging tool before you crank up a debugger.
The reason is that memory tools can point you directly at the failing line of code,
whereas using a debugger is more like embarking on a search-and-destroy mission, in
which you first have to isolate the problem and then fix it. Once you're sure that
memory problems aren't the issue, you can proceed to using a debugger.
Because the debugger is a more general tool, we cover it first. We discuss a number
of memory-debugging tools later in the chapter.
15.2 Compilation for Debugging

For a source code debugger to be used, the executable being debugged (the debuggee,
if you will) must be compiled with the compiler's -g option. This option causes the
compiler to emit extra debugging symbols into the object code; that is, extra information
giving the names and types of variables, constants, functions, and so on. The debugger
then uses this information to match source code locations with the code being executed
and to retrieve or store variable values in the running program.
On many Unix systems, the -g compiler option is mutually exclusive with the -O
option, which turns on optimizations. This is because optimizations can cause
rearrangement of bits and pieces of the object code, such that there is no longer a direct
relationship between what's being executed and a linear reading of the source code. By disabling
optimizations, you make it much easier for the debugger to relate the object code to
the source code, and in turn, single-stepping through a program's execution works in
the obvious way. (Single-stepping is described shortly.)
GCC, the GNU Compiler Collection, does allow -g and -O together. However, this
introduces exactly the problem we wish to avoid when debugging: that following the
execution in a debugger becomes considerably more difficult. The advantage of allowing
the two together is that you can leave the debugging symbols in an optimized,
for-production-use executable. They occupy only disk space, not memory. Then, an installed
executable can still be debugged in an emergency.
In our experience, if you need to use a debugger, it's better to recompile the
application from scratch, with only the -g option. This makes tracing considerably easier;
there's enough detail to keep track of just going through the program as it's written,
without also having to worry about how the compiler rearranged the code.
There is one caveat: Be sure the program still misbehaves. Reproducibility is the key
to debugging; if you can't reproduce the problem, it's much harder to track it down
and fix it. Rarely, compiling a program without -O can cause it to stop failing.¹
Typically, the problem persists when compiled without -O, meaning there is indeed a logic
bug of some kind, waiting to be discovered.
15.3 GDB Basics

A debugger is a program that allows you to control the execution of another program
and examine and change the subordinate program's state (such as variable values). There
are two kinds of debuggers: machine-level debuggers, which work on the level of machine
instructions, and source-level debuggers, which work in terms of the program's source
code. For example, in a machine-level debugger, to change a variable's value, you
specify its address in memory. In a source-level debugger, you just use the
variable's name.
Historically, V7 Unix had adb, which was a machine-level debugger. System III had
sdb, which was a source-level debugger, and BSD Unix provided dbx, also a source-
level debugger. (Both continued to provide adb.) dbx survives on some commercial
Unix systems.
GDB, the GNU Debugger, is a source-level debugger. It has many more features, is
more broadly portable, and is more usable than either sdb or dbx.²
Like its predecessors, GDB is a command-line debugger. It prints one line of source
code at a time, prints a prompt, and reads one line of input containing a command
to execute.
There are graphical debuggers; these provide a larger view of the source code and
usually provide the ability to manipulate the program both through a command-line
window and through GUI components such as buttons and menus. The ddd debugger³
is one such; it is built on top of GDB, so if you learn GDB, you can make some use of
ddd right away. (ddd has its own manual, which you should read if you'll be using it
heavily.) Another graphical debugger is Insight,⁴ which uses Tcl/Tk to provide a
graphical interface on top of GDB. (You should use a graphical debugger if one is
available to you and you like it. Since our intent is to provide an introduction to
debuggers and debugging, we've chosen to go with a simple interface that can be presented
in print.)
¹ Compiler optimizations are a notorious scapegoat for logic bugs. In the past, finger-pointing at the compiler was
more justified. In our experience, using modern systems and compilers, it is very unusual to find a case in which
compiler optimization introduces bugs into working code.
² We're speaking of the original BSD dbx. We have used GDB exclusively for well over a decade.
³ ddd comes with many GNU/Linux systems. The source code is available from the GNU Project's FTP site for
ddd (ftp://ftp.gnu.org/gnu/ddd/).
GDB understands C and C++, including support for name demangling, which means
that you can use the regular C++ source code names for class member functions and
overloaded functions. In particular, GDB understands C expression syntax, which is
useful when you wish to look at the value of complicated expressions, such as
'*ptr->x.a[1]->q'. It also understands Fortran 77, although you may have to append
an underscore character to the Fortran variable and function names. GDB has partial
support for Modula-2, and limited support for Pascal.
If you're running GNU/Linux or a BSD system (and you installed the development
tools), then you should have a recent version of GDB already installed and ready to
use. If not, you can download the GDB source code from the GNU Project's FTP site
for GDB⁵ and build it yourself.
GDB comes with its own manual, which is over 300 pages long. You can generate
the printable version of the manual in the GDB source code directory and print it
yourself. You can also buy printed and bound copies from the Free Software Foundation;
your purchase helps the FSF and contributes directly to the production of more free
software. (See the FSF web site⁶ for ordering information.) This section describes the
basics of GDB; we recommend reading the manual to learn how to take full advantage
of GDB's capabilities.
15.3.1 Running GDB

The basic usage is this:
gdb [ options ] [ executable [ core-file-name ] ]
Here, executable is the executable program to be debugged. If provided,
core-file-name is the name of a core file created when a program was killed by the
operating system and dumped core. Under GNU/Linux, such files (by default) are
named core.pid,⁷ where pid is the process ID number of the running program that
died. The pid extension means you can have multiple core dumps in the same directory,
which is helpful, but also good for consuming disk space!
⁴ http://sources.redhat.com/insight/
⁵ ftp://ftp.gnu.org/gnu/gdb/
⁶ http://www.gnu.org
If you forget to name the files on the command line, you can use 'file executable'
to tell GDB the name of the executable file, and 'core-file core-file-name' to
tell GDB the name of the core file.
With a core dump, GDB indicates where the program died. The following program,
ch15-abort.c, creates a few nested function calls and then purposely dies by abort()
to create a core dump:
/* ch15-abort.c --- produce a core dump */

#include <stdlib.h>

/* recurse --- build up some function calls */

void recurse(void)
{
    static int i;

    if (++i == 3)
        abort();
    else
        recurse();
}

int main(int argc, char **argv)
{
    recurse();
}
Here's a short GDB session with this program:

$ gcc -g ch15-abort.c -o ch15-abort          Compile, no -O
$ ch15-abort                                 Run the program
Aborted (core dumped)                        It dies miserably
$ gdb ch15-abort core.4124                   Start GDB on it
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
Core was generated by 'ch15-abort'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x42028cc1 in kill () from /lib/i686/libc.so.6
(gdb) where                                  Print stack trace
#0 0x42028cc1 in kill () from /lib/i686/libc.so.6
#1 0x42028ac8 in raise () from /lib/i686/libc.so.6
#2 0x4202a019 in abort () from /lib/i686/libc.so.6
#3 0x08048342 in recurse () at ch15-abort.c:13      <--- We need to examine here
#4 0x08048347 in recurse () at ch15-abort.c:15
#5 0x08048347 in recurse () at ch15-abort.c:15
#6 0x0804835f in main (argc=1, argv=0xbffff8f4) at ch15-abort.c:20
#7 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
⁷ See sysctl(8) if you wish to change this behavior.
The where command prints a stack trace, that is, a list of all the functions called,
most recent first. Note that there are three invocations of the recurse() function.
The command bt, for "back trace," is an alias for where; it's easier to type.
Each function invocation in the stack is referred to as a frame. This term comes from
the compiler field, in which each function's parameters, local variables, and return
address, grouped on the stack, are referred to as a stack frame. The GDB frame command
lets you examine a particular frame. In this case, we want frame 3. This is the most recent
invocation of recurse(), which called abort():
(gdb) frame 3                                Move to frame 3
#3 0x08048342 in recurse () at ch15-abort.c:13
13          abort();                         GDB prints source location in frame
(gdb) list                                   Show several lines of source code
8       void recurse(void)
9       {
10          static int i;
11
12          if (++i == 3)
13              abort();
14          else
15              recurse();
16      }
17
(gdb)                                        Pressing ENTER repeats the last command
18      int main(int argc, char **argv)
19      {
20          recurse();
21      }
(gdb) quit                                   Leave the debugger (for now)
As demonstrated, pressing ENTER repeats the last command, in this case, list, to
show source code lines. This is an easy way to step through the source code.
GDB uses the readline library for command-line editing, so you can use Emacs
or vi commands (as you prefer) for recalling and editing previous lines. The Bash shell
uses the same library, so if you're familiar with command-line editing at the shell prompt,
GDB's works the same way. This feature saves considerable typing.
15.3.2 Setting Breakpoints, Single-Stepping, and Setting Watchpoints

Often, program failures produce a core dump. The first step is to use GDB on the
core file to determine the routine in which the program died. If the original binary
was not compiled for debugging (that is, no -g), all GDB can tell you is the function's
name, but no other details.
The next step is to then recompile the program with debugging and without
optimization, and verify that it still fails. Assuming it does, you'll want to run the program
under the control of the debugger and set a breakpoint in the failing routine.
A breakpoint is a point at which execution should break, or stop. You can set
breakpoints by function name, source code line number, source code file and line number
together, as well as in other ways.
After setting the breakpoint, you start the program running, using the run command,
possibly followed by any command-line arguments to be passed on to the debuggee.
(GDB conveniently remembers the arguments for you; if you wish to start the program
over again from the beginning, all you need do is type the run command by itself, and
GDB will start a fresh copy with the same arguments as before.) Here's a short session
using gawk:
$ gdb gawk                                   Start GDB on gawk
GNU gdb 5.3
(gdb) break do_print                         Set breakpoint in do_print

Breakpoint 1 at 0x805a36a: file builtin.c, line 1504.
(gdb) run 'BEGIN { print "hello, world" }'   Start the program running
Starting program: /home/arnold/Gnu/gawk/gawk-3.1.3/gawk 'BEGIN { print
"hello, world" }'
Breakpoint 1, do_print (tree=0x8095290) at builtin.c:1504

1504        struct redirect *rp = NULL;      Execution reaches breakpoint
(gdb) list                                   Show source code
1499
1500    void
1501    do_print(register NODE *tree)
1502    {
1503        register NODE **t;
1504        struct redirect *rp = NULL;
1505        register FILE *fp;
1506        int numnodes, i;
1507        NODE *save;
1508        NODE *tval;
Once the breakpoint is reached, you proceed through the program by single-stepping
it. This means that GDB allows the program to execute one source code statement at
a time. GDB prints the line it's about to execute and then prints its prompt. To run
the statement, use the next command:
(gdb) next                                   Run current statement (1504, above)
1510        fp = redirect_to_fp(tree->rnode, & rp);     GDB prints next statement
(gdb)                                        Hit ENTER to run it, and go to next
1511        if (fp == NULL)
(gdb)                                        ENTER again
1519        save = tree = tree->lnode;
(gdb)                                        And again
1520        for (numnodes = 0; tree != NULL; tree = tree->rnode)
The step command is an alternative command for single-stepping. There is an
important difference between next and step. next executes the next statement. If that
statement contains a function call, the function is called and returns before GDB regains
control of the running program.
On the other hand, when you use step on a statement with a function call, GDB
descends into the called function, allowing you to continue single-stepping (or tracing)
the program. If a statement doesn't contain a function call, then step is the same
as next.
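To make the difference concrete, here is a small sketch that you could compile with -g and try under GDB. The function names (square and sum_of_squares) are our own inventions for illustration; they do not come from gawk.

```c
/* step-vs-next.c --- hypothetical code for trying 'next' and 'step' */

/* square --- a callee that 'step' descends into and 'next' runs in one go */
int square(int x)
{
    int result = x * x;     /* 'finish' from here runs until square() returns */

    return result;
}

/* sum_of_squares --- set a breakpoint here, then single-step */
int sum_of_squares(int a, int b)
{
    int total = 0;

    total += square(a);     /* with 'next', the whole call runs and GDB stops on the following line */
    total += square(b);     /* with 'step', GDB stops on the first line of square() */
    return total;
}
```

Given a small main() that calls sum_of_squares(3, 4), setting a breakpoint with 'break sum_of_squares' and then alternating next and step on the two calls shows both behaviors side by side.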
NOTE It's easy to forget which command you're using and keep pressing
ENTER to run each subsequent statement. If you're using step, you can
accidentally enter a library function, such as strlen() or printf(), which
you really don't want to bother with. In such a case, you can use the command
finish, which causes the program to run until the current function returns.
You can print memory contents by using the print command. GDB understands
C expression syntax, which makes it easy and natural to examine structures pointed to
by pointers:
(gdb) print *save                            Print the structure pointed to by save

$1 = {sub = {nodep = {l = {lptr = 0x8095250, param_name = 0x8095250 "pR\t\b",
        ll = 134828624}, r = {rptr = 0x0, pptr = 0, preg = 0x0, hd = 0x0,
        av = 0x0, r_ent = 0}, x = {extra = 0x0, xl = 0, param_list = 0x0},
      name = 0x0, number = 1, reflags = 0}, val = {
      fltnum = 6.6614191194446594e-316, sp = 0x0, slen = 0, sref = 1,
      idx = 0}, hash = {next = 0x8095250, name = 0x0, length = 0, value = 0x0,
      ref = 1}}, type = Node_expression_list, flags = 1}
Finally, the cont (continue) command lets you continue the program's execution.
It will run until the next breakpoint or until it exits normally if it doesn't hit any
breakpoints. This example picks up where the previous one left off:
1520        for (numnodes = 0; tree != NULL; tree = tree->rnode)
(gdb) cont                                   Continue
Continuing.
hello, world
Program exited normally.                     Informative message from GDB

(gdb) quit                                   Leave the debugger
A watchpoint is like a breakpoint, but for data instead of executable code. You set a
watchpoint on a variable (or field in a struct or union, or array element), and when
it changes, GDB notifies you. GDB checks the value of the watchpoint as it single-steps
the program, and stops when the value changes. For example, the do_lint_old variable
in gawk is true when the --lint-old option was issued. This variable is set to true by
getopt_long(). (We covered getopt_long() in Section 2.1.2, "GNU Long Options,"
page 27.) In gawk's main.c file:
/* warn about stuff not in V7 awk */
static const struct option optab[] = {
    ...
};
Here's a sample session, showing a watchpoint in action:

$ gdb gawk                                   Start GDB on gawk
GNU gdb 5.3
(gdb) watch do_lint_old                      Set watchpoint on variable
Hardware watchpoint 1: do_lint_old
(gdb) run --lint-old 'BEGIN { print "hello, world" }'     Run the program
Starting program: /home/arnold/Gnu/gawk/gawk-3.1.4/gawk --lint-old
'BEGIN { print "hello, world" }'
Hardware watchpoint 1: do_lint_old
Hardware watchpoint 1: do_lint_old           Watchpoint checked as program runs

Old value = 0                                Watchpoint stops the program
New value = 1
0x420c4219 in _getopt_internal () from /lib/i686/libc.so.6
(gdb) where                                  Stack trace
#0 0x420c4219 in _getopt_internal () from /lib/i686/libc.so.6
#1 0x420c4e83 in getopt_long () from /lib/i686/libc.so.6
#2 0x080683a1 in main (argc=3, argv=0xbffff8a4) at main.c:293
#3 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) quit                                   We're done for now
The program is running. Exit anyway? (y or n) y     Yes, really
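If you'd like to experiment with watchpoints on something smaller than gawk, the following sketch sets up the same situation: a global flag that getopt_long() stores into on the program's behalf. The function name parse_options and the file name are hypothetical; only the option table layout follows the getopt_long() interface from Section 2.1.2.

```c
/* watch-demo.c --- hypothetical flag variable set through getopt_long() */

#include <stddef.h>
#include <getopt.h>

int do_lint_old = 0;    /* a candidate for 'watch do_lint_old' in GDB */

/* parse_options --- let getopt_long() store 1 into do_lint_old; return 0 if OK */
int parse_options(int argc, char **argv)
{
    static const struct option optab[] = {
        { "lint-old", no_argument, &do_lint_old, 1 },
        { NULL, 0, NULL, 0 }
    };
    int c;

    optind = 1;     /* start scanning at the first argument */
    while ((c = getopt_long(argc, argv, "", optab, NULL)) != -1)
        if (c == '?')
            return -1;      /* unrecognized option */
    return 0;
}
```

Because the assignment happens inside the C library and not in the program's own source, a watchpoint is a convenient way to find out where and when the variable changes.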
GDB can do much more than we've shown here. Although the GDB manual is large,
it is worthwhile to read it in its entirety at least once, to familiarize yourself with its
commands and capabilities. After that, it's probably sufficient to look at the NEWS file
in each new GDB distribution to see what's new or changed.
It's also worth printing the GDB reference card which comes in the file
gdb/doc/refcard.tex within the GDB source distribution. You can create a printable
PostScript version of the reference card, after extracting the source and running
configure, by using these commands:
$ cd gdb/doc                                 Change to doc subdirectory
$ make refcard.ps                            Format the reference card
The reference card is meant to be printed dual-sided, on 8.5 x 11 inch ("letter") paper,
in landscape format. It provides a six-column summary of the most useful GDB
commands. We recommend printing it and having it by your keyboard as you work
with GDB.
15.4 Programming for Debugging

There are many techniques for making source code easier to debug, ranging from
simple to involved. We look at a number of them in this section.
15.4.1 Compile-Time Debugging Code

Several techniques relate to the source code itself.
15.4.1.1 Use Debugging Macros

Perhaps the simplest compile-time technique is the use of the preprocessor to provide
conditionally compiled code. For example:
#ifdef DEBUG
fprintf(stderr, "myvar = %d\n", myvar);
fflush(stderr);
#endif /* DEBUG */
Adding -DDEBUG to the compiler command line causes the call to fprintf() to
execute when the program runs.
Recommendation: Send debug messages to stderr so that they aren't lost down a
pipeline and so that they can be captured with an I/O redirection. Be sure to use
fflush() so that messages are forced to the output as soon as possible.
NOTE The symbol DEBUG, while obvious, is also highly overused. It's a better
idea to use a symbol specific to your program, such as MYAPPDEBUG. You can
even use different symbols for debugging code in different parts of your program,
such as file I/O, data verification, memory management, and so on.
Scattering lots of #ifdef statements throughout your code quickly becomes painful.
And too many #ifdefs obscure the main program logic. There's got to be a better way,
and indeed, a technique that's often used is to conditionally define special macros
for printing:
/* TECHNIQUE 1 --- commonly used but not recommended, see text */
/* In application header file: */
#ifdef MYAPPDEBUG
#define DPRINT0(msg)              fprintf(stderr, msg)
#define DPRINT1(msg, v1)          fprintf(stderr, msg, v1)
#define DPRINT2(msg, v1, v2)      fprintf(stderr, msg, v1, v2)
#define DPRINT3(msg, v1, v2, v3)  fprintf(stderr, msg, v1, v2, v3)
#else /* ! MYAPPDEBUG */
#define DPRINT0(msg)
#define DPRINT1(msg, v1)
#define DPRINT2(msg, v1, v2)
#define DPRINT3(msg, v1, v2, v3)
#endif /* ! MYAPPDEBUG */
/* In application source file: */

DPRINT1("myvar = %d\n", myvar);
DPRINT2("v1 = %d, v2 = %f\n", v1, v2);
There are multiple macros, one for each different number of arguments, up to
whatever limit you wish to provide. When MYAPPDEBUG is defined, the calls to the
DPRINTx() macros expand into calls to fprintf(). When MYAPPDEBUG isn't defined,
then those same calls expand to nothing. (This is essentially how assert() works; we
described assert() in Section 12.1, "Assertion Statements: assert()," page 428.)
This technique works; we have used it ourselves and seen it recommended in
textbooks. However, it can be refined a bit further, reducing the number of macros down
to one:
/* TECHNIQUE 2 --- most portable; recommended */
/* In application header file: */
#ifdef MYAPPDEBUG
#define DPRINT(stuff) fprintf stuff
#else
#define DPRINT(stuff)
#endif
/* In application source file: */

DPRINT((stderr, "myvar = %d\n", myvar));     Note the double parentheses
Note how the macro is invoked, with two sets of parentheses! By making the entire
argument list for fprintf () into a single argument, you no longer need to have an
arbitrary number of debugging macros.
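Here is a minimal, compilable sketch of the double-parentheses technique in use. The function scale() and its dbg parameter are hypothetical; passing the output stream in as a parameter (where real code would normally just name stderr) is our own choice, made only so that the output is easy to capture when experimenting. MYAPPDEBUG is defined in the file for the same reason; normally you would supply it with -DMYAPPDEBUG.

```c
/* dprint-demo.c --- hypothetical use of the double-parentheses macro */

#include <stdio.h>

#ifndef MYAPPDEBUG
#define MYAPPDEBUG 1    /* normally supplied on the command line: -DMYAPPDEBUG */
#endif

#ifdef MYAPPDEBUG
#define DPRINT(stuff) fprintf stuff
#else
#define DPRINT(stuff)
#endif

/* scale --- double a value, reporting the input on the debug stream */
int scale(FILE *dbg, int myvar)
{
    DPRINT((dbg, "myvar = %d\n", myvar));   /* expands to fprintf (dbg, "myvar = %d\n", myvar); */
    fflush(dbg);                            /* force the message out right away */
    return myvar * 2;
}
```

Calling scale(stderr, 21) prints the debugging line on standard error and returns 42. When MYAPPDEBUG is not defined, the DPRINT line is reduced to an empty statement.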
If you are using a compiler that conforms to the 1999 C standard, you have an addi-
tional choice, which produces the cleanest-looking debugging code:
/* TECHNIQUE 3 --- cleanest, but C99 only */
/* In application header file: */
#ifdef MYAPPDEBUG
#define DPRINT(mesg, ...) fprintf(stderr, mesg, __VA_ARGS__)
#else
#define DPRINT(mesg, ...)
#endif
/* In application source file: */

DPRINT("myvar = %d\n", myvar);
DPRINT("v1 = %d, v2 = %f\n", v1, v2);
The 1999 C standard provides variadic macros; that is, macros that can accept a
variable number of arguments. (This is similar to variadic functions, like printf().)
In the macro definition, the three periods, '...', indicate that there will be zero or more
arguments. In the macro body, the special identifier __VA_ARGS__ is replaced with the
provided arguments, however many there are.
The advantage to this mechanism is that only one set of parentheses is necessary
when the debugging macro is invoked, making the code read much more naturally.
It also preserves the ability to have just one macro name, instead of multiple names that
vary according to the number of arguments. The disadvantage is that C99 compilers
are not yet widely available, reducing the portability of this construct. (However, this
situation will improve with time.)
Recommendation: Current versions of GCC do support C99 variadic macros. Thus,
if you know that you will never be using anything but GCC (or some other C99
compiler) to compile your program, you can use the C99 mechanism. However, as of this
writing, C99 compilers are still not commonplace. So, if your code has to be compiled
by different compilers, you should use the double-parentheses-style macro.
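One detail is worth knowing if you do use the C99 mechanism: with a definition of the form DPRINT(mesg, ...), strict C99 requires at least one argument after mesg, so a bare message with no values is not portable. A common variation, sketched below, folds the format string into the variable part. The names dbg_fp, set_debug_stream(), and check_values() are hypothetical, introduced only so that the output destination can be changed when trying the code out.

```c
/* vdprint-demo.c --- hypothetical C99 variadic debugging macro */

#include <stdio.h>

#ifndef MYAPPDEBUG
#define MYAPPDEBUG 1    /* normally supplied on the command line: -DMYAPPDEBUG */
#endif

static FILE *dbg_fp;    /* where debug output goes; NULL means stderr */

/* set_debug_stream --- redirect debugging output */
void set_debug_stream(FILE *fp)
{
    dbg_fp = fp;
}

#ifdef MYAPPDEBUG
/* The format string is part of __VA_ARGS__, so a message with no values works too */
#define DPRINT(...) \
    (fprintf(dbg_fp != NULL ? dbg_fp : stderr, __VA_ARGS__), \
     fflush(dbg_fp != NULL ? dbg_fp : stderr))
#else
#define DPRINT(...) ((void) 0)
#endif

/* check_values --- return true if both values are positive */
int check_values(int v1, double v2)
{
    DPRINT("entering check_values\n");          /* no extra arguments */
    DPRINT("v1 = %d, v2 = %f\n", v1, v2);       /* two extra arguments */
    return v1 > 0 && v2 > 0.0;
}
```

The macro also calls fflush() itself, in keeping with the earlier recommendation, so individual call sites don't have to remember to.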
15.4.1.2 Avoid Expression Macros If Possible

In general, C preprocessor macros are a rather sharp, two-edged sword. They provide
you with great power, but they also provide a great opportunity to injure yourself.⁸
⁸ Bjarne Stroustrup, the creator of C++, worked hard to make the use of the C preprocessor completely unnecessary
in C++. In our opinion, he didn't quite succeed: #include is still needed, but regular macros aren't. For C, the
preprocessor remains a valuable tool, but it should be used judiciously.
For efficiency or code clarity, it's common to see macros such as this:
if (RS_is_null || today == TUESDAY) ...
At first glance, this looks fine. The condition 'RS_is_null' is clear and easy to
understand, and abstracts the details inherent in the test. The problem comes when you
try to print the value in GDB:
(gdb) print RS_is_null
No symbol "RS_is_null" in current context.
In such a case, you have to track down the definition of the macro and print the
expanded value.
Recommendation: Use variables to represent important conditions in your program,
with explicit code to change the variable values when the conditions change.
Here is an abbreviated example, from io.c in the gawk distribution:
void set_RS()
{
    ...
    if (RS->stlen == 0) {
        RS_is_null = TRUE;
        matchrec = rsnullscan;
    }
    ...
}
Once RS_is_null is set and maintained, it can be tested by code and printed from
within a debugger.
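The same idea in a self-contained form looks something like the following sketch. It is a simplified stand-in, not gawk's actual code: RS_value, the set_RS() parameter, and paragraph_mode() are hypothetical, and only the RS_is_null name is borrowed from the example above.

```c
/* rs-demo.c --- hypothetical condition kept in a variable instead of a macro */

#include <string.h>

#define TRUE  1
#define FALSE 0

static const char *RS_value = "\n";     /* stand-in for the record separator */
int RS_is_null = FALSE;                 /* a real variable: 'print RS_is_null' works in GDB */

/* set_RS --- record a new separator and update the condition variable */
void set_RS(const char *new_rs)
{
    RS_value = new_rs;
    RS_is_null = (strlen(RS_value) == 0) ? TRUE : FALSE;
}

/* paragraph_mode --- the rest of the code tests the variable, not a macro */
int paragraph_mode(void)
{
    return RS_is_null;
}
```

The cost is that every place that changes the separator must go through set_RS() so that the variable stays accurate; the benefit is that the debugger can show the condition directly.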
NOTE Beginning with GCC 3.1 and version 5 of GDB, if you compile your
program with the options -gdwarf-2 and -g3, you can use macros from within
GDB. The GDB manual states that the GDB developers hope to eventually find
a more compact representation for macros, and that the -g3 option will be
subsumed into -g.
However, only the combination of GCC, GDB, and the special options allows
you to use macros this way: If you're not using GCC (or if you're using an older
version), you still have the problem. We stand by our recommendation to avoid
such macros if you can.
The problem with macros extends to code fragments as well. If a macro defines
multiple statements, you can't set a breakpoint inside the middle of the macro. This is
also true of C99 and C++ inline functions: If the compiler substitutes the body of an
inline function into the generated code, it is again difficult or impossible to set a
breakpoint inside it. This ties in with our advice to compile with -g alone; in this case,
compilers usually don't do function inlining.
Along similar lines, it's common to have a variable that represents a particular state.
It's easy, and encouraged by many C programming books, to #define symbolic
constants for these states. For example:
/* The various states to be in when scanning for the end of a record. */

#define NOSTATE   1   /* scanning not started yet (all) */
#define INLEADER  2   /* skipping leading data (RS = "") */
#define INDATA    3   /* in body of record (all) */
#define INTERM    4   /* scanning terminator (RS = "", RS = regexp) */
int state;
...
state = NOSTATE;
...
state = INLEADER;
...
if (state != INTERM)
    ...
At the source code level, this looks great. But again, there is a problem when you
look at the code from within GDB:
(gdb) print state
$1 = 2
Here too, you're forced to go back and look at the header file to figure out what the
2 means. So, what's the alternative?
Recommendation: Use enums instead of macros to define symbolic constants. The
source code usage is the same, and the debugger can print the enums' values too.
An example, also from io.c in gawk:
typedef enum scanstate {
    NOSTATE,    /* scanning not started yet (all) */
    INLEADER,   /* skipping leading data (RS = "") */
    INDATA,     /* in body of record (all) */
    INTERM,     /* scanning terminator (RS = "", RS = regexp) */
} SCANSTATE;
SCANSTATE state;
... rest of code remains unchanged!
Now, when looking at state from within GDB, we see something useful:
(gdb) print state
$1 = NOSTATE
15.4.1.3 Reorder Code If Necessary

It's not uncommon to have a condition in an if or while consist of multiple
component tests, separated by && or ||. If these tests are function calls (or even if they're
not), it's impossible to single-step each separate part of the condition. GDB's step and
next commands work on the basis of statements, not expressions. (Splitting such things
across lines doesn't help, either.)
Recommendation: Rewrite the original code, using explicit temporary variables that
store return values or conditional results so that you can examine them in a debugger.
The original code should be maintained in a comment so that you (or some later
programmer) can tell what's going on.
Here's a concrete example: the function do_input() from gawk's file io.c:
 1  /* do_input --- the main input processing loop */
 2
 3  void
 4  do_input()
 5  {
 6      IOBUF *iop;
 7      extern int exiting;
 8      int rval1, rval2, rval3;
 9
10      (void) setjmp(filebuf);   /* for 'nextfile' */
11
12      while ((iop = nextfile(FALSE)) != NULL) {
13          /*
14           * This was:
15          if (inrec(iop) == 0)
16              while (interpret(expression_value) && inrec(iop) == 0)
17                  continue;
18           * Now expand it out for ease of debugging.
19           */
20          rval1 = inrec(iop);
21          if (rval1 == 0) {
22              for (;;) {
23                  rval2 = rval3 = -1;   /* for debugging */
24                  rval2 = interpret(expression_value);
25                  if (rval2 != 0)
26                      rval3 = inrec(iop);
27                  if (rval2 == 0 || rval3 != 0)
28                      break;
29              }
30          }
31          if (exiting)
32              break;
33      }
34  }
(The line numbers are relative to the start of the routine, not the file.) This function
is the heart of gawk's main processing loop. The outer loop (lines 12 and 33) steps
through the command-line data files. The comment on lines 13-19 shows the original
code, which reads each record from the current file and processes it.
A 0 return value from inrec() indicates an OK status, while a nonzero return value
from interpret() indicates an OK status. When we tried to step through this loop,
verifying the record reading process, it became necessary to perform each step
individually.
Lines 20-30 are the rewritten code, which calls each function separately, storing the
return values in local variables so that they can be printed from the debugger. Note
how line 23 forces these variables to have known, invalid values each time around the
loop: Otherwise, they would retain their values from previous loop iterations. Line 27
is the exit test; because the code has changed to an infinite loop (compare line 22 to
line 16), the test for breaking out of the loop is the opposite of the original test.
As an aside, we admit to having had to study the rewrite carefully when we made it,
to make sure it did exactly the same as the original code; it did. It occurs to us now that
perhaps this version of the loop might be closer to the original:
/* possible replacement for lines 22 - 29 */
do {
    rval2 = rval3 = -1;   /* for debugging */
    rval2 = interpret(expression_value);
    if (rval2 != 0)
        rval3 = inrec(iop);
} while (rval2 != 0 && rval3 == 0);
The truth is, both versions are harder to read than the original and thus potentially
in error. However, since the current code works, we decided to leave well enough alone.
Finally, we note that not all expert programmers would agree with our advice here.
When each component of a condition is a function call, you can set a breakpoint on
each one, use step to step into each function, and then use finish to complete the
function. GDB will tell you the function's return value, and from that point you can
use cont or step to continue. We like our approach because the results are kept in
variables, which can be checked (and rechecked) after the function calls, and even a
few statements later.
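For readers who want to try the rewrite on something small, here is a sketch of the same transformation applied to a toy loop. Everything in it is hypothetical: read_record() and process_record() merely stand in for functions like inrec() and interpret(), and here both return nonzero for "OK," which is simpler than the mixed conventions in gawk.

```c
/* expand-demo.c --- hypothetical compound condition expanded for debugging */

static int remaining = 0;   /* number of pretend records still to read */

void reset_records(int n)   { remaining = n; }
int read_record(void)       { return remaining > 0; }       /* nonzero: got a record */
int process_record(void)    { remaining--; return 1; }      /* nonzero: keep going */

/* original_loop --- compound condition; the calls can't be stepped separately */
int original_loop(void)
{
    int count = 0;

    while (read_record() && process_record())
        count++;
    return count;
}

/* expanded_loop --- same logic, with each result kept in a variable */
int expanded_loop(void)
{
    int count = 0;
    int rval1, rval2;

    for (;;) {
        rval1 = rval2 = -1;     /* known, invalid values, for debugging */
        rval1 = read_record();
        if (rval1 != 0)
            rval2 = process_record();
        if (rval1 == 0 || rval2 == 0)   /* opposite of the original test */
            break;
        count++;
    }
    return count;
}
```

Running both versions over the same input and comparing the results is a cheap way to gain confidence that a rewrite like this one preserved the original behavior.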
15.4.1.4 Use Debugging Helper Functions

A common technique, applicable in many cases, is to have a set of flag values; when
a flag is set (that is, true), a certain fact is true or a certain condition applies. This is
commonly done with #defined symbolic constants and the C bitwise operators.
(We discussed the use of bit flags and the bit manipulation operators in the sidebar in
Section 8.3.1, "POSIX Style: statvfs() and fstatvfs()," page 244.)
For example, gawk's central data structure is called a NODE . It has a large number of
fields, the last of which is a set of flag values. From the file awk . h:
    typedef struct exp_node {
        ...                                  Lots of stuff omitted
        unsigned short flags;
    #       define  MALLOC      1       /* can be free'd */
    #       define  TEMP        2       /* should be free'd */
    #       define  PERM        4       /* can't be free'd */
    #       define  STRING      8       /* assigned as string */
    #       define  STRCUR      16      /* string value is current */
    #       define  NUMCUR      32      /* numeric value is current */
    #       define  NUMBER      64      /* assigned as number */
    #       define  MAYBE_NUM   128     /* user input: if NUMERIC then
                                         * a NUMBER */
    #       define  ARRAYMAXED  256     /* array is at max size */
    #       define  FUNC        512     /* this parameter is really a
                                         * function name; see awkgram.y */
    #       define  FIELD       1024    /* this is a field */
    #       define  INTLSTR     2048    /* use localized version */
    } NODE;
The reason to use flag values is that they provide considerable savings in data space.
If the NODE structure used a separate char field for each flag, that would use 12 bytes
instead of the 2 used by the unsigned short. The current size of a NODE (on an Intel
x86) is 32 bytes. Adding 10 more bytes would bump that to 42 bytes. Since gawk can
allocate potentially hundreds of thousands (or even millions) of NODEs,9 keeping the
size down is important.
What does this have to do with debugging? Didn't we just recommend using enums
for symbolic constants? Well, in the case of OR'd values enums are no help, since they're
no longer individually recognizable!
Recommendation: Provide a function to convert flags to a string. If you have
multiple independent flags, set up a general-purpose routine.
9 Seriously! People often run megabytes of data through gawk. Remember, no arbitrary limits!
NOTE  What's unusual about these debugging functions is that application code
never calls them. They exist only so that they can be called from a debugger. Such
functions should always be compiled in, without even a surrounding #ifdef,
so that you can use them without having to take special steps. The (usually
minimal) extra code size is justified by the developer's time savings.
First we'll show you how we did this initially. Here is (an abbreviated version of)
flags2str() from an earlier version of gawk (3.0.6):
     1  /* flags2str --- make a flags value readable */
     2
     3  char *
     4  flags2str(flagval)
     5  int flagval;
     6  {
     7      static char buffer[BUFSIZ];
     8      char *sp;
     9
    10      sp = buffer;
    11
    12      if (flagval & MALLOC) {
    13          strcpy(sp, "MALLOC");
    14          sp += strlen(sp);
    15      }
    16      if (flagval & TEMP) {
    17          if (sp != buffer)
    18              *sp++ = '|';
    19          strcpy(sp, "TEMP");
    20          sp += strlen(sp);
    21      }
    22      if (flagval & PERM) {
    23          if (sp != buffer)
    24              *sp++ = '|';
    25          strcpy(sp, "PERM");
    26          sp += strlen(sp);
    27      }
            ... much more of the same, omitted for brevity ...
    82
    83      return buffer;
    84  }
(The line numbers are relative to the start of the function.) The result is a string,
something like "MALLOC|PERM|NUMBER". Each flag is tested separately, and if present,
each one's action is the same: test if not at the beginning of the buffer so we can add
the '|' character, copy the string into place, and update the pointer. Similar functions
existed for formatting and displaying the other kinds of flags in the program.
The code is both repetitive and error prone, and for gawk 3.1 we were able to simplify
and generalize it. Here's how gawk now does it. Starting with this definition in awk.h:
    /* for debugging purposes */
    struct flagtab {
        int val;                /* Integer flag value */
        const char *name;       /* String name */
    };
This structure can be used to represent any set of flags with their corresponding string
values. Each different group of flags has a corresponding function that returns a printable
representation of the flags that are currently set. From eval.c:
    /* flags2str --- make a flags value readable */

    const char *
    flags2str(int flagval)
    {
        static const struct flagtab values[] = {
            { MALLOC, "MALLOC" },
            { TEMP, "TEMP" },
            { PERM, "PERM" },
            { STRING, "STRING" },
            { STRCUR, "STRCUR" },
            { NUMCUR, "NUMCUR" },
            { NUMBER, "NUMBER" },
            { MAYBE_NUM, "MAYBE_NUM" },
            { ARRAYMAXED, "ARRAYMAXED" },
            { FUNC, "FUNC" },
            { FIELD, "FIELD" },
            { INTLSTR, "INTLSTR" },
            { 0, NULL },
        };

        return genflags2str(flagval, values);
    }
flags2str() defines an array that maps flag values to strings. By convention, a 0
flag value indicates the end of the array. The code calls genflags2str() ("general
flags to string") to do the work. genflags2str() is a general-purpose routine that
converts a flag value into a string. From eval.c:
     1  /* genflags2str --- general routine to convert a flag value to a string */
     2
     3  const char *
     4  genflags2str(int flagval, const struct flagtab *tab)
     5  {
     6      static char buffer[BUFSIZ];
     7      char *sp;
     8      int i, space_left, space_needed;
     9
    10      sp = buffer;
    11      space_left = BUFSIZ;
    12      for (i = 0; tab[i].name != NULL; i++) {
    13          if ((flagval & tab[i].val) != 0) {
    14              /*
    15               * note the trick, we want 1 or 0 for whether we need
    16               * the '|' character.
    17               */
    18              space_needed = (strlen(tab[i].name) + (sp != buffer));
    19              if (space_left < space_needed)
    20                  fatal(_("buffer overflow in genflags2str"));
    21
    22              if (sp != buffer) {
    23                  *sp++ = '|';
    24                  space_left--;
    25              }
    26              strcpy(sp, tab[i].name);
    27              /* note ordering! */
    28              space_left -= strlen(sp);
    29              sp += strlen(sp);
    30          }
    31      }
    32
    33      return buffer;
    34  }
(Line numbers are relative to the start of the function, not the file.) As with the
previous version, the idea here is to fill in a static buffer with a string value such as
"MALLOC|PERM|STRING|MAYBE_NUM" and return the address of that buffer. We discuss
the reasons for using a static buffer shortly; first let's examine the code.
The sp pointer tracks the position of the next empty spot in the buffer, while
space_left tracks how much room is left; this keeps us from overflowing the buffer.
The bulk of the function is a loop (line 12) through the array of flag values . When
a flag is found (line 13), the code computes how much space is needed for the string
(line 18) and tests to see if that much room is left (lines 19-20) .
The test 'sp != buffer' fails on the first flag value found, yielding 0. On subsequent
flags, the test has a value of 1. This tells us if we need the '|' separator character
between values. By adding the result (1 or 0) to the length of the string, we get the
correct value for space_needed. The same test, for the same reason, is used on line
22 to control lines 23 and 24, which insert the '|' character.
Finally, lines 26-29 copy in the string value, adjust the amount of space left, and
update the sp pointer. Line 33 returns the address of the buffer, which contains the
printable representation of the string.
Now, what about that static buffer? Normally, good programming practice
discourages the use of functions that return the address of static buffers: It's easy to have
multiple calls to such a function overwrite the buffer each time, forcing the caller to
copy the returned data.
Furthermore, a static buffer is by definition a buffer of fixed size. What happened
to the GNU "no arbitrary limits" principle?
The answer to both of these questions is to remember that this is a debugging function.
Normal code never calls genflags2str(); it's only called by a human using a debugger.
No caller holds a pointer to the buffer; as a developer doing debugging, we don't care
that the buffer gets overwritten each time we call the function.
In practice, the fixed size isn't an issue either; we know that BUFSIZ is big enough
to represent all the flags that we use. Nevertheless, being experienced and knowing that
things can change, genflags2str() has code to protect itself from overrunning the
buffer. (The space_left variable and the code on lines 18-20.)
As an aside, the use of BUFSIZ is arguable. That constant should be used exclusively
for I/O buffers, but it is often used for general string buffers as well. Such code would
be better off defining explicit constants, such as FLAGVALSIZE, and using
'sizeof(buffer)' on line 11.
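The suggestion about explicit constants can be sketched as follows. This is a minimal, self-contained illustration and not gawk's code: the DEMO_ flag values, struct demo_flagtab, demo_flags2str(), and the size chosen for FLAGVALSIZE are all assumptions made for the example. Unlike genflags2str(), which calls fatal() when space runs out, this sketch simply stops adding names.

```c
#include <string.h>

/* Hypothetical flag bits, for illustration only (not gawk's definitions). */
enum { DEMO_MALLOC = 1, DEMO_TEMP = 2, DEMO_PERM = 4, DEMO_STRING = 8 };

#define FLAGVALSIZE 128     /* explicit size for the result string (assumed) */

struct demo_flagtab {
    int val;                /* integer flag value */
    const char *name;       /* string name; NULL ends the table */
};

/* demo_flags2str --- debugger helper: render the set flags as "A|B|C" */
const char *
demo_flags2str(int flagval)
{
    static const struct demo_flagtab tab[] = {
        { DEMO_MALLOC, "MALLOC" },
        { DEMO_TEMP, "TEMP" },
        { DEMO_PERM, "PERM" },
        { DEMO_STRING, "STRING" },
        { 0, NULL },
    };
    static char buffer[FLAGVALSIZE];
    size_t used = 0;        /* characters stored so far, excluding '\0' */
    int i;

    buffer[0] = '\0';
    for (i = 0; tab[i].name != NULL; i++) {
        size_t need;

        if ((flagval & tab[i].val) == 0)
            continue;

        /* one extra byte for '|' unless this is the first name */
        need = strlen(tab[i].name) + (used != 0);
        if (used + need >= sizeof(buffer))
            break;          /* out of room: return what fits */

        if (used != 0)
            buffer[used++] = '|';
        strcpy(buffer + used, tab[i].name);
        used += strlen(tab[i].name);
    }
    return buffer;
}
```

Because the limit is written as sizeof(buffer), changing FLAGVALSIZE is the only edit needed to resize the buffer; the bounds check follows automatically.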
Here is an abbreviated GDB session showing flags2str() in use:

    $ gdb gawk                                          Start GDB on gawk
    GNU gdb 5.3
    (gdb) break do_print                                Set a breakpoint
    Breakpoint 1 at 0x805a584: file builtin.c, line 1547.
    (gdb) run 'BEGIN { print "hello, world" }'          Start it running
    Starting program: /home/arnold/Gnu/gawk/gawk-3.1.4/gawk 'BEGIN { print
    "hello, world" }'
    Breakpoint 1, do_print (tree=0x80955b8) at builtin.c:1547      Breakpoint hit
    1547        struct redirect *rp = NULL;
    (gdb) print *tree                                   Print NODE
    $1 = {sub = {nodep = {l = {lptr = 0x8095598, param_name = 0x8095598 "xU\t\b",
            ll = 134829464}, r = {rptr = 0x0, pptr = 0, preg = 0x0, hd = 0x0,
            av = 0x0, r_ent = 0}, x = {extra = 0x0, xl = 0, param_list = 0x0},
          name = 0x0, number = 1, reflags = 0}, val = {
          fltnum = 6.6614606209589101e-316, sp = 0x0, slen = 0, sref = 1,
          idx = 0}, hash = {next = 0x8095598, name = 0x0, length = 0, value = 0x0,
          ref = 1}}, type = Node_K_print, flags = 1}
    (gdb) print flags2str(tree->flags)                  Print flag value
    $2 = 0x80918a0 "MALLOC"
    (gdb) next                                          Keep going
    1553        fp = redirect_to_fp(tree->rnode, & rp);
    1588        efwrite(t[i]->stptr, sizeof(char), t[i]->stlen, fp,
                        "print", rp, FALSE);
    (gdb) print *t[i]                                   Print NODE again
    $4 = {sub = {nodep = {l = {lptr = 0x8095598, param_name = 0x8095598 "xU\t\b",
            ll = 134829464}, r = {rptr = 0x0, pptr = 0, preg = 0x0, hd = 0x0,
            av = 0x0, r_ent = 0}, x = {extra = 0x8095ad8, xl = 134830808,
            param_list = 0x8095ad8}, name = 0xc <Address 0xc out of bounds>,
          number = 1, reflags = 4294967295}, val = {
          fltnum = 6.6614606209589101e-316, sp = 0x8095ad8 "hello, world",
          slen = 12, sref = 1, idx = -1}, hash = {next = 0x8095598, name = 0x0,
          length = 134830808, value = 0xc, ref = 1}}, type = Node_val, flags = 29}
    (gdb) print flags2str(t[i]->flags)                  Print flag value
    $5 = 0x80918a0 "MALLOC|PERM|STRING|STRCUR"
We hope you'll agree that the current general-purpose mechanism is considerably
more elegant than the original one, and easier to use.
Careful design and use of arrays of structs can often replace or consolidate
repetitive code.
15.4.1.5 Avoid Unions When Possible

"There's no such thing as a free lunch."
-Lazarus Long-
The C union is a relatively esoteric facility. It allows you to save memory by storing
different items within the same physical space; how the program treats it depends on
how it's accessed:
    /* ch15-union.c --- brief demo of union usage. */

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        union i_f {
            int i;
            float f;
        } u;

        u.f = 12.34;    /* Assign a floating point value */
        printf("%f also looks like %#x\n", u.f, u.i);
        exit(0);
    }
Here is what happens when the program is run on an Intel x86 GNU/Linux system:
    $ ch15-union
    12.340000 also looks like 0x414570a4
The program prints the bit pattern that represents a floating-point number as a
hexadecimal integer. The storage for the two fields occupies the same memory; the difference
is in how the memory is treated: u.f acts like a floating-point number, whereas the
same bits in u.i act like an integer.
Unions are particularly useful in compilers and interpreters, which often create a tree
structure representing the structure of a source code file (called a parse tree). This
models the way programming languages are formally described: if statements, while
statements, assignment statements, and so on are all instances of the more generic
"statement" type. Thus, a compiler might have something like this:
    struct if_stmt { ... };             Structure for IF statement
    struct while_stmt { ... };          Structure for WHILE statement
    struct for_stmt { ... };            Structure for FOR statement
    ... structures for other statement types ...

    typedef enum stmt_type {
        IF, WHILE, FOR, ...
    } TYPE;                             What we actually have

    /* This contains the type and unions of the individual kinds of statements. */
    struct statement {
        TYPE type;
        union stmt {
            struct if_stmt if_st;
            struct while_stmt while_st;
            struct for_stmt for_st;
            ...
        } u;
    };
Along with the union, it is conventional to use macros to make the components of
the union look like they were fields in a struct. For example:
    #define if_s    u.if_st         So can use s->if_s instead of s->u.if_st
    #define while_s u.while_st      And so on ...
At the level just presented, this seems reasonable and looks manageable. The real
world, however, is a more complicated place, and practical compilers and interpreters
often have several levels of nested structs and unions. This includes gawk, in which
the definition of the NODE, its flag values, and macros for accessing union components
takes over 120 lines!10 Here is enough of that definition to give you a feel for what's
happening:
    typedef struct exp_node {
        union {
            struct {
                union {
                    struct exp_node *lptr;
                    char *param_name;
                    long ll;
                } l;
                union {
                    ...
                } r;
                union {
                    ...
                } x;
                char *name;
                short number;
                unsigned long reflags;
                ...
            } nodep;
            struct {
                AWKNUM fltnum;
                char *sp;
                size_t slen;
                long sref;
                int idx;
            } val;
            struct {
                struct exp_node *next;
                char *name;
                size_t length;
                struct exp_node *value;
                long ref;
            } hash;
    #define hnext       sub.hash.next
    #define hname       sub.hash.name
    #define hlength     sub.hash.length
    #define hvalue      sub.hash.value
            ...
        } sub;
        NODETYPE type;
        unsigned short flags;
        ...
    } NODE;

    #define vname       sub.nodep.name
    #define exec_count  sub.nodep.reflags

    #define lnode       sub.nodep.l.lptr
    #define nextp       sub.nodep.l.lptr
    #define source_file sub.nodep.name
    #define source_line sub.nodep.number
    #define param_cnt   sub.nodep.number
    #define param       sub.nodep.l.param_name

    #define stptr       sub.val.sp
    #define stlen       sub.val.slen
    #define stref       sub.val.sref
    #define stfmt       sub.val.idx

    #define var_value   lnode
10 We inherited this design. In general it works, but it does have its problems. The point of this section is to pass
on the experience we've acquired working with unions.
The NODE has a union inside a struct inside a union inside a struct! (Ouch.) On
top of that, multiple macro "fields" map to the same struct/union components,
depending on what is actually stored in the NODE! (Ouch, again.)
The benefit of this complexity is that the C code is relatively clear. Something like
'NF_node->var_value->slen' is straightforward to read.
There is, of course, a price to pay for the flexibility that unions provide. When your
debugger is deep down in the guts of your code, you can't use the nice macros that
appear in the source. You must use the real expansion.11 (And for that, you have to find
the definition in the header file.)
For example, compare 'NF_node->var_value->slen' to what it expands to:
'NF_node->sub.nodep.l.lptr->sub.val.slen'! You must type the latter into GDB
to look at your data value. Look again at this excerpt from the earlier GDB debugging
session:
    (gdb) print *tree                                   Print NODE
    $1 = {sub = {nodep = {l = {lptr = 0x8095598, param_name = 0x8095598 "xU\t\b",
            ll = 134829464}, r = {rptr = 0x0, pptr = 0, preg = 0x0, hd = 0x0,
            av = 0x0, r_ent = 0}, x = {extra = 0x0, xl = 0, param_list = 0x0},
          name = 0x0, number = 1, reflags = 0}, val = {
          fltnum = 6.6614606209589101e-316, sp = 0x0, slen = 0, sref = 1,
          idx = 0}, hash = {next = 0x8095598, name = 0x0, length = 0, value = 0x0,
          ref = 1}}, type = Node_K_print, flags = 1}
That's a lot of goop. However, GDB does make this a little easier to handle. You
can use expressions like '($1).sub.val.slen' to step through the tree and list data
structures.
There are other reasons to avoid unions. First of all, unions are unchecked. Nothing
but programmer attention ensures that when you access one part of a union, you are
accessing the same part that was last stored. We saw this in ch15-union.c, which
accessed both of the union's "identities" simultaneously.
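One common way to compensate for the lack of checking is to store a tag next to the union and go through small accessor functions that verify it. The sketch below is only an illustration of that idea; struct value, enum val_kind, and the accessor names are hypothetical and do not come from gawk.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tagged value; the names are illustrative only. */
enum val_kind { VAL_INT, VAL_FLOAT };

struct value {
    enum val_kind kind;     /* records which member was stored last */
    union {
        int i;
        float f;
    } u;
};

/* set_int, set_float --- store a member and remember which one it was */
void set_int(struct value *v, int i)     { v->kind = VAL_INT;   v->u.i = i; }
void set_float(struct value *v, float f) { v->kind = VAL_FLOAT; v->u.f = f; }

/* holds_int --- nonfatal check, convenient to call from a debugger */
int holds_int(const struct value *v)
{
    return v->kind == VAL_INT;
}

/* get_int --- fetch the int member, refusing to reinterpret a float */
int get_int(const struct value *v)
{
    if (v->kind != VAL_INT) {
        fprintf(stderr, "get_int: value holds kind %d, not an int\n",
                (int) v->kind);
        abort();            /* stop at the point of misuse */
    }
    return v->u.i;
}
```

The cost is one extra field and a test per access; the gain is that a mismatched access fails immediately instead of silently treating one kind of data as another.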
A second reason, related to the first, is to be careful of overlays in complicated nested
struct/union combinations. For example, an earlier version of gawk 12 had this code:
    /* n->lnode overlays the array size, don't unref it if array */
    if (n->type != Node_var_array && n->type != Node_array_ref)
        unref(n->lnode);
11 Again, GCC 3.1 or newer and GDB 5 can let you use macros directly, but only if you're using them together,
with specific options. This was described earlier, in Section 15.4.1.2, "Avoid Expression Macros If Possible,"
page 580.
12 This part of the code has since been revised, and the example lines are no longer there.
Originally, there was no if, just a call to unref(), which frees the NODE pointed to
by n->lnode. However, it was possible to crash gawk at this point. You can imagine
how long it took, in a debugger, to track down the fact that what was being treated as
a pointer was in reality an array size!
As an aside, unions are considerably less useful in C++. Inheritance and
object-oriented features make data structure management a different ball game, one that is
considerably safer.
Recommendation: Avoid unions if possible. If not, design and code them carefully!
15.4.2 Runtime Debugging Code

Besides things you add to your code at compile time, you can also add extra code to
enable debugging features at runtime. This is particularly useful for applications that
are installed in the field, where a customer's system won't have the source code installed
(and maybe not even a compiler!).
This section presents some runtime debugging techniques that we have used over
the years, ranging from simple to more complex. Note that our treatment is by no
means exhaustive. This is an area where it pays to have some imagination and to use it!
15.4.2.1 Add Debugging Options and Variables

The simplest technique is to have a command-line option that enables debugging.
Such an option can be conditionally compiled in when you are debugging. But it's
more flexible to leave the option in the production version of the program. (You may
or may not also wish to leave the option undocumented as well. This has various tradeoffs:
Documenting it can allow your customers or clients to learn more about the internals
of your system, which you may not want. On the other hand, not documenting it seems
rather sneaky. If you're writing Open Source or Free Software, it's better to document
the option.)
If your program is large, you may wish your debugging option to take an argument
indicating what subsystem should be debugged. Based on the argument, you can set
different flag variables or possibly different bit flags in a single debugging variable. Here
is an outline of this technique:
    struct option options[] = {
        ...
        { "debug", required_argument, NULL, 'D' },
        ...
    };

    int main(int argc, char **argv)
    {
        int c;

        while ((c = getopt_long(argc, argv, "...D:", options, NULL)) != -1) {
            switch (c) {
            ...
            case 'D':
                parse_debug(optarg);
                break;
            ...
            }
        }
        ...
    }
The parse_debug() function reads through the argument string. For example, it
could be a comma- or space-separated string of subsystems, like "file,memory,ipc".
For each valid subsystem name, the function would set a bit in a debugging variable:
    extern int debugging;

    void parse_debug(const char *subsystems)
    {
        const char *sp;

        for (sp = subsystems; *sp != '\0';) {
            if (strncmp(sp, "file", 4) == 0) {
                debugging |= DEBUG_FILE;
                sp += 4;
            } else if (strncmp(sp, "memory", 6) == 0) {
                debugging |= DEBUG_MEM;
                sp += 6;
            } else if (strncmp(sp, "ipc", 3) == 0) {
                debugging |= DEBUG_IPC;
                sp += 3;
            } else
                sp++;    /* unrecognized text: skip it so the loop terminates */

            while (*sp == ' ' || *sp == ',')
                sp++;
        }
    }
Finally, application code can then test the flags:

    if ((debugging & DEBUG_FILE) != 0) ...      In the I/O part of the program

    if ((debugging & DEBUG_MEM) != 0) ...       In the memory manager

It is up to you whether to use a single variable with flag bits, separate variables, or
even a debugging array, indexed by symbolic constants (preferably from an enum).
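As the number of subsystems grows, the chain of strncmp() calls can be replaced by a table, in the same spirit as the flagtab array shown earlier. The following is a sketch under stated assumptions: the bit values chosen for DEBUG_FILE, DEBUG_MEM, and DEBUG_IPC, the table name, and the function parse_debug_list() are all invented for illustration. Unlike the outline above, it matches whole words only and reports how many names it did not recognize.

```c
#include <string.h>

/* Hypothetical subsystem bits; a real program would define its own. */
enum {
    DEBUG_FILE = 1 << 0,
    DEBUG_MEM  = 1 << 1,
    DEBUG_IPC  = 1 << 2
};

static const struct {
    const char *name;       /* subsystem name; NULL ends the table */
    int bit;                /* corresponding debugging bit */
} subsystems_tab[] = {
    { "file",   DEBUG_FILE },
    { "memory", DEBUG_MEM },
    { "ipc",    DEBUG_IPC },
    { NULL,     0 },
};

/*
 * parse_debug_list --- return the bits named in a comma- or
 * space-separated list; unknown names are counted in *unknown.
 */
int
parse_debug_list(const char *list, int *unknown)
{
    int bits = 0;
    const char *sp = list;

    *unknown = 0;
    while (*sp != '\0') {
        size_t len;
        int i;

        sp += strspn(sp, ", ");         /* skip separators */
        len = strcspn(sp, ", ");        /* length of the next word */
        if (len == 0)
            break;                      /* only separators were left */

        for (i = 0; subsystems_tab[i].name != NULL; i++) {
            if (strlen(subsystems_tab[i].name) == len
                && strncmp(sp, subsystems_tab[i].name, len) == 0) {
                bits |= subsystems_tab[i].bit;
                break;
            }
        }
        if (subsystems_tab[i].name == NULL)
            (*unknown)++;               /* fell off the end: no match */
        sp += len;
    }
    return bits;
}
```

Adding a subsystem then means adding one table entry, and a caller can decide whether a nonzero unknown count deserves a warning.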
The cost of leaving the debugging code in your production executable is that the
program will be larger. Depending on the placement of your debugging code, it may
also be slower since the tests are always performed, but are always false until debugging
is turned on. And, as mentioned, it may be possible for someone to learn about your
program, which you may not want. Or worse, a malevolent user could enable so much
debugging that the program slows to an unusable state! (This is called a denial of
service attack.)
The benefit, which can be great, is that your already installed program can be
reinvoked with debugging turned on, without requiring you to build, and then download,
a special version to your customer site. When the software is installed in remote places
that may not have people around and all you can do is access the system remotely
through the Internet (or worse, a slow telephone dial-in!), such a feature can be
a lifesaver.
Finally, you may wish to mix and match: use conditionally compiled debugging code
for fine-grained, high-detail debugging, and save the always-present code for a coarser
level of output.
15.4.2.2 Use Special Environment Variables

Another useful trick is to have your application pay attention to special environment
variables (documented or otherwise). This can be particularly useful for testing. Here's
another example from our experience with gawk, but first, some background.
gawk uses a function named optimal_bufsize() to obtain the optimal buffer size
for I/O. For small files, the function returns the file size. Otherwise, if the filesystem
defines a size to use for I/O, it returns that (the st_blksize member in the struct
stat, see Section 5.4.2, "Retrieving File Information," page 141). If that member isn't
available, optimal_bufsize() returns the BUFSIZ constant from <stdio.h>. The
original function (in posix/gawkmisc.c) looked like this:
     1  /* optimal_bufsize --- determine optimal buffer size */
     2
     3  int
     4  optimal_bufsize(fd, stb)                int optimal_bufsize(int fd, struct stat *stb);
     5  int fd;
     6  struct stat *stb;
     7  {
     8      /* force all members to zero in case OS doesn't use all of them. */
     9      memset(stb, '\0', sizeof(struct stat));
    10
    11      /*
    12       * System V.n, n < 4, doesn't have the file system block size in the
    13       * stat structure. So we have to make some sort of reasonable
    14       * guess. We use stdio's BUFSIZ, since that is what it was
    15       * meant for in the first place.
    16       */
    17  #ifdef HAVE_ST_BLKSIZE
    18  #define DEFBLKSIZE  (stb->st_blksize ? stb->st_blksize : BUFSIZ)
    19  #else
    20  #define DEFBLKSIZE  BUFSIZ
    21  #endif
    22
    23      if (isatty(fd))
    24          return BUFSIZ;
    25      if (fstat(fd, stb) == -1)
    26          fatal("can't stat fd %d (%s)", fd, strerror(errno));
    27      if (lseek(fd, (off_t)0, 0) == -1)   /* not a regular file */
    28          return DEFBLKSIZE;
    29      if (stb->st_size > 0 && stb->st_size < DEFBLKSIZE)  /* small file */
    30          return stb->st_size;
    31      return DEFBLKSIZE;
    32  }
The constant DEFBLKSIZE is the "default block size"; that is, the value from the
struct stat, or BUFSIZ . For terminals (line 23) or for files that aren't regular files
(lseek () fails , line 27), the return value is also BUFSIZ. For regular files that are small,
the file size is used. In all other cases, DEFBLKSIZE is returned. Knowing the "optimal"
buffer size is particularly useful on filesystems in which the block size is larger
than BUFSIZ.
We had a problem whereby one of our test cases worked perfectly on our development
GNU/Linux system and every other Unix system we had access to. However, this test
would fail consistently on certain other systems.
For a long time, we could not get direct access to a failing system in order to run
GDB. Eventually, however, we did manage to reproduce the problem; it turned out to
be related to the size of the buffer gawk was using for reading data files: On the failing
systems, the buffer size was larger than for our development system.
We wanted a way to be able to reproduce the problem on our development machine:
The failing system was nine time zones away, and running GDB interactively across
the Atlantic Ocean is painful. We reproduced the problem by having
optimal_bufsize() look at a special environment variable, AWKBUFSIZE. When the
value is "exact", optimal_bufsize() always returns the size of the file, whatever
that may be. If the value of AWKBUFSIZE is some integer number, the function returns
that number. Otherwise, the function falls back to the previous algorithm. This allows
us to run tests without having to constantly recompile gawk. For example,
    $ AWKBUFSIZE=42 make check
This runs the gawk test suite, using a buffer size of 42 bytes. (The test suite passes.)
Here is the modified version of optimal_bufsize():
     1  /* optimal_bufsize --- determine optimal buffer size */
     2
     3  /*
     4   * Enhance this for debugging purposes, as follows:
     5   *
     6   * Always stat the file, stat buffer is used by higher-level code.
     7   *
     8   * if (AWKBUFSIZE == "exact")
     9   *     return the file size
    10   * else if (AWKBUFSIZE == a number)
    11   *     always return that number
    12   * else
    13   *     if the size is < default blocksize
    14   *         return the size
    15   *     else
    16   *         return default blocksize
    17   *     end if
    18   * endif
    19   *
    20   * Hair comes in an effort to only deal with AWKBUFSIZE
    21   * once, the first time this routine is called, instead of
    22   * every time. Performance, dontyaknow.
    23   */
    24
    25  size_t
    26  optimal_bufsize(fd, stb)
    27  int fd;
    28  struct stat *stb;
    29  {
    30      char *val;
    31      static size_t env_val = 0;
    32      static short first = TRUE;
    33      static short exact = FALSE;
    34
    35      /* force all members to zero in case OS doesn't use all of them. */
    36      memset(stb, '\0', sizeof(struct stat));
    37
    38      /* always stat, in case stb is used by higher level code. */
    39      if (fstat(fd, stb) == -1)
    40          fatal("can't stat fd %d (%s)", fd, strerror(errno));
    41
    42      if (first) {
    43          first = FALSE;
    44
    45          if ((val = getenv("AWKBUFSIZE")) != NULL) {
    46              if (strcmp(val, "exact") == 0)
    47                  exact = TRUE;
    48              else if (ISDIGIT(*val)) {
    49                  for (; *val && ISDIGIT(*val); val++)
    50                      env_val = (env_val * 10) + *val - '0';
    51
    52                  return env_val;
    53              }
    54          }
    55      } else if (! exact && env_val > 0)
    56          return env_val;
    57      /* else
    58          fall through */
    59
    60      /*
    61       * System V.n, n < 4, doesn't have the file system block size in the
    62       * stat structure. So we have to make some sort of reasonable
    63       * guess. We use stdio's BUFSIZ, since that is what it was
    64       * meant for in the first place.
    65       */
    66  #ifdef HAVE_ST_BLKSIZE
    67  #define DEFBLKSIZE  (stb->st_blksize > 0 ? stb->st_blksize : BUFSIZ)
    68  #else
    69  #define DEFBLKSIZE  BUFSIZ
    70  #endif
    71
    72      if (S_ISREG(stb->st_mode)               /* regular file */
    73          && 0 < stb->st_size                 /* non-zero size */
    74          && (stb->st_size < DEFBLKSIZE       /* small file */
    75              || exact))                      /* or debugging */
    76          return stb->st_size;                /* use file size */
    77
    78      return DEFBLKSIZE;
    79  }
The comment on lines 3-23 explains the algorithm. Since searching the environment
can be expensive and it only needs to be done once, the function uses several static
variables to collect the appropriate information the first time.
Lines 42-54 execute the first time the function is called, and only the first time. Line
43 enforces this condition by setting first to false. Lines 45-54 handle the environment
variable, looking for either "exact" or a number. In the latter case, it converts the
string value to decimal, saving it in env_val. (We probably should have used strtoul()
here; it didn't occur to us at the time.)
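For comparison, here is one way the strtoul() approach might look. It is a sketch, not the code gawk uses: parse_bufsize_setting() and its interface are hypothetical. Note one deliberate difference from the loop on lines 49-50: a value with trailing non-digits, such as "42abc", is rejected here rather than read as 42.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * parse_bufsize_setting --- interpret a setting such as AWKBUFSIZE.
 * Returns 1 and sets *exact for "exact", returns 1 and sets *size for
 * a positive decimal number, and returns 0 for anything else.
 * (Hypothetical helper; not the function gawk actually uses.)
 */
int
parse_bufsize_setting(const char *val, size_t *size, int *exact)
{
    char *end;
    unsigned long n;

    *size = 0;
    *exact = 0;
    if (val == NULL)
        return 0;           /* variable not set */
    if (strcmp(val, "exact") == 0) {
        *exact = 1;
        return 1;
    }
    if (val[0] < '0' || val[0] > '9')
        return 0;           /* reject signs, spaces, empty string */

    errno = 0;
    n = strtoul(val, &end, 10);
    if (errno == ERANGE || *end != '\0' || n == 0)
        return 0;           /* overflow, trailing junk, or zero */

    *size = (size_t) n;
    return 1;
}
```

A caller would pass it the result of getenv("AWKBUFSIZE") once, on the first call, and cache the two outputs in static variables just as the function above does.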
Line 55 executes every time but the first. If a numeric value was given, the condition
will be true and that value is returned (line 56). Otherwise, it falls through to the rest
of the function.
Lines 60-70 define DEFBLKSIZE; this part has not changed. Finally, lines 72-76
return the file size if appropriate. If not (line 78), DEFBLKSIZE is returned.
We did fix the problem,13 but in the meantime, we left the new version of
optimal_bufsize() in place, so that we could be sure the problem hasn't reoccurred.
The marginal increase in code size and complexity is more than offset by the increased
flexibility we now have for testing. Furthermore, since this is production code, it's easy
to have a user in the field use this feature for testing, to determine if a similar problem
has occurred. (So far, we haven't had to ask for a test, but it's nice to know that we
could handle it if we had to.)
13 By rewriting the buffer management code!
15.4.2.3 Add Logging Code

It is often the case that your application program is running on a system on which
you can't use a debugger (such as at a customer site). In that case, your goal is to be
able to examine the program's internal state, but from the outside. The only way to do
that is to have the program itself produce this information for you.
There are multiple ways to do this:
• Always log information to a specific file. This is simplest: The program always
writes logging information. You can then look at the file at your convenience.
The disadvantage is that at some point the log file will consume all available disk
space. Therefore, you should have multiple log files, with your program switching
to a new one periodically.
Brian Kernighan recommends naming the log files by day of the week:
myapp.log.sun, myapp.log.mon, and so on. The advantage here is that you
don't have to manually move old files out of the way; you get a week's worth of
log files for free.
• Write to a log file only if it already exists. When your program starts up, if the log
file exists, it writes information to the log. Otherwise, it doesn't. To enable logging,
first create an empty log file.
• Use a fixed format for messages, one that can be easily parsed by scripting languages
such as awk or Perl, for summary and report generation.
• Alternatively, generate some form of XML, which is self-describing, and possibly
convertible to other formats. (We're not big fans of XML, but you shouldn't let
that stop you.)
• Use syslog() to do logging; the final disposition of logging messages can be
controlled by the system administrator. (syslog() is a fairly advanced interface;
see the syslog(3) manpage.)
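The day-of-the-week naming scheme from the first item above is easy to sketch. The helper below is hypothetical (log_name_for_day() is not from any library); it only builds the name, and it takes the weekday as a parameter so that it can be exercised without depending on the current date.

```c
#include <stdio.h>

/*
 * log_name_for_day --- build "BASE.log.DAY" for a weekday number
 * (0 = Sunday ... 6 = Saturday, as in struct tm's tm_wday).
 * Returns 0 on success, -1 on a bad day or a too-small buffer.
 */
int
log_name_for_day(const char *base, int wday, char *buf, size_t len)
{
    static const char *const days[] = {
        "sun", "mon", "tue", "wed", "thu", "fri", "sat"
    };
    int n;

    if (wday < 0 || wday > 6)
        return -1;
    n = snprintf(buf, len, "%s.log.%s", base, days[wday]);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;   /* truncated? */
}
```

In a program, the weekday would typically come from the tm_wday member filled in by localtime(). Whether to truncate a file left over from the previous week or append to it is a policy decision this sketch leaves to the caller.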
Choosing how to log information is, of course, the easy part. The hard part is
choosing what to log. As with all parts of program development, it pays to think before
you code. Log information about critical variables. Check their values to make sure
they're in range or are otherwise what you expect. Log exceptional conditions; if
something occurs that shouldn't, log it, and if possible, keep going.
The key is to log only the information you need to track down problems, no more
and no less.
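As a small illustration of logging a critical variable only when it is out of range, consider the following sketch. The function name and the "RANGE name=... value=..." line layout are assumptions made for the example; any fixed format that a script can parse would serve equally well.

```c
#include <stdio.h>

/*
 * log_if_out_of_range --- write one fixed-format line to logfp when val
 * lies outside [lo, hi]. Returns 1 if a line was written, 0 otherwise.
 * The line layout is an arbitrary choice for this sketch.
 */
int
log_if_out_of_range(FILE *logfp, const char *name, long val, long lo, long hi)
{
    if (val >= lo && val <= hi)
        return 0;           /* expected value: stay quiet */

    fprintf(logfp, "RANGE name=%s value=%ld min=%ld max=%ld\n",
            name, val, lo, hi);
    fflush(logfp);          /* push the line out in case of a later crash */
    return 1;
}
```

Staying silent for in-range values keeps the log small, which fits the advice to log no more than is needed to track down problems.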
15.4.2.4 Runtime Debugging Files

In a previous life, we worked for a startup company with binary executables of the
product installed at customer sites. It wasn't possible to attach a debugger to a running
copy of the program or to run it from a debugger on the customer's system. The main
component of the product was not started directly from a command line, but indirectly,
through shell scripts that did considerable initial setup.
To make the program start producing logging information, we came up with the
idea of special debugging files. When a file of a certain name existed in a certain
directory, the program would produce informational messages to a log file that we could
then download and analyze. Such code looks like this:
    struct stat sbuf;
    extern int do_logging;    /* initialized to zero */

    if (stat("/path/to/magic/.file", &sbuf) == 0)
        do_logging = TRUE;

    if (do_logging) {
        logging code here: open file, write info, close file, etc.
    }
The call to s ta t ( ) happened for each job the program processed. Thus, we could
dynamically enable and disable logging without having to stop and restart the
application!
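A per-job check of this kind might be packaged as follows. This is a sketch with hypothetical names (logging_enabled(), log_job()) and caller-supplied paths, not the code from the product described. It recomputes the answer on every call instead of latching a do_logging variable, so removing the trigger file turns logging off again.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * logging_enabled --- return 1 if the trigger file exists right now.
 * Calling this once per job lets logging be switched on by creating
 * the file and off again by removing it.
 */
int
logging_enabled(const char *trigger_path)
{
    struct stat sbuf;

    return stat(trigger_path, &sbuf) == 0;
}

/* log_job --- append one line about a job, but only while the trigger exists */
int
log_job(const char *trigger_path, const char *log_path, int job_id)
{
    FILE *fp;

    if (! logging_enabled(trigger_path))
        return 0;           /* logging is off: do nothing */
    if ((fp = fopen(log_path, "a")) == NULL)
        return -1;          /* could not open the log */
    fprintf(fp, "job %d processed\n", job_id);
    return fclose(fp) == 0 ? 1 : -1;
}
```

The price is one stat() call per job, which is usually negligible next to the work the job itself does.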
As with debugging options and variables, there are any number of variations on this
theme: different files that enable logging of information about different subsystems,
debugging directives added into the debugging file itself, and so on. As with all features,
you should plan a design for what you will need and then implement it cleanly instead
of hacking out some quick and dirty code at 3:00 A.M. (a not uncommon possibility in
startup companies, unfortunately).
NOTE All that glitters is not gold. Special debugging files are but one example
of techniques known as back doors: one or more ways for developers to do
undocumented things with a program, usually for nefarious purposes. In our
instance, the back door was entirely benign. But an unscrupulous developer
could just as easily arrange to generate and download a hidden copy of a
customer list, personnel file, or other sensitive data. For this reason alone, you
should think extra hard about whether this technique is usable in your
application.
15.4.2.5 Add Special Hooks for Breakpoints

Often, a problem may be reproducible, but only after your program has first processed
many megabytes of input data. Or, while you may know in which function your program
is failing, the failure occurs only after the function has been called many hundreds, or
even thousands, of times.
This is a big problem when you're working in a debugger. If you set a breakpoint in
the failing routine, you have to type the continue command and press ENTER hundreds
or thousands of times to get your program into the state where it's about to fail.
This is tedious and error prone, to say the least! It may even be so difficult to do that
you'll want to give up before starting.
The solution is to add special debugging "hook" functions that your program can
call when it is close to the state you're interested in.
For example, suppose that you know that the check_salary() function is the one
that fails, but only when it's been called 1,427 times. (We kid you not; we've seen some
rather strange things in our time.)
To catch check_salary() before it fails, create a special dummy function that does
nothing but return, and then arrange for check_salary() to call it just before the
1,427th time that it itself is called:
/* debug_dummy --- debugging hook function */
void debug_dummy(void) { return; }

struct salary *check_salary(void)
{
    ... real variable declarations here ...
    static int count = 0;    /* for debugging */

    if (++count == 1426)
        debug_dummy();

    ... rest of the code here ...
Now, from within GDB, set a breakpoint in debug_dummy(), and then run the
program normally:
(gdb) break debug_dummy                     Set breakpoint for dummy function
Breakpoint 1 at 0x8055885: file whizprog.c, line 3137.
(gdb) run                                   Start program running
Once the breakpoint for debug_dummy() is reached, you can set a second breakpoint
for check_salary() and then continue execution:
(gdb) run                                   Start program running
Starting program: /home/arnold/whizprog

Breakpoint 1, debug_dummy () at whizprog.c, line 3137
3137    void debug_dummy(void) { return; }  Breakpoint reached
(gdb) break check_salary                    Set breakpoint for function of interest
Breakpoint 2 at 0x8057913: file whizprog.c, line 3140.
(gdb) cont
When the second breakpoint is reached, the program is about to fail and you can
single-step through it, doing whatever is necessary to track down the problem.
Instead of using a fixed constant ('++count == 1426'), you may wish to have a
global variable that can be set by the debugger to whatever value you need. This avoids
the need to recompile the program.
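A minimal sketch of the global-variable variation follows. The variable name debug_stop_count is our own choice, check_salary() is reduced to a placeholder, and the hook increments a counter (hook_calls, also hypothetical) only so that the sketch's behavior can be checked; a real hook would simply return.

```c
#include <assert.h>

/* Set from the debugger before the interesting call, for example with
   GDB's "set var debug_stop_count = 1426". Zero means "never stop". */
volatile int debug_stop_count = 0;

static int hook_calls = 0;      /* only so the sketch can be checked */

/* debug_dummy --- debugging hook function; set a breakpoint here */
void debug_dummy(void) { hook_calls++; }

/* check_salary --- placeholder for the real function of interest */
int check_salary(int amount)
{
    static int count = 0;       /* for debugging */

    if (++count == debug_stop_count)
        debug_dummy();

    return amount >= 0;         /* stand-in for the real work */
}
```

Because count starts at zero and is incremented before the comparison, leaving debug_stop_count at zero keeps the hook from ever firing during normal runs.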
For gawk, we have gone a step further and brought the debugging hook facility into
the language, so the hook function can be called from the awk program. When compiled
for debugging, a special do-nothing function named stopme() is available. This function
in turn calls a C function of the same name. This allows us to put calls to stopme()
into a failing awk program right before things go wrong. For example, if gawk is
producing bad results for an awk program on the 1,200th input record, we can add a line
like this to the awk program:
NR == 1198 { stopme() }    # Stop for debugging when Number of Records == 1198
... rest of awk program as before ...

Then, from within GDB, we can set a breakpoint on the C function stopme() and
run the awk program. Once that breakpoint fires, we can then set breakpoints on the
other parts of gawk where we suspect the real problem lies.
The hook-function technique is useful in and of itself. However, the ability to bring
it to the application level multiplies its usefulness, and it has saved us untold hours of
debugging time when tracking down obscure problems.
15.5 Debugging Tools

Besides GDB and whatever source code hooks you use for general debugging, there
are a number of useful packages that can help find different kinds of problems. Because
dynamic memory management is such a difficult task in large-scale programs, many
tools focus on that area, often acting as drop-in replacements for malloc() and free().
There are commercial tools that do many (or all) of the same things as the programs
we describe, but not all of them are available for GNU/Linux, and many are quite
expensive. All of the packages discussed in this section are freely available.
15.5.1 The dbug Library - A Sophisticated printf()

The first package we examine is the dbug library. It is based on the idea of
conditionally compiled debugging code we presented earlier in this chapter, but carries things
much further, providing relatively sophisticated runtime tracing and conditional debug
output. It implements many of the tips we described, saving you the trouble of
implementing them yourself.
The dbug library, written by Fred Fish in the early 1980s, has seen modest
enhancements since then. It is now explicitly in the public domain, so it can be used in both
free and proprietary software, without problems. It is available from Fred Fish's FTP
archive,14 as both a compressed tar file and as a ZIP archive. The documentation
summarizes dbug well:
dbug is an example of an internal debugger. Because it requires internal in-
strumentation of a program, and its usage does not depend on any special
capabilities of the execution environment, it is always available and will exe-
cute in any environment that the program itself will execute in. In addition,
since it is a complete package with a specific user interface, all programs
which use it will be provided with similar debugging capabilities. This is in
sharp contrast to other forms of internal instrumentation where each devel-
oper has their own, usually less capable, form of internal debugger. ..
The dbug package imposes only a slight speed penalty on executing programs,
typically much less than 10 percent, and a modest size penalty, typically 10
to 20 percent. By defining a specific C preprocessor symbol both of these
can be reduced to zero with no changes required to the source code.

The following list is a quick summary of the capabilities of the dbug package.
Each capability can be individually enabled or disabled at the time a program
is invoked by specifying the appropriate command line arguments.
• Execution trace showing function level control flow in a semi-graphical
manner using indentation to indicate nesting depth.
• Output the values of all, or any subset of, key internal variables.
14 ftp://ftp.ninemoons.com/pub/dbug/

• Limit actions to a specific set of named functions.

• Limit function trace to a specified nesting depth.
• Label each output line with source file name and line number.
• Label each output line with name of current process.
• Push or pop internal debugging state to allow execution with built-in
debugging defaults.
• Redirect the debug output stream to standard output (stdout) or a named
file. The default output stream is standard error (stderr). The redirection
mechanism is completely independent of normal command line redirection
to avoid output conflicts.
The dbug package requires you to use a certain discipline when writing your code.
In particular, you have to use its macros when doing a function return or calling
setjmp() and longjmp(). You have to add a single macro call as the first executable
statement of each function and call a few extra macros from main(). Finally, you have
to add a debugging command-line option: By convention this is -#, which is rarely, if
ever, used as a real option. In return for the extra work, you get all the benefits just
outlined. Let's look at the example in the manual:
 1  #include <stdio.h>
 2  #include "dbug.h"
 3
 4  int
 5  main (argc, argv)
 6  int argc;
 7  char *argv[];
 8  {
 9    register int result, ix;
10    extern int factorial (), atoi ();
11
12    DBUG_ENTER ("main");
13    DBUG_PROCESS (argv[0]);
14    DBUG_PUSH_ENV ("DBUG");
15    for (ix = 1; ix < argc && argv[ix][0] == '-'; ix++) {
16      switch (argv[ix][1]) {
17      case '#':
18        DBUG_PUSH (&(argv[ix][2]));
19        break;
20      }
21    }
22    for (; ix < argc; ix++) {
23      DBUG_PRINT ("args", ("argv[%d] = %s", ix, argv[ix]));
24      result = factorial (atoi (argv[ix]));
25      printf ("%d\n", result);
26      fflush (stdout);
27    }
28    DBUG_RETURN (0);
29  }
This program illustrates most of the salient points. The DBUG_ENTER() macro
(line 12) must be called after any variable declarations and before any other code. (This is
because it declares some private variables of its own.15)
The DBUG_PROCESS() macro (line 13) sets the name of the program, primarily for
use in output messages from the library. This macro should be called only once,
from main().
The DBUG_PUSH_ENV() macro (line 14) causes the library to look at the named
environment variable (DBUG in this case) for control strings. (The dbug control strings
are discussed shortly.) The library is capable of saving its current state and using a new
one, creating a stack of saved states. Thus, this macro pushes the state obtained from
the given environment variable onto the stack of saved states. As used in this example,
the macro creates the initial state. If there is no such environment variable, nothing
happens. (As an aside, DBUG is rather generic; perhaps something like GAWK_DBUG [for
gawk] would be better.)
The DBUG_PUSH() macro (line 18) passes in the control string value obtained from
the -# command-line option. (New code should use getopt() or getopt_long()
instead of manual argument parsing.) This is normally how debugging is enabled, but
using an environment variable as well provides additional flexibility.
The DBUG_PRINT () macro (line 23) is what produces output. The second argument
uses the technique we described earlier (see Section 15.4.1.1, "Use Debugging Macros,"
page 577) of enclosing the entire printf () argument list in parentheses, making it a
single argument as far as the C preprocessor is concerned. Note that a terminating
newline character is not provided in the format string; the dbug library provides the
newline for you.
15 C99, which allows variable declarations mixed with executable code, makes this less of a problem, but remember
that this package was designed for K&R C.
When printing, by default, the dbug library outputs all DBUG_PRINT() statements.
The first argument is a string that can be used to limit the output just to DBUG_PRINT()
macros using that string.
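To make the two ideas concrete (the keyword that selects messages, and the parenthesized argument list), here is a small sketch. It is emphatically not the dbug library's implementation: MY_DBUG_PRINT(), dbg_selected(), dbg_printf(), and the single enabled_keyword variable are all hypothetical, and we assume only one keyword can be enabled at a time.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* All names here are hypothetical; this is not the dbug library's code. */
static const char *enabled_keyword = NULL;  /* NULL: print every message */
static int messages_printed = 0;            /* only so the sketch can be checked */

/* dbg_selected --- is output for this keyword currently enabled? */
static int dbg_selected(const char *keyword)
{
    assert(keyword != NULL);
    return enabled_keyword == NULL || strcmp(enabled_keyword, keyword) == 0;
}

/* dbg_printf --- printf-style output to stderr; the newline is supplied here */
static void dbg_printf(const char *format, ...)
{
    va_list args;

    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    fputc('\n', stderr);
    messages_printed++;
}

/* The extra parentheses in the call make the whole printf() argument list
   a single macro argument, which is pasted after the function name. */
#define MY_DBUG_PRINT(keyword, arglist) \
    do { \
        if (dbg_selected(keyword)) \
            dbg_printf arglist; \
    } while (0)
```

A call looks like 'MY_DBUG_PRINT("mem", ("allocated %d bytes", 30));'. Setting enabled_keyword to "mem" silences every call whose first argument is a different string.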
Finally, the DBUG_RETURN() macro (line 28) is used instead of a regular return
statement to return a value. There is a corresponding DBUG_VOID_RETURN macro for
use in void functions.
The rest of the program is completed with the factorial() function:
#include <stdio.h>
#include "dbug.h"

int factorial (value)
register int value;
{
  DBUG_ENTER ("factorial");
  DBUG_PRINT ("find", ("find %d factorial", value));
  if (value > 1) {
    value *= factorial (value - 1);
  }
  DBUG_PRINT ("result", ("result is %d", value));
  DBUG_RETURN (value);
}
Once the program is compiled and linked with the dbug library, it can be run
normally. By default, the program produces no debugging output. With debugging enabled,
though, different kinds of output are available:
$ factorial 1 2 3                    Regular run, no debugging
1
2
6
$ factorial -#t 1 2 3                Show function call trace, note nesting
|   >factorial
|   <factorial
1                                    Regular output is on stdout
|   >factorial
|   |   >factorial
|   |   <factorial                   Debugging output is on stderr
|   <factorial
2
|   >factorial
|   |   >factorial
|   |   |   >factorial
|   |   |   <factorial
|   |   <factorial
|   <factorial
6
<?func?
$ factorial -#d 1 2                  Show debugging messages from DBUG_PRINT()
?func?: args: argv[2] = 1
factorial: find: find 1 factorial
factorial: result: result is 1
1
?func?: args: argv[3] = 2
factorial: find: find 2 factorial
factorial: find: find 1 factorial
factorial: result: result is 1
factorial: result: result is 2
2
The -# option controls the dbug library. It is "special" in the sense that DBUG_PUSH()
will accept the entire string, ignoring the leading '-#' characters, although you could
use a different option if you wish, passing DBUG_PUSH() just the option argument
string (this is optarg if you use getopt()).
The control string consists of a set of options and arguments. Each group of options
and arguments is separated from the others by a colon character. Each option is a single
letter, and the arguments to that option are separated from it by commas. For example:
$ myprog -#d,mem,ipc:f,check_salary,check_start_date -f infile -o outfile
The d option enables DBUG_PRINT() output, but only if the first argument string
is one of "mem" or "ipc". (With no arguments, all DBUG_PRINT() messages are
printed.) Similarly, the f option limits the function call trace to just the named functions:
check_salary() and check_start_date().
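The control string format is simple enough to scan by hand. The following sketch shows one way to ask whether a given option selects a given name. It is our own illustration, not code from the dbug library: find_option() and option_selects() are hypothetical, and we assume that arguments never contain a colon or a comma.

```c
#include <assert.h>
#include <string.h>

/* find_option --- return a pointer just past the letter of option group
   `opt`, or NULL if the control string has no such group. (Hypothetical.) */
static const char *find_option(const char *control, char opt)
{
    const char *group = control;

    assert(control != NULL);
    while (*group != '\0') {
        if (group[0] == opt
            && (group[1] == ',' || group[1] == ':' || group[1] == '\0'))
            return group + 1;
        group = strchr(group, ':');     /* skip ahead to the next group */
        if (group == NULL)
            break;
        group++;
    }
    return NULL;
}

/* option_selects --- 1 if option `opt` is present and either has no
   arguments (meaning "everything") or lists `name` among them */
int option_selects(const char *control, char opt, const char *name)
{
    const char *p = find_option(control, opt);
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (p == NULL)
        return 0;                       /* option not given at all */
    if (*p != ',')
        return 1;                       /* no argument list: all selected */

    while (*p == ',') {
        const char *arg = p + 1;
        size_t arglen = strcspn(arg, ",:");

        if (arglen == len && strncmp(arg, name, len) == 0)
            return 1;
        p = arg + arglen;               /* now at ',', ':', or the end */
    }
    return 0;
}
```

With the control string from the example, 'option_selects(ctl, 'd', "mem")' is true and 'option_selects(ctl, 'd', "net")' is false, while a bare "d" selects every keyword.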
The following list of options and arguments is reproduced from the dbug library
manual. Square brackets enclose optional arguments. We include here only the ones
we find useful; see the documentation for the full list.
d[,keywords]
    Enable output from macros with specified keywords. A null list of keywords
    implies that all keywords are selected.
F
    Mark each debugger output line with the name of the source file containing the
    macro causing the output.
i
    Identify the process emitting each line of debug or trace output with the process
    ID for that process.
L
    Mark each debugger output line with the source-file line number of the macro
    causing the output.
o[,file]
    Redirect the debugger output stream to the specified file. The default output
    stream is stderr. A null argument list causes output to be redirected to stdout.
t[,N]
    Enable function control flow tracing. The maximum nesting depth is specified by
    N, and defaults to 200.
To round out the discussion, here are the rest of the macros defined by the dbug
library.
DBUG_EXECUTE(string, code)
    This macro is similar to DBUG_PRINT(): The first argument is a string selected
    with the d option, and the second is code to execute:
    DBUG_EXECUTE("abort", abort());
DBUG_FILE
    This is a value of type FILE *, for use with the <stdio.h> routines. It allows
    you to do your own output to the debugging file stream.
DBUG_LONGJMP(jmp_buf env, int val)
    This macro wraps a call to longjmp(), taking the same arguments, so that the
    dbug library will know when you've made a nonlocal jump.
DBUG_POP()
    This macro pops one level of saved debugging state, as created by DBUG_PUSH().
    It is rather esoteric; you probably won't use it.
DBUG_SETJMP(jmp_buf env)
    This macro wraps a call to setjmp(), taking the same argument. It allows the
    dbug library to handle nonlocal jumps.
In a different incarnation, at the first startup company we worked for,16 we used the
dbug library in our product. It was invaluable during development, and by omitting
the -DDBUG on the final build, we were able to build a production version, with no
other source code changes.
To get the most benefit out of the dbug library, you must use it consistently,
throughout your program. This is easier if you use it from the beginning of a project,
but as an experiment, we found that with the aid of a simple awk script, we could
incorporate the library into a 30,000 line program with a few hours' work. If you can afford
the overhead, it's best to leave it in the production build of your program so that you
can debug with it without first having to recompile.
We find that the dbug library is a nice complement to external debuggers such as
GDB; it provides an organized and consistent way to apply instrumentation to C code.
It also rather nicely combines many of the techniques that we outlined separately, ear-
lier in the chapter. The dynamic function call trace feature is particularly useful, and
it proves invaluable for help in learning about a program's behavior if you're unfamiliar
with it.
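To give a feel for how an indented call trace can be produced by entry and return macros, here is a deliberately tiny sketch. TRACE_ENTER() and TRACE_RETURN() are hypothetical names of our own; the dbug library's real macros do considerably more (keyword selection, depth limits, process names, and so on), and we make no claim that they are implemented this way.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names throughout; the real dbug macros do much more. */
static int trace_depth = 0;     /* current nesting depth */
static int trace_max_depth = 0; /* deepest nesting seen, for checking */

/* trace_line --- print one indented trace line on stderr */
static void trace_line(char direction, const char *func)
{
    int i;

    assert(func != NULL);
    for (i = 0; i < trace_depth; i++)
        fputs("|   ", stderr);
    fprintf(stderr, "%c%s\n", direction, func);
}

/* Like DBUG_ENTER(), this declares a private variable, so it belongs
   right after the function's own declarations. */
#define TRACE_ENTER(name) \
    const char *trace_func_ = (name); \
    trace_line('>', trace_func_); \
    if (++trace_depth > trace_max_depth) \
        trace_max_depth = trace_depth

/* Used in place of a plain return statement. */
#define TRACE_RETURN(value) \
    do { \
        trace_depth--; \
        trace_line('<', trace_func_); \
        return (value); \
    } while (0)

int factorial(int value)
{
    TRACE_ENTER("factorial");
    if (value > 1)
        value *= factorial(value - 1);
    TRACE_RETURN(value);
}
```

The discipline is the same as with dbug: if any function returns without going through the macro, the depth counter and the indentation drift out of step.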
15.5.2 Memory Allocation Debuggers

Ignoring issues such as poor program design, for any large-scale, practical application,
the C programmer's single biggest challenge is dynamic memory management (by
malloc(), realloc(), and free()).
This fact is borne out by the large number of tools that are available for debugging
dynamic memory. There is a fair amount of overlap in what these tools provide. For
example:
• Memory leak detection: memory that is allocated and then becomes unreachable.
• Unfreed memory detection: memory that is allocated but never freed. Never-freed
memory isn't always a bug, but detecting such occurrences allows you to verify
that they're indeed OK.
• Detection of bad frees: memory that is freed twice, or pointers passed to free()
that didn't come from malloc().
16 Although we should have learned our lesson after the first one, we went to a second one. Since then we've figured
it out and generally avoid startup companies. Your mileage may vary, of course.
• Detection of use of already freed memory: memory that is freed is being used
through a dangling pointer.
• Memory overrun detection: accessing or storing into memory outside the bounds
of what was allocated.
• Warning about the use of uninitialized memory. (Many compilers can warn
about this.)
• Dynamic function tracing: When a bad memory access occurs, you get a traceback
from where the memory is used to where it was allocated.
• Tool control through the use of environment variables.
• Log files for raw debugging information that can be postprocessed to produce
useful reports.
Some tools merely log these events. Others arrange for the application program to
die a horrible death (through SIGSEGV) so that the offending code can be pinpointed
from within a debugger. Additionally, most are designed to work well with GDB.
Some tools require source code modification, such as calling special functions, or
using a special header file, extra #defines, and a static library. Others work by using
special Linux/Unix shared library mechanisms to transparently install themselves as
replacements for the standard library versions of malloc() and free().
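The "special header file and extra #defines" approach can be sketched in a few lines. Everything here is hypothetical (dbg_malloc(), dbg_free(), the fixed-size table, the counters); it does not reproduce any particular tool. We assume a small program, so a linear search of a 1,024-entry table is good enough, and allocations beyond the table's capacity simply go untracked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tracking table; real tools use far more robust bookkeeping. */
#define MAX_TRACKED 1024

static struct {
    void *ptr;
    const char *file;
    int line;
} tracked[MAX_TRACKED];

static int outstanding = 0;     /* allocated but not yet freed */
static int bad_frees = 0;       /* frees of pointers we never handed out */

/* dbg_malloc --- allocate memory and remember where it was requested */
void *dbg_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    int i;

    if (p == NULL)
        return NULL;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].ptr == NULL) {
            tracked[i].ptr = p;
            tracked[i].file = file;
            tracked[i].line = line;
            outstanding++;
            break;
        }
    }
    return p;
}

/* dbg_free --- free memory only if we know about it; complain otherwise */
void dbg_free(void *p, const char *file, int line)
{
    int i;

    if (p == NULL)
        return;                 /* free(NULL) is valid and does nothing */
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].ptr == p) {
            tracked[i].ptr = NULL;
            outstanding--;
            free(p);
            return;
        }
    }
    bad_frees++;
    fprintf(stderr, "%s:%d: bad free of %p\n", file, line, p);
}

/* Normally these two lines would live in a header that every source
   file includes after <stdlib.h>. */
#define malloc(size) dbg_malloc((size), __FILE__, __LINE__)
#define free(ptr)    dbg_free((ptr), __FILE__, __LINE__)
```

At exit, any table entries that are still in use identify never-freed memory along with the file and line that allocated it. The obvious cost is that every source file must be recompiled with the header in place.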
In this section we look at three dynamic memory debuggers, and then we provide
pointers to several others.
15.5.2.1 GNU/Linux mtrace

GNU/Linux systems using GLIBC provide two functions for enabling and disabling
memory tracing at runtime:
#include <mcheck.h>                                   GLIBC

void mtrace(void);
void muntrace(void);

When mtrace() is called, the library looks at the environment variable
MALLOC_TRACE. It is expected that this names a writable file (existing or not). The library
opens the file and begins logging information about memory allocations and frees. (No
logging is done if the file can't be opened. The file is truncated each time the program
runs.) When muntrace() is called, the library closes the file and does not log any further
allocations or frees.
The use of separate functions makes it possible to do memory tracing for specific
parts of the program; it's not necessary to trace everything. (We find it most useful to
enable logging at the start of the program and be done, but this design provides
flexibility, which is nice to have.)
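Here is a minimal sketch of tracing just one part of a program. The functions build_greeting() and traced_work() are hypothetical; only mtrace() and muntrace() come from GLIBC. We deliberately leak one allocation so that a trace would have something to report. Note that on some recent GLIBC releases the tracing may additionally need a separate malloc debugging library to be preloaded; check the mtrace(3) manpage on your system.

```c
#include <assert.h>
#include <mcheck.h>     /* GLIBC-specific */
#include <stdlib.h>
#include <string.h>

/* build_greeting --- allocate a string that the caller must free */
char *build_greeting(const char *name)
{
    static const char prefix[] = "hello, ";
    char *result;

    assert(name != NULL);
    result = malloc(strlen(prefix) + strlen(name) + 1);
    if (result != NULL) {
        strcpy(result, prefix);
        strcat(result, name);
    }
    return result;
}

/* traced_work --- trace only the part of the program we care about */
int traced_work(void)
{
    char *kept, *leaked;

    mtrace();           /* start logging if MALLOC_TRACE names a usable file */

    kept = build_greeting("world");
    leaked = build_greeting("leak");    /* deliberately never freed */
    free(kept);

    muntrace();         /* stop logging; later allocations are not recorded */

    return leaked != NULL;
}
```

If MALLOC_TRACE isn't set, both calls do nothing, so the instrumentation can stay in the code and be switched on from the environment only when it's needed.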
Once the application program exits, you use the mtrace program to analyze the log
file. (The log file is ASCII, but the information isn't directly usable.) For example, gawk
turns on tracing if TIDYMEM is defined:
$ export TIDYMEM=1 MALLOC_TRACE=mtrace.out      Export environment variables
$ ./gawk 'BEGIN {print "hello, world"}'         Run the program
hello, world
$ mtrace ./gawk mtrace.out                      Generate report

Memory not freed:
   Address     Size     Caller
0x08085858     0x20  at /home/arnold/Gnu/gawk/gawk-3.1.3/main.c:1102
0x08085880    0xc80  at /home/arnold/Gnu/gawk/gawk-3.1.3/node.c:398
0x08086508      0x2  at /home/arnold/Gnu/gawk/gawk-3.1.3/node.c:337
0x08086518      0x6  at /home/arnold/Gnu/gawk/gawk-3.1.3/node.c:337
0x08086528     0x10  at /home/arnold/Gnu/gawk/gawk-3.1.3/eval.c:2082
0x08086550      0x3  at /home/arnold/Gnu/gawk/gawk-3.1.3/node.c:337
0x08086560      0x3  at /home/arnold/Gnu/gawk/gawk-3.1.3/node.c:337
0x080865e0      0x4  at /home/arnold/Gnu/gawk/gawk-3.1.3/field.c:76
0x08086670     0x78  at /home/arnold/Gnu/gawk/gawk-3.1.3/awkgram.y:1369
0x08086700      0xe  at /home/arnold/Gnu/gawk/gawk-3.1.3/node.c:337
0x08086718     0x1f  at /home/arnold/Gnu/gawk/gawk-3.1.3/awkgram.y:1259
The output is a list of locations at which gawk allocates memory that is never freed.
Note that permanently hanging onto dynamic memory is fine if it's done on purpose.
All the cases shown here are allocations of that sort.
15.5.2.2 Electric Fence

In Section 3.1, "Linux/Unix Address Space," page 52, we described how dynamic
memory comes from the heap, which can be made to grow and shrink (with the brk()
or sbrk() calls, described in Section 3.2.3, "System Calls: brk() and sbrk(),"
page 75).
Well, the picture we presented there is a simplified version of reality. More advanced
system calls (not covered in this volume) make it possible to add additional, not
necessarily contiguous, segments of memory into a process's address space. Many malloc()
debuggers work by using these system calls to add a new piece of address space for every
allocation. The advantage of this scheme is that the operating system and the computer's
memory-protection hardware cooperate to make access to memory outside these
discontiguous segments invalid, generating a SIGSEGV signal. The scheme is depicted in
Figure 15.1.
FIGURE 15.1
Linux/Unix process address space, including special areas
(From high address to low: the stack segment, holding the program stack, which grows
downward; malloc'd memory regions separated by "Hole - Don't touch here!" areas; the
data segment, holding the heap, which grows upward, then BSS and data; and the text
segment, holding the shared executable code.)
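The scheme can be demonstrated with the POSIX mmap() and mprotect() calls. What follows is a sketch under our own assumptions, not the code of any actual malloc() debugger: fenced_alloc() and overrun_is_caught() are hypothetical names, the returned pointer isn't guaranteed to be suitably aligned for every type, and nothing ever unmaps the memory. The check runs the overrun in a child process so that the SIGSEGV doesn't kill the caller.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS with strict compiler modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fenced_alloc --- return `size` bytes that end right at an inaccessible page */
void *fenced_alloc(size_t size)
{
    size_t pagesize = (size_t) sysconf(_SC_PAGESIZE);
    size_t data_pages;
    char *base;

    assert(size > 0);
    data_pages = (size + pagesize - 1) / pagesize;

    /* Map the data pages plus one extra page to serve as the fence. */
    base = mmap(NULL, (data_pages + 1) * pagesize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Make the last page untouchable: any access raises SIGSEGV. */
    if (mprotect(base + data_pages * pagesize, pagesize, PROT_NONE) != 0)
        return NULL;

    /* Place the block so that its last byte sits just below the fence. */
    return base + data_pages * pagesize - size;
}

/* overrun_is_caught --- in a child process, store one byte past the end of
   a fenced block; return 1 if the child was killed by SIGSEGV */
int overrun_is_caught(size_t size)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile char *p = fenced_alloc(size);

        if (p == NULL)
            _exit(2);
        p[size - 1] = 'a';      /* last valid byte: fine */
        p[size] = 'b';          /* one past the end: lands on the fence */
        _exit(0);               /* reached only if the overrun went unnoticed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

The price of this precision is memory: each allocation, however small, consumes at least two pages of address space, which is one reason such debuggers are not meant for production use.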
The first debugging package to implement this scheme was Electric Fence. Electric
Fence is a drop-in replacement for malloc() et al. It works on many Unix systems
and GNU/Linux; it is available from its author's FTP archive.17 Many GNU/Linux
distributions supply it, although you may have to choose it explicitly when you install
your system.
Once a program is linked with Electric Fence, any access that is out of bounds
generates a SIGSEGV. Electric Fence also catches attempts to use memory that has already
been freed. Here is a simple program that illustrates both problems:
 1  /* ch15-badmem1.c --- do bad things with memory */
 2
 3  #include <stdio.h>
 4  #include <stdlib.h>
 5  #include <string.h>    /* for strcpy() and strcmp() */
 6  int main(int argc, char **argv)
 7  {
 8      char *p;
 9      int i;
10
11      p = malloc(30);
12
13      strcpy(p, "not 30 bytes");
14      printf("p = <%s>\n", p);
15
16      if (argc == 2) {
17          if (strcmp(argv[1], "-b") == 0)
18              p[42] = 'a';    /* touch outside the bounds */
19          else if (strcmp(argv[1], "-f") == 0) {
20              free(p);        /* free memory and then use it */
21              p[0] = 'b';
22          }
23      }
24
25      /* free(p); */
26
27      return 0;
28  }
This program does simple command-line option checking to decide how to
misbehave: -b touches memory out of bounds, and -f attempts to use freed memory. (Lines
18 and 21 are the dangerous ones, respectively.) Note that with no options, the pointer
is never freed (line 25); Electric Fence doesn't catch this case.
One way to use Electric Fence, a way that is guaranteed to work across Unix and
GNU/Linux systems, is to statically link your program with it. The program should
then be run from the debugger. (The Electric Fence documentation is explicit that
17 ftp://ftp.perens.com/pub/ElectricFence
Electric Fence should not be linked with a production binary.) The following session
demonstrates this procedure and shows what happens for both command-line options:
$ cc -g ch15-badmem1.c -lefence -o ch15-badmem1     Compile; link statically
$ gdb ch15-badmem1                                  Run it from the debugger
GNU gdb 5.3
(gdb) run -b                                        Try -b option
Starting program: /home/arnold/progex/code/ch15/ch15-badmem1 -b
[New Thread 8192 (LWP 28021)]

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>
p = <not 30 bytes>

Program received signal SIGSEGV, Segmentation fault.    SIGSEGV: GDB prints where
[Switching to Thread 8192 (LWP 28021)]
0x080485b6 in main (argc=2, argv=0xbffff8a4) at ch15-badmem1.c:18
18              p[42] = 'a';    /* touch outside the bounds */
(gdb) run -f                                        Now try the -f option
The program being debugged has been started already.
Start it from the beginning? (y or n) y             Yes, really
Starting program: /home/arnold/progex/code/ch15/ch15-badmem1 -f
[New Thread 8192 (LWP 28024)]

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>
p = <not 30 bytes>

Program received signal SIGSEGV, Segmentation fault.    SIGSEGV again
[Switching to Thread 8192 (LWP 28024)]
0x080485e8 in main (argc=2, argv=0xbffff8a4) at ch15-badmem1.c:21
21              p[0] = 'b';
On systems that support shared libraries and the LD_PRELOAD environment variable
(including GNU/Linux), you don't need to explicitly link in the efence library. Instead,
the ef shell script arranges to run the program with the proper setup.
Although we haven't described the mechanisms in detail, GNU/Linux (and other
Unix systems) support shared libraries, special versions of the library routines that are
kept in a single file on disk instead of copied into every single executable program's
binary file. Shared libraries save some space on disk and can save system memory, since
all programs using a shared library use the same in-memory copy of the library. The
cost is that program startup is slower because the program and the shared library have
to be hooked together before the program can start running. (This is usually transparent
to you, the user.)
The LD_PRELOAD environment variable causes the system's program loader (which
brings executable files into memory) to link in a special library before the standard
libraries. The ef script uses this feature to link in Electric Fence's version of the malloc()
suite. Thus, relinking isn't even necessary. This example demonstrates ef:
$ cc -g ch15-badmem1.c -o ch15-badmem1          Compile normally
$ ef ch15-badmem1 -b                            Run using ef, dumps core

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>
p = <not 30 bytes>
/usr/bin/ef: line 20: 28005 Segmentation fault      (core dumped)
( export LD_PRELOAD=libefence.so.0.0; exec $* )
$ ef ch15-badmem1 -f                            Run using ef, dumps core again

  Electric Fence 2.2.0 Copyright (C) 1987-1999 Bruce Perens <bruce@perens.com>
p = <not 30 bytes>
/usr/bin/ef: line 20: 28007 Segmentation fault      (core dumped)
( export LD_PRELOAD=libefence.so.0.0; exec $* )
$ ls -l core*                                   Linux gives us separate core files
-rw-------    1 arnold   devel      217088 Aug 28 15:40 core.28005
-rw-------    1 arnold   devel      212992 Aug 28 15:40 core.28007
GNU/Linux creates core files that include the process ID number in the file name.
In this instance this behavior is useful because we can debug each core file separately:
$ gdb ch15-badmem1 core.28005                   From the -b option
GNU gdb 5.3
Core was generated by `ch15-badmem1 -b'.
Program terminated with signal 11, Segmentation fault.
#0  0x08048466 in main (argc=2, argv=0xbffff8c4) at ch15-badmem1.c:18
18              p[42] = 'a';    /* touch outside the bounds */
(gdb) quit
$ gdb ch15-badmem1 core.28007                   From the -f option
GNU gdb 5.3
Core was generated by `ch15-badmem1 -f'.
Program terminated with signal 11, Segmentation fault.
#0  0x08048498 in main (argc=2, argv=0xbffff8c4) at ch15-badmem1.c:21
21              p[0] = 'b';
The efence(3) manpage describes several environment variables that can be set to
tailor Electric Fence's behavior. The following three are the most notable.
EF_PROTECT_BELOW
    Setting this variable to 1 causes Electric Fence to look for underruns instead of
    overruns. An overrun, accessing memory beyond the allocated area, was
    demonstrated previously. An underrun is accessing memory located in front of the
    allocated area.
EF_PROTECT_FREE
    Setting this variable to 1 prevents Electric Fence from reusing memory that was
    correctly freed. This is helpful when you think a program may be accessing freed
    memory; if the freed memory was subsequently reallocated, access to it from a
    previously dangling pointer would otherwise go undetected.
EF_ALLOW_MALLOC_0
    When given a nonzero value, Electric Fence allows calls of 'malloc(0)'. Such
    calls are technically valid in Standard C, but likely represent a software bug. Thus,
    by default, Electric Fence disallows them.
In addition to environment variables, Electric Fence supplies similarly named global

variables. You can change their values from within a debugger, so you can dynamically
alter the behavior of a program that has already started running. See efence(3) for
the details.
15.5.2.3 Debugging Malloc: dmalloc

The dmalloc library provides a large number of debugging options. Its author is
Gray Watson, and it has its own web site.18 As with Electric Fence, it may already be
installed on your system, or you may have to retrieve it and build it yourself.
The dmalloc library examines the DMALLOC_OPTIONS environment variable for
control information. For example, it might look like this:
$ echo $DMALLOC_OPTIONS
debug=0x4e40503,inter=100,log=dm-log
The 'debug' part of this variable is a set of OR'd bit flags which is nearly impossible
for most people to manage directly. Therefore, the documentation describes a two-stage
process for making things easier to use.
The first step is to define a shell function named dmalloc that calls the dmalloc
driver program:
18 http://www.dmalloc.com

$ dmalloc () {
> eval `command dmalloc -b $*`      The 'command' command bypasses shell functions
> }
Once that's done, you can pass options to the function to set the log file (-l), specify
the number of iterations after which dmalloc should verify its internal data structures
(-i), and specify a debugging level or other tag ('low'):
$ dmalloc -l dm-log -i 100 low
Like Electric Fence, the dmalloc library can be statically linked into the application
or dynamically linked with LD_PRELOAD. The following example demonstrates the
latter:
$ LD_PRELOAD=libdmalloc.so ch15-badmem1 -b      Run with checking on
p = <not 30 bytes>                               Normal output shown
NOTE Do not use 'export LD_PRELOAD=libdmalloc.so'! If you do, every
program you run, such as ls, will run with malloc() checking turned on. Your
system will become unusable, quickly. If you do this by accident, you can use
'unset LD_PRELOAD' to restore normal behavior.
The results go into the dm-log file, as specified:

$ cat dm-log
1062078174: 1: Dmalloc version '4.8.1' from 'http://dmalloc.com/'
1062078174: 1: flags = 0x4e40503, logfile 'dm-log'
1062078174: 1: interval = 100, addr = 0, seen # = 0
1062078174: 1: starting time = 1062078174
1062078174: 1: free bucket count/bits: 63/6
1062078174: 1: basic-block 4096 bytes, alignment 8 bytes, heap grows up
1062078174: 1: heap: 0x804a000 to 0x804d000, size 12288 bytes (3 blocks)
1062078174: 1: heap checked 0
1062078174: 1: alloc calls: malloc 1, calloc 0, realloc 0, free 0
1062078174: 1: alloc calls: recalloc 0, memalign 0, valloc 0
1062078174: 1: total memory allocated: 30 bytes (1 pnts)
1062078174: 1: max in use at one time: 30 bytes (1 pnts)
1062078174: 1: max alloced with 1 call: 30 bytes
1062078174: 1: max alloc rounding loss: 34 bytes (53%)
1062078174: 1: max memory space wasted: 3998 bytes (98%)
1062078174: 1: final user memory space: basic 0, divided 1, 4062 bytes
1062078174: 1: final admin overhead: basic 1, divided 1, 8192 bytes (66%)
1062078174: 1: final external space: 0 bytes (0 blocks)
1062078174: 1: top 10 allocations:
1062078174: 1:  total-size  count in-use-size  count  source
1062078174: 1:          30      1          30      1  ra=0x8048412
1062078174: 1:          30      1          30      1  Total of 1
1062078174: 1: dumping not-freed pointers changed since 0:
1062078174: 1:  not freed: '0x804c008|s1' (30 bytes) from 'ra=0x8048412'
1062078174: 1:  total-size  count  source
1062078174: 1:          30      1  ra=0x8048412        Allocation is here
1062078174: 1:          30      1  Total of 1
1062078174: 1:  unknown memory: 1 pointer, 30 bytes
1062078174: 1: ending time = 1062078174, elapsed since start = 0:00:00
The output includes many statistics, which we're not interested in at the moment.
The line that is interesting is the one indicating memory that wasn't freed, with a return
address indicating the function that allocated the memory ('ra=0x8048412'). The
dmalloc documentation explains how to get the source code location for this address,
using GDB:
$ gdb ch15-badmem1        Start GDB
GNU gdb 5.3
(gdb) x 0x8048412        Examine address
0x8048412 <main+26>: 0x8910c483
(gdb) info line *(0x8048412)        Get line information
Line 11 of "ch15-badmem1.c" starts at address 0x8048408 <main+16>
and ends at 0x8048418 <main+32>.
This is painful, but workable if you have no other choice. However, if you include
the "dmalloc.h" header file in your program (after all other #include statements),
you can get source code information directly in the report:
1062080258: 1: top 10 allocations:
1062080258: 1: total-size count in-use-size count source
1062080258: 1:         30     1          30     1 ch15-badmem2.c:13
1062080258: 1:         30     1          30     1 Total of 1
1062080258: 1: dumping not-freed pointers changed since 0:
1062080258: 1: not freed: '0x804c008|s1' (30 bytes) from 'ch15-badmem2.c:13'
1062080258: 1: total-size count source
1062080258: 1:         30     1 ch15-badmem2.c:13
1062080258: 1:         30     1 Total of 1
(The ch15-badmem2.c file is the same as ch15-badmem1.c, except that it includes
"dmalloc.h", so we haven't bothered to show it.)
Individual debugging features are enabled or disabled by the use of tokens (specially
recognized identifiers) and the -p option to add a token (feature) or -m option to
remove one. There are predefined combinations, 'low', 'med', and 'high'. You can see
what these combinations are with 'dmalloc -Lv':
$ dmalloc low Set things to low

$ dmalloc -Lv Show settings
Debug Malloc Utility: http://dmalloc.com/
For a list of the command-line options enter: dmalloc --usage
Debug - Flags Ox4e40503 (82052355) (low) Current tokens
log-stats, log-non-free, log-bad-space, log-elapsed-time, check-fence,
free-blank, error-abort, alloc-blank, catch-null
Address not-set
Interval 100
Lock-On not-set
Logpath 'log2'
Start-File not-set
The full set of tokens, along with a brief explanation and each token's corresponding
numeric value, is available from 'dmalloc -DV':
$ dmalloc -DV
Debug Tokens:
none (nil) -- no functionality (0)
log-stats (lst) -- log general statistics (0x1)
log-non-free (lnf) -- log non-freed pointers (0x2)
log-known (lkn) -- log only known non-freed (0x4)
log-trans (ltr) -- log memory transactions (0x8)
log-admin (lad) -- log administrative info (0x20)
log-blocks (lbl) -- log blocks when heap-map (0x40)
log-bad-space (lbs) -- dump space from bad pnt (0x100)
log-nonfree-space (lns) -- dump space from non-freed pointers (0x200)
log-elapsed-time (let) -- log elapsed-time for allocated pointer (0x40000)
log-current-time (lct) -- log current-time for allocated pointer (0x80000)
check-fence (cfe) -- check fence-post errors (0x400)
check-heap (che) -- check heap adm structs (0x800)
check-lists (cli) -- check free lists (0x1000)
check-blank (cbl) -- check mem overwritten by alloc-blank, free-blank (0x2000)
check-funcs (cfu) -- check functions (0x4000)
force-linear (fli) -- force heap space to be linear (0x10000)
catch-signals (csi) -- shutdown program on SIGHUP, SIGINT, SIGTERM (0x20000)
realloc-copy (rco) -- copy all re-allocations (0x100000)
free-blank (fbl) -- overwrite freed memory space with BLANK_CHAR (0x200000)
error-abort (eab) -- abort immediately on error (0x400000)
alloc-blank (abl) -- overwrite newly alloced memory with BLANK_CHAR (0x800000)
heap-check-map (hcm) -- log heap-map on heap-check (0x1000000)
print-messages (pme) -- write messages to stderr (0x2000000)
catch-null (cnu) -- abort if no memory available (0x4000000)
never-reuse (nre) -- never re-use freed memory (0x8000000)
allow-free-null (afn) -- allow the frees of NULL pointers (0x20000000)
error-dump (edu) -- dump core on error and then continue (0x40000000)
By now you should have a feel for how to use dmalloc and its flexibility. dmalloc
is overkill for our simple demonstration program, but it is invaluable for a larger scale,
real-world application.
15.5.2.4 Valgrind: A Versatile Tool

The tools described in the previous section all focus on dynamic memory debugging,
and indeed this is a significant problem area for many programs. However, dynamic
memory problems aren't the only kind. The GPL-licensed Valgrind program catches
a large variety of problems, including those that arise from dynamic memory.
The Valgrind manual describes the program as well or better than we can, so we'll
quote (and abbreviate) it as we go along.
Valgrind is a flexible tool for debugging and profiling Linux-x86 executables.
The tool consists of a core, which provides a synthetic x86 CPU in software,
and a series of "skins", each of which is a debugging or profiling tool. The
architecture is modular, so that new skins can be created easily and without
disturbing the existing structure.
The most useful "skin" is memcheck:
The memcheck skin detects memory-management problems in your programs.
All reads and writes of memory are checked, and calls to malloc/new/
free/delete are intercepted. As a result, memcheck can detect the following
problems:
• Use of uninitialized memory.
• Reading/writing memory after it has been free'd.
• Reading/writing off the end of malloc'd blocks.
• Reading/writing inappropriate areas on the stack.
• Memory leaks -- where pointers to malloc'd blocks are lost forever.
• Mismatched use of malloc/new/new[] vs free/delete/delete[].
• Some misuses of the POSIX pthreads API.
Problems like these can be difficult to find by other means, often lying unde-
tected for long periods, then causing occasional, difficult-to-diagnose crashes.
Other skins are more specialized:
• cachegrind performs detailed simulation of the I1, D1, and L2 caches
in your CPU and so can accurately pinpoint the sources of cache misses
in your code.
• The addrcheck [skin] is identical to memcheck except for the single detail
that it does not do any uninitialized-value checks. All of the other
checks-primarily the fine-grained address checking-are still done. The
downside of this is that you don't catch the uninitialized-value errors that
memcheck can find.
But the upside is significant: Programs run about twice as fast as they do
on memcheck, and a lot less memory is used. It still finds reads/writes of
freed memory, memory off the end of blocks and in other invalid places,
bugs which you really want to find before release!
• helgrind is a debugging skin designed to find data races in multithreaded
programs.
Finally, the manual notes:

Valgrind is closely tied to details of the CPU, operating system and to a
lesser extent, compiler and basic C libraries. This makes it difficult to make
it portable, so we have chosen at the outset to concentrate on what we believe
to be a widely used platform: Linux on x86s. Valgrind uses the standard Unix
'./configure', 'make', 'make install' mechanism, and we have attempted
to ensure that it works on machines with kernel 2.2 or 2.4 and glibc 2.1.x,
2.2.x or 2.3.1. This should cover the vast majority of modern Linux installations.
Note that glibc-2.3.2+, with the NPTL (Native POSIX Thread Library)
package won't work. We hope to be able to fix this, but it won't
be easy.
If you're using GNU/Linux on a different platform or if you're using a commercial
Unix system, then Valgrind won't be of much help to you. However, as x86
GNU/Linux systems are quite common (and affordable), it's likely you can acquire one
on a moderate budget, or at least find one to borrow! What's more, once Valgrind has
found a problem for you, it's fixed for whatever platform your program is compiled to
run on. Thus, it's reasonable to use an x86 GNU/Linux system for development, and
some other commercial Unix system for deployment of a high-end product. 19
19 Increasingly, GNU/Linux is being used for high-end product deployment, too!

Although the Valgrind manual might lead you to expect that there are separate
commands named memcheck, addrcheck, and so on, this isn't the case. Instead, a
driver shell program named valgrind runs the debugging core, with the appropriate
"skin" as specified by the --skin= option. The default skin is memcheck; thus, running
a plain valgrind is the same as 'valgrind --skin=memcheck'. (This provides compatibility
with earlier versions of Valgrind that only did memory checking, and it also
makes the most sense, since the memcheck skin provides the most information.)
Valgrind provides a number of options. We refer you to its documentation for the
full details. The options are split into groups; of those that apply to the core (that is,
work for all skins), these are likely to be most useful:
--gdb-attach=no|yes
Start up with a GDB attached to the process, for interactive debugging. The default
is no.
--help
List the options.
--logfile=file
Log messages to file.pid.
--num-callers=number
Show number callers in stack traces. The default is 4.
--skin=skin
Use the skin named skin. Default is memcheck.
--trace-children=no|yes
Also run the trace on child processes. The default is no.
-v, --verbose
Be more verbose. This includes listing the libraries that are loaded, as well as the
counts of all the different kinds of errors.
Of the options for the memcheck skin, these are the ones we think are most useful:
--leak-check=no|yes
Find memory leaks once the program is finished. The default is 'no'.
--show-reachable=no|yes
Show reachable blocks when the program is finished. If --show-reachable=yes
is used, Valgrind looks for dynamically allocated memory that still has a pointer
pointing to it. Such memory is not a memory leak, but it may be useful to know
about anyway. The default is 'no'.
Let's take a look at Valgrind in action. Remember ch15-badmem1.c? (See Section
15.5.2.2, "Electric Fence," page 614.) The -b option writes into memory that is
beyond the area allocated with malloc(). Here's what Valgrind reports:
$ valgrind ch15-badmem1 -b
 1 ==8716== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
 2 ==8716== Copyright (C) 2002-2003, and GNU GPL'd, by Julian Seward.
 3 ==8716== Using valgrind-20030725, a program supervision framework for x86-linux.
 4 ==8716== Copyright (C) 2000-2003, and GNU GPL'd, by Julian Seward.
 5 ==8716== Estimated CPU clock rate is 2400 MHz
 6 ==8716== For more details, rerun with: -v
 7 ==8716==
 8 p = <not 30 bytes>
 9 ==8716== Invalid write of size 1
10 ==8716==    at 0x8048466: main (ch15-badmem1.c:18)
11 ==8716==    by 0x420158D3: __libc_start_main (in /lib/i686/libc-2.2.93.so)
12 ==8716==    by 0x8048368: (within /home/arnold/progex/code/ch15/ch15-badmem1)
13 ==8716== Address 0x4104804E is 12 bytes after a block of size 30 alloc'd
14 ==8716==    at 0x40025488: malloc (vg_replace_malloc.c:153)
15 ==8716==    by 0x8048411: main (ch15-badmem1.c:11)
16 ==8716==    by 0x420158D3: __libc_start_main (in /lib/i686/libc-2.2.93.so)
17 ==8716==    by 0x8048368: (within /home/arnold/progex/code/ch15/ch15-badmem1)
18 ==8716==
19 ==8716== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
20 ==8716== malloc/free: in use at exit: 30 bytes in 1 blocks.
21 ==8716== malloc/free: 1 allocs, 0 frees, 30 bytes allocated.
22 ==8716== For a detailed leak analysis, rerun with: --leak-check=yes
23 ==8716== For counts of detected errors, rerun with: -v
(Line numbers in the output were added to aid in the discussion.) Line 8 is the output
from the program; the others are all from Valgrind, on standard error. The error report
is on lines 9-17. It indicates how many bytes were incorrectly written (line 9), where
this happened (line 10), and a stack trace. Lines 13-17 describe where the memory was
allocated from. Lines 19-23 provide a summary.
The -f option to ch15-badmem1 frees the allocated memory and then writes into
it through a dangling pointer. Here is what Valgrind reports for this case:
$ valgrind ch15-badmem1 -f
==8719== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
p = <not 30 bytes>
==8719== Invalid write of size 1
==8719==    at 0x8048498: main (ch15-badmem1.c:21)
==8719==    by 0x420158D3: __libc_start_main (in /lib/i686/libc-2.2.93.so)
==8719==    by 0x8048368: (within /home/arnold/progex/code/ch15/ch15-badmem1)
==8719== Address 0x41048024 is 0 bytes inside a block of size 30 free'd
==8719==    at 0x40025722: free (vg_replace_malloc.c:220)
==8719==    by 0x8048491: main (ch15-badmem1.c:20)
==8719==    by 0x420158D3: __libc_start_main (in /lib/i686/libc-2.2.93.so)
==8719==    by 0x8048368: (within /home/arnold/progex/code/ch15/ch15-badmem1)
This time the report indicates that the write was to freed memory and that the call
to free() is on line 20 of ch15-badmem1.c.
When called with no options, ch15-badmem1 allocates memory and uses it but
does not release it. The --leak-check=yes option reports this case:
$ valgrind --leak-check=yes ch15-badmem1
 1 ==8720== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
 8 p = <not 30 bytes>
 9 ==8720==
10 ==8720== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
11 ==8720== malloc/free: in use at exit: 30 bytes in 1 blocks.
12 ==8720== malloc/free: 1 allocs, 0 frees, 30 bytes allocated.
16 ==8720==
17 ==8720== 30 bytes in 1 blocks are definitely lost in loss record 1 of 1
18 ==8720==    at 0x40025488: malloc (vg_replace_malloc.c:153)
19 ==8720==    by 0x8048411: main (ch15-badmem1.c:11)
20 ==8720==    by 0x420158D3: __libc_start_main (in /lib/i686/libc-2.2.93.so)
21 ==8720==    by 0x8048368: (within /home/arnold/progex/code/ch15/ch15-badmem1)
22 ==8720==
23 ==8720== LEAK SUMMARY:
24 ==8720==    definitely lost: 30 bytes in 1 blocks.
25 ==8720==    possibly lost: 0 bytes in 0 blocks.
26 ==8720==    still reachable: 0 bytes in 0 blocks.
27 ==8720==    suppressed: 0 bytes in 0 blocks.
28 ==8720== Reachable blocks (those to which a pointer was found) are not shown.
29 ==8720== To see them, rerun with: --show-reachable=yes
Lines 17-29 provide the leak report; the leaked memory was allocated on line 11 of
ch15-badmem1.c.
Besides giving reports on misuses of dynamic memory, Valgrind can diagnose uses
of uninitialized memory. Consider the following program, ch15-badmem3.c:
 1 /* ch15-badmem3.c --- do bad things with nondynamic memory */
 2
 3 #include <stdio.h>
 4 #include <stdlib.h>
 5
 6 int main(int argc, char **argv)
 7 {
 8     int a_var;    /* Both of these are uninitialized */
 9     int b_var;
10
11     /* Valgrind won't flag this; see text. */
12     a_var = b_var;
13
14     /* Use uninitialized memory; this is flagged. */
15     printf("a_var = %d\n", a_var);
16
17     return 0;
18 }
When run, Valgrind produces this (abbreviated) report:
==29650== Memcheck, a.k.a. Valgrind, a memory error detector for x86-linux.
==29650== Use of uninitialised value of size 4
==29650==    at 0x42049D2A: _IO_vfprintf_internal (in /lib/i686/libc-2.2.93.so)
==29650==    by 0x420523C1: _IO_printf (in /lib/i686/libc-2.2.93.so)
==29650==    by 0x804834D: main (ch15-badmem3.c:15)
==29650==    by 0x420158D3: __libc_start_main (in /lib/i686/libc-2.2.93.so)
==29650==
==29650== Conditional jump or move depends on uninitialised value(s)
==29650==    at 0x42049D32: _IO_vfprintf_internal (in /lib/i686/libc-2.2.93.so)
==29650==    by 0x420523C1: _IO_printf (in /lib/i686/libc-2.2.93.so)
==29650==    by 0x804834D: main (ch15-badmem3.c:15)
==29650==    by 0x420158D3: __libc_start_main (in /lib/i686/libc-2.2.93.so)
a_var = 1107341000
==29650==
==29650== ERROR SUMMARY: 25 errors from 7 contexts (suppressed: 0 from 0)
==29650== malloc/free: in use at exit: 0 bytes in 0 blocks.
==29650== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==29650== For a detailed leak analysis, rerun with: --leak-check=yes
==29650== For counts of detected errors, rerun with: -v
The Valgrind documentation explains that copying of uninitialized data doesn't
produce any reports. The memcheck skin notes the status of the data (uninitialized)
and keeps track of it as data are moved around. Thus, a_var is considered uninitialized,
since its value came from b_var, which started out uninitialized.
It is only when an uninitialized value is used that memcheck reports a problem. Here,
the use occurs down in the C library (_IO_vfprintf_internal()), which has to
convert the value to a string; to do so it does a computation with it.
Unfortunately, although Valgrind can detect the use of uninitialized memory, all
the way down to the bit level, it cannot do array bounds checking for local and global
variables. (Valgrind can do bounds checking for dynamic memory since it handles such
memory itself and therefore knows the start and end of each region.)
In conclusion, Valgrind is a powerful memory debugging tool. It has been used on
large-scale, multithreaded production programs such as KDE 3, OpenOffice, and the
Konqueror web browser. It rivals several commercial offerings, and a variant version
has even been used (together with the WINE Emulator20) to debug programs written
for Microsoft Windows, using Visual C++! You can get Valgrind from its web site.21
15.5.2.5 Other Malloc Debuggers

Two articles by Cal Erickson in Linux Journal describe mtrace and dmalloc, as well
as most of the other tools listed below. These articles are Memory Leak Detection in
Embedded Systems, Issue 101,22 September 2002, and Memory Leak Detection in C++,
Issue 110,23 June 2003. Both articles are available on the Linux Journal web site.
The other tools are similar in nature to those described earlier.
ccmalloc
A malloc() replacement library that does not need special compilation and that
can be used with C++. See http://www.inf.ethz.ch/personal/biere/projects/ccmalloc.
Mark Moraes's malloc
An early but full-featured malloc() replacement library that provides profiling,
tracing, and debugging features. You can get it from
ftp://ftp.cs.toronto.edu/pub/moraes/malloc-1.18.tar.gz.
mpatrol
A highly configurable memory debugging and testing package. See
http://www.cbmamiga.demon.co.uk/mpatrol.
20 http://www.winehq.com
21 http://valgrind.kde.org
22 http://www.linuxjournal.com/article.php?sid=6059
23 http://www.linuxjournal.com/article.php?sid=6556
memwatch
A package that requires the use of a special header file and compile-time options.
See http://www.linkdata.se/sourcecode.html.
njamd
Not Just Another Malloc Debugger. This library doesn't require special linking
of the application; instead, it uses LD_PRELOAD to replace standard routines. See
http://sourceforge.net/projects/njamd.
yamd
Similar to Electric Fence, but with many more options. See
http://www3.hmc.edu/~neldredge/yamd.
Almost all of these packages use environment variables to tune their behavior. Based
on the Linux Journal articles, Table 15.1 summarizes the features of the different
packages.
TABLE 15.1
Memory tool features summary
Tool       OS               Header     Module/Program   Thread safe
ccmalloc   Multivendor      No         Program          No
dmalloc    Multivendor      Optional   Program          Yes
efence     Multivendor      No         Program          No
memwatch   Multivendor      Yes        Program          No
Moraes     Multivendor      Optional   Program          No
mpatrol    Multivendor      No         Program          Yes
mtrace     Linux (GLIBC)    Yes        Module           No
njamd      Multivendor      No         Program          No
valgrind   Linux (GLIBC)    No         Program          Yes
yamd       Linux, DJGPP     No         Program          No
As is clear, a range of choices is available for debugging dynamic memory problems.

On GNU/Linux and BSD systems, one or more of these tools are likely to already be
installed, saving you the trouble of downloading and building them.
It is also useful to use multiple tools in succession on your program. For example,
use mtrace to catch unfreed memory, and Electric Fence to catch invalid memory accesses.
15.5.3 A Modern lint

In Original C, the compiler couldn't check whether the parameters passed in a
function call matched the parameter list in the function's definition; there were no
prototypes. This often led to subtle bugs since a bad function call might produce only
mildly erroneous results, which went unnoticed during testing, or might not even get
called at all during testing. For example:
if (argc < 2)
    fprintf("usage: %s [options] files\n", argv[0]);        stderr is missing
If a program containing this fragment is never invoked with the wrong number of
arguments, the fprintf(), which is missing the initial FILE * argument, is never called.
The V7 lint program was designed to solve such problems. It made two passes over
all the files in a program, first collecting information about function arguments and
then comparing function calls to the gathered information. Special "lint library" files
provided information about the standard library functions so that they could be checked
as well. lint also checked other questionable constructs.
With prototypes in Standard C, the need for lint is decreased, but not eliminated,
since C89 still allows old-style function declarations:
extern int some_func();        Argument list unknown
Additionally, many other aspects of a program can be checked statically, that is, by
analysis of the source code text.
The splint program (Secure Programming Lint24) is a modern lint replacement.
It provides too many options and facilities to list here, but is worth investigating.
One thing to be aware of is that lint-like programs can produce a flood of warning
messages. Many of the reported warnings are really harmless. In such cases, the tools
allow you to provide special comments that indicate "yes, I know about this, it's not a
problem." splint works best when you provide lots of such annotations in your code.
splint is a powerful but complicated tool; spending some time learning how to use
it and then using it frequently will help you keep your code clean.
24 http://www.splint.org

15.6 Software Testing

Software development contains elements of both art and science; this is one aspect
of what makes it such a fascinating and challenging profession. This section introduces
the topic of software testing, which also involves both art and science; thus, it is some-
what more general and higher level (read: "handwavy") than the rest of this chapter.
Software testing is an integral part of the software development process. It is very
unusual for a program to work 100 percent correctly the first time it compiles. The
program isn't responsible for being correct; the author of the program is. One of the
most important ways to verify that a program functions the way it's supposed to is to
test it.
One way to break down the different kinds of testing is as follows:
Unit tests
These are tests you write for each separate unit or functional component of your
program. As part of this effort, you may also need to write scaffolding-code de-
signed to provide enough supporting framework to run the unit as a stan-
dalone program.
It is important to design the tests for each functional component when you design
the component. Doing so helps you clarify the feature design; knowing how you'll
test it helps you define what it should and shouldn't do in the first place.
Integration tests
These are tests you apply when all the functional components have been written,
tested, and debugged individually. The idea is that everything is then hooked into
place in the overall framework and the whole thing is tested to make sure that the
interactions between the components are working.
Regression tests
Inevitably, you (or your users!) will discover problems. These may be real bugs,
or design limitations, or failures in weird "corner cases." Once you've been able
to reproduce and fix the problem, keep the original failing case as a regression test.
A regression test lets you make sure that when you make changes, you haven't
reintroduced an old problem. (This can happen easily.) By running a program
through its test suite after making a change, you can be (more) confident that
everything is working the way it's supposed to.
Testing should be automated as much as possible. This is particularly easy to do for
non-GUI programs written in the style of the Linux/Unix tools: programs that read
standard input or named files, and write to standard output and standard error. At the
very least, testing can be done with simple shell scripts. More involved testing is usually
done with a separate test subdirectory and the make program.
Software testing is a whole subfield in itself, and we don't expect to do it justice here;
rather our point is to make you aware that testing is an integral part of development
and often the motivating factor for using your debugging skills! Here is a very brief
summary list:
• Design the test along with the feature.
• Test boundary conditions: Make sure the feature works both inside and at valid
boundaries and that it fails correctly outside them. (For example, the sqrt()
function has to fail when given a negative argument.)
• Use assertions in your code (see Section 12.1, "Assertion Statements: assert(),"
page 428), and run your tests with the assertions enabled.
• Create and reuse test scaffolding.
• Save failure cases for regression testing.
• Automate testing as much as possible.
• Print a count of failed tests so that success or failure, and the degree of failure, can
be determined easily.
• Use code coverage tools such as gcov to verify that your test suite exercises all of
your code.
• Test early and test often.
• Study software-testing literature to improve your ability to develop and test
software.
15.7 Debugging Rules

Debugging isn't a "black art." Its principles and techniques can be learned, and
consistently applied, by anyone. To this end, we highly recommend the book Debugging
by David J. Agans (ISBN: 0-8144-7168-4). The book has a web site25 that summarizes
the rules and provides a downloadable poster for you to print and place on your
office wall.
To round off our discussion, we present the following material. It was adapted by
David Agans, by permission, from Debugging, Copyright © 2002 David J. Agans,
published by AMACOM,26 a division of American Management Association, New
York, New York. We thank him.
1. Understand the system. When all else fails, read the manual. You have to
know what the troubled system and all of its parts are supposed to do, if you
want to figure out why they don't. So read any and all documentation you can
get your hands (or browser) on.
Knowing where functional blocks and data paths are, and how they interact,
gives you a roadmap for failure isolation. Of course, you also have to know
your domain (language, operating system, application) and your tools (compiler,
source code debugger).
2. Make it fail. In order to see the bug, you have to be able to make the failure
occur consistently. Document your procedures and start from a known state,
so that you can always make it fail again. Look at the bug on the system that
fails, don't try to simulate the problem on another system. Don't trust statistics
on intermittent problems; they will hide the bug more than they will expose
it. Rather, try to make it consistent by varying inputs, and initial conditions,
and timing.
If it's still intermittent, you have to make it look like it's not. Capture in a log
every bit of information you can, during every run; then when you have some
bad runs and some good runs, compare them to each other. If you've captured
enough data you'll be able to home in on the problem as if you could make it
fail every time. Being able to make it fail every time also means you'll be able
to tell when you've fixed it.
3. Quit thinking and look. There are more ways for something to fail than you
can possibly imagine. So don't imagine what could be happening, look at
25 http://www.debuggingrules.com
26 http://www.amacombooks.org
it: put instrumentation on the system so you can actually see the failure
mechanism. Use whatever instrumentation you can: debuggers, printf()s,
assert()s, logic analyzers, and even LEDs and beepers. Look at it deeply
enough until the bug is obvious to the eye, not just to the brain.
If you do guess, use the guess only to focus the search; don't try to fix it until
you can see it. If you have to add instrumentation code, do it, but be sure to
start with the same code base as the failing system, and make sure it still fails
with your added code running. Often, adding the debugger makes it stop failing
(that's why they call it a debugger).
4. Divide and conquer. Everybody knows this one. You do a successive approx-
imation-start at one end, jump halfway, see which way the error is from there,
and jump half again in that direction. Binary search, you're there in a few
jumps. The hard part is knowing whether you're past the bug or not. One
helpful trick is to put known, simple data into the system, so that trashed data
is easier to spot. Also, start at the bad end and work back toward the good:
there are too many good paths to explore if you start at the good end. Fix the
bugs you know about right away, since sometimes two bugs interact (though
you'd swear they can't), and successive approximation doesn't work with two
target values.
5. Change one thing at a time. If you're trying to improve a stream-handling
module and you simultaneously upgrade to the next version of the operating
system, it doesn't matter whether you see improvement, degradation, or no
change-you will have no idea what effect your individual changes had. The
interaction of multiple changes can be unpredictable and confusing. Don't do
it. Change one thing at a time, so you can bet that any difference you see as a
result came from that change.
If you make a change and it seems to have no effect, back it out immediately.
It may have had some effects that you didn't see, and those may show up in
combination with other changes. This goes for changes in testing as well as
in coding.
6. Keep an audit trail. Much of the effectiveness of the above rules depends on
keeping good records. In all aspects of testing and debugging, write down what
you did, when you did it, how you did it, and what happened as a result. Do
it electronically if possible, so that the record can be emailed and attached to
the bug database. Many a clue is found in a pattern of events that would not
be noticed if it wasn't recorded for all to see and compare. And the clue is
likely to be in the details that you didn't think were important, so write it
all down.
7. Check the plug. Everyone has a story about some problem that turned out to
be "it wasn't plugged in." Sometimes it's literally unplugged, but in software,
"unplugged" can mean a missing driver or an old version of code you thought
you replaced. Or bad hardware when you swear it's a software problem. One
story had the hardware and software engineers pointing fingers at each other,
and it was neither: The test device they were using was not up to spec. The
bottom line is that sometimes you're looking for a problem inside a system,
when in fact the problem is outside the system, or underlying the system, or
in the initialization of the system, or you're not looking at the right system.
Don't necessarily trust your tools, either. The tool vendors are engineers, too;
they have bugs, and you may be the one to find them.
8. Get a fresh view. There are three reasons to ask for help while debugging.
The first reason is to get fresh insight-another person will often see something
just because they aren't caught up in it like you are. The second reason is to
tap expertise-they know more about the system than you do. The third reason
is to get experience-they've seen this one before.
When you describe the situation to someone, report the symptoms you've seen,
not your theories about why it's acting that way. You went to them because
your theories aren't getting you anywhere-don't pull them down into the
same rut you're stuck in.
9. If you didn't fix it, it ain't fixed. So you think it's fixed? Prove it. Since you
were able to make it fail consistently, set up the same situation and make sure
it doesn't fail. Don't assume that just because the problem was obvious, it's all
fixed now. Maybe it wasn't so obvious. Maybe your fix wasn't done right.
Maybe your fix isn't even in the new release! Test it! Make it not fail.
Are you sure your code is what fixed it? Or did the test change, or did some
other code get in there? Once you see that your fix works, take the fix out and
make it fail again. Then put the fix back in and see that it doesn't fail. This
step assures you that it was really your fix that solved the problem.
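The last step of rule 9 (take the fix out, see the failure, put it back, see it pass) is easy to automate when the reproduction case is a function. The following is only a minimal sketch of the idea: sum_array(), its apply_fix switch, and the test data are hypothetical stand-ins for your own code and your own failing case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function under repair. With apply_fix == 0 it keeps
 * its original off-by-one bug and skips the last element. */
long sum_array(const int *a, size_t n, int apply_fix)
{
    size_t limit = apply_fix ? n : (n > 0 ? n - 1 : 0);
    long total = 0;
    size_t i;

    assert(a != NULL);
    for (i = 0; i < limit; i++)
        total += a[i];
    return total;
}

/* The reproduction case that made the program fail consistently.
 * Returns 1 if the result is correct, 0 if the failure shows up. */
int repro_passes(int apply_fix)
{
    static const int data[] = { 1, 2, 3, 4 };

    return sum_array(data, sizeof data / sizeof data[0], apply_fix) == 10;
}

/* Returns 1 only if the failure appears without the fix and
 * disappears with it, so the fix is what made the difference. */
int fix_is_verified(void)
{
    int fails_without_fix = !repro_passes(0);
    int passes_with_fix = repro_passes(1);

    return fails_without_fix && passes_with_fix;
}
```

If fix_is_verified() returns 0 because the case passes even without the fix, the test is not exercising the bug, and something else changed.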
More information about the book Debugging and a free downloadable debugging
rules poster can be found at http://www.debuggingrules.com.
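Rule 4's successive approximation is a binary search between a point known to be good and a point known to be bad. As a rough sketch, assuming the "points" can be numbered (revisions, processing stages, input records) and that a hypothetical is_bad() test can be run at any of them:

```c
#include <assert.h>

/* Returns nonzero if the system is already broken at the given step. */
typedef int (*is_bad_fn)(int step);

/* Find the first bad step, given that `good` is known to work and
 * `bad` is known to fail. This assumes a single transition from good
 * to bad; as the rule warns, with two interacting bugs there is no
 * single target and the search can mislead you. */
int first_bad_step(int good, int bad, is_bad_fn is_bad)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;   /* jump halfway */

        if (is_bad(mid))
            bad = mid;      /* failure is at or before mid */
        else
            good = mid;     /* failure is after mid */
    }
    return bad;
}

/* Hypothetical stand-in for "run the test at this step". */
int example_is_bad(int step)
{
    return step >= 37;
}
```

With 100 steps, the loop needs about seven tests instead of up to a hundred. The hard part, as the rule says, is writing an is_bad() that reliably tells you which side of the bug you are on.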

15.8 Suggested Reading
The following books are excellent, with much to say about both testing and debugging.
All but the first relate to programming in general. They're all worth reading.
1. Debugging, David J. Agans. AMACOM, New York, New York, USA 2003.
ISBN: 0-8144-7168-4.
We highly recommend this book. Its tone is light, and amazing as it sounds,
it's fun reading!
2. Programming Pearls, 2nd edition, by Jon Louis Bentley. Addison-Wesley,
Reading, Massachusetts, USA, 2000. ISBN: 0-201-65788-0. See also this book's
web site. 27
Chapter 5 of this book gives a good discussion of unit testing and building test
scaffolding.
3. Literate Programming, by Donald E. Knuth. Center for the Study of Language
and Information (CSLI), Stanford University, USA, 1992. ISBN:
0-9370-7380-6.
This fascinating book contains a number of articles by Donald Knuth on literate
programming-a programming technique he invented, and used for the creation
of TeX and Metafont. Of particular interest is the article entitled "The Errors
of TeX," which describes how he developed and debugged TeX, including his
log of all the problems found and fixed.
4. Writing Solid Code, by Steve Maguire. Microsoft Press, Redmond, Washington,
USA, 1993. ISBN: 1-55615-551-4.
27 http://www.cs.bell-labs.com/cm/cs/pearls/
5. Code Complete: A Practical Handbook of Software Construction, by Steve
McConnell. Microsoft Press, Redmond, Washington, USA, 1994. ISBN:
1-55615-484-4.
6. The Practice of Programming, by Brian W. Kernighan and Rob Pike.
Addison-Wesley, Reading, Massachusetts, USA, 1999. ISBN: 0-201-61585-X.
15.9 Summary
• Debugging is an important part of software development. Good design and devel-
opment practices should be used to minimize the introduction of bugs, but debug-
ging will always be with us.
• Programs should be compiled without optimization and with debugging symbols
included to make debugging under a debugger more straightforward. On many
systems, compiling with optimization and compiling with debugging symbols are
mutually exclusive. This is not true of GCC, which is why the GNU/Linux
developer needs to be aware of the issue.
• The GNU debugger GDB is standard on GNU/Linux systems and can be used
on just about any commercial Unix system as well. (Graphical debuggers based
on GDB are also available and easily portable.) Breakpoints, watchpoints, and
single-stepping with next, step, and cont provide basic control over a program
as it's running. GDB also lets you examine data and call functions within
the debuggee.
• There are many things you can do when writing your program to make it easier
when you inevitably have to debug it. We covered the following topics:
• Debugging macros for printing state.
• Avoiding expression macros.
• Reordering code to make single-stepping easier.
• Writing helper functions for use from a debugger.
• Avoiding unions.
• Having runtime debugging code in the production version of a program and
having different ways to enable that code's output.
• Adding dummy functions to make breakpoints easier to set.
• A number of tools and libraries besides just general-purpose debuggers exist to
help with debugging. The dbug library provides a nice internal debugger that uses
many of the techniques we described, in a consistent, coherent way.
• Multiple dynamic memory debugging libraries exist, with many similar features.
We looked at three of them (mtrace, Electric Fence, and dmalloc), and provided
pointers to several others. The Valgrind program goes further, finding problems
related to uninitialized memory, not just dynamic memory.
• splint is a modern alternative to the venerable V7 lint program. It is available
on at least one vendor's GNU/Linux system and can be easily downloaded and
built from source.
• Besides debugging tools, software testing is also an integral part of the software
development process. It should be understood, planned for, and managed from
the beginning of any software development project, even personal ones.
• Debugging is a skill that can be learned. We recommend reading the book
Debugging by David J. Agans and learning to apply his rules.
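Two of the techniques summarized above, statement-style debugging macros and runtime debugging code that stays in the production program, fit in a few lines. What follows is a minimal sketch and not the chapter's own code: the flag names, the DPRINT macro, and set_debug_flags() are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits, one per area of the program. */
#define DEBUG_FILES  0x01
#define DEBUG_PARSE  0x02

int debug_flags = 0;    /* easy to examine or set from a debugger, too */

/* A statement macro (deliberately not an expression macro): it prints
 * only when the matching flag bit is set at run time. */
#define DPRINT(flag, ...) \
    do { \
        if (debug_flags & (flag)) { \
            fprintf(stderr, "%s:%d: ", __FILE__, __LINE__); \
            fprintf(stderr, __VA_ARGS__); \
        } \
    } while (0)

/* Turn on flags named in a comma-separated string such as
 * "files,parse". The string could come from a command-line option or
 * from an environment variable. Unknown names are ignored. Returns
 * the resulting flag word. */
int set_debug_flags(const char *spec)
{
    assert(spec != NULL);
    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");    /* length of this name */

        if (len == 5 && strncmp(spec, "files", len) == 0)
            debug_flags |= DEBUG_FILES;
        else if (len == 5 && strncmp(spec, "parse", len) == 0)
            debug_flags |= DEBUG_PARSE;
        spec += len;
        if (*spec == ',')
            spec++;
    }
    return debug_flags;
}
```

A call such as DPRINT(DEBUG_PARSE, "token = %s\n", tok); then costs one test of debug_flags when debugging is off, which is usually cheap enough to leave in place.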
15.10 Exercises
1. Compile one of your programs with GCC, using both -g and -O. Run it under
GDB, setting a breakpoint in main(). Single-step through the program, and
see how closely execution relates (or doesn't relate) to the original source code.
This is particularly good to do with code using a while or for loop.
2. Read up on GDB's conditional breakpoint feature. How does that simplify
dealing with problems that occur only after a certain number of operations
have been done?
3. Rewrite the parse_debug() function from Section 15.4.2.1, "Add Debugging
Options and Variables," page 595, to use a table of debugging option strings,
flag values, and string lengths.
4. (Hard.) Study the gawk source code; in particular the NODE structure in awk.h.
Write a debugging helper function that prints the contents of a NODE based on
the value in the type field.
5. Take one of your programs and modify it to use the dbug library. Compile it
first without -DDBUG, to make sure it compiles and works OK. (Do you have
a regression test suite for it? Did your program pass all the tests?)
Once you're sure that adding the dbug library has not broken your program,
recompile it with -DDBUG. Does your program still pass all its tests? What is
the performance difference with the library enabled and disabled?
Run your test suite with the -#t option to see the function-call trace. Do you
think this will help you in the future when you have to do debugging? Why
or why not?
6. Run one of your programs that uses dynamic memory with Electric Fence or
one of the other dynamic memory testers. Describe the problems, if any, that
you found.
7. Rerun the same program, using Valgrind with leak checking enabled. Describe
the problems, if any, that you found.
8. Design a set of tests for the mv program. (Read mv(1): make sure you cover all
its options.)
9. Search on the Internet for software testing resources. What interesting things
did you find?
Chapter 16 • A Project That Ties Everything Together
In this chapter
• 16.1 Project Description

For the first half of this book, we tied together everything that had been presented,
rather neatly, by looking at the V7 ls.c. However, as much as we would
like to have it, there is no single program, small enough to present here, for tying
together the concepts and APIs presented starting with Chapter 8, "Filesystems and
Directory Walks," page 227.
16.1 Project Description

In day-to-day use, the one program that does use just about everything in the book
is the shell. And indeed, there are Unix programming books that write a small but
working shell to illustrate the principles involved.
Real shells are large and messy creatures. They must deal with many portability issues,
such as we've outlined throughout the book, and above and beyond that, they often
have to work around bugs in different versions of Unix. Furthermore, to be useful,
shells do many things that don't involve the system call API, such as maintaining shell
variables, a history of saved commands, and so on. Providing a complete tour of a
full-featured shell such as Bash, ksh93, or zsh would take a separate volume.
Instead, we suggest the following list of steps for writing your own shell, either as a
(large) exercise to cement your understanding or perhaps as a cooperative project if
you're in school.
1. Design your command "language" so that it will be easy to interpret with
simple code. While compiler and interpreter technology is valuable when
writing a production shell, it's likely to be overkill for you at this stage.
Consider the following points:
• Are you going to use i18n facilities?
• What commands must be built in to the shell?
• To be useful, your shell will need a command search path mechanism,
analogous to $PATH of the regular shell. How will you set it?
• What I/O redirections do you wish to support? Files only? Pipes too? Do
you wish to be able to redirect more than file descriptors 0, 1, and 2?
• Decide how quoting will work: single and double quotes? Or only one kind?
How do you quote a quote? How does quoting interact with I/O redirections?
• How will you handle putting commands in the background? What about
waiting for a command in the background to finish?
• Decide whether you will have shell variables.
• What kind of wildcarding or other expansions will you support? How do
they interact with quoting? With shell variables?
• You should plan for at least an if and a while statement. Design a syntax.
We will call these block statements.
• Decide whether or not you wish to allow I/O redirection for a block state-
ment. If yes, what will the syntax look like?
• Decide how, if at all, your shell language should handle signals.
• Design a testing and debugging framework before you start to code.
2. If you're going to use i18n facilities, do so from the outset. Retrofitting them
in is painful.
3. For the real work, start simply. The initial version should read one line at a
time and break it into words to use as separate arguments. Don't do any
quoting, I/O redirection, or anything else. Don't even try to create a new process
to run the entered program. How are you going to test what you have so far?
4. Add quoting so that individual "words" can contain whitespace. Does the
quoting code implement your design?
5. Make your built-in commands work. (See Section 4.6, "Creating Files,"
page 106, and Section 8.4.1, "Changing Directory: chdir() and fchdir(),"
page 256, for at least two necessary built-in commands.) How are you going
to test them?
6. Initially, use a fixed search path, such as "/bin:/usr/bin:/usr/local/bin".
Add process creation with fork() and execution with exec() (see Chapter 9,
"Process Management and Pipes," page 283). Starting out, the shell should
wait for each new program to finish.
7. Add backgrounding and, as a separate command, waiting for process completion
(see Chapter 9, "Process Management and Pipes," page 283).
8. Add a user-settable search path (see Section 2.4, "The Environment," page 40).
9. Add I/O redirection for files (see Section 9.4, "File Descriptor Management,"
page 320).
10. Add shell variables. Test their interaction with quoting.
11. Add wildcard and any other expansions (see Section 12.7, "Metacharacter
Expansions," page 461). Test their interaction with shell variables. Test their
interaction with quoting.
12. Add pipelines (see Section 9.3, "Basic Interprocess Communication: Pipes and
FIFOs," page 315). At this point, real complexity starts to settle in. You may
need to take a hard look at how you're managing data that represents commands
to be run.
You could stop here with a legitimate feeling of accomplishment if you get a
working shell that can do everything mentioned so far.
13. If you're up for a further challenge, add if and/or while statements.
14. Add signal handling (see Chapter 10, "Signals," page 347).
15. If you'd like to use your shell for real work, explore the GNU Readline library
(type 'info readline' on a GNU/Linux system or see the source for the Bash
shell). This library lets you add either Emacs-style or vi-style command-line
editing to interactive programs.
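To make steps 6 and 9 of the list above more concrete, here is one way the process-creation core might look. It is a sketch under stated assumptions, not a prescribed design: the function names run_command() and exec_from_fixed_path() are ours, only output redirection to a file is handled, and the exit codes 126 and 127 simply follow the convention used by common shells.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Step 6: a fixed search path, as suggested in the text. */
static const char fixed_path[] = "/bin:/usr/bin:/usr/local/bin";

/* Child side: try argv[0] in each directory of fixed_path.
 * Returns only if every exec attempt failed. */
void exec_from_fixed_path(char *const argv[])
{
    const char *dir = fixed_path;

    if (strchr(argv[0], '/') != NULL) {     /* explicit path: no search */
        execv(argv[0], argv);
        return;
    }
    while (*dir != '\0') {
        size_t dirlen = strcspn(dir, ":");
        size_t need = dirlen + 1 + strlen(argv[0]) + 1;
        char *candidate = malloc(need);     /* sized to fit: no fixed limit */

        if (candidate == NULL)
            return;
        snprintf(candidate, need, "%.*s/%s", (int) dirlen, dir, argv[0]);
        execv(candidate, argv);             /* returns only on failure */
        free(candidate);
        dir += dirlen;
        if (*dir == ':')
            dir++;
    }
}

/* Run a command and wait for it. If outfile is not NULL, the child's
 * standard output goes there (step 9), like `command > outfile`.
 * Returns the exit status, 128 + signal number if the child was
 * killed by a signal, or -1 if fork() or waitpid() failed. */
int run_command(char *const argv[], const char *outfile)
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    fflush(NULL);                   /* avoid duplicated stdio buffers */
    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child */
        if (outfile != NULL) {
            int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0666);

            if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
                _exit(126);
            if (fd != STDOUT_FILENO)
                close(fd);
        }
        exec_from_fixed_path(argv);
        _exit(127);                 /* conventional "not found" status */
    }

    while (waitpid(pid, &status, 0) < 0)    /* parent */
        if (errno != EINTR)
            return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Doing the redirection in the child, between fork() and exec, leaves the shell's own file descriptors untouched. Because run_command() returns a status instead of printing, it can be exercised from a small test program with commands such as true, false, and a name that does not exist.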
Keep two things constantly in mind: always be able to test what you're doing; and
"no arbitrary limits!"
Once it's done, do a post-mortem analysis of the project. How would you do it dif-
ferently the second time?
Good luck!
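As a possible starting point for step 3, and as an illustration of both reminders above (testability and "no arbitrary limits"), the following sketch splits a line into words without any fixed maximum. The names split_words() and free_words() are hypothetical, and quoting is deliberately not handled yet.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split `line` into whitespace-separated words. Returns a
 * NULL-terminated, malloc'ed argv-style array (NULL on allocation
 * failure) and stores the word count in *countp. The array grows as
 * needed, so there is no built-in limit on the number of words. */
char **split_words(const char *line, size_t *countp)
{
    static const char delims[] = " \t\n";
    size_t count = 0, capacity = 4;
    char **words = malloc(capacity * sizeof *words);

    assert(line != NULL && countp != NULL);
    if (words == NULL)
        return NULL;

    for (;;) {
        size_t len;
        char *word;

        line += strspn(line, delims);       /* skip leading whitespace */
        if (*line == '\0')
            break;
        len = strcspn(line, delims);        /* length of this word */

        if (count + 2 > capacity) {         /* room for word and NULL */
            char **bigger;

            capacity *= 2;
            bigger = realloc(words, capacity * sizeof *words);
            if (bigger == NULL)
                goto fail;
            words = bigger;
        }
        word = malloc(len + 1);
        if (word == NULL)
            goto fail;
        memcpy(word, line, len);
        word[len] = '\0';
        words[count++] = word;
        line += len;
    }
    words[count] = NULL;
    *countp = count;
    return words;

fail:
    while (count > 0)
        free(words[--count]);
    free(words);
    return NULL;
}

/* Release an array returned by split_words(). */
void free_words(char **words)
{
    size_t i;

    if (words == NULL)
        return;
    for (i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}
```

Because the result is already in argv form, it can be printed for testing now and handed to an exec function later without change.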

16.2 Suggested Reading
1. The UNIX Programming Environment, by Brian W. Kernighan and Rob Pike.
Prentice-Hall, Englewood Cliffs, New Jersey, USA, 1984. ISBN:
0-13-937699-2.
This is the classic book on Unix programming, describing the entire gestalt of
the Unix environment, from interactive use, to shell programming, to programming
with the <stdio.h> functions and the lower-level system calls, to program
development with make, yacc, and lex, and documentation with nroff
and troff.
Although the book shows its age, it is still eminently worth reading, and we
highly recommend it.
2. The Art of UNIX Programming, by Eric S. Raymond. Addison-Wesley, Reading,
Massachusetts, USA, 2004. ISBN: 0-13-142901-9.
This is a higher-level book that focuses on the design issues in Unix
programming: how Unix programs work and how to design your own programs to fit
comfortably into a Linux/Unix environment.
While we don't always agree with much of what the author has to say, the book
does have considerable important material and is worth reading.
Appendix A Teach Yourself Programming in Ten Years
Appendix B Caldera Ancient UNIX License
Appendix C GNU General Public License

Teach Yourself Programming in Ten Years
"Experience, n: Something you don't get until just after you
need it."
- Olivier -
This chapter is written by and Copyright © 2001 by Peter Norvig.
Reprinted by permission. The original article, including hyperlinks, is at
http://www.norvig.com/21-days.html. We have included it because we believe
that it conveys an important message. The above quote is one of our long-time favorites,
and as it applies to the point of this appendix, we've included it too.
Why Is Everyone in Such a Rush?

Walk into any bookstore, and you'll see how to Teach Yourself Java in 7 Days
alongside endless variations offering to teach Visual Basic, Windows, the Internet, and
so on in a few days or hours. I did the following power search at Amazon.com:
pubdate: after 1992 and title: days and
(title: learn or title: teach yourself)
and got back 248 hits. The first 78 were computer books (number 79 was Learn Bengali
in 30 days). I replaced "days" with "hours" and got remarkably similar results: 253 more
books, with 77 computer books followed by Teach Yourself Grammar and Style in 24
Hours at number 78. Out of the top 200 total, 96% were computer books.
The conclusion is that either people are in a big rush to learn about computers, or
that computers are somehow fabulously easier to learn than anything else. There are
no books on how to learn Beethoven, or Quantum Physics, or even Dog Grooming in
a few days.
Let's analyze what a title like Learn Pascal in Three Days could mean:
• Learn: In 3 days you won't have time to write several significant programs, and
learn from your successes and failures with them. You won't have time to work
with an experienced programmer and understand what it is like to live in that
environment. In short, you won't have time to learn much. So they can only be
talking about a superficial familiarity, not a deep understanding. As Alexander
Pope said, a little learning is a dangerous thing.
• Pascal: In 3 days you might be able to learn the syntax of Pascal (if you already
knew a similar language), but you couldn't learn much about how to use the
syntax. In short, if you were, say, a Basic programmer, you could learn to write
programs in the style of Basic using Pascal syntax, but you couldn't learn what
Pascal is actually good (and bad) for. So what's the point? Alan Perlis once said:
"A language that doesn't affect the way you think about programming, is not
worth knowing." One possible point is that you have to learn a tiny bit of Pascal
(or more likely, something like Visual Basic or JavaScript) because you need to
interface with an existing tool to accomplish a specific task. But then you're not
learning how to program; you're learning to accomplish that task.
• in Three Days: Unfortunately, this is not enough, as the next section shows.
Teach Yourself Programming in Ten Years

Researchers (Hayes, Bloom) have shown it takes about ten years to develop expertise
in any of a wide variety of areas, including chess playing, music composition, painting,
piano playing, swimming, tennis, and research in neuropsychology and topology. There
appear to be no real shortcuts: even Mozart, who was a musical prodigy at age 4, took
13 more years before he began to produce world-class music. In another genre, the
Beatles seemed to burst onto the scene, appearing on the Ed Sullivan show in 1964.
But they had been playing since 1957, and while they had mass appeal early on, their
first great critical success, Sgt. Peppers, was released in 1967. Samuel Johnson thought
it took longer than ten years: "Excellence in any department can be attained only by
the labor of a lifetime; it is not to be purchased at a lesser price." And Chaucer
complained "the lyf so short, the craft so long to lerne."
Here's my recipe for programming success:
• Get interested in programming, and do some because it is fun. Make sure that it
keeps being enough fun so that you will be willing to put in ten years.
• Talk to other programmers; read other programs. This is more important than
any book or training course.
• Program. The best kind of learning is learning by doing. To put it more technically,
"the maximal level of performance for individuals in a given domain is not attained
automatically as a function of extended experience, but the level of performance
can be increased even by highly experienced individuals as a result of deliberate
efforts to improve" (p. 366) and "the most effective learning requires a well-defined
task with an appropriate difficulty level for the particular individual, informative
feedback, and opportunities for repetition and corrections of errors." (p. 20-21)
The book Cognition in Practice: Mind, Mathematics, and Culture in Everyday Life
is an interesting reference for this viewpoint.
• If you want, put in four years at a college (or more at a graduate school). This will
give you access to some jobs that require credentials, and it will give you a deeper
understanding of the field, but if you don't enjoy school, you can (with some
dedication) get similar experience on the job. In any case, book learning alone
won't be enough. "Computer science education cannot make anybody an expert
programmer any more than studying brushes and pigment can make somebody
an expert painter" says Eric Raymond, author of The New Hacker's Dictionary.
One of the best programmers I ever hired had only a High School degree; he's
produced a lot of great software, has his own news group, and through stock
options is no doubt much richer than I'll ever be.
• Work on projects with other programmers. Be the best programmer on some
projects; be the worst on some others. When you're the best, you get to test your
abilities to lead a project, and to inspire others with your vision. When you're the
worst, you learn what the masters do, and you learn what they don't like to do
(because they make you do it for them).
• Work on projects after other programmers. Be involved in understanding a program
written by someone else. See what it takes to understand and fix it when the
original programmers are not around. Think about how to design your programs
to make it easier for those who will maintain it after you.
• Learn at least a half dozen programming languages. Include one language that
supports class abstractions (like Java or C++), one that supports functional abstrac-
tion (like Lisp or ML), one that supports syntactic abstraction (like Lisp), one that
supports declarative specifications (like Prolog or C++ templates), one that supports
coroutines (like Icon or Scheme), and one that supports parallelism (like Sisal).
• Remember that there is a "computer" in "computer science." Know how long it
takes your computer to execute an instruction, fetch a word from memory (with
and without a cache miss), read consecutive words from disk, and seek to a new
location on disk. (Answers below.)
• Get involved in a language standardization effort. It could be the ANSI C++
committee, or it could be deciding if your local coding style will have 2 or 4 space
indentation levels. Either way, you learn about what other people like in a language,
how deeply they feel so, and perhaps even a little about why they feel so.
• Have the good sense to get off the language standardization effort as quickly as
possible.
With all that in mind, it's questionable how far you can get just by book learning.
Before my first child was born, I read all the How To books, and still felt like a clueless
novice. 30 months later, when my second child was due, did I go back to the books for
a refresher? No. Instead, I relied on my personal experience, which turned out to be
far more useful and reassuring to me than the thousands of pages written by experts.
Fred Brooks, in his essay No Silver Bullets identified a three-part plan for finding
great software designers:
1. Systematically identify top designers as early as possible.
2. Assign a career mentor to be responsible for the development of the prospect
and carefully keep a career file.
3. Provide opportunities for growing designers to interact and stimulate each
other.
This assumes that some people already have the qualities necessary for being a great
designer; the job is to properly coax them along. Alan Perlis put it more succinctly:
"Everyone can be taught to sculpt: Michelangelo would have had to be taught how not
to. So it is with the great programmers."
So go ahead and buy that Java book; you'll probably get some use out of it. But you
won't change your life, or your real overall expertise as a programmer in 24 hours, days,
or even months.
References
Bloom, Benjamin (ed.) Developing Talent in Young People, Ballantine, 1985.
Brooks, Fred, No Silver Bullets, IEEE Computer, vol. 20, no. 4, 1987, p. 10-19.
Hayes, John R., Complete Problem Solver, Lawrence Erlbaum, 1989.
Lave, Jean, Cognition in Practice: Mind, Mathematics, and Culture in Everyday Life,
Cambridge University Press, 1988.
Answers
Timing for various operations on a typical 1 GHz PC in summer 2001:
execute single instruction                  1 nsec = (1/1,000,000,000) sec
fetch word from L1 cache memory             2 nsec
fetch word from main memory                 10 nsec
fetch word from consecutive disk location   200 nsec
fetch word from new disk location (seek)    8,000,000 nsec = 8 msec
Footnotes
This page 1 also available in Japanese translation 2 thanks to Yasushi Murakawa and
in Spanish translation 3 thanks to Carlos Rueda.
T. Capey points out that the Complete Problem Solver page on Amazon now has the
Teach Yourself Bengali in 21 days and Teach Yourself Grammar and Style books under
the "Customers who shopped for this item also shopped for these items" section. I guess
that a large portion of the people who look at that book are coming from this page.
1 This appendix is quoted verbatim from the web page cited at its beginning.
2 http://www1.neweb.ne.jp/wa/yamdas/column/technique/21-daysj.html
3 http://loro.sf.net/notes/21-dias.html
Caldera Ancient UNIX License
CALDERA
240 West Center Street
Orem, Utah 84057
801-765-4999 Fax 801-765-4481
January 23, 2002
Dear UNIX® enthusiasts,

Caldera International, Inc. hereby grants a fee free license that includes the rights use, modify and distribute this named
source code, including creating derived binary products created from the source code. The source code for which Caldera
International, Inc. grants rights are limited to the following UNIX Operating Systems that operate on the 16-Bit PDP-11
CPU and early versions of the 32-Bit UNIX Operating System, with specific exclusion of UNIX System III and UNIX
System V and successor operating systems:
32-bit 32V UNIX
16 bit UNIX Versions 1, 2, 3, 4, 5, 6, 7
Caldera International, Inc. makes no guarantees or commitments that any source code is available from Caldera
International, Inc.
The following copyright notice applies to the source code files for which this license is granted.
Copyright(C) Caldera International Inc. 2001-2002. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the
following conditions are met:
Redistributions of source code and documentation must retain the above copyright notice, this list of conditions and the
following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions
and the following disclaimer in the documentation and/or other materials provided with the distribution.
All advertising materials mentioning features or use of this software must display the following acknowledgement:
This product includes software developed or owned by Caldera International, Inc.
Neither the name of Caldera International, Inc. nor the names of other contributors may be used to endorse or promote
products derived from this software without specific prior written permission.
USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA INTERNATIONAL, INC.
AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE FOR
ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
Very truly yours,

/signed/ Bill Broderick
Bill Broderick
Director, Licensing Services
• UNIX is a registered trademark of The Open Group in the US and other countries.
GNU General Public License
Version 2, June 1991
Copyright © 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111, USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your freedom to share and
change it. By contrast, the GNU General Public License is intended to guarantee your
freedom to share and change free software--to make sure the software is free for all its
users. This General Public License applies to most of the Free Software Foundation's
software and to any other program whose authors commit to using it. (Some other Free
Software Foundation software is covered by the GNU Library General Public License
instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General
Public Licenses are designed to make sure that you have the freedom to distribute copies
of free software (and charge for this service if you wish), that you receive source code
or can get it if you want it, that you can change the software or use pieces of it in new
free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you
these rights or to ask you to surrender the rights. These restrictions translate to certain
responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee,
you must give the recipients all the rights that you have. You must make sure that they,
too, receive or can get the source code. And you must show them these terms so they
know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you
this license which gives you legal permission to copy, distribute and/or modify the
software.
Also, for each author's protection and ours, we want to make certain that everyone
understands that there is no warranty for this free software. If the software is modified
by someone else and passed on, we want its recipients to know that what they have is
not the original, so that any problems introduced by others will not reflect on the
original authors' reputations.
Finally, any free program is threatened constantly by software patents. We wish to
avoid the danger that redistributors of a free program will individually obtain patent
licenses, in effect making the program proprietary. To prevent this, we have made it
clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
Terms and Conditions for Copying, Distribution and Modification

0. This License applies to any program or other work which contains a notice
placed by the copyright holder saying it may be distributed under the terms of
this General Public License. The "Program", below, refers to any such program
or work, and a "work based on the Program" means either the Program or any
derivative work under copyright law: that is to say, a work containing the
Program or a portion of it, either verbatim or with modifications and/or
translated into another language. (Hereinafter, translation is included without
limitation in the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not covered
by this License; they are outside its scope. The act of running the Program is
not restricted, and the output from the Program is covered only if its contents
constitute a work based on the Program (independent of having been made by
running the Program). Whether that is true depends on what the Program
does.
1. You may copy and distribute verbatim copies of the Program's source code as
you receive it, in any medium, provided that you conspicuously and appropri-
ately publish on each copy an appropriate copyright notice and disclaimer of
warranty; keep intact all the notices that refer to this License and to the absence
of any warranty; and give any other recipients of the Program a copy of this
License along with the Program.
You may charge a fee for the physical act of transferring a copy, and you may
at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion of it, thus
forming a work based on the Program, and copy and distribute such modifica-
tions or work under the terms of Section 1 above, provided that you also meet
all of these conditions:
a. You must cause the modified files to carry prominent notices stating that
you changed the files and the date of any change.
b. You must cause any work that you distribute or publish, that in whole or
in part contains or is derived from the Program or any part thereof, to be
licensed as a whole at no charge to all third parties under the terms of this
License.
c. If the modified program normally reads commands interactively when run,
you must cause it, when started running for such interactive use in the most
ordinary way, to print or display an announcement including an appropriate
copyright notice and a notice that there is no warranty (or else, saying that
you provide a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this License.
(Exception: if the Program itself is interactive but does not normally print
such an announcement, your work based on the Program is not required
to print an announcement.)
These requirements apply to the modified work as a whole. If identifiable sec-
tions of that work are not derived from the Program, and can be reasonably
considered independent and separate works in themselves, then this License,
and its terms, do not apply to those sections when you distribute them as sep-
arate works. But when you distribute the same sections as part of a whole which
is a work based on the Program, the distribution of the whole must be on the
terms of this License, whose permissions for other licensees extend to the entire
whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights
to work written entirely by you; rather, the intent is to exercise the right to
control the distribution of derivative or collective works based on the Program.

In addition, mere aggregation of another work not based on the Program with
the Program (or with a work based on the Program) on a volume of a storage
or distribution medium does not bring the other work under the scope of this
License.
3. You may copy and distribute the Program (or a work based on it, under Section
2) in object code or executable form under the terms of Sections 1 and 2 above
provided that you also do one of the following:
a. Accompany it with the complete corresponding machine-readable source
code, which must be distributed under the terms of Sections 1 and 2 above
on a medium customarily used for software interchange; or,
b. Accompany it with a written offer, valid for at least three years, to give any
third party, for a charge no more than your cost of physically performing
source distribution, a complete machine-readable copy of the corresponding
source code, to be distributed under the terms of Sections 1 and 2 above
on a medium customarily used for software interchange; or,
c. Accompany it with the information you received as to the offer to distribute
corresponding source code. (This alternative is allowed only for
noncommercial distribution and only if you received the program in object
code or executable form with such an offer, in accord with Subsection b
above.)
The source code for a work means the preferred form of the work for making
modifications to it. For an executable work, complete source code means all
the source code for all modules it contains, plus any associated interface defini-
tion files, plus the scripts used to control compilation and installation of the
executable. However, as a special exception, the source code distributed need
not include anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the operating
system on which the executable runs, unless that component itself accompanies
the executable.
If distribution of executable or object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the source code
from the same place counts as distribution of the source code, even though
third parties are not compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program except as ex-
pressly provided under this License. Any attempt otherwise to copy, modify,
sublicense or distribute the Program is void, and will automatically terminate
your rights under this License. However, parties who have received copies, or
rights, from you under this License will not have their licenses terminated so
long as such parties remain in full compliance.
5. You are not required to accept this License, since you have not signed it.
However, nothing else grants you permission to modify or distribute the Pro-
gram or its derivative works. These actions are prohibited by law if you do not
accept this License. Therefore, by modifying or distributing the Program (or
any work based on the Program), you indicate your acceptance of this License
to do so, and all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the Program),
the recipient automatically receives a license from the original licensor to copy,
distribute or modify the Program subject to these terms and conditions. You
may not impose any further restrictions on the recipients' exercise of the rights
granted herein. You are not responsible for enforcing compliance by third
parties to this License.
7. If, as a consequence of a court judgment or allegation of patent infringement
or for any other reason (not limited to patent issues), conditions are imposed
on you (whether by court order, agreement or otherwise) that contradict the
conditions of this License, they do not excuse you from the conditions of this
License. If you cannot distribute so as to satisfy simultaneously your obligations
under this License and any other pertinent obligations, then as a consequence
you may not distribute the Program at all. For example, if a patent license
would not permit royalty-free redistribution of the Program by all those who
receive copies directly or indirectly through you, then the only way you could
satisfy both it and this License would be to refrain entirely from distribution
of the Program.
If any portion of this section is held invalid or unenforceable under any partic-
ular circumstance, the balance of the section is intended to apply and the section
as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or
other property right claims or to contest validity of any such claims; this section
has the sole purpose of protecting the integrity of the free software distribution
system, which is implemented by public license practices. Many people have
made generous contributions to the wide range of software distributed through
that system in reliance on consistent application of that system; it is up to the
author/donor to decide if he or she is willing to distribute software through
any other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a
consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in certain countries
either by patents or by copyrighted interfaces, the original copyright holder
who places the Program under this License may add an explicit geographical
distribution limitation excluding those countries, so that distribution is permit-
ted only in or among countries not thus excluded. In such case, this License
incorporates the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions of the
General Public License from time to time. Such new versions will be similar
in spirit to the present version, but may differ in detail to address new problems
or concerns.
Each version is given a distinguishing version number. If the Program specifies
a version number of this License which applies to it and "any later version",
you have the option of following the terms and conditions either of that version
or of any later version published by the Free Software Foundation. If the Pro-
gram does not specify a version number of this License, you may choose any
version ever published by the Free Software Foundation.
10. If you wish to incorporate parts of the Program into other free programs whose
distribution conditions are different, write to the author to ask for permission.
For software which is copyrighted by the Free Software Foundation, write to
the Free Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status of all
derivatives of our free software and of promoting the sharing and reuse of
software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE
IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMIT-
TED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED
IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK
AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU
ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR
AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR
ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR
DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL
OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMIT-
ED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE
OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE
OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to
the public, the best way to achieve this is to make it free software which everyone can
redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest to attach them to
the start of each source file to most effectively convey the exclusion of warranty; and
each file should have at least the "copyright" line and a pointer to where the full notice
is found.
one line to give the program's name and an idea of what it does.
Copyright (C) year name of author

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA.
Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this when it starts in
an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
type 'show w'. This is free software, and you are welcome
to redistribute it under certain conditions; type 'show c'
for details.
The hypothetical commands 'show w' and 'show c' should show the appropriate
parts of the General Public License. Of course, the commands you use may be called
something other than 'show w' and 'show c'; they could even be mouse-clicks or menu
items--whatever suits your program.
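As a rough illustration of the interactive notice described above, an interactive program written in C might arrange things along the following lines. This is only a sketch: the function names print_startup_notice() and handle_license_command(), the PROG_* macros, and the one-line placeholder texts are invented for this example and are not part of the License. A real program would print the relevant parts of the License itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical program identity; replace with your own details. */
#define PROG_NAME    "Gnomovision"
#define PROG_VERSION "69"
#define PROG_YEAR    "year"
#define PROG_AUTHOR  "name of author"

/* Print the short notice an interactive program shows at startup.
   Returns the number of characters written, or a negative value
   on an output error (the fprintf() return value). */
int print_startup_notice(FILE *out)
{
    assert(out != NULL);
    return fprintf(out,
        "%s version %s, Copyright (C) %s %s\n"
        "%s comes with ABSOLUTELY NO WARRANTY; for details\n"
        "type 'show w'.  This is free software, and you are welcome\n"
        "to redistribute it under certain conditions; type 'show c'\n"
        "for details.\n",
        PROG_NAME, PROG_VERSION, PROG_YEAR, PROG_AUTHOR, PROG_NAME);
}

/* Handle the two license commands.  Returns 1 if cmd was a license
   command (and some text was printed), 0 if the caller should treat
   cmd as an ordinary command. */
int handle_license_command(const char *cmd, FILE *out)
{
    assert(cmd != NULL && out != NULL);

    if (strcmp(cmd, "show w") == 0) {
        /* Placeholder: a real program would print the NO WARRANTY
           sections (11 and 12) of the License here. */
        fputs("NO WARRANTY: see sections 11 and 12 of the GNU GPL.\n", out);
        return 1;
    }
    if (strcmp(cmd, "show c") == 0) {
        /* Placeholder: a real program would print the terms and
           conditions for copying here. */
        fputs("See the GNU GPL for the conditions on copying.\n", out);
        return 1;
    }
    return 0;
}
```

A main() would typically call print_startup_notice(stdout) once before entering its command loop, and pass each input line to handle_license_command() before trying its own commands.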
You should also get your employer (if you work as a programmer) or your school, if
any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample;
alter the names:
Yoyodyne, Inc., hereby disclaims all copyright
interest in the program 'Gnomovision'
(which makes passes at compilers) written
by James Hacker.

signature of Ty Coon, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into pro-
prietary programs. If your program is a subroutine library, you may consider it more
useful to permit linking proprietary applications with the library. If this is what you
want to do, use the GNU Lesser General Public License instead of this License.
Example Use
This section is not part of the GPL. Here we show the copyright comment from the
GNU env program:
/* env - run a program in a modified environment
   Copyright (C) 1986, 1991-2002 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2, or (at your option)
   any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software Foundation,
   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */
This is typical usage. It contains the following, essentially boilerplate items:

• A one-line comment naming and describing the program. In a larger program, it
would name and describe the purpose of the file within the program.
• A copyright statement.
• Two paragraphs of explanation and disclaimer.
• Where to get a copy of the GPL.
Index
Symbols
- (dash)
  as filename, 85, 97, 476, 478
  in nl_langinfo(), 506
  in options, 24-25, 28, 34
  in permissions, 5, 139
  in regular expressions, 493
-- (dash-dash)
  in long options, 27
  as special argument, 26
_() macro, 512-513, 516, 518, 527
, (comma)
  as decimal point, 496
  in option arguments, 26
; (semicolon)
  in getopt_long(), 37
  in nl_langinfo(), 504
: (colon)
  in getopt(), 31-33
  in nl_langinfo(), 504
  in optstring, 30
  in PATH variable, 296
  in regular expressions, 493, 524-525
? (question mark), in getopt(), 31-33
/ (forward slash), as root directory, 10, 162, 229-231, 276
. (dot)
  as current working directory, 10, 125, 130, 132, 136, 162, 208, 463, 515
  as decimal point, 489, 496
  in filenames, 463
  in format specifiers, 500
  in nl_langinfo(), 506
  in regular expressions, 473
.. (dot-dot), parent directory, 125, 130, 132, 136, 162, 208, 277, 463
  in the root of a mounted filesystem, 229
^ (hat), in regular expressions, 473-474
' (single quote), in format specifiers, 500-502, 516
() (parentheses), in regular expressions, 474-475
[] (square brackets), in regular expressions, 475, 493
{} (braces), in regular expressions, 475
$ (dollar sign)
  as currency sign, 488, 496, 499
  in format specifiers, 514-515
  as prompt, xxvi
  in regular expressions, 473-474
\ (backslash), for continuation lines, 68, 71
# (hash)
  as comment specifier, 238
  in format specifiers, 500
  as prompt, 236
#!, in scripts, 8, 294
% (percent sign), in format specifiers, 499
+ (plus sign)
  in format specifiers, 499
  in nl_langinfo(), 506
= (equal sign), in option arguments, 27
> (greater-than)
  as operator, 110
  as prompt, xxvi
>> operator, 110
| (vertical bar)
  as flag separator, 586-589
  as pipe construct, 12
|&, in gawk, 337

A
a.out (Assembler OUTput) format, 8
abort(), 363, 373, 429, 445-446, 481, 572
accept(), 364
access time, 143-144, 208, 247
  changing, 157-161, 163
  formatting, 176
  retrieving, 545, 562
access(), 364, 410-412, 425
Acorn Advanced Disc Filing System, 233
action_ handler (), 372 Au toopts library, 50

actions, see signal actions awk program, 331, 337, 472, 480, 605
adb debugger, 570 GNU version of, see gawk
addmntent (), 24 1
address space, see memory B
Agans, David ]., xxxi, 633 , 637 b, in permissions, 10, 140
aio_error () , 364 back doo rs, 603
aio_return () , 364 Bash shell , xxvi, 326, 343, 574
aio_suspend (), 364 bc command, 105
alarm clocks, 382-383, 546 Beebe, Nelson H.F., xxxi, 480
alarm (), 364, 382- 383, 398, 401, 546 beepers, 635
alloca (), 76--79, 81 Bell Labs, 21
man page of, 77 BeOS Filesysrem, 233, 235
alphasort (), 188- 191 , 204 Berry, Karl, xxxi
"always check the rerum value" principle, 95, 358 bg command , 349, 383
Amiga Fast File Sysrem, 233 binary data, 106
Andrew File Sysrem, 233 binary execurables, 7-9,52-55 , 80, 240,342, 414
ANSI (American Narional Standards Insrirute) , xxii, binary trees, 552-563
652 deprh of, 552
arbitrary-length lines, 67-73, 80 insertions in, 553- 555
archives, 158 lookups in , 553-557
arg library, 50 nodes of, 552, 558
argc parameter, 28-34 , 49 pointers in, 554-557
Argp library, 50 removals in, 553, 561 - 562
arguments, 24-33 subtrees of, 552
invalid, 88 traversals in, 553, 557- 561
lists of, 87 bind (), 364
missing, 3 1-33 bindtextdomain ( ),46, 515- 516, 521, 527
optional, 26- 27, 33 binfmt_misc filesys tem, 233
whitespace in, 24, 26-27 bitwise operarors, 251, 584
Argv library, 50 blocks, 120, 228
argv parameter, 28-34 , 49, 294-298,342 bad , 23 1
arrays boor, 244
compared to rrees, 551 comparing, 434
element count computation, 105 copying, 433-434
searching, 191-195 fragments of, 245
sorting, 181-191 functions for, 432
ASCII encoding, 521-522, 526 indirecr, 208
asctime () , 170- 171 ,204,504 number of, 143,208, 217,245,278
Assembler OUTput, see a. out size of, 142, 245
assert ( ), 428-432, 446, 549, 579,635 su perblock, 244
assertions, 428-432, 481 Bloom, Benjamin, 650, 65 3
AT&T, xxvii, 30, 406 Bostic, Keirh, 341
atexi t ( ), 46, 302- 305,34 2,454 Bourne shell, xxvi, 350
atoi ( ), 456 break sratement, 448-449
atomic write, 334 breakpoints, 574, 584, 603- 605 , 638
Autoconf, 15, 20, 502, 521 inside macros, 581
au to f s filesystem, 233 Bren nan, Michael, xxxi
Automake, 502,521 brk (), 7 5-76, 78, 81
automounter daemon , 233 Brooks, Fred, 652-653
BSD Fast Filesystem, 232-235, 245 c++ language (co ntinued)

BSD Unix, 15, 114, 126, 128, 140 sorting arrays of objects in, 187
core dumps in, 308 rype of character constanrs in, 220
debuggers in , 570-57 1 Caldera Ancienr UNIX License, xx, 655-656
d irecro ries in , 4 12 cal loc(),56-57, 65- 66,80,273
dirfd ( ), 257 Capey, T. , 654
fchown () and fchmod () , 162 carriage reru rn character, 70
file locking in, 531 cat program, 84-86, 97-99
file ownership in, 155 GNU version of, 101
fi lesyste ms in, 133, 232 V7 vers ion of, 99-101, 150
fts (), 268 catdir program, 134- 136
getpgrp (), 314 catgets (), 487, 504, 526
group sets in , 405 ccmallo c library, 629-630
network databases in , 196 cd command , 11,256,328
se treuid () and setregid () , 418 CD- ROM s, 9, 232, 234, 236, 239-240
s ignal () , 356 cfgetispeed (), 364
signa ls in, 358, 365, 367, 385, 399 cfget ospeed (), 364
sorting functions in, 188 cfse tispeed (), 364
st_blocks field, 218 cfse tospeed (), 364
timez one (), 179 char rype, 58
wait3 () and wa it4 (), 31 0, 343 character sets, 487, 522, 527
bsd_ signal (), 356, 361, 365, 399 cha racters
bsearch ( ), 191- 195 , 198, 204 classes of, 493, 524-5 25
BSS (B lock Starred by Symbol), 53 lowercase vs. uppercase, 488, 493,524-525
BSS areas, 53-55 o rder of, 522
BSS sections, 53-5 5 wide, 523
buffer cache, 111-113, 140,444 chdir() ,256-258 ,276, 278, 364
buffers "c heck every call for errors" prin cip le, 18,72
ma inraining strategy, 67- 73 check_s alary (), 604
ove rrunning, 61, 69,589 chgrp program, 5
size of, 73-74,152- 154,163,258,589,597 c hi ld ( ), 395
chmod program, 5, 107,406
c chmod ( ), 156, 161 , 163, 364
c, in permiss io ns, 10, 140 chown program, 5, 260
C language, xvi i chown (), 155- 156, 161, 163, 198,364,4 14
1990, 1999 Standards, xxi, 19-20 chroot (), 276-278
conrinuation lines in, 68 m anpage of, 276
NULL constanr in, 58 cleanup (), 357
p reprocessor in, 580 clear env( ), 42
type of characte r co nstanrs in, 220 man page of, 43
see also Original C, Standard C clock_gettime (), 364
C++ language close() , 84, 94, 109,288,316,3 18,343,364,
1998, 2003 Standards, xxi, 20 538-539
assignmenr of a poinrer value in, 59 closedir (), 133-135, 162
cons t items in, 55 c1ose-on-exec Hag, 329-331, 336, 341, 344
fu nction proro rypes in, 12 Cocker, Gai l, xxxi i
GNU programs in , 15 coda filesystem, 233
main(),301 code formarring, 16,652
names in, 571 COFF (Co mmon Object File Format), 8
preprocessor in , 580 Coherenr filesystem, 234-235
collating sequences, 491 ddd debugger, 570

Collyer, Geoff, xxxi, 59, 93 , 379 deadlocks, 87, 537
command line processing, 16 debug_ dummy ( ), 604
command substitution, 470, 482 debuggers, 349, 430, 445, 568-570, 605-631,635,
Com mon filesystems on x86 hardware, 232, 235, 277 639
Common Object File Format, see COFF graphical, 570, 638
compar (), 224 internal, 606
compatibiliry with standards, 15 machine-level vs. source-level, 570
compile-patt ern () , 477-478 debugging, 568- 639
compilers, 591-592, 631 compilation for, 569-570, 574
conditions, 582 mactos for, 579- 580,638
logging, 602 rules of, 633- 637, 639
using variables for, 580 runtime, 595-605 , 638
connect (), 364 debugging files, 602-603
const keyword, 14,55 debugging symbols, 569, 578, 638
consumers, 318, 343 decimal point, 489, 4%
cont command (GOB), 576, 584, 638 delete operator, 56, 59
continuation lines, 71 OeMaille, Akim, xxxi
conti nue statement, 448 demand paging, 415
Coordinated Un iversal Time, see UTC denial of service attack, 597
coprocesses, see pipes, fWo-way determinism, 454
copyright(), 512 I dev I fdl xx files, 326- 328, 343
co re dumps, 56, 308, 349,430,445-446,481, 571 , I dev I random fil e, 460
6 18 I dev / urandom file, 460-461
Cox, Russ, xxxi devfs filesystem, 233
cp program, 298 device numbers, 147-148, 229
cpio program, 149, 156,158 devices, 9- 10, 21, 119, 140
cramfs filesyste m, 233, 235 block, 10, 14 0, 142, 148, 163,240
creat(), 106, 109- 111 , 115, 122,287,322,364 , busy, 87
412 character, 10, 140, 142, 148, 163,240
flags fo r, 332-334 loopback, 236, 239
critical sections, 367, 393 masks for, 146
cryptography, 454 , 460 slow, 358
csh, man page of, xxvi rypes of, 147
ctime () , 170- 17 1, 179,204,216,503 devpt s fi lesystem, 233
currency symbols, 488, 4%,499, 505- 506 df ptogram, 244, 250
dgettext (), 508
D diff program, 19
d, in permissions, 5 difftime () , 167, 184,203
daemons, 278, 3 19 dir ec t srruct, 132
data access model , 18, 21 directories, 6-7,1 19-122,139, 162
data sections, 52-55 changing, 256- 258
data segments, 52-55 creating, 130-132
dates, 166 current position in, 138-139, 162
current, 167 current roOt, 10, 162, 229,276- 278,285
form atting, 488, 503 current working, 10- 12,21, 125,237, 256,
daylight-saving time, see OST 276-278,285
dbug library, 606-612, 639 absolute path name to, 258
dbx debugger, 570 information about, 208
dcgett ext ( ), 508 mask for, 146
directories (continued)
moving, 128 E
parenr, 125 EBCDIC encoding, 521
read ing, 132-139 echo program, 24, 29, 33
re movi ng, 128, 130 ed ed itor, 4 16, 447, 472
sy mbolic links to, 128 ef script, 617- 618
sysrem roor, 229- 231,276-277 e fen ce (Elecrric Fence) library, 614--619, 630- 631,
fo r remporary fil es, 443, 481 639
walking, 260-276, 278 manpage of, 618-619
directory enrries, 120-122,125,128, 133- 138, 162, efs filesystem, 233
208 egrep program, 472
file rypes in, 138 ELF (Extensible Linking Format), 8, 294
lengrh of, 246 Emacs ed itor, 16, 18,472,574,644
read ing, 221 emp_name_id_ compare () , 183, 558- 559
sorring, 188-191 ,2 13 empl oyee muct, 183- 187, 192-195 , 535 , 556,
directory permission s, see permissions, direcrory 558
dirent srruct, 133- 138, 162, 188,204,466 ENABL E_NLS constant, 511, 513, 516
d i rfd (),257 encodings, 487, 522, 527
discardin g dara, 9 multibyte, 523-524
d ispositions, see signal acrions, default sel f-correcring, 524
dmall oc library, 619-622 , 629-630, 639 endmntent (), 241
do_input (), 583 e ndpwent (), 197- 198
do_statfs (), 255 e nr ropy pool, 460
do_statvfs () , 248 e nv program, 42 , 44-49
Drepper, U lrich, xxxi, 379 environ variable, 43--44, 48-49, 294
DST (dayligh t-saving time), 170,178-180 e nvironmenr va riabl es, 11, 40-42
du program, 260 adding, 42, 44
GNU vers ion of, 264 , 269- 276, 278 for deb ugging, 597- 601,613,6 18- 6 19
dup() , 321- 326, 343-344, 364,420 w irh empry va lu es, 41
dup2 (), 321- 326, 331 , 343-344,364 expansion of, 470, 482
DVDs, 234, 239 for locales, 488--490
dynamic data srrucrures, 552 ra ndom order of, 43
dynamic memory, 18,52,56- 80 removing, 42, 44, 48
access in g afrer sh rinking, 63 e nvironmenrs , 11 ,40--49,285
accessingourside the bounds, 61,568,613,6 16, clearing, 42, 44
623,631 E poc~ 157, 166, 203, 543
aligned , 530-531 , 562 e ras, 504-506
caicularing size of, 58 , 65 Er ickson, Cal, 629
changing size of, 62-65 , 76, 614-619 errno variable, 58 , 86, 94,97,99, 11 5,123,127,
debugging, 612-63 1, 639 129- 130,133,153, 161 ,167,202,257-258,
freed, 60, 80, 472, 612, 616, 623 293-294,302,304,306,334, 338-339,
inirially allocarin g, 58-60,408-409 357-3 58,360,365 , 399,412,422,464, 530,
leaks of, 61 , 63, 80, 188,437,451-453, 612,623, 537,550
627 examinar ion of, 90
releasing, 60- 62, 80 man page of, 90
tracing, 61 3- 614 valu es for, 87-90
unfreed, 612, 63 1 error messages, 16, 18,72,90- 91,99,115,445-446
uninirialized use of, 613, 623, 627- 629, 639 diagnostic idenrifiers for , 90-91
zero-filling, 62 , 65, 432 handlin g, 32
errors, 86 FIFO (first-in first-out) files, 140,319-320,343,379

reponing func tio ns for , 90 creating, 320
Ersoy, Alper, xxxi empty,334
I etc I fstab file, 238-24 3, 278 nonblocking 110, 333-336
I etc Imtab file, 238-239, 241-243, 249- 250, 278 removing, 320
l et c / vfstab file, 238 file descriptors, 91 - 101, 115, 141 ,162, 202,315
euidac c ess () , 412, 425 attributes of, 328- 337, 344
--exclude optio n, 461 bad , 87
exec ( ), 293- 300,329, 342, 398-399, 40 1, 420, closing, 323, 446
423 duplicating, 33 1- 332
execl (), 295- 297, 339 functions for , 11 4
exe cle (), 295-297, 364 leaks of, 338
execlp ( ), 295- 297 , 308 , 423 lowest acceptable value of, 331
executable code, 52-55 , 80 new, 93, 95
execv ( ), 295- 297,465 obtaining, 257
execve (), 294-295 ,364 for open files, 285- 289, 316,331,342,442
man page of, 294 copying, 321 - 326, 336, 343- 344
execvp (), 49, 295- 297,325,423, 465 shared , 287, 342
exit status, see rerum values file mod es, see permissions, fil e
ex~ t () , 302- 305 , 307, 342, 445 , 538 file permissions, see permissions, file
_exi t () , 302- 305 , 325, 342, 364 file starus Aags, 330-333, 336, 344
_ Exi t ( ), 302- 305, 342 , 364 file table, 287
EXIT_FAILURE co nstanr, 300 File Transfer Protocol, see FTP
EXIT_SUCCESS consranr, 300 fil e_interes ting () , 463
ext filesystem, 235 filenam es, 7
e xt2 filesystem , 233, 235, 240, 41 4 basing program's behavior upon, 298
e xt3 filesystem, 233, 235, 240, 4 14 changing, 125-126, 157
Extensible Linking Fo rmat, see ELF fun ctions fo r, 114
Extensible Markup Language, see XML ge nerating, 437-441
length of, 7, 88 , 121, 133, 246
F filen o () , 95 , 101,113 ,538
fa c tor ial (), 609 files , 4- 10, 21
FAT filesystem, 232, 234-235 , 277 attributes of, 328-337, 344
fchd ir (), 257, 278 byte positions within , 102, I ll , 538
fc hmod(), 156, 161 - 162,364,542 closing, 94- 96, 288
f c hown (), 155, 161-162, 198, 364 copying onto th emselves, 101
f cl os e ( ), 24 1 creating, 106-113, 115 , 122- 123
fcntl () , 328- 337 , 344, 364, 532, 533-539, 540, existing, 87, 110
562 informarion about, see metadata
Aags for , 332- 334 locking, 531-533, 562
man page of, 328, 333, 533, 535 openi ng, 88, 93-96, 113, 115
FD_ CLOEXEC Aag, 330 reading, 94, 96- 99, 1 11-113, 11 5
f datasync () , 113, 115,364 regular, 139, 358
f dopen (), 340 mask for, 146
fflu s h (), 113, 578 removing, 126- 128
fg command, 349, 383 resto ring fro m arch ives, 156, 158
fget s () , 69 shared, 96
fgrep program, 24 size of, 88,114- 115, 142,161,260
files (continued) ftp program, xxviii

truncating, 110, 143 f trun ca te () , 11 4- 115, 161, 364
types of, 138-145, 149 fts (), 268
macros for, 147- 148, 163 FTW stru ct, 263, 270
masks for, 145- 147 ft w(),26 1
w ritin g, 94, 96-99, 110-1 13,115, 143,1 57 F T W_CHD IR Rag, 262, 266-267
fi lesystems, 119, 133, 162,228- 236, 277 full _write (), 161
busy status of, 237 funcrions, xix, xxi
debugging, 208 callback, 263, 302, 342,557,5 59
inform at ion about , 244 debugg ing, 586
jo urnal in g, 235, 277 decl arati o ns of, 14, 28
mounting, 136, 142, 228, 236-238, 278 , 410, 536, helper, 584, 638
540, 562 low-level, 75, 81
read-only, 89, 234- 235, 239 , 246- 247 naming co nvenrions for, 114
unmounr ing, 228, 237 recu rsive, 54, 65, 268
find program, 149,260,461 wrapper, 72
manpage of, 150 futi me () , 162
fin i sh command (G OB), 575, 584
fi rst-i n firs t-ou t, see FIFO files G
Fish, Fred, xxxi, 606 ga rbage collectors, 19
flags, 36, 251, 361, 584 gawk program, xxiii, 16-17, 37, 67, 331, 337-341,
converting to a string, 585-590 408, 448, 489, 574-595, 597-601, 605, 614
flags2str(), 586-590 numeric values formatting in, 501-503
flock struct, 329, 533-536, 542 two-way pipes, 337
flock(), 532, 539-540, 562 GCC (GNU Compiler Collection), 15, 37, 77, 569,
manpage of, 540 638
floppy disks, 406, 541 macros in, 580-581, 594
fnmatch(), 462-464, 482 GDB (GNU Debugger), 19, 570-577, 580-584,
folders, see directories 605, 613, 621, 625, 638
fopen(), 221, 442 distributions of, 577
fork(), 284-289, 293, 308-309, 329, 339, 342, macros in, 581, 594
364, 385, 398-399, 401, 423 Gemmellaro, Anthony, xxxii
format_num(), 387, 394 Gemmellaro, Faye, xxxii
fortune program, 454 General Public License, see GNU GPL
fpathconf(), 364 generality, 18
fprintf(), 90, 355, 578 genflags2str(), 587-589
fpsync(), 113 getcwd(), 258-260, 278
Free Software Foundation, see FSF man page of, 260
free(), 57, 59, 60-62, 63, 66-67, 80, 154, 188, getdelim(), 73-74, 80
259, 437, 453, 530-531, 562, 605, 612-613, getdents(), 137
623, 627 manpage of, 137
FreeBSD filesystem, 235 getdtablesize(), 92, 96, 115, 267, 287
FSF (Free Software Foundation), xxii, xxix, 571 getegid(), 364, 407, 425
code formatting style of, 16 getenv(), 41, 49
fstat(), 101, 141, 145, 150-151, 161, 258, 288, geteuid(), 364, 407, 425
364, 411 getgid(), 364, 407, 425
fstatfs(), 252-256, 278 getgroups(), 364, 407-408, 425
fstatvfs(), 245, 250, 278 getitimer(), 546-550, 562
fsync(), 113, 115, 364 manpage of, 547
FTP (File Transfer Protocol), 277 getline(), 73-74, 80, 479
getmntent(), 241-244, 278 globerr(), 469

getname(), 217 globfree(), 464, 467, 470
getopt(), 26, 30-33, 49, 608 glyphs, 521
GNU version of, 26, 30, 33-34, 39, 49 GMT (Greenwich Mean Time), 157
manpage of, 34 gmtime(), 168-170, 204
getopt_long(), 16, 26-27, 30, 34-40, 44, 46, GNOME Project, 203
49, 516, 608 GNU C Library, see GLIBC
GNU version of, 39 GNU Coding Standards, 15-21, 28, 60, 66, 91, 298
getopt_long_only(), 34, 49 GNU Compiler Collection, see GCC
getpeername(), 364 GNU Coreutils, xxviii, 30, 45, 115, 123
getpgid(), 314, 343 distribution of, 47
getpgrp(), 314, 343, 364 du, 269-276, 278
getpid(), 289-291, 342, 364 fts(), 268
getppid(), 289-291, 342, 364 install, 308
getpwent(), 197-199 pathchk, 411
getpwnam(), 197-199 safe_read() and safe_write(), 359-360
getpwuid(), 197-199 sort, 357
getresgid(), 421-422 utime(), 159
getresuid(), 421-422 wc, 435-436
getsid(), 315 xreadlink(), 153
getsockname(), 364 GNU Debugger, see GDB
getsockopt(), 364 GNU Gengetopt library, 50
gettext program, 486-488, 507-521, 527 GNU GPL (General Public License), xxix, 14,
gettext.h file, 511-513 657-666
gettext(), 508-509, 510-515, 527 GNU Lesser General Public License, 39
gettext_noop(), 512, 518 GNU programs, xxiii, 14-19, 39
gettimeofday(), 544-545, 562 long options in, 27
man page of, 544 wrapper functions in, 72
getty program, 420 GNU Project, xxii
getuid(), 364, 407, 425 GNU/Linux, xvii, xxxi
GID (group ID), 5-6, 21, 108, 130, 142, 157, 196, block size in, 142
199, 204, 404 c99 command, 301
effective, 405-425 chroot(), 277
mask for, see setgid bit clearenv(), 42
real, 405-412, 415-425 core dumps in, 308
saved set, 405, 408, 420-422, 424-425 debuggers in, 571
gid_t type, 143 /dev/fd/xx files in, 326-328
Glib library, 203 directories in, 413
GLIBC (GNU C Library), 15, 20, 39, 73 dirfd(), 257
errno values, 87-90 distributions of, xxviii, 233, 236, 349, 430, 487,
euidaccess(), 412 525, 616
f_flag values, 247, 251 Epoch in, 157
FTW_CHDIR flag, 262, 266 file formats in, 8
glob() extensions, 465-467 file types in, 145
libintl.h, 513 filesystems in, 119, 133, 233-238, 277
rand(), 457 ftw(), 261
superuser in, 404 gcc command, 301
TEMP_FAILURE_RETRY(), 360-361 inode numbers in, 231
glob(), 464-470, 482 locales in, 489
globalization, 486 mounting in, 136, 229
GNU/Linux (continued) heap, 53-55, 453

numeric values formatting in, 501 Heisenberg, Werner, 431
preemptive multitasking in, 291 heisenbugs, 432
/proc/self/cwd, 260 --help option, 16, 27, 48
remove(), 128, 162 Hesiod, 196
renaming operation in, 126 Hierarchical File System, 233
rsync, xxvii High Performance File System, 233
rusage, 311 Hoare, C.A.R., xx, xxxi, 18t, 428, 431
signal(), 356 Hoare's law, xx
signals in, 350-352, 365, 400 holes, 104, 106, 142
standard functions in, 75 HOME environment variable, 11
statfs() and fstatfs(), 252-256 HURD kernel, 15
superuser in, 404
time slicing in, 439
time_t type, 166 I/O, 84, 99
time-zone information in, 180 asynchronous, 373
versionsort(), 188 blocking, 111
wait3() and wait4(), 310, 343 nonblocking, 333-336
_GNU_SOURCE constant, 73 random access, 102, 106, 115
Gold, Yosef, xxxi sequential, 102
goto statement, 446 standard functions for, 95
Greenwich Mean Time, see GMT synchronous, 112, 115, 247
grep program, 472, 474, 476-481 i18n, see internationalization
groff program, 15 IBM, 234, 469, 521
group, category of users, 5-6, 106-109, 404 IEEE Standard 1003.1-2001, xxi
changing, 155 #ifdef, 20, 250, 298, 352, 488, 578, 586
databases of, 195, 199-202, 204 ifind(), 191
IDs of, see GID IFS environment variable, 423
lists of users of, 199 indentation, 268
masks for, 145-146 index nodes, see inodes
names of, 199, 204, 208, 217 indexing, 65, 70
passwords of, 199 inetd program, man page of, 379
group (struct), 200 infinite loops, 453
group sets, 199, 405, 412, 424 init process, 11, 289, 377, 385, 420
changing, 416, 425 init_groupset(), 408-409
number of groups in, 408-409, 425 initialized data, 52
retrieving, 408 initstate(), 457-458
gstat(), 213, 221-222 inode change time, 143-144, 208
GTK+ project, 203 changing, 157
gzip program, xxix inode numbers, 120-122, 125, 132, 142, 150, 162,
208
H for root directory, 231, 277
handler(), 355, 358 inodes (index nodes), 7, 119-122, 162, 228
handlers, see signal handlers number of, 246, 278
hard links, 122-125, 128-130, 162 Insight debugger, 571
to directories, 122, 125 install program, 308
to root directory, 230 interfaces, 20
hash tables, 272 InterMezzo filesystem, 233
hasmntopt(), 241 International Organization for Standardization, see
Hayes, John R., 650, 653 ISO
internationalization (i18n), 486, 526-527, 643

interpreters, 8, 591-592 L
interval timers, see timers l10n, see localization
Introduction, man page of, 30, 90, 137 LANG environment variable, 489
invariants, 428 Lave, Jean, 653
_IO_vfprintf_internal(), 628 lbuf struct, 210, 213-214, 221-226
IPC (interprocess communication), 140, 315, 379 invalid, 215
using signals for, 379, 400 LC_ALL environment variable, 489
Irix system, 233 lchmod(), 156
isatty(), 202-204 lchown(), 155, 162
man page of, 202 lconv struct, 494-498, 501, 503-504
ISO (International Organization for Standardization), ld program, 8
xxii LD_PRELOAD environment variable, 617-618, 620
ISO 9660 CD-ROM filesystem, 232, 234-235, 277 LDAP (Lightweight Directory Access Protocol), 196
ISO C, see Standard C Lechlitner, Randy, xxxi
ISO/IEC International Standard 9899, xxi LEDs, 635
ISO/IEC International Standard 14882, xxi Lehman, Manny, xxxi
iswalnum(), 523 libintl.h file, 513
iswlower(), 523 libraries, 19
itimerval struct, 547-549, 562 general-purpose, 356
POSIX standard for, 20
J shared, 56, 331, 617
Java language, 12 Lightweight Directory Access Protocol, see LDAP
Jedi Knight, xxxi limits.h file, 133
job control, 312-314, 343, 349, 383 line feed character, 70
job control shells, 313, 378, 401 line program, 288
Johnson, Steve, 4 link count, 122, 126, 142, 162
Journaled File System, 234 link program, 123-124
Journaled Flash Filesystem, 234 link(), 123-124, 162, 364
links, see hard links
K lint program, 631, 639
K&R C, see Original C Linux, see GNU/Linux
K&R style of code formatting, 16 Linux Journal, 14, 629
Karels, Michael J., 341 list command (GDB), 574
KDE 3, 629 listen(), 364
Kenobi, Obi-Wan, xxx ln program, 122-123, 129, 298
Kerberos network database, 196 locale program, 487
Kernighan, Brian W., 638, 644 man page of, 524
Kernighan, Brian W., xvii, xxxi, 12, 16, 65, 602 localeconv(), 494-498, 526
kill program, 352, 384 locales, 486-487, 524-526
kill(), 340, 363-364, 373-374, 376-379, 400 categories of, 487-490, 526
killpg(), 376-377, 379, 400 default, 487
Kirsanov, Dmitry, xxxii setting, 489-490
Kirsanova, Alina, xxxii localization (l10n), 174, 486, 526-527
Knuth, Donald E., 203, 480, 637 localtime(), 168-170, 179, 204
Konqueror web browser, 629 lockf(), 532, 533-539, 562
ksh (Korn) shell, 337, 343 locks, 531-533, 562
ksh88 shell, 326 advisory, 532-533, 538-539, 562
ksh93 shell, xxvi, 326 exclusive, 539
mandatory, 247, 533, 540-543, 562
locks (continued) Marti, Don, xxxi

obtaining, 536-538 mblen(), 524
range, 537 mbrlen(), 524
read, 532, 535, 562 mbrtowc(), 524
record, 531 mbsrtowcs(), 524
releasing, 536-539 mbstowcs(), 524
shared, 539-540 mbtowc(), 524
whole file, 532, 539 McGary, Greg, xxxi
write, 532, 535, 562 McIlroy, Doug, xxxi
log files, 601-603, 613, 634 McKusick, Marshall Kirk, 341
logic analyzers, 635 memalign(), 530-531, 562
login program, 419 memccpy(), 433, 481
longjmp(), 264, 447-449, 450-454, 481, 607, memchr(), 435-436, 481
611 memcmp(), 434-435, 481
ls command, 139-140, 148-149, 203, 463 memcpy(), 433-434, 481
modern versions of, 208-209, 226 Memishian, Peter, xxxi
V7 version of, 208-226 memmove(), 20, 433-434, 481
manpage of, 208, 218 memory, 10-11, 54
lsearch(), 428-432 dynamically allocated, see dynamic memory
lseek(), 102-104, 111, 114-115, 161, 218, 287, overlapping areas of, 433
315, 364 read-only, 55
whence values for, 102 setting, 432
lstat(), 141, 145, 151-152, 154, 163, 260, 364 use of, 18-19
lutime(), 162 memset(), 59, 65, 432, 481
memwatch library, 630
M message catalogs, 507, 510, 515, 527
MacOS X NetInfo network database, 196 message object files, 520, 527
magic numbers, 8, 228 metacharacter expansions, see wildcard expansions
converting to printable strings, 255 metadata, 7, 119, 126, 141-143, 162-163
main program, 301-302 modification time of, see inode change time
main(), 10, 30, 46, 49, 304-305, 307, 343, 538 Meyering, Jim, xxxi, 30, 41, 154, 268
declaring, 28 Microsoft Windows
process exit status in, 85 convention of line ending in, 70
major(), 148, 163, 216 Epoch in, 157
make program, 521 filesystems in, 119, 232, 234
GNU version of, 67-73 mingetty program, 420
makedev(), 148 minicomputers, 5, 9, 313
Makefile, 68 Minix filesystem, 234-235
makename(), 220, 226 minor(), 148, 163, 216
malloc replacement libraries, 629-630 mkdir(), 130-132, 162, 364
malloc(), 53, 56-60, 61-67, 72, 76, 80, 152-154, manpage of, 130
188, 222, 258-259, 409, 451, 493, 531, 552, mke2fs program, 119
555, 605, 612-613, 618, 623, 629 mkfifo program, 140
casting the return value, 58 mkfifo(), 320, 343, 364
manpage of, 57 manpage of, 320
MALLOC_TRACE environment variable, 613 mkfs program, 119
man command, xix, xxvi mknod program, man page of, 148
manage(), 394-397 mkstemp(), 441-443, 481
manifest constants, 93 mktemp(), 437-441, 442
manpages (manual pages), xxvi, 634 mktime(), 176-178, 179, 204
mntent struct, 241-244 nl_langinfo(), 504-506, 516, 526

modification time, 143-144, 208 NLS (native language support), 486, 526
changing, 157-161, 163 "no arbitrary limits" principle, 17-18, 21, 52, 67-68,
formatting, 176 585, 589, 644
retrieving, 545, 562 nodots(), 190
sorting by, 208 nonlocal gotos, 446, 481, 611
monetary formatting, 488, 494-499, 526 Norvig, Peter, xxxi, 649
Moraes, Mark, 629 Novell, 234
mount points, 229, 277 NTFS filesystem, 234-235
mount program, 229, 234, 236-241, 277, 406, 541 NUL character, 7, 17-18, 69, 71
man page of, 235, 240 NULL constant, 57-58, 80, 430
mount(), 229 numbers, 35
mounting, see filesystems, mounting formatting, 488, 494-503, 526
mpatrol library, 629-630 grouping digits within, 496, 499, 501-502
MS-DOS, 232
msdos filesystem, 234-235 O
mtrace program, 613-614, 629-631, 639 O_NONBLOCK flag, 332, 334-335, 344
mtrace(), 613-614 object file formats, 7
multiuser systems, 404 obstacks, 19
muntrace(), 613-614 off_t type, 102-103
Murakawa, Yasushi, 654 offsets, 102, 534
mv program, 125, 298 Open Group, The, xxii
open(), 84, 93-96, 106, 110-113, 115, 122, 137,
N 151, 257, 287, 322, 326, 364, 411-412
N_() macro, 512-513, 516, 518, 527 flags for, 94, 110-113, 115, 332-334
named pipes, see FIFO files man page of, 113
nanosleep(), 550-551, 562 OpenBSD filesystem, 235
native language support, see NLS opendir(), 133-135, 162, 267, 464
nblock(), 217-218 Open Office, 629
NCP protocol for NetWare, 234 openpromfs filesystem, 234
NetBSD filesystem, 235 OpenVMS filesystem, 119
network databases, 195 operands
Network File System, see NFS order of, 26
Network Information Service, see NIS placement of, 27
new operator, 56, 59 Opt library, 50
newfs program, 119 optimal_bufsize(), 597-601
next command (GDB), 575, 638 optimizations, 569, 574, 638
NeXTStep system, 235 option struct, 34
NFS (Network File System), 232, 277 options, 16, 24
nftw(), 251, 260-268, 278 debugging, 595-597, 607
callback function for, 263, 267-268 invalid, 31-33
flags for, 262-267 long, 16, 27-28, 34-40
man page of, 263 names of, 25, 27
private version of, 270 placement of, 26-27
ngettext(), 509-510, 516, 527 undocumented, 595
nice values, 285, 291-293, 342 vendor-specific, 25, 37
nice(), 291-293, 342 Original C, 12-14, 21
NIS (Network Information Service), 196 function parameters in, 631
njamd (Not Just Another Malloc Debugger) library, GNU programs in, 15
630 see also C language, Standard C
os_close_on_exec(), 341 PID (process ID) (continued)

other, category of users, 5, 106-109, 404 using in seed values, 459
masks for, 145-146 wrapping around, 289
owner, of a file, see user pid_t type, 285
ownership, 5, 119 Pike, Rob, 21, 638, 644
changing, 155-156, 161, 163 Pinard, François, xxxi
masks for, 146 pipe(), 316-318, 338, 343, 364
PIPE_BUF constant, 334-335
P
pipes, 12, 315, 343, 379
p, in permissions, 140 blocking, 333
parameters, 28-30, 54 broken, 89
lists of, 631 buffering, 318-319
parent process ID, see PPID creating, 316-318
parse trees, 591 empty, 334
parse_debug(), 596 named, see FIFO files
partitions, 118, 162, 277 nonblocking, 333-336
Pascal language, 446 nonlinear, 326, 343
passwd struct, 197-199 synchronization of, 319
PATH environment variable, 11, 295, 342, 423 two-way, 337-341
pathchk program, 411 Plan 9 From Bell Labs, 50, 109
pathconf(), 364 pmode(), 219
pathnames, 119, 128 pointers
absolute, 258, 278 calculating, 57
checking for validity, 411 dangling, 60, 613
relative, 11, 256 declaring, 58
pause(), 364, 379-383, 401, 542 freeing twice, 60
PDP-11, 4, 121, 226, 414, 457 generic, 57
pentry(), 215 guaranteed valid, 72
Perl language, 472 invalid, 57, 63-65
permission bits, 5, 219, 226, 258 passing, 61
constants for, 107-108 setting to NULL, 60
default, 108 sorting, 187
masks for, 145-147 poll(), 364
permissions, 21, 119, 142, 144-151, 219 Popt library, 50
changing, 5, 108, 156-157, 161, 163 portability, 15, 19-20, 22, 39, 58, 69, 77-78, 136,
checking, 404, 425 341, 580, 642
denied, 87 portable object files, 518, 527
directory, 7, 126 portable object templates, 519
expressed in octal, 107 Portable Operating System Interface, see POSIX
file, 5, 106-109, 126 standard
macros for, 144 positional specifiers, 514-515
perror(), 18, 90, 115 POSIX standard, xxi-xxii
pfatal_with_name(), 73 bsd_signal(), 356
PGID (process group ID), 286, 306-315, 343 character classes in, 493, 524-525
PID (process ID), 10-12, 285-286, 289-291, 295, compatibility with, 15, 40
315, 342-343 directories in, 414
parent, see PPID environ variable, 43
of the process that died, 306, 308 errno values, 87-90
_exit(), 302-305
POSIX standard (continued) /proc/self/cwd file, 260

extensions, xxii process groups, 312-315, 343
ADV, 530 background, 313
FSC, 114 foreground, 313
SIO, 114 IDs of, see PGID
TMR, 550 leaders of, 312, 314, 343
XSI, xxii, 75, 114, 133, 244, 307-308, 314 orphaned, 313
FD_CLOEXEC flag, 330 sending signals to, 376-377
file locking in, 531 setting, 314-315
file ownership in, 5 process signal mask, 286, 366-370, 378, 400,
filesystems in, 133 449-450, 481
file-type and permission bitmasks in, 146 starting out, 382
flags for open(), 110-113, 332-333 process substitution, 326
ftw(), 261 process_file(), 273, 275
isatty(), 202 processes, 10-12, 21, 52-55
library and system call interfaces in, 20 blocking, 366, 547
nice values in, 293 child, 10, 87, 284, 286, 289, 322-326, 338, 342
option conventions in, 25-27, 37 dead, 305, 385-398
PIPE_BUF constant, 334-335 nondeterministic order of, 326
printf(), 514-515 continuing if stopped, 349, 378, 384, 395, 397,
process group information in, 314 399
rusage, 311 creating, 284, 342
signals in, 358, 365, 367, 370, 376-379, 385, 399 executing programs in, 293, 342
st_blocks field, 218 exiting, 85
superuser in, 404 IDs of, see PID
symbolic constants for permissions in, 107-108 killing, 399, 430, 481
time_t type, 167 orphan, 289
timezones in, 178-179 parent, 10, 284, 286, 289, 305, 323-326, 338,
waitpid(), 306 342, 385-398
posix_memalign(), 530-531, 562 polling, 307
posix_trace_event(), 364 reading, 318
POSIXLY_CORRECT environment variable, 33, 40 reaping, 305
postconditions, 428 stopping, 305, 307, 349, 384, 399
PPID (parent process ID), 10, 286, 289-291, 342 suspending, 380, 384
preconditions, 428 synchronization of, 291
predictable algorithms (mktemp()), 439 terminating, 300-305, 342, 348, 357, 382, 550,
preemptive multitasking, 291 562
print_emp(), 559 writing, 318
print_employee(), 193 producers, 318, 343
print_group(), 201 profiling, 547
print_mount(), 244 programs
printf(), 95, 268, 387, 451, 499-503, 516, basic structure of, 84-86
526-527, 635 distributions of, 39
man page of, 501 logging, 601-602
POSIX version of, 514-515 messages in, 507-521, 526, 602
priority, 291-293 names of, 25
private allocators, 67 production versions of, 569, 595, 597, 617
privileged operations, 416 running, see processes
proc filesystem, 234 testing, 632-633, 639, 644
/proc/mounts file, 238-239, 241, 243, 278 undocumented features in, 603
PROM filesystem, 234 readstring(), 69

prompts, xxvi realloc(), 57, 62-67, 72, 80, 258, 273, 552, 555,
prototypes, 12-14, 21, 631 612
ps program, 53 GNU version of, 66
pselect(), 364 Standard C version of, 63
pseudorandom numbers, 454-461, 480 recurse(), 573
pseudoterminals (pseudo-ttys), 9, 202, 233 recv(), 364
ptrdiff_t type, 57 recvfrom(), 364
putenv(), 42, 49 recvmsg(), 364
GNU version of, 42, 48 regcomp(), 472-475, 482
pwd program, 328 regerror(), 472
regexec(), 472-475, 482
Q regfree(), 472
QNX4 filesystem, 234 register keyword, 30, 50
qsort(), 181-191, 204, 213 registers, 53
Quarterman, John S., 341 regular expressions, 471-480, 482
Quicksort algorithm, 181 basic, 472
quote_n(), 124 extended, 472
ranges in, 488, 493
R reiserfs filesystem, 234
r, in permissions, 5 Remote File System, see RFS
race conditions, 125, 361-363, 365, 368, 380, 393, remove(), 127-128, 320
400, 411, 439-440, 480-481, 549, 624 GNU/Linux version of, 128, 162
radix point, 500, 505 rename(), 126, 162, 364
Rago, Stephen, 341 reproducibility, 569
raise(), 353, 357, 363-364, 373-374, 377, 384, return values, 53-54, 300-301, 342
399 126 and 127 codes, 301, 304
RAM disks, 234, 444-445 casting, 58, 63
Ramey, Chet, xxxi to void, 95
ramfs filesystem, 234 checking, 59, 63, 95, 358
rand(), 455-457, 482 negative, 301
compared to random(), 459 rewinddir(), 133-134, 162
GLIBC version of, 457 RFS (Remote File System), 232
man page of, 457 Ritchie, Dennis M., xvii, xxxi, 12, 16, 406
random numbers, 454, 460, 480-482 rm program, 7
random(), 457-460, 482 rmdir program, 132
compared to rand(), 459 rmdir(), 128, 130, 162, 364
manpage of, 458, 460 Robbins, Miriam, xxxii
Raymond, Eric S., 645, 651 Rock Ridge extensions, 234
read built-in shell command, 288 romfs filesystem, 234-235
read end (of pipe), 316, 318, 323 root (superuser), 6-7, 155-157, 236, 240, 276, 377,
read(), 84, 96-99, 111-113, 115, 137, 287, 318, 404, 406-407, 410, 415-426, 543
334-336, 357-358, 364 rpl_utime(), 161
readdir(), 133-137, 162, 214, 220-221, 462, rsync program, xxvii
464 Rueda, Carlos, 654
GNU/Linux version of, 136 run command (GDB), 574
readline library, 574, 644 runtime checks, 431, 481
readline(), 67, 73 rusage struct, 310-311, 343
readlink(), 151-155, 163, 260, 364
setpgrp(), 314-315, 343

S setpwent(), 197-198
S_ISxxx() macros, 144, 147, 163 setregid(), 417-418, 421, 425
sa_flags field, 371-372, 378, 385 setresgid(), 421-422, 426
SA_NODEFER flag, 371-372 setresuid(), 421-422, 423, 426
SA_NOMASK flag, 372 setreuid(), 417-418, 421, 423, 425
SA_ONESHOT flag, 372 setsid(), 315, 364
SA_RESETHAND flag, 372 setsockopt(), 364
safe_read(), 161, 359-360, 361 setstate(), 457-458
safe_write(), 359-360 settimeofday(), 544
sbrk(), 75-76, 78-79, 81, 453 setuid bit, 146, 246-247, 258, 405-407, 419-421,
scandir(), 188-191, 204 425
scanf(), 186 running as root, 422, 426
SCO UnixWare Boot Filesystem, 233 setuid(), 364, 417, 420-421, 423, 425
scripts, 8-9 Seventh Edition Research UNIX System, see V7 Unix
sdb debugger, 570 Seward, Julian, xxxi
searching shell escapes, 416
binary, 191-195, 204, 435-436 shells, 642
linear, 191, 204, 217, 428 separating arguments in, 24
in user/group databases, 198 sorting environment variables, 43
sectors, of a disk, 245 shift states, 523
security, 161, 196, 406-407, 422-424, 426, 431, shutdown(), 364
442, 481 si_code field, 373
sed program, 472, 480 side effects, 432
seed values, 454-455, 458, 482 sig_atomic_t type, 362, 365, 400
seekdir(), 138-139, 162 sigaction struct, 370-371, 375, 385, 400
segmentation violation, 59-60 sigaction(), 358, 363-364, 367, 370-375,
select(), 219, 364 378-379, 382, 400-401
sem_post(), 364 manpage of, 370, 373
send(), 364 sigaddset(), 364, 368, 378, 400
sendmsg(), 364 sigaltstack(), 372
sendto(), 364 sigdelset(), 364, 368-369, 378, 400
sentinel elements, 105 sigemptyset(), 364, 368-369, 378, 400
sessions, 312, 343, 377 sigfillset(), 364, 368-369, 375, 378, 400
IDs of, 315 sighold(), 366-367, 368
leaders of, 313 sigignore(), 366
setegid(), 417, 421, 425 siginterrupt(), 376, 378
setenv(), 42, 49 sigismember(), 364, 368-369, 378, 400
seteuid(), 417, 421, 425 siglongjmp(), 449-450, 451, 481
setgid bit, 146, 246-247, 258, 406-407, 419-421, signal actions, 348, 399
425, 540-543, 562 default, 348-357, 366, 398-399
for directories, 412-414, 425 restoring, 350
setgid(), 364, 417-418, 420-421, 425 signal handlers, 349, 353-367, 370-375, 399
setgroups(), 416, 420, 425 functions that can be called from, 364-365
setitimer(), 383, 546-550, 562 installing, 366, 382
setjmp(), 447-449, 450-454, 481, 607, 611 reinstalling, 355-356, 361
setlocale(), 46, 488, 489-490, 516, 524, 526 restoring, 363
man page of, 487 shell-level, 350
setmntent(), 241 signal numbers, 307, 350
setpgid(), 314-315, 343, 364 signal sets, 368
signal(), 349-353, 364-365, 375, 383, 399-400 sort program, 357
BSD version of, 356 man page of, 181
GNU/Linux version of, 356 sorting, 181-191, 204, 445
manpage of, 350, 352 by modification time, 208
signals, 305, 343, 348, 363 of pointers, 187
available under GNU/Linux, 350-352 stable, 184
blocking, 365, 367-368, 371, 378, 381, 398, Southern Storm Software, xxvii
400-401 spaghetti code, 446
catching, 349 SPARC system, 234-235
death of child, 401 speed, 18
ignoring, 348, 350, 356-357, 375, 380, 398-399 Spencer, Henry, xxxi, 21, 132
interrupt, 9, 305, 313, 354, 365, 376 splint (Secure Programming Lint) program, 631,
job control, 9, 305, 307, 313, 383-385, 401 639
pending, 370, 375, 384, 398, 400-401 sprintf(), 501
real-time, 353 srand(), 455, 482
sending, 376-378, 399-400, 404 srandom(), 457-458, 482
supported, list of, 352 ssize_t type, 73, 98
using for IPC, 379, 400 st_ctime field, 143
sigpause(), 364, 366, 382 st_mode field, 142-145, 150
sigpending(), 364, 369-370, 375, 378, 400 st_size field, 142, 152, 154, 163
sigprocmask(), 364, 369, 378, 400 stack, 53-55, 77, 623
sigqueue(), 364, 373 stack frames, 573
sigrelse(), 366-367 stack segments, 53-55
sigset(), 364, 366, 367 stack traces, 573
sigset_t type, 368-369, 378, 400 Stallman, Richard M., 22, 66
sigsetjmp(), 449-450, 451, 481 Standard C, 21
sigsuspend(), 364, 369-370, 378, 381-382, 396, 1990 ISO, xxi, 12-14, 19-21
401 1999 ISO, xxi, 14, 19-21
sigvec(), 367 const items in, 55
simplicity, 4, 9, 12, 18, 21 exiting functions in, 302-305
single-stepping, 575, 638 GNU programs in, 15
size program, 55, 80 main(), 301
size_t type, 57 realloc(), 63
sizeof operator, 58, 105, 589 remove(), 127-128
sleep(), 290-291, 364, 383, 401, 550 signal functions in, 349-353
SMB filesystem, 234 time_t type, 166
socket(), 364 variadic macros in, 579
socketpair(), 364 wide characters in, 523
sockets, 88-89, 140 see also C language, Standard C
mask for, 146 standard error, xvii, 11, 21, 92-93, 115, 141, 202
soft links, see symbolic links sending debugging messages to, 578
Solaris, 278 standard input, xvii, 11, 21, 27, 92-93, 98, 115, 141,
core dumps in, 308 202, 315, 337, 476, 478
directories in, 413 shared by two processes, 287
filesystem in, 235, 238 standard output, xvii, 11, 21, 27, 92-93, 115, 141,
gettext, 486 150, 202, 315, 337
mounting in, 229 shared by two processes, 287
numeric values formatting in, 501 standards, xx
signals in, 354-355, 365, 392, 399 stat struct, 101, 141-144, 157-158, 163, 166, 210,
sorting functions, 188 218, 224, 226, 229, 260, 278, 288, 545, 597
stat(), 141, 145, 148, 150-151, 161, 163, 176, swap space, 445

202, 217, 223, 250, 364, 411, 425, 464, 546, symbolic constants, 35, 93, 581, 584-585
603 using enums for, 582
expensiveness of, 212 symbolic links, 128-130, 139, 141, 151-155,
manpage of, 143 162-163
statfs struct, 252-256 creating, 151
statfs(), 252-256, 278 to directories, 128, 260
static tables, 18 levels of, 88
statvfs struct, 245-247, 248, 251-252 mask for, 146
statvfs(), 244-252, 278 ownership of, 155
stderr variable, 95 permissions on, 156
stdin variable, 95 timestamps for, 162
stdio.h file, 70, 97, 99, 113, 133, 363, 441 symbols, 56
stdlib.h file, 57 symlink(), 129, 151, 162, 364
stdout variable, 95 manpage of, 129
step command (GDB), 575, 584, 638 sysconf(), 364
Stevens, W. Richard, 341 sysctl program, man page of, 349, 572
sticky bits, 7, 258, 414 syslog(), 602
for directories, 425 system calls, xix, xxi, 10, 18, 84
mask for, 146 checking for errors, 99
stopme(), 605 failing, 86, 115
strcasecmp(), 494 indirect, 137
strcmp(), 132, 182-183, 225, 434, 490-494, 526 interrupted, 88, 357, 365
strcoll(), 188, 490-494, 526 POSIX standard for, 20
strcpy(), 434 restartable, 357-361
strdup(), 74-75, 80, 490 system console, 9
strerror(), 90, 99, 115 System III, 30
strfmon(), 498-501, 516, 526 debuggers in, 570
strftime(), 171-176, 179-180, 204, 498, executable files in, 298
503-504, 516, 526 FIFOs in, 319
string terminator, 41, 69, 434 System V, 140
strings directories in, 412-413
comparing, 488, 490-494, 526 fchown() and fchmod(), 162
copying, 74 file locking in, 531
marking for translation, 510 file ownership in, 156
strip program, 56 filesystems, 234-235
strip(), 308 filesystems in, 232, 238
strncmp(), 494 ftw(), 261
Stroustrup, Bjarne, 580 signals in, 354-355, 358, 365, 367, 378, 385, 400
strtoul(), 601 st_blocks field, 218
structs, in C, 576 UIDs in, 405
arrays of, 590 sysv filesystem, 234-235
nested, 592-594
size of, 58 T
strverscmp(), 188 t, in permissions, 7
strxfrm(), 493-494, 526 Taber, Louis, xxxi
subshells, 288 tar program, xxix, 149, 156, 158
Sun Microsystems, 196, 232, 413 Taub, Mark, xxxi
superblocks, 228 tcdrain(), 364
superuser, see root tcflow(), 364
tcflush(), 364 times (continued)

tcgetattr(), 364 local, 168
tcgetpgrp(), 364 resolution of, 543-547, 551, 562
tcsendbreak(), 364 times(), 364
tcsetattr(), 364 timespec struct, 550, 551, 562
tcsetpgrp(), 364 timestamps, 143, 157-161
tcsh, man page of, xxvi timeval struct, 311, 544, 545- 546-549, 551, 562
tdelete(), 554, 561 timezone(), 179
tdestroy(), 554, 561-562 tm struct, 168-170, 176, 204
telldir(), 138-139, 162 tm_isdst field, 170
TEMP_FAILURE_RETRY() macro, 360-361, 365 /tmp directory, 7, 415, 444, 481, 515
tempnam(), 437 TMPDIR environment variable, 443-445, 481
temporary files, 18, 357, 436-445, 481 tmpfile(), 302, 441, 481
directories for, 127, 443 tmpfs filesystem, 234
opening, 441-443 tmpnam(), 437
terminals (ttys), 9, 111, 140, 151, 202-204, 286, 384, tokens, 621
420 touch program, 176
controlling, 312-314 GNU version of, 159
nonblocking, 334 Unix version of, 160
reading data from, 548 translations, 507-521, 527
text domain, 507, 527 creating, 517-521
text sections, 52-55 preparing, 516-517
text segments, 52-55, 80 testing, 515-516
textdomain(), 46, 507-508, 516, 527 updating, 521
tfind(), 554-557 trap built-in shell command, 350
Thompson, Ken, 109 traps, 350
thousands separators, 496, 500-502, 505 troff program, 15
thrashing, 61 Tromey, Tom, xxxi
threads, 56 truncate(), 114-115
tilde expansion, 467, 470, 482 tsearch(), 554-557
Time Sharing Option, see TSO TSO (Time Sharing Option), 4
time slicing, 291, 439 ttys, see terminals
time zones, 178-180, 204 TUHS (The UNIX Heritage Society), xxvii
time(), 167, 176, 203, 364, 543 tune2fs program, manpage of, 235
time_t type, 143, 157, 166-168, 176-178, 203, Turing, Alan, 428
545 Turski, Wladyslaw M., xxxi
timeouts, 548 twalk(), 554, 557-561
timer_getoverrun(), 364 two_way_open(), 338
timer_gettime(), 364 type2str(), 255
timer_settime(), 364 TZ environment variable, 178
timerclear() macro, 544 tzset(), 178-180, 204
timercmp() macro, 544
timerisset() macro, 544 U
timers, 546-550, 562 UDF filesystem, 234-235
expiring, 547, 549 UID (user ID), 5-6, 21, 108, 142, 196, 198-199,
setting, 549 204, 404
times, 166, 203 effective, 377, 405-412, 415-425
broken-down, 168, 176, 204 mask for, see setuid bit
current, 167 real, 377, 405-412, 415-425
formatting, 170-176, 488, 503 saved set, 405, 408, 416-422, 424-425
uid_t type, 143 URLs (continued)

ulimit built-in shell command, 92 Free Software Foundation, 571
umask built-in shell command, 107-109 GNOME project, 203
umask(), 108-109, 115, 364 GNU C Library CVS archive, 39
umasks, 108-109, 111, 285 GNU Coding Standards, 15
umount program, 229, 237, 238, 241, 277 GNU Gengetopt, 50
umsdos filesystem, 234-235 GNU gettext, 521
uname(), 364 GNU grep, 481
Uncertainty Principle, 431 GNU Make, 67
Unicode character set, 487, 522 GNU Project, xxii
uninterruptible power supply, see UPS GNU Project, The, article, 22
unions, in C, 576, 591-595, 638 GTK+ project, 203
nested, 592-594 Hints On Programming Language Design, 431
unistd.h file, 93 Insight, 571
Unix InterMezzo, 233
archives of old versions, xxvii ISO, xxii
block size in, 142 Linux Journal, 14, 629
chroot(), 277 malloc, 629
convention of line ending in, 70 memwatch, 630
date, 174 mpatrol, 629
Epoch in, 157 Notes on Programming in C, 21
file formats in, 8 Open Group, The, xxii
filesystems in, 119 Opt, 50
ftw(), 261 Plan 9 From Bell Labs, 50
inode numbers in, 231 Popt, 50
mounting in, 136 Protection of Data File Contents, 406
preemptive multitasking in, 291 Recommended C Style and Coding Standards, 21
programs in, 18 rsync web pages, xxvii
reading directories in, 132 Southern Srorm Software, xxvii
standard functions in , 75 splint program, 631
time slicing in, 439 Teach YourselfProgramming in Ten Years, 649, 654
t irne_ t type, 166 The Art o/Computer Programming, 480
UNIX Heritage Society, The, see TUHS TUHS , xxvii
Un ix SuppOrt Gtoup, 30 Unicode, 522
unlink (), 126-127, 162,320,364 United Stares Parent and Trademark Office, 406
unref ( ), 595 Valgrind, 629
unsetenv (), 42, 49 WINE Emulator, 629
UPS (un interruptible power supply), 112 XFS,234
URLs yarnd,630
ANSI, xxii usage (), 45, 479
Argp, 50 usbfs filesystem, 234
Argv, 50 user, categoty of users, 5-6, 106- 109,404
AutOopts, 50 databases of, 195-199, 204
ccrnalloc , 629 IDs of, see UID
Comparison between Ext2fs and Xiafs, 236 masks for , 145-146
dbug, 606 names of, 196, 204, 208, 21 7
ddd debugger, 570 passwords of, 196
debugging rules, 633, 637 see also ownership
drnalloc, 619 UTC (Coordinated Universal Time), 157, 179
Electric Fence, 616 UTF-8 encoding, 18,487
utimbuf struct, 157, 160
utime(), 157-161, 163, 176, 364, 545
utime_null(), 160-161
utimes(), 160, 545-546, 562

V
V6 Unix, 469
V7 Unix, xx, xxvii, 14
  cat, 150, 99
  debuggers in, 570
  directories in, 412
  distribution of, 29
  filesystem in, 232
  inode numbers in, 120
  ls command, 208-226
  mapping names to ID numbers in, 195
  rand(), 457
  rmdir, 132
  signals in, 354-355, 358, 365, 367, 379
  wait(), 306
Valgrind debugger, 623-630, 639
van der Linden, Peter, xxxi
variables
  for important conditions, 580
  local, 53
  logging, 602
  temporary for debugging, 583
variadic macros, 579-580
VAX system, 232
--verbose option, 16, 27
Veritas VxFS journaling filesystem, 234
--version option, 16, 48
versionsort(), 188
vfat filesystem, 234-235, 278
vi editor, 472, 574, 644
vim editor, 472
void * type, 57, 182, 554
volatile keyword, 14, 362, 450

W
w, in permissions, 5
wait(), 291, 306-310, 343, 358, 364, 385-386
wait3(), 310-311, 343
wait4(), 310-311, 343
waitpid(), 291, 306-310, 343, 364, 388, 397
  flags for, 307
warm fuzzy, 149
watchpoints, 576, 638
Watson, Gray, 619
wc program, 435-436
wcrtomb(), 524
wcsrtombs(), 524
wcstombs(), 524
wctomb(), 524
wget program, xxviii
wildcard expansions, 461-471, 482
WINE Emulator, 629
wordexp(), 469-471, 482
wordfree(), 469-471
words
  order of, 514-515, 527
  plural forms of, 509, 527
wprintf(), 523
write end (of pipe), 316, 318, 323
write(), 84, 96-99, 111-113, 115, 137, 317-318, 334-336, 357, 364

X
x, in permissions, 7
Xenix filesystem, 234-235
XFS filesystem, 234
xgettext program, 510, 518
xiafs filesystem, 235
xinetd program, manpage of, 379
XML (Extensible Markup Language), 602
xreadlink(), 153, 154, 260
xrealloc(), 72

Y
yamd library, 630

Z
zero-initialized data, 53-55
zombies, 305, 385
LINUX

"This is an excellent introduction to Linux programming. The topics are well chosen and lucidly presented.
I learned things myself, especially about internationalization, and I've been at this for quite a while."
- CHET RAMEY, COAUTHOR AND MAINTAINER OF THE BASH SHELL

"This is a good introduction to Linux programming. Arnold's technique of showing how experienced
programmers use the Linux programming interfaces is a nice touch, much more useful than the canned
programming examples found in most books."
- ULRICH DREPPER, PROJECT LEAD, GNU C LIBRARY

"A gentle yet thorough introduction to the art of UNIX system programming, Linux Programming by Example
uses code from a wide range of familiar programs to illustrate each concept it teaches. Readers will enjoy
an interesting mix of in-depth API descriptions and portability guidelines, and will come away well prepared
to begin reading and writing systems applications. Heartily recommended."
- JIM MEYERING, COAUTHOR AND MAINTAINER OF THE GNU CORE UTILITY PROGRAMS

Learn Linux programming, hands-on ... from real source code

This book teaches Linux programming in the most effective way possible: by showing and explaining well-written
programs. Drawing from both V7 UNIX® and GNU source code, Arnold Robbins focuses on the fundamental system
call APIs at the core of any significant program, presenting examples from programs that Linux/UNIX users already
use every day. Gradually, one step at a time, Robbins teaches both high-level principles and "under the hood"
techniques. Along the way, he carefully addresses real-world issues like performance, portability, and robustness.

Coverage includes:

Just learning to program? Switching from Windows®? Already developing with Linux but interested in exploring the
system call interface further? No matter which, quickly and directly, this book will help you master the fundamentals
needed to build serious Linux software.

Companion Web sites, authors.phptr.com/robbins and www.linux-by-example.com, include all code examples.

About the Author

ARNOLD ROBBINS is a professional programmer and instructor, and author of UNIX in a Nutshell, Learning the Korn
Shell, and Effective awk Programming. A long-time GNU Project volunteer, he currently maintains gawk. He has
worked with C, C++, UNIX, and GNU/Linux since 1980.

Text printed on recycled paper

PRENTICE HALL
PEARSON EDUCATION
ISBN: 0-13-142964-7
$39.99 U.S. / $49.99 Canada
