The Linux Command Line
A LinuxCommand.org Book
This work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Linux is the registered trademark of Linus Torvalds. All other trademarks belong to
their respective owners.
This book is part of the LinuxCommand.org project, a site for Linux education and advocacy devoted to helping users of legacy operating systems migrate into the future. You
may contact the LinuxCommand.org project at http://linuxcommand.org.
This book is also available in printed form, published by No Starch Press and may be
purchased wherever fine books are sold. No Starch Press also offers this book in electronic formats for most popular e-readers: http://nostarch.com/tlcl.htm
Release History

Version    Date
13.07      July 6, 2013
09.12
09.11
09.10      October 3, 2009
09.08
09.07
Table of Contents
Introduction....................................................................................................xvi
Why Use The Command Line?.....................................................................................xvi
What This Book Is About..............................................................................................xvii
Who Should Read This Book.......................................................................................xvii
What's In This Book.....................................................................................................xviii
How To Read This Book..............................................................................................xviii
Prerequisites............................................................................................................xix
Why I Don't Call It GNU/Linux...........................................................................xix
Acknowledgments..........................................................................................................xx
Your Feedback Is Needed!............................................................................................xx
What's New In The Second Internet Edition.................................................................xxi
Further Reading............................................................................................................xxi
Colophon.......................................................................................................................xxi
2 Navigation...................................................................................................7
Understanding The File System Tree..............................................................................7
The Current Working Directory........................................................................................7
Listing The Contents Of A Directory................................................................................8
Changing The Current Working Directory.......................................................................9
Absolute Pathnames..................................................................................................9
Relative Pathnames...................................................................................................9
Some Helpful Shortcuts............................................................................................11
Important Facts About Filenames........................................................................11
Summing Up..................................................................................................................12
6 Redirection................................................................................................53
Standard Input, Output, And Error.................................................................................53
Redirecting Standard Output.........................................................................................54
Redirecting Standard Error............................................................................................55
Redirecting Standard Output And Standard Error To One File................................56
Disposing Of Unwanted Output................................................................................57
/dev/null In Unix Culture......................................................................................57
Redirecting Standard Input............................................................................................57
cat – Concatenate Files........................................................................57
Pipelines........................................................................................................................59
The Difference Between > and |..........................................................................60
Filters........................................................................................................................61
uniq – Report Or Omit Repeated Lines.....................................................61
wc – Print Line, Word, And Byte Counts..................................................62
grep – Print Lines Matching A Pattern......................................................62
head / tail – Print First / Last Part Of Files................................................63
tee – Read From Stdin And Output To Stdout And Files..........................................64
Summing Up..................................................................................................................65
Linux Is About Imagination..................................................................................65
Modifying Text...........................................................................................................80
Cutting And Pasting (Killing And Yanking) Text........................................................80
The Meta Key......................................................................................................81
Completion....................................................................................................................81
Programmable Completion..................................................................................83
Using History.................................................................................................................83
Searching History.....................................................................................................84
History Expansion.....................................................................................................86
script....................................................................................................................86
Summing Up..................................................................................................................86
Further Reading.............................................................................................................87
9 Permissions..............................................................................................88
Owners, Group Members, And Everybody Else............................................................89
Reading, Writing, And Executing...................................................................................90
chmod – Change File Mode.....................................................................92
What The Heck Is Octal?.....................................................................93
Setting File Mode With The GUI...............................................................95
umask – Set Default Permissions............................................................96
Some Special Permissions..................................................................98
Changing Identities........................................................................................................99
su – Run A Shell With Substitute User And Group IDs............................................99
sudo – Execute A Command As Another User.......................................................101
Ubuntu And sudo...............................................................................101
chown – Change File Owner And Group................................................102
chgrp – Change Group Ownership.........................................................103
Exercising Our Privileges............................................................................................103
Changing Your Password............................................................................................106
Summing Up................................................................................................................107
Further Reading..........................................................................................................107
10 Processes.............................................................................................108
How A Process Works.................................................................................................108
Viewing Processes......................................................................................................109
Viewing Processes Dynamically With top..............................................................111
Controlling Processes.................................................................................................113
Interrupting A Process............................................................................................114
Putting A Process In The Background....................................................................114
Returning A Process To The Foreground...............................................................115
Stopping (Pausing) A Process................................................................................116
Signals.........................................................................................................................117
Sending Signals To Processes With kill.................................................................117
Sending Signals To Multiple Processes With killall................................................120
More Process Related Commands.............................................................................120
Summing Up................................................................................................................121
Further Reading..........................................................................................................164
15 Storage Media.......................................................................................176
Mounting And Unmounting Storage Devices..............................................................176
Viewing A List Of Mounted File Systems................................................................178
Why Unmounting Is Important...........................................................................181
Determining Device Names....................................................................................182
Creating New File Systems.........................................................................................185
Manipulating Partitions With fdisk..........................................................................185
Creating A New File System With mkfs..................................................................188
Testing And Repairing File Systems............................................................................189
What The fsck?..................................................................................................189
Formatting Floppy Disks..............................................................................................189
Moving Data Directly To/From Devices.......................................................................190
Creating CD-ROM Images..........................................................................................191
Creating An Image Copy Of A CD-ROM.................................................................191
Creating An Image From A Collection Of Files.......................................................191
A Program By Any Other Name.........................................................................192
Writing CD-ROM Images.............................................................................................192
Mounting An ISO Image Directly............................................................................192
Blanking A Re-Writable CD-ROM...........................................................................193
Writing An Image....................................................................................................193
Summing Up................................................................................................................193
Further Reading..........................................................................................................193
Extra Credit..................................................................................................................193
16 Networking............................................................................................195
Examining And Monitoring A Network.........................................................................196
ping.........................................................................................................................196
traceroute...............................................................................................................197
netstat.....................................................................................................................198
Transporting Files Over A Network..............................................................................199
ftp............................................................................................................................199
lftp – A Better ftp.....................................................................................202
wget........................................................................................................................202
Secure Communication With Remote Hosts...............................................................202
ssh..........................................................................................................................203
Tunneling With SSH..........................................................................................206
scp And sftp............................................................................................................207
An SSH Client For Windows?............................................................................208
Summing Up................................................................................................................208
Further Reading..........................................................................................................208
19 Regular Expressions...........................................................................243
What Are Regular Expressions?............................................................................243
grep.............................................................................................................................243
20 Text Processing....................................................................................264
Applications Of Text.....................................................................................................264
Documents.............................................................................................................265
Web Pages.............................................................................................................265
Email.......................................................................................................................265
Printer Output.........................................................................................................265
Program Source Code............................................................................................265
Revisiting Some Old Friends.......................................................................................265
cat...........................................................................................................................266
MS-DOS Text Vs. Unix Text...............................................................................267
sort..........................................................................................................................267
uniq.........................................................................................................................275
Slicing And Dicing........................................................................................................276
cut...........................................................................................................................276
Expanding Tabs.................................................................................................279
paste.......................................................................................................................280
join..........................................................................................................................281
Comparing Text...........................................................................................................283
comm......................................................................................................................284
diff...........................................................................................................................284
patch.......................................................................................................................287
Editing On The Fly.......................................................................................................288
tr..............................................................................................................................288
ROT13: The Not-So-Secret Decoder Ring........................................................290
sed..........................................................................................................................290
21 Formatting Output................................................................................305
Simple Formatting Tools..............................................................................................305
nl – Number Lines..................................................................................305
fold – Wrap Each Line To A Specified Length........................................309
fmt – A Simple Text Formatter................................................................309
pr – Format Text For Printing..................................................................313
printf – Format And Print Data................................................................314
Document Formatting Systems...................................................................................317
groff.........................................................................................................................318
Summing Up................................................................................................................324
Further Reading..........................................................................................................324
22 Printing..................................................................................................326
A Brief History Of Printing............................................................................................326
Printing In The Dim Times......................................................................................326
Character-based Printers.......................................................................................327
Graphical Printers...................................................................................................328
Printing With Linux......................................................................................................329
Preparing Files For Printing.........................................................................................329
pr – Convert Text Files For Printing........................................................329
Sending A Print Job To A Printer..................................................................331
lpr – Print Files (Berkeley Style).............................................................331
lp – Print Files (System V Style).............................................................332
Another Option: a2ps..............................................................................333
Monitoring And Controlling Print Jobs.........................................................336
lpstat – Display Print System Status......................................................336
lpq – Display Printer Queue Status........................................................337
lprm / cancel – Cancel Print Jobs...........................................................338
Summing Up................................................................................................................338
Further Reading..........................................................................................................338
23 Compiling Programs............................................................................340
What Is Compiling?.....................................................................................................340
Are All Programs Compiled?..................................................................................341
Compiling A C Program...............................................................................................342
Obtaining The Source Code...................................................................................342
Examining The Source Tree...................................................................................344
Building The Program.............................................................................................346
Installing The Program...........................................................................................350
Summing Up................................................................................................................350
Further Reading..........................................................................................................350
25 Starting A Project.................................................................................361
First Stage: Minimal Document...................................................................................361
Second Stage: Adding A Little Data............................................................................363
Variables And Constants.............................................................................................364
Assigning Values To Variables And Constants.......................................................367
Here Documents.........................................................................................................368
Summing Up................................................................................................................371
Further Reading..........................................................................................................371
26 Top-Down Design.................................................................................372
Shell Functions............................................................................................................373
Local Variables............................................................................................................376
Keep Scripts Running..................................................................................................377
Shell Functions In Your .bashrc File..................................................................380
Summing Up................................................................................................................380
Further Reading..........................................................................................................380
IFS..........................................................................................................................402
You Can't Pipe read...........................................................................404
Validating Input............................................................................................................404
Menus..........................................................................................................................406
Summing Up................................................................................................................407
Extra Credit.............................................................................................................407
Further Reading..........................................................................................................408
30 Troubleshooting...................................................................................416
Syntactic Errors...........................................................................................................416
Missing Quotes.......................................................................................................417
Missing Or Unexpected Tokens..............................................................................417
Unanticipated Expansions......................................................................................418
Logical Errors .............................................................................................................420
Defensive Programming.........................................................................................420
Verifying Input.........................................................................................................422
Design Is A Function Of Time............................................................................422
Testing.........................................................................................................................422
Test Cases..............................................................................................................423
Debugging...................................................................................................................424
Finding The Problem Area......................................................................................424
Tracing....................................................................................................................424
Examining Values During Execution......................................................................427
Summing Up................................................................................................................427
Further Reading..........................................................................................................428
32 Positional Parameters.........................................................................436
Accessing The Command Line...................................................................................436
Determining The Number of Arguments.................................................................437
shift – Getting Access To Many Arguments............................................438
Simple Applications................................................................................................439
Using Positional Parameters With Shell Functions................................................440
Handling Positional Parameters En Masse.................................................................441
35 Arrays....................................................................................................478
What Are Arrays?........................................................................................................478
Creating An Array........................................................................................................478
Assigning Values To An Array......................................................................................479
Accessing Array Elements...........................................................................................480
Array Operations.........................................................................................................482
Outputting The Entire Contents Of An Array..........................................................482
Determining The Number Of Array Elements.........................................................482
Finding The Subscripts Used By An Array.............................................................483
Adding Elements To The End Of An Array.............................................................483
Sorting An Array......................................................................................................484
Deleting An Array....................................................................................................484
Associative Arrays.......................................................................................................485
Summing Up................................................................................................................486
Further Reading..........................................................................................................486
36 Exotica...................................................................................................487
Index..............................................................................................................501
To Karen
Introduction
I want to tell you a story.
No, not the story of how, in 1991, Linus Torvalds wrote the first version of the Linux kernel. You can read that story in lots of Linux books. Nor am I going to tell you the story of
how, some years earlier, Richard Stallman began the GNU Project to create a free Unix-like operating system. That's an important story too, but most other Linux books have that
one, as well.
No, I want to tell you the story of how you can take back control of your computer.
When I began working with computers as a college student in the late 1970s, there was a
revolution going on. The invention of the microprocessor had made it possible for ordinary people like you and me to actually own a computer. It's hard for many people today
to imagine what the world was like when only big business and big government ran all
the computers. Let's just say, you couldn't get much done.
Today, the world is very different. Computers are everywhere, from tiny wristwatches to
giant data centers to everything in between. In addition to ubiquitous computers, we also
have a ubiquitous network connecting them together. This has created a wondrous new
age of personal empowerment and creative freedom, but over the last couple of decades
something else has been happening. A few giant corporations have been imposing their
control over most of the world's computers and deciding what you can and cannot do
with them. Fortunately, people from all over the world are doing something about it. They
are fighting to maintain control of their computers by writing their own software. They
are building Linux.
Many people speak of freedom with regard to Linux, but I don't think most people
know what this freedom really means. Freedom is the power to decide what your computer does, and the only way to have this freedom is to know what your computer is doing. Freedom is a computer that is without secrets, one where everything can be known if
you care enough to find out.
Have you ever noticed that in the movies, when the hero sits down at the computer, he never touches a mouse? It's because movie makers realize that we, as human beings, instinctively know the only way to really get anything done on a computer is by typing on a keyboard!
Most computer users today are only familiar with the graphical user interface (GUI) and
have been taught by vendors and pundits that the command line interface (CLI) is a terrifying thing of the past. This is unfortunate, because a good command line interface is a
marvelously expressive way of communicating with a computer in much the same way
the written word is for human beings. It's been said that "graphical user interfaces make easy tasks easy, while command line interfaces make difficult tasks possible," and this is still very true today.
Since Linux is modeled after the Unix family of operating systems, it shares the same
rich heritage of command line tools as Unix. Unix came into prominence during the early
1980s (although it was first developed a decade earlier), before the widespread adoption
of the graphical user interface and, as a result, developed an extensive command line interface instead. In fact, one of the strongest reasons early adopters of Linux chose it over,
say, Windows NT was the powerful command line interface which made the difficult
tasks possible.
The average Linux system has literally thousands of programs you can employ on the command line. Consider yourself warned; learning the command line is not a casual endeavor.
On the other hand, learning the Linux command line is extremely rewarding. If you think
you're a power user now, just wait. You don't know what real power is yet.
And, unlike many other computer skills, knowledge of the command line is long lasting. The skills learned today will still be useful ten years from now. The command line has survived the test of time.
It is also assumed that you have no programming experience, but not to worry, we'll start
you down that path as well.
Part 1 Learning The Shell starts our exploration of the basic language of the
command line including such things as the structure of commands, file system
navigation, command line editing, and finding help and documentation for commands.
Part 3 Common Tasks And Essential Tools explores many of the ordinary
tasks that are commonly performed from the command line. Unix-like operating
systems, such as Linux, contain many classic command line programs that are
used to perform powerful operations on data.
Prerequisites
To use this book, all you will need is a working Linux installation. You can get this in one
of two ways:
1. Install Linux on a (not so new) computer. It doesn't matter which distribution
you choose, though most people today start out with either Ubuntu, Fedora, or
OpenSUSE. If in doubt, try Ubuntu first. Installing a modern Linux distribution
can be ridiculously easy or ridiculously difficult depending on your hardware. I
suggest a desktop computer that is a couple of years old and has at least 256
megabytes of RAM and 6 gigabytes of free hard disk space. Avoid laptops and
wireless networks if at all possible, as these are often more difficult to get working.
2. Use a Live CD. One of the cool things you can do with many Linux distributions is run them directly from a CDROM (or USB flash drive) without installing
them at all. Just go into your BIOS setup and set your computer to Boot from
CDROM, insert the live CD, and reboot. Using a live CD is a great way to test a
computer for Linux compatibility prior to installation. The disadvantage of using
a live CD is that it may be very slow compared to having Linux installed on your
hard drive. Both Ubuntu and Fedora (among others) have live CD versions.
Regardless of how you install Linux, you will need to have occasional superuser (i.e., administrative) privileges to carry out the lessons in this book.
After you have a working installation, start reading and follow along with your own computer. Most of the material in this book is hands on, so sit down and get typing!
Acknowledgments
I want to thank the following people, who helped make this book possible:
Jenny Watson, Acquisitions Editor at Wiley Publishing who originally suggested that I
write a shell scripting book.
John C. Dvorak, noted columnist and pundit. In an episode of his video podcast, Cranky
Geeks, Mr. Dvorak described the process of writing: "Hell. Write 200 words a day and in a year, you have a novel." This advice led me to write a page a day until I had a book.
Dmitri Popov wrote an article in Free Software Magazine titled "Creating a book template with Writer," which inspired me to use OpenOffice.org Writer for composing the text. As it turned out, it worked wonderfully.
Mark Polesky performed an extraordinary review and test of the text.
Jesse Becker, Tomasz Chrzczonowicz, Michael Levin, and Spence Miner also tested and reviewed portions of the text.
Karen M. Shotts contributed a lot of hours, polishing my so-called English by editing the
text.
And lastly, the readers of LinuxCommand.org, who have sent me so many kind emails.
Their encouragement gave me the idea that I was really on to something!
Further Reading
Here are some Wikipedia articles on the famous people mentioned in this chapter:
http://en.wikipedia.org/wiki/Linus_Torvalds
http://en.wikipedia.org/wiki/Richard_Stallman
Colophon
This book was originally written using OpenOffice.org Writer in Liberation Serif and
Sans fonts on a Dell Inspiron 530N, factory configured with Ubuntu 8.04. The PDF version of the text was generated directly by OpenOffice.org Writer. The Second Internet
Edition was produced on the same computer using LibreOffice Writer on Ubuntu 12.04.
Terminal Emulators
When using a graphical user interface, we need another program called a terminal emulator to interact with the shell. If we look through our desktop menus, we will probably find
one. KDE uses konsole and GNOME uses gnome-terminal, though it's likely
called simply "terminal" on our menu. There are a number of other terminal emulators available for Linux, but they all basically do the same thing: give us access to the shell.
You will probably develop a preference for one or another based on the number of bells
and whistles it has.
This is called a shell prompt and it will appear whenever the shell is ready to accept input. While it may vary in appearance somewhat depending on the distribution, it will usually include your username@machinename, followed by the current working directory
(more about that in a little bit) and a dollar sign.
If the last character of the prompt is a pound sign (#) rather than a dollar sign, the terminal session has superuser privileges. This means either we are logged in as the root user or we selected a terminal emulator that provides superuser (administrative) privileges.
Since this command makes no sense, the shell will tell us so and give us another chance:
bash: kaekfjaeifj: command not found
[me@linuxbox ~]$
Command History
If we press the up-arrow key, we will see that the previous command kaekfjaeifj reappears after the prompt. This is called command history. Most Linux distributions remember the last 500 commands by default. Press the down-arrow key and the previous command disappears.
Cursor Movement
Recall the previous command with the up-arrow key again. Now try the left and right-arrow keys. See how we can position the cursor anywhere on the command line? This
makes editing commands easy.
A related command is cal which, by default, displays a calendar of the current month.
[me@linuxbox ~]$ cal
    October 2007
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
To see the current amount of free space on your disk drives, enter df:
[me@linuxbox ~]$ df
Filesystem   1K-blocks      Used  Available  Use%  Mounted on
/dev/sda2     15115452       ...        ...   ...  ...
/dev/sda5     59631908       ...        ...   ...  ...
/dev/sda1       147764       ...        ...   ...  ...
tmpfs           256856         0     256856    0%  /dev/shm
Likewise, to display the amount of free memory, enter the free command.
[me@linuxbox ~]$ free
             total       used       free     shared    buffers     cached
Mem:        513712     503976       9736          0       5312     122916
-/+ buffers/cache:     375748     137964
Swap:      1052248     104712     947536
Summing Up
As we begin our journey, we are introduced to the shell and see the command line for the first time and learn how to start and end a terminal session. We also see how to issue some simple commands and perform a little light command line editing. That wasn't so scary, was it?
Further Reading
To learn more about Steve Bourne, father of the Bourne Shell, see this Wikipedia
article:
http://en.wikipedia.org/wiki/Steve_Bourne
2 Navigation
The first thing we need to learn (besides just typing) is how to navigate the file system on
our Linux system. In this chapter we will introduce the following commands:
cd - Change directory
Imagine that the file system is a maze shaped like an upside-down tree and we are able to stand in the middle of it.
When we first log in to our system (or start a terminal emulator session) our current
working directory is set to our home directory. Each user account is given its own home
directory and it is the only place a regular user is allowed to write files.
Pictures  Public  Templates  Videos
Absolute Pathnames
An absolute pathname begins with the root directory and follows the tree branch by
branch until the path to the desired directory or file is completed. For example, there is a
directory on your system in which most of your system's programs are installed. The
pathname of the directory is /usr/bin. This means from the root directory (represented
by the leading slash in the pathname) there is a directory called "usr" which contains a directory called "bin".
[me@linuxbox ~]$ cd /usr/bin
[me@linuxbox bin]$ pwd
/usr/bin
[me@linuxbox bin]$ ls
Now we can see that we have changed the current working directory to /usr/bin and
that it is full of files. Notice how the shell prompt has changed? As a convenience, it is
usually set up to automatically display the name of the working directory.
Relative Pathnames
Where an absolute pathname starts from the root directory and leads to its destination, a
relative pathname starts from the working directory. To do this, it uses a couple of special
symbols to represent relative positions in the file system tree. These special symbols are
"." (dot) and ".." (dot dot).
The "." symbol refers to the working directory and the ".." symbol refers to the working
directory's parent directory. Here is how it works. Let's change the working directory to
/usr/bin again:
[me@linuxbox ~]$ cd /usr/bin
[me@linuxbox bin]$ pwd
/usr/bin
Okay, now let's say that we wanted to change the working directory to the parent of
/usr/bin which is /usr. We could do that two different ways. Either with an absolute
pathname:
[me@linuxbox bin]$ cd /usr
[me@linuxbox usr]$ pwd
/usr
Two different methods with identical results. Which one should we use? The one that
requires the least typing!
Likewise, we can change the working directory from /usr to /usr/bin in two
different ways. Either using an absolute pathname:
[me@linuxbox usr]$ cd /usr/bin
[me@linuxbox bin]$ pwd
/usr/bin
Now, there is something important that I must point out here. In almost all cases, you can omit the "./" part of a relative pathname. It is implied. Typing cd bin does the same thing as cd ./bin. In general, if you do not specify a pathname to something, the working directory will be assumed.
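The moves above can be sketched end to end. This is a minimal example; it only assumes that /usr/bin exists, which it does on virtually every Linux system:

```shell
# Moving around with absolute and relative pathnames.
cd /usr/bin    # absolute pathname: starts from the root directory
cd ..          # relative pathname: ".." means the parent directory
pwd            # prints /usr
cd ./bin       # relative pathname: "bin" inside the working directory
pwd            # prints /usr/bin
cd ..
cd bin         # same as "cd ./bin": the "./" is implied
pwd            # prints /usr/bin
```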
Shortcut         Result
cd               Changes the working directory to your home directory.
cd -             Changes the working directory to the previous working directory.
cd ~user_name    Changes the working directory to the home directory of user_name. For example, cd ~bob will change the directory to the home directory of user "bob".
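Here is a quick sketch of the first two shortcuts in action. It assumes only that /usr/bin exists; the prompt is omitted so the lines can be pasted directly:

```shell
cd /usr/bin   # go somewhere away from home
cd            # no argument: back to the home directory
pwd           # prints your home directory (for example, /home/me)
cd -          # back to the previous working directory (also prints it)
pwd           # prints /usr/bin
```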
Summing Up
In this chapter we saw how the shell treats the directory structure of the system. We
learned about absolute and relative pathnames and the basic commands that are used to
move about that structure. In the next chapter we will use this knowledge to go on a tour
of a modern Linux system.
[me@linuxbox ~]$ ls
Desktop  Documents  Music  Pictures  Public  Templates  Videos
Besides the current working directory, we can specify the directory to list, like so:
[me@linuxbox ~]$ ls /usr
bin  etc  games  include  kerberos  lib  libexec  local  sbin  share  src  tmp
Or even specify multiple directories. In this example we will list both the user's home directory (symbolized by the ~ character) and the /usr directory:
[me@linuxbox ~]$ ls ~ /usr
/home/me:
Documents  Music  Pictures  Public  Templates  Videos

/usr:
bin  etc  games  include  kerberos  lib  libexec  local  sbin  share  src  tmp
We can also change the format of the output to reveal more detail:
[me@linuxbox ~]$ ls -l
total 56
drwxrwxr-x 2 me me 4096 2007-10-26 17:20 Desktop
drwxrwxr-x 2 me me 4096 2007-10-26 17:20 Documents
drwxrwxr-x 2 me me 4096 2007-10-26 17:20 Music
drwxrwxr-x 2 me me 4096 2007-10-26 17:20 Pictures
drwxrwxr-x 2 me me 4096 2007-10-26 17:20 Public
drwxrwxr-x 2 me me 4096 2007-10-26 17:20 Templates
drwxrwxr-x 2 me me 4096 2007-10-26 17:20 Videos
Most commands use options consisting of a single character preceded by a dash, for example, -l, but many commands, including those from the GNU Project, also support
long options, consisting of a word preceded by two dashes. Also, many commands allow
multiple short options to be strung together. In this example, the ls command is given
two options, the l option to produce long format output, and the t option to sort the
result by the file's modification time.
[me@linuxbox ~]$ ls -lt
Option  Long Option        Description
-a      --all              List all files, even those beginning with a period, which are normally hidden.
-A      --almost-all       Like -a, except it does not list . (current directory) and .. (parent directory).
-d      --directory        Report on the directory itself rather than its contents (usually used with -l).
-F      --classify         Append an indicator character to each listed name, such as a trailing slash for directories.
-h      --human-readable   In long format listings, display file sizes in human-readable units rather than bytes.
-l                         Display results in long format.
-r      --reverse          Display the results in reverse order.
-S                         Sort results by file size.
-t                         Sort results by the file's modification time.
Let's look at the different fields from one of the files and examine their meanings:
Table 3-2: ls Long Listing Fields

Field              Meaning
-rw-r--r--         Access rights to the file. A leading dash indicates a regular file, while a "d" indicates a directory. The next nine characters are the access rights for the file's owner, group, and everyone else.
root               The username of the file's owner.
root               The name of the group that owns the file.
32059              Size of the file in bytes.
2007-04-03 11:05   Date and time of the file's last modification.
oo-cd-cover.odf    Name of the file.
When invoked, the file command will print a brief description of the file's contents.
For example:
[me@linuxbox ~]$ file picture.jpg
picture.jpg: JPEG image data, JFIF standard 1.01
There are many kinds of files. In fact, one of the common ideas in Unix-like operating
systems such as Linux is that everything is a file. As we proceed with our lessons, we
will see just how true that statement is.
While many of the files on your system are familiar, for example MP3 and JPEG, there
are many kinds that are a little less obvious and a few that are quite strange.
What Is Text?
There are many ways to represent information on a computer. All methods involve defining a relationship between the information and some numbers that will
be used to represent it. Computers, after all, only understand numbers and all data
is converted to numeric representation.
Some of these representation systems are very complex (such as compressed
video files), while others are rather simple. One of the earliest and simplest is
called ASCII text. ASCII (pronounced "As-Key") is short for American Standard
Code for Information Interchange. This is a simple encoding scheme that was first
used on Teletype machines to map keyboard characters to numbers.
Text is a simple one-to-one mapping of characters to numbers. It is very compact. Fifty characters of text translate to fifty bytes of data. It is important to understand that text contains only a simple mapping of characters to numbers. It is not the same as a word processor document such as one created by Microsoft Word or OpenOffice.org Writer. Those files, in contrast to simple ASCII text, contain many non-text elements that are used to describe their structure and formatting. Plain ASCII text files contain only the characters themselves and a few rudimentary control codes like tabs, carriage returns and line feeds.
Throughout a Linux system, many files are stored in text format and there are
many Linux tools that work with text files. Even Windows recognizes the importance of this format. The well-known NOTEPAD.EXE program is an editor for
plain ASCII text files.
Why would we want to examine text files? Because many of the files that contain system
settings (called configuration files) are stored in this format, and being able to read them
gives us insight about how the system works. In addition, many of the actual programs
that the system uses (called scripts) are stored in this format. In later chapters, we will learn how to edit text files in order to modify system settings and write our own scripts, but for now we will just look at their contents.
The less command is used like this:
less filename
Once started, the less program allows you to scroll forward and backward through a
text file. For example, to examine the file that defines all the system's user accounts, enter
the following command:
[me@linuxbox ~]$ less /etc/passwd
Once the less program starts, we can view the contents of the file. If the file is longer
than one page, we can scroll up and down. To exit less, press the q key.
The table below lists the most common keyboard commands used by less.
Command        Action
Page Up or b   Scroll back one page.
Up Arrow       Scroll up one line.
Down Arrow     Scroll down one line.
1G or g        Move to the beginning of the text file.
/characters    Search forward to the next occurrence of characters.
q              Quit less.
Less Is More
The less program was designed as an improved replacement for an earlier Unix program called more. The name "less" is a play on the phrase "less is more", a motto of modernist architects and designers.
less falls into the class of programs called pagers, programs that allow the
easy viewing of long text documents in a page by page manner. Whereas the
more program could only page forward, the less program allows paging both
forward and backward and has many other features as well.
A Guided Tour
The file system layout on your Linux system is much like that found on other Unix-like
systems. The design is actually specified in a published standard called the Linux Filesystem Hierarchy Standard. Not all Linux distributions conform to the standard exactly but
most come pretty close.
Next, we are going to wander around the file system ourselves to see what makes our
Linux system tick. This will give you a chance to practice your navigation skills. One of
the things we will discover is that many of the interesting files are in plain human-readable text. As we go about our tour, try the following:
Directory       Comments
/bin            Contains binaries (programs) that must be present for the system to boot and run.
/boot           Contains the Linux kernel, initial RAM disk image, and the boot loader.
/dev            A special directory containing device nodes, where the kernel maintains its list of the devices it understands.
/etc            Contains all of the system-wide configuration files.
/home           In normal configurations, each user is given a directory in /home.
/lib            Contains shared library files used by the core system programs.
/lost+found     Each formatted partition or device using a Linux file system will have this directory. It is used in the case of a partial recovery from a file system corruption event.
/media          On modern Linux systems, contains the mount points for removable media such as USB drives and CD-ROMs that are mounted automatically at insertion.
/mnt            On older Linux systems, contains mount points for removable devices that have been mounted manually.
/opt            Used to install "optional" software, mainly commercial products.
/proc           A virtual file system maintained by the Linux kernel; its files give a peek into the kernel itself.
/root           The home directory for the root account.
/sbin           Contains "system" binaries, programs that perform vital system tasks and are generally reserved for the superuser.
/tmp            Intended for the storage of temporary, transient files created by various programs.
/usr            Likely the largest directory tree on a Linux system; contains all the programs and support files used by regular users.
/usr/bin        Contains the executable programs installed by the Linux distribution.
/usr/lib        Contains the shared libraries for the programs in /usr/bin.
/usr/local      The place where programs that are not included with the distribution but are intended for system-wide use are installed.
/usr/sbin       Contains more system administration programs.
/usr/share      Contains all the shared data used by programs in /usr/bin, such as default configuration files, icons, and sound files.
/usr/share/doc  Most packages installed on the system include documentation here, organized by package.
/var            Where data that is likely to change is stored, such as databases, spool files, and user mail.
/var/log        Contains log files, records of various system activity.
Symbolic Links
As we look around, we are likely to see a directory listing with an entry like this:
lrwxrwxrwx 1 root root
Notice how the first letter of the listing is l and the entry seems to have two filenames?
This is a special kind of a file called a symbolic link (also known as a soft link or symlink). In most Unix-like systems it is possible to have a file referenced by multiple names. While the value of this may not be obvious, it is really a useful feature.
Picture this scenario: A program requires the use of a shared resource of some kind contained in a file named foo, but foo has frequent version changes. It would be good to
include the version number in the filename so the administrator or other interested party
could see what version of foo is installed. This presents a problem. If we change the
name of the shared resource, we have to track down every program that might use it and
change it to look for a new resource name every time a new version of the resource is installed. That doesn't sound like fun at all.
Here is where symbolic links save the day. Let's say we install version 2.6 of foo,
which has the filename foo-2.6 and then create a symbolic link simply called foo that
points to foo-2.6. This means that when a program opens the file foo, it is actually
opening the file foo-2.6. Now everybody is happy. The programs that rely on foo can
find it and we can still see what actual version is installed. When it is time to upgrade to
foo-2.7, we just add the file to our system, delete the symbolic link foo and create a
new one that points to the new version. Not only does this solve the problem of the version upgrade, but it also allows us to keep both versions on our machine. Imagine that
foo-2.7 has a bug (damn those developers!) and we need to revert to the old version.
Again, we just delete the symbolic link pointing to the new version and create a new one pointing to the old version.
Hard Links
While we are on the subject of links, we need to mention that there is a second type of
link called a hard link. Hard links also allow files to have multiple names, but they do it
in a different way. We'll talk more about the differences between symbolic and hard links in the next chapter.
Summing Up
With our tour behind us, we have learned a lot about our system. We've seen various files
and directories and their contents. One thing you should take away from this is how open
the system is. In Linux there are many important files that are plain human-readable text.
Unlike many proprietary systems, Linux makes everything available for examination and
study.
Further Reading
The full version of the Linux Filesystem Hierarchy Standard can be found here:
http://www.pathname.com/fhs/
These five commands are among the most frequently used Linux commands. They are
used for manipulating both files and directories.
Now, to be frank, some of the tasks performed by these commands are more easily done
with a graphical file manager. With a file manager, we can drag and drop a file from one
directory to another, cut and paste files, delete files, etc. So why use these old command
line programs?
The answer is power and flexibility. While it is easy to perform simple file manipulations
with a graphical file manager, complicated tasks can be easier with the command line
programs. For example, how could we copy all the HTML files from one directory to another, but only copy files that do not exist in the destination directory or are newer than
the versions in the destination directory? Pretty hard with a file manager. Pretty easy with
the command line:
cp -u *.html destination
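That update-style copy can be sketched with throwaway files. The directory names src and dest are invented for the example:

```shell
cd "$(mktemp -d)"            # work in a scratch directory
mkdir src dest
echo "<p>old</p>" > src/index.html
echo "<p>new</p>" > src/about.html
cp src/index.html dest/      # dest starts with a copy of index.html
cp -u src/*.html dest/       # copies only about.html; dest's index.html
                             # is not older than its source, so it is skipped
ls dest                      # dest now contains about.html and index.html
```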
Wildcards
Before we begin using our commands, we need to talk about a shell feature that makes
these commands so powerful. Since the shell uses filenames so much, it provides special
characters to help you rapidly specify groups of filenames. These special characters are called wildcards. The table below lists the wildcards and what they select:

Wildcard        Meaning
*               Matches any characters
?               Matches any single character
[characters]    Matches any character that is a member of the set characters
[!characters]   Matches any character that is not a member of the set characters
[[:class:]]     Matches any character that is a member of the specified class
Character Class   Meaning
[:alnum:]         Matches any alphanumeric character
[:alpha:]         Matches any alphabetic character
[:digit:]         Matches any numeral
[:lower:]         Matches any lowercase letter
[:upper:]         Matches any uppercase letter
Using wildcards makes it possible to construct very sophisticated selection criteria for
filenames. Here are some examples of patterns and what they match:
Table 4-3: Wildcard Examples

Pattern                  Matches
*                        All files
g*                       Any file beginning with "g"
b*.txt                   Any file beginning with "b" followed by any characters and ending with ".txt"
Data???                  Any file beginning with "Data" followed by exactly three characters
[abc]*                   Any file beginning with either an "a", a "b", or a "c"
BACKUP.[0-9][0-9][0-9]   Any file beginning with "BACKUP." followed by exactly three numerals
[[:upper:]]*             Any file beginning with an uppercase letter
[![:digit:]]*            Any file not beginning with a numeral
*[[:lower:]123]          Any file ending with a lowercase letter or the numerals "1", "2", or "3"
Wildcards can be used with any command that accepts filenames as arguments, but we'll talk more about that in Chapter 7.
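To watch a few of these patterns at work, we can create some empty files in a scratch directory and let the shell expand the wildcards (echo shows what a pattern expands to; the filenames are invented for the example):

```shell
cd "$(mktemp -d)"                # throwaway directory
touch ghost grape.txt banana.txt Data001 Data02 BACKUP.123 README
echo g*                          # ghost grape.txt
echo b*.txt                      # banana.txt
echo Data???                     # Data001 (Data02 has only two trailing characters)
echo BACKUP.[0-9][0-9][0-9]      # BACKUP.123
```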
Character Ranges
If you are coming from another Unix-like environment or have been reading
some other books on this subject, you may have encountered the [A-Z] or the
[a-z] character range notations. These are traditional Unix notations and
worked in older versions of Linux as well. They can still work, but you have to be
very careful with them because they will not produce the expected results unless
properly configured. For now, you should avoid using them and use character
classes instead.
Many ideas originally found in the command line interface make their way into
the graphical interface, too. It is one of the many things that make the Linux desktop so powerful.
A note on notation: When three periods follow an argument in the description of a command (as above), it means that the argument can be repeated. For example, the command:

mkdir dir1

would create a single directory named dir1, while:

mkdir dir1 dir2 dir3

would create three directories. The cp command can be used in two ways:

cp item1 item2

to copy the single file or directory item1 to file or directory item2, and:

cp item... directory

to copy multiple items (either files or directories) into a directory.
Option             Meaning
-a, --archive      Copy the files and directories and all of their attributes, including ownerships and permissions.
-i, --interactive  Before overwriting an existing file, prompt the user for confirmation.
-r, --recursive    Recursively copy directories and their contents.
-u, --update       When copying files from one directory to another, only copy files that either don't exist or are newer than the existing corresponding files in the destination directory.
-v, --verbose      Display informative messages as the copy is performed.
Command            Results
cp file1 file2     Copy file1 to file2. If file2 exists, it is overwritten with the contents of file1.
cp -i file1 file2  Same as above, except that if file2 exists, the user is prompted before it is overwritten.
cp dir1/* dir2     Using a wildcard, copy all the files in dir1 into dir2. dir2 must already exist.
Option          Meaning
-u, --update    When moving files from one directory to another, only move files that either don't exist or are newer than the existing corresponding files in the destination directory.
-v, --verbose   Display informative messages as the move is performed.
Command            Results
mv file1 file2     Move file1 to file2. If file2 exists, it is overwritten with the contents of file1. If file2 does not exist, it is created. In either case, file1 ceases to exist.
mv -i file1 file2  Same as above, except that if file2 exists, the user is prompted before it is overwritten.
mv dir1 dir2       If dir2 exists, move dir1 (and its contents) into dir2. If dir2 does not exist, rename dir1 to dir2.
Option             Meaning
-i, --interactive  Before deleting an existing file, prompt the user for confirmation.
-f, --force        Ignore nonexistent files and do not prompt. This overrides the -i option.
-v, --verbose      Display informative messages as the deletion is performed.
Command           Results
rm file1          Delete file1 silently.
rm -i file1       Same as above, except that the user is prompted for confirmation before the deletion is performed.
rm -r file1 dir1  Delete file1 and dir1 and its contents.
A useful tip: whenever you use wildcards with rm (besides checking your typing carefully!), test the wildcard first with ls. This will let you see the files that will be deleted. Then press the up arrow key to recall the command and replace the ls with rm.
ln Create Links
The ln command is used to create either hard or symbolic links. It is used in one of two ways:

ln file link

to create a hard link, and:

ln -s item link

to create a symbolic link, where "item" is either a file or a directory.
Hard Links
Hard links are the original Unix way of creating links, compared to symbolic links, which
are more modern. By default, every file has a single hard link that gives the file its name.
When we create a hard link, we create an additional directory entry for a file. Hard links
have two important limitations:
1. A hard link cannot reference a file outside its own file system. This means a link
cannot reference a file that is not on the same disk partition as the link itself.
2. A hard link may not reference a directory.
A hard link is indistinguishable from the file itself. Unlike a symbolic link, when you list
a directory containing a hard link you will see no special indication of the link. When a
hard link is deleted, the link is removed but the contents of the file itself continue to exist
(that is, its space is not deallocated) until all links to the file are deleted.
It is important to be aware of hard links because you might encounter them from time to
time, but modern practice prefers symbolic links, which we will cover next.
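The idea that a hard link is just an additional name for the same data can be sketched in a few commands (the filenames here are invented):

```shell
cd "$(mktemp -d)"        # throwaway directory
echo "hello" > original
ln original extra        # create a hard link: a second name, same data
ls -li original extra    # both lines show the same inode number
rm original              # removes one name; the data part remains
cat extra                # prints: hello
```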
Symbolic Links
Symbolic links were created to overcome the limitations of hard links. Symbolic links
work by creating a special type of file that contains a text pointer to the referenced file or directory.
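A minimal sketch of a symbolic link's behavior, using invented filenames:

```shell
cd "$(mktemp -d)"        # throwaway directory
echo "some data" > target
ln -s target pointer     # pointer stores the text "target", not the data
cat pointer              # reads through the link: prints "some data"
rm target                # pointer is now a broken link
```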
Creating Directories
The mkdir command is used to create a directory. To create our playground directory we
will first make sure we are in our home directory and will then create the new directory:
[me@linuxbox ~]$ cd
[me@linuxbox ~]$ mkdir playground
To make our playground a little more interesting, let's create a couple of directories inside
it called dir1 and dir2. To do this, we will change our current working directory to
playground and execute another mkdir:
[me@linuxbox ~]$ cd playground
[me@linuxbox playground]$ mkdir dir1 dir2
Notice that the mkdir command will accept multiple arguments allowing us to create
both directories with a single command.
Copying Files
Next, let's get some data into our playground. We'll do this by copying a file. Using the cp command, we'll copy the passwd file from the /etc directory to the current working directory:

[me@linuxbox playground]$ cp /etc/passwd .
Notice how we used the shorthand for the current working directory, the single trailing
period. So now if we perform an ls, we will see our file:
[me@linuxbox playground]$ ls -l
total 12
drwxrwxr-x 2 me me ... dir1
drwxrwxr-x 2 me me ... dir2
-rw-r--r-- 1 me me ... passwd
Now, just for fun, let's repeat the copy using the -v option (verbose) to see what it does:
[me@linuxbox playground]$ cp -v /etc/passwd .
`/etc/passwd' -> `./passwd'
The cp command performed the copy again, but this time displayed a concise message
indicating what operation it was performing. Notice that cp overwrote the first copy
without any warning. Again, this is a case of cp assuming that you know what you're doing. To get a warning, we'll include the -i (interactive) option:
[me@linuxbox playground]$ cp -i /etc/passwd .
cp: overwrite `./passwd'?
Responding to the prompt by entering a "y" will cause the file to be overwritten; any other character (for example, "n") will cause cp to leave the file alone.
to finally bring it back to the current working directory. Next, let's see the effect of mv on
directories. First we will move our data file into dir1 again:
[me@linuxbox playground]$ mv fun dir1
Note that since dir2 already existed, mv moved dir1 into dir2. If dir2 had not existed, mv would have renamed dir1 to dir2. Lastly, let's put everything back:
[me@linuxbox playground]$ mv dir2/dir1 .
[me@linuxbox playground]$ mv dir1/fun .
So now we have four instances of the file fun. Let's take a look at our playground directory:
[me@linuxbox playground]$ ls -l
total 16
drwxrwxr-x 2 me me 4096 2008-01-14 16:17 dir1
drwxrwxr-x 2 me me 4096 2008-01-14 16:17 dir2
-rw-r--r-- 4 me me 1650 2008-01-10 16:33 fun
-rw-r--r-- 4 me me 1650 2008-01-10 16:33 fun-hard
One thing you notice is that the second field in the listing for fun and fun-hard both contain a "4", which is the number of hard links that now exist for the file. You'll remember that a file will always have at least one link, because the file's name is created by a link. So, how do we know that fun and fun-hard are, in fact, the same file? In this case, ls is not very helpful. While we can see that fun and fun-hard are both the same size (field 5), our listing provides no way to be sure. To solve this problem, we're going to have to dig a little deeper.
When thinking about hard links, it is helpful to imagine that files are made up of two
parts: the data part containing the file's contents and the name part which holds the file's
name. When we create hard links, we are actually creating additional name parts that all
refer to the same data part. The system assigns a chain of disk blocks to what is called an
inode, which is then associated with the name part. Each hard link therefore refers to a
specific inode containing the file's contents.
The ls command has a way to reveal this information. It is invoked with the -i option:
[me@linuxbox playground]$ ls -li
total 16
12353539 drwxrwxr-x 2 me me 4096 2008-01-14 16:17 dir1
12353540 drwxrwxr-x 2 me me 4096 2008-01-14 16:17 dir2
12353538 -rw-r--r-- 4 me me 1650 2008-01-10 16:33 fun
12353538 -rw-r--r-- 4 me me 1650 2008-01-10 16:33 fun-hard
In this version of the listing, the first field is the inode number and, as we can see, both
fun and fun-hard share the same inode number, which confirms they are the same
file.
The first example is pretty straightforward: we simply add the -s option to create a symbolic link rather than a hard link. But what about the next two? Remember, when we create a symbolic link, we are creating a text description of where the target file is relative to the symbolic link. It's easier to see if we look at the ls output:
[me@linuxbox playground]$ ls -l dir1
total 4
-rw-r--r-- 4 me me 1650 2008-01-10 16:33 fun-hard
lrwxrwxrwx 1 me me    6 2008-01-15 15:17 fun-sym -> ../fun
The listing for fun-sym in dir1 shows that it is a symbolic link by the leading l in
the first field and that it points to ../fun, which is correct. Relative to the location of
fun-sym, fun is in the directory above it. Notice too, that the length of the symbolic
link file is 6, the number of characters in the string ../fun rather than the length of the
file to which it is pointing.
When creating symbolic links, you can either use absolute pathnames:
ln -s /home/me/playground/fun dir1/fun-sym
[me@linuxbox playground]$ ls -l
drwxrwxr-x 2 me me 4096 2008-01-15 15:17 dir1
lrwxrwxrwx 1 me me    4 2008-01-16 14:45 dir1-sym -> dir1
drwxrwxr-x 2 me me 4096 2008-01-15 15:17 dir2
-rw-r--r-- 4 me me 1650 2008-01-10 16:33 fun
-rw-r--r-- 4 me me 1650 2008-01-10 16:33 fun-hard
lrwxrwxrwx 1 me me    3 2008-01-15 15:15 fun-sym -> fun
[me@linuxbox playground]$ rm fun-hard
[me@linuxbox playground]$ ls -l
drwxrwxr-x 2 me me 4096 2008-01-15 15:17 dir1
lrwxrwxrwx 1 me me    4 2008-01-16 14:45 dir1-sym -> dir1
drwxrwxr-x 2 me me 4096 2008-01-15 15:17 dir2
-rw-r--r-- 3 me me 1650 2008-01-10 16:33 fun
lrwxrwxrwx 1 me me    3 2008-01-15 15:15 fun-sym -> fun
That worked as expected. The file fun-hard is gone and the link count shown for fun
is reduced from four to three, as indicated in the second field of the directory listing.
Next, we'll delete the file fun, and just for enjoyment, we'll include the -i option to
show what that does:
[me@linuxbox playground]$ rm -i fun
rm: remove regular file `fun'?
Enter y at the prompt and the file is deleted. But let's look at the output of ls now. Notice what happened to fun-sym? Since it's a symbolic link pointing to a now-nonexistent file, the link is broken:
[me@linuxbox playground]$ ls -l
drwxrwxr-x 2 me  me 4096 2008-01-15 15:17 dir1
lrwxrwxrwx 1 me  me    4 2008-01-16 14:45 dir1-sym -> dir1
drwxrwxr-x 2 me  me 4096 2008-01-15 15:17 dir2
lrwxrwxrwx 1 me  me    3 2008-01-15 15:15 fun-sym -> fun
Most Linux distributions configure ls to display broken links. On a Fedora box, broken links are displayed in blinking red text! The presence of a broken link is not, in and of itself, dangerous, but it is rather messy. If we try to use a broken link, we will see this:
[me@linuxbox playground]$ less fun-sym
fun-sym: No such file or directory
One thing to remember about symbolic links is that most file operations are carried out
on the link's target, not the link itself. rm is an exception. When you delete a link, it is the
link that is deleted, not the target.
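A small sketch illustrating this behavior (file names are illustrative):

```shell
# Sketch: rm on a symbolic link removes the link itself, not the target.
tmpdir=$(mktemp -d)
cd "$tmpdir"
echo "data" > target.txt
ln -s target.txt link.txt         # create a symbolic link to target.txt

rm link.txt                       # deletes only the link
[ -e target.txt ] && echo "target survives"
[ -e link.txt ] || echo "link gone"
```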
Finally, we will remove our playground. To do this, we will return to our home directory
and use rm with the recursive option (-r) to delete playground and all of its contents, including its subdirectories:
[me@linuxbox playground]$ cd
[me@linuxbox ~]$ rm -r playground
In GNOME, holding the Ctrl+Shift keys while dragging a file will create a link rather than copying (or moving) the file. In KDE, a small menu appears whenever a file is dropped, offering a choice of copying, moving, or linking the file.
Summing Up
We've covered a lot of ground here and it will take a while to fully sink in. Perform the
playground exercise over and over until it makes sense. It is important to get a good understanding of basic file manipulation commands and wildcards. Feel free to expand on
the playground exercise by adding more files and directories, using wildcards to specify
files for various operations. The concept of links is a little confusing at first, but take the
time to learn how they work. They can be a real lifesaver.
Further Reading
Identifying Commands
It is often useful to know exactly which of the four kinds of commands is being used, and Linux provides a couple of ways to find out. The type command displays the kind of command the shell will execute, given a particular command name. It is used like this:

type command

where command is the name of the command you want to examine. Here are some examples:
[me@linuxbox ~]$ type type
type is a shell builtin
[me@linuxbox ~]$ type ls
ls is aliased to `ls --color=tty'
[me@linuxbox ~]$ type cp
cp is /bin/cp
Here we see the results for three different commands. Notice the result for ls (taken from a Fedora system): the ls command is actually an alias for the ls command with the --color=tty option added. Now we know why the output from ls is displayed in color!
which only works for executable programs, not builtins or aliases that are substitutes for actual executable programs. When we try to use which on a shell builtin, for example cd, we either get no response or an error message.
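A quick sketch of the difference, using bash's type -t option for a terse answer (the exact wording of which's failure message varies by distribution):

```shell
# Sketch: type knows about builtins; which only searches PATH for
# executable files.
bash -c 'type -t cd'     # prints "builtin" -- cd is part of the shell
bash -c 'type -t cp'     # prints "file" -- cp is an executable program
which cp                 # prints the path to the cp executable
```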
bash has a built-in help facility for each of the shell builtins. To use it, type help followed by the name of the shell builtin. Here is part of the output of help cd:
The variable CDPATH defines the search path for the directory
containing DIR. Alternative directory names in CDPATH are separated
by a colon (:). A null directory name is the same as the current
directory. If DIR begins with a slash (/), then CDPATH is not used.
If the directory is not found, and the shell option `cdable_vars' is
set, the word is assumed to be a variable name. If that variable
has a value, its value is used for DIR.
Options:
-L
force symbolic links to be followed
-P
use the physical directory structure without following symbolic
links
-e
if the -P option is supplied, and the current working directory
cannot be determined successfully, exit with a non-zero status
The default is to follow symbolic links, as if `-L' were specified.
Exit Status:
Returns 0 if the directory is changed, and if $PWD is set
successfully when -P is used; non-zero otherwise.
Many executable programs support a --help option that displays a description of the command's supported syntax and options.
Some programs don't support the --help option, but try it anyway. Often it results in an
error message that will reveal the same usage information.
On most Linux systems, man uses less to display the manual page, so all of the familiar
less commands work while displaying the page.
The manual that man displays is broken into sections and not only covers user commands but also system administration commands, programming interfaces, file formats
and more. The table below describes the layout of the manual:
Table 5-1: Man Page Organization

Section    Contents
1          User commands
2          Programming interfaces for kernel system calls
3          Programming interfaces to the C library
4          Special files such as device nodes and drivers
5          File formats
6          Games and amusements such as screen savers
7          Miscellaneous
8          System administration commands
Sometimes we need to look in a specific section of the manual to find what we are looking for. This is particularly true if we are looking for a file format that is also the name of
a command. Without specifying a section number, we will always get the first instance of
a match, probably in section 1. To specify a section number, we use man like this:
man section search_term
For example:
[me@linuxbox ~]$ man 5 passwd
This will display the man page describing the file format of the /etc/passwd file.
The first field in each line of output is the name of the man page, the second field shows
the section. Note that the man command with the -k option performs the exact same
function as apropos.
The GNU Project provides an alternative to man pages for its programs, called info pages, which are displayed with a reader program named info. Here is part of the info page for ls:
Node: ls invocation,
The `ls' program lists information about files (of any type,
including directories). Options and file arguments can be intermixed
arbitrarily, as usual.
The info program reads info files, which are tree structured into individual nodes, each
containing a single topic. Info files contain hyperlinks that can move you from node to
node. A hyperlink can be identified by its leading asterisk, and is activated by placing the
cursor upon it and pressing the enter key.
To invoke info, type info followed optionally by the name of a program. Below is a
table of commands used to control the reader while displaying an info page:
Table 5-2: info Commands

Command             Action
PgUp or Backspace   Display the previous page
PgDn or Space       Display the next page
Enter               Follow the hyperlink at the cursor location
q                   Quit
Most of the command line programs we have discussed so far are part of the GNU
Project's coreutils package, so typing:
[me@linuxbox ~]$ info coreutils
will display a menu page with hyperlinks to each program contained in the coreutils
package.
It is possible to put more than one command on a line by separating the commands with a semicolon. Here's an example:

[me@linuxbox ~]$ cd /usr; ls; cd -
bin   games    kerberos  lib64   libexec  sbin   src
etc   include  lib       local   share    tmp
/home/me
As we can see, we have combined three commands on one line. First we change directory
to /usr then list the directory and finally return to the original directory (by using 'cd
-') so we end up where we started. Now let's turn this sequence into a new command using alias. The first thing we have to do is dream up a name for our new command.
Let's try test. Before we do that, it would be a good idea to find out if the name test is
already being used. To find out, we can use the type command again:
[me@linuxbox ~]$ type test
test is a shell builtin
Since the name test is already taken, let's try foo instead:

[me@linuxbox ~]$ alias foo='cd /usr; ls ; cd -'
After the command alias we give alias a name followed immediately (no whitespace allowed) by an equals sign, followed immediately by a quoted string containing the meaning to be assigned to the name. After we define our alias, it can be used anywhere the
shell would expect a command. Let's try it:
[me@linuxbox ~]$ foo
bin   games    kerberos  lib64   libexec  sbin   src
etc   include  lib       local   share    tmp
/home/me
[me@linuxbox ~]$
We can also use the type command again to see our alias:
[me@linuxbox ~]$ type foo
foo is aliased to `cd /usr; ls ; cd -'
While we purposefully avoided naming our alias with an existing command name, it is
not uncommon to do so. This is often done to apply a commonly desired option to each
invocation of a common command. For instance, we saw earlier how the ls command is
often aliased to add color support:
[me@linuxbox ~]$ alias ls='ls --color=tty'
To see all the aliases defined in the environment, use the alias command without arguments. Here are some of the aliases defined by default on a Fedora system. Try and figure
out what they all do:
[me@linuxbox ~]$ alias
alias l.='ls -d .* --color=tty'
alias ll='ls -l --color=tty'
alias ls='ls --color=tty'
There is one tiny problem with defining aliases on the command line. They vanish when
your shell session ends. In a later chapter, we will see how to add our own aliases to the
files that establish the environment each time we log on, but for now, enjoy the fact that
we have taken our first, albeit tiny, step into the world of shell programming!
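As a preview, here is a hedged sketch of how persistence will eventually work. A temporary file stands in for a startup file such as ~/.bashrc, so nothing in your real configuration is touched:

```shell
# Sketch: an alias written to a startup file is available to any new
# shell that reads that file.
rcfile=$(mktemp)                                  # stand-in for ~/.bashrc
echo "alias foo='cd /usr; ls; cd -'" >> "$rcfile"

# A fresh bash session that sources the file knows the alias:
bash -c "shopt -s expand_aliases; source '$rcfile'; type foo"
```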
Summing Up
Now that we have learned how to find the documentation for commands, go and look up
the documentation for all the commands we have encountered so far. Study what additional options are available and try them out!
Further Reading
There are many online sources of documentation for Linux and the command line. Here
are some of the best:
The Bash Reference Manual is a reference guide to the bash shell. It's still a reference work but contains examples and is easier to read than the bash man page.
http://www.gnu.org/software/bash/manual/bashref.html
The Bash FAQ contains answers to frequently asked questions regarding bash.
This list is aimed at intermediate to advanced users, but contains a lot of good information.
http://mywiki.wooledge.org/BashFAQ
The GNU Project provides extensive documentation for its programs, which form
the core of the Linux command line experience. You can see a complete list here:
http://www.gnu.org/manual/manual.html
6 Redirection
In this lesson we are going to unleash what may be the coolest feature of the command
line. It's called I/O redirection. The I/O stands for input/output and with this facility
you can redirect the input and output of commands to and from files, as well as connect
multiple commands together into powerful command pipelines. To show off this facility,
we will introduce the following commands:
cat - Concatenate files
sort - Sort lines of text
uniq - Report or omit repeated lines
grep - Print lines matching a pattern
wc - Print newline, word, and byte counts for each file
head - Output the first part of a file
tail - Output the last part of a file
tee - Read from standard input and write to standard output and files
I/O redirection allows us to change where output goes and where input comes from. Normally, output goes to the screen and input comes from the keyboard, but with I/O redirection, we can change that.
To redirect standard output to a file instead of the screen, we use the > redirection operator followed by the name of the file:

[me@linuxbox ~]$ ls -l /usr/bin > ls-output.txt

Here, we created a long listing of the /usr/bin directory and sent the results to the file ls-output.txt. Let's examine the redirected output of the command:
[me@linuxbox ~]$ ls -l ls-output.txt
-rw-rw-r-- 1 me  me 167878 2008-02-01 15:07 ls-output.txt

Good; a nice, large text file. If we look at the file with less, we will see that the file
ls-output.txt does indeed contain the results from our ls command:
[me@linuxbox ~]$ less ls-output.txt
Now, let's repeat our redirection test, but this time with a twist. We'll change the name of
the directory to one that does not exist:
[me@linuxbox ~]$ ls -l /bin/usr > ls-output.txt
ls: cannot access /bin/usr: No such file or directory
We received an error message. This makes sense since we specified the non-existent directory /bin/usr, but why was the error message displayed on the screen rather than
being redirected to the file ls-output.txt? The answer is that the ls program does
not send its error messages to standard output. Instead, like most well-written Unix programs, it sends its error messages to standard error. Since we only redirected standard
output and not standard error, the error message was still sent to the screen. We'll see how
to redirect standard error shortly, but first, let's look at what happened to our output file.
The file now has zero length! This is because, when we redirect output with the > redirection operator, the destination file is always rewritten from the beginning. Since our ls
command generated no results and only an error message, the redirection operation
started to rewrite the file and then stopped because of the error, resulting in its truncation.
In fact, if we ever need to actually truncate a file (or create a new, empty file) we can use
a trick like this:
[me@linuxbox ~]$ > ls-output.txt
Simply using the redirection operator with no command preceding it will truncate an existing file or create a new, empty file.
So, how can we append redirected output to a file instead of overwriting the file from the
beginning? For that, we use the >> redirection operator, like so:
[me@linuxbox ~]$ ls -l /usr/bin >> ls-output.txt
Using the >> operator will result in the output being appended to the file. If the file
does not already exist, it is created just as though the > operator had been used. Let's
put it to the test:
[me@linuxbox ~]$ ls -l /usr/bin >> ls-output.txt
[me@linuxbox ~]$ ls -l /usr/bin >> ls-output.txt
[me@linuxbox ~]$ ls -l /usr/bin >> ls-output.txt
[me@linuxbox ~]$ ls -l ls-output.txt
-rw-rw-r-- 1 me  me 503634 2008-02-01 15:45 ls-output.txt
We repeated the command three times resulting in an output file three times as large.
To redirect standard error, we must refer to its file descriptor. A program can produce output on any
of several numbered file streams. While we have referred to the first three of these file
streams as standard input, output and error, the shell references them internally as file descriptors 0, 1 and 2, respectively. The shell provides a notation for redirecting files using
the file descriptor number. Since standard error is the same as file descriptor number 2,
we can redirect standard error with this notation:
[me@linuxbox ~]$ ls -l /bin/usr 2> ls-error.txt
The file descriptor 2 is placed immediately before the redirection operator to perform
the redirection of standard error to the file ls-error.txt.
It is possible to redirect both standard output and standard error to the same file. This traditional way works with any shell:

[me@linuxbox ~]$ ls -l /bin/usr > ls-output.txt 2>&1

Using this method, we perform two redirections. First we redirect standard output to the file ls-output.txt, and then we redirect file descriptor 2 (standard error) to file descriptor 1 (standard output) using the notation 2>&1.
Notice that the order of the redirections is significant. The redirection of standard error must always occur after redirecting standard output or it doesn't work. In
the example above,
>ls-output.txt 2>&1
redirects standard error to the file ls-output.txt, but if the order is changed to
2>&1 >ls-output.txt
standard error is directed to the screen.
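This ordering rule can be demonstrated with a small sketch (the emit function is just a stand-in that writes one line to standard output and one to standard error):

```shell
# Sketch: the order of redirections is significant.
tmpdir=$(mktemp -d)
cd "$tmpdir"
emit() { echo out; echo err >&2; }

emit > good.txt 2>&1     # stdout to the file first; stderr then follows it
emit 2>&1 > bad.txt      # stderr duplicated to the *old* stdout first,
                         # so only stdout lands in the file

wc -l < good.txt         # 2 lines: both streams captured
wc -l < bad.txt          # 1 line: stderr escaped to the display
```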
Recent versions of bash provide a second, more streamlined method for performing this
combined redirection:

[me@linuxbox ~]$ ls -l /bin/usr &> ls-output.txt
In this example, we use the single notation &> to redirect both standard output and standard error to the file ls-output.txt. You may also append the standard output and
standard error streams to a single file like so:
[me@linuxbox ~]$ ls -l /bin/usr &>> ls-output.txt
The cat command reads one or more files and copies them to standard output, like so:

cat [file...]
In most cases, you can think of cat as being analogous to the TYPE command in DOS.
You can use it to display files without paging, for example:
[me@linuxbox ~]$ cat ls-output.txt
will display the contents of the file ls-output.txt. cat is often used to display short
text files. Since cat can accept more than one file as an argument, it can also be used to
join files together. Say we have downloaded a large file that has been split into multiple
parts (multimedia files are often split this way on Usenet), and we want to join them back
together. If the files were named:
movie.mpeg.001 movie.mpeg.002 ... movie.mpeg.099
we could join them back together with this command:
cat movie.mpeg.0* > movie.mpeg
Since wildcards always expand in sorted order, the arguments will be arranged in the correct order.
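A toy sketch of the same idea, using tiny stand-in files instead of a real split download:

```shell
# Sketch: cat joins split files; the wildcard expands in sorted order,
# so the parts are concatenated in the right sequence.
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'AAA' > part.001
printf 'BBB' > part.002
printf 'CCC' > part.003

cat part.0* > whole
cat whole                # AAABBBCCC
```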
This is all well and good, but what does this have to do with standard input? Nothing yet,
but let's try something else. What happens if we enter cat with no arguments:
[me@linuxbox ~]$ cat
Nothing happens; it just sits there as if it's hung. Though it may seem that way, it's really doing exactly what it's supposed to do.
If cat is not given any arguments, it reads from standard input and since standard input
is, by default, attached to the keyboard, it's waiting for us to type something! Try adding
the following text and pressing Enter:
[me@linuxbox ~]$ cat
The quick brown fox jumped over the lazy dog.
Next, type a Ctrl-d (i.e., hold down the Ctrl key and press d) to tell cat that it has
reached the end of the file (EOF) on standard input:

[me@linuxbox ~]$ cat
The quick brown fox jumped over the lazy dog.
The quick brown fox jumped over the lazy dog.
In the absence of filename arguments, cat copies standard input to standard output, so
we see our line of text repeated. We can use this behavior to create short text files. Let's
say that we wanted to create a file called lazy_dog.txt containing the text in our example. We would do this:
[me@linuxbox ~]$ cat > lazy_dog.txt
The quick brown fox jumped over the lazy dog.
Type the command followed by the text we want to place in the file. Remember to type
Ctrl-d at the end. Using the command line, we have implemented the world's dumbest
word processor! To see our results, we can use cat to copy the file to stdout again:
[me@linuxbox ~]$ cat lazy_dog.txt
The quick brown fox jumped over the lazy dog.
Now that we know how cat accepts standard input, in addition to filename arguments,
let's try redirecting standard input:
[me@linuxbox ~]$ cat < lazy_dog.txt
The quick brown fox jumped over the lazy dog.
Using the < redirection operator, we change the source of standard input from the keyboard to the file lazy_dog.txt. We see that the result is the same as passing a single
filename argument. This is not particularly useful compared to passing a filename argument, but it serves to demonstrate using a file as a source of standard input. Other commands make better use of standard input, as we shall soon see.
Before we move on, check out the man page for cat, as it has several interesting options.
Pipelines
The ability of commands to read data from standard input and send it to standard output is
utilized by a shell feature called pipelines. Using the pipe operator | (vertical bar), the
standard output of one command can be piped into the standard input of another:
command1 | command2
To fully demonstrate this, we are going to need some commands. Remember how we said
there was one we already knew that accepts standard input? It's less. We can use less
to display, page-by-page, the output of any command that sends its results to standard
output:
[me@linuxbox ~]$ ls -l /usr/bin | less
This is extremely handy! Using this technique, we can conveniently examine the output
of any command that produces standard output.
A classic cautionary tale about the difference between > and |: a user trying to page through a directory listing once typed something like:

cd /usr/bin
ls > less
The first command put him in the directory where most programs are stored and
the second command told the shell to overwrite the file less with the output of
the ls command. Since the /usr/bin directory already contained a file named
less (the less program), the second command overwrote the less program
file with the text from ls thus destroying the less program on his system.
The lesson here is that the redirection operator silently creates or overwrites files,
so you need to treat it with a lot of respect.
Filters
Pipelines are often used to perform complex operations on data. It is possible to put several commands together into a pipeline. Frequently, the commands used this way are referred to as filters. Filters take input, change it somehow and then output it. The first one
we will try is sort. Imagine we wanted to make a combined list of all of the executable
programs in /bin and /usr/bin, put them in sorted order and view it:
[me@linuxbox ~]$ ls /bin /usr/bin | sort | less
Since we specified two directories (/bin and /usr/bin), the output of ls would have
consisted of two sorted lists, one for each directory. By including sort in our pipeline,
we changed the data to produce a single, sorted list.
The uniq command is often used in conjunction with sort. uniq accepts a sorted list of data from either standard input or a single filename argument and, by default, removes any duplicates from the list:

[me@linuxbox ~]$ ls /bin /usr/bin | sort | uniq | less

In this example, we use uniq to remove any duplicates from the output of the sort command. If we want to see the list of duplicates instead, we add the -d option to uniq like so:
[me@linuxbox ~]$ ls /bin /usr/bin | sort | uniq -d | less
The wc (word count) command is used to display the number of lines, words, and bytes contained in files. Run on our listing file, it prints out three numbers: the lines, words, and bytes contained in ls-output.txt. Like our previous commands, if executed without command line arguments, wc accepts standard input. The -l option limits its output to report only lines. Adding it
to a pipeline is a handy way to count things. To see the number of items we have in our
sorted list, we can do this:
[me@linuxbox ~]$ ls /bin /usr/bin | sort | uniq | wc -l
2728
grep is a powerful program used to find text patterns within files. It is used like this:

grep pattern [file...]

When grep encounters a pattern in the file, it prints out the lines containing it. The patterns that grep can match can be very complex, but for now we will concentrate on simple text matches. We'll cover the advanced patterns, called regular expressions, in a later chapter.
Let's say we wanted to find all the files in our list of programs that had the word zip
embedded in the name. Such a search might give us an idea of some of the programs on
our system that had something to do with file compression. We would do this:
[me@linuxbox ~]$ ls /bin /usr/bin | sort | uniq | grep zip
bunzip2
bzip2
gunzip
gzip
unzip
zip
zipcloak
zipgrep
zipinfo
zipnote
zipsplit
There are a couple of handy options for grep: -i which causes grep to ignore case
when performing the search (normally searches are case sensitive) and -v which tells
grep to only print lines that do not match the pattern.
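A small sketch of both options, using a hand-made list in place of /usr/bin:

```shell
# Sketch: grep -v inverts the match; grep -i ignores case.
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'gzip\nbzip2\ntar\nZIP\n' > progs.txt

grep -v zip progs.txt    # lines NOT containing "zip": tar and ZIP
grep -i zip progs.txt    # case-insensitive match: gzip, bzip2, and ZIP
```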
Sometimes we don't want all of a command's output, just the first or last part. The head command prints the first ten lines of a file, and the tail command prints the last ten. Both commands accept a -n option to adjust the number of lines printed. For example, head -n 5 ls-output.txt shows the first five lines of our listing (entries such as [, 411toppm, a2p, and a52dec), while tail -n 5 ls-output.txt shows the last five (entries such as znew, the zonetab2pot files, and zsoelim). tail can be used in a pipeline as well:

[me@linuxbox ~]$ ls /usr/bin | tail -n 5
znew
zonetab2pot.py
zonetab2pot.pyc
zonetab2pot.pyo
zsoelim
tail has an option which allows you to view files in real-time. This is useful for watching the progress of log files as they are being written. In the following example, we will
look at the messages file in /var/log (or the /var/log/syslog file if messages is missing). Superuser privileges are required to do this on some Linux distributions, since the /var/log/messages file may contain security information:
[me@linuxbox ~]$ tail -f /var/log/messages
Feb 8 13:40:05 twin4 dhclient: DHCPACK from 192.168.1.1
Feb 8 13:40:05 twin4 dhclient: bound to 192.168.1.4 -- renewal in
1652 seconds.
Feb 8 13:55:32 twin4 mountd[3953]: /var/NFSv4/musicbox exported to
both 192.168.1.0/24 and twin7.localdomain in
192.168.1.0/24,twin7.localdomain
Feb 8 14:07:37 twin4 dhclient: DHCPREQUEST on eth0 to 192.168.1.1
port 67
Feb 8 14:07:37 twin4 dhclient: DHCPACK from 192.168.1.1
Feb 8 14:07:37 twin4 dhclient: bound to 192.168.1.4 -- renewal in
1771 seconds.
Feb 8 14:09:56 twin4 smartd[3468]: Device: /dev/hda, SMART
Prefailure Attribute: 8 Seek_Time_Performance changed from 237 to 236
Feb 8 14:10:37 twin4 mountd[3953]: /var/NFSv4/musicbox exported to
both 192.168.1.0/24 and twin7.localdomain in
192.168.1.0/24,twin7.localdomain
Feb 8 14:25:07 twin4 sshd(pam_unix)[29234]: session opened for user
me by (uid=0)
Feb 8 14:25:36 twin4 su(pam_unix)[29279]: session opened for user
root by me(uid=500)
Using the -f option, tail continues to monitor the file and when new lines are appended, they immediately appear on the display. This continues until you type Ctrl-c.
In keeping with our plumbing metaphor, Linux provides a command called tee which reads standard input and copies it to both standard output (allowing the data to continue down the pipeline) and to one or more files. Here we use tee to capture the directory listing in the file ls.txt before grep filters the pipeline's contents:
[me@linuxbox ~]$ ls /usr/bin | tee ls.txt | grep zip
bunzip2
bzip2
gunzip
gzip
unzip
zip
zipcloak
zipgrep
zipinfo
zipnote
zipsplit
Summing Up
As always, check out the documentation of each of the commands we have covered in
this chapter. We have only seen their most basic usage. They all have a number of interesting options. As we gain Linux experience, we will see that the redirection feature of
the command line is extremely useful for solving specialized problems. There are many
commands that make use of standard input and output, and almost all command line programs use standard error to display their informative messages.
that you have your own ideas of what to make. You don't ever have to go back to
the store, as you already have everything you need. The Erector Set takes on the
shape of your imagination. It does what you want.
Your choice of toys is, of course, a personal thing, so which toy would you find
more satisfying?
Expansion
Each time you type a command line and press the enter key, bash performs several processes upon the text before it carries out your command. We have seen a couple of cases
of how a simple character sequence, for example *, can have a lot of meaning to the
shell. The process that makes this happen is called expansion. With expansion, you enter
something and it is expanded into something else before the shell acts upon it. To demonstrate what we mean by this, let's take a look at the echo command. echo is a shell
builtin that performs a very simple task. It prints out its text arguments on standard output:
[me@linuxbox ~]$ echo this is a test
this is a test
That's pretty straightforward. Any argument passed to echo gets displayed. Let's try another example:
[me@linuxbox ~]$ echo *
Desktop Documents ls-output.txt Music Pictures Public Templates
Videos
So what just happened? Why didn't echo print *? As you recall from our work with
wildcards, the * character means match any characters in a filename, but what we didn't
see in our original discussion was how the shell does that. The simple answer is that the
shell expands the * into something else (in this instance, the names of the files in the
current working directory) before the echo command is executed.
Pathname Expansion
The mechanism by which wildcards work is called pathname expansion. If we try some
of the techniques that we employed in our earlier chapters, we will see that they are really
expansions. Given a home directory that looks like this:
[me@linuxbox ~]$ ls
Desktop    Documents  ls-output.txt  Music  Pictures  Public  Templates  Videos
we could carry out the following expansions:

[me@linuxbox ~]$ echo D*
Desktop Documents

and:
[me@linuxbox ~]$ echo *s
Documents Pictures Templates Videos
or even:
[me@linuxbox ~]$ echo [[:upper:]]*
Desktop Documents Music Pictures Public Templates Videos
Tilde Expansion
As you may recall from our introduction to the cd command, the tilde character (~) has
a special meaning. When used at the beginning of a word, it expands into the name of the
home directory of the named user, or if no user is named, the home directory of the current user:
[me@linuxbox ~]$ echo ~
/home/me
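A short sketch of both forms (~root is typically /root on Linux systems, though the exact path depends on the entry in /etc/passwd):

```shell
# Sketch: tilde expansion.
echo ~           # expands to the current user's home directory ($HOME)
echo ~root       # expands to the named user's home directory
```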
Arithmetic Expansion
The shell allows arithmetic to be performed by expansion, which lets us use the shell prompt as a calculator. Arithmetic expansion uses the form $((expression)), where expression is an arithmetic expression consisting of values and arithmetic operators:
[me@linuxbox ~]$ echo $((2 + 2))
4
Arithmetic expansion only supports integers (whole numbers, no decimals), but it can perform quite a number of different operations. Here are a few of the supported operators:

Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division (but remember, since expansion only supports integer arithmetic, results are integers)
%           Modulo, which means remainder
**          Exponentiation
Spaces are not significant in arithmetic expressions and expressions may be nested. For
example, to multiply 5 squared by 3:
[me@linuxbox ~]$ echo $(($((5**2)) * 3))
75
Single parentheses may be used to group multiple subexpressions. With this technique,
we can rewrite the example above and get the same result using a single expansion instead of two:
[me@linuxbox ~]$ echo $(((5**2) * 3))
75
Here is an example using the division and remainder operators. Notice the effect of integer division:
[me@linuxbox ~]$ echo Five divided by two equals $((5/2))
Five divided by two equals 2
[me@linuxbox ~]$ echo with $((5%2)) left over.
with 1 left over.
Brace Expansion
Perhaps the strangest expansion is called brace expansion. With it, you can create multiple text strings from a pattern containing braces. Here's an example:
[me@linuxbox ~]$ echo Front-{A,B,C}-Back
Front-A-Back Front-B-Back Front-C-Back
Patterns to be brace expanded may contain a leading portion called a preamble and a
trailing portion called a postscript. The brace expression itself may contain either a
comma-separated list of strings, or a range of integers or single characters. The pattern
may not contain embedded whitespace. Here is an example using a range of integers:
[me@linuxbox ~]$ echo Number_{1..5}
Number_1 Number_2 Number_3 Number_4 Number_5
So what is this good for? The most common application is to make lists of files or directories to be created. For example, if we were photographers and had a large collection of
images that we wanted to organize into years and months, the first thing we might do is
create a series of directories named in numeric Year-Month format. This way, the directory names will sort in chronological order. We could type out a complete list of directories, but that's a lot of work and it's error-prone too. Instead, we could do this:
[me@linuxbox ~]$ mkdir Photos
[me@linuxbox ~]$ cd Photos
[me@linuxbox Photos]$ mkdir {2007..2009}-{01..12}
[me@linuxbox Photos]$ ls
2007-01 2007-07 2008-01 2008-07 2009-01 2009-07
2007-02 2007-08 2008-02 2008-08 2009-02 2009-08
2007-03 2007-09 2008-03 2008-09 2009-03 2009-09
2007-04 2007-10 2008-04 2008-10 2009-04 2009-10
2007-05 2007-11 2008-05 2008-11 2009-05 2009-11
2007-06 2007-12 2008-06 2008-12 2009-06 2009-12
Pretty slick!
Parameter Expansion
We're only going to touch briefly on parameter expansion in this chapter, but we'll be
covering it extensively later. It's a feature that is more useful in shell scripts than directly
on the command line. Many of its capabilities have to do with the system's ability to store
small chunks of data and to give each chunk a name. Many such chunks, more properly
called variables, are available for your examination. For example, the variable named
USER contains your username. To invoke parameter expansion and reveal the contents
of USER you would do this:
[me@linuxbox ~]$ echo $USER
me
You may have noticed that with other types of expansion, if you mistype a pattern, the
expansion will not take place and the echo command will simply display the mistyped
pattern. With parameter expansion, if you misspell the name of a variable, the expansion
will still take place, but will result in an empty string:
[me@linuxbox ~]$ echo $SUER
[me@linuxbox ~]$
Command Substitution
Command substitution allows us to use the output of a command as an expansion:
[me@linuxbox ~]$ echo $(ls)
Desktop Documents ls-output.txt Music Pictures Public Templates
Videos
We are not limited to the output of simple commands. Here we pass the results of which cp as an argument to the ls command, getting a listing of the cp program without having to know its full pathname:

[me@linuxbox ~]$ ls -l $(which cp)
-rwxr-xr-x 1 root root 71516 2007-12-05 08:58 /bin/cp

Entire pipelines can be used as well:

[me@linuxbox ~]$ file $(ls /usr/bin/* | grep zip)
In this example, the results of the pipeline became the argument list of the file command.
There is an alternate syntax for command substitution in older shell programs which is
also supported in bash. It uses back-quotes instead of the dollar sign and parentheses:
[me@linuxbox ~]$ ls -l `which cp`
-rwxr-xr-x 1 root root 71516 2007-12-05 08:58 /bin/cp
Quoting
Now that we've seen how many ways the shell can perform expansions, it's time to learn
how we can control it. Take for example:
[me@linuxbox ~]$ echo this is a    test
this is a test
or:
[me@linuxbox ~]$ echo The total is $100.00
The total is 00.00
In the first example, word-splitting by the shell removed extra whitespace from the echo
command's list of arguments. In the second example, parameter expansion substituted an
empty string for the value of $1 because it was an undefined variable. The shell provides a mechanism called quoting to selectively suppress unwanted expansions.
Double Quotes
The first type of quoting we will look at is double quotes. If you place text inside double
quotes, all the special characters used by the shell lose their special meaning and are
treated as ordinary characters. The exceptions are $, \ (backslash), and ` (backquote). This means that word-splitting, pathname expansion, tilde expansion, and brace
expansion are suppressed, but parameter expansion, arithmetic expansion, and command
substitution are still carried out. Using double quotes, we can cope with filenames containing embedded spaces. Say we were the unfortunate victim of a file called
two words.txt. If we tried to use this on the command line, word-splitting would
cause this to be treated as two separate arguments rather than the desired single argument:
[me@linuxbox ~]$ ls -l two words.txt
ls: cannot access two: No such file or directory
ls: cannot access words.txt: No such file or directory
By using double quotes, we stop the word-splitting and get the desired result; further, we
can even repair the damage:
[me@linuxbox ~]$ ls -l "two words.txt"
-rw-rw-r-- 1 me  me 18 2008-02-20 13:03 two words.txt
[me@linuxbox ~]$ mv "two words.txt" two_words.txt
There! Now we don't have to keep typing those pesky double quotes.
Remember, parameter expansion, arithmetic expansion, and command substitution still
take place within double quotes:
[me@linuxbox ~]$ echo "$USER $((2+2)) $(cal)"
me 4
   February 2008
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29
We should take a moment to look at the effect of double quotes on command substitution.
First let's look a little deeper at how word splitting works. In our earlier example, we saw
how word-splitting appears to remove extra spaces in our text:
[me@linuxbox ~]$ echo this is a    test
this is a test
By default, word-splitting looks for the presence of spaces, tabs, and newlines (linefeed
characters) and treats them as delimiters between words. This means that unquoted spaces, tabs, and newlines are not considered to be part of the text. They only serve as separators. Since they separate the words into different arguments, our example command line
contains a command followed by four distinct arguments. If we add double quotes:
[me@linuxbox ~]$ echo "this is a    test"
this is a    test
word-splitting is suppressed and the embedded spaces are not treated as delimiters, rather
they become part of the argument. Once the double quotes are added, our command line
contains a command followed by a single argument.
The fact that newlines are considered delimiters by the word-splitting mechanism causes
an interesting, albeit subtle, effect on command substitution. Consider the following:
[me@linuxbox ~]$ echo $(cal)
February 2008 Su Mo Tu We Th Fr Sa 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
[me@linuxbox ~]$ echo "$(cal)"
   February 2008
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29
In the first instance, the unquoted command substitution resulted in a command line containing 38 arguments. In the second, a command line with one argument that includes the
embedded spaces and newlines.
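The argument counts can be verified with set --, which loads the positional parameters. This sketch substitutes printf for cal so the output is predictable:

```shell
# Multi-line output standing in for $(cal).
out=$(printf 'one two\nthree four\n')

set -- $out         # unquoted: split on spaces AND newlines
n_unquoted=$#       # four separate arguments

set -- "$out"       # quoted: a single argument with the newline preserved
n_quoted=$#         # one argument
```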
Single Quotes
If we need to suppress all expansions, we use single quotes. Here is a comparison of unquoted, double-quoted, and single-quoted text:
[me@linuxbox ~]$ echo text ~/*.txt {a,b} $(echo foo) $((2+2)) $USER
text /home/me/ls-output.txt a b foo 4 me
[me@linuxbox ~]$ echo "text ~/*.txt {a,b} $(echo foo) $((2+2)) $USER"
text ~/*.txt {a,b} foo 4 me
[me@linuxbox ~]$ echo 'text ~/*.txt {a,b} $(echo foo) $((2+2)) $USER'
text ~/*.txt {a,b} $(echo foo) $((2+2)) $USER
As we can see, with each succeeding level of quoting, more and more of the expansions
are suppressed.
Escaping Characters
Sometimes we only want to quote a single character. To do this, we can precede a character with a backslash, which in this context is called the escape character. Often this is
done inside double quotes to selectively prevent an expansion:
[me@linuxbox ~]$ echo "The balance for user $USER is: \$5.00"
The balance for user me is: $5.00
To allow a backslash character to appear, escape it by typing \\. Note that within single
quotes, the backslash loses its special meaning and is treated as an ordinary character.
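Both escapes can be sketched as follows; printf is used instead of echo so backslash handling stays predictable across shells, and ${USER:-me} guards against USER being unset:

```shell
# Escaped dollar sign: parameter expansion is suppressed for just that character.
balance=$(printf '%s' "The balance for user ${USER:-me} is: \$5.00")

# Escaped backslash: \\ inside double quotes yields one literal backslash.
win_path=$(printf '%s' "C:\\temp")
```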
Escape Sequence    Meaning
\a                 Bell (an alert that causes the computer to beep)
\b                 Backspace
\n                 Newline; on Unix-like systems, this produces a linefeed
\r                 Carriage return
\t                 Tab
The table above lists some of the common backslash escape sequences. The idea
behind this representation using the backslash originated in the C programming
language and has been adopted by many others, including the shell.
Adding the -e option to echo will enable interpretation of escape sequences.
You may also place them inside $' '. Here, using the sleep command, a simple program that just waits for the specified number of seconds and then exits, we
can create a primitive countdown timer:
sleep 10; echo -e "Time's up\a"
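The same escape sequences can be captured in a variable with $'...' quoting. Since $'...' is a bash feature, this sketch invokes bash explicitly:

```shell
# $'...' interprets the same backslash sequences as echo -e,
# but the result can be assigned and reused.
line=$(bash -c "printf '%s' \$'col1\tcol2'")   # col1<TAB>col2
```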
Summing Up
As we move forward with using the shell, we will find that expansions and quoting will
be used with increasing frequency, so it makes sense to get a good understanding of the
way they work. In fact, it could be argued that they are the most important subjects to
learn about the shell. Without a proper understanding of expansion, the shell will always
be a source of mystery and confusion, and much of its potential power will go to waste.
Further Reading
The bash man page has major sections on both expansion and quoting which
cover these topics in a more formal manner.
The Bash Reference Manual also contains chapters on expansion and quoting:
http://www.gnu.org/software/bash/manual/bashref.html
Cursor Movement
The following table lists the keys used to move the cursor:
Table 8-1: Cursor Movement Commands
Key       Action
Ctrl-a    Move cursor to the beginning of the line.
Ctrl-e    Move cursor to the end of the line.
Ctrl-f    Move cursor forward one character; same as the right arrow key.
Ctrl-b    Move cursor backward one character; same as the left arrow key.
Alt-f     Move cursor forward one word.
Alt-b     Move cursor backward one word.
Ctrl-l    Clear the screen and move the cursor to the top-left corner. The clear command does the same thing.
Modifying Text
Table 8-2 lists keyboard commands that are used to edit characters on the command line.
Table 8-2: Text Editing Commands
Key       Action
Ctrl-d    Delete the character at the cursor location.
Ctrl-t    Transpose (exchange) the character at the cursor location with the one preceding it.
Alt-t     Transpose the word at the cursor location with the one preceding it.
Alt-l     Convert the characters from the cursor location to the end of the word to lowercase.
Alt-u     Convert the characters from the cursor location to the end of the word to uppercase.
Table 8-3: Cut And Paste Commands
Key             Action
Ctrl-k          Kill text from the cursor location to the end of the line.
Ctrl-u          Kill text from the cursor location to the beginning of the line.
Alt-d           Kill text from the cursor location to the end of the current word.
Alt-Backspace   Kill text from the cursor location to the beginning of the current word. If the cursor is at the beginning of a word, kill the previous word.
Ctrl-y          Yank text from the kill-ring and insert it at the cursor location.
Completion
Another way that the shell can help you is through a mechanism called completion. Completion occurs when you press the tab key while typing a command. Let's see how this
works. Given a home directory that looks like this:
[me@linuxbox ~]$ ls
Desktop    ls-output.txt  Pictures  Templates  Videos
Documents  Music          Public
Try typing the following, but don't press the Enter key:
[me@linuxbox ~]$ ls l
Now press the tab key:
[me@linuxbox ~]$ ls ls-output.txt
See how the shell completed the line for you? Let's try another one. Again, don't press
Enter:
[me@linuxbox ~]$ ls D
Press tab:
[me@linuxbox ~]$ ls D
No completion, just a beep. This happened because D matches more than one entry in
the directory. For completion to be successful, the clue you give it has to be unambiguous. If we go further:
[me@linuxbox ~]$ ls Do
and press tab again:
[me@linuxbox ~]$ ls Documents
the completion succeeds.
Though this example shows completion of pathnames, which is its most common use,
completion will also work on variables (if the beginning of the word is a $), user names
(if the word begins with ~), commands (if the word is the first word on the line), and
hostnames (if the beginning of the word is @). Hostname completion only works for
hostnames listed in /etc/hosts.
There are a number of control and meta key sequences that are associated with completion:
Table 8-4: Completion Commands
Key     Action
Alt-?   Display a list of possible completions. On most systems you can also do this by pressing the tab key a second time, which is much easier.
Alt-*   Insert all possible completions. This is useful when you want to use more than one possible match.
There are quite a few more that I find rather obscure. You can see a list in the bash man
page under READLINE.
Programmable Completion
Recent versions of bash have a facility called programmable completion. Programmable completion allows you (or more likely, your distribution provider) to
add additional completion rules. Usually this is done to add support for specific
applications. For example, it is possible to add completions for the option list of a
command or match particular file types that an application supports. Ubuntu has a
fairly large set defined by default. Programmable completion is implemented by
shell functions, a kind of mini shell script that we will cover in later chapters. If
you are curious, try:
set | less
and see if you can find them. Not all distributions include them by default.
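A minimal sketch of defining a completion rule by hand, using a hypothetical command name myapp. The complete builtin is bash-specific, so it runs under bash -c here:

```shell
# Register a word list as the completions for "myapp" (a made-up name),
# then ask bash to print the rule back with complete -p.
out=$(bash -c "complete -W 'start stop status' myapp; complete -p myapp")
```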
Using History
As we discovered in Chapter 1, bash maintains a history of commands that have been
entered. This list of commands is kept in your home directory in a file called
.bash_history. The history facility is a useful resource for reducing the amount of
typing you have to do, especially when combined with command line editing.
Searching History
At any time, we can view the contents of the history list by:
[me@linuxbox ~]$ history | less
By default, bash stores the last 500 commands you have entered. We will see how to adjust this value in a later chapter. Let's say we want to find the commands we used to list
/usr/bin. One way we could do this:
[me@linuxbox ~]$ history | grep /usr/bin
And let's say that among our results we got a line containing an interesting command like
this:
88  ls -l /usr/bin > ls-output.txt
The number 88 is the line number of the command in the history list. We could use this
immediately using another type of expansion called history expansion. To use our discovered line we could do this:
[me@linuxbox ~]$ !88
bash will expand !88 into the contents of the eighty-eighth line in the history list.
There are other forms of history expansion that we will cover a little later.
bash also provides the ability to search the history list incrementally. This means that we
can tell bash to search the history list as we enter characters, with each additional character further refining our search. To start incremental search press Ctrl-r followed by
the text you are looking for. When you find it, you can either press Enter to execute the
command or press Ctrl-j to copy the line from the history list to the current command
line. To find the next occurrence of the text (moving up the history list), press Ctrl-r
again. To quit searching, press either Ctrl-g or Ctrl-c. Here we see it in action:
[me@linuxbox ~]$
(reverse-i-search)`':
The prompt changes to indicate that we are performing a reverse incremental search. It is
reverse because we are searching from now to some time in the past. Next, we start
typing our search text. In this example /usr/bin:
(reverse-i-search)`/usr/bin': ls -l /usr/bin > ls-output.txt
Immediately, the search returns our result. With our result, we can execute the command
by pressing Enter, or we can copy the command to our current command line for further editing by pressing Ctrl-j. Let's copy it. Press Ctrl-j:
[me@linuxbox ~]$ ls -l /usr/bin > ls-output.txt
Our shell prompt returns and our command line is loaded and ready for action!
The table below lists some of the keystrokes used to manipulate the history list:
Table 8-5: History Commands
Key     Action
Ctrl-p  Move to the previous history entry. Same action as the up arrow.
Ctrl-n  Move to the next history entry. Same action as the down arrow.
Alt-<   Move to the beginning (top) of the history list.
Alt->   Move to the end (bottom) of the history list, i.e., the current command line.
Ctrl-r  Reverse incremental search. Searches incrementally from the current command line up the history list.
Alt-p   Reverse search, non-incremental. With this key, type the search string and press Enter before the search is performed.
Alt-n   Forward search, non-incremental.
Ctrl-o  Execute the current item in the history list and advance to the next one. This is handy if you are trying to re-execute a sequence of commands in the history list.
History Expansion
The shell offers a specialized type of expansion for items in the history list by using the
! character. We have already seen how the exclamation point can be followed by a
number to insert an entry from the history list. There are a number of other expansion features:
Table 8-6: History Expansion Commands
Sequence  Action
!!        Repeat the last command. It is probably easier to press the up arrow and Enter.
!number   Repeat history list item number.
!string   Repeat the last history list item starting with string.
!?string  Repeat the last history list item containing string.
I would caution against using the !string and !?string forms unless you are absolutely
sure of the contents of the history list items.
There are many more elements available in the history expansion mechanism, but this
subject is already too arcane and our heads may explode if we continue. The HISTORY
EXPANSION section of the bash man page goes into all the gory details. Feel free to
explore!
script
In addition to the command history feature in bash, most Linux distributions include a program called script that can be used to record an entire shell session
and store it in a file. The basic syntax of the command is:
script [file]
where file is the name of the file used for storing the recording. If no file is specified, the file typescript is used. See the script man page for a complete
list of the program's options and features.
Summing Up
In this chapter we have covered some of the keyboard tricks that the shell provides to
help hardcore typists reduce their workloads. I suspect that as time goes by and you become more involved with the command line, you will refer back to this chapter to pick up
more of these tricks. For now, consider them optional and potentially helpful.
Further Reading
9 Permissions
Operating systems in the Unix tradition differ from those in the MS-DOS tradition in that
they are not only multitasking systems, but also multiuser systems.
What exactly does this mean? It means that more than one person can be using the computer at the same time. While a typical computer will likely have only one keyboard and
monitor, it can still be used by more than one user. For example, if a computer is attached
to a network or the Internet, remote users can log in via ssh (secure shell) and operate
the computer. In fact, remote users can execute graphical applications and have the
graphical output appear on a remote display. The X Window System supports this as part
of its basic design.
The multiuser capability of Linux is not a recent "innovation," but rather a feature that is
deeply embedded into the design of the operating system. Considering the environment in
which Unix was created, this makes perfect sense. Years ago, before computers were
"personal," they were large, expensive, and centralized. A typical university computer
system, for example, consisted of a large central computer located in one building and
terminals which were located throughout the campus, each connected to the large central
computer. The computer would support many users at the same time.
In order to make this practical, a method had to be devised to protect the users from each
other. After all, the actions of one user could not be allowed to crash the computer, nor
could one user interfere with the files belonging to another user.
In this chapter we are going to look at this essential part of system security and introduce
the following commands:
id - Display user identity
chmod - Change a file's mode
umask - Set default file permissions
su - Run a shell as another user
sudo - Execute a command as another user
chown - Change a file's owner and group owner
passwd - Change a user's password
If we attempt to view a protected file such as /etc/shadow, we get a "Permission denied" error. The reason for this error message is that, as regular users, we do not have permission to
read this file.
In the Unix security model, a user may own files and directories. When a user owns a file
or directory, the user has control over its access. Users can, in turn, belong to a group
consisting of one or more users who are given access to files and directories by their
owners. In addition to granting access to a group, an owner may also grant some set of
access rights to everybody, which in Unix terms is referred to as the world. To find out information about your identity, use the id command:
[me@linuxbox ~]$ id
uid=500(me) gid=500(me) groups=500(me)
Let's look at the output. When user accounts are created, users are assigned a number
called a user ID or uid which is then, for the sake of the humans, mapped to a username.
The user is assigned a primary group ID or gid and may belong to additional groups. The
above example is from a Fedora system. On other systems, such as Ubuntu, the output
may look a little different:
[me@linuxbox ~]$ id
uid=1000(me) gid=1000(me)
groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),108(lpadmin),114(admin),1000(me)
As we can see, the uid and gid numbers are different. This is simply because Fedora starts
its numbering of regular user accounts at 500, while Ubuntu starts at 1000. We can also
see that the Ubuntu user belongs to a lot more groups. This has to do with the way
Ubuntu manages privileges for system devices and services.
So where does this information come from? Like so many things in Linux, from a couple
of text files. User accounts are defined in the /etc/passwd file and groups are defined
in the /etc/group file. When user accounts and groups are created, these files are
modified along with /etc/shadow which holds information about the user's password.
For each user account, the /etc/passwd file defines the user (login) name, uid, gid,
the account's real name, home directory, and login shell. If you examine the contents of
/etc/passwd and /etc/group, you will notice that besides the regular user accounts, there are accounts for the superuser (uid 0) and various other system users.
In the next chapter, when we cover processes, you will see that some of these other
users are, in fact, quite busy.
While many Unix-like systems assign regular users to a common group such as users,
modern Linux practice is to create a unique, single-member group with the same name as
the user. This makes certain types of permission assignment easier.
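The colon-separated fields of /etc/passwd can be picked apart with standard tools. A sketch using the root account, whose uid is always 0:

```shell
# Fields: name:password:uid:gid:comment:home:shell
root_entry=$(grep '^root:' /etc/passwd)
root_uid=$(echo "$root_entry" | cut -d: -f3)    # 0
root_home=$(echo "$root_entry" | cut -d: -f6)   # normally /root
```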
The first ten characters of the listing are the file attributes. The first of these characters is
the file type. Here are the file types you are most likely to see (there are other, less common types too):
Table 9-1: File Types
Attribute  File Type
-          A regular file.
d          A directory.
l          A symbolic link. Notice that with symbolic links, the remaining file attributes are always rwxrwxrwx and are dummy values. The real file attributes are those of the file the symbolic link points to.
c          A character special file. This file type refers to a device that handles data as a stream of bytes, such as a terminal or modem.
b          A block special file. This file type refers to a device that handles data in blocks, such as a hard drive or CD-ROM drive.
The remaining nine characters of the file attributes, called the file mode, represent the
read, write, and execute permissions for the file's owner, the file's group owner, and
everybody else:
Owner  Group  World
rwx    rwx    rwx
When set, the r, w, and x mode attributes have the following effect on files and directories:
Table 9-2: Permission Attributes
Attribute  Effect
r          Files: allows the file to be opened and read. Directories: allows the directory's contents to be listed if the execute attribute is also set.
w          Files: allows the file to be written to or truncated (renaming and deleting are controlled by directory attributes, not this one). Directories: allows files within the directory to be created, deleted, and renamed if the execute attribute is also set.
x          Files: allows the file to be treated as a program and executed (program files written in scripting languages must also be readable). Directories: allows the directory to be entered, e.g., cd directory.
Table 9-3: Permission Attribute Examples
File Attributes  Meaning
-rwx------       A regular file that is readable, writable, and executable by the file's owner. No one else has any access.
-rw-------       A regular file that is readable and writable by the file's owner. No one else has any access.
-rw-r--r--       A regular file that is readable and writable by the file's owner. Members of the file's group owner may read the file. The file is world-readable.
-rwxr-xr-x       A regular file that is readable and writable by the file's owner. The file may be read and executed by everybody.
-rw-rw----       A regular file that is readable and writable by the file's owner and group owner only.
lrwxrwxrwx       A symbolic link. All symbolic links have "dummy" permissions; the real permissions are kept with the actual file pointed to by the link.
drwxrwx---       A directory. The owner and the members of the group owner may enter the directory and create, rename, and delete files within it.
drwxr-x---       A directory. The owner may enter the directory and create, rename, and delete files within it. Members of the group owner may enter the directory but cannot create, delete, or rename files.
With octal notation, each digit represents three binary bits, which maps neatly onto the rwx permission scheme:
Table 9-4: File Modes in Binary and Octal
Octal  Binary  File Mode
0      000     ---
1      001     --x
2      010     -w-
3      011     -wx
4      100     r--
5      101     r-x
6      110     rw-
7      111     rwx
By using three octal digits, we can set the file mode for the owner, group owner, and
world:
[me@linuxbox ~]$ > foo.txt
[me@linuxbox ~]$ ls -l foo.txt
-rw-rw-r-- 1 me   me   0 2008-03-06 14:52 foo.txt
[me@linuxbox ~]$ chmod 600 foo.txt
[me@linuxbox ~]$ ls -l foo.txt
-rw------- 1 me   me   0 2008-03-06 14:52 foo.txt
By passing the argument 600, we were able to set the permissions of the owner to read
and write while removing all permissions from the group owner and world. Though remembering the octal to binary mapping may seem inconvenient, you will usually only
have to use a few common ones: 7 (rwx), 6 (rw-), 5 (r-x), 4 (r--), and 0 (---).
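The octal forms can be exercised on a scratch file. This sketch reads the mode back with stat -c %a, a GNU coreutils option:

```shell
f=$(mktemp)
chmod 600 "$f"                # rw-------
mode600=$(stat -c %a "$f")

chmod 644 "$f"                # rw-r--r--
mode644=$(stat -c %a "$f")
rm -f "$f"
```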
chmod also supports a symbolic notation for specifying file modes. Symbolic notation is
divided into three parts: who the change will affect, which operation will be performed,
and what permission will be set. To specify who is affected, a combination of the characters u, g, o, and a is used as follows:
Table 9-5: chmod Symbolic Notation
Symbol  Meaning
u       Short for "user," but means the file or directory owner.
g       Group owner.
o       Short for "others," but means world.
a       Short for "all"; the combination of u, g, and o.
If no symbol is specified, "all" is assumed. The operation is indicated by + (add a permission), - (remove a permission), or = (apply only the specified permissions and remove all others). Here are some examples of symbolic notation:
Table 9-6: chmod Symbolic Notation Examples
Notation    Meaning
u+x         Add execute permission for the owner.
u-x         Remove execute permission from the owner.
+x          Add execute permission for the owner, group, and world. Equivalent to a+x.
o-rw        Remove the read and write permission from anyone besides the owner and group owner.
go=rw       Set the group owner and anyone besides the owner to have read and write permission. If either the group owner or world previously had execute permission, it is removed.
u+x,go=rx   Add execute permission for the owner and set the permissions for the group and others to read and execute. Multiple specifications may be separated by commas.
Some people prefer octal notation; others really like the symbolic form. Symbolic
notation offers the advantage of allowing you to set a single attribute without disturbing any of the others.
Take a look at the chmod man page for more details and a list of options. A word of caution regarding the --recursive option: it acts on both files and directories, so it's not as
useful as one would hope, since we rarely want files and directories to have the same permissions.
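That single-attribute behavior can be sketched on a scratch file (again reading the mode back with GNU stat):

```shell
f=$(mktemp)
chmod 644 "$f"                 # rw-r--r--
chmod u+x "$f"                 # add execute for the owner only
after_ux=$(stat -c %a "$f")    # 744: group and world untouched
chmod go-r "$f"                # remove read from group and world
after_gor=$(stat -c %a "$f")   # 700
rm -f "$f"
```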
The umask command controls the default permissions given to a file when it is created. It uses octal notation to express a mask of bits to be removed from a file's mode attributes:
[me@linuxbox ~]$ rm -f foo.txt
[me@linuxbox ~]$ umask
0002
[me@linuxbox ~]$ > foo.txt
[me@linuxbox ~]$ ls -l foo.txt
-rw-rw-r-- 1 me   me   0 2008-03-06 14:53 foo.txt
We first removed any old copy of foo.txt to make sure we were starting fresh. Next,
we ran the umask command without an argument to see the current value. It responded
with the value 0002 (the value 0022 is another common default), the octal representation of our mask. We then created a new instance of foo.txt and observed its permissions: the owner and group get read and write permission, while everyone else gets only read permission. World lacks write permission because of the mask. Let's repeat the example, this time setting the mask ourselves:
[me@linuxbox ~]$ rm foo.txt
[me@linuxbox ~]$ umask 0000
[me@linuxbox ~]$ > foo.txt
[me@linuxbox ~]$ ls -l foo.txt
-rw-rw-rw- 1 me   me   0 2008-03-06 14:58 foo.txt
When we set the mask to 0000 (effectively turning it off), we see that the file is now
world writable. To understand how this works, we have to look at octal numbers again. If
we take the mask and expand it into binary, and then compare it to the attributes we can
see what happens:
Original file mode  --- rw- rw- rw-
Mask                000 000 000 010
Result              --- rw- rw- r--
Ignore for the moment the leading zeros (we'll get to those in a minute) and observe that
where the 1 appears in our mask, an attribute was removed (in this case, the world write
permission). That's what the mask does. Everywhere a 1 appears in the binary value of the
mask, an attribute is unset. If we look at a mask value of 0022, we can see what it does:
Original file mode  --- rw- rw- rw-
Mask                000 000 010 010
Result              --- rw- r-- r--
Again, where a 1 appears in the binary value, the corresponding attribute is unset. Play
with some values (try some sevens) to get used to how this works. When you're done, remember to clean up:
[me@linuxbox ~]$ rm foo.txt; umask 0002
Most of the time you won't have to change the mask; the default provided by your distribution will be fine. In some high-security situations, however, you will want to control it.
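The mask arithmetic can be verified on scratch files. Each umask call runs in a subshell so the session's own mask is left alone (stat -c is GNU-specific):

```shell
d=$(mktemp -d)
( umask 0022; : > "$d/a" )    # 666 with group/world write masked off
( umask 0000; : > "$d/b" )    # 666 with nothing masked off
mode_a=$(stat -c %a "$d/a")   # 644
mode_b=$(stat -c %a "$d/b")   # 666
```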
When viewing the output from ls, you can determine the special permissions.
Here are some examples. First, a program that is setuid:
-rwsr-xr-x
A directory that has the setgid attribute:
drwxrwsr-x
A directory with the sticky bit set:
drwxrwxrwt
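A sketch of setting one of these bits ourselves: the sticky bit is the 1 in the leading octal digit (or +t symbolically), and it appears as the trailing t when world execute is also set (stat -c is GNU-specific):

```shell
d=$(mktemp -d)
chmod 1755 "$d"               # drwxr-xr-t
mode=$(stat -c %a "$d")       # 1755 (GNU stat includes the special bits)
symbolic=$(stat -c %A "$d")   # ends in "t"
```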
Changing Identities
At various times, we may find it necessary to take on the identity of another user. Often
we want to gain superuser privileges to carry out some administrative task, but it is also
possible to become another regular user for such things as testing an account. There are
three ways to take on an alternate identity:
1. Log out and log back in as the alternate user.
2. Use the su command.
3. Use the sudo command.
We will skip the first technique since we know how to do it and it lacks the convenience
of the other two. From within our own shell session, the su command allows you to assume the identity of another user, and either start a new shell session with that user's IDs,
or to issue a single command as that user. The sudo command allows an administrator to
set up a configuration file called /etc/sudoers, and define specific commands that
particular users are permitted to execute under an assumed identity. The choice of which
command to use is largely determined by which Linux distribution you use. Your distribution probably includes both commands, but its configuration will favor either one or
the other. We'll start with su.
If the -l option is included, the resulting shell session is a login shell for the specified
user. This means that the user's environment is loaded and the working directory is
99
9 Permissions
changed to the user's home directory. This is usually what we want. If the user is not
specified, the superuser is assumed. Notice that (strangely) the -l may be abbreviated
-, which is how it is most often used. To start a shell for the superuser, we would do
this:
[me@linuxbox ~]$ su -
Password:
[root@linuxbox ~]#
After entering the command, we are prompted for the superuser's password. If it is successfully entered, a new shell prompt appears indicating that this shell has superuser privileges (the trailing # rather than a $) and the current working directory is now the
home directory for the superuser (normally /root.) Once in the new shell, we can carry
out commands as the superuser. When finished, enter exit to return to the previous
shell:
[root@linuxbox ~]# exit
[me@linuxbox ~]$
It is also possible to execute a single command rather than starting a new interactive shell, by using su this way:
su -c 'command'
Using this form, a single command line is passed to the new shell for execution. It is important to enclose the command in quotes, as we do not want expansion to occur in our
shell, but rather in the new shell:
[me@linuxbox ~]$ su -c 'ls -l /root/*'
Password:
-rw------- 1 root root 754 2007-08-11 03:19 /root/anaconda-ks.cfg
/root/Mail:
total 0
[me@linuxbox ~]$
After entering the command, we are prompted for our password (not the superuser's) and
once the authentication is complete, the specified command is carried out. One important
difference between su and sudo is that sudo does not start a new shell, nor does it load
another user's environment. This means that commands do not need to be quoted any differently than they would be without using sudo. Note that this behavior can be overridden by specifying various options. See the sudo man page for details.
To see what privileges are granted by sudo, use the -l option to list them:
[me@linuxbox ~]$ sudo -l
User me may run the following commands on this host:
(ALL) ALL
user to have the same abilities. This is desirable in most cases, but it also permits
malware (malicious software) such as viruses to have free rein of the computer.
In the Unix world, there has always been a larger division between regular users
and administrators, owing to the multiuser heritage of Unix. The approach taken
in Unix is to grant superuser privileges only when needed. To do this, the su and
sudo commands are commonly used.
Up until a few years ago, most Linux distributions relied on su for this purpose. su didn't require the configuration that sudo did, and having a root
account is traditional in Unix. This introduced a problem. Users were tempted to
operate as root unnecessarily. In fact, some users operated their systems as the
root user exclusively, since it does away with all those annoying permission denied messages. This is how you reduce the security of a Linux system to that of a
Windows system. Not a good idea.
When Ubuntu was introduced, its creators took a different tack. By default,
Ubuntu disables logins to the root account (by failing to set a password for the account), and instead uses sudo to grant superuser privileges. The initial user account is granted full access to superuser privileges via sudo and may grant similar powers to subsequent user accounts.
chown can change the file owner and/or the file group owner depending on the first argument of the command. Here are some examples:
Table 9-7: chown Argument Examples
Argument   Results
bob        Changes the ownership of the file from its current owner to user bob.
bob:users  Changes the ownership of the file from its current owner to user bob and changes the file group owner to group users.
:admins    Changes the group owner to the group admins. The file owner is unchanged.
bob:       Changes the file owner from the current owner to user bob and changes the group owner to the login group of user bob.
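Without superuser privileges, the :group form can still be exercised against a group we already belong to. A sketch using GNU chown and stat (the scratch file already carries our primary group, so the command succeeds as a permitted change):

```shell
f=$(mktemp)
mygroup=$(id -gn)           # our primary (login) group
chown ":$mygroup" "$f"      # owner unchanged, group owner set
owner=$(stat -c %U "$f")
group=$(stat -c %G "$f")
rm -f "$f"
```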
Let's say that we have two users; janet, who has access to superuser privileges and
tony, who does not. User janet wants to copy a file from her home directory to the
home directory of user tony. Since user janet wants tony to be able to edit the file,
janet changes the ownership of the copied file from janet to tony:
[janet@linuxbox ~]$ sudo cp myfile.txt ~tony
Password:
[janet@linuxbox ~]$ sudo ls -l ~tony/myfile.txt
-rw-r--r-- 1 root  root  8031 2008-03-20 14:30 /home/tony/myfile.txt
[janet@linuxbox ~]$ sudo chown tony: ~tony/myfile.txt
[janet@linuxbox ~]$ sudo ls -l ~tony/myfile.txt
-rw-r--r-- 1 tony  tony  8031 2008-03-20 14:30 /home/tony/myfile.txt
Here we see user janet copy the file from her directory to the home directory of user
tony. Next, janet changes the ownership of the file from root (a result of using
sudo) to tony. Using the trailing colon in the first argument, janet also changed the
group ownership of the file to the login group of tony, which happens to be group
tony.
Notice that after the first use of sudo, janet was not prompted for her password? This
is because sudo, in most configurations, trusts you for several minutes until its timer
runs out.
music files as Ogg Vorbis or MP3. User bill has access to superuser privileges via
sudo.
The first thing that needs to happen is creating a group that will have both bill and
karen as members. Using the graphical user management tool, bill creates a group
called music and adds users bill and karen to it. Next, bill creates the directory that will hold the shared files:
[bill@linuxbox ~]$ sudo mkdir /usr/local/share/Music
Password:
Since bill is manipulating files outside his home directory, superuser privileges are required. After the directory is created, it has the following ownerships and permissions:
[bill@linuxbox ~]$ ls -ld /usr/local/share/Music
drwxr-xr-x 2 root root 4096 2008-03-21 18:05 /usr/local/share/Music
As we can see, the directory is owned by root and has 755 permissions. To make this
directory sharable, bill needs to change the group ownership and the group permissions
to allow writing:
[bill@linuxbox ~]$ sudo chown :music /usr/local/share/Music
[bill@linuxbox ~]$ sudo chmod 775 /usr/local/share/Music
So what does this all mean? It means that we now have a directory,
/usr/local/share/Music that is owned by root and allows read and write access to group music. Group music has members bill and karen, thus bill and
karen can create files in directory /usr/local/share/Music. Other users can list
the contents of the directory but cannot create files there.
But we still have a problem. With the current permissions, files and directories created
within the Music directory will have the normal permissions of the users bill and
karen:
[bill@linuxbox ~]$ > /usr/local/share/Music/test_file
[bill@linuxbox ~]$ ls -l /usr/local/share/Music
-rw-r--r-- 1 bill  bill  0 2008-03-24 20:03 test_file
Actually there are two problems. First, the default umask on this system is 0022 which
prevents group members from writing files belonging to other members of the group.
This would not be a problem if the shared directory only contained files, but since this directory will store music, and music is usually organized in a hierarchy of artists and albums, members of the group will need the ability to create files and directories inside directories created by other members. We need to change the umask used by bill and
karen to 0002 instead.
Second, each file and directory created by one member will be set to the primary group of
the user rather than the group music. This can be fixed by setting the setgid bit on the
directory:
[bill@linuxbox ~]$ sudo chmod g+s /usr/local/share/Music
[bill@linuxbox ~]$ ls -ld /usr/local/share/Music
drwxrwsr-x 2 root music 4096 2008-03-24 20:03 /usr/local/share/Music
Now we test to see if the new permissions fix the problem. bill sets his umask to
0002, removes the previous test file, and creates a new test file and directory:
[bill@linuxbox ~]$ umask 0002
[bill@linuxbox ~]$ rm /usr/local/share/Music/test_file
[bill@linuxbox ~]$ > /usr/local/share/Music/test_file
[bill@linuxbox ~]$ mkdir /usr/local/share/Music/test_dir
[bill@linuxbox ~]$ ls -l /usr/local/share/Music
drwxrwsr-x 2 bill music 4096 2008-03-24 20:24 test_dir
-rw-rw-r-- 1 bill music    0 2008-03-24 20:22 test_file
Both files and directories are now created with the correct permissions to allow all members of the group music to create files and directories inside the Music directory.
The one remaining issue is umask. The necessary setting lasts only until the end of the session and must then be reset. In Chapter 11, we'll look at making the change to umask permanent.
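The setgid inheritance can be verified on a scratch directory. This sketch assumes typical Linux behavior, where a directory created inside a setgid directory inherits both the group owner and the setgid bit:

```shell
top=$(mktemp -d)
chmod 2775 "$top"                     # rwxrwsr-x: group write plus setgid
( umask 0002; mkdir "$top/album" )    # simulate a group member's mkdir
top_mode=$(stat -c %A "$top")         # "s" in the group triad
sub_mode=$(stat -c %A "$top/album")   # setgid inherited by the new directory
```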
To change your password, just enter the passwd command. You will be prompted for
your old password and your new password:
[me@linuxbox ~]$ passwd
(current) UNIX password:
New UNIX password:
The passwd command will try to enforce the use of strong passwords. This means it will
refuse to accept passwords that are too short, are too similar to previous passwords, are dictionary words, or are too easily guessed:
[me@linuxbox ~]$ passwd
(current) UNIX password:
New UNIX password:
BAD PASSWORD: is too similar to the old one
New UNIX password:
BAD PASSWORD: it is WAY too short
If you have superuser privileges, you can specify a username as an argument to the
passwd command to set the password for another user. Other options are available to
the superuser to allow account locking, password expiration, etc. See the passwd man
page for details.
Summing Up
In this chapter we have seen how Unix-like systems such as Linux manage user permissions to control read, write, and execution access to files and directories. The basic
ideas of this system of permissions date back to the early days of Unix and have stood up
pretty well to the test of time. But the native permissions mechanism in Unix-like systems lacks the fine granularity of more modern systems.
Further Reading
There are a number of command line programs used to create and maintain users and
groups. For more information, see the man pages for the following commands:
adduser
useradd
groupadd
10 Processes
Modern operating systems are usually multitasking, meaning that they create the illusion
of doing more than one thing at once by rapidly switching from one executing program to
another. The Linux kernel manages this through the use of processes. Processes are how
Linux organizes the different programs waiting for their turn at the CPU.
Sometimes a computer will become sluggish or an application will stop responding. In
this chapter, we will look at some of the tools available at the command line that let us
examine what programs are doing, and how to terminate processes that are misbehaving.
This chapter will introduce the following commands:
ps – Report a snapshot of current processes
top – Display tasks
jobs – List active jobs
bg – Place a job in the background
fg – Place a job in the foreground
kill – Send a signal to a process
killall – Kill processes by name
Viewing Processes
The most commonly used command to view processes (there are several) is ps. The ps
program has a lot of options, but in its simplest form it is used like this:
[me@linuxbox ~]$ ps
  PID TTY          TIME CMD
 5198 pts/1    00:00:00 bash
10129 pts/1    00:00:00 ps
The result in this example lists two processes, process 5198 and process 10129, which are
bash and ps respectively. As we can see, by default, ps doesn't show us very much, just
the processes associated with the current terminal session. To see more, we need to add
some options, but before we do that, let's look at the other fields produced by ps. TTY is
short for Teletype, and refers to the controlling terminal for the process. Unix is showing its age here. The TIME field is the amount of CPU time consumed by the process. As
we can see, neither process makes the computer work very hard.
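As an aside (not shown in the book's example), the procps version of ps found on Linux lets us request exactly these columns with the -o option; this is a sketch assuming that version of ps:

```shell
# Request only the PID, controlling terminal, CPU time, and command.
ps -o pid,tty,time,cmd
```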
If we add an option, we can get a bigger picture of what the system is doing:
[me@linuxbox ~]$ ps x
  PID TTY      STAT   TIME COMMAND
 2799 ?        Ssl    0:00 /usr/libexec/bonobo-activation-server ac
 2820 ?        Sl     0:01 /usr/libexec/evolution-data-server-1.10 --
15647 ?        Ss     0:00 /bin/sh /usr/bin/startkde
15751 ?        Ss     0:00 /usr/bin/ssh-agent /usr/bin/dbus-launch --
15754 ?        S      0:00 /usr/bin/dbus-launch --exit-with-session
15755 ?        Ss     0:01 /bin/dbus-daemon --fork --print-pid 4 pr
15774 ?        Ss     0:02 /usr/bin/gpg-agent -s --daemon
15793 ?        S      0:00 start_kdeinit --new-startup +kcminit_start
15794 ?        Ss     0:00 kdeinit Running...
15797 ?        S      0:00 dcopserver --nosid
Adding the x option (note that there is no leading dash) tells ps to show all of our processes regardless of what terminal (if any) they are controlled by. The presence of a ? in
the TTY column indicates no controlling terminal. Using this option, we see a list of every process that we own.
Since the system is running a lot of processes, ps produces a long list. It is often helpful
to pipe the output from ps into less for easier viewing. Some option combinations also
produce long lines of output, so maximizing the terminal emulator window may be a
good idea, too.
A new column titled STAT has been added to the output. STAT is short for state and reveals the current status of the process:
Table 10-1: Process States
State   Meaning
R       Running. The process is running or ready to run.
S       Sleeping. The process is not running; rather, it is waiting for an
        event, such as a keystroke or network packet.
D       Uninterruptible sleep. The process is waiting for I/O such as a
        disk drive.
T       Stopped. The process has been instructed to stop (more on this
        later in the chapter).
Z       A defunct or "zombie" process. This is a child process that has
        terminated, but has not been cleaned up by its parent.
<       A high-priority process. It's possible to grant more importance to
        a process, giving it more time on the CPU. This property of a
        process is called niceness. A process with high priority is said to
        be less nice because it's taking more of the CPU's time, which
        leaves less for everybody else.
N       A low-priority process. A process with low priority (a "nice"
        process) will only get processor time after other processes with
        higher priority have been serviced.
The process state may be followed by other characters. These indicate various exotic
process characteristics. See the ps man page for more detail.
Another popular set of options is aux (without a leading dash). This gives us even more
information:
[me@linuxbox ~]$ ps aux
USER       PID %CPU %MEM   VSZ  RSS TTY   STAT START  TIME COMMAND
root         1  0.0  0.0  2136  644 ?     Ss   Mar05  0:31 init
root         2  0.0  0.0     0    0 ?     S<   Mar05  0:00 [kt]
root         3  0.0  0.0     0    0 ?     S<   Mar05  0:00 [mi]
root         4  0.0  0.0     0    0 ?     S<   Mar05  0:00 [ks]
root         5  0.0  0.0     0    0 ?     S<   Mar05  0:06 [wa]
root         6  0.0  0.0     0    0 ?     S<   Mar05  0:36 [ev]
root         7  0.0  0.0     0    0 ?     S<   Mar05  0:00 [kh]
This set of options displays the processes belonging to every user. Using the options
without the leading dash invokes the command with BSD style behavior. The Linux
version of ps can emulate the behavior of the ps program found in several different
Unix implementations. With these options, we get these additional columns:
Table 10-2: BSD Style ps Column Headers
Header   Meaning
USER     User ID. This is the owner of the process.
%CPU     CPU usage in percent.
%MEM     Memory usage in percent.
VSZ      Virtual memory size.
RSS      Resident Set Size. The amount of physical memory (RAM) the
         process is using, in kilobytes.
START    Time when the process started. For values over 24 hours, a date is
         used.
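Because ps simply writes text, we can combine it with tools from earlier chapters. This pipeline is our own extension, not part of ps itself; it lists the five processes consuming the most CPU:

```shell
# Field 3 of ps aux output is %CPU; sort on it numerically,
# highest first, and keep the top five lines.
ps aux | sort -rn -k 3 | head -n 5
```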
Viewing Processes Dynamically With top
While the ps command can reveal a lot about what the machine is doing, it provides only a snapshot of the machine's state at the moment the command is executed. To see a more dynamic view of the machine's activity, we use the top command. The top program displays a continuously updating (by default, every 3 seconds) display of the system processes listed in order of process activity. The name top comes from the fact that the top program is used to see the "top" processes on the system. The top display consists of two parts: a system summary at the top of the display, followed by a table of processes sorted by CPU activity:
top - 14:59:20 up 6:30, 2 users, load average: 0.07, 0.02, 0.00
Tasks: 109 total,   1 running, 106 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.7%us,  1.0%sy,  0.0%ni, 98.3%id,  0.0%wa,  0.0%hi,  0.0%si
Mem:    319496k total,   314860k used,     4636k free,    19392k buff
Swap:   875500k total,   149128k used,   726372k free,   114676k cach
  PID USER      PR     TIME+  COMMAND
 6244 me        39   16:24.42 trackerd
11071 me        20    0:00.14 top
 6180 me        20    0:03.66 dbus-dae
 6321 me        20    2:51.38 multiloa
 4955 root      20    2:19.39 Xorg
    1 root      20    0:03.14 init
    2 root      15    0:00.00 kthreadd
    3 root      RT    0:00.00 migratio
    4 root      15    0:00.72 ksoftirq
    5 root      RT    0:00.04 watchdog
    6 root      15    0:00.42 events/0
    7 root      15    0:00.06 khelper
   41 root      15    0:01.08 kblockd/
   67 root      15    0:00.00 kseriod
  114 root      20    0:01.62 pdflush
  116 root      15    0:02.44 kswapd0
Table 10-3: top Information Fields
Field           Meaning
top             The name of the program.
14:59:20        The current time of day.
up 6:30         This is called uptime. It is the amount of time since the
                machine was last booted. In this example, the system has
                been up for six and a half hours.
2 users         There are two users logged in.
load average:   Load average refers to the number of processes
                that are waiting to run, that is, the number of
                processes that are in a runnable state and are
                sharing the CPU. Three values are shown, each
                for a different period of time. The first is the
                average for the last 60 seconds, the next the
                previous 5 minutes, and finally the previous 15
                minutes. Values under 1.0 indicate that the
                machine is not busy.
Tasks:          This summarizes the number of processes and their
                various process states.
Cpu(s):         This row describes the character of the activities that the
                CPU is performing.
0.7%us          0.7 percent of the CPU is being used for user processes.
                This means processes outside of the kernel itself.
1.0%sy          1.0 percent of the CPU is being used for system (kernel)
                processes.
0.0%ni          0.0 percent of the CPU is being used by "nice"
                (low-priority) processes.
98.3%id         98.3 percent of the CPU is idle.
0.0%wa          0.0 percent of the CPU is waiting for I/O.
Mem:            Shows how physical RAM is being used.
Swap:           Shows how swap space (virtual memory) is being used.
The top program accepts a number of keyboard commands. The two most interesting are
h, which displays the program's help screen, and q, which quits top.
Both major desktop environments provide graphical applications that display information
similar to top (in much the same way that Task Manager in Windows works), but I find
that top is better than the graphical versions because it is faster and it consumes far
fewer system resources. After all, our system monitor program shouldn't be the source of
the system slowdown that we are trying to track.
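When we only need a snapshot rather than a live display, for example inside a script or a log file, the procps version of top offers a batch mode (this is an assumption about the top on your system; check its man page):

```shell
# -b selects batch (non-interactive) output; -n 1 runs one iteration.
top -b -n 1 | head -n 5
```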
Controlling Processes
Now that we can see and monitor processes, let's gain some control over them. For our
experiments, we're going to use a little program called xlogo as our guinea pig. The xlogo program is a sample program supplied with the X Window System (the underlying engine that makes the graphics on our display go) which simply displays a resizable window containing the X logo. First, we'll get to know our test subject:
[me@linuxbox ~]$ xlogo
After entering the command, a small window containing the logo should appear somewhere on the screen. On some systems, xlogo may print a warning message, but it may
be safely ignored.
Tip: If your system does not include the xlogo program, try using gedit or
kwrite instead.
We can verify that xlogo is running by resizing its window. If the logo is redrawn in the
new size, the program is running.
Notice how our shell prompt has not returned? This is because the shell is waiting for the
program to finish, just like all the other programs we have used so far. If we close the
xlogo window, the prompt returns.
Interrupting A Process
Let's observe what happens when we run xlogo again. First, enter the xlogo command
and verify that the program is running. Next, return to the terminal window and press
Ctrl-c.
[me@linuxbox ~]$ xlogo
[me@linuxbox ~]$
In a terminal, pressing Ctrl-c interrupts a program. This means that we politely asked
the program to terminate. After we pressed Ctrl-c, the xlogo window closed and the
shell prompt returned.
Many (but not all) command-line programs can be interrupted by using this technique.
Putting A Process In The Background
Let's say we wanted to get the shell prompt back without terminating the xlogo program. We'll do this by placing the program in the background. Think of the terminal as having a foreground (with stuff visible on the surface, like the shell prompt) and a background (with stuff hidden behind the surface). To launch a program so that it is immediately placed in the background, we follow the command with an ampersand (&) character:
[me@linuxbox ~]$ xlogo &
[1] 28236
[me@linuxbox ~]$
After entering the command, the xlogo window appeared and the shell prompt returned,
but some funny numbers were printed too. This message is part of a shell feature called
job control. With this message, the shell is telling us that we have started job number 1
([1]) and that it has PID 28236. If we run ps, we can see our process:
[me@linuxbox ~]$ ps
  PID TTY          TIME CMD
10603 pts/1    00:00:00 bash
28236 pts/1    00:00:00 xlogo
28239 pts/1    00:00:00 ps
The shell's job control facility also gives us a way to list the jobs that have been launched
from our terminal. Using the jobs command, we can see this list:
[me@linuxbox ~]$ jobs
[1]+  Running                 xlogo &
The results show that we have one job, numbered 1, that it is running, and that the command was xlogo &.
Returning A Process To The Foreground
A process in the background is immune from terminal keyboard input, including any attempt to interrupt it with Ctrl-c. To return a process to the foreground, use the fg command, this way:

[me@linuxbox ~]$ jobs
[1]+  Running                 xlogo &
[me@linuxbox ~]$ fg %1
xlogo

The command fg followed by a percent sign and the job number (called a jobspec) does the trick. If we only have one background job, the jobspec is optional. To terminate xlogo, press Ctrl-c.

Stopping (Pausing) A Process
Sometimes we'll want to stop a process without terminating it. This is often done to allow a foreground process to be moved to the background. To stop a foreground process, press Ctrl-z:

[me@linuxbox ~]$ xlogo
[1]+  Stopped                 xlogo
[me@linuxbox ~]$
After stopping xlogo, we can verify that the program has stopped by attempting to resize the xlogo window. We will see that it appears quite dead. We can either restore the
program to the foreground, using the fg command, or move the program to the background with the bg command:
[me@linuxbox ~]$ bg %1
[1]+ xlogo &
[me@linuxbox ~]$
As with the fg command, the jobspec is optional if there is only one job.
Moving a process from the foreground to the background is handy if we launch a graphical program from the command line, but forget to place it in the background by appending the trailing &.
Why would you want to launch a graphical program from the command line? There are
two reasons. First, the program you wish to run might not be listed on the window manager's menus (such as xlogo). Secondly, by launching a program from the command
line, you might be able to see error messages that would otherwise be invisible if the program were launched graphically. Sometimes, a program will fail to start up when
launched from the graphical menu. By launching it from the command line instead, we
may see an error message that will reveal the problem. Also, some graphical programs
have many interesting and useful command line options.
Signals
The kill command is used to kill processes. This allows us to terminate programs
that need killing. Here's an example:
[me@linuxbox ~]$ xlogo &
[1] 28401
[me@linuxbox ~]$ kill 28401
[1]+ Terminated
xlogo
We first launch xlogo in the background. The shell prints the jobspec and the PID of the
background process. Next, we use the kill command and specify the PID of the process
we want to terminate. We could have also specified the process using a jobspec (for example, %1) instead of a PID.
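In a script we rarely type PIDs by hand. The shell records the PID of the most recent background process in the special parameter $!, which pairs naturally with kill. A small sketch, using a harmless sleep as the victim:

```shell
sleep 60 &                          # a throwaway background process
pid=$!                              # $! holds the PID of the last background job
kill "$pid"                         # send the default TERM signal
wait "$pid" 2>/dev/null || true     # reap the terminated process
echo "process $pid terminated"
```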
While this is all very straightforward, there is more to it than that. The kill command
doesn't exactly kill processes, rather it sends them signals. Signals are one of several
ways that the operating system communicates with programs. We have already seen signals in action with the use of Ctrl-c and Ctrl-z. When the terminal receives one of these keystrokes, it sends a signal to the program in the foreground. In the case of Ctrl-c, a signal called INT (Interrupt) is sent; with Ctrl-z, a signal called TSTP (Terminal Stop) is sent. Programs, in turn, listen for signals and may act upon them as they are received.
The fact that a program can listen and act upon signals allows a program to do things like
save work in progress when it is sent a termination signal.
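In shell scripts, this "listening" is done with the trap builtin, which installs a signal handler. Here is a sketch of a worker process that catches TERM and cleans up instead of dying abruptly:

```shell
# Start a worker that installs a TERM handler, then signal it.
sh -c 'trap "echo caught TERM, cleaning up; exit 0" TERM
       while :; do sleep 1; done' &
worker=$!
sleep 1             # give the worker a moment to install its trap
kill "$worker"      # send the default TERM signal
wait "$worker"      # exits 0 because the handler called exit 0
```

Without the trap, the worker would simply be terminated; with it, the handler runs first.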
If no signal is specified on the command line, then the TERM (Terminate) signal is sent by
default. The kill command is most often used to send the following signals:
Table 10-4: Common Signals
Number   Name   Meaning
1        HUP    Hangup. This is a vestige of the good old days when
                terminals were attached to remote computers with phone
                lines and modems. The signal is used to indicate to
                programs that the controlling terminal has "hung up."
                The effect of this signal can be demonstrated by closing
                a terminal session. The foreground program running on
                the terminal will be sent the signal and will terminate.
                This signal is also used by many daemon programs to
                cause a reinitialization. This means that when a daemon
                is sent this signal, it will restart and re-read its
                configuration file. The Apache web server is an example
                of a daemon that uses the HUP signal in this way.
2        INT    Interrupt. This performs the same function as Ctrl-c
                sent from the terminal. It will usually terminate a
                program.
9        KILL   Kill. This signal is special. Whereas programs may
                choose to handle signals sent to them in different ways,
                including ignoring them altogether, the KILL signal is
                never actually sent to the target program. Rather, the
                kernel immediately terminates the process. When a
                process is terminated in this manner, it is given no
                opportunity to "clean up" after itself or save its
                work. For this reason, the KILL signal should be used
                only as a last resort when other termination signals
                fail.
15       TERM   Terminate. This is the default signal sent by the kill
                command. If a program is still "alive" enough to
                receive signals, it will terminate.
18       CONT   Continue. This will restore a process after a STOP or
                TSTP signal.
19       STOP   Stop. This signal causes a process to pause without
                terminating. Like the KILL signal, it is not sent to
                the target process, and thus it cannot be ignored.
Let's try out the kill command:
[me@linuxbox ~]$ xlogo &
[1] 13546
[me@linuxbox ~]$ kill -1 13546
[1]+ Hangup
xlogo
In this example, we start the xlogo program in the background and then send it a HUP
signal with kill. The xlogo program terminates and the shell indicates that the background process has received a hangup signal. You may need to press the enter key a couple of times before you see the message. Note that signals may be specified either by
number or by name, including the name prefixed with the letters SIG:
[me@linuxbox ~]$ xlogo &
[1] 13601
[me@linuxbox ~]$ kill -INT 13601
[1]+  Interrupt               xlogo
[me@linuxbox ~]$ xlogo &
[1] 13608
[me@linuxbox ~]$ kill -SIGINT 13608
[1]+  Interrupt               xlogo
Repeat the example above and try out the other signals. Remember, you can also use jobspecs in place of PIDs.
Processes, like files, have owners, and you must be the owner of a process (or the superuser) in order to send it signals with kill.
In addition to the list of signals above, which are most often used with kill, there are
other signals frequently used by the system. Here is a list of other common signals:
Table 10-5: Other Common Signals
Number   Name    Meaning
3        QUIT    Quit.
11       SEGV    Segmentation violation. This signal is sent if a
                 program makes illegal use of memory; that is, if it
                 tried to write somewhere it was not allowed to.
20       TSTP    Terminal stop. This is the signal sent by the terminal
                 when Ctrl-z is pressed. Unlike the STOP signal, the
                 TSTP signal is received by the program, but the program
                 may choose to ignore it.
28       WINCH   Window change. This is the signal sent by the system
                 when a window changes size. Some programs, such as top
                 and less, will respond to this signal by redrawing
                 themselves to fit the new window dimensions.
For the curious, a complete list of signals can be seen with the following command:
[me@linuxbox ~]$ kill -l
Sending Signals To Multiple Processes With killall
It's also possible to send signals to multiple processes matching a specified program or username by using the killall command. To demonstrate, we will start a couple of instances of the xlogo program and then terminate them:
[me@linuxbox ~]$ xlogo &
[1] 18801
[me@linuxbox ~]$ xlogo &
[2] 18802
[me@linuxbox ~]$ killall xlogo
[1]-  Terminated              xlogo
[2]+  Terminated              xlogo
Remember, as with kill, you must have superuser privileges to send signals to processes that do not belong to you.
More Process Related Commands
Since monitoring processes is an important system administration task, there are a lot of commands for it. Here are some to play with:

Table 10-6: Other Process Related Commands
Command   Description
pstree    Outputs a process list arranged in a tree-like pattern showing
          the parent/child relationships between processes.
vmstat    Outputs a snapshot of system resource usage including memory,
          swap, and disk I/O. To see a continuous display, follow the
          command with a time delay (in seconds) for updates, for example
          vmstat 5. Terminate the output with Ctrl-c.
xload     A graphical program that draws a graph showing system load
          over time.
tload     Similar to the xload program, but draws the graph in the terminal.
          Terminate the output with Ctrl-c.
Summing Up
Most modern systems feature a mechanism for managing multiple processes. Linux provides a rich set of tools for this purpose. Given that Linux is the world's most deployed
server operating system, this makes a lot of sense. However, unlike some other systems,
Linux relies primarily on command line tools for process management. Though there are
graphical process tools for Linux, the command line tools are greatly preferred because of
their speed and light footprint. While the GUI tools may look pretty, they often create a
lot of system load themselves, which somewhat defeats the purpose.
11 The Environment
As we discussed earlier, the shell maintains a body of information during our shell session called the environment. Data stored in the environment is used by programs to determine facts about our configuration. While most programs use configuration files to store
program settings, some programs will also look for values stored in the environment to
adjust their behavior. Knowing this, we can use the environment to customize our shell
experience.
In this chapter, we will work with the following commands:
printenv – Print part or all of the environment
set – Set shell options
export – Export environment to subsequently executed programs
alias – Create an alias for a command
Examining The Environment
To see what is stored in the environment, we can use either set (a bash builtin) or printenv. To see everything, pipe the output of printenv into less:

[me@linuxbox ~]$ printenv | less

What we see is a list of environment variables and their values. For example, we see a variable called USER, which contains the value me. The printenv command can also list the value of a specific variable:
[me@linuxbox ~]$ printenv USER
me
The set command, when used without options or arguments, will display both the shell
and environment variables, as well as any defined shell functions. Unlike printenv, its
output is courteously sorted in alphabetical order:
[me@linuxbox ~]$ set | less
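A quick side-by-side of the two commands (the variable names and values will vary from system to system):

```shell
printenv | head -n 3      # a few environment variables and their values
printenv HOME             # the value of one specific variable
set | grep '^HOME='       # set reports it too, among the shell variables
```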
It is also possible to view the contents of a variable using the echo command, like this:
125
11 The Environment
[me@linuxbox ~]$ echo $HOME
/home/me
One element of the environment that neither set nor printenv displays is aliases. To
see them, enter the alias command without arguments:
[me@linuxbox ~]$ alias
alias l.='ls -d .* --color=tty'
alias ll='ls -l --color=tty'
alias ls='ls --color=tty'
alias vi='vim'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
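Aliases are an interactive convenience; by default, scripts do not expand them. This bash-specific sketch (shopt is a bash builtin) shows the mechanics by switching expansion on explicitly:

```shell
# Aliases are expanded as bash reads each line, so the alias must
# be defined on an earlier line than the one that uses it.
bash -c 'shopt -s expand_aliases
alias ll="ls -l"
ll / > /dev/null && echo "alias expanded"'
```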
The environment contains quite a few variables, and though yours may differ from those shown here, you will likely see the following:

Table 11-1: Environment Variables
Variable   Contents
EDITOR     The name of the program to be used for text editing.
SHELL      The name of your shell program.
HOME       The pathname of your home directory.
LANG       Defines the character set and collation order of your language.
OLDPWD     The previous working directory.
PAGER      The name of the program to be used for paging output. This is
           often set to /usr/bin/less.
PATH       A colon-separated list of directories that are searched when you
           enter the name of an executable program.
PS1        Prompt String 1. This defines the contents of your shell prompt.
TERM       The name of your terminal type.
TZ         Specifies your time zone.
USER       Your username.
Don't worry if some of these values are missing. They vary by distribution.
How Is The Environment Established?
When we log on to the system, the bash program starts and reads a series of configuration scripts called startup files. The exact sequence depends on whether the session is a login shell (one in which we are prompted for a username and password) or a non-login shell (such as a terminal session launched from the GUI). Login shells read one or more of these startup files:

Table 11-2: Startup Files For Login Shell Sessions
File              Contents
/etc/profile      A global configuration script that applies to all users.
~/.bash_profile   A user's personal startup file. It can be used to extend
                  or override settings in the global configuration script.
~/.bash_login     If ~/.bash_profile is not found, bash attempts to read
                  this script.
~/.profile        If neither ~/.bash_profile nor ~/.bash_login is found,
                  bash attempts to read this file. This is the default in
                  Debian-based distributions, such as Ubuntu.
Table 11-3: Startup Files For Non-Login Shell Sessions
File
Contents
/etc/bash.bashrc   A global configuration script that applies to all users.
~/.bashrc          A user's personal startup file. It can be used to extend
                   or override settings in the global configuration script.
In addition to reading the startup files above, non-login shells also inherit the environment from their parent process, usually a login shell.
Take a look at your system and see which of these startup files you have. Remember
since most of the filenames listed above start with a period (meaning that they are hidden), you will need to use the -a option when using ls.
The ~/.bashrc file is probably the most important startup file from the ordinary user's
point of view, since it is almost always read. Non-login shells read it by default and most
startup files for login shells are written in such a way as to read the ~/.bashrc file as
well.
Let's take a look at a typical .bash_profile:

# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH

Lines that begin with a # are comments and are not read by the shell. They are there for human readability. The first interesting thing occurs on the fourth line, with the following code:

if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
This is called an if compound command, which we will cover fully when we get to shell
scripting in Part 4, but for now we will translate:
If the file "~/.bashrc" exists, then
read the "~/.bashrc" file.
We can see that this bit of code is how a login shell gets the contents of .bashrc. The
next thing in our startup file has to do with the PATH variable.
Ever wonder how the shell knows where to find commands when we enter them on the
command line? For example, when we enter ls, the shell does not search the entire computer to find /bin/ls (the full pathname of the ls command), rather, it searches a list
of directories that are contained in the PATH variable.
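We can watch this search happen. The command -v builtin reports where the shell found a command, and the loop below is a rough sketch of what the shell does internally (real shells also cache results in a hash table, which we ignore here):

```shell
command -v ls                  # e.g. /bin/ls, located via PATH

# Walk the PATH list ourselves, one directory at a time.
IFS=:                          # PATH entries are separated by colons
for dir in $PATH; do
    if [ -x "$dir/ls" ]; then
        echo "found: $dir/ls"
        break
    fi
done
unset IFS
```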
The PATH variable is often (but not always, depending on the distribution) set by the
/etc/profile startup file, with this code:
PATH=$PATH:$HOME/bin
PATH is modified to add the directory $HOME/bin to the end of the list. This is an example of parameter expansion, which we touched on in Chapter 7. To demonstrate how
this works, try the following:
[me@linuxbox ~]$ foo="This is some "
[me@linuxbox ~]$ echo $foo
This is some
[me@linuxbox ~]$ foo=$foo"text."
[me@linuxbox ~]$ echo $foo
This is some text.
Using this technique, we can append text to the end of a variable's contents.
By adding the string $HOME/bin to the end of the PATH variable's contents, the directory $HOME/bin is added to the list of directories searched when a command is entered.
This means that when we want to create a directory within our home directory for storing
our own private programs, the shell is ready to accommodate us. All we have to do is call
it bin, and we're ready to go.
Note: Many distributions provide this PATH setting by default. Some Debian based
distributions, such as Ubuntu, test for the existence of the ~/bin directory at login, and dynamically add it to the PATH variable if the directory is found.
Lastly, we have:
export PATH
The export command tells the shell to make the contents of PATH available to child
processes of this shell.
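The effect of export is easy to observe by asking a child shell about two variables, one exported and one not (the variable names here are made up for the demonstration):

```shell
MY_UNEXPORTED="secret"          # a plain shell variable
export MY_EXPORTED="hello"      # an environment variable

# A child process sees only the exported one.
sh -c 'echo "exported:   [$MY_EXPORTED]"'
sh -c 'echo "unexported: [$MY_UNEXPORTED]"'
```

The first child prints the value hello; the second prints empty brackets, because the unexported variable was never copied into the child's environment.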
Text Editors
To edit (i.e., modify) the shell's startup files, as well as most of the other configuration
files on the system, we use a program called a text editor. A text editor is a program that
is, in some ways, like a word processor in that it allows you to edit the words on the
screen with a moving cursor. It differs from a word processor by only supporting pure
text, and often contains features designed for writing programs. Text editors are the central tool used by software developers to write code, and by system administrators to manage the configuration files that control the system.
There are a lot of different text editors available for Linux; your system probably has several installed. Why so many different ones? Probably because programmers like writing
130
For example, the command gedit some_file will start the gedit text editor and load the file named some_file, if it
exists.
All graphical text editors are pretty self-explanatory, so we won't cover them here. Instead, we will concentrate on our first text-based text editor, nano. Let's fire up nano
and edit the .bashrc file. But before we do that, let's practice some safe computing.
Whenever we edit an important configuration file, it is always a good idea to create a
backup copy of the file first. This protects us in case we mess the file up while editing. To
create a backup of the .bashrc file, do this:
[me@linuxbox ~]$ cp .bashrc .bashrc.bak
It doesn't matter what you call the backup file, just pick an understandable name. The extensions .bak, .sav, .old, and .orig are all popular ways of indicating a backup
file. Oh, and remember that cp will overwrite existing files silently.
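We can rehearse the backup habit safely in a throwaway directory; this sketch uses mktemp so the real .bashrc is never touched:

```shell
cd "$(mktemp -d)"                       # a scratch directory
printf '# fake startup file\n' > .bashrc
cp .bashrc .bashrc.bak                  # make the backup before editing
cmp .bashrc .bashrc.bak && echo "backup is identical"
```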
Now that we have a backup file, we'll start the editor:
[me@linuxbox ~]$ nano .bashrc
File: .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
[ Read 8 lines ]
^G Get Help  ^O WriteOut  ^R Read File ^Y Prev Page ^K Cut Text  ^C Cur Pos
^X Exit      ^J Justify   ^W Where Is  ^V Next Page ^U UnCut Text^T To Spell
Note: If your system does not have nano installed, you may use a graphical editor
instead.
The screen consists of a header at the top, the text of the file being edited in the middle
and a menu of commands at the bottom. Since nano was designed to replace the text editor supplied with an email client, it is rather short on editing features.
The first command you should learn in any text editor is how to exit the program. In the
case of nano, you type Ctrl-x to exit. This is indicated in the menu at the bottom of
the screen. The notation ^X means Ctrl-x. This is a common notation for control
characters used by many programs.
The second command we need to know is how to save our work. With nano it's Ctrl-o.
Note: Your distribution may already include some of these, but duplicates won't
hurt anything.
Here is the meaning of our additions:
Table 11-4: Additions to our .bashrc
Line                               Meaning
umask 0002                         Sets the umask to solve the problem with
                                   shared directories we discussed in
                                   Chapter 9.
export HISTCONTROL=ignoredups      Causes the shell's history recording
                                   feature to ignore a command if the same
                                   command was just recorded.
export HISTSIZE=1000               Increases the size of the command history
                                   from the usual default of 500 lines to
                                   1000 lines.
alias l.='ls -d .* --color=auto'   Creates a new command, l., which displays
                                   all directory entries that begin with a dot.
alias ll='ls -l --color=auto'      Creates a new command, ll, which displays
                                   a long-format directory listing.
As we can see, many of our additions are not intuitively obvious, so it would be a good
idea to add some comments to our .bashrc file to help explain things to the humans.
Using the editor, change our additions to look like this:
# Change umask to make directory sharing easier
umask 0002
# Ignore duplicates in command history and increase
# history size to 1000 lines
export HISTCONTROL=ignoredups
export HISTSIZE=1000
# Add some helpful aliases
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
Ah, much better! With our changes complete, press Ctrl-o to save our modified
.bashrc file, and Ctrl-x to exit nano.
The changes we have made to our .bashrc will not take effect until we close our terminal session and start a new one, because the .bashrc file is only read at the beginning of a session. However, we can force bash to re-read the modified .bashrc file with the following command:

[me@linuxbox ~]$ source .bashrc

After doing this, we should be able to see the effect of our changes. Try out one of the new aliases:
[me@linuxbox ~]$ ll
Summing Up
In this chapter we learned an essential skill: editing configuration files with a text editor. Moving forward, as we read man pages for commands, take note of the environment
variables that commands support. There may be a gem or two. In later chapters, we will
learn about shell functions, a powerful feature that you can also include in the bash
startup files to add to your arsenal of custom commands.
Further Reading
The INVOCATION section of the bash man page covers the bash startup files
in gory detail.
12 A Gentle Introduction To vi
There is an old joke about a visitor to New York City asking a passerby for directions to
the city's famous classical music venue:
Visitor: Excuse me, how do I get to Carnegie Hall?
Passerby: Practice, practice, practice!
Learning the Linux command line, like becoming an accomplished pianist, is not something that we pick up in an afternoon. It takes years of practice. In this chapter, we will
introduce the vi (pronounced vee eye) text editor, one of the core programs in the
Unix tradition. vi is somewhat notorious for its difficult user interface, but when we see
a master sit down at the keyboard and begin to play, we will indeed be witness to some
great art. We won't become masters in this chapter, but when we are done, we will know
how to play chopsticks in vi.
vi is always available. This can be a lifesaver if we have a system with no graphical interface, such as a remote server or a local system with a broken X configuration. nano, while increasingly popular, is still not universal. POSIX, a standard for program compatibility on Unix systems, requires that vi be present.
vi is lightweight and fast. For many tasks, it's easier to bring up vi than it is to
find the graphical text editor in the menus and wait for its multiple megabytes to
load. In addition, vi is designed for typing speed. As we shall see, a skilled vi
user never has to lift his or her fingers from the keyboard while editing.
We don't want other Linux and Unix users to think we are sissies.
A Little Background
The first version of vi was written in 1976 by Bill Joy, a University of California at
Berkeley student who later went on to co-found Sun Microsystems. vi derives its name
from the word visual, because it was intended to allow editing on a video terminal with
a moving cursor. Previous to visual editors, there were line editors which operated on a
single line of text at a time. To specify a change, we tell a line editor to go to a particular
line and describe what change to make, such as adding or deleting text. With the advent
of video terminals (rather than printer-based terminals like teletypes) visual editing became possible. vi actually incorporates a powerful line editor called ex, and we can use
line editing commands while using vi.
Most Linux distributions don't include real vi; rather, they ship with an enhanced replacement called vim (which is short for vi improved) written by Bram Moolenaar.
vim is a substantial improvement over traditional Unix vi and is usually symbolically
linked (or aliased) to the name vi on Linux systems. In the discussions that follow, we
will assume that we have a program called vi that is really vim.
Starting And Stopping vi
To start vi, we simply enter the following:

[me@linuxbox ~]$ vi

And a screen like this should appear:

~
~                     VIM - Vi IMproved
~
~                      version 7.1.138
~                  by Bram Moolenaar et al.
~         Vim is open source and freely distributable
~
~         type  :q<Enter>               to exit
~         type  :help<Enter>  or  <F1>  for on-line help
~         type  :help version7<Enter>   for version info
~
~
Just as we did with nano earlier, the first thing to learn is how to exit. To exit, we enter
the following command (note that the colon character is part of the command):
:q
The shell prompt should return. If, for some reason, vi will not quit (usually because we
made a change to a file that has not yet been saved), we can tell vi that we really mean it
by adding an exclamation point to the command:
:q!
Tip: If you get lost in vi, try pressing the Esc key twice to find your way again.
Compatibility Mode
In the example startup screen above (taken from Ubuntu 8.04), we see the text
Running in Vi compatible mode. This means that vim will run in a mode that is
closer to the normal behavior of vi rather than the enhanced behavior of vim.
For purposes of this chapter, we will want to run vim with its enhanced behavior.
To do this, you have a few options:
Try running vim instead of vi.
If that works, consider adding alias vi='vim' to your .bashrc file.
Alternatively, use this command to add a line to your vim configuration file:
echo "set nocp" >> ~/.vimrc
Different Linux distributions package vim in different ways. Some distributions install a minimal version of vim by default that supports only a limited set of vim features. While performing the lessons that follow, you may encounter missing features. If this is the case, install the full version of vim.
Editing Modes
Let's start up vi again, this time passing to it the name of a nonexistent file. This is how
we can create a new file with vi:
[me@linuxbox ~]$ rm -f foo.txt
[me@linuxbox ~]$ vi foo.txt
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
"foo.txt" [New File]
The leading tilde characters (~) indicate that no text exists on that line. This shows that
we have an empty file. Do not type anything yet!
The second most important thing to learn about vi (after learning how to exit) is that vi
is a modal editor. When vi starts up, it begins in command mode. In this mode, almost
every key is a command, so if we were to start typing, vi would basically go crazy and
make a big mess.
To exit insert mode and return to command mode, press the Esc key.
To save our modified file, we enter an ex command while in command mode by pressing the : key; a colon character appears at the bottom of the screen. We follow the colon with a w, then Enter:
:w
The file will be written to the hard drive and we should get a confirmation message at the
bottom of the screen, like this:
"foo.txt" [New] 1L, 46C written
Tip: If you read the vim documentation, you will notice that (confusingly) command mode is called normal mode and ex commands are called command mode.
Beware.
Table 12-1: Cursor Movement Keys
Key                   Moves The Cursor
l or Right Arrow      Right one character.
h or Left Arrow       Left one character.
j or Down Arrow       Down one line.
k or Up Arrow         Up one line.
0 (zero)              To the first character of the current line.
^                     To the first non-whitespace character of the current
                      line.
$                     To the end of the current line.
w                     To the beginning of the next word or punctuation
                      character.
W                     To the beginning of the next word, ignoring
                      punctuation characters.
b                     To the beginning of the previous word or punctuation
                      character.
B                     To the beginning of the previous word, ignoring
                      punctuation characters.
Ctrl-f or Page Down   Down one page.
Ctrl-b or Page Up     Up one page.
numberG               To line number. For example, 1G moves to the first
                      line of the file.
G                     To the last line of the file.
Why are the h, j, k, and l keys used for cursor movement? Because when vi was originally written, not all video terminals had arrow keys, and skilled typists could use regular keyboard keys to move the cursor without ever having to lift their fingers from the keyboard.
Many commands in vi can be prefixed with a number, as with the G command listed
above. By prefixing a command with a number, we may specify the number of times a
command is to be carried out. For example, the command 5j causes vi to move the
cursor down five lines.
Basic Editing
Most editing consists of a few basic operations such as inserting text, deleting text, and
moving text around by cutting and pasting. vi, of course, supports all of these operations
in its own unique way. vi also provides a limited form of undo. If we press the u key
while in command mode, vi will undo the last change that you made. This will come in
handy as we try out some of the basic editing commands.
Appending Text
vi has several different ways of entering insert mode. We have already used the i command to insert text.
Let's go back to our foo.txt file for a moment:
The quick brown fox jumped over the lazy dog.
If we wanted to add some text to the end of this sentence, we would discover that the i
command will not do it, since we can't move the cursor beyond the end of the line. vi
provides a command to append text, the sensibly named a command. If we move the
cursor to the end of the line and type a, the cursor will move past the end of the line
and vi will enter insert mode. This will allow us to add some more text:
The quick brown fox jumped over the lazy dog. It was cool.
vi also offers a shortcut to move to the end of the line and start appending in a single step: the A command. With the cursor on the first line, we type A and add the following lines of text:
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3
Line 4
Line 5
Opening A Line
Another way we can insert text is by opening a line. This inserts a blank line between
two existing lines and enters insert mode. This has two variants:
Table 12-2: Line Opening Keys
Command   Opens
o         The line below the current line.
O         The line above the current line.
We can demonstrate this as follows: place the cursor on Line 3 then press the o key.
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3
Line 4
Line 5
A new line was opened below the third line and we entered insert mode. Exit insert mode
by pressing the Esc key. Press the u key to undo our change.
Press the O key to open the line above the cursor:
The quick brown fox jumped over the lazy dog. It was cool.
Line 2

Line 3
Line 4
Line 5
Exit insert mode by pressing the Esc key and undo our change by pressing u.
Deleting Text
As we might expect, vi offers a variety of ways to delete text, all of which contain one
of two keystrokes. First, the x key will delete a character at the cursor location. x may be
preceded by a number specifying how many characters are to be deleted. The d key is
more general purpose. Like x, it may be preceded by a number specifying the number of
times the deletion is to be performed. In addition, d is always followed by a movement
command that controls the size of the deletion. Here are some examples:
Table 12-3: Text Deletion Commands
Command    Deletes
x          The current character.
3x         The current character and the next two characters.
dd         The current line.
5dd        The current line and the next four lines.
dW         From the current cursor position to the beginning of the next word.
d$         From the current cursor location to the end of the current line.
d0         From the current cursor location to the beginning of the line.
d^         From the current cursor location to the first non-whitespace character in the line.
dG         From the current line to the end of the file.
d20G       From the current line to the twentieth line of the file.
Place the cursor on the word It on the first line of our text. Press the x key repeatedly
until the rest of the sentence is deleted. Next, press the u key repeatedly until the deletion
is undone.
Note: Real vi only supports a single level of undo. vim supports multiple levels.
Let's try the deletion again, this time using the d command. Again, move the cursor to the
word It and press dW to delete the word:
The quick brown fox jumped over the lazy dog. was cool.
Line 2
Line 3
Line 4
Line 5
Press d$ to delete from the cursor position to the end of the line:
The quick brown fox jumped over the lazy dog.
Line 2
Line 3
Line 4
Line 5
Press dG to delete from the current line to the end of the file:
~
~
~
~
~
Cutting, Copying And Pasting Text
The y command "yanks" (copies) text in much the same way the d command is
used to cut text. Here are some examples combining the y command with various movement commands:
Table 12-4: Yanking Commands
Command    Copies
yy         The current line.
5yy        The current line and the next four lines.
yW         From the current cursor position to the beginning of the next word.
y$         From the current cursor location to the end of the current line.
y0         From the current cursor location to the beginning of the line.
y^         From the current cursor location to the first non-whitespace character in the line.
yG         From the current line to the end of the file.
y20G       From the current line to the twentieth line of the file.
Let's try some copy and paste. Place the cursor on the first line of the text and type yy to
copy the current line. Next, move the cursor to the last line (G) and type p to paste the
line below the current line:
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3
Line 4
Line 5
The quick brown fox jumped over the lazy dog. It was cool.
Just as before, the u command will undo our change. With the cursor still positioned on
the last line of the file, type P to paste the text above the current line:
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3
Line 4
The quick brown fox jumped over the lazy dog. It was cool.
Line 5
Try out some of the other y commands in the table above and get to know the behavior of
both the p and P commands. When you are done, return the file to its original state.
Joining Lines
vi is rather strict about its idea of a line. Normally, it is not possible to move the cursor
to the end of a line and delete the end-of-line character to join one line with the one below it. Because of this, vi provides a specific command, J (not to be confused with j,
which is for cursor movement) to join lines together.
If we place the cursor on line 3 and type the J command, here's what happens:
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3 Line 4
Line 5
Search-And-Replace
vi has the ability to move the cursor to locations based on searches. It can do this on either a single line or over an entire file. It can also perform text replacements with or without confirmation from the user.
Searching Within A Line
The f command searches within the current line and moves the cursor to the next instance of a specified character. For example, the command fa would move the cursor to the next occurrence of the character "a" within the line. After a character search, the search may be repeated within the line by typing a semicolon.
Searching The Entire File
To move the cursor to the next occurrence of a word or phrase, we use the / command, which works much like the same command in less. When we type /, a "/" appears at the bottom of the screen. We then type the word or phrase to be searched for, followed by the Enter key, and the cursor moves to the next instance of the search string. A search may be repeated using the previous search pattern
with the n command. Here's an example:
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3
Line 4
Line 5
Place the cursor on the first line of the file and type /Line followed by the Enter key. The cursor will move to line 2. Next, type n and the cursor
will move to line 3. Repeating the n command will move the cursor down the file until it
runs out of matches. While we have so far only used words and phrases for our search
patterns, vi allows the use of regular expressions, a powerful method of expressing complex text patterns. We will cover regular expressions in some detail in a later chapter.
Global Search-And-Replace
vi uses an ex command to perform search-and-replace operations (called substitution
in vi) over a range of lines or the entire file. To change the word Line to line for the
entire file, we would enter the following command:
:%s/Line/line/g
Let's break this command down into separate items and see what each one does:
Table 12-5: An example of global search-and-replace syntax
Item         Meaning
:            The colon character starts an ex command.
%            Specifies the range of lines for the operation. % is a shortcut meaning from the first line to the last line. If the range is omitted, the operation is performed only on the current line.
s            Specifies the operation, in this case substitution (search-and-replace).
/Line/line/  The search pattern and the replacement text.
g            Means "global," in the sense that the search-and-replace is performed on every instance of the search string in each line. If g is omitted, only the first instance on each line is replaced.
After executing our search-and-replace command our file looks like this:
The quick brown fox jumped over the lazy dog. It was cool.
line 2
line 3
line 4
line 5
We can also specify a substitution command with user confirmation. This is done by
adding a c to the end of the command. For example:
:%s/line/Line/gc
This command will change our file back to its previous form; however, before each substitution, vi stops and asks us to confirm the substitution with this message:
replace with Line (y/n/a/q/l/^E/^Y)?
Table 12-6: Replace Confirmation Keys
Key             Action
y               Perform the substitution.
n               Skip this instance of the pattern.
a               Perform the substitution on this and all subsequent instances of the pattern.
q or Esc        Quit substituting.
l               Perform this substitution and then quit (short for "last").
Ctrl-e, Ctrl-y  Scroll down and scroll up, respectively. Useful for viewing the context of the proposed substitution.
If we type y, the substitution will be performed; n will cause vi to skip this instance and
move on to the next one.
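The s/pattern/replacement/ syntax vi inherits from ex also survives in the sed stream editor, so a substitution can be rehearsed from the shell before trying it in the editor. This sketch (not from the original text) uses a throwaway file:

```shell
# Rehearse vi's :%s/Line/line/g substitution with sed on a throwaway file.
# sed shares the s/pattern/replacement/flags syntax with ex and vi.
printf 'Line 2\nLine 3\n' > /tmp/subst-demo.txt
sed 's/Line/line/g' /tmp/subst-demo.txt
# prints:
# line 2
# line 3
```

Unlike the :%s command, sed writes the result to standard output and leaves the file untouched.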
Editing Multiple Files
It's often useful to edit more than one file at a time. Let's exit our existing vi session and create a new file for editing. Type :wq to exit vi,
saving our modified text. Next, we'll create an additional file in our home directory that
we can play with. We'll create the file by capturing some output from the ls command:
[me@linuxbox ~]$ ls -l /usr/bin > ls-output.txt
Let's edit our old file and our new one with vi:
[me@linuxbox ~]$ vi foo.txt ls-output.txt
vi will start up and we will see the first file on the screen:
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3
Line 4
Line 5
Switching Between Files
To switch from one file to the next, we use the ex command :n. To move back to the previous file, we use :N.
While we can move from one file to another, vi enforces a policy that prevents us from
switching files if the current file has unsaved changes. To force vi to switch files and
abandon our changes, we can add an exclamation point (!) to the command (for example, :n!).
In addition to the switching method described above, vim (and some versions of vi) also
provide some ex commands that make multiple files easier to manage. We can view a list
of files being edited with the :buffers command. Doing so will display a list of the
files at the bottom of the display:
:buffers
  1 %a   "foo.txt"          line 1
  2      "ls-output.txt"    line 0
Press ENTER or type command to continue
To switch to another buffer (file), type :buffer followed by the number of the buffer
you wish to edit. For example, to switch from buffer 1 which contains the file foo.txt
to buffer 2 containing the file ls-output.txt we would type this:
:buffer 2
Opening Additional Files For Editing
It's also possible to add files to our current editing session. The ex command :e (short for "edit") followed by a filename will open an additional file. Let's end our current editing session and start vi again with just one file:
[me@linuxbox ~]$ vi foo.txt
To add our second file, enter:
:e ls-output.txt
And it should appear on the screen. The first file is still present, as we can verify:
:buffers
  1 #    "foo.txt"          line 1
  2 %a   "ls-output.txt"    line 0
Press ENTER or type command to continue
Note: You cannot switch to files loaded with the :e command using either the :n
or :N command. To switch files, use the :buffer command followed by the buffer number.
Copying Content From One File Into Another
Often while editing multiple files, we will want to copy a portion of one file into another file that we are editing. This is easily done with the usual yank and paste commands. With our two files loaded, switch to buffer 1 (foo.txt) by entering:
:buffer 1
Next, move the cursor to the first line, and type yy to yank (copy) the line.
Switch to the second buffer by entering:
:buffer 2
The screen will now contain some file listings like this (only a portion is shown here):
total 343700
-rwxr-xr-x 1 root root  31316 2007-12-05 08:58 [
-rwxr-xr-x 1 root root   8240 2007-12-09 13:39 411toppm
-rwxr-xr-x 1 root root 111276 2008-01-31 13:36 a2p
-rwxr-xr-x 1 root root  25368 2006-10-06 20:16 a52dec
-rwxr-xr-x 1 root root  11532 2007-05-04 17:43 aafire
-rwxr-xr-x 1 root root   7292 2007-05-04 17:43 aainfo
Move the cursor to the first line and paste the line we copied from the preceding file by
typing the p command:
total 343700
The quick brown fox jumped over the lazy dog. It was cool.
-rwxr-xr-x 1 root root  31316 2007-12-05 08:58 [
-rwxr-xr-x 1 root root   8240 2007-12-09 13:39 411toppm
-rwxr-xr-x 1 root root 111276 2008-01-31 13:36 a2p
-rwxr-xr-x 1 root root  25368 2006-10-06 20:16 a52dec
-rwxr-xr-x 1 root root  11532 2007-05-04 17:43 aafire
-rwxr-xr-x 1 root root   7292 2007-05-04 17:43 aainfo
Inserting An Entire File Into Another
It's also possible to insert an entire file into one that we are editing. To see this in action, let's end our vi session and start a new one with just a single file:
[me@linuxbox ~]$ vi ls-output.txt
total 343700
-rwxr-xr-x 1 root root  31316 2007-12-05 08:58 [
-rwxr-xr-x 1 root root   8240 2007-12-09 13:39 411toppm
-rwxr-xr-x 1 root root 111276 2008-01-31 13:36 a2p
-rwxr-xr-x 1 root root  25368 2006-10-06 20:16 a52dec
-rwxr-xr-x 1 root root  11532 2007-05-04 17:43 aafire
-rwxr-xr-x 1 root root   7292 2007-05-04 17:43 aainfo
Move the cursor to the third line, then enter the following ex command:
:r foo.txt
The :r command (short for read) inserts the contents of the specified file below the line at the cursor position.
Our screen should now look like this:
total 343700
-rwxr-xr-x 1 root root  31316 2007-12-05 08:58 [
-rwxr-xr-x 1 root root   8240 2007-12-09 13:39 411toppm
The quick brown fox jumped over the lazy dog. It was cool.
Line 2
Line 3
Line 4
Line 5
-rwxr-xr-x 1 root root 111276 2008-01-31 13:36 a2p
-rwxr-xr-x 1 root root  25368 2006-10-06 20:16 a52dec
-rwxr-xr-x 1 root root  11532 2007-05-04 17:43 aafire
-rwxr-xr-x 1 root root   7292 2007-05-04 17:43 aainfo
Saving Our Work
Like everything else in vi, there are several ways to save our edited files. We have already covered the ex command :w; the :wq command combines :w with q to save the file and exit. The :w command may also be given an optional filename, which acts like "Save As." For example, to save our work to a file named foo1.txt, we would enter:
:w foo1.txt
Note: While the command above saves the file under a new name, it does not
change the name of the file being edited. As we continue to edit, we will still
be editing foo.txt, not foo1.txt.
Summing Up
With this basic set of skills we can now perform most of the text editing needed to maintain a typical Linux system. Learning to use vim on a regular basis will pay off in the
long run. Since vi-style editors are so deeply embedded in Unix culture, we will see many
other programs that have been influenced by their design; less is a good example of this
influence.
Further Reading
Even with all that we have covered in this chapter, we have barely scratched the surface
of what vi and vim can do. Here are a couple of on-line resources you can use to continue your journey towards vi mastery:
Learning The vi Editor - A Wikibook offering a concise guide
to vi and several of its work-alikes, including vim. It's available at:
http://en.wikibooks.org/wiki/Vi
The Vim Book - The vim project has a 570-page book that covers (almost) all of
the features in vim. You can find it at:
ftp://ftp.vim.org/pub/vim/doc/book/vimbook-OPL.pdf.
13 Customizing The Prompt
In this chapter, we will look at a seemingly trivial detail: our shell prompt. Examining it will reveal some of the inner workings of the shell and the terminal emulator program.
Anatomy Of A Prompt
Our default prompt looks something like this:
[me@linuxbox ~]$
Notice that it contains our username, our hostname and our current working directory, but
how did it get that way? Very simply, it turns out. The prompt is defined by an environment variable named PS1 (short for prompt string one). We can view the contents of
PS1 with the echo command:
[me@linuxbox ~]$ echo $PS1
[\u@\h \W]\$
Note: Don't worry if your results are not exactly the same as the example above.
Every Linux distribution defines the prompt string a little differently, some quite
exotically.
From the results, we can see that PS1 contains a few of the characters we see in our
prompt such as the brackets, the at-sign, and the dollar sign, but the rest are a mystery.
The astute among us will recognize these as backslash-escaped special characters like
those we saw in Chapter 7. Here is a partial list of the characters that the shell treats specially in the prompt string:
Table 13-1: Escape Codes Used In Shell Prompts
Sequence   Value Displayed
\a         An ASCII bell. This makes the computer beep when it is encountered.
\d         Current date in day, month, date format. For example, "Mon May 26."
\h         Hostname of the local machine, minus the trailing domain name.
\H         Full hostname.
\j         Number of jobs being run in the current shell session.
\l         Name of the current terminal device.
\n         A newline character.
\r         A carriage return.
\s         Name of the shell program.
\t         Current time in 24-hour hours:minutes:seconds format.
\T         Current time in 12-hour format.
\@         Current time in 12-hour AM/PM format.
\A         Current time in 24-hour hours:minutes format.
\u         Username of the current user.
\v         Version number of the shell.
\V         Version and release numbers of the shell.
\w         Name of the current working directory.
\W         Last part of the current working directory name.
\!         History number of the current command.
\#         Number of commands entered during this shell session.
\$         Displays a "$" character unless we have superuser privileges, in which case it displays a "#" instead.
\[         Signals the start of a series of one or more non-printing characters, used to embed control characters that manipulate the terminal emulator, such as moving the cursor or changing text colors.
\]         Signals the end of a non-printing character sequence.
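A candidate prompt string can be previewed without installing it as PS1. This sketch (not from the original text) assumes bash 4.4 or later, which provides the ${parameter@P} operator for on-demand prompt expansion:

```shell
#!/bin/bash
# Expand a prompt string the way bash would render it, without touching PS1.
# Requires bash 4.4+ for the ${parameter@P} prompt-expansion operator.
candidate='[\u@\h \W]\$ '
echo "${candidate@P}"
```

The output shows the username, hostname, and directory the real prompt would display, which makes it easy to iterate on a design before committing to it.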
Trying Some Alternative Prompts
First, let's back up the existing string so we can restore it later. To do this, we copy the string into another shell variable that we create ourselves:
[me@linuxbox ~]$ ps1_old="$PS1"
We create a new variable called ps1_old and assign the value of PS1 to it. We can verify that the string has been copied by using the echo command:
[me@linuxbox ~]$ echo $ps1_old
[\u@\h \W]\$
We can restore the original prompt at any time during our terminal session by simply reversing the process:
[me@linuxbox ~]$ PS1="$ps1_old"
Now that we are ready to proceed, let's see what happens if we have an empty prompt
string:
[me@linuxbox ~]$ PS1=
If we assign nothing to the prompt string, we get nothing. No prompt string at all! The
prompt is still there, but displays nothing, just as we asked it to. Since this is kind of disconcerting to look at, we'll replace it with a minimal prompt:
PS1="\$ "
That's better. At least now we can see what we are doing. Notice the trailing space within
the double quotes. This provides the space between the dollar sign and the cursor when
we type.
Next, let's add an ASCII bell to the prompt:
$ PS1="\[\a\]\$ "
Now we should hear a beep each time the prompt is displayed. This could get annoying,
but it might be useful if we needed notification when an especially long-running command has been executed. Note that we included the \[ and \] sequences. Since the
ASCII bell (\a) does not print (that is, it does not move the cursor), we need to mark it as non-printing so that bash can correctly determine the visible length of the prompt.
Next, let's try to make an informative prompt with some hostname and time-of-day information:
$ PS1="\A \h \$ "
17:33 linuxbox $
Adding time-of-day to our prompt will be useful if we need to keep track of when we
perform certain tasks. Finally, we'll make a new prompt that is similar to our original:
17:37 linuxbox $ PS1="<\u@\h \W>\$ "
<me@linuxbox ~>$
Try out the other sequences listed in the table above and see if you can come up with a
brilliant new prompt.
Adding Color
Most terminal emulator programs respond to certain non-printing character sequences to
control such things as character attributes (like color, bold text, and the dreaded blinking
text) and cursor position. We'll cover cursor position in a little bit, but first we'll look at
color.
Terminal Confusion
Back in ancient times, when terminals were hooked to remote computers, there
were many competing brands of terminals and they all worked differently. They
had different keyboards and they all had different ways of interpreting control information. Unix and Unix-like systems have two rather complex subsystems to
deal with the babel of terminal control (called termcap and terminfo). If you
look in the deepest recesses of your terminal emulator settings you may find a setting for the type of terminal emulation.
In an effort to make terminals speak some sort of common language, the American National Standards Institute (ANSI) developed a standard set of character sequences to control video terminals. Old-time DOS users will remember the ANSI.SYS file that was used to enable interpretation of these codes.
Character color is controlled by sending the terminal emulator an ANSI escape code embedded in the stream of characters to be displayed. The control code does not print out
on the display, rather it is interpreted by the terminal as an instruction. As we saw in the
table above, the \[ and \] sequences are used to encapsulate non-printing characters. An
ANSI escape code begins with an octal 033 (the code generated by the escape key), followed by an optional character attribute, followed by an instruction. For example, the
code to set the text color to normal (attribute = 0), black text is:
\033[0;30m
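These codes are not limited to prompts; the same sequence can be sent to the terminal directly. As a small sketch, printf interprets the \033 octal escape itself:

```shell
# Print one word in red (attribute 0, color 31), then reset attributes
# with the \033[0m sequence so following text returns to normal.
printf '\033[0;31mred\033[0m normal\n'
```

Running this in a color-capable terminal displays the word "red" in red and the word "normal" in the default color.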
Here is a table of available text colors. Notice that the colors are divided into two groups,
differentiated by the application of the bold character attribute (1) which creates the appearance of light colors:
Table 13-2: Escape Sequences Used To Set Text Colors
Sequence      Text Color      Sequence      Text Color
\033[0;30m    Black           \033[1;30m    Dark Gray
\033[0;31m    Red             \033[1;31m    Light Red
\033[0;32m    Green           \033[1;32m    Light Green
\033[0;33m    Brown           \033[1;33m    Yellow
\033[0;34m    Blue            \033[1;34m    Light Blue
\033[0;35m    Purple          \033[1;35m    Light Purple
\033[0;36m    Cyan            \033[1;36m    Light Cyan
\033[0;37m    Light Grey      \033[1;37m    White
Let's try to make a red prompt. We'll insert the escape code at the beginning:
<me@linuxbox ~>$ PS1="\[\033[0;31m\]<\u@\h \W>\$ "
<me@linuxbox ~>$
That works, but notice that all the text that we type after the prompt is also red. To fix
this, we will add another escape code to the end of the prompt that tells the terminal emulator to return to the previous color:
<me@linuxbox ~>$ PS1="\[\033[0;31m\]<\u@\h \W>\$\[\033[0m\] "
<me@linuxbox ~>$
That's better!
It's also possible to set the text background color using the codes listed below. The background colors do not support the bold attribute.
Table 13-3: Escape Sequences Used To Set Background Color
Sequence      Background Color   Sequence      Background Color
\033[0;40m    Black              \033[0;44m    Blue
\033[0;41m    Red                \033[0;45m    Purple
\033[0;42m    Green              \033[0;46m    Cyan
\033[0;43m    Brown              \033[0;47m    Light Grey
We can create a prompt with a red background by applying a simple change to the first
escape code:
<me@linuxbox ~>$ PS1="\[\033[0;41m\]<\u@\h \W>\$\[\033[0m\] "
<me@linuxbox ~>$
Try out the color codes and see what you can create!
Moving The Cursor
Escape codes can also be used to position the cursor. These are commonly used to display a clock or other information at a fixed location on the screen, such as an upper corner, each time the prompt is drawn.
Table 13-4: Cursor Movement Escape Sequences
Escape Code   Action
\033[l;cH     Move the cursor to line l and column c
\033[nA       Move the cursor up n lines
\033[nB       Move the cursor down n lines
\033[nC       Move the cursor forward n characters
\033[nD       Move the cursor backward n characters
\033[2J       Clear the screen and move the cursor to the upper left corner (line 0, column 0)
\033[K        Clear from the cursor position to the end of the current line
\033[s        Store the current cursor position
\033[u        Recall the stored cursor position
Using the codes above, we'll construct a prompt that draws a red bar at the top of the
screen containing a clock (rendered in yellow text) each time the prompt is displayed.
The code for the prompt is this formidable looking string:
PS1="\[\033[s\033[0;0H\033[0;41m\033[K\033[1;33m\t\033[0m\033[u\]
<\u@\h \W>\$ "
Let's take a look at each part of the string to see what it does:
Table 13-5: Breakdown Of The Complex Prompt String
Sequence      Action
\[            Begins a non-printing character sequence. The real purpose of this is to allow bash to correctly calculate the size of the visible prompt. Without an accurate calculation, command line editing features cannot position the cursor correctly.
\033[s        Store the cursor position. This is needed to return to the prompt location after the bar and clock have been drawn at the top of the screen. Be aware that some terminal emulators do not honor this code.
\033[0;0H     Move the cursor to the upper left corner (line 0, column 0).
\033[0;41m    Set the background color to red.
\033[K        Clear from the current cursor location (the top left corner) to
              the end of the line. Since the background color is now red, the
              line is cleared to that color, creating our bar. Note that clearing
              to the end of the line does not change the cursor position, which
              remains at the upper left corner.
\033[1;33m    Set the text color to yellow.
\t            Display the current time. While this is a "printing" element, we include it in the non-printing portion of the prompt, since we don't want bash to count the clock when calculating the visible size of the prompt.
\033[0m       Turn off color. This affects both the text and background.
\033[u        Recall the cursor position stored earlier.
\]            Ends the non-printing character sequence.
<\u@\h \W>\$
              Prompt string.
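The same drawing sequence can be exercised once, outside the prompt, to watch it work. This sketch substitutes a fixed clock string produced by date for \t, since \t is only expanded by bash when it draws a prompt:

```shell
# One-shot version of the bar-drawing sequence: save the cursor, jump to the
# top left corner, switch to a red background, clear the line (painting the
# bar), print a yellow clock, reset attributes, and restore the cursor.
printf '\033[s\033[0;0H\033[0;41m\033[K\033[1;33m%s\033[0m\033[u' "$(date +%H:%M:%S)"
```

Run in an interactive terminal, this paints the red bar and clock a single time without altering the prompt.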
Summing Up
Believe it or not, there is much more that can be done with prompts involving shell functions and scripts that we haven't covered here, but this is a good start. Not everyone will
care enough to change the prompt, since the default prompt is usually satisfactory. But for
those of us who like to tinker, the shell provides the opportunity for many hours of trivial
fun.
Further Reading
The Bash Prompt HOWTO from the Linux Documentation Project provides a
pretty complete discussion of what the shell prompt can be made to do. It is available at:
http://tldp.org/HOWTO/Bash-Prompt-HOWTO/
14 Package Management
If we spend any time in the Linux community, we hear many opinions as to which of the
many Linux distributions is best. Often, these discussions get really silly, focusing on
such things as the prettiness of the desktop background (some people won't use Ubuntu
because of its default color scheme!) and other trivial matters.
The most important determinant of distribution quality is the packaging system and the
vitality of the distribution's support community. As we spend more time with Linux, we
see that its software landscape is extremely dynamic. Things are constantly changing.
Most of the top-tier Linux distributions release new versions every six months and many
individual program updates every day. To keep up with this blizzard of software, we need
good tools for package management.
Package management is a method of installing and maintaining software on the system.
Today, most people can satisfy all of their software needs by installing packages from
their Linux distributor. This contrasts with the early days of Linux, when one had to
download and compile source code in order to install software. Not that there is anything
wrong with compiling source code; in fact, having access to source code is the great wonder of Linux. It gives us (and everybody else) the ability to examine and improve the system. It's just that having a precompiled package is faster and easier to deal with.
In this chapter, we will look at some of the command line tools used for package management. While all of the major distributions provide powerful and sophisticated graphical
programs for maintaining the system, it is important to learn about the command line programs, too. They can perform many tasks that are difficult (or impossible) to do with their
graphical counterparts.
Packaging Systems
Different distributions use different packaging systems and as a general rule, a package
intended for one distribution is not compatible with another distribution. Most distributions fall into one of two camps of packaging technologies: the Debian .deb camp and
the Red Hat .rpm camp. There are some important exceptions such as Gentoo, Slackware, and Foresight, but most others use one of these two basic systems.
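As a rough sketch (not from the original text), the camp a given system belongs to can often be guessed by checking which low-level tool is present on the PATH:

```shell
# Guess the packaging family by probing for its low-level tool.
# This is a heuristic: a few distributions ship neither dpkg nor rpm.
if command -v dpkg >/dev/null 2>&1; then
    echo "Debian-style (.deb) packaging"
elif command -v rpm >/dev/null 2>&1; then
    echo "Red Hat-style (.rpm) packaging"
else
    echo "neither dpkg nor rpm found"
fi
```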
Table 14-1: Major Packaging System Families
Packaging System       Distributions (Partial Listing)
Debian Style (.deb)    Debian, Ubuntu, Xandros, Linspire
Red Hat Style (.rpm)   Fedora, CentOS, Red Hat Enterprise Linux, OpenSUSE, Mandriva, PCLinuxOS
Package Files
The basic unit of software in a packaging system is the package file. A package file is a
compressed collection of files that comprise the software package. A package may consist
of numerous programs and data files that support the programs. In addition to the files to
be installed, the package file also includes metadata about the package, such as a text description of the package and its contents. Additionally, many packages contain pre- and
post-installation scripts that perform configuration tasks before and after the package installation.
Package files are created by a person known as a package maintainer, often (but not always) an employee of the distribution vendor. The package maintainer gets the software
in source code form from the upstream provider (the author of the program), compiles it,
and creates the package metadata and any necessary installation scripts. Often, the package maintainer will apply modifications to the original source code to improve the program's integration with the other parts of the Linux distribution.
Repositories
While some software projects choose to perform their own packaging and distribution,
most packages today are created by the distribution vendors and interested third parties.
Packages are made available to the users of a distribution in central repositories that may
contain many thousands of packages, each specially built and maintained for the distribution.
A distribution may maintain several different repositories for different stages of the software development life cycle. For example, there will usually be a testing repository
that contains packages that have just been built and are intended for use by brave souls
who are looking for bugs before they are released for general distribution. A distribution
will often have a development repository where work-in-progress packages destined
for inclusion in the distribution's next major release are kept.
A distribution may also have related third-party repositories. These are often needed to
supply software that, for legal reasons such as patents or DRM anti-circumvention issues,
cannot be included with the distribution. Perhaps the best known case is that of encrypted
DVD support, which is not legal in the United States. The third-party repositories operate
in countries where software patents and anti-circumvention laws do not apply. These
repositories are usually wholly independent of the distribution they support and to use
them, one must know about them and manually include them in the configuration files for
the package management system.
Dependencies
Programs seldom stand alone; rather, they rely on the presence of other software components to get their work done. Common activities, such as input/output, are
handled by routines shared by many programs. These routines are stored in what are
called shared libraries, which provide essential services to more than one program. If a
package requires a shared resource such as a shared library, it is said to have a dependency. Modern package management systems all provide some method of dependency
resolution to ensure that when a package is installed, all of its dependencies are installed,
too.
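On a typical Linux system, the shared libraries a particular program depends on can be listed with the ldd command. A quick sketch, assuming a dynamically linked /bin/ls:

```shell
# ldd prints the shared libraries a dynamically linked executable requires,
# along with the path each one resolves to on this system.
ldd /bin/ls
```

Each line of output corresponds to one dependency a package manager would have to satisfy if it shipped this program.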
High And Low-Level Package Tools
Package management systems usually consist of two types of tools: low-level tools, which handle tasks such as installing and removing package files, and high-level tools, which perform metadata searching and dependency resolution.
Table 14-2: Packaging System Tools
Distributions                              Low-Level Tools   High-Level Tools
Debian-Style                               dpkg              apt-get, aptitude
Fedora, Red Hat Enterprise Linux, CentOS   rpm               yum
Common Package Management Tasks
Many operations can be performed with the command line package management tools. We will look at the most common.
Finding A Package In A Repository
Using the high-level tools to search repository metadata, a package can be located based on its name or description.
Table 14-3: Package Search Commands
Style      Command(s)
Debian     apt-get update; apt-cache search search_string
Red Hat    yum search search_string
Example: To search a yum repository for the emacs text editor, this command could be
used:
yum search emacs
Installing A Package From A Repository
High-level tools permit a package to be downloaded from a repository and installed with full dependency resolution.
Table 14-4: Package Installation Commands
Style      Command(s)
Debian     apt-get update; apt-get install package_name
Red Hat    yum install package_name
Installing A Package From A Package File
If a package file has been downloaded from a source other than a repository, it can be installed directly (though without dependency resolution) using a low-level tool.
Table 14-5: Low-Level Package Installation Commands
Style      Command(s)
Debian     dpkg --install package_file
Red Hat    rpm -i package_file
Note: Since this technique uses the low-level rpm program to perform the installation, no dependency resolution is performed. If rpm discovers a missing dependency, rpm will exit with an error.
Removing A Package
Packages can be uninstalled using either the high-level or low-level tools. The high-level
tools are shown below.
Table 14-6: Package Removal Commands
Style      Command(s)
Debian     apt-get remove package_name
Red Hat    yum erase package_name
Updating Packages From A Repository
As packages are updated by the distribution vendor, keeping a system current with the latest versions is a common package management task.
Table 14-7: Package Update Commands
Style      Command(s)
Debian     apt-get update; apt-get upgrade
Red Hat    yum update
Example: To apply any available updates to the installed packages on a Debian-style system:
apt-get update; apt-get upgrade
Upgrading A Package From A Package File
If an updated version of a package has been downloaded from a non-repository source, it can be installed, replacing the previous version.
Table 14-8: Low-Level Package Upgrade Commands
Style      Command(s)
Debian     dpkg --install package_file
Red Hat    rpm -U package_file
Example: Updating an existing installation of emacs to the version contained in the package file emacs-22.1-7.fc7-i386.rpm on a Red Hat system:
rpm -U emacs-22.1-7.fc7-i386.rpm
Note: dpkg does not have a specific option for upgrading a package versus installing one as rpm does.
Listing Installed Packages
The following commands can be used to display a list of all the packages installed on the system.
Table 14-9: Package Listing Commands
Style      Command(s)
Debian     dpkg --list
Red Hat    rpm -qa
Determining If A Package Is Installed
These low-level tools can be used to display whether a specified package is installed.
Table 14-10: Package Status Commands
Style      Command(s)
Debian     dpkg --status package_name
Red Hat    rpm -q package_name
Displaying Info About An Installed Package
If the name of an installed package is known, the following commands can be used to display a description of the package.
Table 14-11: Package Information Commands
Style      Command(s)
Debian     apt-cache show package_name
Red Hat    yum info package_name
Finding Which Package Installed A File
To determine what package is responsible for the installation of a particular file, the following commands can be used.
Table 14-12: Package File Identification Commands
Style      Command(s)
Debian     dpkg --search file_name
Red Hat    rpm -qf file_name
Example: To see what package installed the /usr/bin/vim file on a Red Hat system:
rpm -qf /usr/bin/vim
Summing Up
In the chapters that follow, we will explore many different programs covering a wide
range of application areas. While most of these programs are commonly installed by default, we may need to install additional packages if necessary programs are not already
installed on our system. With our newfound knowledge (and appreciation) of package
management, we should have no problem installing and managing the programs we need.
Further Reading
Spend some time getting to know the package management system for your distribution.
Each distribution provides documentation for its package management tools. In addition,
here are some more generic sources:
The Debian GNU/Linux FAQ chapter on package management provides an overview of package management on Debian systems:
http://www.debian.org/doc/FAQ/ch-pkgtools.en.html
15 Storage Media
In previous chapters we've looked at manipulating data at the file level. In this chapter,
we will consider data at the device level. Linux has amazing capabilities for handling
storage devices, whether physical storage, such as hard disks, or network storage, or virtual storage devices like RAID (Redundant Array of Independent Disks) and LVM (Logical Volume Manager).
However, since this is not a book about system administration, we will not try to cover
this entire topic in depth. What we will try to do is introduce some of the concepts and
key commands that are used to manage storage devices.
To carry out the exercises in this chapter, we will use a USB flash drive, a CD-RW disc (for systems equipped with a CD burner), and a floppy disk (again, if the system is so equipped).
We will look at the following commands:
mount - Mount a file system
umount - Unmount a file system
fsck - Check and repair a file system
fdisk - Manipulate disk partition tables
mkfs - Create a file system
fdformat - Format a floppy disk
dd - Write block-oriented data directly to a device
genisoimage (mkisofs) - Create an ISO 9660 image file
wodim (cdrecord) - Write data to optical storage media
md5sum - Calculate an MD5 checksum
Mounting And Unmounting Storage Devices
The first step in managing a storage device is attaching the device to the file system tree. This process, called mounting, allows the device to participate with the operating system. A file named /etc/fstab lists the devices (typically hard disk partitions) that are to be mounted at boot time. Here is an example /etc/fstab file:
LABEL=/12         /          ext3     defaults         1 1
LABEL=/home       /home      ext3     defaults         1 2
LABEL=/boot       /boot      ext3     defaults         1 2
tmpfs             /dev/shm   tmpfs    defaults         0 0
devpts            /dev/pts   devpts   gid=5,mode=620   0 0
sysfs             /sys       sysfs    defaults         0 0
proc              /proc      proc     defaults         0 0
LABEL=SWAP-sda3   swap       swap     defaults         0 0
Most of the file systems listed in this example file are virtual and are not applicable to our
discussion. For our purposes, the interesting ones are the first three:
LABEL=/12     /        ext3   defaults   1 1
LABEL=/home   /home    ext3   defaults   1 2
LABEL=/boot   /boot    ext3   defaults   1 2
These are the hard disk partitions. Each line of the file consists of six fields, as follows:
Table 15-1: /etc/fstab Fields
Field   Contents           Description
1       Device             Traditionally, this field contains the actual name of a
                           device file associated with the physical device, such as
                           /dev/hda1. But with today's computers, which have
                           many hot-pluggable devices (like USB drives), many
                           modern Linux distributions
                           associate a device with a text label instead. This label
                           (which is added to the storage media when it is
                           formatted) is read by the operating system when the
                           device is attached to the system. That way, no matter
                           which device file is assigned to the actual physical
                           device, it can still be correctly identified.
2       Mount Point        The directory where the device is attached to the file system tree.
3       File System Type   The type of file system on the device, such as ext3, vfat, or iso9660.
4       Options            File systems can be mounted with various options. It is possible, for example, to mount a file system read-only, or to prevent any programs from being executed from it (a useful security feature for removable media).
5       Frequency          A single number that specifies if and when a file system is to be backed up with the dump command.
6       Order              A single number that specifies in what order file systems should be checked with the fsck command.
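The six-field layout can be pulled apart mechanically. This sketch feeds a made-up record (not read from the real /etc/fstab) through awk:

```shell
# Split a sample fstab-style record into its six whitespace-separated fields.
record='LABEL=/home /home ext3 defaults 1 2'
echo "$record" |
    awk '{ printf "device=%s mount=%s type=%s options=%s dump=%s pass=%s\n",
           $1, $2, $3, $4, $5, $6 }'
# prints: device=LABEL=/home mount=/home type=ext3 options=defaults dump=1 pass=2
```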
Viewing A List Of Mounted File Systems
The mount command is used to mount file systems. Entering the command without arguments displays a list of the file systems currently mounted. The format of the listing is: device on mount_point type file_system_type (options). For example, a line such as /dev/sda2 on / type ext3 (rw) shows that device /dev/sda2 is mounted as the root file system, is of type ext3, and is both readable and writable (the rw option). Such a listing can also contain more interesting entries, such as a 2 gigabyte SD memory card in a card reader mounted at /media/disk, or a network drive mounted at /misc/musicbox.
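Since the format is regular, a single line of mount output can be broken into its parts with awk. A sketch using a sample line rather than live output:

```shell
# Pick apart one "device on mount_point type fs_type (options)" record.
line='/dev/sda2 on / type ext3 (rw)'
echo "$line" | awk '{ print "device:     " $1
                      print "mounted on: " $3
                      print "fs type:    " $5
                      print "options:    " $6 }'
```

On a live system, piping the real mount output through the same awk program labels every mounted file system this way.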
For our first experiment, we will work with a CD-ROM. First, let's look at a system before a CD-ROM is inserted:
[me@linuxbox ~]$ mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
This listing is from a CentOS 5 system, which is using LVM (Logical Volume Manager)
to create its root file system. Like many modern Linux distributions, this system will attempt to automatically mount the CD-ROM after insertion. After we insert the disc, we
see the following:
[me@linuxbox ~]$ mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/hdc on /media/live-1.0.10-8 type iso9660 (ro,noexec,nosuid,
nodev,uid=500)
After we insert the disc, we see the same listing as before with one additional entry. At
the end of the listing we see that the CD-ROM (which is device /dev/hdc on this system) has been mounted on /media/live-1.0.10-8, and is type iso9660 (a CD-ROM). For purposes of our experiment, we're interested in the name of the device. When
you conduct this experiment yourself, the device name will most likely be different.
Warning: In the examples that follow, it is vitally important that you pay close attention to the actual device names in use on your system and do not use the names
used in this text!
Also note that audio CDs are not the same as CD-ROMs. Audio CDs do not contain
file systems and thus cannot be mounted in the usual sense.
Now that we have the device name of the CD-ROM drive, let's unmount the disc and remount it at another location in the file system tree. To do this, we become the superuser
(using the command appropriate for our system) and unmount the disc with the umount
(notice the spelling) command:
[me@linuxbox ~]$ su -
Password:
[root@linuxbox ~]# umount /dev/hdc
The next step is to create a new mount point for the disk. A mount point is simply a directory somewhere on the file system tree. Nothing special about it. It doesn't even have to
be an empty directory, though if you mount a device on a non-empty directory, you will
not be able to see the directory's previous contents until you unmount the device. For our
purposes, we will create a new directory:
[root@linuxbox ~]# mkdir /mnt/cdrom
Finally, we mount the CD-ROM at the new mount point. The -t option is used to specify
the file system type:
[root@linuxbox ~]# mount -t iso9660 /dev/hdc /mnt/cdrom
Afterward, we can examine the contents of the CD-ROM via the new mount point:
[root@linuxbox ~]# cd /mnt/cdrom
If we then attempt to unmount the disc while our working directory is still the mount point, the attempt fails with a "device is busy" error. Why is this? The reason is that we cannot unmount a device if the device is being used by someone or some process. In this case, we changed our working directory to the mount point for the CD-ROM, which causes the device to be busy. We can easily remedy the issue by changing the working directory to something other than the mount point:
[root@linuxbox cdrom]# cd
[root@linuxbox ~]# umount /dev/hdc
This idea of buffering is used extensively in computers to make them faster. Don't let the need to occasionally read or write data to or from slow devices impede the speed of the system. Operating systems store data that has been read from, and is to be written to, storage devices in memory for as long as possible before actually having to interact with the slower device. On a Linux system, for example, you will notice that the system seems to fill up memory the longer it is used. This does not mean Linux is using all the memory; it means that Linux is taking advantage of all the available memory to do as much buffering as it can.

This buffering allows writing to storage devices to be done very quickly, because the writing to the physical device is being deferred to a future time. In the meantime, the data destined for the device is piling up in memory. From time to time, the operating system will write this data to the physical device.
Unmounting a device entails writing all the remaining data to the device so that it
can be safely removed. If the device is removed without unmounting it first, the
possibility exists that not all the data destined for the device has been transferred.
In some cases, this data may include vital directory updates, which will lead to
file system corruption, one of the worst things that can happen on a computer.
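When we want to force this write-out ourselves, before pulling a drive for instance, the sync command does exactly that. A minimal sketch:

```shell
# Ask the kernel to write all buffered file system data out to
# the underlying devices now, rather than at its leisure.
sync
echo "buffers flushed"
```

Unmounting performs this flush automatically; sync is simply the manual version of the same step.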
The contents of this listing reveal some patterns of device naming. Here are a few:
Device          Description
/dev/fd*        Floppy disk drives.
/dev/hd*        IDE (PATA) disks on older systems.
/dev/lp*        Printers.
/dev/sd*        SCSI disks. On recent Linux systems, the kernel treats all
                disk-like devices (including PATA/SATA hard disks, flash
                drives, and USB mass storage devices such as portable music
                players and digital cameras) as SCSI disks. The rest of the
                naming scheme is similar to the older /dev/hd* naming
                scheme described above.
/dev/sr*        Optical drives (CD/DVD readers and burners).
The last few lines of the file will be displayed and then pause. Next, plug in the removable device. In this example, we will use a 16 MB flash drive. Almost immediately, the
kernel will notice the device and probe it:
Jul 23 10:07:53 linuxbox kernel: usb 3-2: new full speed USB device
using uhci_hcd and address 2
Jul 23 10:07:53 linuxbox kernel: usb 3-2: configuration #1 chosen
from 1 choice
Jul 23 10:07:53 linuxbox kernel: scsi3 : SCSI emulation for USB Mass
Storage devices
Jul 23 10:07:58 linuxbox kernel: scsi scan: INQUIRY result too short
(5), using 36
Jul 23 10:07:58 linuxbox kernel: scsi 3:0:0:0: Direct-Access     Easy Disk          1.00 PQ: 0 ANSI: 2
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] 31263 512-byte
hardware sectors (16 MB)
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] Write Protect is
off
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] Assuming drive
cache: write through
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] 31263 512-byte
hardware sectors (16 MB)
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] Write Protect is
off
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] Assuming drive
cache: write through
Jul 23 10:07:59 linuxbox kernel: sdb: sdb1
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] Attached SCSI
removable disk
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: Attached scsi generic
sg3 type 0
After the display pauses again, press Ctrl-c to get the prompt back. The interesting parts
of the output are the repeated references to [sdb] which matches our expectation of a
SCSI disk device name. Knowing this, two lines become particularly illuminating:
Jul 23 10:07:59 linuxbox kernel: sdb: sdb1
Jul 23 10:07:59 linuxbox kernel: sd 3:0:0:0: [sdb] Attached SCSI
removable disk
This tells us the device name is /dev/sdb for the entire device and /dev/sdb1 for
the first partition on the device. As we have seen, working with Linux is full of interesting detective work!
Tip: Using the tail -f /var/log/messages technique is a great way to
watch what the system is doing in near real-time.
With our device name in hand, we can now mount the flash drive:
The device name will remain the same as long as it remains physically attached to the
computer and the computer is not rebooted.
[me@linuxbox ~]$ sudo umount /dev/sdb1
[me@linuxbox ~]$ sudo fdisk /dev/sdb

Notice that we must specify the device in terms of the entire device, not by partition number. After the fdisk program starts up, we will see the following prompt:
Command (m for help):
The first thing we want to do is examine the existing partition layout. We do this by entering p to print the partition table for the device:
Command (m for help): p
Disk /dev/sdb: 16 MB, 16006656 bytes
1 heads, 31 sectors/track, 1008 cylinders
Units = cylinders of 31 * 512 = 15872 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               2        1008       15608+   b  W95 FAT32
In this example, we see a 16 MB device with a single partition (1) that uses 1006 of the
available 1008 cylinders on the device. The partition is identified as a Windows 95
FAT32 partition. Some programs will use this identifier to limit the kinds of operations that can be done to the disk, but most of the time it is not critical to change it.
If we enter l at the prompt, a large list of possible types is displayed. Among them we
see b for our existing partition type and 83 for Linux.
Going back to the menu, we see this choice to change a partition ID:

t          change a partition's system id
This completes all the changes that we need to make. Up to this point, the device has
been untouched (all the changes have been stored in memory, not on the physical device),
so we will write the modified partition table to the device and exit. To do this, we enter
w at the prompt:
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.
Syncing disks.
[me@linuxbox ~]$
If we had decided to leave the device unaltered, we could have entered q at the prompt,
which would have exited the program without writing the changes. We can safely ignore
the ominous sounding warning message.
[me@linuxbox ~]$ sudo mkfs -t ext3 /dev/sdb1

The program will display a lot of information when ext3 is the chosen file system type.
To re-format the device to its original FAT32 file system, specify vfat as the file system
type:
[me@linuxbox ~]$ sudo mkfs -t vfat /dev/sdb1
This process of partitioning and formatting can be used anytime additional storage devices are added to the system. While we worked with a tiny flash drive, the same process
In my experience, file system corruption is quite rare unless there is a hardware problem,
such as a failing disk drive. On most systems, file system corruption detected at boot time
will cause the system to stop and direct you to run fsck before continuing.
If our system is equipped with floppy disk drives, we can manage those devices, too. Preparing a blank floppy for use is a two-step process. First, we perform a low-level format on the diskette, and then create a file system. To accomplish the formatting, we use the fdformat program, specifying the name of the floppy device (usually /dev/fd0):
[me@linuxbox ~]$ sudo fdformat /dev/fd0
Double-sided, 80 tracks, 18 sec/track. Total capacity 1440 kB.
Formatting ... done
Verifying ... done
After the low-level format, we create a file system on the diskette:

[me@linuxbox ~]$ sudo mkfs -t msdos /dev/fd0

Notice that we use the msdos file system type to get the older (and smaller) style file allocation tables. After a diskette is prepared, it may be mounted like other devices.
Let's say we had two USB flash drives of the same size and we wanted to exactly copy the first drive to the second. If we attached both drives to the computer and they are assigned to devices /dev/sdb and /dev/sdc respectively, we could copy everything on the first drive to the second drive with the following:
dd if=/dev/sdb of=/dev/sdc
Warning! The dd command is very powerful. Though its name derives from "data definition," it is sometimes called "destroy disk" because users often mistype either the if or of specifications. Always double-check your input and output specifications before pressing Enter!
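One way to build the habit safely is to practice on ordinary files, which dd treats just like devices. A small sketch (the /tmp filenames are ours):

```shell
# Copy a small file with dd, then prove the copy is bit-for-bit
# identical using cmp (which is silent when the files match).
printf 'hello storage\n' > /tmp/dd-src.txt
dd if=/tmp/dd-src.txt of=/tmp/dd-copy.txt bs=4096 2>/dev/null
cmp /tmp/dd-src.txt /tmp/dd-copy.txt && echo "copies match"
```

Getting the if and of arguments right here costs nothing; getting them wrong on /dev/sdb costs everything.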
This technique works for data DVDs as well, but will not work for audio CDs, as they do
not use a file system for storage. For audio CDs, look at the cdrdao command.
genisoimage -o cd-rom.iso -R -J ~/cd-rom-files
The -R option adds metadata for the Rock Ridge extensions, which allows the use of long filenames and POSIX-style file permissions. Likewise, the -J option enables the Joliet extensions, which permit long filenames for Windows.
[me@linuxbox ~]$ sudo mkdir /mnt/iso_image
[me@linuxbox ~]$ sudo mount -t iso9660 -o loop image.iso /mnt/iso_image

In the example above, we created a mount point named /mnt/iso_image and then mounted the image file image.iso at that mount point. After the image is mounted, it
can be treated just as though it were a real CD-ROM or DVD. Remember to unmount the
image when it is no longer needed.
Writing An Image
To write an image, we again use wodim, specifying the name of the optical media writer
device and the name of the image file:
wodim dev=/dev/cdrw image.iso
In addition to the device name and image file, wodim supports a very large set of options. Two common ones are -v for verbose output, and -dao, which writes the disc in
disc-at-once mode. This mode should be used if you are preparing a disc for commercial
reproduction. The default mode for wodim is track-at-once, which is useful for recording
music tracks.
Summing Up
In this chapter we have looked at the basic storage management tasks. There are, of
course, many more. Linux supports a vast array of storage devices and file system
schemes. It also offers many features for interoperability with other systems.
Further Reading
Take a look at the man pages of the commands we have covered. Some of them support
huge numbers of options and operations. Also, look for on-line tutorials for adding hard
drives to your Linux system (there are many) and working with optical media.
Extra Credit
It's often useful to verify the integrity of an iso image that we have downloaded. In most cases, a distributor of an iso image will also supply a checksum file. A checksum is the result of an exotic mathematical calculation resulting in a number that represents the content of the target file. If the contents of the file change by even one bit, the resulting
checksum will be much different. The most common method of checksum generation
uses the md5sum program. When you use md5sum, it produces a unique hexadecimal
number:
md5sum image.iso
34e354760f9bb7fbf85c96f6a3f94ece  image.iso
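We can see the change-by-one-bit behavior directly by hashing two files that differ in a single character (a sketch; the /tmp filenames are ours):

```shell
# Two files differing only in their final letter produce
# completely unrelated MD5 checksums.
printf 'linux command line\n' > /tmp/f1.txt
printf 'linux command lime\n' > /tmp/f2.txt
md5sum /tmp/f1.txt /tmp/f2.txt
```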
After you download an image, you should run md5sum against it and compare the results
with the md5sum value supplied by the publisher.
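The comparison does not have to be done by eye. md5sum's -c option reads a checksum file and verifies each file listed in it. Here is a sketch using a stand-in file in /tmp rather than a real downloaded image:

```shell
# Record a file's checksum in a checksum file, then let
# md5sum verify the file against it.
printf 'pretend iso data\n' > /tmp/sample.iso
md5sum /tmp/sample.iso > /tmp/sample.iso.md5
md5sum -c /tmp/sample.iso.md5
```

In practice, the checksum file is the one supplied by the publisher, placed alongside the downloaded image.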
In addition to checking the integrity of a downloaded file, we can use md5sum to verify
newly written optical media. To do this, we first calculate the checksum of the image file
and then calculate a checksum for the media. The trick to verifying the media is to limit
the calculation to only the portion of the optical media that contains the image. We do this
by determining the number of 2048-byte blocks the image contains (optical media is always written in 2048-byte blocks) and reading that many blocks from the media. On
some types of media, this is not required. A CD-R written in disc-at-once mode can be
checked this way:
md5sum /dev/cdrom
34e354760f9bb7fbf85c96f6a3f94ece  /dev/cdrom
Many types of media, such as DVDs, require a precise calculation of the number of
blocks. In the example below, we check the integrity of the image file dvd-image.iso
and the disc in the DVD reader /dev/dvd. Can you figure out how this works?
md5sum dvd-image.iso; dd if=/dev/dvd bs=2048 count=$(( $(stat -c "%s"
dvd-image.iso) / 2048 )) | md5sum
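The arithmetic expansion divides the image's size in bytes (from stat -c "%s") by 2048 to get the number of blocks dd must read. We can try the same trick safely on an ordinary file (a sketch; the /tmp paths are ours):

```shell
# Build a 100-block (204800-byte) stand-in for an image file.
dd if=/dev/zero of=/tmp/fake.iso bs=2048 count=100 2>/dev/null
# Compute the number of 2048-byte blocks from the file's size.
blocks=$(( $(stat -c "%s" /tmp/fake.iso) / 2048 ))
echo "blocks: $blocks"
# Reading exactly that many blocks reproduces the same checksum.
md5sum /tmp/fake.iso
dd if=/tmp/fake.iso bs=2048 count=$blocks 2>/dev/null | md5sum
```

The two checksums match because dd stops after exactly the number of blocks the image occupies, just as it must stop at the image boundary on the physical disc.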
16 Networking
When it comes to networking, there is probably nothing that cannot be done with Linux.
Linux is used to build all sorts of networking systems and appliances, including firewalls,
routers, name servers, NAS (Network Attached Storage) boxes and on and on.
Just as the subject of networking is vast, so is the number of commands that can be used to configure and control it. We will focus our attention on just a few of the most frequently used ones. The commands chosen for examination include those used to monitor networks and those used to transfer files. In addition, we are going to explore the ssh
networks and those used to transfer files. In addition, we are going to explore the ssh
program that is used to perform remote logins. This chapter will cover:
netstat - Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships
We're going to assume a little background in networking. In this, the Internet age, everyone using a computer needs a basic understanding of networking concepts. To make full use of this chapter, we should be familiar with the following terms:
Please see the Further Reading section below for some useful articles regarding these
terms.
Note: Some of the commands we will cover may (depending on your distribution) require the installation of additional packages from your distribution's repositories, and some may require superuser privileges to execute.
ping
The most basic network command is ping. The ping command sends a special network packet called an ICMP ECHO_REQUEST to a specified host. Most network devices receiving this packet will reply to it, allowing the network connection to be verified.
Note: It is possible to configure most network devices (including Linux hosts) to ignore these packets. This is usually done for security reasons, to partially obscure a host from a potential attacker. It is also common for firewalls to be configured to block ICMP traffic.
For example, to see if we can reach linuxcommand.org (one of our favorite sites ;-), we can use ping like this:
[me@linuxbox ~]$ ping linuxcommand.org
Once started, ping continues to send packets at a specified interval (the default is one second) until it is interrupted:
[me@linuxbox ~]$ ping linuxcommand.org
PING linuxcommand.org (66.35.250.210) 56(84) bytes of data.
64 bytes from vhost.sourceforge.net (66.35.250.210): icmp_seq=1
ttl=43 time=107 ms
64 bytes from vhost.sourceforge.net (66.35.250.210): icmp_seq=2
ttl=43 time=108 ms
64 bytes from vhost.sourceforge.net (66.35.250.210): icmp_seq=3
ttl=43 time=106 ms
64 bytes from vhost.sourceforge.net (66.35.250.210): icmp_seq=4
ttl=43 time=106 ms
64 bytes from vhost.sourceforge.net (66.35.250.210): icmp_seq=5
ttl=43 time=105 ms
64 bytes from vhost.sourceforge.net (66.35.250.210): icmp_seq=6
After it is interrupted (in this case after the sixth packet) by pressing Ctrl-c, ping
prints performance statistics. A properly performing network will exhibit zero percent
packet loss. A successful ping will indicate that the elements of the network (its interface cards, cabling, routing, and gateways) are in generally good working order.
traceroute
The traceroute program (some systems use the similar tracepath program instead) displays a listing of all the hops network traffic takes to get from the local system to a specified host. For example, to see the route taken to reach slashdot.org,
we would do this:
[me@linuxbox ~]$ traceroute slashdot.org
pos-0-7-3-1.newyork.savvis.net (204.70.195.93) 19.634 ms
12 cr2-pos-0-7-3-0.chicago.savvis.net (204.70.192.109) 41.586 ms
42.843 ms cr2-tengig-0-0-2-0.chicago.savvis.net (204.70.196.242)
43.115 ms
13 hr2-tengigabitethernet-12-1.elkgrovech3.savvis.net
(204.70.195.122) 44.215 ms 41.833 ms 45.658 ms
14 csr1-ve241.elkgrovech3.savvis.net (216.64.194.42) 46.840 ms
43.372 ms 47.041 ms
15 64.27.160.194 (64.27.160.194) 56.137 ms 55.887 ms 52.810 ms
16 slashdot.org (216.34.181.45) 42.727 ms 42.016 ms 41.437 ms
In the output, we can see that connecting from our test system to slashdot.org requires traversing sixteen routers. For routers that provided identifying information, we
see their hostnames, IP addresses, and performance data, which includes three samples of
round-trip time from the local system to the router. For routers that do not provide identifying information (because of router configuration, network congestion, firewalls, etc.),
we see asterisks as in the line for hop number 2.
netstat
The netstat program is used to examine various network settings and statistics.
Through the use of its many options, we can look at a variety of features in our network
setup. Using the -ie option, we can examine the network interfaces in our system:
[me@linuxbox ~]$ netstat -ie
eth0
Link encap:Ethernet HWaddr 00:1d:09:9b:99:67
inet addr:192.168.1.2 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::21d:9ff:fe9b:9967/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:238488 errors:0 dropped:0 overruns:0 frame:0
TX packets:403217 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:153098921 (146.0 MB) TX bytes:261035246 (248.9 MB)
Memory:fdfc0000-fdfe0000
lo
In the example above, we see that our test system has two network interfaces. The first, called eth0, is the Ethernet interface, and the second, called lo, is the loopback interface, a virtual interface the system uses to talk to itself.

Using the -r option, we can display the kernel's network routing table. This shows how the network is configured to send packets from network to network:
[me@linuxbox ~]$ netstat -r
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.1.0     *               255.255.255.0   U         0 0          0 eth0
default         192.168.1.1     0.0.0.0         UG        0 0          0 eth0
In this simple example, we see a typical routing table for a client machine on a LAN (Local Area Network) behind a firewall/router. The first line of the listing shows the destination 192.168.1.0. IP addresses that end in zero refer to networks rather than individual hosts, so this destination means any host on the LAN. The next field, Gateway, is
the name or IP address of the gateway (router) used to go from the current host to the destination network. An asterisk in this field indicates that no gateway is needed.
The last line contains the destination default. This means any traffic destined for a
network that is not otherwise listed in the table. In our example, we see that the gateway
is defined as a router with the address of 192.168.1.1, which presumably knows what
to do with the destination traffic.
The netstat program has many options and we have only looked at a couple. Check
out the netstat man page for a complete list.
ftp
One of the true classic programs, ftp gets its name from the protocol it uses, the File Transfer Protocol. FTP is used widely on the Internet for file downloads. Most, if not all,
web browsers support it and you often see URIs starting with the protocol ftp://.
Before there were web browsers, there was the ftp program. ftp is used to communicate with FTP servers, machines that contain files that can be uploaded and downloaded
over a network.
FTP (in its original form) is not secure, because it sends account names and passwords in
cleartext. This means that they are not encrypted and anyone sniffing the network can see
them. Because of this, almost all FTP done over the Internet is done by anonymous FTP servers. An anonymous server allows anyone to log in using the login name "anonymous" and a meaningless password.
In the example below, we show a typical session with the ftp program downloading an
Ubuntu iso image located in the /pub/cd_images/Ubuntu-8.04 directory of the
anonymous FTP server fileserver:
[me@linuxbox ~]$ ftp fileserver
Connected to fileserver.localdomain.
220 (vsFTPd 2.0.1)
Name (fileserver:me): anonymous
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd pub/cd_images/Ubuntu-8.04
250 Directory successfully changed.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
-rw-rw-r--    1 500      500      733079552 Apr 25 03:53 ubuntu-8.04-desktop-i386.iso
226 Directory send OK.
ftp> lcd Desktop
Local directory now /home/me/Desktop
ftp> get ubuntu-8.04-desktop-i386.iso
local: ubuntu-8.04-desktop-i386.iso remote: ubuntu-8.04-desktop-i386.iso
200 PORT command successful. Consider using PASV.
150 Opening BINARY mode data connection for ubuntu-8.04-desktop-i386.iso (733079552 bytes).
226 File send OK.
733079552 bytes received in 68.56 secs (10441.5 kB/s)
ftp> bye
Command                             Meaning
ftp fileserver                      Invoke the ftp program and have it connect
                                    to the FTP server fileserver.
anonymous                           The login name. After the login prompt, a
                                    password prompt will appear.
cd pub/cd_images/Ubuntu-8.04        Change to the directory on the remote
                                    system containing the desired file.
ls                                  List the directory on the remote system.
lcd Desktop                         Change the directory on the local system
                                    to ~/Desktop.
get ubuntu-8.04-desktop-i386.iso    Tell the remote system to transfer the
                                    file to the local system.
bye                                 Log off the remote server and end the ftp
                                    program session.
Typing help at the ftp> prompt will display a list of the supported commands. Using
ftp on a server where sufficient permissions have been granted, it is possible to perform
many ordinary file management tasks. It's clumsy, but it does work.
wget
Another popular command-line program for file downloading is wget. It is useful for
downloading content from both web and FTP sites. Single files, multiple files, and even
entire sites can be downloaded. To download the first page of linuxcommand.org we
could do this:
[me@linuxbox ~]$ wget http://linuxcommand.org/index.php
--11:02:51-- http://linuxcommand.org/index.php
=> `index.php'
Resolving linuxcommand.org... 66.35.250.210
Connecting to linuxcommand.org|66.35.250.210|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
    [ <=>                                  ] 3,120          --.--K/s
The program's many options allow wget to recursively download, download files in the background (allowing you to log off but continue downloading), and complete the download of a partially downloaded file. These features are well documented in its better-than-average man page.
ssh
Older remote-login programs share the weakness of ftp: they transmit sessions, including login names and passwords, in cleartext. To address this problem, a new protocol called SSH (Secure Shell) was developed. SSH solves the two basic problems of secure communication with a remote host. First, it authenticates that the remote host is who it says it is (thus preventing so-called man-in-the-middle attacks), and second, it encrypts all of the communications between the local and remote hosts.
SSH consists of two parts. An SSH server runs on the remote host, listening for incoming
connections on port 22, while an SSH client is used on the local system to communicate
with the remote server.
Most Linux distributions ship an implementation of SSH called OpenSSH from the
OpenBSD project. Some distributions include both the client and the server packages by
default (for example, Red Hat), while others (such as Ubuntu) only supply the client. To
enable a system to receive remote connections, it must have the OpenSSH-server package installed, configured, and running, and (if the system is running a firewall or is behind one) it must allow incoming network connections on TCP port 22.
Tip: If you don't have a remote system to connect to but want to try these examples, make sure the OpenSSH-server package is installed on your system and use localhost as the name of the remote host. That way, your machine will create network connections with itself.
The SSH client program used to connect to remote SSH servers is called, appropriately
enough, ssh. To connect to a remote host named remote-sys, we would use the ssh
client program like so:
[me@linuxbox ~]$ ssh remote-sys
The authenticity of host 'remote-sys (192.168.1.4)' can't be
established.
RSA key fingerprint is
41:ed:7a:df:23:19:bf:3c:a5:17:bc:61:b3:7f:d9:bb.
Are you sure you want to continue connecting (yes/no)?
The first time the connection is attempted, a message is displayed indicating that the authenticity of the remote host cannot be established. This is because the client program has
never seen this remote host before. To accept the credentials of the remote host, enter
yes when prompted. Once the connection is established, the user is prompted for
his/her password:
Warning: Permanently added 'remote-sys,192.168.1.4' (RSA) to the list
of known hosts.
me@remote-sys's password:
After the password is successfully entered, we receive the shell prompt from the remote
system:
Last login: Sat Aug 30 13:00:48 2008
[me@remote-sys ~]$
The remote shell session continues until the user enters the exit command at the remote
shell prompt, thereby closing the remote connection. At this point, the local shell session
resumes and the local shell prompt reappears.
It is also possible to connect to remote systems using a different username. For example,
if the local user me had an account named bob on a remote system, user me could log
in to the account bob on the remote system as follows:
[me@linuxbox ~]$ ssh bob@remote-sys
bob@remote-sys's password:
Last login: Sat Aug 30 13:03:21 2008
[bob@remote-sys ~]$
As stated before, ssh verifies the authenticity of the remote host. If the remote host does
not successfully authenticate, the following message appears:
[me@linuxbox ~]$ ssh remote-sys
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle
attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
41:ed:7a:df:23:19:bf:3c:a5:17:bc:61:b3:7f:d9:bb.
Please contact your system administrator.
Add correct host key in /home/me/.ssh/known_hosts to get rid of this
message.
Offending key in /home/me/.ssh/known_hosts:1
RSA host key for remote-sys has changed and you have requested strict checking.
This message is caused by one of two possible situations. First, an attacker may be attempting a man-in-the-middle attack. This is rare, since everybody knows that ssh
alerts the user to this. The more likely culprit is that the remote system has been changed
somehow; for example, its operating system or SSH server has been reinstalled. In the interests of security and safety however, the first possibility should not be dismissed out of
hand. Always check with the administrator of the remote system when this message occurs.
After it has been determined that the message is due to a benign cause, it is safe to correct
the problem on the client side. This is done by using a text editor (vim perhaps) to remove the obsolete key from the ~/.ssh/known_hosts file. In the example message
above, we see this:
Offending key in /home/me/.ssh/known_hosts:1
This means that line one of the known_hosts file contains the offending key. Delete
this line from the file, and the ssh program will be able to accept new authentication credentials from the remote system.
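If hand-editing seems risky, OpenSSH's ssh-keygen program can remove the entry for us with its -R option. Here is a sketch run against a scratch known_hosts file (the -f option and the /tmp paths keep it away from the real ~/.ssh/known_hosts; remote-sys is the hostname from the example):

```shell
# Create a throwaway host key and register it for remote-sys
# in a scratch known_hosts file.
rm -f /tmp/scratch_key /tmp/scratch_key.pub /tmp/known_hosts
ssh-keygen -q -t rsa -N '' -f /tmp/scratch_key
printf 'remote-sys %s\n' "$(cut -d' ' -f1,2 /tmp/scratch_key.pub)" > /tmp/known_hosts
# Remove every entry for remote-sys from that file.
ssh-keygen -R remote-sys -f /tmp/known_hosts
grep -q remote-sys /tmp/known_hosts || echo "entry removed"
```

Run without -f, ssh-keygen -R remote-sys edits ~/.ssh/known_hosts itself, keeping a backup copy with a .old extension.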
Besides opening a shell session on a remote system, ssh also allows us to execute a single command on a remote system. For example, to execute the free command on a remote host named remote-sys and have the results displayed on the local system:
[me@linuxbox ~]$ ssh remote-sys free
me@remote-sys's password:
             total       used       free     shared    buffers     cached
Mem:        775536     507184     268352          0     110068     154596
-/+ buffers/cache:     242520     533016
Swap:      1572856          0    1572856
[me@linuxbox ~]$
It's possible to use this technique in more interesting ways, such as this example in which we perform an ls on the remote system and redirect the output to a file on the local system:
[me@linuxbox ~]$ ssh remote-sys 'ls *' > dirlist.txt
me@remote-sys's password:
[me@linuxbox ~]$
Notice the use of the single quotes in the command above. This is done because we do
not want the pathname expansion performed on the local machine; rather, we want it to
be performed on the remote system. Likewise, if we had wanted the output redirected to a
file on the remote machine, we could have placed the redirection operator and the filename within the single quotes:
[me@linuxbox ~]$ ssh remote-sys 'ls * > dirlist.txt'
As with ssh, you may apply a username to the beginning of the remote host's name if the desired remote host account name does not match that of the local system:
[me@linuxbox ~]$ scp bob@remote-sys:document.txt .
The second SSH file-copying program is sftp which, as its name implies, is a secure replacement for the ftp program. sftp works much like the original ftp program that
we used earlier; however, instead of transmitting everything in cleartext, it uses an SSH
encrypted tunnel. sftp has an important advantage over conventional ftp in that it does not require an FTP server to be running on the remote host. It only requires the SSH server. This means that any remote machine that can connect with the SSH client can also be used as an FTP-like server. Here is a sample session:
[me@linuxbox ~]$ sftp remote-sys
Connecting to remote-sys...
me@remote-sys's password:
sftp> ls
ubuntu-8.04-desktop-i386.iso
sftp> lcd Desktop
sftp> get ubuntu-8.04-desktop-i386.iso
Fetching /home/me/ubuntu-8.04-desktop-i386.iso to ubuntu-8.04-desktop-i386.iso
/home/me/ubuntu-8.04-desktop-i386.iso    100%  699MB   7.4MB/s   01:35
sftp> bye
Tip: The SFTP protocol is supported by many of the graphical file managers found
in Linux distributions. Using either Nautilus (GNOME) or Konqueror (KDE), we
can enter a URI beginning with sftp:// into the location bar and operate on files
stored on a remote system running an SSH server.
Summing Up
In this chapter, we have surveyed the field of networking tools found on most Linux systems. Since Linux is so widely used in servers and networking appliances, there are many more that can be added by installing additional software. But even with the basic set of tools, it is possible to perform many useful network-related tasks.
Further Reading
For a broad (albeit dated) look at network administration, the Linux Documentation Project provides the Linux Network Administrator's Guide:
http://tldp.org/LDP/nag2/index.html
Wikipedia contains many good networking articles. Here are some of the basics:
http://en.wikipedia.org/wiki/Internet_protocol_address
http://en.wikipedia.org/wiki/Host_name
http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
17 Searching For Files

We will also look at a command that is often used with file-search commands to process the resulting list of files:
locate will search its database of pathnames and output any that contain the string
If the search requirement is not so simple, locate can be combined with other tools
such as grep to design more interesting searches:
[me@linuxbox ~]$ locate zip | grep bin
/bin/bunzip2
/bin/bzip2
/bin/bzip2recover
/bin/gunzip
/bin/gzip
/usr/bin/funzip
/usr/bin/gpg-zip
/usr/bin/preunzip
/usr/bin/prezip
/usr/bin/prezip-bin
/usr/bin/unzip
/usr/bin/unzipsfx
/usr/bin/zip
/usr/bin/zipcloak
/usr/bin/zipgrep
/usr/bin/zipinfo
/usr/bin/zipnote
/usr/bin/zipsplit
The locate program has been around for a number of years, and there are several different variants in common use. The two most common ones found in modern Linux distributions are slocate and mlocate, though they are usually accessed by a symbolic link named locate. The different versions of locate have overlapping option sets. Some versions include regular expression matching (which we'll cover in an upcoming chapter) and wildcard support. Check the man page for locate to determine which version of locate is installed.
On most active user accounts, this will produce a large list. Since the list is sent to standard output, we can pipe the list into other programs. Let's use wc to count the number of files:
[me@linuxbox ~]$ find ~ | wc -l
47068
Wow, we've been busy! The beauty of find is that it can be used to identify files that meet specific criteria. It does this through the (slightly strange) application of options, tests, and actions. We'll look at the tests first.
Tests
Let's say that we want a list of directories from our search. To do this, we could add the following test:
[me@linuxbox ~]$ find ~ -type d | wc -l
1695
Adding the test -type d limited the search to directories. Conversely, we could have
limited the search to regular files with this test:
[me@linuxbox ~]$ find ~ -type f | wc -l
38737
Table 17-1: find File Types

File Type    Description
b            Block special device file
c            Character special device file
d            Directory
f            Regular file
l            Symbolic link
We can also search by file size and filename by adding some additional tests. Let's look for all the regular files that match the wildcard pattern *.JPG and are larger than one megabyte:
[me@linuxbox ~]$ find ~ -type f -name "*.JPG" -size +1M | wc -l
840
In this example, we add the -name test followed by the wildcard pattern. Notice how we enclose it in quotes to prevent pathname expansion by the shell. Next, we add the -size test followed by the string +1M. The leading plus sign indicates that we are looking for files larger than the specified number. A leading minus sign would change the meaning of the string to be smaller than the specified number.
Table 17-2: find Size Units

Character    Unit
b            512-byte blocks. This is the default if no unit is specified.
c            Bytes
w            2-byte words
k            Kilobytes (units of 1,024 bytes)
M            Megabytes (units of 1,048,576 bytes)
G            Gigabytes (units of 1,073,741,824 bytes)
find supports a large number of different tests. Below is a rundown of the common
ones. Note that in cases where a numeric argument is required, the same + and - notation discussed above can be applied:
Table 17-3: find Tests
Test                Description
-cmin n             Match files or directories whose content or attributes were last changed exactly n minutes ago. To specify less than n minutes ago, use -n; to specify more than n minutes ago, use +n.
-cnewer file        Match files or directories whose contents or attributes were last changed more recently than those of file.
-ctime n            Match files or directories whose contents or attributes were last changed n*24 hours ago.
-empty              Match empty files and directories.
-group name         Match files or directories belonging to group name. name may be expressed as either a group name or a numeric group ID.
-iname pattern      Like the -name test, but case-insensitive.
-inum n             Match files with inode number n. This is helpful for finding all the hard links to a particular inode.
-mtime n            Match files or directories whose contents were last modified n*24 hours ago.
-name pattern       Match files and directories with the specified wildcard pattern.
-newer file         Match files and directories whose contents were modified more recently than the specified file. This is very useful when writing shell scripts that perform file backups.
-nouser             Match files and directories that do not belong to a valid user.
-nogroup            Match files and directories that do not belong to a valid group.
-perm mode          Match files or directories that have permissions set to the specified mode. mode may be expressed in either octal or symbolic notation.
-samefile name      Similar to the -inum test. Match files that share the same inode number as file name.
-size n             Match files of size n.
-type c             Match files of type c.
-user name          Match files or directories belonging to user name. The user may be expressed as a username or a numeric user ID.
This is not a complete list. The find man page has all the details.
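As a quick sketch of the + and - numeric notation, consider the -mtime test. This is a self-contained example using a throwaway directory (the /tmp/mtime-demo path is purely illustrative):

```shell
# Build a tiny sandbox with one old file and one new file.
mkdir -p /tmp/mtime-demo
touch -d '10 days ago' /tmp/mtime-demo/old.txt
touch /tmp/mtime-demo/new.txt

# +7 means "more than 7 days ago" -- matches old.txt only.
find /tmp/mtime-demo -type f -mtime +7

# -7 means "less than 7 days ago" -- matches new.txt only.
find /tmp/mtime-demo -type f -mtime -7
```

A bare number (e.g. -mtime 7) would match only files modified exactly seven days ago.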
Operators
Even with all the tests that find provides, we may still need a better way to describe the logical relationships between the tests. For example, what if we needed to determine if all the files and subdirectories in a directory had secure permissions? We would look for all the files with permissions that are not 0600 and the directories with permissions that are not 0700. Fortunately, find provides a way to combine tests using logical operators to create more complex relationships. To express the test just described, we could do this:

[me@linuxbox ~]$ find ~ \( -type f -not -perm 0600 \) -or \( -type d -not -perm 0700 \)
Yikes! That sure looks weird. What is all this stuff? Actually, the operators are not that
complicated once you get to know them. Here is the list:
Table 17-4: find Logical Operators
Operator    Description
-and        Match if the tests on both sides of the operator are true. May be shortened to -a. Note that when no operator is present, -and is implied by default.
-or         Match if a test on either side of the operator is true. May be shortened to -o.
-not        Match if the test following the operator is false. May be shortened to an exclamation point (!).
( )         Group tests and operators together to form larger expressions. This is used to control the precedence of the logical evaluations. Since parentheses have special meaning to the shell, they must be quoted or escaped, usually as \( and \).
With this list of operators in hand, let's deconstruct our find command. When viewed from the uppermost level, we see that our tests are arranged as two groupings separated by an -or operator:
( expression 1 ) -or ( expression 2 )
This makes sense, since we are searching for files with a certain set of permissions and for directories with a different set. If we are looking for both files and directories, why do we combine them with -or instead of -and? Because as find evaluates each item, that item can match as a file or as a directory, but never as both at once, so -or expresses the relationship we want. It is also worth knowing how find decides whether to evaluate the second part of an expression at all:
Table 17-5: find AND/OR Logic

Results of expr1    Operator    expr2 is...
True                -and        Always performed
False               -and        Never performed
True                -or         Never performed
False               -or         Always performed
Why does this happen? It's done to improve performance. Take -and, for example. We know that the expression expr1 -and expr2 cannot be true if the result of expr1 is false, so there is no point in performing expr2. This kind of short-circuit evaluation saves work.
Predefined Actions
Let's get some work done! Having a list of results from our find command is useful, but what we really want is to act on the items in the list. Fortunately, find allows actions to be performed based on the search results. There is a set of predefined actions and several ways to apply user-defined actions. First let's look at a few of the predefined actions:
Table 17-6: Predefined find Actions
Action      Description
-delete     Delete the currently matching file.
-ls         Perform the equivalent of ls -dils on the matching file. Output is sent to standard output.
-print      Output the full pathname of the matching file to standard output. This is the default action if no other action is specified.
-quit       Quit once a match has been made.
As with the tests, there are many more actions. See the find man page for full details.
In our very first example, we did this:
find ~
which produced a list of every file and subdirectory contained within our home directory.
It produced a list because the -print action is implied if no other action is specified.
Thus our command could also be expressed as:
find ~ -print
We can use find to delete files that meet certain criteria. For example, to delete files that have the file extension .BAK (often used to designate backup files), we could use this command:

[me@linuxbox ~]$ find ~ -type f -name '*.BAK' -delete
In this example, every file in the user's home directory (and its subdirectories) is searched for filenames ending in .BAK. When they are found, they are deleted.
Warning: It should go without saying that you should use extreme caution when
using the -delete action. Always test the command first by substituting the
-print action for -delete to confirm the search results.
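To make the warning concrete, here is a sketch of the preview-then-delete workflow in a throwaway directory (the paths are illustrative):

```shell
# Create a disposable directory with two backup files and one keeper.
mkdir -p /tmp/delete-demo
touch /tmp/delete-demo/a.BAK /tmp/delete-demo/b.BAK /tmp/delete-demo/keep.txt

# Step 1: preview the doomed files with -print and inspect the list.
find /tmp/delete-demo -type f -name '*.BAK' -print

# Step 2: only after reviewing the output, swap -print for -delete.
find /tmp/delete-demo -type f -name '*.BAK' -delete
```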
Before we go on, let's take another look at how the logical operators affect actions. Consider the following command:
find ~ -type f -name '*.BAK' -print
As we have seen, this command will look for every regular file (-type f) whose name
ends with .BAK (-name '*.BAK') and will output the relative pathname of each
matching file to standard output (-print). However, the reason the command performs
the way it does is determined by the logical relationships between each of the tests and
actions. Remember, there is, by default, an implied -and relationship between each test
and action. We could also express the command this way to make the logical relationships easier to see:
find ~ -type f -and -name '*.BAK' -and -print
With our command fully expressed, let's look at how the logical operators affect its execution:
Test/Action       Is Performed Only If...
-type f           (Is always performed)
-name '*.BAK'     -type f is true
-print            -type f and -name '*.BAK' are true
Since the logical relationship between the tests and actions determines which of them are
performed, we can see that the order of the tests and actions is important. For instance, if
we were to reorder the tests and actions so that the -print action was the first one, the
command would behave much differently:
find ~ -print -and -type f -and -name '*.BAK'
This version of the command will print each file (the -print action always evaluates to
true) and then test for file type and the specified file extension.
User-Defined Actions
In addition to the predefined actions, we can also invoke arbitrary commands. The traditional way of doing this is with the -exec action. This action works like this:
-exec command {} ;
where command is the name of a command, {} is a symbolic representation of the current
pathname, and the semicolon is a required delimiter indicating the end of the command.
Here's an example of using -exec to act like the -delete action discussed earlier:
-exec rm '{}' ';'
Again, since the brace and semicolon characters have special meaning to the shell, they
must be quoted or escaped.
It's also possible to execute a user-defined action interactively. By using the -ok action in place of -exec, the user is prompted before execution of each specified command:
find ~ -type f -name 'foo*' -ok ls -l '{}' ';'
< ls ... /home/me/bin/foo > ? y
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
< ls ... /home/me/foo.txt > ? y
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt
In this example, we search for files with names starting with the string foo and execute
the command ls -l each time one is found. Using the -ok action prompts the user before the ls command is executed.
Improving Efficiency
When the -exec action is used, it launches a new instance of the specified command
each time a matching file is found. There are times when we might prefer to combine all
of the search results and launch a single instance of the command. For example, rather
than executing the commands like this:
ls -l file1
ls -l file2
we may prefer to execute them this way:
ls -l file1 file2
thus causing the command to be executed only one time rather than multiple times. There are two ways we can do this: the traditional way, using the external command xargs, and the alternate way, using a new feature in find itself. We'll talk about the alternate way first.
By changing the trailing semicolon character to a plus sign, we activate the ability of
find to combine the results of the search into an argument list for a single execution of
the desired command. Going back to our example, this:
find ~ -type f -name 'foo*' -exec ls -l '{}' ';'
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt
will execute ls each time a matching file is found. By changing the command to:
find ~ -type f -name 'foo*' -exec ls -l '{}' +
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt
we get the same results, but the system only has to execute the ls command once.
xargs
The xargs command performs an interesting function. It accepts input from standard input and converts it into an argument list for a specified command. With our example, we would use it like this:

[me@linuxbox ~]$ find ~ -type f -name 'foo*' -print | xargs ls -l
-rwxr-xr-x 1 me me 224 2007-10-29 18:44 /home/me/bin/foo
-rw-r--r-- 1 me me 0 2008-09-19 12:53 /home/me/foo.txt
Here we see the output of the find command piped into xargs which, in turn, constructs an argument list for the ls command and then executes it.
Note: While the number of arguments that can be placed into a command line is quite large, it's not unlimited. It is possible to create commands that are too long for the shell to accept. When a command line exceeds the maximum length supported by the system, xargs executes the specified command with the maximum number of arguments possible and then repeats this process until standard input is exhausted. To see the maximum size of the command line, execute xargs with the --show-limits option.
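One pitfall worth noting: since xargs splits its input on whitespace, a filename containing embedded spaces would be broken into separate arguments. A sketch of the usual defense, using GNU find's -print0 action with the xargs --null option (throwaway paths):

```shell
# A filename with an embedded space, in a disposable directory.
mkdir -p /tmp/xargs-demo
touch '/tmp/xargs-demo/two words.txt'

# -print0 separates names with null characters; --null (also -0) tells
# xargs to split on nulls instead of whitespace, so the name survives.
find /tmp/xargs-demo -type f -print0 | xargs --null ls -l
```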
A Return To The Playground
It's time to put find to some practical use. First, let's create a playground with lots of subdirectories and files:

[me@linuxbox ~]$ mkdir -p playground/dir-{001..100}
[me@linuxbox ~]$ touch playground/dir-{001..100}/file-{A..Z}

Marvel in the power of the command line! With these two lines, we created a playground directory containing 100 subdirectories, each containing 26 empty files. Try that with the GUI!

The method we employed to accomplish this magic involved a familiar command (mkdir), an exotic shell expansion (braces), and a new command, touch. By combining mkdir and its -p option (which causes mkdir to create the parent directories of the specified paths) with brace expansion, we were able to create 100 subdirectories.
The touch command is usually used to set or update the access, change, and modification times of files. However, if a filename argument is that of a nonexistent file, an empty file is created.

In our playground, we created 100 instances of a file named file-A. Let's find them:
[me@linuxbox ~]$ find playground -type f -name 'file-A'
Note that unlike ls, find does not produce results in sorted order. Its order is determined by the layout of the storage device. We can confirm that we actually have 100 instances of the file this way:
[me@linuxbox ~]$ find playground -type f -name 'file-A' | wc -l
100
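If sorted output is needed, the list can simply be piped through sort. A small sketch using a miniature stand-in for the playground (illustrative /tmp paths):

```shell
# A miniature playground: three subdirectories, each with a file-A.
mkdir -p /tmp/sort-demo/dir-001 /tmp/sort-demo/dir-002 /tmp/sort-demo/dir-003
touch /tmp/sort-demo/dir-001/file-A /tmp/sort-demo/dir-002/file-A /tmp/sort-demo/dir-003/file-A

# find's order reflects the storage layout; sort puts the list in order.
find /tmp/sort-demo -type f -name 'file-A' | sort
```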
Next, let's look at finding files based on their modification times. This will be helpful when creating backups or organizing files in chronological order. To do this, we will first create a reference file against which we will compare modification times:
[me@linuxbox ~]$ touch playground/timestamp
This creates an empty file named timestamp and sets its modification time to the current time. We can verify this by using another handy command, stat, which is a kind of
souped-up version of ls. The stat command reveals all that the system understands
about a file and its attributes:
[me@linuxbox ~]$ stat playground/timestamp
File: `playground/timestamp'
Size: 0         Blocks: 0          IO Block: 4096   regular empty file
Device: 803h/2051d    Inode: 14265061    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1001/ me)   Gid: ( 1001/ me)
Access: ...
Modify: ...
Change: ...
If we touch the file again and then examine it with stat, we will see that the file's times have been updated:
[me@linuxbox ~]$ touch playground/timestamp
[me@linuxbox ~]$ stat playground/timestamp
File: `playground/timestamp'
Size: 0         Blocks: 0          IO Block: 4096   regular empty file
Device: 803h/2051d    Inode: 14265061    Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1001/ me)   Gid: ( 1001/ me)
Access: 2008-10-08 15:23:33.000000000 -0400
Modify: 2008-10-08 15:23:33.000000000 -0400
Change: 2008-10-08 15:23:33.000000000 -0400
Next, let's use find to update some of our playground files:

[me@linuxbox ~]$ find playground -name 'file-B' -exec touch '{}' ';'

This updates all files in the playground named file-B. Next we'll use find to identify the updated files by comparing all the files to the reference file timestamp:
[me@linuxbox ~]$ find playground -type f -newer playground/timestamp
The results contain all 100 instances of file-B. Since we performed a touch on all the
files in the playground named file-B after we updated timestamp, they are now
newer than timestamp and thus can be identified with the -newer test.
Finally, let's go back to the bad-permissions test we performed earlier and apply it to playground:

[me@linuxbox ~]$ find playground \( -type f -not -perm 0600 \) -or \( -type d -not -perm 0700 \)
This command lists all 100 directories and 2,600 files in playground (as well as timestamp and playground itself, for a total of 2,702) because none of them meets our definition of "good permissions." With our knowledge of operators and actions, we can add actions to this command to apply new permissions to the files and directories in our playground:
[me@linuxbox ~]$ find playground \( -type f -not -perm 0600 -exec
chmod 0600 '{}' ';' \) -or \( -type d -not -perm 0700 -exec chmod
0700 '{}' ';' \)
On a day-to-day basis, we might find it easier to issue two commands, one for the directories and one for the files, rather than this one large compound command, but it's nice to know that we can do it this way. The important point here is to understand how the operators and actions can be used together to perform useful tasks.
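For reference, the two-command version might look like the following sketch, run here against a disposable copy rather than the real playground (the /tmp paths are illustrative):

```shell
# A disposable stand-in for playground, with deliberately loose permissions.
mkdir -p /tmp/perm-demo/dir-001
touch /tmp/perm-demo/dir-001/file-A
chmod 0644 /tmp/perm-demo/dir-001/file-A
chmod 0755 /tmp/perm-demo/dir-001

# One pass for the files, one for the directories.
find /tmp/perm-demo -type f -not -perm 0600 -exec chmod 0600 '{}' +
find /tmp/perm-demo -type d -not -perm 0700 -exec chmod 0700 '{}' +
```

Note that the trailing + batches the chmod invocations, as described in the "Improving Efficiency" section.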
Options
Finally, we have the options. The options are used to control the scope of a find search.
They may be included with other tests and actions when constructing find expressions.
Here is a list of the most commonly used ones:
Table 17-7: find Options
Option              Description
-depth              Direct find to process a directory's files before the directory itself. This option is automatically applied when the -delete action is specified.
-maxdepth levels    Set the maximum number of levels that find will descend into a directory tree when performing tests and actions.
-mindepth levels    Set the minimum number of levels that find will descend into a directory tree before applying tests and actions.
-mount              Direct find not to traverse directories that are mounted on other file systems.
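A quick sketch of how -maxdepth limits a search (the paths are illustrative):

```shell
# One file at the top level, one a level deeper.
mkdir -p /tmp/depth-demo/sub
touch /tmp/depth-demo/top.conf /tmp/depth-demo/sub/deep.conf

# With -maxdepth 1, find stays at the top level; deep.conf is not reported.
find /tmp/depth-demo -maxdepth 1 -name '*.conf'
```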
Summing Up
It's easy to see that locate is as simple as find is complicated. They both have their uses. Take the time to explore the many features of find. It can, with regular use, improve your understanding of Linux file system operations.
Further Reading
The locate, updatedb, find, and xargs programs are all part of the GNU Project's findutils package. The GNU Project provides a website with extensive online documentation, which is quite good and should be read if you are using these programs in high-security environments:
http://www.gnu.org/software/findutils/
Compressing Files
Throughout the history of computing, there has been a struggle to get the most data into
the smallest available space, whether that space be memory, storage devices, or network
bandwidth. Many of the data services that we take for granted today, such as portable music players, high definition television, or broadband Internet, owe their existence to effective data compression techniques.
Data compression is the process of removing redundancy from data. Let's consider an imaginary example. Say we had an entirely black picture file with the dimensions of 100 pixels by 100 pixels. In terms of data storage (assuming 24 bits, or 3 bytes, per pixel), the image will occupy 30,000 bytes of storage:
100 * 100 * 3 = 30,000
An image that is all one color contains entirely redundant data. If we were clever, we
could encode the data in such a way that we simply describe the fact that we have a block
of 10,000 black pixels. So, instead of storing a block of data containing 30,000 zeros (black is usually represented in image files as zero), we could compress the data into the number 10,000, followed by a zero to represent our data. Such a data compression scheme is called run-length encoding and is one of the most rudimentary compression techniques. Today's techniques are much more advanced and complex, but the basic goal remains the same: get rid of redundant data.
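The effect is easy to demonstrate with gzip, which we cover shortly. A sketch: 30,000 zero bytes, the same amount of data as our imaginary all-black image, collapse to a few dozen bytes because the data is almost entirely redundant (the /tmp path is illustrative):

```shell
# Create 30,000 zero bytes -- as much data as our imaginary image.
head -c 30000 /dev/zero > /tmp/black.dat

# Compress it; the redundant data collapses dramatically.
gzip -f /tmp/black.dat
ls -l /tmp/black.dat.gz
```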
Compression algorithms (the mathematical techniques used to carry out the compression)
fall into two general categories, lossless and lossy. Lossless compression preserves all the
data contained in the original. This means that when a file is restored from a compressed
version, the restored file is exactly the same as the original, uncompressed version. Lossy
compression, on the other hand, removes data as the compression is performed, to allow
more compression to be applied. When a lossy file is restored, it does not match the original version; rather, it is a close approximation. Examples of lossy compression are JPEG
(for images) and MP3 (for music). In our discussion, we will look exclusively at lossless
compression, since most data on computers cannot tolerate any data loss.
gzip
The gzip program is used to compress one or more files. When executed, it replaces the
original file with a compressed version of the original. The corresponding gunzip program is used to restore compressed files to their original, uncompressed form. Here is an
example:
[me@linuxbox ~]$ ls -l /etc > foo.txt
[me@linuxbox ~]$ ls -l foo.*
-rw-r--r-- 1 me me ... foo.txt
[me@linuxbox ~]$ gzip foo.txt
[me@linuxbox ~]$ ls -l foo.*
-rw-r--r-- 1 me me ... foo.txt.gz
[me@linuxbox ~]$ gunzip foo.txt
[me@linuxbox ~]$ ls -l foo.*
-rw-r--r-- 1 me me ... foo.txt
In this example, we create a text file named foo.txt from a directory listing. Next, we
run gzip, which replaces the original file with a compressed version named foo.txt.gz. In the directory listing of foo.*, we see that the original file has been replaced
with the compressed version, and that the compressed version is about one-fifth the size
of the original. We can also see that the compressed file has the same permissions and
timestamp as the original.
Next, we run the gunzip program to uncompress the file. Afterward, we can see that the compressed version of the file has been replaced with the original, again with the permissions and timestamp preserved.
Table 18-1: gzip Options

Option      Description
-c          Write output to standard output and keep the original files. May also be specified with --stdout and --to-stdout.
-d          Decompress. This causes gzip to act like gunzip. May also be specified with --decompress or --uncompress.
-f          Force compression even if a compressed version of the original file already exists. May also be specified with --force.
-h          Display usage information. May also be specified with --help.
-l          List compression statistics for each file compressed. May also be specified with --list.
-r          If one or more arguments on the command line are directories, recursively compress files contained within them. May also be specified with --recursive.
-t          Test the integrity of a compressed file. May also be specified with --test.
-v          Display verbose messages while compressing. May also be specified with --verbose.
-number     Set amount of compression. number is an integer in the range of 1 (fastest, least compression) to 9 (slowest, most compression). The values 1 and 9 may also be expressed as --fast and --best, respectively. The default value is 6.
Consider this example:

[me@linuxbox ~]$ gzip foo.txt
[me@linuxbox ~]$ gzip -tv foo.txt.gz
foo.txt.gz:  OK
[me@linuxbox ~]$ gzip -d foo.txt.gz

Here, we replaced the file foo.txt with a compressed version named foo.txt.gz. Next, we tested the integrity of the compressed version, using the -t and -v options. Finally, we decompressed the file back to its original form.
gzip can also be used in interesting ways via standard input and output:
[me@linuxbox ~]$ ls -l /etc | gzip > foo.txt.gz
If our goal were only to view the contents of a compressed text file, we could do this:
[me@linuxbox ~]$ gunzip -c foo.txt | less
Alternately, there is a program supplied with gzip, called zcat, that is equivalent to
gunzip with the -c option. It can be used like the cat command on gzip compressed
files:
[me@linuxbox ~]$ zcat foo.txt.gz | less
Tip: There is a zless program, too. It performs the same function as the pipeline
above.
bzip2
The bzip2 program, by Julian Seward, is similar to gzip, but uses a different compression algorithm that achieves higher levels of compression at the cost of compression
speed. In most regards, it works in the same fashion as gzip. A file compressed with
bzip2 is denoted with the extension .bz2:
[me@linuxbox ~]$ ls -l /etc > foo.txt
[me@linuxbox ~]$ ls -l foo.txt
-rw-r--r-- 1 me me ... foo.txt
[me@linuxbox ~]$ bzip2 foo.txt
[me@linuxbox ~]$ ls -l foo.txt.bz2
-rw-r--r-- 1 me me ... foo.txt.bz2
[me@linuxbox ~]$ bunzip2 foo.txt.bz2
As we can see, bzip2 can be used the same way as gzip. All the options (except for -r) that we discussed for gzip are also supported in bzip2. Note, however, that the compression-level option (-number) has a somewhat different meaning for bzip2.
bzip2 comes with bunzip2 and bzcat for decompressing files.
bzip2 also comes with the bzip2recover program, which will try to recover damaged .bz2 files.
Archiving Files
A common file-management task often used in conjunction with compression is archiving. Archiving is the process of gathering up many files and bundling them together into a
single large file. Archiving is often done as a part of system backups. It is also used when
old data is moved from a system to some type of long-term storage.
tar
In the Unix-like world of software, the tar program is the classic tool for archiving files.
Its name, short for tape archive, reveals its roots as a tool for making backup tapes. While
it is still used for that traditional task, it is equally adept on other storage devices as well.
We often see filenames that end with the extension .tar or .tgz, which indicate a
plain tar archive and a gzipped archive, respectively. A tar archive can consist of a
group of separate files, one or more directory hierarchies, or a mixture of both. The command syntax works like this:
tar mode[options] pathname...
where mode is one of the following operating modes (only a partial list is shown here; see
the tar man page for a complete list):
Table 18-2: tar Modes
Mode    Description
c       Create an archive from a list of files and/or directories.
x       Extract an archive.
r       Append specified pathnames to the end of an archive.
t       List the contents of an archive.
tar uses a slightly odd way of expressing options, so we'll need some examples to show how it works. First, let's re-create our playground from the previous chapter:
[me@linuxbox ~]$ mkdir -p playground/dir-{001..100}
[me@linuxbox ~]$ touch playground/dir-{001..100}/file-{A..Z}
Next, let's create an archive of the entire playground:

[me@linuxbox ~]$ tar cf playground.tar playground

This command creates a tar archive named playground.tar that contains the entire playground directory hierarchy. We can see that the mode and the f option, which is used to specify the name of the tar archive, may be joined together and do not require a leading dash. Note, however, that the mode must always be specified first, before any other option.
To list the contents of the archive, we can do this:
[me@linuxbox ~]$ tar tf playground.tar
Now, let's extract the playground in a new location. We will do this by creating a new directory named foo, changing into it, and extracting the tar archive:

[me@linuxbox ~]$ mkdir foo
[me@linuxbox ~]$ cd foo
[me@linuxbox foo]$ tar xf ../playground.tar
[me@linuxbox foo]$ ls
playground
If we examine the contents of ~/foo/playground, we see that the archive was successfully installed, creating a precise reproduction of the original files. There is one
caveat, however: Unless you are operating as the superuser, files and directories extracted
from archives take on the ownership of the user performing the restoration, rather than
the original owner.
Another interesting behavior of tar is the way it handles pathnames in archives. The default for pathnames is relative, rather than absolute. tar does this by simply removing
any leading slash from the pathname when creating the archive. To demonstrate, we will
re-create our archive, this time specifying an absolute pathname:
[me@linuxbox foo]$ cd
[me@linuxbox ~]$ tar cf playground2.tar ~/playground
Here we can see that when we extracted our second archive, it re-created the directory home/me/playground relative to our current working directory, ~/foo, not relative to the root directory, as would have been the case with an absolute pathname. This may seem like an odd way for it to work, but it's actually more useful this way, as it allows us to extract archives to any location rather than being forced to extract them to their original locations. Repeating the exercise with the inclusion of the verbose option (v) will give a clearer picture of what's going on.
Let's consider a hypothetical, yet practical, example of tar in action. Imagine we want to copy the home directory and its contents from one system to another, and we have a large USB hard drive that we can use for the transfer. On our modern Linux system, the drive is "automagically" mounted in the /media directory. Let's also imagine that the disk has a volume name of BigDisk when we attach it. To make the tar archive, we can do the following:
[me@linuxbox ~]$ sudo tar cf /media/BigDisk/home.tar /home
After the tar file is written, we unmount the drive and attach it to the second computer.
Again, it is mounted at /media/BigDisk. To extract the archive, we do this:
[me@linuxbox2 ~]$ cd /
[me@linuxbox2 /]$ sudo tar xf /media/BigDisk/home.tar
What's important to see here is that we must first change directory to /, so that the extraction is relative to the root directory, since all pathnames within the archive are relative.
When extracting an archive, it's possible to limit what is extracted. For example, if we wanted to extract a single file from an archive, it could be done like this:
tar xf archive.tar pathname
By adding the trailing pathname to the command, tar will only restore the specified file.
Multiple pathnames may be specified. Note that the pathname must be the full, exact relative pathname as stored in the archive. When specifying pathnames, wildcards are not
normally supported; however, the GNU version of tar (which is the version most often
found in Linux distributions) supports them with the --wildcards option. Here is an
example using our previous playground.tar file:
[me@linuxbox foo]$ tar xf ../playground2.tar --wildcards 'home/me/playground/dir-*/file-A'

This command will extract only files matching the specified pathname, including the wildcard dir-*.
tar is often used in conjunction with find to produce archives. In this example, we will
use find to produce a set of files to include in an archive:
[me@linuxbox ~]$ find playground -name 'file-A' -exec tar rf playground.tar '{}' '+'
Here we use find to match all the files in playground named file-A and then, using the -exec action, we invoke tar in the append mode (r) to add the matching files
to the archive playground.tar.
Using tar with find is a good way of creating incremental backups of a directory tree
or an entire system. By using find to match files newer than a timestamp file, we could
create an archive that only contains files newer than the last archive, assuming that the
timestamp file is updated right after each archive is created.
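Here is a minimal sketch of that idea, using hypothetical paths under /tmp and GNU tar's --null and --files-from options:

```shell
# A timestamp file marking the previous run, and one file changed since then.
mkdir -p /tmp/incr-demo/src
touch -d '1 hour ago' /tmp/incr-demo/last-run
touch /tmp/incr-demo/src/changed.txt

# Archive only files newer than last-run, then advance the timestamp.
find /tmp/incr-demo/src -type f -newer /tmp/incr-demo/last-run -print0 \
    | tar cf /tmp/incr-demo/incr.tar --null --files-from=-
touch /tmp/incr-demo/last-run
```

The -print0/--null pairing keeps unusual filenames intact, just as with xargs.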
tar can also make use of both standard input and output. Here is a comprehensive example:
[me@linuxbox foo]$ cd
[me@linuxbox ~]$ find playground -name 'file-A' | tar cf - --files-from=- | gzip > playground.tgz
In this example, we used the find program to produce a list of matching files and piped them into tar. If the filename - is specified, it is taken to mean standard input or output, as needed. (By the way, this convention of using - to represent standard input/output is used by a number of other programs, too.) The --files-from option (which may also be specified as -T) causes tar to read its list of pathnames from a file rather than the command line. Lastly, the archive produced by tar is piped into gzip to create the compressed archive playground.tgz. The .tgz extension is the conventional extension given to gzip-compressed tar files. The extension .tar.gz is also used sometimes.
While we used the gzip program externally to produce our compressed archive, modern versions of GNU tar support both gzip and bzip2 compression directly, with the use of the z and j options, respectively. Using our previous example as a base, we can simplify it this way:
[me@linuxbox ~]$ find playground -name 'file-A' | tar czf
playground.tgz -T -
If we had wanted to create a bzip2 compressed archive instead, we could have done this:
[me@linuxbox ~]$ find playground -name 'file-A' | tar cjf
playground.tbz -T -
By simply changing the compression option from z to j (and changing the output file's extension to .tbz to indicate a bzip2-compressed file), we enabled bzip2 compression.
Another interesting use of standard input and output with the tar command involves
transferring files between systems over a network. Imagine that we had two machines
running a Unix-like system equipped with tar and ssh. In such a scenario, we could
transfer a directory from a remote system (named remote-sys for this example) to our
local system:
[me@linuxbox ~]$ mkdir remote-stuff
[me@linuxbox ~]$ cd remote-stuff
[me@linuxbox remote-stuff]$ ssh remote-sys 'tar cf - Documents' | tar xf -
me@remote-sys's password:
[me@linuxbox remote-stuff]$ ls
Documents
Here we were able to copy a directory named Documents from the remote system remote-sys to a directory within the directory named remote-stuff on the local system. How did we do this? First, we launched the tar program on the remote system using ssh. You will recall that ssh allows us to execute a program remotely on a networked computer and see the results on the local system: the standard output produced on the remote system is sent to the local system for viewing. We can take advantage of this by having tar create an archive (the c mode) and send it to standard output, rather than a file (the f option with the dash argument), thereby transporting the archive over the encrypted tunnel provided by ssh to the local system. On the local system, we execute tar and have it expand an archive (the x mode) supplied from standard input (again, the f option with the dash argument).
zip
The zip program is both a compression tool and an archiver. The file format used by the
program is familiar to Windows users, as it reads and writes .zip files. In Linux, however, gzip is the predominant compression program with bzip2 being a close second.
In its most basic usage, zip is invoked like this:
zip options zipfile file...
For example, to make a zip archive of our playground, we would do this:
[me@linuxbox ~]$ zip -r playground.zip playground
Unless we include the -r option for recursion, only the playground directory (but
none of its contents) is stored. Although the addition of the extension .zip is automatic,
we will include the file extension for clarity.
During the creation of the zip archive, zip will normally display a series of messages
like this:
adding: playground/dir-020/file-Z (stored 0%)
adding: playground/dir-020/file-Y (stored 0%)
adding: playground/dir-020/file-X (stored 0%)
adding: playground/dir-087/ (stored 0%)
adding: playground/dir-087/file-S (stored 0%)
These messages show the status of each file added to the archive. zip will add files to the archive using one of two storage methods: either it will "store" a file without compression, as shown here, or it will "deflate" the file, which performs compression. The numeric value displayed after the storage method indicates the amount of compression achieved. Since our playground only contains empty files, no compression is performed on its contents.
Extracting the contents of a zip file is straightforward when using the unzip program:
[me@linuxbox ~]$ cd foo
[me@linuxbox foo]$ unzip ../playground.zip
One thing to note about zip (as opposed to tar) is that if an existing archive is specified, it is updated rather than replaced. This means that the existing archive is preserved,
but new files are added and matching files are replaced.
Files may be listed and extracted selectively from a zip archive by specifying them to
unzip:
[me@linuxbox ~]$ unzip -l playground.zip playground/dir-087/file-Z
Archive: ../playground.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  10-05-08 09:25   playground/dir-087/file-Z
 --------                   -------
        0                   1 file
[me@linuxbox ~]$ cd foo
[me@linuxbox foo]$ unzip ../playground.zip playground/dir-087/file-Z
Archive: ../playground.zip
replace playground/dir-087/file-Z? [y]es, [n]o, [A]ll, [N]one,
[r]ename: y
extracting: playground/dir-087/file-Z
Using the -l option causes unzip to merely list the contents of the archive without extracting the file. If no file(s) are specified, unzip will list all files in the archive. The -v
option can be added to increase the verbosity of the listing. Note that when the archive
extraction conflicts with an existing file, the user is prompted before the file is replaced.
Like tar, zip can make use of standard input and output, though its implementation is
somewhat less useful. It is possible to pipe a list of filenames to zip via the -@ option:
[me@linuxbox foo]$ cd
[me@linuxbox ~]$ find playground -name "file-A" | zip -@ file-A.zip
Here we use find to generate a list of files matching the test -name "file-A", and
then pipe the list into zip, which creates the archive file-A.zip containing the selected files.
zip also supports writing its output to standard output, but its use is limited because very
few programs can make use of the output. Unfortunately, the unzip program does not
accept standard input. This prevents zip and unzip from being used together to perform network file copying like tar.
zip can, however, accept standard input, so it can be used to compress the output of other programs:

[me@linuxbox ~]$ ls -l /etc/ | zip ls-etc.zip -
In this example, we pipe the output of ls into zip. Like tar, zip interprets the trailing dash as "use standard input for the input file."
The unzip program allows its output to be sent to standard output when the -p (for
pipe) option is specified:
[me@linuxbox ~]$ unzip -p ls-etc.zip | less
We touched on some of the basic things that zip/unzip can do. They both have a lot of options that add to their flexibility, though some are specific to other platforms. The man pages for both zip and unzip are pretty good and contain useful examples. However, the main use of these programs is for exchanging files with Windows systems, rather than performing compression and archiving on Linux, where tar and gzip are greatly preferred.
Note that either the source or the destination must be a local file. Remote-to-remote copying is not supported.
Next, we'll synchronize the playground directory with a corresponding copy in foo:

[me@linuxbox ~]$ rsync -av playground foo
building file list ... done
...
sent ... bytes  received ... bytes  387258.00 bytes/sec

rsync displays a lengthy list of the files being copied, followed by a summary indicating the amount of copying performed. If we run the command again, we will see a different result:
[me@linuxbox ~]$ rsync -av playground foo
building file list ... done
sent 22635 bytes  received 20 bytes  45310.00 bytes/sec
total size is 3230  speedup is 0.14
Notice that there was no listing of files. This is because rsync detected that there were
no differences between ~/playground and ~/foo/playground, and therefore it
didn't need to copy anything. If we modify a file in playground and run rsync again:
[me@linuxbox ~]$ touch playground/dir-099/file-Z
[me@linuxbox ~]$ rsync -av playground foo
building file list ... done
playground/dir-099/file-Z
sent 22685 bytes received 42 bytes 45454.00 bytes/sec
total size is 3230 speedup is 0.14
[me@linuxbox ~]$ sudo rsync -av --delete /etc /home /usr/local /media/BigDisk/backup
In this example, we copied the /etc, /home, and /usr/local directories from our
system to our imaginary storage device. We included the --delete option to remove
files that may have existed on the backup device that no longer existed on the source device (this is irrelevant the first time we make a backup, but will be useful on subsequent
copies). Repeating the procedure of attaching the external drive and running this rsync
command would be a useful (though not ideal) way of keeping a small system backed up.
Of course, an alias would be helpful here, too. We could create an alias and add it to our
.bashrc file to provide this feature:
alias backup='sudo rsync -av --delete /etc /home /usr/local
/media/BigDisk/backup'
Now all we have to do is attach our external drive and run the backup command to do
the job.
[me@linuxbox ~]$ sudo rsync -av --delete --rsh=ssh /etc /home /usr/local remote-sys:/backup
We made two changes to our command to facilitate the network copy. First, we added the
--rsh=ssh option, which instructs rsync to use the ssh program as its remote shell.
In this way, we were able to use an ssh encrypted tunnel to securely transfer the data from
the local system to the remote host. Second, we specified the remote host by prefixing its
name (in this case the remote host is named remote-sys) to the destination pathname.
The second way that rsync can be used to synchronize files over a network is by using
an rsync server. rsync can be configured to run as a daemon and listen for incoming requests for synchronization. This is often done to allow mirroring of a remote system. For
example, Red Hat Software maintains a large repository of software packages under development for its Fedora distribution. It is useful for software testers to mirror this collection during the testing phase of the distribution release cycle. Since files in the repository
change frequently (often more than once a day), it is desirable to maintain a local mirror
by periodic synchronization, rather than by bulk copying of the repository. One of these
repositories is kept at Georgia Tech; we could mirror it using our local copy of rsync
and their rsync server like this:
[me@linuxbox ~]$ mkdir fedora-devel
[me@linuxbox ~]$ rsync -av --delete rsync://rsync.gtlib.gatech.edu/fedora-linux-core/development/i386/os fedora-devel
In this example, we use the URI of the remote rsync server, which consists of a protocol
(rsync://), followed by the remote hostname (rsync.gtlib.gatech.edu), followed by the pathname of the repository.
Summing Up
We've looked at the common compression and archiving programs used on Linux and
other Unix-like operating systems. For archiving files, the tar/gzip combination is the
preferred method on Unix-like systems while zip/unzip is used for interoperability
with Windows systems. Finally, we looked at the rsync program (a personal favorite)
which is very handy for efficient synchronization of files and directories across systems.
Further Reading
The man pages for all of the commands discussed here are pretty clear and contain useful examples. In addition, the GNU Project has a good online manual for
its version of tar.
19 Regular Expressions
In the next few chapters, we are going to look at tools used to manipulate text. As we
have seen, text data plays an important role on all Unix-like systems, such as Linux. But
before we can fully appreciate all of the features offered by these tools, we have to first
examine a technology that is frequently associated with the most sophisticated uses of
these tools: regular expressions.
As we have navigated the many features and facilities offered by the command line, we
have encountered some truly arcane shell features and commands, such as shell expansion and quoting, keyboard shortcuts, and command history, not to mention the vi editor.
Regular expressions continue this tradition and may be (arguably) the most arcane feature of them all. This is not to suggest that the time it takes to learn about them is not
worth the effort. Quite the contrary. A good understanding will enable us to perform
amazing feats, though their full value may not be immediately apparent.
grep
The main program we will use to work with regular expressions is our old pal, grep.
The name grep is actually derived from the phrase global regular expression print, so
we can see that grep has something to do with regular expressions. In essence, grep
searches text files for the occurrence of a specified regular expression and outputs any
line containing a match to standard output.
So far, we have used grep with fixed strings, like so:
[me@linuxbox ~]$ ls /usr/bin | grep zip
This will list all the files in the /usr/bin directory whose names contain the substring
zip.
The grep program accepts options and arguments this way:
grep [options] regex [file...]
where regex is a regular expression.
Here is a list of the commonly used grep options:
Table 19-1: grep Options

Option   Description
-i       Ignore case. Do not distinguish between uppercase and lowercase
         characters. May also be specified --ignore-case.
-v       Invert match. Normally, grep prints lines that contain a match.
         This option causes grep to print every line that does not contain
         a match. May also be specified --invert-match.
-c       Print the number of matches (or non-matches if the -v option is
         also specified) instead of the lines themselves. May also be
         specified --count.
-l       Print the name of each file that contains a match instead of the
         lines themselves. May also be specified --files-with-matches.
-L       Like the -l option, but print only the names of files that do not
         contain matches. May also be specified --files-without-match.
-n       Prefix each matching line with the number of the line within the
         file. May also be specified --line-number.
-h       For multi-file searches, suppress the output of filenames. May
         also be specified --no-filename.
In order to more fully explore grep, let's create some text files to search:
[me@linuxbox ~]$ ls /bin > dirlist-bin.txt
[me@linuxbox ~]$ ls /usr/bin > dirlist-usr-bin.txt
[me@linuxbox ~]$ ls /sbin > dirlist-sbin.txt
[me@linuxbox ~]$ ls /usr/sbin > dirlist-usr-sbin.txt
[me@linuxbox ~]$ ls dirlist*.txt
dirlist-bin.txt
dirlist-sbin.txt
dirlist-usr-sbin.txt
dirlist-usr-bin.txt
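A self-contained sketch of the search being described (a small stand-in directory listing replaces the real one, so the exact matches here are illustrative):

```shell
# Search every dirlist file for lines containing the string "bzip".
# grep prefixes each match with the name of the file it came from.
cd "$(mktemp -d)"
printf 'bzip2\nbzip2recover\nzip\n' > dirlist-bin.txt
printf 'mkfs\n' > dirlist-sbin.txt
grep bzip dirlist*.txt
```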
In this example, grep searches all of the listed files for the string bzip and finds two
matches, both in the file dirlist-bin.txt. If we were only interested in the list of
files that contained matches rather than the matches themselves, we could specify the -l
option:
[me@linuxbox ~]$ grep -l bzip dirlist*.txt
dirlist-bin.txt
Conversely, if we wanted only to see a list of the files that did not contain a match, we
could do this:
[me@linuxbox ~]$ grep -L bzip dirlist*.txt
dirlist-sbin.txt
dirlist-usr-bin.txt
dirlist-usr-sbin.txt
In addition to literal characters, regular expressions may also include metacharacters that are used to specify more complex matches. Regular expression
metacharacters consist of the following:
^ $ . [ ] { } - ? * + ( ) | \
All other characters are considered literals, though the backslash character is used in a
few cases to create meta sequences, as well as allowing the metacharacters to be escaped
and treated as literals instead of being interpreted as metacharacters.
Note: As we can see, many of the regular expression metacharacters are also characters that have meaning to the shell when expansion is performed. When we pass
regular expressions containing metacharacters on the command line, it is vital that
they be enclosed in quotes to prevent the shell from attempting to expand them.
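A self-contained sketch of the search discussed next (a stand-in file keeps it runnable anywhere; the dot matches any single character, so '.zip' requires some character before "zip"):

```shell
# Lines containing any single character followed by "zip" match;
# the bare name "zip" does not, since nothing precedes it.
cd "$(mktemp -d)"
printf 'bunzip2\ngunzip\nzip\nzipinfo\n' > dirlist-bin.txt
grep -h '.zip' dirlist-bin.txt
```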
We searched for any line in our files that matches the regular expression .zip. There are
a couple of interesting things to note about the results. Notice that the zip program was
not found. This is because the inclusion of the dot metacharacter in our regular expression
increased the length of the required match to four characters, and because the name zip
only contains three, it does not match. Also, if any files in our lists had contained the file
extension .zip, they would have been matched as well, because the period character in
the file extension is treated as any character, too.
Anchors
The caret (^) and dollar sign ($) characters are treated as anchors in regular expressions.
This means that they cause the match to occur only if the regular expression is found at
the beginning of the line (^) or at the end of the line ($):
[me@linuxbox ~]$ grep -h '^zip' dirlist*.txt
zip
zipcloak
zipgrep
zipinfo
zipnote
zipsplit
[me@linuxbox ~]$ grep -h 'zip$' dirlist*.txt
gunzip
gzip
funzip
gpg-zip
preunzip
prezip
unzip
zip
[me@linuxbox ~]$ grep -h '^zip$' dirlist*.txt
zip
Here we searched the list of files for the string zip located at the beginning of the line,
the end of the line, and on a line where it is at both the beginning and the end of the line
(i.e., by itself on the line). Note that the regular expression ^$ (a beginning and an end
with nothing in between) will match blank lines.
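A sketch of the kind of search being described, using a small stand-in word list (the book searches the system dictionary, typically /usr/share/dict/words, which may not be installed everywhere):

```shell
# ^..j.r$ anchors the match to the whole line: exactly five characters,
# with "j" in the third position and "r" in the last.
cd "$(mktemp -d)"
printf 'major\nminor\nmajors\nrajah\n' > words
grep -i '^..j.r$' words
```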
Using this regular expression, we can find all the words in our dictionary file that
are five letters long and have a j in the third position and an r in the last position.
Negation
If the first character in a bracket expression is a caret (^), the remaining characters are
taken to be a set of characters that must not be present at the given character position. We
do this by modifying our previous example:
[me@linuxbox ~]$ grep -h '[^bg]zip' dirlist*.txt
bunzip2
gunzip
With negation activated, we get a list of files that contain the string zip preceded by any
character except b or g. Notice that the file zip was not found. A negated character
set still requires a character at the given position, but the character must not be a member
of the negated set.
The caret character only invokes negation if it is the first character within a bracket expression; otherwise, it loses its special meaning and becomes an ordinary character in the
set.
It's just a matter of putting all 26 uppercase letters in a bracket expression. But the idea of
all that typing is deeply troubling, so there is another way:
[me@linuxbox ~]$ grep -h '^[A-Z]' dirlist*.txt
MAKEDEV
ControlPanel
GET
HEAD
POST
X
X11
Xorg
MAKEFLOPPIES
NetworkManager
NetworkManagerDispatcher
By using a three-character range, we can abbreviate the 26 letters. Any range of characters can be expressed this way, including multiple ranges, such as this expression that
matches all filenames starting with letters and numbers:
[me@linuxbox ~]$ grep -h '^[A-Za-z0-9]' dirlist*.txt
In character ranges, we see that the dash character is treated specially, so how do we actually include a dash character in a bracket expression? By making it the first character in
the expression. Consider these two examples:
[me@linuxbox ~]$ ls /usr/sbin/[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*
(Depending on the Linux distribution, we will get a different list of files, possibly an
empty list. This example is from Ubuntu). This command produces the expected result
a list of only the files whose names begin with an uppercase letter, but:
[me@linuxbox ~]$ ls /usr/sbin/[A-Z]*
with this command we get an entirely different result (only a partial listing of the results
is shown). Why is that? It's a long story, but here's the short version:
Back when Unix was first developed, it only knew about ASCII characters, and this feature reflects that fact. In ASCII, the first 32 characters (numbers 0-31) are control codes
(things like tabs, backspaces, and carriage returns). The next 32 (32-63) contain printable
characters, including most punctuation characters and the numerals zero through nine.
The next 32 (numbers 64-95) contain the uppercase letters and a few more punctuation
symbols. The final 32 (numbers 96-127) contain the lowercase letters and yet more punctuation symbols. Based on this arrangement, systems using ASCII used a collation order
that looked like this:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
This differs from proper dictionary order, which is like this:
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
As the popularity of Unix spread beyond the United States, there grew a need to support
characters not found in U.S. English. The ASCII table was expanded to use a full eight
bits, adding characters numbers 128-255, which accommodated many more languages.
To support this ability, the POSIX standards introduced a concept called a locale, which
could be adjusted to select the character set needed for a particular location. We can see
the language setting of our system using this command:
[me@linuxbox ~]$ echo $LANG
en_US.UTF-8
With this setting, POSIX compliant applications will use a dictionary collation order
rather than ASCII order. This explains the behavior of the commands above. A character
range of [A-Z] when interpreted in dictionary order includes all of the alphabetic characters except the lowercase a, hence our results.
To partially work around this problem, the POSIX standard includes a number of character classes, which provide useful ranges of characters. They are described in the table below:
Table 19-2: POSIX Character Classes

Character Class   Description
[:alnum:]         The alphanumeric characters. In ASCII, equivalent to [A-Za-z0-9].
[:word:]          The same as [:alnum:], with the addition of the underscore (_) character.
[:alpha:]         The alphabetic characters. In ASCII, equivalent to [A-Za-z].
[:blank:]         Includes the space and tab characters.
[:cntrl:]         The ASCII control codes. Includes the ASCII characters zero through 31 and 127.
[:digit:]         The numerals zero through nine.
[:graph:]         The visible characters. In ASCII, includes characters 33 through 126.
[:lower:]         The lowercase letters.
[:punct:]         The punctuation characters. In ASCII, equivalent to [-!"#$%&'()*+,./:;<=>?@[\]_`{|}~].
[:print:]         The printable characters. All the characters in [:graph:] plus the space character.
[:space:]         The whitespace characters, including space, tab, carriage return, newline, vertical tab, and form feed. In ASCII, equivalent to [ \t\r\n\v\f].
[:upper:]         The uppercase characters.
[:xdigit:]        Characters used to express hexadecimal numbers. In ASCII, equivalent to [0-9A-Fa-f].
Even with the character classes, there is still no convenient way to express partial ranges,
such as [A-M].
Using character classes, we can repeat our directory listing and see an improved result:
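A self-contained sketch of the improved listing, on throwaway files: a POSIX character class matches only uppercase letters, regardless of the locale's collation order.

```shell
# [[:upper:]] in a glob matches one uppercase character, so only the
# files whose names begin with an uppercase letter are listed.
cd "$(mktemp -d)"
touch MAKEFLOPPIES NetworkManager biosdecode chat
ls [[:upper:]]*
```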
Remember, however, that this is not an example of a regular expression; rather, it is the
shell performing pathname expansion. We show it here because POSIX character classes
can be used for both.
export LANG=POSIX
POSIX
During the 1980s, Unix became a very popular commercial operating system, but
by 1988, the Unix world was in turmoil. Many computer manufacturers had licensed the Unix source code from its creators, AT&T, and were supplying various
versions of the operating system with their systems. However, in their efforts to
create product differentiation, each manufacturer added proprietary changes and
extensions. This started to limit the compatibility of the software. As always with
proprietary vendors, each was trying to play a winning game of "lock-in" with
their customers. This dark time in the history of Unix is known today as "the
Balkanization."
Enter the IEEE (Institute of Electrical and Electronics Engineers). In the mid-1980s, the IEEE began developing a set of standards that would define how Unix
(and Unix-like) systems would perform. These standards, formally known as
IEEE 1003, define the application programming interfaces (APIs), shell and utilities that are to be found on a standard Unix-like system. The name POSIX,
which stands for Portable Operating System Interface (with the X added to the
end for extra snappiness), was suggested by Richard Stallman (yes, that Richard
Stallman), and was adopted by the IEEE.
Alternation
The first of the extended regular expression features we will discuss is called alternation,
which is the facility that allows a match to occur from among a set of expressions. Just as
a bracket expression allows a single character to match from a set of specified characters,
alternation allows matches from a set of strings or other regular expressions.
To demonstrate, we'll use grep in conjunction with echo. First, let's try a plain old
string match:
[me@linuxbox ~]$ echo "AAA" | grep AAA
AAA
[me@linuxbox ~]$ echo "BBB" | grep AAA
[me@linuxbox ~]$
A pretty straightforward example, in which we pipe the output of echo into grep and
see the results. When a match occurs, we see it printed out; when no match occurs, we
see no results.
Now we'll add alternation, signified by the vertical-bar metacharacter:
[me@linuxbox ~]$ echo "AAA" | grep -E 'AAA|BBB'
AAA
[me@linuxbox ~]$ echo "BBB" | grep -E 'AAA|BBB'
BBB
[me@linuxbox ~]$ echo "CCC" | grep -E 'AAA|BBB'
[me@linuxbox ~]$
Here we see the regular expression 'AAA|BBB', which means match either the string
AAA or the string BBB. Notice that since this is an extended feature, we added the -E
option to grep (though we could have just used the egrep program instead), and we
enclosed the regular expression in quotes to prevent the shell from interpreting the vertical-bar metacharacter as a pipe operator. Alternation is not limited to two choices:
[me@linuxbox ~]$ echo "AAA" | grep -E 'AAA|BBB|CCC'
AAA
To combine alternation with other regular expression elements, we can use () to separate
the alternation:
[me@linuxbox ~]$ grep -Eh '^(bz|gz|zip)' dirlist*.txt
This expression will match the filenames in our lists that start with either bz, gz, or
zip. Had we left off the parentheses, the meaning of this regular expression:
[me@linuxbox ~]$ grep -Eh '^bz|gz|zip' dirlist*.txt
changes to match any filename that begins with bz, or contains gz, or contains zip.
Quantifiers
Extended regular expressions support several ways to specify the number of times an element is matched. The first is the ? metacharacter, which matches an element zero or one time. For example, to match a U.S. phone number written either as (nnn) nnn-nnnn or as nnn nnn-nnnn, we could construct a regular expression like this:

^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$
In this expression, we follow the parentheses characters with question marks to indicate
that they are to be matched zero or one time. Again, since the parentheses are normally
metacharacters (in ERE), we precede them with backslashes to cause them to be treated
as literals instead.
Let's try it:
[me@linuxbox ~]$ echo "(555) 123-4567" | grep -E '^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'
(555) 123-4567
[me@linuxbox ~]$ echo "555 123-4567" | grep -E '^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'
555 123-4567
[me@linuxbox ~]$ echo "AAA 123-4567" | grep -E '^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'
[me@linuxbox ~]$
Here we see that the expression matches both forms of the phone number, but does not
match one containing non-numeric characters.
Next, let's look at the * metacharacter, which matches an element zero or more times. Suppose we wanted to see if a line of text was a crudely defined sentence: a string that starts with an uppercase letter, followed by any number of upper- and lowercase letters and spaces, and ends with a period. We could use this regular expression:

[[:upper:]][[:upper:][:lower:] ]*\.
The expression consists of three items: a bracket expression containing the [:upper:]
character class, a bracket expression containing both the [:upper:] and [:lower:]
character classes and a space, and a period escaped with a backslash. The second element
is trailed with an * metacharacter, so that after the leading uppercase letter in our sentence, any number of upper and lowercase letters and spaces may follow it and still
match:
[me@linuxbox ~]$ echo "This works." | grep -E '[[:upper:]][[:upper:][:lower:] ]*\.'
This works.
[me@linuxbox ~]$ echo "This Works." | grep -E '[[:upper:]][[:upper:][:lower:] ]*\.'
This Works.
[me@linuxbox ~]$ echo "this does not" | grep -E '[[:upper:]][[:upper:][:lower:] ]*\.'
[me@linuxbox ~]$
The expression matches the first two tests, but not the third, since it lacks the required
leading uppercase character and trailing period.
Next, the + metacharacter matches an element one or more times. For example, this expression matches lines consisting of groups of one or more alphabetic characters separated by single spaces:

[me@linuxbox ~]$ echo "This that" | grep -E '^([[:alpha:]]+ ?)+$'
This that
[me@linuxbox ~]$ echo "a b c" | grep -E '^([[:alpha:]]+ ?)+$'
a b c
[me@linuxbox ~]$ echo "a b 9" | grep -E '^([[:alpha:]]+ ?)+$'
[me@linuxbox ~]$ echo "abc  d" | grep -E '^([[:alpha:]]+ ?)+$'
[me@linuxbox ~]$
We see that this expression does not match the line a b 9, because it contains a non-alphabetic character; nor does it match abc d, because more than one space character
separates the characters c and d.
Finally, curly braces are used to match an element a specific number of times, according to these specifiers:

Specifier   Meaning
{n}         Match the preceding element if it occurs exactly n times.
{n,m}       Match the preceding element if it occurs at least n times, but no more than m times.
{n,}        Match the preceding element if it occurs n or more times.
{,m}        Match the preceding element if it occurs no more than m times.
Going back to our earlier example with the phone numbers, we can use this method of
specifying repetitions to simplify our original regular expression from:
^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$

to:

^\(?[0-9]{3}\)? [0-9]{3}-[0-9]{4}$
As we can see, our revised expression can successfully validate numbers both with and
without the parentheses, while rejecting those numbers that are not properly formatted.
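A hedged reconstruction of the generator command being described (bash-specific: it relies on the RANDOM variable and brace expansion):

```shell
# Append ten random "phone numbers" to phonelist.txt. Some come out
# malformed because ${RANDOM:0:3} may yield fewer than three digits,
# which is exactly what we want for a validation exercise.
for i in {1..10}; do
    echo "(${RANDOM:0:3}) ${RANDOM:0:3}-${RANDOM:0:4}" >> phonelist.txt
done
```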
This command will produce a file named phonelist.txt containing ten phone numbers. Each time the command is repeated, another ten numbers are added to the list. We
can also change the value 10 near the beginning of the command to produce more or
fewer phone numbers. If we examine the contents of the file, however, we see we have a
problem:
[me@linuxbox ~]$ cat phonelist.txt
(232) 298-2265
(624) 381-1078
(540) 126-1980
(874) 163-2885
(286) 254-2860
(292) 108-518
(129) 44-1379
(458) 273-1642
(686) 299-8268
(198) 307-2440
Some of the numbers are malformed, which is perfect for our purposes, since we will use
grep to validate them.
One useful method of validation would be to scan the file for invalid numbers and display
the resulting list:
[me@linuxbox ~]$ grep -Ev '^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$' phonelist.txt
(292) 108-518
(129) 44-1379
[me@linuxbox ~]$
Here we use the -v option to produce an inverse match so that we will only output the
lines in the list that do not match the specified expression. The expression itself includes
the anchor metacharacters at each end to ensure that the number has no extra characters at
either end. This expression also requires that the parentheses be present in a valid number, unlike our earlier phone number example.
Due to the requirement for an exact match of the entire pathname, we use .* at both ends
of the expression to match zero or more instances of any character. In the middle of the
expression, we use a negated bracket expression containing our set of acceptable pathname characters.
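A hedged sketch of such a find command (the character set shown is an assumption about which pathname characters count as acceptable):

```shell
# .* at each end lets the expression cover the entire pathname, as
# find -regex requires; the negated bracket expression flags any
# pathname containing a character outside the "portable" set.
find . -regex '.*[^-_./0-9a-zA-Z].*'
```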
Using alternation, we perform a search for pathnames that contain either bin/bz,
bin/gz, or bin/zip.
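A self-contained sketch of the same alternation, shown here with grep -E against a synthetic path list (the original example used the locate program, whose results depend on the system's file database):

```shell
# bin/(bz|gz|zip) matches any path where "bin/" is immediately
# followed by one of the three alternatives.
cd "$(mktemp -d)"
printf '/usr/bin/bzip2\n/usr/bin/gzip\n/usr/bin/zip\n/usr/bin/cat\n' > paths.txt
grep -E 'bin/(bz|gz|zip)' paths.txt
```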
[me@linuxbox ~]$ less phonelist.txt
Then, within less, we can search for the valid pattern:
/\([0-9]{3}\) [0-9]{3}-[0-9]{4}
less will highlight the strings that match, leaving the invalid ones easy to spot:
(232) 298-2265
(624) 381-1078
(540) 126-1980
(874) 163-2885
(286) 254-2860
(292) 108-518
(129) 44-1379
(458) 273-1642
(686) 299-8268
(198) 307-2440
~
~
~
(END)
vim, on the other hand, supports basic regular expressions, so our search expression
would look like this:
/([0-9]\{3\}) [0-9]\{3\}-[0-9]\{4\}
We can see that the expression is mostly the same; however, many of the characters that
are considered metacharacters in extended expressions are considered literals in basic expressions. They are only treated as metacharacters when escaped with a backslash.
Summing Up
In this chapter, we've seen a few of the many uses of regular expressions. We can find
even more if we use regular expressions to search for additional applications that use
them. We can do that by searching the man pages:
[me@linuxbox ~]$ cd /usr/share/man/man1
[me@linuxbox man1]$ zgrep -El 'regex|regular expression' *.gz
The zgrep program provides a front end for grep, allowing it to read compressed files.
In our example, we search the compressed section one man page files in their usual location. The result of this command is a list of files containing either the string regex or
regular expression. As we can see, regular expressions show up in a lot of programs.
There is one feature found in basic regular expressions that we did not cover. Called back
references, this feature will be discussed in the next chapter.
Further Reading
There are many online resources for learning regular expressions, including various tutorials and cheat sheets.
In addition, Wikipedia has good articles on the following background topics:
POSIX: http://en.wikipedia.org/wiki/Posix
ASCII: http://en.wikipedia.org/wiki/Ascii
20 Text Processing
All Unix-like operating systems rely heavily on text files for several types of data storage. So it makes sense that there are many tools for manipulating text. In this chapter, we
will look at programs that are used to slice and dice text. In the next chapter, we will
look at more text processing, focusing on programs that are used to format text for printing and other kinds of human consumption.
This chapter will revisit some old friends and introduce us to some new ones.
Applications Of Text
So far, we have learned a couple of text editors (nano and vim), looked at a bunch of
configuration files, and have witnessed the output of dozens of commands, all in text. But
what else is text used for? For many things, it turns out.
Documents
Many people write documents using plain text formats. While it is easy to see how a
small text file could be useful for keeping simple notes, it is also possible to write large
documents in text format, as well. One popular approach is to write a large document in a
text format and then use a markup language to describe the formatting of the finished
document. Many scientific papers are written using this method, as Unix-based text processing systems were among the first systems that supported the advanced typographical
layout needed by writers in technical disciplines.
Web Pages
The world's most popular type of electronic document is probably the web page. Web
pages are text documents that use either HTML (Hypertext Markup Language) or XML
(Extensible Markup Language) as markup languages to describe the document's visual
format.
Email
Email is an intrinsically text-based medium. Even non-text attachments are converted
into a text representation for transmission. We can see this for ourselves by downloading
an email message and then viewing it in less. We will see that the message begins with
a header that describes the source of the message and the processing it received during its
journey, followed by the body of the message with its content.
Printer Output
On Unix-like systems, output destined for a printer is sent as plain text or, if the page
contains graphics, is converted into a text format page description language known as
PostScript, which is then sent to a program that generates the graphic dots to be printed.
Many of the commands we covered in earlier chapters accept standard input in addition to command line arguments. We only touched on them
briefly then, but now we will take a closer look at how they can be used to perform text
processing.
cat
The cat program has a number of interesting options. Many of them are used to help
better visualize text content. One example is the -A option, which is used to display nonprinting characters in the text. There are times when we want to know if control characters are embedded in our otherwise visible text. The most common of these are tab characters (as opposed to spaces) and carriage returns, often present as end-of-line characters
in MS-DOS-style text files. Another common situation is a file containing lines of text
with trailing spaces.
Let's create a test file using cat as a primitive word processor. To do this, we'll just enter the command cat (along with specifying a file for redirected output) and type our
we have reached end-of-file. In this example, we enter a leading tab character and follow
the line with some trailing spaces:
[me@linuxbox ~]$ cat > foo.txt
The quick brown fox jumped over the lazy dog.
[me@linuxbox ~]$
Next, we will use cat with the -A option to display the text:
[me@linuxbox ~]$ cat -A foo.txt
^IThe quick brown fox jumped over the lazy dog.       $
[me@linuxbox ~]$
As we can see in the results, the tab character in our text is represented by ^I. This is a
common notation that means Control-I which, as it turns out, is the same as a tab character. We also see that a $ appears at the true end of the line, indicating that our text contains trailing spaces.
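A hedged reconstruction of the example being described, with the file contents chosen to show the effect:

```shell
# -n numbers the output lines; -s squeezes runs of blank lines down to one.
cd "$(mktemp -d)"
printf 'The quick brown fox\n\n\njumped over the lazy dog.\n' > foo.txt
cat -ns foo.txt
```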
In this example, we create a new version of our foo.txt test file, which contains two
lines of text separated by two blank lines. After processing by cat with the -ns options,
the extra blank line is removed and the remaining lines are numbered. While this is not
much of a process to perform on text, it is a process.
sort
The sort program sorts the contents of standard input, or one or more files specified on
the command line, and sends the results to standard output. Using the same technique that
we used with cat, we can demonstrate processing of standard input directly from the
keyboard:
[me@linuxbox ~]$ sort > foo.txt
c
b
a
[me@linuxbox ~]$ cat foo.txt
a
b
c
After entering the command, we type the letters c, b, and a, followed once again by
Ctrl-d to indicate end-of-file. We then view the resulting file and see that the lines now
appear in sorted order.
Since sort can accept multiple files on the command line as arguments, it is possible to
merge multiple files into a single sorted whole. For example, if we had three text files and
wanted to combine them into a single sorted file, we could do something like this:
sort file1.txt file2.txt file3.txt > final_sorted_list.txt
Table 20-1: Common sort Options

Option  Long Option               Description
-b      --ignore-leading-blanks   By default, sorting is performed on the
                                  entire line, starting with the first
                                  character in the line. This option causes
                                  sort to ignore leading spaces in lines and
                                  calculates sorting based on the first
                                  non-whitespace character on the line.
-f      --ignore-case             Make sorting case-insensitive.
-n      --numeric-sort            Perform sorting based on the numeric
                                  evaluation of a string. This allows sorting
                                  to be performed on numeric values rather
                                  than alphabetic values.
-r      --reverse                 Sort in reverse order. Results are in
                                  descending rather than ascending order.
-k      --key=field1[,field2]     Sort based on a key field located from
                                  field1 to field2 rather than the entire
                                  line.
-m      --merge                   Treat each argument as the name of a
                                  presorted file. Merge multiple files into a
                                  single sorted result without performing any
                                  additional sorting.
-o      --output=file             Send sorted output to file rather than
                                  standard output.
-t      --field-separator=char    Define the field-separator character. By
                                  default, fields are separated by spaces or
                                  tabs.
Although most of the options above are pretty self-explanatory, some are not. First, let's
look at the -n option, used for numeric sorting. With this option, it is possible to sort values based on numeric values. We can demonstrate this by sorting the results of the du
command to determine the largest users of disk space. Normally, the du command lists
the results of a summary in pathname order:
[me@linuxbox ~]$ du -s /usr/share/* | head
252     /usr/share/aclocal
96      /usr/share/acpi-support
8       /usr/share/adduser
196     /usr/share/alacarte
344     /usr/share/alsa
8       /usr/share/alsa-base
12488   /usr/share/anthy
8       /usr/share/apmd
21440   /usr/share/app-install
48      /usr/share/application-registry
In this example, we pipe the results into head to limit the results to the first ten lines. We
can produce a numerically sorted list to show the ten largest consumers of space this way:
[me@linuxbox ~]$ du -s /usr/share/* | sort -nr | head
509940  /usr/share/locale-langpack
242660  /usr/share/doc
197560  /usr/share/fonts
179144  /usr/share/gnome
146764  /usr/share/myspell
144304  /usr/share/gimp
135880  /usr/share/dict
76508   /usr/share/icons
68072   /usr/share/apps
62844   /usr/share/foomatic
By using the -nr options, we produce a reverse numerical sort, with the largest values
appearing first in the results. This sort works because the numerical values occur at the
beginning of each line. But what if we want to sort a list based on some value found
within the line? For example, the results of an ls -l:
[me@linuxbox ~]$ ls -l /usr/bin | head
total 152948
-rwxr-xr-x 1 root root  34824 2008-04-04 02:42 [
-rwxr-xr-x 1 root root 101556 2007-11-27 06:08 a2p
-rwxr-xr-x 1 root root  13036 2008-02-27 08:22 aconnect
-rwxr-xr-x 1 root root  10552 2007-08-15 10:34 acpi
-rwxr-xr-x 1 root root   3800 2008-04-14 03:51 acpi_fakekey
-rwxr-xr-x 1 root root   7536 2008-04-19 00:19 acpi_listen
-rwxr-xr-x 1 root root   3576 2008-04-29 07:57 addpart
-rwxr-xr-x 1 root root  20808 2008-01-03 18:02 addr2line
-rwxr-xr-x 1 root root 489704 2008-10-09 17:02 adept_batch
Ignoring, for the moment, that ls can sort its results by size, we could use sort to sort
this list by file size, as well:
[me@linuxbox ~]$ ls -l /usr/bin | sort -nr -k 5 | head
Many uses of sort require processing the fields within a line. Consider, for example, a line containing the two words William Shotts.
By default, sort sees this line as having two fields. The first field contains the characters:
William
and the second field contains the characters:
Shotts
meaning that whitespace characters (spaces and tabs) are used as delimiters between
fields and that the delimiters are included in the field when sorting is performed.
Looking again at a line from our ls output, we can see that a line contains eight fields
and that the fifth field is the file size:
-rwxr-xr-x 1 root root 101556 2007-11-27 06:08 a2p
For our next series of experiments, let's consider the following file containing the history
of three popular Linux distributions released from 2006 to 2008. Each line in the file has
three fields: the distribution name, version number, and date of release in
MM/DD/YYYY format:
SUSE    10.2    12/07/2006
Fedora  10      11/25/2008
SUSE    11.0    06/19/2008
Ubuntu  8.04    04/24/2008
Fedora  8       11/08/2007
SUSE    10.3    10/04/2007
Ubuntu  6.10    10/26/2006
Fedora  7       05/31/2007
Ubuntu  7.10    10/18/2007
Ubuntu  7.04    04/19/2007
SUSE    10.1    05/11/2006
Fedora  6       10/24/2006
Fedora  9       05/13/2008
Ubuntu  6.06    06/01/2006
Ubuntu  8.10    10/30/2008
Fedora  5       03/20/2006
Using a text editor (perhaps vim), we'll enter this data and name the resulting file distros.txt.
Next, we'll try sorting the file and observe the results:
[me@linuxbox ~]$ sort distros.txt
Fedora    10      11/25/2008
Fedora    5       03/20/2006
Fedora    6       10/24/2006
Fedora    7       05/31/2007
Fedora    8       11/08/2007
Fedora    9       05/13/2008
SUSE      10.1    05/11/2006
SUSE      10.2    12/07/2006
SUSE      10.3    10/04/2007
SUSE      11.0    06/19/2008
Ubuntu    6.06    06/01/2006
Ubuntu    6.10    10/26/2006
Ubuntu    7.04    04/19/2007
Ubuntu    7.10    10/18/2007
Ubuntu    8.04    04/24/2008
Ubuntu    8.10    10/30/2008
Well, it mostly worked. The problem occurs in the sorting of the Fedora version numbers.
Since a 1 comes before a 5 in the character set, version 10 ends up at the top while
version 9 falls to the bottom.
To fix this problem we are going to have to sort on multiple keys. We want to perform an
alphabetic sort on the first field and then a numeric sort on the second field. sort allows
multiple instances of the -k option so that multiple sort keys can be specified. In fact, a
key may include a range of fields. If no range is specified (as has been the case with our
previous examples), sort uses a key that begins with the specified field and extends to
the end of the line. Here is the syntax for our multi-key sort:
[me@linuxbox ~]$ sort --key=1,1 --key=2n distros.txt
Fedora    5       03/20/2006
Fedora    6       10/24/2006
Fedora    7       05/31/2007
Fedora    8       11/08/2007
Fedora    9       05/13/2008
Fedora    10      11/25/2008
SUSE      10.1    05/11/2006
SUSE      10.2    12/07/2006
SUSE      10.3    10/04/2007
SUSE      11.0    06/19/2008
Ubuntu    6.06    06/01/2006
Ubuntu    6.10    10/26/2006
Ubuntu    7.04    04/19/2007
Ubuntu    7.10    10/18/2007
Ubuntu    8.04    04/24/2008
Ubuntu    8.10    10/30/2008
Though we used the long form of the option for clarity, -k 1,1 -k 2n would be exactly equivalent. In the first instance of the key option, we specified a range of fields to include in the first key. Since we wanted to limit the sort to just the first field, we specified 1,1, which means start at field one and end at field one. In the second instance, we specified 2n, which means that field 2 is the sort key and that the sort should be numeric. An option letter may be included at the end of a key specifier to indicate the type of sort to be performed. These option letters are the same as the global options for the sort program: b (ignore leading blanks), n (numeric sort), r (reverse sort), and so on.
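These modifier letters can be combined with a key range in a single specifier. A minimal sketch, using made-up two-column sample data rather than our distros.txt file, that sorts alphabetically on the first field and in reverse numeric order on the second:

```shell
# -k 1,1 limits the first key to field one; -k 2nr sorts field two
# numerically, in reverse (largest version first)
printf 'Fedora 10\nFedora 9\nFedora 5\n' | sort -k 1,1 -k 2nr
```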
The third field in our list contains a date in an inconvenient format for sorting. On computers, dates are usually formatted in YYYY-MM-DD order to make chronological sorting easy, but ours are in the American format of MM/DD/YYYY. How can we sort this
list in chronological order?
Fortunately, sort provides a way. The key option allows specification of offsets within
fields, so we can define keys within fields:
[me@linuxbox ~]$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt
Fedora    10      11/25/2008
Ubuntu    8.10    10/30/2008
SUSE      11.0    06/19/2008
Fedora    9       05/13/2008
Ubuntu    8.04    04/24/2008
Fedora    8       11/08/2007
Ubuntu    7.10    10/18/2007
SUSE      10.3    10/04/2007
Fedora    7       05/31/2007
Ubuntu    7.04    04/19/2007
SUSE      10.2    12/07/2006
Ubuntu    6.10    10/26/2006
Fedora    6       10/24/2006
Ubuntu    6.06    06/01/2006
SUSE      10.1    05/11/2006
Fedora    5       03/20/2006
By specifying -k 3.7 we instruct sort to use a sort key that begins at the seventh
character within the third field, which corresponds to the start of the year. Likewise, we
specify -k 3.1 and -k 3.4 to isolate the month and day portions of the date. We also
add the n and r options to achieve a reverse numeric sort. The b option is included to
suppress the leading spaces (whose numbers vary from line to line, thereby affecting the
outcome of the sort) in the date field.
Some files don't use tabs and spaces as field delimiters; consider, for example, the /etc/passwd file:
[me@linuxbox ~]$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
The fields in this file are delimited with colons (:), so how would we sort this file using a key field? sort provides the -t option to define the field separator character. To sort the passwd file on the seventh field (the account's default shell), we could do this:
[me@linuxbox ~]$ sort -t ':' -k 7 /etc/passwd | head
me:x:1001:1001:Myself,,,:/home/me:/bin/bash
root:x:0:0:root:/root:/bin/bash
dhcp:x:101:102::/nonexistent:/bin/false
gdm:x:106:114:Gnome Display Manager:/var/lib/gdm:/bin/false
hplip:x:104:7:HPLIP system user,,,:/var/run/hplip:/bin/false
klog:x:103:104::/home/klog:/bin/false
messagebus:x:108:119::/var/run/dbus:/bin/false
polkituser:x:110:122:PolicyKit,,,:/var/run/PolicyKit:/bin/false
By specifying the colon character as the field separator, we can sort on the seventh field.
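The same idea works for numeric keys in delimited files. A sketch using made-up passwd-style lines (not the real /etc/passwd), sorting numerically on the third (UID) field:

```shell
# -t ':' makes the colon the field separator; -k 3n sorts field three
# as a number, so UID 0 comes before 1001 and 1002
printf 'carol:x:1002:\nroot:x:0:\nme:x:1001:\n' | sort -t ':' -k 3n
```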
uniq
Compared to sort, the uniq program is a lightweight. uniq performs a seemingly
trivial task. When given a sorted file (including standard input), it removes any duplicate
lines and sends the results to standard output. It is often used in conjunction with sort to
clean the output of duplicates.
Tip: While uniq is a traditional Unix tool often used with sort, the GNU version
of sort supports a -u option, which removes duplicates from the sorted output.
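A quick sketch of the tip, using throwaway input:

```shell
# sort -u sorts and removes duplicate lines in a single step,
# equivalent here to sort | uniq
printf 'b\na\nb\na\n' | sort -u
```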
Let's make a text file to try this out:
[me@linuxbox ~]$ cat > foo.txt
a
b
c
a
b
c
Remember to type Ctrl-d to terminate standard input. Now, if we run uniq on our text
file:
[me@linuxbox ~]$ uniq foo.txt
a
b
c
a
b
c
the results are no different from our original file; the duplicates were not removed. For
uniq to actually do its job, the input must be sorted first:
[me@linuxbox ~]$ sort foo.txt | uniq
a
b
c
This is because uniq only removes duplicate lines which are adjacent to each other.
uniq has several options. Here are the common ones:
Table 20-2: Common uniq Options

Option   Description
-c       Output a list of duplicate lines preceded by the number of
         times the line occurs.
-d       Output only repeated lines, rather than unique lines.
-f n     Ignore n leading fields in each line. Fields are separated by
         whitespace as they are in sort; however, unlike sort, uniq has
         no option for setting an alternate field separator.
-i       Ignore case during the line comparisons.
-s n     Skip (ignore) the leading n characters of each line.
-u       Output only unique lines (lines that are not repeated).
Here we see uniq used to report the number of duplicates found in our text file, using
the -c option:
[me@linuxbox ~]$ sort foo.txt | uniq -c
2 a
2 b
2 c
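Building on this, a common idiom combines sort and uniq -c with a second sort to produce a frequency table, most common lines first. A sketch with made-up input:

```shell
# Count occurrences of each line, then order by count, descending
printf 'apple\npear\napple\napple\npear\n' | sort | uniq -c | sort -nr
```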
cut
The cut program is used to extract a section of text from a line and output the extracted section to standard output. It can accept multiple file arguments or input from standard input. The section of the line to be extracted is specified using the following options:

Table 20-3: cut Selection Options

Option         Description
-c char_list   Extract the portion of the line defined by char_list.
               The list may consist of one or more comma-separated
               numerical ranges.
-f field_list  Extract one or more fields from the line as defined by
               field_list. The list may contain one or more fields or
               field ranges separated by commas.
-d delim_char  When -f is specified, use delim_char as the field
               delimiting character. By default, fields must be
               separated by a single tab character.
--complement   Extract the entire line of text, except for those
               portions specified by -c and/or -f.
As we can see, the way cut extracts text is rather inflexible. cut is best used to extract text from files that are produced by other programs, rather than text directly typed by humans. We'll take a look at our distros.txt file to see if it is clean enough to be a good specimen for our cut examples. If we use cat with the -A option, we can see if the file meets our requirements of tab-separated fields:
[me@linuxbox ~]$ cat -A distros.txt
SUSE^I10.2^I12/07/2006$
Fedora^I10^I11/25/2008$
SUSE^I11.0^I06/19/2008$
Ubuntu^I8.04^I04/24/2008$
Fedora^I8^I11/08/2007$
SUSE^I10.3^I10/04/2007$
Ubuntu^I6.10^I10/26/2006$
Fedora^I7^I05/31/2007$
Ubuntu^I7.10^I10/18/2007$
Ubuntu^I7.04^I04/19/2007$
SUSE^I10.1^I05/11/2006$
Fedora^I6^I10/24/2006$
Fedora^I9^I05/13/2008$
Ubuntu^I6.06^I06/01/2006$
Ubuntu^I8.10^I10/30/2008$
Fedora^I5^I03/20/2006$
It looks good. No embedded spaces, just single tab characters between the fields. Since the file uses tabs rather than spaces, we'll use the -f option to extract a field:
[me@linuxbox ~]$ cut -f 3 distros.txt
12/07/2006
11/25/2008
06/19/2008
04/24/2008
11/08/2007
10/04/2007
10/26/2006
05/31/2007
10/18/2007
04/19/2007
05/11/2006
10/24/2006
05/13/2008
06/01/2006
10/30/2008
03/20/2006
Because our distros file is tab-delimited, it is best to use cut to extract fields rather
than characters. This is because when a file is tab-delimited, it is unlikely that each line
will contain the same number of characters, which makes calculating character positions
within the line difficult or impossible. In our example above, however, we now have extracted a field that luckily contains data of identical length, so we can show how character
extraction works by extracting the year from each line:
[me@linuxbox ~]$ cut -f 3 distros.txt | cut -c 7-10
2006
2008
2008
2008
2007
2007
2006
2007
2007
2007
2006
2006
2008
2006
2008
2006
By running cut a second time on our list, we are able to extract character positions 7
through 10, which corresponds to the year in our date field. The 7-10 notation is an example of a range. The cut man page contains a complete description of how ranges can
be specified.
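As a quick illustration of the range forms (these behaviors are standard cut semantics):

```shell
# A char_list may be a single position, a closed range, or an open range
echo "123456789" | cut -c 5      # the fifth character only
echo "123456789" | cut -c 2-5    # characters two through five
echo "123456789" | cut -c 5-     # character five to the end of line
```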
Expanding Tabs
Our distros.txt file is ideally formatted for extracting fields using cut. But
what if we wanted a file that could be fully manipulated with cut by characters,
rather than fields? This would require us to replace the tab characters within the
file with the corresponding number of spaces. Fortunately, the GNU Coreutils
package includes a tool for that. Named expand, this program accepts either one
or more file arguments or standard input, and outputs the modified text to standard output.
If we process our distros.txt file with expand, we can use cut -c to extract any range of characters from the file. For example, we could use the following command to extract the year of release from our list, by expanding the file and using cut to extract every character from the twenty-third position to the end of the line:
[me@linuxbox ~]$ expand distros.txt | cut -c 23-
Coreutils also provides the unexpand program to substitute tabs for spaces.
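A minimal sketch of expand; the -t option, which sets the tab-stop interval (every four columns here), is standard:

```shell
# The tab after "a" is padded with spaces out to the next 4-column stop
printf 'a\tb\n' | expand -t 4
```

unexpand performs the reverse conversion, turning runs of spaces back into tabs.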
When working with fields, it is possible to specify a different field delimiter rather than
the tab character. Here we will extract the first field from the /etc/passwd file:
[me@linuxbox ~]$ cut -d ':' -f 1 /etc/passwd | head
root
daemon
bin
sys
sync
games
man
lp
mail
news
Using the -d option, we are able to specify the colon character as the field delimiter.
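The --complement option from Table 20-3 can be sketched the same way (this long option is specific to GNU cut):

```shell
# Output everything except field 2; remaining fields keep their delimiter
echo 'root:x:0:0' | cut -d ':' --complement -f 2
```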
paste
The paste command does the opposite of cut. Rather than extracting a column of text
from a file, it adds one or more columns of text to a file. It does this by reading multiple
files and combining the fields found in each file into a single stream on standard output.
Like cut, paste accepts multiple file arguments and/or standard input. To demonstrate
how paste operates, we will perform some surgery on our distros.txt file to produce a chronological list of releases.
From our earlier work with sort, we will first produce a list of distros sorted by date
and store the result in a file called distros-by-date.txt:
[me@linuxbox ~]$ sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt > distros-by-date.txt
Next, we will use cut to extract the first two fields from the file (the distro name and version), and store that result in a file named distros-versions.txt:

[me@linuxbox ~]$ cut -f 1,2 distros-by-date.txt > distros-versions.txt
[me@linuxbox ~]$ head distros-versions.txt
Fedora    10
Ubuntu    8.10
SUSE      11.0
Fedora    9
Ubuntu    8.04
Fedora    8
Ubuntu    7.10
SUSE      10.3
Fedora    7
Ubuntu    7.04
The final piece of preparation is to extract the release dates and store them in a file named distros-dates.txt:

[me@linuxbox ~]$ cut -f 3 distros-by-date.txt > distros-dates.txt
We now have the parts we need. To complete the process, use paste to put the column
of dates ahead of the distro names and versions, thus creating a chronological list. This is
done simply by using paste and ordering its arguments in the desired arrangement:
[me@linuxbox ~]$ paste distros-dates.txt distros-versions.txt
11/25/2008    Fedora    10
10/30/2008    Ubuntu    8.10
06/19/2008    SUSE      11.0
05/13/2008    Fedora    9
04/24/2008    Ubuntu    8.04
11/08/2007    Fedora    8
10/18/2007    Ubuntu    7.10
10/04/2007    SUSE      10.3
05/31/2007    Fedora    7
04/19/2007    Ubuntu    7.04
12/07/2006    SUSE      10.2
10/26/2006    Ubuntu    6.10
10/24/2006    Fedora    6
06/01/2006    Ubuntu    6.06
05/11/2006    SUSE      10.1
03/20/2006    Fedora    5
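paste can also read standard input. A sketch using the "-" argument, which is standard paste behavior; each dash consumes successive input lines:

```shell
# Two dashes fold a single column into two tab-separated columns
printf '1\n2\n3\n4\n' | paste - -
```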
join
In some ways, join is like paste in that it adds columns to a file, but it uses a unique
way to do it. A join is an operation usually associated with relational databases where
data from multiple tables with a shared key field is combined to form a desired result.
The join program performs the same operation. It joins data from multiple files based
on a shared key field.
To see how a join operation is used in a relational database, let's imagine a very small database consisting of two tables, each containing a single record. The first table, called
CUSTOMERS, has three fields: a customer number (CUSTNUM), the customers first
name (FNAME), and the customers last name (LNAME):
CUSTNUM    FNAME    LNAME
=======    =====    ======
4681934    John     Smith
The second table is called ORDERS and contains four fields: an order number (ORDERNUM), the customer number (CUSTNUM), the quantity (QUAN), and the item ordered
(ITEM).
ORDERNUM      CUSTNUM    QUAN    ITEM
========      =======    ====    ====
3014953305    4681934    1       Blue Widget
Note that both tables share the field CUSTNUM. This is important, as it allows a relationship between the tables.
Performing a join operation would allow us to combine the fields in the two tables to
achieve a useful result, such as preparing an invoice. Using the matching values in the
CUSTNUM fields of both tables, a join operation could produce the following:
FNAME    LNAME    QUAN    ITEM
=====    =====    ====    ====
John     Smith    1       Blue Widget
To demonstrate the join program, we'll need to make a couple of files with a shared key. To do this, we will use our distros-by-date.txt file. From this file, we will construct two additional files. The first contains the release dates (which will be our shared key for this demonstration) and the release names:
[me@linuxbox ~]$ cut -f 1,1 distros-by-date.txt > distros-names.txt
[me@linuxbox ~]$ paste distros-dates.txt distros-names.txt > distros-key-names.txt
[me@linuxbox ~]$ head distros-key-names.txt
11/25/2008 Fedora
10/30/2008 Ubuntu
06/19/2008 SUSE
05/13/2008 Fedora
04/24/2008 Ubuntu
11/08/2007 Fedora
10/18/2007 Ubuntu
10/04/2007 SUSE
05/31/2007 Fedora
04/19/2007 Ubuntu
and the second file, which contains the release dates and the version numbers:

[me@linuxbox ~]$ cut -f 2,2 distros-by-date.txt > distros-vernums.txt
[me@linuxbox ~]$ paste distros-dates.txt distros-vernums.txt > distros-key-vernums.txt
[me@linuxbox ~]$ head distros-key-vernums.txt
11/25/2008 10
10/30/2008 8.10
06/19/2008 11.0
05/13/2008 9
04/24/2008 8.04
11/08/2007 8
10/18/2007 7.10
10/04/2007 10.3
05/31/2007 7
04/19/2007 7.04
We now have two files with a shared key (the release date field). It is important to point
out that the files must be sorted on the key field for join to work properly.
[me@linuxbox ~]$ join distros-key-names.txt distros-key-vernums.txt |
head
11/25/2008 Fedora 10
10/30/2008 Ubuntu 8.10
06/19/2008 SUSE 11.0
05/13/2008 Fedora 9
04/24/2008 Ubuntu 8.04
11/08/2007 Fedora 8
10/18/2007 Ubuntu 7.10
10/04/2007 SUSE 10.3
05/31/2007 Fedora 7
04/19/2007 Ubuntu 7.04
Note also that, by default, join uses whitespace as the input field delimiter and a single
space as the output field delimiter. This behavior can be modified by specifying options.
See the join man page for details.
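A minimal sketch of those options, using made-up colon-delimited files in a temporary directory (both files must already be sorted on the key):

```shell
# -t ':' sets the delimiter; -1 and -2 pick the key field in each file
tmp=$(mktemp -d)
printf 'a:1\nb:2\n' > "$tmp/left.txt"
printf 'a:x\nb:y\n' > "$tmp/right.txt"
join -t ':' -1 1 -2 1 "$tmp/left.txt" "$tmp/right.txt"
```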
Comparing Text
It is often useful to compare versions of text files. For system administrators and software
developers, this is particularly important. A system administrator may, for example, need
to compare an existing configuration file to a previous version to diagnose a system problem. Likewise, a programmer frequently needs to see what changes have been made to
programs over time.
comm
The comm program compares two text files and displays the lines that are unique to each
one and the lines they have in common. To demonstrate, we will create two nearly identical text files using cat:
[me@linuxbox ~]$ cat > file1.txt
a
b
c
d
[me@linuxbox ~]$ cat > file2.txt
b
c
d
e

Next, we'll compare the two files with comm:

[me@linuxbox ~]$ comm file1.txt file2.txt
a
		b
		c
		d
	e
As we can see, comm produces three columns of output. The first column contains lines
unique to the first file argument; the second column, the lines unique to the second file argument; the third column contains the lines shared by both files. comm supports options
in the form -n where n is either 1, 2 or 3. When used, these options specify which column(s) to suppress. For example, if we only wanted to output the lines shared by both
files, we would suppress the output of columns one and two:
[me@linuxbox ~]$ comm -12 file1.txt file2.txt
b
c
d
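Suppressing different columns yields simple set operations. A sketch computing a set difference, with the two files recreated in a temporary directory:

```shell
# -23 suppresses columns two and three, leaving only the lines
# unique to the first file
tmp=$(mktemp -d)
printf 'a\nb\nc\nd\n' > "$tmp/f1.txt"
printf 'b\nc\nd\ne\n' > "$tmp/f2.txt"
comm -23 "$tmp/f1.txt" "$tmp/f2.txt"
```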
diff
Like the comm program, diff is used to detect the differences between files. However,
diff is a much more complex tool, supporting many output formats and the ability to
process large collections of text files at once. diff is often used by software developers
to examine changes between different versions of program source code, and thus has the
ability to recursively examine directories of source code, often referred to as source trees.
One common use for diff is the creation of diff files or patches that are used by programs such as patch (which well discuss shortly) to convert one version of a file (or
files) to another version.
If we use diff to look at our previous example files:
[me@linuxbox ~]$ diff file1.txt file2.txt
1d0
< a
4a4
> e
we see its default style of output: a terse description of the differences between the two
files. In the default format, each group of changes is preceded by a change command in
the form of range operation range to describe the positions and types of changes required
to convert the first file to the second file:
Table 20-4: diff Change Commands

Change    Description
r1ar2     Append the lines at the position r2 in the second file to
          the position r1 in the first file.
r1cr2     Change (replace) the lines at position r1 with the lines at
          the position r2 in the second file.
r1dr2     Delete the lines in the first file at position r1, which
          would have appeared at range r2 in the second file.
In this format, a range is a comma-separated list of the starting line and the ending line.
While this format is the default (mostly for POSIX compliance and backward compatibility with traditional Unix versions of diff), it is not as widely used as other, optional formats. Two of the more popular formats are the context format and the unified format.
When viewed using the context format (the -c option), we will see this:
[me@linuxbox ~]$ diff -c file1.txt file2.txt
*** file1.txt   2008-12-23 06:40:13.000000000 -0500
--- file2.txt   2008-12-23 06:40:34.000000000 -0500
***************
*** 1,4 ****
- a
  b
  c
  d
--- 1,4 ----
  b
  c
  d
+ e
The output begins with the names of the two files and their timestamps. The first file is
marked with asterisks and the second file is marked with dashes. Throughout the remainder of the listing, these markers will signify their respective files. Next, we see groups of
changes, including the default number of surrounding context lines. In the first group, we
see:
*** 1,4 ****

which indicates lines 1 through 4 in the first file. Later we see:

--- 1,4 ----

which indicates lines 1 through 4 in the second file. Within a change group, lines begin with one of four indicators:
Table 20-5: diff Context Format Change Indicators

Indicator    Meaning
blank        A line shown for context. It does not indicate a
             difference between the two files.
-            A line deleted. This line will appear in the first file
             but not in the second file.
+            A line added. This line will appear in the second file
             but not in the first file.
!            A line changed. The two versions of the line will be
             displayed, each in its respective section of the change
             group.
The unified format is similar to the context format but is more concise. It is specified
with the -u option:
[me@linuxbox ~]$ diff -u file1.txt file2.txt
--- file1.txt   2008-12-23 06:40:13.000000000 -0500
+++ file2.txt   2008-12-23 06:40:34.000000000 -0500
@@ -1,4 +1,4 @@
-a
 b
 c
 d
+e
The most notable difference between the context and unified formats is the elimination of
the duplicated lines of context, making the results of the unified format shorter than those
of the context format. In our example above, we see file timestamps like those of the context format, followed by the string @@ -1,4 +1,4 @@. This indicates the lines in the
first file and the lines in the second file described in the change group. Following this are
the lines themselves, with the default three lines of context. Each line starts with one of
three possible characters:
Table 20-6: diff Unified Format Change Indicators

Character    Meaning
blank        This line is shared by both files.
-            This line was removed from the first file.
+            This line was added to the first file.
patch
The patch program is used to apply changes to text files. It accepts output from diff
and is generally used to convert older versions of files into newer versions. Let's consider
a famous example. The Linux kernel is developed by a large, loosely organized team of
contributors who submit a constant stream of small changes to the source code. The
Linux kernel consists of several million lines of code, while the changes that are made by
one contributor at one time are quite small. It makes no sense for a contributor to send
each developer an entire kernel source tree each time a small change is made. Instead, a
diff file is submitted. The diff file contains the change from the previous version of the
kernel to the new version with the contributor's changes. The receiver then uses the
patch program to apply the change to his own source tree. Using diff/patch offers
two significant advantages:
1. The diff file is very small, compared to the full size of the source tree.
2. The diff file concisely shows the change being made, allowing reviewers of the
patch to quickly evaluate it.
Of course, diff/patch will work on any text file, not just source code. It would be
equally applicable to configuration files or any other text.
To prepare a diff file for use with patch, the GNU documentation (see Further Reading
below) suggests using diff as follows:
diff -Naur old_file new_file > diff_file
where old_file and new_file are either single files or directories containing files. The r option supports recursion of a directory tree.
Once the diff file has been created, we can apply it to patch the old file into the new file:
patch < diff_file
We'll demonstrate with our test files:
[me@linuxbox ~]$ diff -Naur file1.txt file2.txt > patchfile.txt
[me@linuxbox ~]$ patch < patchfile.txt
patching file file1.txt
[me@linuxbox ~]$ cat file1.txt
b
c
d
e
In this example, we created a diff file named patchfile.txt and then used the
patch program to apply the patch. Note that we did not have to specify a target file to
patch, as the diff file (in unified format) already contains the filenames in the header.
Once the patch is applied, we can see that file1.txt now matches file2.txt.
patch has a large number of options, and there are additional utility programs that can
be used to analyze and edit patches.
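One such option worth knowing is -R, which applies a patch in reverse, undoing a previously applied change. A sketch using throwaway files, assuming the usual diff and patch programs are installed:

```shell
# Create a scratch directory with an old and a new version of a file
tmp=$(mktemp -d)
cd "$tmp"
printf 'a\nb\n' > old.txt
printf 'a\nc\n' > new.txt

# diff exits with status 1 when the files differ, so tolerate that
diff -u old.txt new.txt > change.patch || true

patch old.txt < change.patch     # forward: old.txt now matches new.txt
patch -R old.txt < change.patch  # reverse: old.txt is restored
cat old.txt
```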
tr

The tr program is used to transliterate characters. We can think of this as a sort of character-based search-and-replace operation. Transliteration is the process of changing characters from one alphabet to another; for example, converting characters from lowercase to uppercase. We can perform such a conversion with tr as follows:

[me@linuxbox ~]$ echo "lowercase letters" | tr a-z A-Z
LOWERCASE LETTERS
As we can see, tr operates on standard input, and outputs its results on standard output.
tr accepts two arguments: a set of characters to convert from and a corresponding set of
characters to convert to. Character sets may be expressed in one of three ways:
1. An enumerated list. For example, ABCDEFGHIJKLMNOPQRSTUVWXYZ
2. A character range. For example, A-Z. Note that this method is sometimes subject
to the same issues as other commands, due to the locale collation order, and thus
should be used with caution.
3. POSIX character classes. For example, [:upper:].
In most cases, both character sets should be of equal length; however, it is possible for
the first set to be larger than the second, particularly if we wish to convert multiple characters to a single character:
[me@linuxbox ~]$ echo "lowercase letters" | tr '[:lower:]' A
AAAAAAAAA AAAAAAA
tr can also perform ROT13 encoding, a simple substitution cipher that rotates each letter 13 places through the alphabet. A number of email programs and Usenet news readers support ROT13 encoding. Wikipedia contains a good article on the subject:

http://en.wikipedia.org/wiki/ROT13
tr can perform another trick, too. Using the -s option, tr can squeeze (delete) repeated instances of a character:
[me@linuxbox ~]$ echo "aaabbbccc" | tr -s ab
abccc
Here we have a string containing repeated characters. By specifying the set ab to tr,
we eliminate the repeated instances of the letters in the set, while leaving the character
that is missing from the set (c) unchanged. Note that the repeating characters must be
adjoining. If they are not:
[me@linuxbox ~]$ echo "abcabcabc" | tr -s ab
abcabcabc
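tr can also delete characters outright with the -d option:

```shell
# -d deletes every occurrence of the characters in the set
echo "2023-08-15" | tr -d '-'
```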
sed

The name sed is short for stream editor. It performs text editing on a stream of text, either a set of files or standard input. Here is a simple example of sed in action:

[me@linuxbox ~]$ echo "front" | sed 's/front/back/'
back
In this example, we produce a one-word stream of text using echo and pipe it into sed.
sed, in turn, carries out the instruction s/front/back/ upon the text in the stream
and produces the output back as a result. We can also recognize this command as resembling the substitution (search-and-replace) command in vi.
Commands in sed begin with a single letter. In the example above, the substitution command is represented by the letter s and is followed by the search-and-replace strings, separated by the slash character as a delimiter. The choice of the delimiter character is arbitrary. By convention, the slash character is often used, but sed will accept any character
that immediately follows the command as the delimiter. We could perform the same command this way:
[me@linuxbox ~]$ echo "front" | sed 's_front_back_'
back
By using the underscore character immediately after the command, it becomes the delimiter. The ability to set the delimiter can be used to make commands more readable, as we
shall see.
Most commands in sed may be preceded by an address, which specifies which line(s) of
the input stream will be edited. If the address is omitted, then the editing command is carried out on every line in the input stream. The simplest form of address is a line number.
We can add one to our example:
[me@linuxbox ~]$ echo "front" | sed '1s/front/back/'
back
Adding the address 1 to our command causes our substitution to be performed on the first
line of our one-line input stream. If we specify another number:
[me@linuxbox ~]$ echo "front" | sed '2s/front/back/'
front
we see that the editing is not carried out, since our input stream does not have a line 2.
Addresses may be expressed in many ways. Here are the most common:
Table 20-7: sed Address Notation

Address        Description
n              A line number, where n is a positive integer.
$              The last line.
/regexp/       Lines matching a POSIX basic regular expression,
               delimited by slash characters.
addr1,addr2    A range of lines from addr1 to addr2, inclusive.
               Addresses may be any of the single-address forms above.
first~step     Match the line represented by the number first, then
               each subsequent line at step intervals.
addr1,+n       Match addr1 and the following n lines.
addr!          Match all lines except addr, which may be any of the
               forms above.
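The first~step form is a GNU sed extension. A minimal sketch printing the odd-numbered lines of throwaway input:

```shell
# Start at line 1, then print every 2nd line after that
printf 'one\ntwo\nthree\nfour\n' | sed -n '1~2p'
```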
We'll demonstrate different kinds of addresses using the distros.txt file from earlier in this chapter. First, a range of line numbers:

[me@linuxbox ~]$ sed -n '1,5p' distros.txt
SUSE      10.2    12/07/2006
Fedora    10      11/25/2008
SUSE      11.0    06/19/2008
Ubuntu    8.04    04/24/2008
Fedora    8       11/08/2007
In this example, we print a range of lines, starting with line 1 and continuing to line 5. To
do this, we use the p command, which simply causes a matched line to be printed. For
this to be effective however, we must include the option -n (the no auto-print option) to
cause sed not to print every line by default.
Next, we'll try a regular expression:

[me@linuxbox ~]$ sed -n '/SUSE/p' distros.txt
SUSE      10.2    12/07/2006
SUSE      11.0    06/19/2008
SUSE      10.3    10/04/2007
SUSE      10.1    05/11/2006
By including the slash-delimited regular expression /SUSE/, we are able to isolate the
lines containing it in much the same manner as grep.
Finally, we'll try negation by adding an exclamation point (!) to the address:

[me@linuxbox ~]$ sed -n '/SUSE/!p' distros.txt
Fedora    10      11/25/2008
Ubuntu    8.04    04/24/2008
Fedora    8       11/08/2007
Ubuntu    6.10    10/26/2006
Fedora    7       05/31/2007
Ubuntu    7.10    10/18/2007
Ubuntu    7.04    04/19/2007
Fedora    6       10/24/2006
Fedora    9       05/13/2008
Ubuntu    6.06    06/01/2006
Ubuntu    8.10    10/30/2008
Fedora    5       03/20/2006
Here we see the expected result: all of the lines in the file except the ones matched by the
regular expression.
So far, we've looked at two of the sed editing commands, s and p. Here is a more complete list of the basic editing commands:
Table 20-8: sed Basic Editing Commands

Command                Description
=                      Output the current line number.
a                      Append text after the current line.
d                      Delete the current line.
i                      Insert text in front of the current line.
p                      Print the current line. By default, sed prints
                       every line and only edits lines that match a
                       specified address within the file. The default
                       behavior can be overridden by specifying the -n
                       option.
q                      Exit sed without processing any more lines.
s/regexp/replacement/  Substitute the contents of replacement wherever
                       regexp is found.
y/set1/set2            Perform transliteration by converting characters
                       from set1 to the corresponding characters in
                       set2. Note that unlike tr, sed requires that
                       both sets be of the same length.
The s command is by far the most commonly used editing command. We will demonstrate just some of its power by performing an edit on our distros.txt file. We discussed before how the date field in distros.txt was not in a computer-friendly format. While the date is formatted MM/DD/YYYY, it would be better (for ease of sorting) if the format were YYYY-MM-DD. To perform this change on the file by hand would be both time-consuming and error prone, but with sed, this change can be performed in one step:
[me@linuxbox ~]$ sed 's/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/' distros.txt
SUSE      10.2    2006-12-07
Fedora    10      2008-11-25
SUSE      11.0    2008-06-19
Ubuntu    8.04    2008-04-24
Fedora    8       2007-11-08
SUSE      10.3    2007-10-04
Ubuntu    6.10    2006-10-26
Fedora    7       2007-05-31
Ubuntu    7.10    2007-10-18
Ubuntu    7.04    2007-04-19
SUSE      10.1    2006-05-11
Fedora    6       2006-10-24
Fedora    9       2008-05-13
Ubuntu    6.06    2006-06-01
Ubuntu    8.10    2008-10-30
Fedora    5       2006-03-20
Wow! Now that is an ugly-looking command. But it works. In just one step, we have changed the date format in our file. It is also a perfect example of why regular expressions are sometimes jokingly referred to as a "write-only" medium. We can write them, but we sometimes cannot read them. Before we are tempted to run away in terror from this command, let's look at how it was constructed. First, we know that the command will have this basic structure:
sed 's/regexp/replacement/' distros.txt
Our next step is to figure out a regular expression that will isolate the date. Since it is in
MM/DD/YYYY format and appears at the end of the line, we can use an expression like
this:
[0-9]{2}/[0-9]{2}/[0-9]{4}$
which matches two digits, a slash, two digits, a slash, four digits, and the end of line. So
that takes care of regexp, but what about replacement? To handle that, we must introduce
a new regular expression feature that appears in some applications which use BRE. This
feature is called back references and works like this: If the sequence \n appears in replacement where n is a number from 1 to 9, the sequence will refer to the corresponding
subexpression in the preceding regular expression. To create the subexpressions, we simply enclose them in parentheses like so:
([0-9]{2})/([0-9]{2})/([0-9]{4})$
We now have three subexpressions. The first contains the month, the second contains the
day of the month, and the third contains the year. Now we can construct replacement as
follows:
\3-\1-\2
which gives us the year, a dash, the month, a dash, and the day.
Now, our command looks like this:
sed 's/([0-9]{2})/([0-9]{2})/([0-9]{4})$/\3-\1-\2/' distros.txt
We have two remaining problems. The first is that the extra slashes in our regular expression will confuse sed when it tries to interpret the s command. The second is that since
sed, by default, accepts only basic regular expressions, several of the characters in our
regular expression will be taken as literals, rather than as metacharacters. We can solve
both these problems with a liberal application of backslashes to escape the offending
characters:
sed 's/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/' distros.txt
[me@linuxbox ~]$ echo "aaabbbccc" | sed 's/b/B/'
aaaBbbccc

We see that the replacement was performed, but only to the first instance of the letter b, while the remaining instances were left unchanged. By adding the g flag, we are able to change all the instances:

[me@linuxbox ~]$ echo "aaabbbccc" | sed 's/b/B/g'
aaaBBBccc
So far, we have only given sed single commands via the command line. It is also possible to construct more complex commands in a script file using the -f option. To demonstrate, we will use sed with our distros.txt file to build a report. Our report will feature a title at the top, our modified dates, and all the distribution names converted to uppercase. To do this, we will need to write a script, so we'll fire up our text editor and enter the following:
# sed script to produce Linux distributions report

1 i\
\
Linux Distributions Report\

s/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
We will save our sed script as distros.sed and run it like this:
[me@linuxbox ~]$ sed -f distros.sed distros.txt

Linux Distributions Report

SUSE      10.2    2006-12-07
FEDORA    10      2008-11-25
SUSE      11.0    2008-06-19
UBUNTU    8.04    2008-04-24
FEDORA    8       2007-11-08
SUSE      10.3    2007-10-04
UBUNTU    6.10    2006-10-26
FEDORA    7       2007-05-31
UBUNTU    7.10    2007-10-18
UBUNTU    7.04    2007-04-19
SUSE      10.1    2006-05-11
FEDORA    6       2006-10-24
FEDORA    9       2008-05-13
UBUNTU    6.06    2006-06-01
UBUNTU    8.10    2008-10-30
FEDORA    5       2006-03-20
As we can see, our script produces the desired results, but how does it do it? Let's take another look at our script. We'll use cat to number the lines:
[me@linuxbox ~]$ cat -n distros.sed
     1  # sed script to produce Linux distributions report
     2
     3  1 i\
     4  \
     5  Linux Distributions Report\
     6
     7  s/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/
     8  y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
Line one of our script is a comment. Like many configuration files and programming languages on Linux systems, comments begin with the # character and are followed by human-readable text. Comments can be placed anywhere in the script (though not within
commands themselves) and are helpful to any humans who might need to identify and/or
maintain the script.
Line 2 is a blank line. Like comments, blank lines may be added to improve readability.
Many sed commands support line addresses. These are used to specify which lines of
the input are to be acted upon. Line addresses may be expressed as single line numbers,
line number ranges, and the special line number $ which indicates the last line of input.
Lines 3 through 6 contain text to be inserted at the address 1, the first line of the input.
The i command is followed by the sequence backslash-carriage return to produce an escaped carriage return, or what is called a line-continuation character. This sequence,
which can be used in many circumstances including shell scripts, allows a carriage return
to be embedded in a stream of text without signaling the interpreter (in this case sed)
that the end of the line has been reached. The i, and likewise, the a (which appends text,
rather than inserting it) and c (which replaces text) commands, allow multiple lines of
text as long as each line, except the last, ends with a line-continuation character. The sixth
line of our script is actually the end of our inserted text and ends with a plain carriage return rather than a line-continuation character, signaling the end of the i command.
Note: A line-continuation character is formed by a backslash followed immediately
by a carriage return. No intermediary spaces are permitted.
Line 7 is our search-and-replace command. Since it is not preceded by an address, each
line in the input stream is subject to its action.
Line 8 performs transliteration of the lowercase letters into uppercase letters. Note
that, unlike tr, the y command in sed does not support character ranges.
aspell
The last tool we will look at is aspell, an interactive spelling checker. The aspell
program is the successor to an earlier program named ispell, and can be used, for the
most part, as a drop-in replacement. While the aspell program is mostly used by other
programs that require spell-checking capability, it can also be used very effectively as a
stand-alone tool from the command line. It has the ability to intelligently check various
types of text files, including HTML documents, C/C++ programs, email messages, and
other kinds of specialized texts.
To spell check a text file containing simple prose, it could be used like this:
aspell check textfile
where textfile is the name of the file to check. As a practical example, let's create a simple
text file named foo.txt containing some deliberate spelling errors:
[me@linuxbox ~]$ cat > foo.txt
The quick brown fox jimped over the laxy dog.
As aspell is interactive in the check mode, we will see a screen like this:
The quick brown fox jimped over the laxy dog.

1) jumped                 6) wimped
2) gimped                 7) camped
3) comped                 8) humped
4) limped                 9) impede
5) pimped                 0) umped
i) Ignore                 I) Ignore all
r) Replace                R) Replace all
a) Add                    l) Add Lower
b) Abort                  x) Exit

?
At the top of the display, we see our text with a suspiciously spelled word highlighted. In
the middle, we see ten spelling suggestions numbered zero through nine, followed by a
list of other possible actions. Finally, at the very bottom, we see a prompt ready to accept
our choice.
If we press the 1 key, aspell replaces the offending word with the word jumped and
moves on to the next misspelled word, which is laxy. If we select the replacement
lazy, aspell replaces it and terminates. Once aspell has finished, we can examine
our file and see that the misspellings have been corrected:
[me@linuxbox ~]$ cat foo.txt
The quick brown fox jumped over the lazy dog.
Unless told otherwise via the command line option --dont-backup, aspell creates
a backup file containing the original text by appending the extension .bak to the filename.
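Since an interactive aspell session is hard to show on the page, here is a sketch that recreates by hand the state such a session leaves behind (using the file contents from the example above), so we can see what the backup is good for:

```shell
# Recreate the files an aspell session would leave behind: foo.txt
# holds the corrected text, foo.txt.bak the original.
printf 'The quick brown fox jimped over the laxy dog.\n' > foo.txt.bak
printf 'The quick brown fox jumped over the lazy dog.\n' > foo.txt

# diff shows exactly what the spelling session changed; the backup
# lets us recover the original text if needed. diff exits non-zero
# when the files differ, so we ignore its exit status here.
diff foo.txt.bak foo.txt || true
```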
The sed option -i tells sed to edit the file in-place, meaning that rather than sending
the edited output to standard output, it will rewrite the file with the changes applied. We
also see the ability to place more than one editing command on the line by separating
them with a semicolon.
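A sketch of what such a command might look like (this reverses our earlier corrections, putting the misspellings back into foo.txt in one step):

```shell
# Start from the corrected sentence.
printf 'The quick brown fox jumped over the lazy dog.\n' > foo.txt

# -i edits foo.txt in place; the two s commands, separated by a
# semicolon, are both applied to each line. (GNU sed syntax.)
sed -i 's/lazy/laxy/; s/jumped/jimped/' foo.txt

cat foo.txt
```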
Next, we'll look at how aspell can handle different kinds of text files. Using a text editor such as vim (the adventurous may want to try sed), we will add some HTML
markup to our file:
<html>
<head>
<title>Mispelled HTML file</title>
</head>
<body>
<p>The quick brown fox jimped over the laxy dog.</p>
</body>
</html>
Now, if we try to spell check our modified file, we run into a problem. If we do it this
way:
[me@linuxbox ~]$ aspell check foo.txt
<html>
<head>
<title>Mispelled HTML file</title>
</head>
<body>
<p>The quick brown fox jimped over the laxy dog.</p>
</body>
</html>

1) HTML                   4) Hamel
2) ht ml                  5) Hamil
3) ht-ml                  6) hotel
i) Ignore                 I) Ignore all
r) Replace                R) Replace all
a) Add                    l) Add Lower
b) Abort                  x) Exit

?
aspell will see the contents of the HTML tags as misspelled. This problem can be
overcome by including the -H (HTML) checking-mode option, like this:
[me@linuxbox ~]$ aspell -H check foo.txt
<html>
<head>
<title>Mispelled HTML file</title>
</head>
<body>
<p>The quick brown fox jimped over the laxy dog.</p>
</body>
</html>

1) Mi spelled             6) Misapplied
2) Mi-spelled             7) Miscalled
3) Misspelled             8) Respelled
4) Dispelled              9) Misspell
5) Spelled                0) Misled
i) Ignore                 I) Ignore all
r) Replace                R) Replace all
a) Add                    l) Add Lower
b) Abort                  x) Exit

?
In this mode, the contents of HTML tags are ignored and not checked for spelling.
However, the contents of ALT tags, which benefit from checking, are checked in this
mode.
Summing Up
In this chapter, we have looked at a few of the many command line tools that operate on
text. In the next chapter, we will look at several more. Admittedly, it may not seem immediately obvious how or why you might use some of these tools on a day-to-day basis,
though we have tried to show some semi-practical examples of their use. We will find in
later chapters that these tools form the basis of a tool set that is used to solve a host of
practical problems. This will be particularly true when we get into shell scripting, where
these tools will really show their worth.
Further Reading
The GNU Project website contains many online guides to the tools discussed in this chapter.
sed:
http://www.gnu.org/software/sed/manual/sed.html
aspell:
http://aspell.net/man-html/index.html
Extra Credit
There are a few more interesting text-manipulation commands worth investigating.
Among these are: split (split files into pieces), csplit (split files into pieces based
on context), and sdiff (side-by-side merge of file differences).
21 Formatting Output
In this chapter, we continue our look at text-related tools, focusing on programs that are
used to format text output, rather than changing the text itself. These tools are often used
to prepare text for eventual printing, a subject that we will cover in the next chapter. The
programs that we will cover in this chapter include:
nl Number lines
fold Wrap each line to a specified length
fmt A simple text formatter
pr Prepare text for printing
printf Format and print data
groff A document formatting system
nl Number Lines
The nl program is a rather arcane tool used to perform a simple task. It numbers lines. In
its simplest use, it resembles cat -n:
[me@linuxbox ~]$ nl distros.txt | head
     1  SUSE            10.2    12/07/2006
     2  Fedora          10      11/25/2008
     3  SUSE            11.0    06/19/2008
     4  Ubuntu          8.04    04/24/2008
     5  Fedora          8       11/08/2007
     6  SUSE            10.3    10/04/2007
     7  Ubuntu          6.10    10/26/2006
     8  Fedora          7       05/31/2007
     9  Ubuntu          7.10    10/18/2007
    10  Ubuntu          7.04    04/19/2007
Like cat, nl can accept either multiple files as command line arguments, or standard input. However, nl has a number of options and supports a primitive form of markup to allow more complex kinds of numbering.
nl supports a concept called logical pages when numbering. This allows nl to reset
(start over) the numerical sequence when numbering. Using options, it is possible to set
the starting number to a specific value and, to a limited extent, its format. A logical page
is further broken down into a header, body, and footer. Within each of these sections, line
numbering may be reset and/or be assigned a different style. If nl is given multiple files,
it treats them as a single stream of text. Sections in the text stream are indicated by the
presence of some rather odd-looking markup added to the text:
Table 21-1: nl Markup
Markup          Meaning
\:\:\:          Start of logical page header
\:\:            Start of logical page body
\:              Start of logical page footer
Each of the above markup elements must appear alone on its own line. After processing a
markup element, nl deletes it from the text stream.
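A small sketch of the markup in action (the file name and contents here are made up for illustration):

```shell
# Create a stream containing nl's logical page markup. The quoted
# 'EOF' keeps the backslashes literal.
cat > /tmp/nl-demo.txt << 'EOF'
\:\:\:
Report Header
\:\:
first body line
second body line
\:
Report Footer
EOF

# By default nl numbers only the body section; the markup lines
# themselves are deleted from the output.
nl /tmp/nl-demo.txt
```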
Here are the common options for nl:
Table 21-2: Common nl Options
Option
-b style
Meaning
-f style
-h style
306
-n format
-p
-s string
-v number
Set first line number of each logical page to number. Default is one.
-w width
Admittedly, we probably won't be numbering lines that often, but we can use nl to look
at how we can combine multiple tools to perform more complex tasks. We will build on
our work in the previous chapter to produce a Linux distributions report. Since we will be
using nl, it will be useful to include its header/body/footer markup. To do this, we will
add it to the sed script from the last chapter. Using our text editor, we will change the
script as follows and save it as distros-nl.sed:
# sed script to produce Linux distributions report
1 i\
\\:\\:\\:\
\
Linux Distributions Report\
\
Name	Ver.	Released\
----	----	--------\
\\:\\:
s/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/
$ a\
\\:\
\
End Of Report
The script now inserts the nl logical page markup and adds a footer at the end of the report. Note that we had to double up the backslashes in our markup, because they are normally interpreted as an escape character by sed.
Next, we'll produce our enhanced report by combining sort, sed, and nl:
[me@linuxbox ~]$ sort -k 1,1 -k 2n distros.txt | sed -f distros-nl.sed | nl
Linux Distributions Report

Name    Ver.    Released
----    ----    --------
     1  Fedora  5       2006-03-20
     2  Fedora  6       2006-10-24
     3  Fedora  7       2007-05-31
     4  Fedora  8       2007-11-08
     5  Fedora  9       2008-05-13
     6  Fedora  10      2008-11-25
     7  SUSE    10.1    2006-05-11
     8  SUSE    10.2    2006-12-07
     9  SUSE    10.3    2007-10-04
    10  SUSE    11.0    2008-06-19
    11  Ubuntu  6.06    2006-06-01
    12  Ubuntu  6.10    2006-10-26
    13  Ubuntu  7.04    2007-04-19
    14  Ubuntu  7.10    2007-10-18
    15  Ubuntu  8.04    2008-04-24
    16  Ubuntu  8.10    2008-10-30

End Of Report
Our report is the result of our pipeline of commands. First, we sort the list by distribution
name and version (fields 1 and 2), then we process the results with sed, adding the report header (including the logical page markup for nl) and footer. Finally, we process the
result with nl, which, by default, only numbers the lines of the text stream that belong to
the body section of the logical page.
We can repeat the command and experiment with different options for nl. Some interesting ones are:
nl -n rz
and
nl -w 3 -s ' '
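On a trivial stream (made up here for illustration), the effect of those options looks like this:

```shell
# -n rz: numbers are right justified and zero filled (the default
# field width is six digits).
printf 'alpha\nbeta\n' | nl -n rz

# -w 3 narrows the number field to three characters; -s ' ' uses a
# single space instead of a tab to separate number and text.
printf 'alpha\nbeta\n' | nl -w 3 -s ' '
```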
Here we see fold in action. The text sent by the echo command is broken into segments specified by the -w option. In this example, we specify a line width of 12 characters. If no width is specified, the default is 80 characters. Notice how the lines are broken
regardless of word boundaries. The addition of the -s option will cause fold to break
the line at the last available space before the line width is reached:
[me@linuxbox ~]$ echo "The quick brown fox jumped over the lazy dog."
| fold -w 12 -s
The quick
brown fox
jumped over
the lazy
dog.
   `fmt' reads from the specified FILE arguments (or standard input if
none are given), and writes to standard output.

   By default, blank lines, spaces between words, and indentation are
preserved in the output; successive input lines with different
indentation are not joined; tabs are expanded on input and introduced
on output.
`fmt' prefers breaking lines at the end of a sentence, and tries
to avoid line breaks after the first word of a sentence or before the
last word of a sentence. A "sentence break" is defined as either the
end of a paragraph or a word ending in any of `.?!', followed by two
spaces or end of line, ignoring any intervening parentheses or
quotes. Like TeX, `fmt' reads entire "paragraphs" before choosing
line breaks; the algorithm is a variant of that given by Donald E.
Knuth and Michael F. Plass in "Breaking Paragraphs Into Lines",
`Software--Practice & Experience' 11, 11 (November 1981), 1119-1184.
We'll copy this text into our text editor and save the file as fmt-info.txt. Now, let's
say we wanted to reformat this text to fit a fifty-character-wide column. We could do this
by processing the file with fmt and the -w option:
[me@linuxbox ~]$ fmt -w 50 fmt-info.txt | head
`fmt' reads from the specified FILE arguments
(or standard input if
none are given), and writes to standard output.
By default, blank lines, spaces between words,
and indentation are
preserved in the output; successive input lines
with different indentation are not joined; tabs
are expanded on input and introduced on output.
Well, that's an awkward result. Perhaps we should actually read this text, since it explains
what's going on:
By default, blank lines, spaces between words, and indentation are preserved in the
output; successive input lines with different indentation are not joined; tabs are
expanded on input and introduced on output.
So, fmt is preserving the indentation of the first line. Fortunately, fmt provides an option to correct this:
[me@linuxbox ~]$ fmt -cw 50 fmt-info.txt
`fmt' reads from the specified FILE arguments
(or standard input if none are given), and writes
to standard output.
Much better. By adding the -c option, we now have the desired result.
fmt has some interesting options:
Table 21-3: fmt Options
Option          Description
-c              Operate in crown margin mode. This preserves the indentation of
                the first two lines of a paragraph. Subsequent lines are
                aligned with the indentation of the second line.
-p string       Only format those lines beginning with the prefix string. After
                formatting, the contents of string are prefixed to each reformatted
                line. This option can be used to format text in source code
                comments. For example, any programming language or
                configuration file that uses a # character to delineate a comment
                could be formatted by specifying -p '# ' so that only the
                comments will be formatted. See the example below.
-s              Split-only mode. In this mode, lines will only be split to fit the
                specified column width. Short lines will not be joined to fill lines.
                This mode is useful when formatting text such as code where
                joining is not desired.
-u              Perform uniform spacing. This will apply traditional
                "typewriter-style" formatting to the text. This means a single
                space between words and two spaces between sentences. This mode
                is useful for removing justification, that is, text that has
                been padded with spaces to force alignment on both the left and
                right margins.
-w width        Change the fill width to width characters. The default is 75
                characters.
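As a quick sketch of split-only mode (the sample text here is made up):

```shell
# Without -s, fmt would join "short line" with the line after it to
# fill out the width. With -s, long lines are split to fit but short
# lines are left alone.
printf 'short line\na much longer line that will need to be split to fit the width\n' \
    | fmt -s -w 25
```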
The -p option is particularly interesting. With it, we can format selected portions of a
file, provided that the lines to be formatted all begin with the same sequence of characters. Many programming languages use the pound sign (#) to indicate the beginning of a
comment and thus can be formatted using this option. Let's create a file that simulates a
program that uses comments:
[me@linuxbox ~]$ cat > fmt-code.txt
# This file contains code with comments.
# This line is a comment.
# Followed by another comment line.
# And another.
This, on the other hand, is a line of code.
And another line of code.
And another.
Our sample file contains comments which begin with the string # (a # followed by a
space) and lines of code which do not. Now, using fmt, we can format the comments
and leave the code untouched:
[me@linuxbox ~]$ fmt -w 50 -p '# ' fmt-code.txt
# This file contains code with comments.
# This line is a comment. Followed by another
# comment line. And another.
This, on the other hand, is a line of code.
And another line of code.
And another.
Notice that the adjoining comment lines are joined, while the blank lines and the lines
that do not begin with the specified prefix are preserved.
[me@linuxbox ~]$ pr -l 15 -w 65 distros.txt


2008-12-11 18:27              distros.txt               Page 1


SUSE            10.2    12/07/2006
Fedora          10      11/25/2008
SUSE            11.0    06/19/2008
Ubuntu          8.04    04/24/2008
Fedora          8       11/08/2007




2008-12-11 18:27              distros.txt               Page 2


SUSE            10.3    10/04/2007
Ubuntu          6.10    10/26/2006
Fedora          7       05/31/2007
Ubuntu          7.10    10/18/2007
Ubuntu          7.04    04/19/2007
In this example, we employ the -l option (for page length) and the -w option (page
width) to define a page that is 65 columns wide and 15 lines long. pr paginates the
contents of the distros.txt file, separates each page with several lines of whitespace
and creates a default header containing the file modification time, filename, and page
number. The pr program provides many options to control page layout. We'll take a look
at more of them in the next chapter.
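Because distros.txt may not be at hand, here is a self-contained sketch showing the same default header behavior on a throwaway file (the file name is made up):

```shell
# pr paginates its input; the default header line contains the file's
# modification time, its name, and the page number.
printf 'one\ntwo\nthree\n' > /tmp/pr-demo.txt
pr -l 15 -w 65 /tmp/pr-demo.txt | head -n 5
```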
The format string may contain literal text (like "I formatted the string:"), escape sequences (such as \n, a newline character), and sequences beginning with the % character,
which are called conversion specifications. In the example above, the conversion specification %s is used to format the string "foo" and place it in the command's output. Here it
is again:
[me@linuxbox ~]$ printf "I formatted '%s' as a string.\n" foo
I formatted 'foo' as a string.
As we can see, the %s conversion specification is replaced by the string "foo" in the command's output. The s conversion is used to format string data. There are other specifiers
for other kinds of data. This table lists the commonly used data types:
Table 21-4: Common printf Data Type Specifiers
Specifier       Description
d               Format a number as a signed decimal integer.
f               Format and output a floating point number.
o               Format an integer as an octal number.
s               Format a string.
x               Format an integer as a hexadecimal number using lowercase a-f
                where needed.
X               Same as x but use uppercase letters.
We'll demonstrate the effect of each of the conversion specifiers on the string "380":
[me@linuxbox ~]$ printf "%d, %f, %o, %s, %x, %X\n" 380 380 380 380 380 380
380, 380.000000, 574, 380, 17c, 17C
Since we specified six conversion specifiers, we must also supply six arguments for
printf to process. The six results show the effect of each specifier.
Several optional components may be added to the conversion specifier to adjust its output. A complete conversion specification may consist of the following:
%[flags][width][.precision]conversion_specification
Multiple optional components, when used, must appear in the order specified above to be
properly interpreted. Here is a description of each:
Table 21-5: printf Conversion Specification Components
Component       Description
flags           There are five different flags:
                # - Use the alternate format for output. This varies by data
                type. For o (octal number) conversion, the output is prefixed
                with 0. For x and X (hexadecimal number) conversions, the
                output is prefixed with 0x or 0X respectively.
                0 - Pad the output with zeros.
                - - Left-align the output. By default, printf right-aligns
                output.
                ' ' (space) - Produce a leading space for positive numbers.
                + - Sign positive numbers. By default, printf only
                signs negative numbers.
width           Specify the minimum field width as a number.
.precision      For floating point numbers, specify the number of digits of
                precision to be output after the decimal point. For string
                conversion, specify the number of characters to output.
Argument        Format          Result          Notes
380             "%d"            380             Simple formatting of an
                                                integer.
380             "%#x"           0x17c           Integer formatted as a
                                                hexadecimal number using
                                                the alternate format flag.
380             "%05d"          00380           Integer formatted with
                                                leading zeros (padding) and
                                                a minimum field width of
                                                five characters.
380             "%05.5f"        380.00000       Number formatted as a
                                                floating point number with
                                                padding and five decimal
                                                places of precision. Since
                                                the specified minimum
                                                field width (5) is less than
                                                the actual width of the
                                                formatted number, the
                                                padding has no effect.
380             "%010.5f"       0380.00000      By increasing the
                                                minimum field width to 10
                                                the padding is now visible.
380             "%+d"           +380            The + flag signs a positive
                                                number.
380             "%-d"           380             The - flag left-aligns the
                                                formatting.
abcdefghijk     "%5s"           abcdefghijk     A string formatted with a
                                                minimum field width.
abcdefghijk     "%.5s"          abcde           By applying precision to a
                                                string, it is truncated.
Again, printf is used mostly in scripts where it is employed to format tabular data,
rather than on the command line directly. But we can still show how it can be used to
solve various formatting problems. First, let's output some fields separated by tab characters:
[me@linuxbox ~]$ printf "%s\t%s\t%s\n" str1 str2 str3
str1 str2 str3
By inserting \t (the escape sequence for a tab), we achieve the desired effect. Next,
some numbers with neat formatting:
[me@linuxbox ~]$ printf "Line: %05d %15.3f Result: %+15d\n" 1071 3.14156295 32589
Line: 01071           3.142 Result:          +32589
This shows the effect of minimum field width on the spacing of the fields. Or how about
formatting a tiny web page:
[me@linuxbox ~]$ printf "<html>\n\t<head>\n\t\t<title>%s</title>\n\t</head>\n\t<body>\n\t\t<p>%s</p>\n\t</body>\n</html>\n" "Page Title" "Page Content"
<html>
	<head>
		<title>Page Title</title>
	</head>
	<body>
		<p>Page Content</p>
	</body>
</html>
These are good for simple tasks, but what about larger jobs? One of the reasons that Unix became a popular operating system among technical and scientific users (aside from providing a powerful
multitasking, multiuser environment for all kinds of software development) is that it offered tools that could be used to produce many types of documents, particularly scientific
and academic publications. In fact, as the GNU documentation describes, document
preparation was instrumental to the development of Unix:
The first version of UNIX was developed on a PDP-7 which was sitting around Bell
Labs. In 1971 the developers wanted to get a PDP-11 for further work on the
operating system. In order to justify the cost for this system, they proposed that they
would implement a document formatting system for the AT&T patents division. This
first formatting program was a reimplementation of McIllroy's `roff', written by J.
F. Ossanna.
Two main families of document formatters dominate the field: those descended from the
original roff program, including nroff and troff, and those based on Donald
Knuth's TEX (pronounced "tek") typesetting system. And yes, the dropped "E" in the
middle is part of its name.
The name roff is derived from the term "run off" as in, "I'll run off a copy for you."
The nroff program is used to format documents for output to devices that use
monospaced fonts, such as character terminals and typewriter-style printers. At the time
of its introduction, this included nearly all printing devices attached to computers. The
later troff program formats documents for output on typesetters, devices used to produce camera-ready type for commercial printing. Most computer printers today are able
to simulate the output of typesetters. The roff family also includes some other programs
that are used to prepare portions of documents. These include eqn (for mathematical
equations) and tbl (for tables).
The TEX system (in stable form) first appeared in 1989 and has, to some degree, displaced troff as the tool of choice for typesetter output. We won't be covering TEX
here, due both to its complexity (there are entire books about it) and to the fact that it is
not installed by default on most modern Linux systems.
Tip: For those interested in installing TEX, check out the texlive package
which can be found in most distribution repositories, and the LyX graphical content
editor.
groff
groff is a suite of programs containing the GNU implementation of troff. It also includes a script that is used to emulate nroff and the rest of the roff family.
Compared to the man page in its normal presentation, we can begin to see a correlation
between the markup language and its results:
[me@linuxbox ~]$ man ls | head
LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...
The reason this is of interest is that man pages are rendered by groff, using the mandoc macro package. In fact, we can simulate the man command with the following pipeline:
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | groff -mandoc -T ascii | head
LS(1)                            User Commands                           LS(1)

NAME
       ls - list directory contents

SYNOPSIS
       ls [OPTION]... [FILE]...
Here we use the groff program with the options set to specify the mandoc macro
package and the output driver for ASCII. groff can produce output in several formats.
If no format is specified, PostScript is output by default:
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | groff -mandoc | head
%!PS-Adobe-3.0
%%Creator: groff version 1.18.1
%%CreationDate: Thu Feb 5 13:44:37 2009
%%DocumentNeededResources: font Times-Roman
%%+ font Times-Bold
%%+ font Times-Italic
%%DocumentSuppliedResources: procset grops 1.18 1
%%Pages: 4
%%PageOrder: Ascend
%%Orientation: Portrait
We briefly mentioned PostScript in the previous chapter, and will again in the next chapter. PostScript is a page description language that is used to describe the contents of a
printed page to a typesetter-like device. If we take the output of our command and store it
to a file (assuming that we are using a graphical desktop with a Desktop directory):
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | groff -mandoc > ~/Desktop/foo.ps
An icon for the output file should appear on the desktop. By double-clicking the icon, a
page viewer should start up and reveal the file in its rendered form:
The ps2pdf program is part of the ghostscript package, which is installed on most
Linux systems that support printing.
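If groff or an installed man page is not at hand, the ps2pdf conversion can be sketched with a minimal hand-written PostScript file (the file names and the PostScript program here are made up for illustration):

```shell
# A tiny PostScript program: select a font, move to a position on the
# page, draw a string, and emit the page.
printf '%%!PS-Adobe-3.0\n/Times-Roman findfont 12 scalefont setfont\n72 720 moveto (hello) show\nshowpage\n' > /tmp/hello.ps

# Convert PostScript to PDF; ps2pdf is part of the ghostscript package.
ps2pdf /tmp/hello.ps /tmp/hello.pdf

# PDF files begin with the signature %PDF.
head -c 4 /tmp/hello.pdf
```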
Tip: Linux systems often include many command line programs for file format
conversion. They are often named using the convention of format2format. Try using the command ls /usr/bin/*[[:alpha:]]2[[:alpha:]]* to identify them. Also try searching for programs named formattoformat.
For our last exercise with groff, we will revisit our old friend distros.txt once
more. This time, we will use the tbl program, which is used to format tables, to typeset
our list of Linux distributions. To do this, we are going to use our earlier sed script to
add markup to a text stream that we will feed to groff.
First, we need to modify our sed script to add the necessary requests that tbl requires.
Using a text editor, we will change distros.sed to the following:
# sed script to produce Linux distributions report
1 i\
.TS\
center box;\
cb s s\
cb cb cb\
l n c.\
Linux Distributions Report\
=\
Name	Version	Released\
_
s/\([0-9]\{2\}\)\/\([0-9]\{2\}\)\/\([0-9]\{4\}\)$/\3-\1-\2/
$ a\
.TE
Note that for the script to work properly, care must be taken to see that the words
Name Version Released are separated by tabs, not spaces. We'll save the resulting file
as distros-tbl.sed. tbl uses the .TS and .TE requests to start and end the table.
The rows following the .TS request define global properties of the table which, for our
example, are centered horizontally on the page and surrounded by a box. The remaining
lines of the definition describe the layout of each table row. Now, if we run our report-generating pipeline again with the new sed script, we'll get the following:
[me@linuxbox ~]$ sort -k 1,1 -k 2n distros.txt | sed -f distros-tbl.sed | groff -t -T ascii 2>/dev/null
+------------------------------+
|  Linux Distributions Report  |
+------------------------------+
|  Name     Version  Released  |
Adding the -t option to groff instructs it to pre-process the text stream with tbl.
Likewise, the -T option is used to output to ASCII rather than the default output medium,
PostScript.
The format of the output is the best we can expect if we are limited to the capabilities of a
terminal screen or typewriter-style printer. If we specify PostScript output and graphically
view the resulting output, we get a much more satisfying result:
[me@linuxbox ~]$ sort -k 1,1 -k 2n distros.txt | sed -f distros-tbl.sed | groff -t > ~/Desktop/foo.ps
Summing Up
Given that text is so central to the character of Unix-like operating systems, it makes
sense that there would be many tools that are used to manipulate and format text. As we
have seen, there are! The simple formatting tools like fmt and pr will find many uses in
scripts that produce short documents, while groff (and friends) can be used to write
books. We may never write a technical paper using command line tools (though there are
many people who do!), but it's good to know that we could.
Further Reading
groff User's Guide
http://www.gnu.org/software/groff/manual/
-me Reference Manual:
http://docs.freebsd.org/44doc/usd/20.meref/paper.pdf
22 Printing
After spending the last couple of chapters manipulating text, it's time to put that text on
paper. In this chapter, we'll look at the command line tools that are used to print files and
control printer operation. We won't be looking at how to configure printing, as that varies
from distribution to distribution and is usually set up automatically during installation.
Note that we will need a working printer configuration to perform the exercises in this
chapter.
We will discuss the following commands:
Character-based Printers
The printer technology of the '80s was very different in two respects. First, printers of that
period were almost always impact printers. Impact printers use a mechanism that
strikes a ribbon against the paper to form character impressions on the page. Two
of the popular technologies of that time were daisy-wheel printing and dot-matrix printing.
The second, and more important, characteristic of early printers was that they used a
fixed set of characters intrinsic to the device itself. For example, a daisy-wheel
printer could only print the characters actually molded into the petals of the daisy wheel.
This made the printers much like high-speed typewriters. As with most typewriters, they
printed using monospaced (fixed width) fonts. This means that each character has the
same width. Printing was done at fixed positions on the page, and the printable area of a
page contained a fixed number of characters. Most printers printed ten characters per inch
(CPI) horizontally and six lines per inch (LPI) vertically. Using this scheme, a US-letter
sheet of paper is 85 characters wide and 66 lines high. Taking into account a small margin
on each side, 80 characters was considered the maximum width of a print line. This explains why terminal displays (and our terminal emulators) are normally 80 characters
wide. It provides a WYSIWYG (What You See Is What You Get) view of printed output,
using a monospaced font.
Data is sent to a typewriter-like printer in a simple stream of bytes containing the characters to be printed. For example, to print an "a", the ASCII character code 97 is sent. In addition, the low-numbered ASCII control codes provided a means of moving the printer's
carriage and paper, using codes for carriage return, line feed, form feed, and so on. Using the
control codes, it's possible to achieve some limited font effects, such as boldface, by having the printer print a character, backspace, and print the character again to get a darker
print impression on the page. We can actually witness this if we use nroff to render a
man page and examine the output using cat -A:
[me@linuxbox ~]$ zcat /usr/share/man/man1/ls.1.gz | nroff -man | cat -A | head
LS(1)                            User Commands                           LS(1)$
$
$
N^HNA^HAM^HME^HE$
       ls - list directory contents$
$
S^HSY^HYN^HNO^HOP^HPS^HSI^HIS^HS$
       l^Hls^Hs [_^HO_^HP_^HT_^HI_^HO_^HN]... [_^HF_^HI_^HL_^HE]...$
The ^H (Control-h) characters are the backspaces used to create the boldface effect. Likewise, we can also see a backspace/underscore sequence used to produce underlining.
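As a quick sketch, the col program (commonly available on Linux systems) can strip these overstrike sequences to recover plain text; the byte sequence below is made up from the NAME example above:

```shell
# The sequence N, backspace, N is how nroff encodes a bold N.
# col -b discards the backspaces, keeping only the final character
# written to each column, which yields plain text.
printf 'N\bNA\bAM\bME\bE\n' | col -b
```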
Graphical Printers
The development of GUIs led to major changes in printer technology. As computers
moved to more picture-based displays, printing moved from character-based to graphical
techniques. This was facilitated by the advent of the low-cost laser printer which, instead
of printing fixed characters, could print tiny dots anywhere in the printable area of the
page. This made printing proportional fonts (like those used by typesetters), and even
photographs and high-quality diagrams, possible.
However, moving from a character-based scheme to a graphical scheme presented a formidable technical challenge. Here's why: the number of bytes needed to fill a page using
a character-based printer can be calculated this way (assuming 60 lines per page, each
containing 80 characters):
60 X 80 = 4800 bytes
In comparison, a 300 dot per inch (DPI) laser printer (assuming an 8 by 10 inch print area
per page) requires:
(8 X 300) X (10 X 300) / 8 = 900000 bytes
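The same arithmetic can be checked with shell arithmetic expansion:

```shell
# Characters per page on a character-based printer: 60 lines of 80
# characters, one byte each.
echo $(( 60 * 80 ))

# Dots for a 300 DPI bitmap of an 8 x 10 inch print area, divided by
# 8 to convert bits (one per dot) to bytes.
echo $(( (8 * 300) * (10 * 300) / 8 ))
```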
Many of the slow PC networks simply could not handle the nearly one megabyte of data
required to print a full page on a laser printer, so it was clear that a clever invention was
needed.
That invention turned out to be the page description language (PDL). A page description
language is a programming language that describes the contents of a page. Basically it
says, "go to this position, draw the character 'a' in 10 point Helvetica, go to this
position..." until everything on the page is described. The first major PDL was PostScript
from Adobe Systems, which is still in wide use today. The PostScript language is a complete programming language tailored for typography and other kinds of graphics and
imaging. It includes built-in support for 35 standard, high-quality fonts, plus the ability to
accept additional font definitions at run time. At first, support for PostScript was built
into the printers themselves. This solved the data transmission problem. While the typical
PostScript program was very verbose in comparison to the simple byte stream of character-based printers, it was much smaller than the number of bytes required to represent the
entire printed page.
A PostScript printer accepted a PostScript program as input. The printer contained its
22 Printing
Table 22-1: Common pr Options
Option          Description
+first[:last]   Output pages beginning with page first and ending with page last (if specified).
-columns        Organize the content of the page into the specified number of columns.
-a              List output across columns rather than down them.
-d              Double-space output.
-D format       Format the date displayed in page headers using format.
-f              Use form feeds rather than blank lines to separate pages.
-h header       Use header for the page header in place of the name of the file being processed.
-l length       Set page length to length lines. The default is 66.
-n              Number lines.
-o offset       Create a left margin offset characters wide.
-w width        Set page width to width characters. The default is 72.
pr is often used in pipelines as a filter. In this example, we will produce a directory listing of /usr/bin and format it into paginated, three-column output using pr:
[me@linuxbox ~]$ ls /usr/bin | pr -3 -w 65 | head

2009-02-18 14:00                                       Page 1

[                     apturl                bsd-write
411toppm              ar                    bsh
a2p                   arecord               btcflash
a2ps                  arecordmidi           bug-buddy
a2ps-lpr-wrapper      ark                   buildhash
and the report would be sent to the system's default printer. To send the file to a different
printer, the -P option can be used like this:
lpr -P printer_name
where printer_name is the name of the desired printer. To see a list of printers known to
the system:
[me@linuxbox ~]$ lpstat -a
Tip: Many Linux distributions allow you to define a printer that outputs files in
PDF (Portable Document Format), rather than printing on the physical printer. This
is very handy for experimenting with printing commands. Check your printer configuration program to see if it supports this configuration. On some distributions,
you may need to install additional packages (such as cups-pdf) to enable this capability.
Here are some of the common options for lpr:
Table 22-2: Common lpr Options
Option       Description
-# number    Set number of copies to number.
-p           Print each page with a shaded header with the date, time, job name, and page number. This so-called "pretty print" option can be used when printing text files.
-P printer   Send output to printer. If no printer is specified, the system's default printer is used.
-r           Delete files after printing. This would be useful for programs that produce temporary printer-output files.
Table 22-3: Common lp Options
Option                  Description
-n number               Set number of copies to number.
-o landscape            Print output in landscape orientation.
-o fitplot              Scale the file to fit the page. This is useful when printing images, such as JPEG files.
-o scaling=number       Scale the file. A value of 100 fills the page. Values less than 100 reduce the image, while values greater than 100 cause the file to be printed across multiple pages.
-o cpi=number           Set the output characters per inch to number. The default is 10.
-o lpi=number           Set the output lines per inch to number. The default is 6.
-o page-bottom=points
-o page-left=points
-o page-right=points
-o page-top=points      Set the page margins. Values are expressed in points, a unit of typographic measurement. There are 72 points per inch.
-P pages                Specify the list of pages. pages may be expressed as a comma-separated list and/or a range, for example "1,3,5-7".
We'll produce our directory listing again, this time printing 12 CPI and 8 LPI with a left
margin of one half inch. Note that we have to adjust the pr options to account for the
new page size:
[me@linuxbox ~]$ ls /usr/bin | pr -4 -w 90 -l 88 | lp -o page-left=36
-o cpi=12 -o lpi=8
This pipeline produces a four-column listing using smaller type than the default. The increased number of characters per inch allows us to fit more columns on the page.
Here we filter the stream with pr, using the -t option (omit headers and footers) and
then with a2ps, specifying an output file (-o option) and 66 lines per page (-L option)
to match the output pagination of pr. If we view the resulting file with a suitable file
viewer, we will see this:
Table 22-4: a2ps Options
Option                    Description
--center-title text       Set center page title to text.
--columns number          Arrange pages into number columns. The default is 2.
--guess                   Report the types of the files given as arguments. Since a2ps tries to convert and format all types of data, this option can be useful for predicting what a2ps will do when given a particular file.
--left-footer text        Set left page footer to text.
--left-title text         Set left page title to text.
--line-numbers=interval   Number lines of output every interval lines.
--list=defaults           Display default settings.
--list=topic              Display settings for topic, such as delegations, encodings, media, printers, prologues, style-sheets, or user-options.
--pages range             Print pages in range.
--right-footer text       Set right page footer to text.
--right-title text        Set right page title to text.
--rows number             Arrange pages into number rows. The default is 1.
-B                        No page headers.
-b text                   Set page header to text.
-f size                   Use size point font.
-l number                 Set characters per line to number.
-L number                 Set lines per page to number.
-M name                   Use media name, for example A4.
-n number                 Output number copies of each page.
-o file                   Send output to file. If file is specified as -, use standard output.
-P printer                Use printer. If a printer is not specified, the system default printer is used.
-R                        Portrait orientation.
-r                        Landscape orientation.
-T number                 Set tab stops to every number characters.
-u text                   Underlay (watermark) pages with text.
Further, we could determine a more detailed description of the print system configuration
this way:
[me@linuxbox ~]$ lpstat -s
system default destination: printer
device for PDF: cups-pdf:/
device for printer: ipp://print-server:631/printers/printer
In this example, we see that printer is the system's default printer and that it is a network printer using Internet Printing Protocol (ipp://) attached to a system named print-server.
The commonly useful options include:
Table 22-5: Common lpstat Options
Option            Description
-a [printer...]   Display the state of the printer queue for printer. Note that this is the status of the printer queue's ability to accept jobs, not the status of the physical printers. If no printers are specified, all print queues are shown.
-d                Display the name of the system's default printer.
-p [printer...]   Display the status of the specified printers. If no printers are specified, all printers are shown.
-r                Display the status of the print server.
-s                Display a status summary.
-t                Display a complete status report.
[me@linuxbox ~]$ lpq
printer is ready
no entries
If we do not specify a printer (using the -P option), the system's default printer is shown.
If we send a job to the printer and then look at the queue, we will see it listed:
[me@linuxbox ~]$ ls *.txt | pr -3 | lp
request id is printer-603 (1 file(s))
[me@linuxbox ~]$ lpq
printer is ready and printing
Rank    Owner   Job     File(s)                         Total Size
active  me      603     (stdin)                         1024 bytes
Each command has options for removing all the jobs belonging to a particular user, particular printer, and multiple job numbers. Their respective man pages have all the details.
Summing Up
In this chapter, we have seen how the printers of the past influenced the design of the printing systems on Unix-like machines, and how much control the command line offers over not only the scheduling and execution of print jobs, but also the various output options.
Further Reading
23 Compiling Programs
In this chapter, we will look at how to build programs by compiling source code. The
availability of source code is the essential freedom that makes Linux possible. The entire
ecosystem of Linux development relies on free exchange between developers. For many
desktop users, compiling is a lost art. It used to be quite common, but today, distribution
providers maintain huge repositories of precompiled binaries, ready to download and use.
At the time of this writing, the Debian repository (one of the largest of any of the distributions) contains almost 23,000 packages.
So why compile software? There are two reasons:
1. Availability. Despite the number of precompiled programs in distribution repositories, some distributions may not include all the desired applications. In this case,
the only way to get the desired program is to compile it from source.
2. Timeliness. While some distributions specialize in cutting edge versions of programs, many do not. This means that in order to have the very latest version of a
program, compiling is necessary.
Compiling software from source code can become very complex and technical, well beyond the reach of many users. However, many compiling tasks are quite easy and involve
only a few steps. It all depends on the package. We will look at a very simple case in order to provide an overview of the process and as a starting point for those who wish to
undertake further study.
We will introduce one new command:

make – Utility to maintain programs
What Is Compiling?
Simply put, compiling is the process of translating source code (the human-readable description of a program written by a programmer) into the native language of the computer's processor.
The computer's processor (or CPU) works at a very elemental level, executing programs in what is called machine language. This is a numeric code that describes very small operations, such as "add this byte," "point to this location in memory," or "copy this byte."
Each of these instructions is expressed in binary (ones and zeros). The earliest computer
programs were written using this numeric code, which may explain why programmers
who wrote it were said to smoke a lot, drink gallons of coffee, and wear thick glasses.
This problem was overcome by the advent of assembly language, which replaced the numeric codes with (slightly) easier to use character mnemonics such as CPY (for copy) and
MOV (for move). Programs written in assembly language are processed into machine
language by a program called an assembler. Assembly language is still used today for
certain specialized programming tasks, such as device drivers and embedded systems.
We next come to what are called high-level programming languages. They are called this
because they allow the programmer to be less concerned with the details of what the processor is doing and more with solving the problem at hand. The early ones (developed
during the 1950s) included FORTRAN (designed for scientific and technical tasks) and
COBOL (designed for business applications). Both are still in limited use today.
While there are many popular programming languages, two predominate. Most programs
written for modern systems are written in either C or C++. In the examples to follow, we
will be compiling a C program.
Programs written in high-level programming languages are converted into machine language by processing them with another program, called a compiler. Some compilers translate high-level instructions into assembly language and then use an assembler to perform the final stage of translation into machine language.
A process often used in conjunction with compiling is called linking. There are many
common tasks performed by programs. Take, for instance, opening a file. Many programs
perform this task, but it would be wasteful to have each program implement its own routine to open files. It makes more sense to have a single piece of programming that knows
how to open files and to allow all programs that need it to share it. Providing support for
common tasks is accomplished by what are called libraries. They contain multiple routines, each performing some common task that multiple programs can share. If we look in
the /lib and /usr/lib directories, we can see where many of them live. A program
called a linker is used to form the connections between the output of the compiler and the
libraries that the compiled program requires. The final result of this process is the executable program file, ready for use.
In general, interpreted programs execute much more slowly than compiled programs. This is because each source code instruction in an interpreted program is translated every
time it is carried out, whereas with a compiled program, a source code instruction is only
translated once, and this translation is permanently recorded in the final executable file.
So why are interpreted languages so popular? For many programming chores, the results
are fast enough, but the real advantage is that it is generally faster and easier to develop
interpreted programs than compiled programs. Programs are usually developed in a repeating cycle of code, compile, test. As a program grows in size, the compilation phase of
the cycle can become quite long. Interpreted languages remove the compilation step and
thus speed up program development.
Compiling A C Program
Let's compile something. Before we do that, however, we're going to need some tools like
the compiler, the linker, and make. The C compiler used almost universally in the Linux
environment is called gcc (GNU C Compiler), originally written by Richard Stallman.
Most distributions do not install gcc by default. We can check to see if the compiler is
present like this:
[me@linuxbox ~]$ which gcc
/usr/bin/gcc
[me@linuxbox ~]$ mkdir src
[me@linuxbox ~]$ cd src
[me@linuxbox src]$ ftp ftp.gnu.org
Connected to ftp.gnu.org.
220 GNU FTP server ready.
Name (ftp.gnu.org:me): anonymous
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd gnu/diction
250 Directory successfully changed.
ftp> ls
200 PORT command successful. Consider using PASV.
150 Here comes the directory listing.
-rw-r--r--   1 1003   65534    68940 Aug 28  1998 diction-0.7.tar.gz
-rw-r--r--   1 1003   65534    90957 Mar 04  2002 diction-1.02.tar.gz
-rw-r--r--   1 1003   65534   141062 Sep 17  2007 diction-1.11.tar.gz
226 Directory send OK.
ftp> get diction-1.11.tar.gz
local: diction-1.11.tar.gz remote: diction-1.11.tar.gz
200 PORT command successful. Consider using PASV.
150 Opening BINARY mode data connection for diction-1.11.tar.gz
(141062 bytes).
226 File send OK.
141062 bytes received in 0.16 secs (847.4 kB/s)
ftp> bye
221 Goodbye.
[me@linuxbox src]$ ls
diction-1.11.tar.gz
Note: Since we are the maintainer of this source code while we compile it, we
will keep it in ~/src. Source code installed by your distribution will be installed
in /usr/src, while source code intended for use by multiple users is usually installed in /usr/local/src.
As we can see, source code is usually supplied in the form of a compressed tar file.
Sometimes called a tarball, this file contains the source tree, or hierarchy of directories
and files that comprise the source code. After arriving at the ftp site, we examine the list
of tar files available and select the newest version for download. Using the get command within ftp, we copy the file from the ftp server to the local machine.
Once the tar file is downloaded, it must be unpacked. This is done with the tar program:
[me@linuxbox src]$ tar xzf diction-1.11.tar.gz
[me@linuxbox src]$ ls
diction-1.11
diction-1.11.tar.gz
Tip: The diction program, like all GNU Project software, follows certain standards for source code packaging. Most other source code available in the Linux
ecosystem also follows this standard. One element of the standard is that when the
source code tar file is unpacked, a directory will be created which contains the
source tree, and that this directory will be named project-x.xx, thus containing both
the projects name and its version number. This scheme allows easy installation of
multiple versions of the same program. However, it is often a good idea to examine
the layout of the tree before unpacking it. Some projects will not create the directory, but instead will deliver the files directly into the current directory. This will
make a mess in your otherwise well-organized src directory. To avoid this, use the
following command to examine the contents of the tar file:
tar tzvf tarfile | head
nl
nl.po
README
sentence.c
sentence.h
style.1.in
style.c
test
In it, we see a number of files. Programs belonging to the GNU Project, as well as many
others, will supply the documentation files README, INSTALL, NEWS, and COPYING.
These files contain the description of the program, information on how to build and install it, and its licensing terms. It is always a good idea to carefully read the README and
INSTALL files before attempting to build the program.
The other interesting files in this directory are the ones ending with .c and .h:
[me@linuxbox diction-1.11]$ ls *.c
diction.c  getopt1.c  getopt.c  misc.c  sentence.c  style.c
[me@linuxbox diction-1.11]$ ls *.h
getopt.h  getopt_int.h  misc.h  sentence.h
The .c files contain the two C programs supplied by the package (style and diction), divided into modules. It is common practice for large programs to be broken into
smaller, easier to manage pieces. The source code files are ordinary text and can be examined with less:
[me@linuxbox diction-1.11]$ less diction.c
The .h files are known as header files. These, too, are ordinary text. Header files contain
descriptions of the routines included in a source code file or library. In order for the compiler to connect the modules, it must receive a description of all the modules needed to
complete the entire program. Near the beginning of the diction.c file, we see this
line:
#include "getopt.h"
This instructs the compiler to read the file getopt.h as it reads the source code in
diction.c in order to know what's in getopt.c. The getopt.c file supplies
routines that are shared by both the style and diction programs.
Above the include statement for getopt.h, we see some other include statements
such as these:
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
These also refer to header files, but they refer to header files that live outside the current
source tree. They are supplied by the system to support the compilation of every program.
If we look in /usr/include, we can see them:
[me@linuxbox diction-1.11]$ ls /usr/include
The header files in this directory were installed when we installed the compiler.
The configure program is a shell script which is supplied with the source tree. Its job
is to analyze the build environment. Most source code is designed to be portable. That is,
it is designed to build on more than one kind of Unix-like system. But in order to do that,
the source code may need to undergo slight adjustments during the build to accommodate
differences between systems. configure also checks to see that necessary external
tools and components are installed. Let's run configure. Since configure is not located where the shell normally expects programs to be located, we must explicitly tell the
shell its location by prefixing the command with ./ to indicate that the program is located in the current working directory:
[me@linuxbox diction-1.11]$ ./configure
configure will output a lot of messages as it tests and configures the build. When it
finishes, it will look something like this:
checking libintl.h presence... yes
checking for libintl.h... yes
checking for library containing gettext... none required
configure: creating ./config.status
config.status: creating Makefile
config.status: creating diction.1
config.status: creating diction.texi
config.status: creating diction.spec
config.status: creating style.1
config.status: creating test/rundiction
config.status: creating config.h
[me@linuxbox diction-1.11]$
What's important here is that there are no error messages. If there were, the configuration
failed, and the program will not build until the errors are corrected.
We see configure created several new files in our source directory. The most important one is Makefile. Makefile is a configuration file that instructs the make program exactly how to build the program. Without it, make will refuse to run. Makefile
is an ordinary text file, so we can view it:
[me@linuxbox diction-1.11]$ less Makefile
The make program takes as input a makefile (which is normally named Makefile), that
describes the relationships and dependencies among the components that comprise the
finished program.
The first part of the makefile defines variables that are substituted in later sections of the makefile. For example, we see the line:

CC=gcc
which defines the C compiler to be gcc. Later in the makefile, we see one instance
where it gets used:
diction:        diction.o sentence.o misc.o getopt.o getopt1.o
                $(CC) -o $@ $(LDFLAGS) diction.o sentence.o misc.o \
                getopt.o getopt1.o $(LIBS)
A substitution is performed here, and the value $(CC) is replaced by gcc at run time.
Most of the makefile consists of lines which define a target, in this case the executable
file diction, and the files on which it is dependent. The remaining lines describe the
command(s) needed to create the target from its components. We see in this example that
the executable file diction (one of the final end products) depends on the existence of
diction.o, sentence.o, misc.o, getopt.o, and getopt1.o. Later on, in the
makefile, we see definitions of each of these as targets:
diction.o:
getopt.o:
getopt1.o:
misc.o:
sentence.o:
style.o:
However, we don't see any command specified for them. This is handled by a general target, earlier in the file, that describes the command used to compile any .c file into a .o
file:
.c.o:
$(CC) -c $(CPPFLAGS) $(CFLAGS) $<
This all seems very complicated. Why not simply list all the steps to compile the parts
and be done with it? The answer to this will become clear in a moment. In the meantime,
let's run make and build our programs:
[me@linuxbox diction-1.11]$ make
The make program will run, using the contents of Makefile to guide its actions. It will
produce a lot of messages.
When it finishes, we will see that all the targets are now present in our directory:
[me@linuxbox diction-1.11]$ ls
config.guess   diction.1.in     getopt.c      nl.po
config.h       diction.c        getopt.h      README
config.h.in    diction.o        getopt_int.h  sentence.c
config.log     diction.pot      getopt.o      sentence.h
config.status  diction.spec     INSTALL       sentence.o
config.sub     diction.spec.in  install-sh    style
configure      diction.texi     Makefile      style.1
configure.in   diction.texi.in  Makefile.in   style.1.in
COPYING        en               misc.c        style.c
de             en_GB            misc.h        style.o
de.mo          en_GB.mo         misc.o        test
de.po          en_GB.po         NEWS
diction        getopt1.c        nl
diction.1      getopt1.o        nl.mo
Among the files, we see diction and style, the programs that we set out to build.
Congratulations are in order! We just compiled our first programs from source code!
But just out of curiosity, let's run make again:
[me@linuxbox diction-1.11]$ make
make: Nothing to be done for `all'.
It only produces this strange message. What's going on? Why didn't it build the program
again? Ah, this is the magic of make. Rather than simply building everything again,
make only builds what needs building. With all of the targets present, make determined
that there was nothing to do. We can demonstrate this by deleting one of the targets and
running make again to see what it does. Let's get rid of one of the intermediate targets:
[me@linuxbox diction-1.11]$ rm getopt.o
[me@linuxbox diction-1.11]$ make
We see that make rebuilds it and re-links the diction and style programs, since they
depend on the missing module. This behavior also points out another important feature of
make: it keeps targets up to date. make insists that targets be newer than their dependencies. This makes perfect sense, as a programmer will often update a bit of source code
and then use make to build a new version of the finished product. make ensures that everything that needs building based on the updated code is built. If we use the touch program to update one of the source code files, we can see this happen:
[me@linuxbox diction-1.11]$ ls -l diction getopt.c
-rwxr-xr-x 1 me me 37164 2009-03-05 06:14 diction
-rw-r--r-- 1 me me 33125 2007-03-30 17:45 getopt.c
[me@linuxbox diction-1.11]$ touch getopt.c
[me@linuxbox diction-1.11]$ ls -l diction getopt.c
-rwxr-xr-x 1 me me 37164 2009-03-05 06:14 diction
-rw-r--r-- 1 me me 33125 2009-03-05 06:23 getopt.c
[me@linuxbox diction-1.11]$ make
After make runs, we see that it has restored the target to being newer than the dependency:
[me@linuxbox diction-1.11]$ ls -l diction getopt.c
-rwxr-xr-x 1 me me 37164 2009-03-05 06:24 diction
-rw-r--r-- 1 me me 33125 2009-03-05 06:23 getopt.c
The ability of make to intelligently build only what needs building is a great benefit to
programmers. While the time savings may not be very apparent with our small project, it
is very significant with larger projects. Remember, the Linux kernel (a program that undergoes continuous modification and improvement) contains several million lines of
code.
After we perform the installation, we can check that the program is ready to go:
[me@linuxbox diction-1.11]$ which diction
/usr/local/bin/diction
[me@linuxbox diction-1.11]$ man diction
Summing Up
In this chapter, we have seen how three simple commands:
./configure
make
make install
can be used to build many source code packages. We have also seen the important role
that make plays in the maintenance of programs. The make program can be used for any
task that needs to maintain a target/dependency relationship, not just for compiling source
code.
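As a sketch of this idea, a makefile need not involve a compiler at all. The following hypothetical example (the file names are invented) rebuilds a formatted report only when the data file it depends on has changed:

```makefile
# Hypothetical makefile: no compiling, just a target/dependency relationship
report.txt: data.txt
	pr -3 data.txt > report.txt
```

Running make after editing data.txt regenerates report.txt; running it again does nothing, just as with our compiled targets. Note that the command line must begin with a tab character.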
Further Reading
Wikipedia has good articles on compilers and the make program:
http://en.wikipedia.org/wiki/Compiler
http://en.wikipedia.org/wiki/Make_(software)
24 Writing Your First Script
The last line of our script is pretty familiar, just an echo command with a string argument. The second line is also familiar. It looks like a comment that we have seen used in
many of the configuration files we have examined and edited. One thing about comments
in shell scripts is that they may also appear at the ends of lines, like so:
echo 'Hello World!' # This is a comment too
Though comments are of little use on the command line, they will work.
The first line of our script is a little mysterious. It looks as if it should be a comment,
since it starts with #, but it looks too purposeful to be just that. The #! character sequence is, in fact, a special construct called a shebang. The shebang is used to tell the
system the name of the interpreter that should be used to execute the script that follows.
Every shell script should include this as its first line.
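Putting the three pieces together (the shebang, a comment, and the echo command), a minimal version of the script looks like this:

```shell
#!/bin/bash

# This is our first script.

echo 'Hello World!'
```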
Let's save our script file as hello_world.
Executable Permissions
The next thing we have to do is make our script executable. This is easily done using
chmod:
[me@linuxbox ~]$ ls -l hello_world
-rw-r--r-- 1 me me 63 2009-03-07 10:10 hello_world
[me@linuxbox ~]$ chmod 755 hello_world
[me@linuxbox ~]$ ls -l hello_world
-rwxr-xr-x 1 me me 63 2009-03-07 10:10 hello_world
There are two common permission settings for scripts: 755 for scripts that everyone can
execute, and 700 for scripts that only the owner can execute. Note that scripts must be
readable in order to be executed.
In order for the script to run, we must precede the script name with an explicit path. If we
don't, we get this:
[me@linuxbox ~]$ hello_world
bash: hello_world: command not found
Why is this? What makes our script different from other programs? As it turns out, nothing. Our script is fine. Its location is the problem. Back in Chapter 11, we discussed the
PATH environment variable and its effect on how the system searches for executable programs. To recap, the system searches a list of directories each time it needs to find an executable program, if no explicit path is specified. This is how the system knows to execute
/bin/ls when we type ls at the command line. The /bin directory is one of the directories that the system automatically searches. The list of directories is held within an
environment variable named PATH. The PATH variable contains a colon-separated list of
directories to be searched. We can view the contents of PATH:
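For example (the directories shown here are illustrative; the actual list varies from system to system):

```shell
echo "$PATH"
# e.g. /home/me/bin:/usr/local/bin:/usr/bin:/bin
```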
Here we see our list of directories. If our script were located in any of the directories in
the list, our problem would be solved. Notice the first directory in the list,
/home/me/bin. Most Linux distributions configure the PATH variable to contain a
bin directory in the user's home directory, to allow users to execute their own programs.
So if we create the bin directory and place our script within it, it should start to work
like other programs:
[me@linuxbox ~]$ mkdir bin
[me@linuxbox ~]$ mv hello_world bin
[me@linuxbox ~]$ hello_world
Hello World!
And so it does.
If the PATH variable does not contain the directory, we can easily add it by including this
line in our .bashrc file:
export PATH=~/bin:"$PATH"
After this change is made, it will take effect in each new terminal session. To apply the
change to the current terminal session, we must have the shell re-read the .bashrc file.
This can be done by sourcing it:
[me@linuxbox ~]$ . .bashrc
The dot (.) command is a synonym for the source command, a shell builtin which
reads a specified file of shell commands and treats it like input from the keyboard.
Note: Ubuntu automatically adds the ~/bin directory to the PATH variable if the
~/bin directory exists when the user's .bashrc file is executed. So, on Ubuntu
systems, if we create the ~/bin directory and then log out and log in again, everything works.
[me@linuxbox ~]$ ls -ad

and:
[me@linuxbox ~]$ ls --all --directory
are equivalent commands. In the interests of reduced typing, short options are preferred
when entering options on the command line, but when writing scripts, long options can
provide improved readability.
Obviously, this command is a little hard to figure out at first glance. In a script, this command might be easier to understand if written this way:
find playground \
    \( \
        -type f \
        -not -perm 0600 \
        -exec chmod 0600 '{}' ';' \
    \) \
    -or \
    \( \
        -type d \
        -not -perm 0700 \
        -exec chmod 0700 '{}' ';' \
    \)
:set hlsearch
turns on the option to highlight search results. Say we search for the word "echo".
With this option on, each instance of the word will be highlighted.
:set tabstop=4
sets the number of columns occupied by a tab character. The default is 8 columns.
Setting the value to 4 (which is a common practice) allows long lines to fit more
easily on the screen.
:set autoindent
turns on the auto indent feature. This causes vim to indent a new line the same
amount as the line just typed. This speeds up typing on many kinds of programming constructs. To stop indentation, type Ctrl-d.
These changes can be made permanent by adding these commands (without the
leading colon characters) to your ~/.vimrc file.
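For example, a ~/.vimrc file containing the settings discussed above would look like this:

```vim
" Settings from this chapter, entered without the leading colons
set hlsearch
set tabstop=4
set autoindent
```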
Summing Up
In this first chapter of scripting, we have looked at how scripts are written and made to
easily execute on our system. We also saw how we may use various formatting techniques to improve the readability (and thus, the maintainability) of our scripts. In future
chapters, ease of maintenance will come up again and again as a central principle in good
script writing.
Further Reading
For Hello World programs and examples in various programming languages,
see:
http://en.wikipedia.org/wiki/Hello_world
25 Starting A Project
Starting with this chapter, we will begin to build a program. The purpose of this project is
to see how various shell features are used to create programs and, more importantly, create good programs.
The program we will write is a report generator. It will present various statistics about
our system and its status, and will produce this report in HTML format, so we can view it
with a web browser such as Firefox or Chrome.
Programs are usually built up in a series of stages, with each stage adding features and
capabilities. The first stage of our program will produce a very minimal HTML page that
contains no system information. That will come later.
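A minimal page of this kind might look like the following sketch:

```html
<HTML>
        <HEAD>
                <TITLE>Page Title</TITLE>
        </HEAD>
        <BODY>
                Page body.
        </BODY>
</HTML>
```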
If we enter this into our text editor and save the file as foo.html, we can use the following URL in Firefox to view the file:
file:///home/username/foo.html
The first stage of our program will be able to output this HTML file to standard output.
We can write a program to do this pretty easily. Let's start our text editor and create a new
file named ~/bin/sys_info_page:
[me@linuxbox ~]$ vim ~/bin/sys_info_page
"<HTML>"
"
<HEAD>"
"
<TITLE>Page Title</TITLE>"
"
</HEAD>"
"
<BODY>"
"
Page body."
"
</BODY>"
"</HTML>"
Our first attempt at this problem contains a shebang, a comment (always a good idea) and
a sequence of echo commands, one for each line of output. After saving the file, we'll
make it executable and attempt to run it:
[me@linuxbox ~]$ chmod 755 ~/bin/sys_info_page
[me@linuxbox ~]$ sys_info_page
When the program runs, we should see the text of the HTML document displayed on the
screen, since the echo commands in the script send their output to standard output. We'll
run the program again and redirect the output of the program to the file
sys_info_page.html, so that we can view the result with a web browser:
[me@linuxbox ~]$ sys_info_page > sys_info_page.html
[me@linuxbox ~]$ firefox sys_info_page.html
So far, so good.
When writing programs, it's always a good idea to strive for simplicity and clarity. Maintenance is easier when a program is easy to read and understand, not to mention that it
can make the program easier to write by reducing the amount of typing. Our current version of the program works fine, but it could be simpler. We could actually combine all the
echo commands into one, which will certainly make it easier to add more lines to the program's output. So, let's change our program to this:
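A version using a single quoted string might look like this sketch (the quoting behavior it relies on is explained next):

```shell
#!/bin/bash

# Program to output a system information page

echo "<HTML>
        <HEAD>
                <TITLE>Page Title</TITLE>
        </HEAD>
        <BODY>
                Page body.
        </BODY>
</HTML>"
```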
A quoted string may include newlines, and therefore contain multiple lines of text. The
shell will keep reading the text until it encounters the closing quotation mark. It works
this way on the command line, too:
[me@linuxbox ~]$ echo "<HTML>
>         <HEAD>
>                 <TITLE>Page Title</TITLE>
>         </HEAD>
>         <BODY>
>                 Page body.
>         </BODY>
> </HTML>"
The leading > character is the shell prompt contained in the PS2 shell variable. It appears whenever we type a multi-line statement into the shell. This feature is a little obscure right now, but later, when we cover multi-line programming statements, it will turn
out to be quite handy.
#!/bin/bash

# Program to output a system information page

title="System Information Report"

echo "<HTML>
        <HEAD>
                <TITLE>$title</TITLE>
        </HEAD>
        <BODY>
                <H1>$title</H1>
        </BODY>
</HTML>"
By creating a variable named title and assigning it the value "System Information Report", we can take advantage of parameter expansion and place the string in multiple locations.
So, how do we create a variable? Simple, we just use it. When the shell encounters a variable, it automatically creates it. This differs from many programming languages in which
variables must be explicitly declared or defined before use. The shell is very lax about
this, which can lead to some problems. For example, consider this scenario played out on
the command line:
[me@linuxbox ~]$ foo="yes"
[me@linuxbox ~]$ echo $foo
yes
[me@linuxbox ~]$ echo $fool

[me@linuxbox ~]$
We first assign the value yes to the variable foo, and then display its value with echo.
Next we display the value of the variable name misspelled as fool and get a blank result. This is because the shell happily created the variable fool when it encountered it,
and gave it the default value of nothing, or empty. From this, we learn that we must pay
close attention to our spelling! It's also important to understand what really happened in
this example. From our previous look at how the shell performs expansions, we know
that the command:
[me@linuxbox ~]$ echo $foo
expands into:
[me@linuxbox ~]$ echo
The empty variable expands into nothing! This can play havoc with commands that require arguments. Here's an example:
[me@linuxbox ~]$ foo=foo.txt
[me@linuxbox ~]$ foo1=foo1.txt
[me@linuxbox ~]$ cp $foo $fool
cp: missing destination file operand after `foo.txt'
Try `cp --help' for more information.
We assign values to two variables, foo and foo1. We then perform a cp, but misspell
the name of the second argument. After expansion, the cp command is only sent one argument, though it requires two.
There are some rules about variable names:
1. Variable names may consist of alphanumeric characters (letters and numbers) and
underscore characters.
2. The first character of a variable name must be either a letter or an underscore.
3. Spaces and punctuation symbols are not allowed.
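These rules can be illustrated with a few assignments (the names here are arbitrary examples):

```shell
#!/bin/bash
# Examples of valid and invalid shell variable names.
my_var=1        # valid: letters, digits, underscores
_count=2        # valid: may begin with an underscore
VAR3=3          # valid: digits allowed after the first character

# Invalid names (each would cause an error if uncommented):
# 3var=4        # may not begin with a digit
# my-var=5      # hyphens are not allowed
# my var=6      # spaces are not allowed

echo "$my_var $_count $VAR3"
```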
The word variable implies a value that changes, and in many applications, variables are
used this way. However, the variable in our application, title, is used as a constant. A
constant is just like a variable in that it has a name and contains a value. The difference is
that the value of a constant does not change. In an application that performs geometric
calculations, we might define PI as a constant, and assign it the value of 3.1415, instead of using the number literally throughout our program. The shell makes no distinction between variables and constants; they are mostly for the programmer's convenience.
A common convention is to use uppercase letters to designate constants and lowercase
letters for true variables. We will modify our script to comply with this convention:
#!/bin/bash
# Program to output a system information page
TITLE="System Information Report For $HOSTNAME"
echo "<HTML>
        <HEAD>
                <TITLE>$TITLE</TITLE>
        </HEAD>
        <BODY>
                <H1>$TITLE</H1>
        </BODY>
</HTML>"
We also took the opportunity to jazz up our title by adding the value of the shell variable
HOSTNAME. This is the network name of the machine.
The shell does not care about the type of data assigned to a variable; it treats them all as strings. Here are some examples of assignments:

a=z                     # Assign the string "z" to variable a.
b="a string"            # Embedded spaces must be within quotes.
c="a string and $b"     # Other expansions such as variables can be
                        # expanded into the assignment.
d=$(ls -l foo.txt)      # Results of a command.
e=$((5 * 7))            # Arithmetic expansion.
f="\t\ta string\n"      # Escape sequences such as tabs and newlines.
During expansion, variable names may be surrounded by optional curly braces {}. This
is useful in cases where a variable name becomes ambiguous due to its surrounding context. Here, we try to change the name of a file from myfile to myfile1, using a variable:
[me@linuxbox ~]$ filename="myfile"
[me@linuxbox ~]$ touch $filename
[me@linuxbox ~]$ mv $filename $filename1
mv: missing destination file operand after `myfile'
Try `mv --help' for more information.
This attempt fails because the shell interprets the second argument of the mv command as
a new (and empty) variable. The problem can be overcome this way:
[me@linuxbox ~]$ mv $filename ${filename}1
By adding the surrounding braces, the shell no longer interprets the trailing 1 as part of
the variable name.
We'll take this opportunity to add some data to our report, namely the date and time the
report was created and the username of the creator:
#!/bin/bash
# Program to output a system information page
TITLE="System Information Report For $HOSTNAME"
CURRENT_TIME=$(date +"%x %r %Z")
TIMESTAMP="Generated $CURRENT_TIME, by $USER"
echo "<HTML>
<HEAD>
<TITLE>$TITLE</TITLE>
</HEAD>
<BODY>
<H1>$TITLE</H1>
<P>$TIMESTAMP</P>
</BODY>
</HTML>"
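The command substitution used for CURRENT_TIME above works with any command; a small sketch (uname -r is an arbitrary additional example):

```shell
#!/bin/bash
# Capture command output in variables via command substitution.
CURRENT_TIME=$(date +"%x %r %Z")
KERNEL=$(uname -r)    # arbitrary example of another command

echo "Time:   $CURRENT_TIME"
echo "Kernel: $KERNEL"
```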
Here Documents
We've looked at two different methods of outputting our text, both using the echo command. There is a third way called a here document or here script. A here document is an
additional form of I/O redirection in which we embed a body of text into our script and
feed it into the standard input of a command. It works like this:
command << token
text
token
where command is the name of a command that accepts standard input and token is a string
used to indicate the end of the embedded text. We'll modify our script to use a here document:
#!/bin/bash
# Program to output a system information page
TITLE="System Information Report For $HOSTNAME"
CURRENT_TIME=$(date +"%x %r %Z")
TIMESTAMP="Generated $CURRENT_TIME, by $USER"
cat << _EOF_
<HTML>
        <HEAD>
                <TITLE>$TITLE</TITLE>
        </HEAD>
        <BODY>
                <H1>$TITLE</H1>
                <P>$TIMESTAMP</P>
        </BODY>
</HTML>
_EOF_
Instead of using echo, our script now uses cat and a here document. The string _EOF_
(meaning End Of File, a common convention) was selected as the token, and marks the
end of the embedded text. Note that the token must appear alone and that there must not
be trailing spaces on the line.
So what's the advantage of using a here document? It's mostly the same as echo, except
that, by default, single and double quotes within here documents lose their special meaning to the shell. Here is a command line example:
[me@linuxbox ~]$ foo="some text"
[me@linuxbox ~]$ cat << _EOF_
> $foo
> "$foo"
> '$foo'
> \$foo
> _EOF_
some text
"some text"
'some text'
$foo
As we can see, the shell pays no attention to the quotation marks. It treats them as ordinary characters. This allows us to embed quotes freely within a here document. This
could turn out to be handy for our report program.
Here documents can be used with any command that accepts standard input. In this example, we use a here document to pass a series of commands to the ftp program in order to retrieve a file from a remote FTP server:
#!/bin/bash
# Script to retrieve a file via FTP
FTP_SERVER=ftp.nl.debian.org
FTP_PATH=/debian/dists/lenny/main/installer-i386/current/images/cdrom
REMOTE_FILE=debian-cd_info.tar.gz
ftp -n << _EOF_
open $FTP_SERVER
user anonymous me@linuxbox
cd $FTP_PATH
hash
get $REMOTE_FILE
bye
_EOF_
ls -l $REMOTE_FILE
If we change the redirection operator from << to <<-, the shell will ignore leading
tab characters in the here document. This allows a here document to be indented, which
can improve readability:
#!/bin/bash
# Script to retrieve a file via FTP
FTP_SERVER=ftp.nl.debian.org
FTP_PATH=/debian/dists/lenny/main/installer-i386/current/images/cdrom
REMOTE_FILE=debian-cd_info.tar.gz
ftp -n <<- _EOF_
open $FTP_SERVER
user anonymous me@linuxbox
cd $FTP_PATH
hash
get $REMOTE_FILE
bye
_EOF_
ls -l $REMOTE_FILE
Summing Up
In this chapter, we started a project that will carry us through the process of building a
successful script. We introduced the concept of variables and constants and how they can
be employed. They are the first of many applications we will find for parameter expansion. We also looked at how to produce output from our script, and various methods for
embedding blocks of text.
Further Reading
For more information about HTML, see the following articles and tutorials:
http://en.wikipedia.org/wiki/Html
http://en.wikibooks.org/wiki/HTML_Programming
http://html.net/tutorials/html/
The bash man page includes a section entitled HERE DOCUMENTS, which
has a full description of this feature.
26 Top-Down Design
As programs get larger and more complex, they become more difficult to design, code
and maintain. As with any large project, it is often a good idea to break large, complex
tasks into a series of small, simple tasks. Let's imagine that we are trying to describe a
common, everyday task, going to the market to buy food, to a person from Mars. We
might describe the overall process as the following series of steps:
1. Get in car.
2. Drive to market.
3. Park car.
4. Enter market.
5. Purchase food.
6. Return to car.
7. Drive home.
8. Park car.
9. Enter house.
However, a person from Mars is likely to need more detail. We could further break down
the subtask Park car into this series of steps:
1. Find parking space.
2. Drive car into space.
3. Turn off motor.
4. Set parking brake.
5. Exit car.
6. Lock car.
The Turn off motor subtask could further be broken down into steps including Turn
off ignition, Remove ignition key, and so on, until every step of the entire process of
going to the market has been fully defined.
This process of identifying the top-level steps and developing increasingly detailed views
of those steps is called top-down design. This technique allows us to break large complex
tasks into many small, simple tasks. Top-down design is a common method of designing
programs and one that is well suited to shell programming in particular.
In this chapter, we will use top-down design to further develop our report-generator
script.
Shell Functions
Our script currently performs the following steps to generate the HTML document:
1. Open page.
2. Open page header.
3. Set page title.
4. Close page header.
5. Open page body.
6. Output page heading.
7. Output timestamp.
8. Close page body.
9. Close page.
For our next stage of development, we will add some tasks between steps 7 and 8. These
will include:
System uptime and load. This is the amount of time since the last shutdown or reboot and the average number of tasks currently running on the processor over several time intervals.
Disk space. The overall use of space on the system's storage devices.
Home space. The amount of storage space being used by each user.
If we had a command for each of these tasks, we could add them to our script simply
through command substitution:
#!/bin/bash
# Program to output a system information page
TITLE="System Information Report For $HOSTNAME"
CURRENT_TIME=$(date +"%x %r %Z")
TIMESTAMP="Generated $CURRENT_TIME, by $USER"
cat << _EOF_
<HTML>
        <HEAD>
                <TITLE>$TITLE</TITLE>
        </HEAD>
        <BODY>
                <H1>$TITLE</H1>
                <P>$TIMESTAMP</P>
                $(report_uptime)
                $(report_disk_space)
                $(report_home_space)
        </BODY>
</HTML>
_EOF_
We could create these additional commands two ways. We could write three separate
scripts and place them in a directory listed in our PATH, or we could embed the scripts
within our program as shell functions. As we have mentioned before, shell functions are
mini-scripts that are located inside other scripts and can act as autonomous programs.
Shell functions have two syntactic forms:
function name {
commands
return
}
and
name () {
commands
return
}
where name is the name of the function and commands is a series of commands contained
within the function. Both forms are equivalent and may be used interchangeably. Below
we see a script that demonstrates the use of a shell function:
1    #!/bin/bash
2
3    # Shell function demo
4
5    function funct {
6        echo "Step 2"
7        return
8    }
9
10   # Main program starts here
11
12   echo "Step 1"
13   funct
14   echo "Step 3"
As the shell reads the script, it passes over lines 1 through 11, as those lines consist of
comments and the function definition. Execution begins at line 12, with an echo command. Line 13 calls the shell function funct and the shell executes the function just as
it would any other command. Program control then moves to line 6, and the second echo
command is executed. Line 7 is executed next. Its return command terminates the
function and returns control to the program at the line following the function call (line
14), and the final echo command is executed. Note that in order for function calls to be
recognized as shell functions and not interpreted as the names of external programs, shell
function definitions must appear in the script before they are called.
We'll add minimal shell function definitions to our script:
#!/bin/bash
# Program to output a system information page
TITLE="System Information Report For $HOSTNAME"
CURRENT_TIME=$(date +"%x %r %Z")
TIMESTAMP="Generated $CURRENT_TIME, by $USER"
report_uptime () {
return
}
report_disk_space () {
return
}
report_home_space () {
return
}
cat << _EOF_
<HTML>
<HEAD>
<TITLE>$TITLE</TITLE>
</HEAD>
<BODY>
<H1>$TITLE</H1>
<P>$TIMESTAMP</P>
$(report_uptime)
$(report_disk_space)
$(report_home_space)
</BODY>
</HTML>
_EOF_
Shell function names follow the same rules as variables. A function must contain at least
one command. The return command (which is optional) satisfies the requirement.
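A tiny sketch of this rule (the function name stub is an arbitrary example):

```shell
#!/bin/bash
# A function body must contain at least one command;
# the optional 'return' satisfies the requirement.
stub () {
    return
}

# stub () { }    # would be a syntax error: empty function body

stub
echo "stub exit status: $?"    # prints "stub exit status: 0"
```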
Local Variables
In the scripts we have written so far, all the variables (including constants) have been
global variables. Global variables maintain their existence throughout the program. This
is fine for many things, but it can sometimes complicate the use of shell functions. Inside
shell functions, it is often desirable to have local variables. Local variables are only accessible within the shell function in which they are defined and cease to exist once the
shell function terminates.
Having local variables allows the programmer to use variables with names that may already exist, either in the script globally or in other shell functions, without having to
worry about potential name conflicts.
Here is an example script that demonstrates how local variables are defined and used:
#!/bin/bash
# local-vars: script to demonstrate local variables
foo=0
funct_1 () {
local foo
foo=1
echo "funct_1: foo = $foo"
}
funct_2 () {
local foo
foo=2
echo "funct_2: foo = $foo"
}
echo "global:  foo = $foo"
funct_1
echo "global:  foo = $foo"
funct_2
echo "global:  foo = $foo"
As we can see, local variables are defined by preceding the variable name with the word
local. This creates a variable that is local to the shell function in which it is defined.
Once outside the shell function, the variable no longer exists. When we run this script, we
see the results:
[me@linuxbox ~]$ local-vars
global:  foo = 0
funct_1: foo = 1
global:  foo = 0
funct_2: foo = 2
global:  foo = 0
We see that the assignment of values to the local variable foo within both shell functions
has no effect on the value of foo defined outside the functions.
This feature allows shell functions to be written so that they remain independent of each
other and of the script in which they appear. This is very valuable, as it helps prevent one
part of a program from interfering with another. It also allows shell functions to be written so that they can be portable. That is, they may be cut and pasted from script to script,
as needed.
If we run the script again and examine the output:

<HTML>
        <HEAD>
                <TITLE>System Information Report For linuxbox</TITLE>
        </HEAD>
        <BODY>
                <H1>System Information Report For linuxbox</H1>
                <P>Generated 03/19/2009 04:02:10 PM EDT, by me</P>



        </BODY>
</HTML>
we see that there are some blank lines in our output after the timestamp, but we can't be
sure of the cause. If we change the functions to include some feedback:
report_uptime () {
echo "Function report_uptime executed."
return
}
report_disk_space () {
echo "Function report_disk_space executed."
return
}
report_home_space () {
echo "Function report_home_space executed."
return
}
With the stubs in place and working, we can begin to flesh out the functions. The report_uptime function looks like this:

report_uptime () {
        cat <<- _EOF_
                <H2>System Uptime</H2>
                <PRE>$(uptime)</PRE>
        _EOF_
        return
}
It's pretty straightforward. We use a here document to output a section header and the
output of the uptime command, surrounded by <PRE> tags to preserve the formatting
of the command. The report_disk_space function is similar:
report_disk_space () {
cat <<- _EOF_
<H2>Disk Space Utilization</H2>
<PRE>$(df -h)</PRE>
_EOF_
return
}
This function uses the df -h command to determine the amount of disk space. Lastly,
we'll build the report_home_space function:
report_home_space () {
cat <<- _EOF_
<H2>Home Space Utilization</H2>
<PRE>$(du -sh /home/*)</PRE>
_EOF_
return
}
We use the du command with the -sh options to perform this task. This, however, is not
a complete solution to the problem. While it will work on some systems (Ubuntu, for example), it will not work on others. The reason is that many systems set the permissions of
home directories to prevent them from being world-readable, which is a reasonable security measure. On these systems, the report_home_space function, as written, will
only work if our script is run with superuser privileges. A better solution would be to
have the script adjust its behavior according to the privileges of the user. We will take this
up in the next chapter.
Summing Up
In this chapter, we have introduced a common method of program design called top-down design, and we have seen how shell functions are used to build the stepwise refinement that it requires. We have also seen how local variables can be used to make shell
functions independent from one another and from the program in which they are placed.
This makes it possible for shell functions to be written in a portable manner and to be reusable by allowing them to be placed in multiple programs; a great time saver.
Further Reading
Wikipedia has many articles on software design philosophy. Here are a couple of good ones:
http://en.wikipedia.org/wiki/Top-down_design
http://en.wikipedia.org/wiki/Subroutines
27 Flow Control: Branching With if
Using the shell, we can code this kind of branching logic as follows:
x=5
if [ $x -eq 5 ]; then
echo "x equals 5."
else
echo "x does not equal 5."
fi
In this example, we execute the command twice: once with the value of x set to 5,
which results in the string "x equals 5." being output, and a second time with the value of
x set to 0, which results in the string "x does not equal 5." being output.
The if statement has the following syntax:
if commands; then
commands
[elif commands; then
commands...]
[else
commands]
fi
where commands is a list of commands. This is a little confusing at first glance. But before we can clear this up, we have to look at how the shell evaluates the success or failure
of a command.
Exit Status
Commands (including the scripts and shell functions we write) issue a value to the system
when they terminate, called an exit status. This value, which is an integer in the range of
0 to 255, indicates the success or failure of the command's execution. By convention, a
value of zero indicates success and any other value indicates failure. The shell provides a
parameter that we can use to examine the exit status. Here we see it in action:
[me@linuxbox ~]$ ls -d /usr/bin
/usr/bin
[me@linuxbox ~]$ echo $?
0
[me@linuxbox ~]$ ls -d /bin/usr
ls: cannot access /bin/usr: No such file or directory
[me@linuxbox ~]$ echo $?
2
In this example, we execute the ls command twice. The first time, the command executes successfully. If we display the value of the parameter $?, we see that it is zero. We
execute the ls command a second time, producing an error, and examine the parameter
$? again. This time it contains a 2, indicating that the command encountered an error.
Some commands use different exit status values to provide diagnostics for errors, while
many commands simply exit with a value of one when they fail. Man pages often include
a section entitled Exit Status, describing what codes are used. However, a zero always
indicates success.
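Our own functions and scripts participate in this convention through return and exit. A small sketch (the function name and its even/odd logic are arbitrary examples):

```shell
#!/bin/bash
# Functions report success or failure with 'return';
# whole scripts do the same with 'exit'.
is_even () {
    if [ $(( $1 % 2 )) -eq 0 ]; then
        return 0    # success
    else
        return 1    # failure
    fi
}

is_even 4; echo $?    # prints 0
is_even 7; echo $?    # prints 1
```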
The shell provides two extremely simple builtin commands that do nothing except terminate with either a zero or one exit status. The true command always executes successfully and the false command always executes unsuccessfully:
[me@linuxbox ~]$ true
[me@linuxbox ~]$ echo $?
0
[me@linuxbox ~]$ false
[me@linuxbox ~]$ echo $?
1
We can use these commands to see how the if statement works. What the if statement
really does is evaluate the success or failure of commands:
[me@linuxbox ~]$ if true; then echo "It's true."; fi
It's true.
[me@linuxbox ~]$ if false; then echo "It's true."; fi
[me@linuxbox ~]$
The command echo "It's true." is executed when the command following if executes successfully, and is not executed when the command following if does not execute
successfully. If a list of commands follows if, the last command in the list is evaluated:
[me@linuxbox ~]$ if false; true; then echo "It's true."; fi
It's true.
[me@linuxbox ~]$ if true; false; then echo "It's true."; fi
[me@linuxbox ~]$
test
By far, the command used most frequently with if is test. The test command performs a variety of checks and comparisons. It has two equivalent forms:
test expression
and the more popular:
[ expression ]
where expression is an expression that is evaluated as either true or false. The test command returns an exit status of zero when the expression is true and a status of one when
the expression is false.
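For instance, these two commands perform the same check (using /tmp, which exists on virtually all Unix-like systems):

```shell
#!/bin/bash
# 'test expression' and '[ expression ]' are equivalent forms.
test -d /tmp; echo $?    # 0 if /tmp is a directory
[ -d /tmp ]; echo $?     # same result
```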
File Expressions
The following expressions are used to evaluate the status of files:
Table 27-1: test File Expressions

Expression        Is True If:

file1 -ef file2   file1 and file2 have the same inode numbers (the two
                  filenames refer to the same file by hard linking).
-b file           file exists and is a block-special (device) file.
-c file           file exists and is a character-special (device) file.
-d file           file exists and is a directory.
-e file           file exists.
-f file           file exists and is a regular file.
-g file           file exists and is set-group-ID.
-G file           file exists and is owned by the effective group ID.
-k file           file exists and has its "sticky bit" set.
-L file           file exists and is a symbolic link.
-O file           file exists and is owned by the effective user ID.
-p file           file exists and is a named pipe.
-r file           file exists and is readable (has readable permission for
                  the effective user).
-s file           file exists and has a length greater than zero.
-S file           file exists and is a network socket.
-t fd             fd is a file descriptor directed to/from the terminal.
                  Can be used to determine whether standard input/output/
                  error is being redirected.
-u file           file exists and is setuid.
-w file           file exists and is writable (has write permission for
                  the effective user).
-x file           file exists and is executable (has execute/search
                  permission for the effective user).
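The example script that originally followed this table is not reproduced in this excerpt; here is a minimal sketch in the same spirit, applying a few of these expressions to a file (the choice of ~/.bashrc is an arbitrary example):

```shell
#!/bin/bash
# Sketch: evaluate the status of a file with test expressions.
FILE="$HOME/.bashrc"

if [ -e "$FILE" ]; then
    if [ -f "$FILE" ]; then
        echo "$FILE is a regular file."
    fi
    if [ -d "$FILE" ]; then
        echo "$FILE is a directory."
    fi
    if [ -r "$FILE" ]; then
        echo "$FILE is readable."
    fi
else
    echo "$FILE does not exist" >&2
fi
```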
String Expressions
The following expressions are used to evaluate strings:
Table 27-2: test String Expressions

Expression            Is True If...

string                string is not null.
-n string             The length of string is greater than zero.
-z string             The length of string is zero.
string1 = string2     string1 and string2 are equal.
string1 == string2    string1 and string2 are equal (synonymous with =,
                      but the == form is more commonly used).
string1 != string2    string1 and string2 are not equal.
string1 > string2     string1 sorts after string2.
string1 < string2     string1 sorts before string2.
Warning: the > and < expression operators must be quoted (or escaped with a
backslash) when used with test. If they are not, they will be interpreted by the
shell as redirection operators, with potentially destructive results. Also note that
while the bash documentation states that the sorting order conforms to the collation order of the current locale, it does not. ASCII (POSIX) order is used in versions of bash up to and including 4.0.
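A brief sketch of the escaping requirement (apple and banana are arbitrary example strings):

```shell
#!/bin/bash
# '<' and '>' must be escaped (or quoted) inside [ ];
# unescaped, the shell treats them as redirection operators.
if [ "apple" \< "banana" ]; then
    echo "apple sorts before banana"
fi

# Unescaped, [ "apple" < "banana" ] would try to redirect
# standard input from a file named 'banana'.
```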
Here is a script that incorporates string expressions:
#!/bin/bash
# test-string: evaluate the value of a string
ANSWER=maybe
if [ -z "$ANSWER" ]; then
        echo "There is no answer." >&2
        exit 1
fi

if [ "$ANSWER" = "yes" ]; then
        echo "The answer is YES."
elif [ "$ANSWER" = "no" ]; then
        echo "The answer is NO."
elif [ "$ANSWER" = "maybe" ]; then
        echo "The answer is MAYBE."
else
        echo "The answer is UNKNOWN."
fi
In this script, we evaluate the constant ANSWER. We first determine if the string is empty.
If it is, we terminate the script and set the exit status to one. Notice the redirection that is
applied to the echo command. This redirects the error message There is no answer. to
standard error, which is the proper thing to do with error messages. If the string is not
empty, we evaluate the value of the string to see if it is equal to either yes, no, or
maybe. We do this by using elif, which is short for else if. By using elif, we are
able to construct a more complex logical test.
Integer Expressions
The following expressions are used with integers:
Table 27-3: test Integer Expressions

Expression               Is True If...

integer1 -eq integer2    integer1 is equal to integer2.
integer1 -ne integer2    integer1 is not equal to integer2.
integer1 -le integer2    integer1 is less than or equal to integer2.
integer1 -lt integer2    integer1 is less than integer2.
integer1 -ge integer2    integer1 is greater than or equal to integer2.
integer1 -gt integer2    integer1 is greater than integer2.
The interesting part of the script is how it determines whether an integer is even or odd.
By performing a modulo 2 operation on the number, which divides the number by two
and returns the remainder, it can tell if the number is odd or even.
By applying the regular expression, we are able to limit the value of INT to only strings
that begin with an optional minus sign, followed by one or more numerals. This expression also eliminates the possibility of empty values.
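The script under discussion is not reproduced in this excerpt; a sketch consistent with the description (regular-expression validation followed by a modulo-2 parity check):

```shell
#!/bin/bash
# Validate that INT is an integer, then test its sign and parity.
INT=-5

if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
    if [ "$INT" -lt 0 ]; then
        echo "INT is negative."
    fi
    if [ $((INT % 2)) -eq 0 ]; then
        echo "INT is even."
    else
        echo "INT is odd."
    fi
else
    echo "INT is not an integer." >&2
fi
```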
Another added feature of [[ ]] is that the == operator supports pattern matching the
same way pathname expansion does. For example:
[me@linuxbox ~]$ FILE=foo.bar
[me@linuxbox ~]$ if [[ $FILE == foo.* ]]; then
> echo "$FILE matches pattern 'foo.*'"
> fi
foo.bar matches pattern 'foo.*'
Using (( )), we can slightly simplify the test-integer2 script like this:
#!/bin/bash
# test-integer2a: evaluate the value of an integer.
INT=-5
if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
if ((INT == 0)); then
echo "INT is zero."
else
if ((INT < 0)); then
echo "INT is negative."
else
echo "INT is positive."
fi
if (( ((INT % 2)) == 0)); then
echo "INT is even."
else
echo "INT is odd."
fi
fi
else
echo "INT is not an integer." >&2
exit 1
fi
Notice that we use less-than and greater-than signs and that == is used to test for equivalence. This is a more natural-looking syntax for working with integers. Notice too, that
because the compound command (( )) is part of the shell syntax rather than an ordinary command, and it deals only with integers, it is able to recognize variables by name and does not require expansion to be performed.
Combining Expressions
It's also possible to combine expressions to create more complex evaluations. Expressions are combined by using logical operators. We saw these in Chapter 17, when we
learned about the find command. There are three logical operations for test and
[[ ]]. They are AND, OR, and NOT. test and [[ ]] use different operators to represent these operations:
Table 27-4: Logical Operators

Operation    test    [[ ]] and (( ))

AND          -a      &&
OR           -o      ||
NOT          !       !
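The range-checking script referred to here does not appear in this excerpt; a sketch consistent with the description, using [[ ]] with the && operator:

```shell
#!/bin/bash
# Determine if INT lies between MIN_VAL and MAX_VAL.
MIN_VAL=1
MAX_VAL=100
INT=50

if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
    if [[ $INT -ge $MIN_VAL && $INT -le $MAX_VAL ]]; then
        echo "$INT is within $MIN_VAL to $MAX_VAL."
    else
        echo "$INT is out of range."
    fi
else
    echo "INT is not an integer." >&2
fi
```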
In this script, we determine if the value of integer INT lies between the values of
MIN_VAL and MAX_VAL. This is performed by a single use of [[ ]], which includes
two expressions separated by the && operator. We could have also coded this using
test:
if [ $INT -ge $MIN_VAL -a $INT -le $MAX_VAL ]; then
echo "$INT is within $MIN_VAL to $MAX_VAL."
else
echo "$INT is out of range."
fi
The ! negation operator reverses the outcome of an expression. It returns true if an expression is false, and it returns false if an expression is true. In the following script, we
modify the logic of our evaluation to find values of INT that are outside the specified
range:
#!/bin/bash
# test-integer4: determine if an integer is outside a
# specified range of values.
MIN_VAL=1
MAX_VAL=100
INT=50
if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
if [[ ! ($INT -ge $MIN_VAL && $INT -le $MAX_VAL) ]]; then
echo "$INT is outside $MIN_VAL to $MAX_VAL."
else
echo "$INT is in range."
fi
else
echo "INT is not an integer." >&2
exit 1
fi
We also include parentheses around the expression, for grouping. If these were not included, the negation would only apply to the first expression and not the combination of
the two. Coding this with test would be done this way:
if [ ! \( $INT -ge $MIN_VAL -a $INT -le $MAX_VAL \) ]; then
        echo "$INT is outside $MIN_VAL to $MAX_VAL."
else
        echo "$INT is in range."
fi
Since all expressions and operators used by test are treated as command arguments by
the shell (unlike [[ ]] and (( )) ), characters which have special meaning to bash,
such as <, >, (, and ), must be quoted or escaped.
Seeing that test and [[ ]] do roughly the same thing, which is preferable? test is
traditional (and part of POSIX), whereas [[ ]] is specific to bash. It's important to
know how to use test, since it is very widely used, but [[ ]] is clearly more useful
and is easier to code.
[me@linuxbox ~]$ mkdir temp && cd temp

This will create a directory named temp, and if it succeeds, the current working directory
will be changed to temp. The second command is attempted only if the mkdir command is successful. Likewise, a command like this:
[me@linuxbox ~]$ [ -d temp ] || mkdir temp
will test for the existence of the directory temp, and only if the test fails, will the directory be created. This type of construct is very handy for handling errors in scripts, a subject we will discuss more in later chapters. For example, we could do this in a script:
[ -d temp ] || exit 1
If the script requires the directory temp, and it does not exist, then the script will terminate with an exit status of one.
Summing Up
We started this chapter with a question. How could we make our sys_info_page
script detect if the user had permission to read all the home directories? With our knowledge of if, we can solve the problem by adding this code to the
report_home_space function:
report_home_space () {
        if [[ $(id -u) -eq 0 ]]; then
                cat <<- _EOF_
                        <H2>Home Space Utilization (All Users)</H2>
                        <PRE>$(du -sh /home/*)</PRE>
                _EOF_
        else
                cat <<- _EOF_
                        <H2>Home Space Utilization ($USER)</H2>
                        <PRE>$(du -sh $HOME)</PRE>
                _EOF_
        fi
        return
}
We evaluate the output of the id command. With the -u option, id outputs the numeric
user ID number of the effective user. The superuser is always zero and every other user is
a number greater than zero. Knowing this, we can construct two different here documents, one taking advantage of superuser privileges, and the other, restricted to the user's
own home directory.
We are going to take a break from the sys_info_page program, but don't worry. It
will be back. In the meantime, we'll cover some topics that we'll need when we resume
our work.
Further Reading
There are several sections of the bash man page that provide further detail on the topics
covered in this chapter:
CONDITIONAL EXPRESSIONS
28 Reading Keyboard Input
Each time we want to change the value of INT, we have to edit the script. It would be
much more useful if the script could ask the user for a value. In this chapter, we will begin to look at how we can add interactivity to our programs.
We use echo with the -n option (which suppresses the trailing newline on output) to
display a prompt, and then use read to input a value for the variable int. Running this
script results in this:
#!/bin/bash

# read-multiple: read multiple values into multiple variables

echo -n "Enter one or more values > "

read var1 var2 var3 var4 var5

echo "var1 = '$var1'"
echo "var2 = '$var2'"
echo "var3 = '$var3'"
echo "var4 = '$var4'"
echo "var5 = '$var5'"
In this script, we assign and display up to five values. Notice how read behaves when
given different numbers of values:
[me@linuxbox ~]$ read-multiple
Enter one or more values > a b c d e
var1 = 'a'
var2 = 'b'
var3 = 'c'
var4 = 'd'
var5 = 'e'
[me@linuxbox ~]$ read-multiple
Enter one or more values > a
var1 = 'a'
var2 = ''
var3 = ''
var4 = ''
var5 = ''
[me@linuxbox ~]$ read-multiple
Enter one or more values > a b c d e f g
var1 = 'a'
var2 = 'b'
var3 = 'c'
var4 = 'd'
var5 = 'e f g'
Options

read supports the following options:

Table 28-1: read Options

Option          Description

-a array        Assign the input to array, starting with index zero.
-d delimiter    The first character in the string delimiter is used to
                indicate the end of input, rather than a newline character.
-e              Use Readline to handle input. This permits input editing.
-i string       Use string as a default reply if the user simply presses
                Enter. Requires the -e option.
-n num          Read num characters of input, rather than an entire line.
-p prompt       Display a prompt for input using the string prompt.
-s              Silent mode. Do not echo characters to the display as they
                are typed. This is useful when inputting passwords and
                other confidential information.
-t seconds      Timeout. Terminate input after seconds. read returns a
                non-zero exit status if an input times out.
-u fd           Use input from file descriptor fd, rather than standard
                input.
Using the various options, we can do interesting things with read. For example, with the
-p option, we can provide a prompt string:
#!/bin/bash
# read-single: read multiple values into default variable
read -p "Enter one or more values > "
echo "REPLY = '$REPLY'"
With the -t and -s options we can write a script that reads secret input and times out
if the input is not completed in a specified time:
#!/bin/bash
# read-secret: input a secret passphrase
if read -t 10 -sp "Enter secret passphrase > " secret_pass; then
echo -e "\nSecret passphrase = '$secret_pass'"
else
echo -e "\nInput timed out" >&2
exit 1
fi
The script prompts the user for a secret passphrase and waits 10 seconds for input. If the
entry is not completed within the specified time, the script exits with an error. Since the
-s option is included, the characters of the passphrase are not echoed to the display as
they are typed.
To supply a default response, we can use the -e and -i options together:

#!/bin/bash

# read-default: supply a default value if user presses Enter key.

read -e -p "What is your user name? " -i $USER
echo "You answered: '$REPLY'"
In this script, we prompt the user to enter his/her user name and use the environment variable USER to provide a default value. When the script is run it displays the default string
and if the user simply presses the Enter key, read will assign the default string to the
REPLY variable.
[me@linuxbox ~]$ read-default
What is your user name? me
You answered: 'me'
IFS
Normally, the shell performs word splitting on the input provided to read. As we have
seen, this means that multiple words separated by one or more spaces become separate
items on the input line, and are assigned to separate variables by read. This behavior is
configured by a shell variable named IFS (for Internal Field Separator). The default
value of IFS contains a space, a tab, and a newline character, each of which will separate
items from one another.
We can adjust the value of IFS to control the separation of fields input to read. For example, the /etc/passwd file contains lines of data that use the colon character as a
field separator. By changing the value of IFS to a single colon, we can use read to input
the contents of /etc/passwd and successfully separate fields into different variables.
Here we have a script that does just that:
#!/bin/bash
# read-ifs: read fields from a file
FILE=/etc/passwd
read -p "Enter a username > " user_name
file_info=$(grep "^$user_name:" $FILE)

if [ -n "$file_info" ]; then
        IFS=":" read user pw uid gid name home shell <<< "$file_info"
        echo "User =      '$user'"
        echo "UID =       '$uid'"
        echo "GID =       '$gid'"
        echo "Full Name = '$name'"
        echo "Home Dir. = '$home'"
        echo "Shell =     '$shell'"
else
        echo "No such user '$user_name'" >&2
        exit 1
fi
This script prompts the user to enter the username of an account on the system, then displays the different fields found in the user's record in the /etc/passwd file. The script
contains two interesting lines. The first is:
contains two interesting lines. The first is:
file_info=$(grep "^$user_name:" $FILE)
This line assigns the results of a grep command to the variable file_info. The regular expression used by grep assures that the username will only match a single line in
the /etc/passwd file.
The second interesting line is this one:
IFS=":" read user pw uid gid name home shell <<< "$file_info"
The line consists of three parts: a variable assignment, a read command with a list of
variable names as arguments, and a strange new redirection operator. We'll look at the
variable assignment first.
The shell allows one or more variable assignments to take place immediately before a
command. These assignments alter the environment for the command that follows. The
effect of the assignment is temporary, changing the environment only for the duration of
the command. In our case, the value of IFS is changed to a colon character. Alternatively,
we could have coded it this way:
we could have coded it this way:
OLD_IFS="$IFS"
IFS=":"
read user pw uid gid name home shell <<< "$file_info"
IFS="$OLD_IFS"
where we store the value of IFS, assign a new value, perform the read command, and
then restore IFS to its original value. Clearly, placing the variable assignment in front of the command is the more concise way of doing the same thing.

You Can't Pipe read

While the read command normally takes input from standard input, you cannot do this:

echo "foo" | read

We would expect this to work, but it does not. The command will appear to succeed, but the REPLY variable will always be empty. Why is this?
The explanation has to do with the way the shell handles pipelines. In bash (and
other shells such as sh), pipelines create subshells. These are copies of the shell
and its environment which are used to execute the command in the pipeline. In
our example above, read is executed in a subshell.
Subshells in Unix-like systems create copies of the environment for the processes
to use while they execute. When the process finishes, the copy of the environment is destroyed. This means that a subshell can never alter the environment of
its parent process. read assigns variables, which then become part of the environment. In the example above, read assigns the value foo to the variable REPLY in its subshell's environment, but when the command exits, the subshell and
its environment are destroyed, and the effect of the assignment is lost.
Using here strings is one way to work around this behavior. Another method is
discussed in Chapter 36.
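The difference between piping into read and using a here string can be demonstrated directly with a small sketch:

```shell
#!/bin/bash
REPLY=""

# read runs in a subshell here; its assignment to REPLY is discarded
# when the subshell exits.
echo "foo" | read

echo "after pipe: '$REPLY'"          # after pipe: ''

# A here string keeps read in the current shell, so the assignment
# survives.
read <<< "foo"

echo "after here string: '$REPLY'"   # after here string: 'foo'
```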
Validating Input
With our new ability to have keyboard input comes an additional programming challenge,
validating input. Very often the difference between a well-written program and a poorly
written one lies in the program's ability to deal with the unexpected. Frequently, the unexpected appears in the form of bad input. We've done a little of this with our evaluation
programs in the previous chapter, where we checked the values of integers and screened
out empty values and non-numeric characters. It is important to perform these kinds of
programming checks every time a program receives input, to guard against invalid data.
This is especially important for programs that are shared by multiple users. Omitting
these safeguards in the interests of economy might be excused if a program is to be used
once and only by the author to perform some special task. Even then, if the program performs dangerous tasks such as deleting files, it would be wise to include data validation,
just in case.
Here we have an example program that validates various kinds of input:
#!/bin/bash
# read-validate: validate input
invalid_input () {
echo "Invalid input '$REPLY'" >&2
exit 1
}
read -p "Enter a single item > "
# input is empty (invalid)
[[ -z $REPLY ]] && invalid_input
# input is multiple items (invalid)
(( $(echo $REPLY | wc -w) > 1 )) && invalid_input
# is input a valid filename?
if [[ $REPLY =~ ^[-[:alnum:]\._]+$ ]]; then
echo "'$REPLY' is a valid filename."
if [[ -e $REPLY ]]; then
echo "And file '$REPLY' exists."
else
echo "However, file '$REPLY' does not exist."
fi
# is input a floating point number?
if [[ $REPLY =~ ^-?[[:digit:]]*\.[[:digit:]]+$ ]]; then
echo "'$REPLY' is a floating point number."
else
echo "'$REPLY' is not a floating point number."
fi
# is input an integer?
if [[ $REPLY =~ ^-?[[:digit:]]+$ ]]; then
echo "'$REPLY' is an integer."
else
echo "'$REPLY' is not an integer."
fi
else
	echo "The string '$REPLY' is not a valid filename." >&2
	exit 1
fi
Menus
A common type of interactivity is called menu-driven. In menu-driven programs, the user
is presented with a list of choices and is asked to choose one. For example, we could
imagine a program that presented the following:
Please Select:
1. Display System Information
2. Display Disk Space
3. Display Home Space Utilization
0. Quit
Using what we learned from writing our sys_info_page program, we can construct a
menu-driven program to perform the tasks on the above menu:
#!/bin/bash
# read-menu: a menu driven system information program
clear
echo "
Please Select:
1. Display System Information
2. Display Disk Space
3. Display Home Space Utilization
0. Quit
"
read -p "Enter selection [0-3] > "
if [[ $REPLY =~ ^[0-3]$ ]]; then
if [[ $REPLY == 0 ]]; then
echo "Program terminated."
exit
fi
if [[ $REPLY == 1 ]]; then
echo "Hostname: $HOSTNAME"
uptime
exit
fi
if [[ $REPLY == 2 ]]; then
df -h
exit
fi
if [[ $REPLY == 3 ]]; then
if [[ $(id -u) -eq 0 ]]; then
echo "Home Space Utilization (All Users)"
du -sh /home/*
else
echo "Home Space Utilization ($USER)"
du -sh $HOME
fi
exit
fi
echo "Invalid entry." >&2
exit 1
fi
This script is logically divided into two parts. The first part displays the menu and inputs
the response from the user. The second part identifies the response and carries out the selected action. Notice the use of the exit command in this script. It is used here to prevent the script from executing unnecessary code after an action has been carried out. The
presence of multiple exit points in a program is generally a bad idea (it makes program
logic harder to understand), but it works in this script.
Summing Up
In this chapter, we took our first steps toward interactivity: allowing users to input data
into our programs via the keyboard. Using the techniques presented thus far, it is possible
to write many useful programs, such as specialized calculation programs and easy-to-use
front-ends for arcane command line tools. In the next chapter, we will build on the menu-driven program concept to make it even better.
Extra Credit
It is important to study the programs in this chapter carefully and have a complete understanding of the way they are logically structured, as the programs to come will be increasingly complex. As an exercise, rewrite the programs in this chapter using the test command rather than the [[ ]] compound command. Hint: Use grep to evaluate the regular expressions and evaluate the exit status. This will be good practice.
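As a hint toward the exercise, here is one hypothetical way a single [[ =~ ]] test could be recast using grep and its exit status (the function name is invented for illustration):

```shell
#!/bin/bash

# is_integer: return 0 (success) if $1 looks like an optionally
# signed integer, using grep instead of the [[ =~ ]] operator.
is_integer () {
    printf '%s\n' "$1" | grep -Eq '^-?[0-9]+$'
}

if is_integer "-42"; then result1="integer"; else result1="not"; fi
if is_integer "3.14"; then result2="integer"; else result2="not"; fi

echo "$result1 $result2"   # integer not
```

The -q option suppresses grep's output; only its exit status is used, which is exactly what if needs.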
Further Reading
The Bash Reference Manual contains a chapter on builtins, which includes the
read command:
http://www.gnu.org/software/bash/manual/bashref.html#Bash-Builtins
29 Flow Control: Looping With while / until
Daily life is full of repeated activities. Going to work each day, walking the dog, slicing a
carrot are all tasks that involve repeating a series of steps. Let's consider slicing a carrot.
If we express this activity in pseudocode, it might look something like this:
1. get cutting board
2. get knife
3. place carrot on cutting board
4. lift knife
5. advance carrot
6. slice carrot
7. if entire carrot sliced, then quit, else go to step 4
Steps 4 through 7 form a loop. The actions within the loop are repeated until the condition, "entire carrot sliced," is reached.
while
bash can express a similar idea. Let's say we wanted to display five numbers in sequential order, from 1 to 5. A bash script could be constructed as follows:

#!/bin/bash

# while-count: display a series of numbers

count=1

while [[ $count -le 5 ]]; do
	echo $count
	count=$((count + 1))
done

echo "Finished."

The syntax of the while command is:

while commands; do commands; done

Like if, while evaluates the exit status of a list of commands. As long as the exit status is zero, it performs the commands inside the loop. Using a while loop, we can also improve the read-menu program from the previous chapter:

#!/bin/bash

# while-menu: a menu driven system information program
DELAY=3 # Number of seconds to display results
while [[ $REPLY != 0 ]]; do
clear
cat <<- _EOF_
Please Select:
1. Display System Information
2. Display Disk Space
3. Display Home Space Utilization
0. Quit
_EOF_
read -p "Enter selection [0-3] > "
if [[ $REPLY =~ ^[0-3]$ ]]; then
if [[ $REPLY == 1 ]]; then
echo "Hostname: $HOSTNAME"
uptime
sleep $DELAY
fi
if [[ $REPLY == 2 ]]; then
df -h
sleep $DELAY
fi
if [[ $REPLY == 3 ]]; then
if [[ $(id -u) -eq 0 ]]; then
echo "Home Space Utilization (All Users)"
du -sh /home/*
else
echo "Home Space Utilization ($USER)"
du -sh $HOME
fi
sleep $DELAY
fi
else
echo "Invalid entry."
sleep $DELAY
fi
done
echo "Program terminated."
By enclosing the menu in a while loop, we are able to have the program repeat the menu
display after each selection. The loop continues as long as REPLY is not equal to 0 and
the menu is displayed again, giving the user the opportunity to make another selection. At
the end of each action, a sleep command is executed so the program will pause for a
few seconds to allow the results of the selection to be seen before the screen is cleared
and the menu is redisplayed. Once REPLY is equal to 0, indicating the quit selection,
the loop terminates and the program continues with the statement following done.

Breaking Out Of A Loop

bash provides two builtin commands that can be used to control program flow inside loops. The break command immediately terminates a loop, and program control resumes with the next statement following the loop. The continue command causes the remainder of the loop to be skipped, and program control resumes with the next iteration of the loop. Here is a version of the menu program incorporating both break and continue:

#!/bin/bash

# while-menu2: a menu driven system information program

DELAY=3 # Number of seconds to display results

while true; do
	clear
	cat <<- _EOF_
	Please Select:

	1. Display System Information
	2. Display Disk Space
	3. Display Home Space Utilization
	0. Quit
_EOF_
read -p "Enter selection [0-3] > "
if [[ $REPLY =~ ^[0-3]$ ]]; then
if [[ $REPLY == 1 ]]; then
echo "Hostname: $HOSTNAME"
uptime
sleep $DELAY
continue
fi
if [[ $REPLY == 2 ]]; then
df -h
sleep $DELAY
continue
fi
if [[ $REPLY == 3 ]]; then
if [[ $(id -u) -eq 0 ]]; then
echo "Home Space Utilization (All Users)"
du -sh /home/*
else
		echo "Home Space Utilization ($USER)"
		du -sh $HOME
fi
sleep $DELAY
continue
fi
if [[ $REPLY == 0 ]]; then
break
fi
else
	echo "Invalid entry."
	sleep $DELAY
fi
done
echo "Program terminated."
In this version of the script, we set up an endless loop (one that never terminates on its
own) by using the true command to supply an exit status to while. Since true will
always exit with an exit status of zero, the loop will never end. This is a surprisingly common scripting technique. Since the loop will never end on its own, it's up to the programmer to provide some way to break out of the loop when the time is right. In this script, the
break command is used to exit the loop when the 0 selection is chosen. The continue command has been included at the end of the other script choices to allow for
more efficient execution. By using continue, the script will skip over code that is not
needed when a selection is identified. For example, if the 1 selection is chosen and
identified, there is no reason to test for the other selections.
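A stripped-down sketch shows both commands at work in an endless loop (the variable names are arbitrary):

```shell
#!/bin/bash

# Count upward forever; skip 3 with continue, stop after 5 with break.
results=""
count=0
while true; do
    count=$((count + 1))
    if [[ $count -eq 3 ]]; then
        continue            # skip the rest of this iteration
    fi
    if [[ $count -gt 5 ]]; then
        break               # exit the endless loop
    fi
    results="$results $count"
done
echo "collected:$results"   # collected: 1 2 4 5
```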
until
The until command is much like while, except instead of exiting a loop when a nonzero exit status is encountered, it does the opposite. An until loop continues until it receives a zero exit status. In our while-count script, we continued the loop as long as
the value of the count variable was less than or equal to 5. We could get the same result
by coding the script with until:
#!/bin/bash
# until-count: display a series of numbers
count=1
until [[ $count -gt 5 ]]; do
echo $count
	count=$((count + 1))
done

echo "Finished."
By changing the test expression to $count -gt 5, until will terminate the loop at
the correct time. The decision of whether to use the while or until loop is usually a
matter of choosing the one that allows the clearest test to be written.
Reading Files With Loops

while and until can process standard input. This allows files to be processed with while and until loops:

#!/bin/bash

# while-read: read lines from a file

while read distro version release; do
	printf "Distro: %s\tVersion: %s\tReleased: %s\n" \
		$distro \
		$version \
		$release
done < distros.txt

To redirect a file to the loop, we place the redirection operator after the done statement.
The loop will use read to input the fields from the redirected file. The read command
will exit after each line is read, with a zero exit status until the end-of-file is reached. At
that point, it will exit with a non-zero exit status, thereby terminating the loop. It is also
possible to pipe standard input into a loop:
#!/bin/bash
# while-read2: read lines from a file
sort -k 1,1 -k 2n distros.txt | while read distro version release; do
printf "Distro: %s\tVersion: %s\tReleased: %s\n" \
$distro \
$version \
$release
done
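One caveat follows from the subshell behavior we saw earlier with read: because this loop is part of a pipeline, it runs in a subshell, and any variables assigned inside it are lost when the loop ends. A quick sketch:

```shell
#!/bin/bash
count=0

# The while loop runs in a subshell; its copy of count is discarded.
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
done
echo "after pipe: $count"       # after pipe: 0

# Redirecting into the loop keeps it in the current shell.
count=0
while read -r line; do
    count=$((count + 1))
done <<< $'a\nb\nc'
echo "after redirect: $count"   # after redirect: 3
```

When the results of the loop must survive, prefer the redirection form over the pipeline form.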
Summing Up
With the introduction of loops, and our previous encounters with branching, subroutines
and sequences, we have covered the major types of flow control used in programs. bash
has some more tricks up its sleeve, but they are refinements on these basic concepts.
Further Reading
The Bash Guide for Beginners from the Linux Documentation Project has some
more examples of while loops:
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_09_02.html
Wikipedia has an article on loops, which is part of a larger article on flow
control:
http://en.wikipedia.org/wiki/Control_flow#Loops
30 Troubleshooting
As our scripts become more complex, it's time to take a look at what happens when
things go wrong and they don't do what we want. In this chapter, we'll look at some of
the common kinds of errors that occur in scripts, and describe a few useful techniques
that can be used to track down and eradicate problems.
Syntactic Errors
One general class of errors is syntactic. Syntactic errors involve mistyping some element
of shell syntax. In most cases, these kinds of errors will lead to the shell refusing to execute the script.
In the following discussions, we will use this script to demonstrate common types of errors:
#!/bin/bash
# trouble: script to demonstrate common errors
number=1
if [ $number = 1 ]; then
echo "Number is equal to 1."
else
echo "Number is not equal to 1."
fi
Missing Quotes
If we edit our script and remove the trailing quote from the argument following the first
echo command:
#!/bin/bash
# trouble: script to demonstrate common errors
number=1
if [ $number = 1 ]; then
echo "Number is equal to 1.
else
echo "Number is not equal to 1."
fi
It generates two errors. Interestingly, the line numbers reported are not where the missing
quote was removed, but rather much later in the program. We can see why, if we follow
the program after the missing quote. bash will continue looking for the closing quote
until it finds one, which it does immediately after the second echo command. bash becomes very confused after that, and the syntax of the if command is broken because the
fi statement is now inside a quoted (but open) string.
In long scripts, this kind of error can be quite hard to find. Using an editor with syntax
highlighting will help. If a complete version of vim is installed, syntax highlighting can
be enabled by entering the command:
:syntax on
Missing Or Unexpected Tokens

Another common mistake is forgetting to complete a compound command, such as if or
while. Let's look at what happens if we remove the semicolon after the test in the if
command:
#!/bin/bash
# trouble: script to demonstrate common errors
number=1
if [ $number = 1 ] then
echo "Number is equal to 1."
else
echo "Number is not equal to 1."
fi
Again, the error message points to an error that occurs later than the actual problem. What
happens is really pretty interesting. As we recall, if accepts a list of commands and evaluates the exit code of the last command in the list. In our program, we intend this list to
consist of a single command, [, a synonym for test. The [ command takes what follows
it as a list of arguments; in our case, four arguments: $number, =, 1, and ]. With the
semicolon removed, the word then is added to the list of arguments, which is syntactically legal. The following echo command is legal, too. It's interpreted as another command in the list of commands that if will evaluate for an exit code. The else is encountered next, but it's out of place, since the shell recognizes it as a reserved word (a
word that has special meaning to the shell) and not the name of a command, hence the error message.
Unanticipated Expansions
It's possible to have errors that only occur intermittently in a script. Sometimes the script
will run fine and other times it will fail because of the results of an expansion. If we restore the missing semicolon and change the value of number to an empty variable, we
can demonstrate:
#!/bin/bash
# trouble: script to demonstrate common errors
number=
if [ $number = 1 ]; then
echo "Number is equal to 1."
else
echo "Number is not equal to 1."
fi
We get this rather cryptic error message, followed by the output of the second echo
command. The problem is the expansion of the number variable within the test command. When the command:
[ $number = 1 ]

undergoes expansion with number being empty, the result is this:

[ = 1 ]
which is invalid and the error is generated. The = operator is a binary operator (it requires
a value on each side), but the first value is missing, so the test command expects a
unary operator (such as -z) instead. Further, since the test failed (because of the error),
the if command receives a non-zero exit code and acts accordingly, and the second
echo command is executed.
This problem can be corrected by adding quotes around the first argument in the test
command:
[ "$number" = 1 ]
419
30 Troubleshooting
Then when expansion occurs, the result will be this:
[ "" = 1 ]
which yields the correct number of arguments. In addition to empty strings, quotes should
be used in cases where a value could expand into multi-word strings, as with filenames
containing embedded spaces.
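A short sketch shows the failure with a multi-word value and the quoted fix (the variable name is arbitrary):

```shell
#!/bin/bash
value="two words"

# Unquoted, $value expands into two arguments; test reports
# "too many arguments" and returns exit status 2 (an error).
unquoted_status=0
[ $value = "1" ] 2>/dev/null || unquoted_status=$?

# Quoted, the expansion is a single argument; the test simply
# fails with exit status 1, without any error.
quoted_status=0
[ "$value" = "1" ] || quoted_status=$?

echo "unquoted=$unquoted_status quoted=$quoted_status"   # unquoted=2 quoted=1
```

The difference between status 1 (a clean "false") and status 2 (a usage error) is exactly what distinguishes a working test from a broken one.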
Logical Errors
Unlike syntactic errors, logical errors do not prevent a script from running. The script
will run, but it will not produce the desired result, due to a problem with its logic. There
are countless possible logical errors, but here are a few of the most common
kinds found in scripts:
1. Incorrect conditional expressions. It's easy to incorrectly code an if/then/else
and have the wrong logic carried out. Sometimes the logic will be reversed, or it
will be incomplete.
2. Off by one errors. When coding loops that employ counters, it is possible to
overlook that the loop may require that the counting start with zero, rather than
one, for the count to conclude at the correct point. These kinds of errors result in
either a loop going off the end by counting too far, or else missing the last iteration of the loop by terminating one iteration too soon.
3. Unanticipated situations. Most logic errors result from a program encountering
data or situations that were unforeseen by the programmer. This can also include
unanticipated expansions, such as a filename that contains embedded spaces that
expands into multiple command arguments rather than a single filename.
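An off-by-one error of the kind described in item 2 might look like this sketch:

```shell
#!/bin/bash

# Intended to print five numbers, but the test uses -lt where -le
# was needed, so the final iteration is silently skipped.
output=""
count=1
while [[ $count -lt 5 ]]; do    # bug: should be -le 5
    output="$output$count"
    count=$((count + 1))
done
echo "printed: $output"   # printed: 1234  (the 5 is missing)
```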
Defensive Programming
It is important to verify assumptions when programming. This means a careful evaluation
of the exit status of programs and commands that are used by a script. Here is an example, based on a true story. An unfortunate system administrator wrote a script to perform a
maintenance task on an important server. The script contained the following two lines of
code:
cd $dir_name
rm *
There is nothing intrinsically wrong with these two lines, as long as the directory named
in the variable, dir_name, exists. But what happens if it does not? In that case, the cd
command fails and the script continues to the next line and deletes the files in the current
working directory. Not the desired outcome at all! The hapless administrator destroyed an
important part of the server because of this design decision.
Lets look at some ways this design could be improved. First, it might be wise to make
the execution of rm contingent on the success of cd:
cd $dir_name && rm *
This way, if the cd command fails, the rm command is not carried out. This is better, but
still leaves open the possibility that the variable, dir_name, is unset or empty, which
would result in the files in the user's home directory being deleted. This could also be
avoided by checking to see that dir_name actually contains the name of an existing directory:
[[ -d $dir_name ]] && cd $dir_name && rm *
Often, it is best to terminate the script with an error when a situation such as the one
above occurs:
# Delete files in directory $dir_name
if [[ ! -d "$dir_name" ]]; then
echo "No such directory: '$dir_name'" >&2
exit 1
fi
if ! cd "$dir_name"; then
echo "Cannot cd to '$dir_name'" >&2
exit 1
fi
if ! rm *; then
echo "File deletion failed. Check results" >&2
exit 1
fi
Here, we check both the name, to see that it is that of an existing directory, and the success of the cd command. If either fails, a descriptive error message is sent to standard error and the script terminates with an exit status of one to indicate a failure.
Verifying Input
A general rule of good programming is that if a program accepts input, it must be able to
deal with anything it receives. This usually means that input must be carefully screened,
to ensure that only valid input is accepted for further processing. We saw an example of
this in the previous chapter when we studied the read command. One script contained
the following test to verify a menu selection:
[[ $REPLY =~ ^[0-3]$ ]]
This test is very specific. It will only return a zero exit status if the string returned by the
user is a numeral in the range of zero to three. Nothing else will be accepted. Sometimes
these sorts of tests can be very challenging to write, but the effort is necessary to produce
a high quality script.
Testing
Testing is an important step in every kind of software development, including scripts.
There is a saying in the open-source world, "release early, release often," which reflects
this fact. By releasing early and often, software gets more exposure to use and testing.
Experience has shown that bugs are much easier to find, and much less expensive to fix,
if they are found early in the development cycle.
In a previous discussion, we saw how stubs can be used to verify program flow. From the
earliest stages of script development, they are a valuable technique to check the progress
of our work.
Let's look at the file-deletion problem above and see how this could be coded for easy
testing. Testing the original fragment of code would be dangerous, since its purpose is to
delete files, but we could modify the code to make the test safe:
if [[ -d $dir_name ]]; then
if cd $dir_name; then
echo rm * # TESTING
else
echo "cannot cd to '$dir_name'" >&2
exit 1
fi
else
echo "no such directory: '$dir_name'" >&2
exit 1
fi
exit # TESTING
Since the error conditions already output useful messages, we don't have to add any. The
most important change is placing an echo command just before the rm command to allow the command and its expanded argument list to be displayed, rather than the command actually being executed. This change allows safe execution of the code. At the end
of the code fragment, we place an exit command to conclude the test and prevent any
other part of the script from being carried out. The need for this will vary according to the
design of the script.
We also include some comments that act as markers for our test-related changes. These
can be used to help find and remove the changes when testing is complete.
Test Cases
To perform useful testing, it's important to develop and apply good test cases. This is
done by carefully choosing input data or operating conditions that reflect edge and corner cases. In our code fragment (which is very simple), we want to know how the code
performs under three specific conditions:
1. dir_name contains the name of an existing directory
2. dir_name contains the name of a non-existent directory
3. dir_name is empty
By performing the test with each of these conditions, good test coverage is achieved.
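Under the safe echo rm modification above, the three conditions can even be driven by a small, hypothetical harness (the function name and test directories here are invented for illustration):

```shell
#!/bin/bash

# try_delete: the safe fragment wrapped in a function so each case
# can be run in a subshell (the cd then cannot leak out).
try_delete () {
    local dir_name=$1
    if [[ -d $dir_name ]]; then
        if cd "$dir_name"; then
            echo rm *    # TESTING: echo instead of executing
        else
            echo "cannot cd to '$dir_name'" >&2
            return 1
        fi
    else
        echo "no such directory: '$dir_name'" >&2
        return 1
    fi
}

status_existing=0; ( try_delete /tmp )         || status_existing=$?
status_missing=0;  ( try_delete /no/such/dir ) 2>/dev/null || status_missing=$?
status_empty=0;    ( try_delete "" )           2>/dev/null || status_empty=$?

echo "$status_existing $status_missing $status_empty"   # 0 1 1
```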
Just as with design, testing is a function of time, as well. Not every script feature needs to
be extensively tested. It's really a matter of determining what is most important. Since it
could be so potentially destructive if it malfunctioned, our code fragment deserves careful
consideration during both its design and testing.
Debugging
If testing reveals a problem with a script, the next step is debugging. A problem usually
means that the script is, in some way, not performing to the programmer's expectations. If
this is the case, we need to carefully determine exactly what the script is actually doing
and why. Finding bugs can sometimes involve a lot of detective work.
A well designed script will try to help. It should be programmed defensively, to detect abnormal conditions and provide useful feedback to the user. Sometimes, however, problems are quite strange and unexpected, and more involved techniques are required.
One useful technique for isolating a problem area is to "comment out" sections of a script. By placing comment symbols at the beginning of each line in a logical section of a script,
we prevent that section from being executed. Testing can then be performed again, to see
if the removal of the code has any impact on the behavior of the bug.
Tracing
Bugs are often cases of unexpected logical flow within a script. That is, portions of the
script are either never being executed, or are being executed in the wrong order or at the
wrong time. To view the actual flow of the program, we use a technique called tracing.
One tracing method involves placing informative messages in a script that display the location of execution. We can add messages to our code fragment:
echo "preparing to delete files" >&2
if [[ -d $dir_name ]]; then
if cd $dir_name; then
echo "deleting files" >&2
rm *
else
echo "cannot cd to '$dir_name'" >&2
exit 1
fi
else
echo "no such directory: '$dir_name'" >&2
exit 1
fi
echo "file deletion complete" >&2
We send the messages to standard error to separate them from normal output. We also do
not indent the lines containing the messages, so it is easier to find them when it's time to remove them.
Now when the script is executed, it's possible to see that the file deletion has been performed:
[me@linuxbox ~]$ deletion-script
preparing to delete files
deleting files
file deletion complete
[me@linuxbox ~]$
bash also provides a method of tracing, implemented by the -x option and the set
command with the -x option. Using our earlier trouble script, we can activate tracing
for the entire script by adding the -x option to the first line:
#!/bin/bash -x
# trouble: script to demonstrate common errors
number=1
if [ $number = 1 ]; then
echo "Number is equal to 1."
else
echo "Number is not equal to 1."
fi

When the script is executed, the trace output looks like this:

[me@linuxbox ~]$ trouble
+ number=1
+ '[' 1 = 1 ']'
+ echo 'Number is equal to 1.'
Number is equal to 1.
With tracing enabled, we see the commands performed with expansions applied. The
leading plus signs indicate the display of the trace to distinguish them from lines of regular output. The plus sign is the default character for trace output. It is contained in the
PS4 (prompt string 4) shell variable. The contents of this variable can be adjusted to
make the prompt more useful. Here, we modify the contents of the variable to include the
current line number in the script where the trace is performed. Note that single quotes are
required to prevent expansion until the prompt is actually used:
[me@linuxbox ~]$ export PS4='$LINENO + '
[me@linuxbox ~]$ trouble
5 + number=1
7 + '[' 1 = 1 ']'
8 + echo 'Number is equal to 1.'
Number is equal to 1.
To perform a trace on a selected portion of a script, rather than the entire script, we can
use the set command with the -x option:
#!/bin/bash
# trouble: script to demonstrate common errors
number=1
set -x # Turn on tracing
if [ $number = 1 ]; then
echo "Number is equal to 1."
else
echo "Number is not equal to 1."
fi
set +x # Turn off tracing
We use the set command with the -x option to activate tracing and the +x option to deactivate tracing. This technique can be used to examine multiple portions of a troublesome script.
Examining Values During Execution

It is often useful, along with tracing, to display the contents of variables to see the internal workings of a script while it is executing. Adding extra echo statements will usually do the trick:

#!/bin/bash

# trouble: script to demonstrate common errors

number=1

echo "number=$number" # DEBUG
set -x # Turn on tracing
if [ $number = 1 ]; then
	echo "Number is equal to 1."
else
	echo "Number is not equal to 1."
fi
set +x # Turn off tracing

In this trivial example, we simply display the value of the variable number and mark the
added line with a comment to facilitate its later identification and removal. This technique is particularly useful when watching the behavior of loops and arithmetic within
scripts.
Summing Up
In this chapter, we looked at just a few of the problems that can crop up during script development. Of course, there are many more. The techniques described here will enable
finding most common bugs. Debugging is a fine art that can be developed through experience, both in knowing how to avoid bugs (testing constantly throughout development)
and in finding bugs (effective use of tracing).
Further Reading
Wikipedia has a couple of short articles on syntactic and logical errors:
http://en.wikipedia.org/wiki/Syntax_error
http://en.wikipedia.org/wiki/Logic_error
There are many online resources for the technical aspects of bash programming:
http://mywiki.wooledge.org/BashPitfalls
http://tldp.org/LDP/abs/html/gotchas.html
http://www.gnu.org/software/bash/manual/html_node/Reserved-Word-Index.html
Eric Raymond's The Art of Unix Programming is a great resource for learning the
basic concepts found in well-written Unix programs. Many of these ideas apply to
shell scripts:
http://www.faqs.org/docs/artu/
http://www.faqs.org/docs/artu/ch01s06.html
31 Flow Control: Branching With case
The bash multiple-choice compound command is called case. It has the following syntax:
case word in
[pattern [| pattern]...) commands ;;]...
esac
If we look at the read-menu program from Chapter 28, we see the logic used to act on
a user's selection:
#!/bin/bash
# read-menu: a menu driven system information program
clear
echo "
Please Select:
1. Display System Information
2. Display Disk Space
3. Display Home Space Utilization
0. Quit
"
read -p "Enter selection [0-3] > "
if [[ $REPLY =~ ^[0-3]$ ]]; then
if [[ $REPLY == 0 ]]; then
	echo "Program terminated."
	exit
fi
if [[ $REPLY == 1 ]]; then
echo "Hostname: $HOSTNAME"
uptime
exit
fi
if [[ $REPLY == 2 ]]; then
df -h
exit
fi
if [[ $REPLY == 3 ]]; then
if [[ $(id -u) -eq 0 ]]; then
echo "Home Space Utilization (All Users)"
du -sh /home/*
else
echo "Home Space Utilization ($USER)"
du -sh $HOME
fi
exit
fi
echo "Invalid entry." >&2
exit 1
fi
Using the case command, we can replace this logic with something simpler:

#!/bin/bash

# case-menu: a menu driven system information program

clear
echo "
Please Select:

1. Display System Information
2. Display Disk Space
3. Display Home Space Utilization
0. Quit
"
read -p "Enter selection [0-3] > "

case $REPLY in
	0)	echo "Program terminated."
		exit
		;;
	1)	echo "Hostname: $HOSTNAME"
		uptime
		;;
	2)	df -h
		;;
	3)	if [[ $(id -u) -eq 0 ]]; then
			echo "Home Space Utilization (All Users)"
			du -sh /home/*
		else
			echo "Home Space Utilization ($USER)"
			du -sh $HOME
		fi
		;;
	*)	echo "Invalid entry" >&2
		exit 1
		;;
esac
The case command looks at the value of word, in our example, the value of the REPLY
variable, and then attempts to match it against one of the specified patterns. When a
match is found, the commands associated with the specified pattern are executed. After a
match is found, no further matches are attempted.
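The first-match-wins behavior can be seen in a tiny sketch:

```shell
#!/bin/bash
word="apple"
case $word in
    a*)     match="pattern a*" ;;      # matches first; case stops here
    apple)  match="pattern apple" ;;   # never reached
    *)      match="default" ;;
esac
echo "$match"   # pattern a*
```

Even though "apple" matches the second pattern exactly, the earlier, more general pattern wins, so pattern order matters.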
Patterns
The patterns used by case are the same as those used by pathname expansion. Patterns
are terminated with a ) character. Here are some valid patterns:
Table 31-1: case Pattern Examples

Pattern         Description
a)              Matches if word equals "a".
[[:alpha:]])    Matches if word is a single alphabetic character.
???)            Matches if word is exactly three characters long.
*.txt)          Matches if word ends with the characters ".txt".
*)              Matches any value of word. It is good practice to include
                this as the last pattern in a case command to catch any
                values of word that did not match a previous pattern.

Here is an example of patterns in use:

#!/bin/bash

read -p "enter word > "

case $REPLY in
	[[:alpha:]])	echo "is a single alphabetic character." ;;
	[ABC][0-9])	echo "is A, B, or C followed by a digit." ;;
	???)		echo "is three characters long." ;;
	*.txt)		echo "is a word ending in '.txt'" ;;
	*)		echo "is something else." ;;
esac
It is also possible to combine multiple patterns using the vertical bar character as a separator. This creates an "or" conditional pattern. This is useful for such things as handling
both upper- and lowercase characters. For example:
#!/bin/bash
# case-menu: a menu driven system information program
clear
echo "
Please Select:
A. Display System Information
B. Display Disk Space
C. Display Home Space Utilization
Q. Quit
"
read -p "Enter selection [A, B, C or Q] > "
case $REPLY in
q|Q) echo "Program terminated."
exit
;;
a|A) echo "Hostname: $HOSTNAME"
uptime
;;
b|B) df -h
;;
c|C) if [[ $(id -u) -eq 0 ]]; then
echo "Home Space Utilization (All Users)"
du -sh /home/*
else
echo "Home Space Utilization ($USER)"
du -sh $HOME
fi
		;;
	*)	echo "Invalid entry" >&2
		exit 1
		;;
esac
Here, we modify the case-menu program to use letters instead of digits for menu selection. Notice how the new patterns allow for entry of both upper- and lowercase letters.
Performing Multiple Actions

In versions of bash prior to 4.0, case allowed only one action to be performed on a successful match. Here is a script that tries to classify a character:

#!/bin/bash

# case4-1: test a character

read -n 1 -p "Type a character > "
echo
case $REPLY in
	[[:upper:]])	echo "'$REPLY' is upper case." ;;
	[[:lower:]])	echo "'$REPLY' is lower case." ;;
	[[:alpha:]])	echo "'$REPLY' is alphabetic." ;;
	[[:digit:]])	echo "'$REPLY' is a digit." ;;
	[[:graph:]])	echo "'$REPLY' is a visible character." ;;
	[[:punct:]])	echo "'$REPLY' is a punctuation symbol." ;;
	[[:space:]])	echo "'$REPLY' is a whitespace character." ;;
	[[:xdigit:]])	echo "'$REPLY' is a hexadecimal digit." ;;
esac

The script works for the most part, but fails if a character matches more than one of the POSIX character classes. For example, the character "a" is both lower case and alphabetic, as well as a hexadecimal digit. In bash prior to version 4.0 there was no way for case to match more than one test. Modern versions of bash add the ";;&" notation, which may be used in place of ";;":

#!/bin/bash

# case4-2: test a character

read -n 1 -p "Type a character > "
echo
case $REPLY in
	[[:upper:]])	echo "'$REPLY' is upper case." ;;&
	[[:lower:]])	echo "'$REPLY' is lower case." ;;&
	[[:alpha:]])	echo "'$REPLY' is alphabetic." ;;&
	[[:digit:]])	echo "'$REPLY' is a digit." ;;&
	[[:graph:]])	echo "'$REPLY' is a visible character." ;;&
	[[:punct:]])	echo "'$REPLY' is a punctuation symbol." ;;&
	[[:space:]])	echo "'$REPLY' is a whitespace character." ;;&
	[[:xdigit:]])	echo "'$REPLY' is a hexadecimal digit." ;;&
esac
The addition of the ";;&" syntax allows case to continue on to the next test rather than
simply terminating.
Summing Up
The case command is a handy addition to our bag of programming tricks. As we will
see in the next chapter, its the perfect tool for handling certain types of problems.
Further Reading
The Bash Reference Manual section on Conditional Constructs describes the
case command in detail:
http://tiswww.case.edu/php/chet/bash/bashref.html#SEC21
The Advanced Bash-Scripting Guide provides further examples of case applications:
http://tldp.org/LDP/abs/html/testbranch.html
32 Positional Parameters
One feature that has been missing from our programs is the ability to accept and process
command line options and arguments. In this chapter, we will examine the shell features
that allow our programs to get access to the contents of the command line.
Here is a very simple script that displays the values of the variables $0 through $9:

#!/bin/bash
# posit-param: script to view command line parameters
echo "
\$0 = $0
\$1 = $1
\$2 = $2
\$3 = $3
\$4 = $4
\$5 = $5
\$6 = $6
\$7 = $7
\$8 = $8
\$9 = $9
"

When executed with no command line arguments:
[me@linuxbox ~]$ posit-param
$0 = /home/me/bin/posit-param
$1 =
$2 =
$3 =
$4 =
$5 =
$6 =
$7 =
$8 =
$9 =
Even when no arguments are provided, $0 will always contain the first item appearing on
the command line, which is the pathname of the program being executed. When arguments are provided, we see the results:
[me@linuxbox ~]$ posit-param a b c d
$0 = /home/me/bin/posit-param
$1 = a
$2 = b
$3 = c
$4 = d
$5 =
$6 =
$7 =
$8 =
$9 =
Note: You can actually access more than nine parameters using parameter expansion. To specify a number greater than nine, surround the number in braces. For example ${10}, ${55}, ${211}, and so on.
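For instance, a quick sketch at the prompt (using set -- to load some throwaway parameters):

```shell
# load eleven positional parameters, then reference past $9
set -- a b c d e f g h i j k
echo "${10} ${11}"   # the tenth and eleventh parameters
echo "$10"           # without braces: $1 followed by a literal 0
```

Without the braces, the shell expands $1 and appends a literal "0", which is rarely what was intended.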
The shell also provides a variable, $#, that contains the number of arguments on the
command line. Here, posit-param is modified to display it:

#!/bin/bash
# posit-param: script to view command line parameters
echo "
Number of arguments: $#
\$0 = $0
\$1 = $1
\$2 = $2
\$3 = $3
\$4 = $4
\$5 = $5
\$6 = $6
\$7 = $7
\$8 = $8
\$9 = $9
"
The result:
[me@linuxbox ~]$ posit-param a b c d
Number of arguments: 4
$0 = /home/me/bin/posit-param
$1 = a
$2 = b
$3 = c
$4 = d
$5 =
$6 =
$7 =
$8 =
$9 =
On this example system, the wildcard * expands into 82 arguments. How can we process
that many? The shell provides a method, albeit a clumsy one, to do this. The shift
command causes all the parameters to move down one each time it is executed. In fact,
by using shift, it is possible to get by with only one parameter (in addition to $0,
which never changes):
#!/bin/bash
# posit-param2: script to display all arguments
count=1
while [[ $# -gt 0 ]]; do
    echo "Argument $count = $1"
    count=$((count + 1))
    shift
done
Each time shift is executed, the value of $2 is moved to $1, the value of $3 is moved
to $2 and so on. The value of $# is also reduced by one.
In the posit-param2 program, we create a loop that evaluates the number of arguments remaining and continues as long as there is at least one. We display the current argument, increment the variable count with each iteration of the loop to provide a running count of the number of arguments processed and, finally, execute a shift to load
$1 with the next argument. Here is the program at work:
[me@linuxbox ~]$ posit-param2 a b c d
Argument 1 = a
Argument 2 = b
Argument 3 = c
Argument 4 = d
Simple Applications
Even without shift, it's possible to write useful applications using positional parameters. By way of example, here is a simple file information program:
#!/bin/bash
# file_info: simple file information program
PROGNAME=$(basename $0)
if [[ -e $1 ]]; then
    echo -e "\nFile Type:"
    file $1
    echo -e "\nFile Status:"
    stat $1
else
    echo "$PROGNAME: usage: $PROGNAME file" >&2
    exit 1
fi
This program displays the file type (determined by the file command) and the file status (from the stat command) of a specified file. One interesting feature of this program
is the PROGNAME variable. It is given the value that results from the basename $0
command. The basename command removes the leading portion of a pathname, leaving only the base name of a file. In our example, basename removes the leading portion
of the pathname contained in the $0 parameter, the full pathname of our example program. This value is useful when constructing messages such as the usage message at the
end of the program. By coding it this way, the script can be renamed and the message automatically adjusts to contain the name of the program.
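The discussion below refers to a function version of file_info; a minimal sketch (assuming the same body as the script above, wrapped in a function so the filename arrives as the function's own first argument) would be:

```shell
file_info () {
    # file_info: function version of the file information program.
    # The filename is passed as the function's first argument, $1.
    if [[ -e "$1" ]]; then
        echo -e "\nFile Type:"
        file "$1"
        echo -e "\nFile Status:"
        stat "$1"
    else
        echo "$FUNCNAME: usage: $FUNCNAME file" >&2
        return 1
    fi
}

file_info /    # the root directory always exists
```

Note that inside a function it is FUNCNAME, not $0, that names the function in the usage message.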
Now, if a script that incorporates the file_info shell function calls the function with a
filename argument, the argument will be passed to the function.
With this capability, we can write many useful shell functions that can not only be used in
scripts, but also within the .bashrc file.
Notice that the PROGNAME variable was changed to the shell variable FUNCNAME. The
shell automatically updates this variable to keep track of the currently executed shell
function. Note that $0 always contains the full pathname of the first item on the command line (i.e., the name of the program) and does not contain the name of the shell function as we might expect.
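A quick sketch makes the distinction visible (show_names is an illustrative name, not from the book):

```shell
show_names () {
    # Inside a function, FUNCNAME holds the function's name,
    # while $0 still holds the name of the running program.
    echo "FUNCNAME = $FUNCNAME"
    echo "\$0 = $0"
}

show_names
```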
The shell provides two special parameters that expand into the complete list of positional parameters, distinguished by their behavior when quoted:

Parameter   Description

$*          Expands into the list of positional parameters, starting with 1. When
            surrounded by double quotes, it expands into a double-quoted string
            containing all of the positional parameters, each separated by the first
            character of the IFS shell variable (by default a space character).

$@          Expands into the list of positional parameters, starting with 1. When
            surrounded by double quotes, it expands each positional parameter into
            a separate word surrounded by double quotes.
#!/bin/bash
# posit-params3 : script to demonstrate $* and $@
print_params () {
    echo "\$1 = $1"
    echo "\$2 = $2"
    echo "\$3 = $3"
    echo "\$4 = $4"
}

pass_params () {
    echo -e "\n" '$* :';    print_params    $*
    echo -e "\n" '"$*" :';  print_params   "$*"
    echo -e "\n" '$@ :';    print_params    $@
    echo -e "\n" '"$@" :';  print_params   "$@"
}

pass_params "word" "words with spaces"
In this rather convoluted program, we create two arguments, word and words with
spaces, and pass them to the pass_params function. That function, in turn, passes
them on to the print_params function, using each of the four methods available with
the special parameters $* and $@. When executed, the script reveals the differences:
[me@linuxbox ~]$ posit-param3
$* :
$1 = word
$2 = words
$3 = with
$4 = spaces
"$*" :
$1 = word words with spaces
$2 =
$3 =
$4 =
$@ :
$1 = word
$2 = words
$3 = with
$4 = spaces
"$@" :
$1 = word
$2 = words with spaces
$3 =
$4 =
With "$@", the original two arguments are preserved as two words (word and words
with spaces), which matches our actual intent. The lesson to take from this is that even though the shell
provides four different ways of getting the list of positional parameters, "$@" is by far
the most useful for most situations, because it preserves the integrity of each positional
parameter.
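To see why "$@" is the safe default when forwarding arguments, consider this small sketch (process_all is an illustrative name):

```shell
process_all () {
    # "$@" hands each original argument through as its own word,
    # even when an argument contains spaces.
    for arg in "$@"; do
        echo "got: $arg"
    done
}

process_all "one" "two words"
```

The two-word argument survives intact; with an unquoted $* or $@ it would be split into separate words.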
Output file. We will add an option to specify a name for a file to contain the program's output. It will be specified as either -f file or --file file.
Interactive mode. This option will prompt the user for an output filename and
will determine if the specified file already exists. If it does, the user will be
prompted before the existing file is overwritten. This option will be specified by
either -i or --interactive.
interactive=
filename=
while [[ -n $1 ]]; do
    case $1 in
        -f | --file)
            shift
            filename=$1
            ;;
        -i | --interactive)
            interactive=1
            ;;
        -h | --help)
            usage
            exit
            ;;
        *)
            usage >&2
            exit 1
            ;;
    esac
    shift
done
First, we add a shell function called usage to display a message when the help option is
invoked or an unknown option is attempted.
Next, we begin the processing loop. This loop continues while the positional parameter
$1 is not empty. At the bottom of the loop, we have a shift command to advance the
positional parameters to ensure that the loop will eventually terminate.
Within the loop, we have a case statement that examines the current positional parameter to see if it matches any of the supported choices. If a supported parameter is found, it
is acted upon. If not, the usage message is displayed and the script terminates with an error.
The -f parameter is handled in an interesting way. When detected, it causes an additional
shift to occur, which advances the positional parameter $1 to the filename argument
supplied to the -f option.
We next add the code to implement the interactive mode:
# interactive mode
if [[ -n $interactive ]]; then
    while true; do
        read -p "Enter name of output file: " filename
        if [[ -e $filename ]]; then
            read -p "'$filename' exists. Overwrite? [y/n/q] > "
            case $REPLY in
                Y|y)    break
                        ;;
                Q|q)    echo "Program terminated."
                        exit
                        ;;
                *)      continue
                        ;;
            esac
        elif [[ -z $filename ]]; then
            continue
        else
            break
        fi
    done
fi
If the interactive variable is not empty, an endless loop is started, which contains
the filename prompt and subsequent existing file-handling code. If the desired output file
already exists, the user is prompted to overwrite, choose another filename, or quit the
program. If the user chooses to overwrite an existing file, a break is executed to terminate the loop. Notice how the case statement only detects if the user chooses to overwrite or quit. Any other choice causes the loop to continue and prompts the user again.
In order to implement the output filename feature, we must first convert the existing
page-writing code into a shell function, for reasons that will become clear in a moment:
write_html_page () {
cat <<- _EOF_
<HTML>
<HEAD>
<TITLE>$TITLE</TITLE>
</HEAD>
<BODY>
<H1>$TITLE</H1>
<P>$TIMESTAMP</P>
$(report_uptime)
$(report_disk_space)
$(report_home_space)
</BODY>
</HTML>
_EOF_
return
}
# output html page
if [[ -n $filename ]]; then
    if touch $filename && [[ -f $filename ]]; then
        write_html_page > $filename
    else
        echo "$PROGNAME: Cannot write file '$filename'" >&2
        exit 1
    fi
else
    write_html_page
fi
The code that handles the logic of the -f option appears at the end of the listing shown
above. In it, we test for the existence of a filename and, if one is found, a test is performed to see if the file is indeed writable. To do this, a touch is performed, followed
by a test to determine if the resulting file is a regular file. These two tests take care of situations where an invalid pathname is input (touch will fail) and, if the file already exists, that it's a regular file.
As we can see, the write_html_page function is called to perform the actual generation of the page. Its output is either directed to standard output (if the variable filename is empty) or redirected to the specified file.
Summing Up
With the addition of positional parameters, we can now write fairly functional scripts.
For simple, repetitive tasks, positional parameters make it possible to write very useful
shell functions that can be placed in a user's .bashrc file.
Our sys_info_page program has grown in complexity and sophistication. Here is a
complete listing, with the most recent changes highlighted:
#!/bin/bash
# sys_info_page: program to output a system information page
PROGNAME=$(basename $0)
TITLE="System Information Report For $HOSTNAME"
CURRENT_TIME=$(date +"%x %r %Z")
TIMESTAMP="Generated $CURRENT_TIME, by $USER"
report_uptime () {
cat <<- _EOF_
<H2>System Uptime</H2>
<PRE>$(uptime)</PRE>
_EOF_
return
}
report_disk_space () {
cat <<- _EOF_
<H2>Disk Space Utilization</H2>
<PRE>$(df -h)</PRE>
_EOF_
return
}
report_home_space () {
if [[ $(id -u) -eq 0 ]]; then
cat <<- _EOF_
<H2>Home Space Utilization (All Users)</H2>
<PRE>$(du -sh /home/*)</PRE>
_EOF_
else
cat <<- _EOF_
<H2>Home Space Utilization ($USER)</H2>
<PRE>$(du -sh $HOME)</PRE>
_EOF_
fi
return
}
usage () {
    echo "$PROGNAME: usage: $PROGNAME [-f file | -i]"
    return
}
write_html_page () {
cat <<- _EOF_
<HTML>
<HEAD>
<TITLE>$TITLE</TITLE>
</HEAD>
<BODY>
<H1>$TITLE</H1>
<P>$TIMESTAMP</P>
$(report_uptime)
$(report_disk_space)
$(report_home_space)
</BODY>
</HTML>
_EOF_
return
}
# process command line options
interactive=
filename=
while [[ -n $1 ]]; do
    case $1 in
        -f | --file)
            shift
            filename=$1
            ;;
        -i | --interactive)
            interactive=1
            ;;
        -h | --help)
            usage
            exit
            ;;
        *)
            usage >&2
            exit 1
            ;;
    esac
    shift
done
# interactive mode
if [[ -n $interactive ]]; then
    while true; do
        read -p "Enter name of output file: " filename
        if [[ -e $filename ]]; then
            read -p "'$filename' exists. Overwrite? [y/n/q] > "
            case $REPLY in
                Y|y)    break
                        ;;
                Q|q)    echo "Program terminated."
                        exit
                        ;;
                *)      continue
                        ;;
            esac
        elif [[ -z $filename ]]; then
            continue
        else
            break
        fi
    done
fi
# output html page
if [[ -n $filename ]]; then
    if touch $filename && [[ -f $filename ]]; then
        write_html_page > $filename
    else
        echo "$PROGNAME: Cannot write file '$filename'" >&2
        exit 1
    fi
else
    write_html_page
fi
We're not done yet. There are still more things we can do and improvements we can
make.
Further Reading
The Bash Reference Manual has an article on the special parameters, including
$* and $@:
http://www.gnu.org/software/bash/manual/bashref.html#Special-Parameters
The traditional shell form of the for command is:

for variable [in words]; do
    commands
done

where variable is the name of a variable that will increment during the execution of the
loop, words is an optional list of items that will be sequentially assigned to variable, and
commands are the commands that are to be executed on each iteration of the loop.
The for command is useful on the command line. We can easily demonstrate how it
works:
[me@linuxbox ~]$ for i in A B C D; do echo $i; done
A
B
C
D
In this example, for is given a list of four words: A, B, C, and D. With a list of
four words, the loop is executed four times. Each time the loop is executed, a word is assigned to the variable i. Inside the loop, we have an echo command that displays the
value of i to show the assignment. As with the while and until loops, the done keyword closes the loop.
or pathname expansion:
[me@linuxbox ~]$ for i in distros*.txt; do echo $i; done
distros-by-date.txt
distros-dates.txt
distros-key-names.txt
distros-key-vernums.txt
distros-names.txt
distros.txt
distros-vernums.txt
distros-versions.txt
or command substitution:
#!/bin/bash
# longest-word : find longest string in a file
while [[ -n $1 ]]; do
    if [[ -r $1 ]]; then
        max_word=
        max_len=0
        for i in $(strings $1); do
            len=$(echo $i | wc -c)
            if (( len > max_len )); then
                max_len=$len
                max_word=$i
            fi
        done
        echo "$1: '$max_word' ($max_len characters)"
    fi
    shift
done
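The for-based revision described in the next paragraph (longest-word2 in the book's naming) can be sketched like this, replacing the outer while/shift loop with for and renaming the inner loop variable:

```shell
#!/bin/bash
# longest-word2 : find longest string in a file
# With no list of words, "for i" iterates over the positional
# parameters, so shift is no longer needed.
for i; do
    if [[ -r $i ]]; then
        max_word=
        max_len=0
        for j in $(strings $i); do
            len=$(echo $j | wc -c)
            if (( len > max_len )); then
                max_len=$len
                max_word=$j
            fi
        done
        echo "$i: '$max_word' ($max_len characters)"
    fi
done
```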
As we can see, we have changed the outermost loop to use for in place of while. By
omitting the list of words in the for command, the positional parameters are used instead. Inside the loop, previous instances of the variable i have been changed to the variable j. The use of shift has also been eliminated.
Why i?
You may have noticed that the variable i was chosen for each of the for loop
examples above. Why? No specific reason actually, besides tradition. The variable
used with for can be any valid variable, but i is the most common, followed by
j and k.
The basis of this tradition comes from the Fortran programming language. In Fortran, undeclared variables starting with the letters I, J, K, L, and M are automatically typed as integers, while variables beginning with any other letter are typed
as real (numbers with decimal fractions). This behavior led programmers to use
the variables I, J, and K for loop variables, since it was less work to use them
when a temporary variable (as loop variables often are) was needed.
It also led to the following Fortran-based witticism:
GOD is real, unless declared integer.
The C language form of the for command looks like this:

for (( expression1; expression2; expression3 )); do
    commands
done

where expression1, expression2, and expression3 are arithmetic expressions and commands are the commands to be performed during each iteration of the loop.
In terms of behavior, this form is equivalent to the following construct:
(( expression1 ))
while (( expression2 )); do
commands
(( expression3 ))
done
expression1 is used to initialize conditions for the loop, expression2 is used to determine
when the loop is finished, and expression3 is carried out at the end of each iteration of the
loop.
Here is a typical application:
#!/bin/bash
# simple_counter : demo of C style for command
for (( i=0; i<5; i=i+1 )); do
    echo $i
done
In this example, expression1 initializes the variable i with the value of zero, expression2
allows the loop to continue as long as the value of i remains less than 5, and expression3
increments the value of i by one each time the loop repeats.
The C language form of for is useful anytime a numeric sequence is needed. We will see
several applications for this in the next two chapters.
Summing Up
With our knowledge of the for command, we will now apply the final improvements to
our sys_info_page script. Currently, the report_home_space function looks
like this:
report_home_space () {
if [[ $(id -u) -eq 0 ]]; then
cat <<- _EOF_
<H2>Home Space Utilization (All Users)</H2>
<PRE>$(du -sh /home/*)</PRE>
_EOF_
else
cat <<- _EOF_
<H2>Home Space Utilization ($USER)</H2>
<PRE>$(du -sh $HOME)</PRE>
_EOF_
fi
return
}
Next, we will rewrite it to provide more detail for each users home directory, and include
the total number of files and subdirectories in each:
report_home_space () {
    local format="%8s%10s%10s\n"
    local i dir_list total_files total_dirs total_size user_name
    if [[ $(id -u) -eq 0 ]]; then
        dir_list=/home/*
        user_name="All Users"
    else
        dir_list=$HOME
        user_name=$USER
    fi
    echo "<H2>Home Space Utilization ($user_name)</H2>"
    for i in $dir_list; do
        total_files=$(find $i -type f | wc -l)
        total_dirs=$(find $i -type d | wc -l)
        total_size=$(du -sh $i | cut -f 1)
        echo "<H3>$i</H3>"
        echo "<PRE>"
        printf "$format" "Dirs" "Files" "Size"
        printf "$format" "----" "-----" "----"
        printf "$format" $total_dirs $total_files $total_size
        echo "</PRE>"
    done
    return
}
This rewrite applies much of what we have learned so far. We still test for the superuser,
but instead of performing the complete set of actions as part of the if, we set some variables used later in a for loop. We have added several local variables to the function and
made use of printf to format some of the output.
Further Reading
The Advanced Bash-Scripting Guide has a chapter on loops, with a variety of examples using for:
http://tldp.org/LDP/abs/html/loops1.html
The Bash Reference Manual describes the looping compound commands, including for:
http://www.gnu.org/software/bash/manual/bashref.html#Looping-Constructs
Parameter Expansion
Though parameter expansion came up in Chapter 7, we did not cover it in detail because
most parameter expansions are used in scripts rather than on the command line. We have
already worked with some forms of parameter expansion; for example, shell variables.
The shell provides many more.
Basic Parameters
The simplest form of parameter expansion is reflected in the ordinary use of variables.
For example:
$a
when expanded, becomes whatever the variable a contains. Simple parameters may also
be surrounded by braces:
${a}
This has no effect on the expansion, but is required if the variable is adjacent to other
text, which may confuse the shell. In this example, we attempt to create a filename by appending the string _file to the contents of the variable a.
[me@linuxbox ~]$ a="foo"
[me@linuxbox ~]$ echo "$a_file"
If we perform this sequence, the result will be nothing, because the shell will try to expand a variable named a_file rather than a. This problem can be solved by adding
braces:
[me@linuxbox ~]$ echo "${a}_file"
foo_file
We have also seen that positional parameters greater than 9 can be accessed by surrounding the number in braces. For example, to access the eleventh positional parameter, we
can do this:
${11}
${parameter:-word}
If parameter is unset or empty, this expansion results in the value of word. If parameter
is not empty, the expansion results in the value of parameter.

[me@linuxbox ~]$ foo=
[me@linuxbox ~]$ echo ${foo:-"substitute value if unset"}
substitute value if unset
[me@linuxbox ~]$ echo $foo
${parameter:=word}
If parameter is unset or empty, this expansion results in the value of word. In addition,
the value of word is assigned to parameter. If parameter is not empty, the expansion results in the value of parameter.
[me@linuxbox ~]$ foo=
[me@linuxbox ~]$ echo ${foo:="default value if unset"}
default value if unset
[me@linuxbox ~]$ echo $foo
default value if unset
[me@linuxbox ~]$ foo=bar
[me@linuxbox ~]$ echo ${foo:="default value if unset"}
bar
[me@linuxbox ~]$ echo $foo
bar
Note: Positional and other special parameters cannot be assigned this way.
${parameter:?word}
If parameter is unset or empty, this expansion causes the script to exit with an error, and
the contents of word are sent to standard error. If parameter is not empty, the expansion
results in the value of parameter.
[me@linuxbox ~]$ foo=
[me@linuxbox ~]$ echo ${foo:?"parameter is empty"}
bash: foo: parameter is empty
[me@linuxbox ~]$ echo $?
1
[me@linuxbox ~]$ foo=bar
[me@linuxbox ~]$ echo ${foo:?"parameter is empty"}
bar
[me@linuxbox ~]$ echo $?
0
${parameter:+word}
If parameter is unset or empty, the expansion results in nothing. If parameter is not
empty, the value of word is substituted for parameter; however, the value of parameter is
not changed.
[me@linuxbox ~]$ foo=
[me@linuxbox ~]$ echo ${foo:+"substitute value if set"}
[me@linuxbox ~]$ foo=bar
[me@linuxbox ~]$ echo ${foo:+"substitute value if set"}
substitute value if set
String Operations
There is a large set of expansions that can be used to operate on strings. Many of these
expansions are particularly well suited for operations on pathnames.
${#parameter}
expands into the length of the string contained by parameter. Normally, parameter is a
string; however, if parameter is either @ or *, then the expansion results in the number of
positional parameters.
[me@linuxbox ~]$ foo="This string is long."
[me@linuxbox ~]$ echo "'$foo' is ${#foo} characters long."
'This string is long.' is 20 characters long.
${parameter:offset}
${parameter:offset:length}
These expansions are used to extract a portion of the string contained in parameter. The
extraction begins at offset characters from the beginning of the string and continues until
the end of the string, unless the length is specified.
[me@linuxbox ~]$ foo="This string is long."
[me@linuxbox ~]$ echo ${foo:5}
string is long.
If the value of offset is negative, it is taken to mean it starts from the end of the string
rather than the beginning. Note that negative values must be preceded by a space to prevent confusion with the ${parameter:-word} expansion. length, if present, must not
be less than zero.
If parameter is @, the result of the expansion is length positional parameters, starting at
offset.
[me@linuxbox ~]$ foo="This string is long."
[me@linuxbox ~]$ echo ${foo: -5}
long.
[me@linuxbox ~]$ echo ${foo: -5:2}
lo
${parameter#pattern}
${parameter##pattern}
These expansions remove a leading portion of the string contained in parameter defined
by pattern. pattern is a wildcard pattern like those used in pathname expansion. The difference in the two forms is that the # form removes the shortest match, while the ## form
removes the longest match.
[me@linuxbox ~]$ foo=file.txt.zip
[me@linuxbox ~]$ echo ${foo#*.}
txt.zip
[me@linuxbox ~]$ echo ${foo##*.}
zip
${parameter%pattern}
${parameter%%pattern}
These expansions are the same as the # and ## expansions above, except they remove
text from the end of the string contained in parameter rather than from the beginning.
[me@linuxbox ~]$ foo=file.txt.zip
[me@linuxbox ~]$ echo ${foo%.*}
file.txt
[me@linuxbox ~]$ echo ${foo%%.*}
file
${parameter/pattern/string}
${parameter//pattern/string}
${parameter/#pattern/string}
${parameter/%pattern/string}
This expansion performs a search-and-replace upon the contents of parameter. If text is
found matching wildcard pattern, it is replaced with the contents of string. In the normal
form, only the first occurrence of pattern is replaced. In the // form, all occurrences are
replaced. The /# form requires that the match occur at the beginning of the string, and
the /% form requires the match to occur at the end of the string. /string may be omitted,
which causes the text matched by pattern to be deleted.
[me@linuxbox ~]$ foo=JPG.JPG
[me@linuxbox ~]$ echo ${foo/JPG/jpg}
jpg.JPG
[me@linuxbox ~]$ echo ${foo//JPG/jpg}
jpg.jpg
[me@linuxbox ~]$ echo ${foo/#JPG/jpg}
jpg.JPG
[me@linuxbox ~]$ echo ${foo/%JPG/jpg}
JPG.jpg
Parameter expansion is a good thing to know. The string manipulation expansions can be
used as substitutes for other common commands such as sed and cut. Expansions improve the efficiency of scripts by eliminating the use of external programs. As an example, we will modify the longest-word program discussed in the previous chapter to
use the parameter expansion ${#j} in place of the command substitution $(echo $j
| wc -c) and its resulting subshell, like so:
#!/bin/bash
# longest-word3 : find longest string in a file
for i; do
    if [[ -r $i ]]; then
        max_word=
        max_len=0
        for j in $(strings $i); do
            len=${#j}
            if (( len > max_len )); then
                max_len=$len
                max_word=$j
            fi
        done
        echo "$i: '$max_word' ($max_len characters)"
    fi
done
Next, we will compare the efficiency of the two versions by using the time command:
[me@linuxbox ~]$ time longest-word2 dirlist-usr-bin.txt
dirlist-usr-bin.txt: 'scrollkeeper-get-extended-content-list' (38
characters)
real 0m3.618s
user 0m1.544s
sys 0m1.768s
[me@linuxbox ~]$ time longest-word3 dirlist-usr-bin.txt
dirlist-usr-bin.txt: 'scrollkeeper-get-extended-content-list' (38
characters)
real 0m0.060s
user 0m0.056s
sys 0m0.008s
The original version of the script takes 3.618 seconds to scan the text file, while the new
version, using parameter expansion, takes only 0.06 seconds, a very significant improvement.
Case Conversion
Recent versions of bash have support for upper/lowercase conversion of strings. bash
has four parameter expansions and two options to the declare command to support it.
So what is case conversion good for? Aside from the obvious aesthetic value, it has an
important role in programming. Let's consider the case of a database look-up. Imagine
that a user has entered a string into a data input field that we want to look up in a database. It's possible the user will enter the value in all uppercase letters or lowercase letters
or a combination of both. We certainly don't want to populate our database with every
possible permutation of upper and lower case spellings. What to do?
A common approach to this problem is to normalize the user's input. That is, convert it
into a standardized form before we attempt the database look-up. We can do this by converting all of the characters in the user's input to either lower or uppercase and ensure that
the database entries are normalized the same way.
The declare command can be used to normalize strings to either upper or lowercase.
Using declare, we can force a variable to always contain the desired format no matter
what is assigned to it:
#!/bin/bash
# ul-declare: demonstrate case conversion via declare
declare -u upper
declare -l lower
if [[ $1 ]]; then
    upper="$1"
    lower="$1"
    echo $upper
    echo $lower
fi
In the above script, we use declare to create two variables, upper and lower. We
assign the value of the first command line argument (positional parameter 1) to each of
the variables and then display them on the screen:
[me@linuxbox ~]$ ul-declare aBc
ABC
abc
As we can see, the command line argument ("aBc") has been normalized.
There are four parameter expansions that perform upper/lowercase conversion:
Table 34-1: Case Conversion Parameter Expansions

Format            Result
${parameter,,}    Expand the value of parameter into all lowercase.
${parameter,}     Expand the value of parameter, changing only the first
                  character to lowercase.
${parameter^^}    Expand the value of parameter into all uppercase letters.
${parameter^}     Expand the value of parameter, changing only the first
                  character to uppercase.

Here is a script that demonstrates these expansions:

#!/bin/bash
# ul-param: demonstrate case conversion via parameter expansion
if [[ $1 ]]; then
    echo ${1,,}
    echo ${1,}
    echo ${1^^}
    echo ${1^}
fi
Again, we process the first command line argument and output the four variations supported by the parameter expansions. While this script uses the first positional parameter,
parameter may be any string, variable, or string expression.
Number Bases
Back in Chapter 9, we got a look at octal (base 8) and hexadecimal (base 16) numbers. In
arithmetic expressions, the shell supports integer constants in any base.
Table 34-2: Specifying Different Number Bases

Notation       Description
number         By default, numbers without any notation are treated as
               decimal (base 10) integers.
0number        In arithmetic expressions, numbers with a leading zero are
               considered octal.
0xnumber       Hexadecimal notation
base#number    number is in base
Some examples:
[me@linuxbox ~]$ echo $((0xff))
255
[me@linuxbox ~]$ echo $((2#11111111))
255
In the examples above, we print the value of the hexadecimal number ff (the largest
two-digit number) and the largest eight-digit binary (base 2) number.
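Rounding out the table, the octal and general base#number notations behave the same way:

```shell
echo $((010))      # octal 10 is decimal 8
echo $((8#77))     # base 8: 7*8 + 7 = 63
echo $((16#ff))    # base 16, same value as 0xff
echo $((2#1010))   # base 2: decimal 10
```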
Unary Operators
There are two unary operators, the + and -, which are used to indicate if a number is positive or negative, respectively. For example, -5.
Simple Arithmetic
The ordinary arithmetic operators are listed in the table below:
Table 34-3: Arithmetic Operators
Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Integer division
**          Exponentiation
%           Modulo (remainder)
Most of these are self-explanatory, but integer division and modulo require further discussion.
Since the shell's arithmetic only operates on integers, the results of division are always
whole numbers:
[me@linuxbox ~]$ echo $(( 5 / 2 ))
2
By using the division and modulo operators, we can determine that 5 divided by 2 results
in 2, with a remainder of 1.
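The remainder is easy to check at the prompt, and the two operators together reconstruct the original value:

```shell
echo $(( 5 % 2 ))                  # the remainder of 5 / 2
echo $(( (5 / 2) * 2 + 5 % 2 ))    # quotient times divisor plus remainder
```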
Calculating the remainder is useful in loops. It allows an operation to be performed at
specified intervals during the loop's execution. In the example below, we display a line of
numbers, highlighting each multiple of 5:
#!/bin/bash
# modulo : demonstrate the modulo operator
for ((i = 0; i <= 20; i = i + 1)); do
    remainder=$((i % 5))
    if (( remainder == 0 )); then
        printf "<%d> " $i
    else
        printf "%d " $i
    fi
done
printf "\n"
Assignment
Although its uses may not be immediately apparent, arithmetic expressions may perform
assignment. We have performed assignment many times, though in a different context.
Each time we give a variable a value, we are performing assignment. We can also do it
within arithmetic expressions:
[me@linuxbox ~]$ foo=
[me@linuxbox ~]$ echo $foo
[me@linuxbox ~]$ if (( foo = 5 ));then echo "It is true."; fi
It is true.
[me@linuxbox ~]$ echo $foo
5
In the example above, we first assign an empty value to the variable foo and verify that
it is indeed empty. Next, we perform an if with the compound command (( foo = 5
)). This process does two interesting things: 1) it assigns the value of 5 to the variable
foo, and 2) it evaluates to true because foo was assigned a nonzero value.
Note: It is important to remember the exact meaning of the = in the expression
above. A single = performs assignment: foo = 5 says, "make foo equal to 5,"
while == evaluates equivalence: foo == 5 says, "does foo equal 5?" This can
be very confusing because the test command accepts a single = for string equivalence. This is yet another reason to use the more modern [[ ]] and (( )) compound commands in place of test.
In addition to the =, the shell also provides notations that perform some very useful assignments:
Table 34-4: Assignment Operators
Notation              Description
parameter = value     Simple assignment. Assigns value to parameter.
parameter += value    Addition. Equivalent to parameter = parameter + value.
parameter -= value    Subtraction. Equivalent to parameter = parameter - value.
parameter *= value    Multiplication. Equivalent to parameter = parameter * value.
parameter /= value    Integer division. Equivalent to parameter = parameter / value.
parameter %= value    Modulo. Equivalent to parameter = parameter % value.
parameter++           Variable post-increment. Equivalent to parameter =
                      parameter + 1 (evaluates to the value of parameter
                      before the increment).
parameter--           Variable post-decrement. Equivalent to parameter =
                      parameter - 1 (evaluates to the value of parameter
                      before the decrement).
++parameter           Variable pre-increment. Equivalent to parameter =
                      parameter + 1 (evaluates to the value of parameter
                      after the increment).
--parameter           Variable pre-decrement. Equivalent to parameter =
                      parameter - 1 (evaluates to the value of parameter
                      after the decrement).
These assignment operators provide a convenient shorthand for many common arithmetic
tasks. Of special interest are the increment (++) and decrement (--) operators, which increase or decrease the value of their parameters by one. This style of notation is taken
from the C programming language and has been incorporated by several other programming languages, including bash.
The operators may appear either at the front of a parameter or at the end. While they both
either increment or decrement the parameter by one, the two placements have a subtle
difference. If placed at the front of the parameter, the parameter is incremented (or decremented) before the parameter is returned. If placed after, the operation is performed after
the parameter is returned. This is rather strange, but it is the intended behavior. Here is a
demonstration:
[me@linuxbox ~]$ foo=1
[me@linuxbox ~]$ echo $((foo++))
1
[me@linuxbox ~]$ echo $foo
2
If we assign the value of one to the variable foo and then increment it with the ++ operator placed after the parameter name, foo is returned with the value of one. However, if
we look at the value of the variable a second time, we see the incremented value. If we
place the ++ operator in front of the parameter, we get the more expected behavior:
[me@linuxbox ~]$ foo=1
[me@linuxbox ~]$ echo $((++foo))
2
[me@linuxbox ~]$ echo $foo
2
For most shell applications, prefixing the operator will be the most useful.
The ++ and -- operators are often used in conjunction with loops. We will make some improvements to our modulo script to tighten it up a bit:
#!/bin/bash
# modulo2 : demonstrate the modulo operator
for ((i = 0; i <= 20; ++i )); do
    if (((i % 5) == 0 )); then
        printf "<%d> " $i
    else
        printf "%d " $i
    fi
done
printf "\n"
Bit Operations
One class of operators manipulates numbers in an unusual way. These operators work at
the bit level. They are used for certain kinds of low level tasks, often involving setting or
reading bit-flags.
Table 34-5: Bit Operators
Operator    Description
~           Bitwise negation. Negate all the bits in a number.
<<          Left bitwise shift. Shift all the bits in a number to the left.
>>          Right bitwise shift. Shift all the bits in a number to the right.
&           Bitwise AND. Perform an AND operation on all the bits in two numbers.
|           Bitwise OR. Perform an OR operation on all the bits in two numbers.
^           Bitwise XOR. Perform an exclusive OR operation on all the bits in
            two numbers.
Note that there are also corresponding assignment operators (for example, <<=) for all
but bitwise negation.
Here we will demonstrate producing a list of powers of 2, using the left bitwise shift operator:
[me@linuxbox ~]$ for ((i=0;i<8;++i)); do echo $((1<<i)); done
1
2
4
8
16
32
64
128
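The AND, OR, and XOR operators can be checked the same way with small bit masks:

```shell
echo $(( 0xf0 | 0x0f ))   # OR combines the two nibbles into 0xff
echo $(( 0xff & 0x0f ))   # AND masks off the high nibble
echo $(( 0xff ^ 0x0f ))   # XOR flips the low nibble, leaving 0xf0
echo $(( ~0 & 0xff ))     # negation of 0, masked to one byte
```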
Logic
As we discovered in Chapter 27, the (( )) compound command supports a variety of
comparison operators. There are a few more that can be used to evaluate logic. Here is
the complete list:
Table 34-6: Comparison Operators
Operator             Description
<=                   Less than or equal to
>=                   Greater than or equal to
<                    Less than
>                    Greater than
==                   Equal to
!=                   Not equal to
&&                   Logical AND
||                   Logical OR
expr1?expr2:expr3    Comparison (ternary) operator. If expression expr1
                     evaluates to be non-zero (arithmetic true), then expr2;
                     otherwise expr3.
When used for logical operations, expressions follow the rules of arithmetic logic; that is,
expressions that evaluate as zero are considered false, while non-zero expressions are
considered true. The (( )) compound command maps the results into the shell's normal
exit codes:
[me@linuxbox ~]$ if ((1)); then echo "true"; else echo "false"; fi
true
[me@linuxbox ~]$ if ((0)); then echo "true"; else echo "false"; fi
false
The strangest of the logical operators is the ternary operator. This operator (which is
modeled after the one in the C programming language) performs a standalone logical test.
It can be used as a kind of if/then/else statement. It acts on three arithmetic expressions
(strings won't work), and if the first expression is true (or non-zero), the second expression is performed. Otherwise, the third expression is performed. We can try this on the
command line:
[me@linuxbox ~]$ a=0
[me@linuxbox ~]$ ((a<1?++a:--a))
[me@linuxbox ~]$ echo $a
1
[me@linuxbox ~]$ ((a<1?++a:--a))
[me@linuxbox ~]$ echo $a
0
Here we see a ternary operator in action. This example implements a toggle. Each time
the operator is performed, the value of the variable a switches from zero to one or vice
versa.
Please note that performing assignment within the expressions is not straightforward.
When attempted, bash will declare an error. This problem can be mitigated by surrounding the assignment expression with parentheses:
[me@linuxbox ~]$ ((a<1?(a+=1):(a-=1)))
Next, we see a more complete example of using arithmetic operators in a script that produces a simple table of numbers:
#!/bin/bash
# arith-loop: script to demonstrate arithmetic operators
finished=0
a=0
printf "a\ta**2\ta**3\n"
printf "=\t====\t====\n"
until ((finished)); do
b=$((a**2))
c=$((a**3))
printf "%d\t%d\t%d\n" $a $b $c
((a<10?++a:(finished=1)))
done
In this script, we implement an until loop based on the value of the finished variable.
Initially, the variable is set to zero (arithmetic false) and we continue the loop until it becomes non-zero. Within the loop, we calculate the square and cube of the counter variable
a. At the end of the loop, the value of the counter variable is evaluated. If it is less than
10 (the maximum number of iterations), it is incremented by one, else the variable finished is given the value of one, making finished arithmetically true, thereby terminating the loop. Running the script gives this result:
[me@linuxbox ~]$ arith-loop
a	a**2	a**3
=	====	====
0	0	0
1	1	1
2	4	8
3	9	27
4	16	64
5	25	125
6	36	216
7	49	343
8	64	512
9	81	729
10	100	1000
bc – An Arbitrary Precision Calculator Language
While the shell can handle integer arithmetic, more advanced calculations call for a dedicated calculator program. Most Linux systems include bc, which reads programs written in its own C-like language. A bc script may be as simple as this:
/* A very simple bc script */
2 + 2
The first line of the script is a comment. bc uses the same syntax for comments as the C
programming language. Comments, which may span multiple lines, begin with /* and
end with */.
Using bc
If we save the bc script above as foo.bc, we can run it this way:
[me@linuxbox ~]$ bc foo.bc
bc 1.06.94
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software
Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
4
If we look carefully, we can see the result at the very bottom, after the copyright message.
This message can be suppressed with the -q (quiet) option.
bc can also be used interactively:
[me@linuxbox ~]$ bc -q
2 + 2
4
quit
When using bc interactively, we simply type the calculations we wish to perform, and
the results are immediately displayed. The bc command quit ends the interactive session.
It is also possible to pass a script to bc via standard input:
[me@linuxbox ~]$ bc < foo.bc
4
The ability to take standard input means that we can use here documents, here strings,
and pipes to pass scripts. This is a here string example:
[me@linuxbox ~]$ bc <<< "2+2"
4
An Example Script
As a real-world example, we will construct a script that performs a common calculation,
monthly loan payments. In the script below, we use a here document to pass a script to
bc:
#!/bin/bash
# loan-calc : script to calculate monthly loan payments
PROGNAME=$(basename $0)
usage () {
	cat <<- EOF
	Usage: $PROGNAME PRINCIPAL INTEREST MONTHS
	Where:
	PRINCIPAL is the amount of the loan.
	INTEREST is the APR as a number (7% = 0.07).
	MONTHS is the length of the loan's term.
	EOF
}

if (($# != 3)); then
	usage
	exit 1
fi
principal=$1
interest=$2
months=$3
bc <<- EOF
scale = 10
i = $interest / 12
p = $principal
n = $months
a = p * ((i * ((1 + i) ^ n)) / (((1 + i) ^ n) - 1))
print a, "\n"
EOF
[me@linuxbox ~]$ loan-calc 135000 0.0775 180
1270.7222490000
This example calculates the monthly payment for a $135,000 loan at 7.75% APR for 180
months (15 years). Notice the precision of the answer. This is determined by the value
given to the special scale variable in the bc script. A full description of the bc scripting language is provided by the bc man page. While its mathematical notation is slightly
different from that of the shell (bc more closely resembles C), most of it will be quite familiar, based on what we have learned so far.
Summing Up
In this chapter, we have learned about many of the little things that can be used to get the
real work done in scripts. As our experience with scripting grows, the ability to effectively manipulate strings and numbers will prove extremely valuable. Our loan-calc
script demonstrates that even simple scripts can be created to do some really useful
things.
Extra Credit
While the basic functionality of the loan-calc script is in place, the script is far from
complete. For extra credit, try improving the loan-calc script with the following features:
A command line option to implement an interactive mode that will prompt the
user to input the principal, interest rate, and term of the loan.
Further Reading
The Bash Hackers Wiki has a good discussion of parameter expansion:
http://wiki.bash-hackers.org/syntax/pe
as well as a description of the formula for calculating loan payments used in our
loan-calc script:
http://en.wikipedia.org/wiki/Amortization_calculator
35 Arrays
In the last chapter, we looked at how the shell can manipulate strings and numbers. The
data types we have looked at so far are known in computer science circles as scalar variables; that is, variables that contain a single value.
In this chapter, we will look at another kind of data structure called an array, which holds
multiple values. Arrays are a feature of virtually every programming language. The shell
supports them, too, though in a rather limited fashion. Even so, they can be very useful
for solving programming problems.
Creating An Array
Array variables are named just like other bash variables, and are created automatically
when they are accessed. Here is an example:
[me@linuxbox ~]$ a[1]=foo
[me@linuxbox ~]$ echo ${a[1]}
foo
Here we see an example of both the assignment and access of an array element. With the
first command, element 1 of array a is assigned the value foo. The second command
displays the stored value of element 1. The use of braces in the second command is required to prevent the shell from attempting pathname expansion on the name of the array
element.
An array can also be created with the declare command:
[me@linuxbox ~]$ declare -a a
Single values may be assigned using the following syntax:
name[subscript]=value
where name is the name of the array and subscript is an integer (or arithmetic expression)
greater than or equal to zero. Note that the first element of an array is subscript zero, not
one. value is a string or integer assigned to the array element.
Multiple values may be assigned using the following syntax:
name=(value1 value2 ...)
where name is the name of the array and value... are values assigned sequentially to elements of the array, starting with element zero. For example, if we wanted to assign abbreviated days of the week to the array days, we could do this:
[me@linuxbox ~]$ days=(Sun Mon Tue Wed Thu Fri Sat)
It is also possible to assign values to a specific element by specifying a subscript for each
value:
[me@linuxbox ~]$ days=([0]=Sun [1]=Mon [2]=Tue [3]=Wed [4]=Thu [5]=Fri [6]=Sat)
[me@linuxbox ~]$ hours .
Hour	Files	Hour	Files
----	-----	----	-----
.
.
.
Total files = 80
We execute the hours program, specifying the current directory as the target. It produces a table showing, for each hour of the day (0-23), how many files were last modified during that hour. The code to produce this is as follows:
#!/bin/bash
# hours : script to count files by modification time
usage () {
    echo "usage: $(basename $0) directory" >&2
}
The script consists of one function (usage) and a main body with four sections. In the
first section, we check that there is a command line argument and that it is a directory. If
it is not, we display the usage message and exit.
The second section initializes the array hours. It does this by assigning each element a
value of zero. There is no special requirement to prepare arrays prior to use, but our script
needs to ensure that no element is empty. Note the interesting way the loop is constructed. By employing brace expansion ({0..23}), we are able to easily generate a sequence of words for the for command.
The next section gathers the data by running the stat program on each file in the directory. We use cut to extract the two-digit hour from the result. Inside the loop, we need to
remove leading zeros from the hour field, since the shell will try (and ultimately fail) to
interpret values 00 through 09 as octal numbers (see Table 34-1). Next, we increment
the value of the array element corresponding with the hour of the day. Finally, we increment a counter (count) to track the total number of files in the directory.
The last section of the script displays the contents of the array. We first output a couple of
header lines and then enter a loop that produces two columns of output. Lastly, we output
the final tally of files.
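The technique described above can be sketched as a small function. This is our own reconstruction of the approach, not the book's verbatim script; it assumes GNU stat and its -c %y output format:

```shell
#!/bin/bash
# Sketch: count files in a directory by the hour of their
# last-modification time (a reconstruction, not the book's script).
count_by_hour () {
    local -a hours
    local count=0 i f hour
    # Initialize all 24 elements so none are empty
    for i in {0..23}; do hours[i]=0; done
    for f in "$1"/*; do
        [[ -f $f ]] || continue
        # Characters 12-13 of "stat -c %y" output are the two-digit hour
        hour=$(stat -c %y "$f" | cut -c 12-13)
        # Strip a leading zero so 08 and 09 are not read as octal
        ((++hours[${hour#0}]))
        ((++count))
    done
    for i in {0..23}; do
        printf "%02d: %d\n" "$i" "${hours[i]}"
    done
    echo "Total files = $count"
}

# Demo on a throwaway directory containing one file modified at 08:15
demo=$(mktemp -d)
touch -t 202301010815 "$demo/a"
count_by_hour "$demo"
```

The leading-zero strip and the zero-initialization loop are the two details the text calls out; everything else is ordinary array indexing.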
Array Operations
There are many common array operations. Such things as deleting arrays, determining
their size, sorting, etc. have many applications in scripting.
We create the array animals and assign it three two-word strings. We then execute four
loops to see the effect of word-splitting on the array contents. The behavior of the notations ${animals[*]} and ${animals[@]} is identical until they are quoted. The * notation results in a single word containing the array's contents, while the @ notation results
in three words, which matches the array's real contents.
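The experiment described above can be sketched as follows, using three two-word strings (the particular strings are our assumption):

```shell
#!/bin/bash
# Sketch of the word-splitting experiment described in the text.
animals=("a dog" "a cat" "a fish")
# Unquoted, * and @ behave identically: splitting yields six words
for i in ${animals[*]}; do echo "$i"; done
# Quoted, * joins the entire array into a single word...
for i in "${animals[*]}"; do echo "$i"; done
# ...while quoted @ yields the three original two-word elements
for i in "${animals[@]}"; do echo "$i"; done
```

Only the quoted @ form preserves the array's real contents, which is why "${array[@]}" is the notation used in the scripts that follow.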
[me@linuxbox ~]$ a[100]=foo
[me@linuxbox ~]$ echo ${#a[@]} # number of array elements
1
[me@linuxbox ~]$ echo ${#a[100]} # length of element 100
3
We create array a and assign the string foo to element 100. Next, we use parameter expansion to examine the length of the array, using the @ notation. Finally, we look at the
length of element 100 which contains the string foo. It is interesting to note that while
we assigned our string to element 100, bash only reports one element in the array. This
differs from the behavior of some other languages in which the unused elements of the array (elements 0-99) would be initialized with empty values and counted.
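With a sparse array like this one, it is often useful to know which subscripts are actually in use. The ${!array[@]} expansion returns the list of subscripts (a short sketch; the sample values are our own):

```shell
#!/bin/bash
# Sketch: finding the subscripts in use with ${!array[@]}.
unset a
a[100]=foo
a[5]=bar
echo "${!a[@]}"   # → 5 100 (subscripts in use, in ascending order)
echo "${#a[@]}"   # → 2 (number of elements, not the highest subscript)
```

This is the same expansion the array-2 script later uses to walk the keys of its tally arrays.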
Elements can be appended to the end of an existing array with the += assignment operator:
[me@linuxbox ~]$ foo=(a b c)
[me@linuxbox ~]$ echo ${foo[@]}
a b c
[me@linuxbox ~]$ foo+=(d e f)
[me@linuxbox ~]$ echo ${foo[@]}
a b c d e f
Sorting An Array
Just as with spreadsheets, it is often necessary to sort the values in a column of data. The
shell has no direct way of doing this, but it's not hard to do with a little coding:
#!/bin/bash
# array-sort : Sort an array
a=(f e d c b a)
echo "Original array: ${a[@]}"
a_sorted=($(for i in "${a[@]}"; do echo $i; done | sort))
echo "Sorted array:   ${a_sorted[@]}"
The script operates by copying the contents of the original array (a) into a second array
(a_sorted) with a tricky piece of command substitution. This basic technique can be
used to perform many kinds of operations on the array by changing the design of the
pipeline.
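For instance, swapping sort for sort -rn in the pipeline gives a descending numeric sort instead. This is a sketch of that variation, not from the book:

```shell
#!/bin/bash
# Sketch: numeric descending sort by changing the pipeline to sort -rn.
a=(3 10 2 7)
a_sorted=($(for i in "${a[@]}"; do echo "$i"; done | sort -rn))
echo "${a_sorted[@]}"   # → 10 7 3 2
```

Any filter that reads lines on standard input (sort, uniq, tac, and so on) can be dropped into the pipeline the same way.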
Deleting An Array
To delete an array, use the unset command:
[me@linuxbox ~]$ foo=(a b c d e f)
[me@linuxbox ~]$ echo ${foo[@]}
a b c d e f
[me@linuxbox ~]$ unset foo
[me@linuxbox ~]$ echo ${foo[@]}
[me@linuxbox ~]$
unset may also be used to delete a single array element:
[me@linuxbox ~]$ foo=(a b c d e f)
[me@linuxbox ~]$ echo ${foo[@]}
a b c d e f
[me@linuxbox ~]$ unset 'foo[2]'
[me@linuxbox ~]$ echo ${foo[@]}
a b d e f
In this example, we delete the third element of the array, subscript 2. Remember, arrays
start with subscript zero, not one! Notice also that the array element must be quoted to
prevent the shell from performing pathname expansion.
Interestingly, the assignment of an empty value to an array does not empty its contents:
[me@linuxbox ~]$ foo=(a b c d e f)
[me@linuxbox ~]$ foo=
[me@linuxbox ~]$ echo ${foo[@]}
b c d e f
Any reference to an array variable without a subscript refers to element zero of the array:
[me@linuxbox ~]$ foo=(a b c d e f)
[me@linuxbox ~]$ echo ${foo[@]}
a b c d e f
[me@linuxbox ~]$ foo=A
[me@linuxbox ~]$ echo ${foo[@]}
A b c d e f
Associative Arrays
Recent versions of bash now support associative arrays. Associative arrays use strings
rather than integers as array indexes. This capability allows interesting new approaches to
managing data. For example, we can create an array called colors and use color names
as indexes:
declare -A colors
colors["red"]="#ff0000"
colors["green"]="#00ff00"
colors["blue"]="#0000ff"
Unlike integer indexed arrays, which are created by merely referencing them, associative
arrays must be created with the declare command using the new -A option. Associative array elements are accessed in much the same way as integer indexed arrays:
echo ${colors["blue"]}
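Putting declaration, assignment, access, and key iteration together gives a fuller sketch (the loop over ${!colors[@]} is our addition to the example above):

```shell
#!/bin/bash
# Sketch: an associative array with string indexes, plus iteration
# over its keys using the ${!array[@]} expansion.
declare -A colors
colors["red"]="#ff0000"
colors["green"]="#00ff00"
colors["blue"]="#0000ff"
echo "${colors[blue]}"   # → #0000ff
# Key order is unspecified for associative arrays
for name in "${!colors[@]}"; do
    echo "$name -> ${colors[$name]}"
done
```

Note that, unlike integer-indexed arrays, the order in which the keys come back is not defined, so scripts that need sorted output pipe the loop through sort, as the array-2 script in the next chapter does.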
In the next chapter, we will look at a script that makes good use of associative arrays to
produce an interesting report.
Summing Up
If we search the bash man page for the word array, we find many instances where
bash makes use of array variables. Most of these are rather obscure, but they may provide occasional utility in some special circumstances. In fact, the entire topic of arrays is
rather under-utilized in shell programming owing largely to the fact that the traditional
Unix shell programs (such as sh) lacked any support for arrays. This lack of popularity is
unfortunate because arrays are widely used in other programming languages and provide
a powerful tool for solving many kinds of programming problems.
Arrays and loops have a natural affinity and are often used together. The
for ((expr; expr; expr))
form of loop is particularly well-suited to calculating array subscripts.
Further Reading
A couple of Wikipedia articles about the data structures found in this chapter:
http://en.wikipedia.org/wiki/Scalar_(computing)
http://en.wikipedia.org/wiki/Associative_array
36 Exotica
In this, the final chapter of our journey, we will look at some odds and ends. While we
have certainly covered a lot of ground in the previous chapters, there are many bash features that we have not covered. Most are fairly obscure, and useful mainly to those integrating bash into a Linux distribution. However, there are a few that, while not in common use, are helpful for certain programming problems. We will cover them here.
Group Commands And Subshells
bash allows commands to be grouped together in two ways: with a group command, which surrounds a list of commands with braces, and with a subshell, which surrounds them with parentheses. For example, suppose we want to redirect the output of three commands to a file named output.txt:
ls -l > output.txt
echo "Listing of foo.txt" >> output.txt
cat foo.txt >> output.txt
This is pretty straightforward: three commands with their output redirected to a file
named output.txt. Using a group command, we could code this as follows:
{ ls -l; echo "Listing of foo.txt"; cat foo.txt; } > output.txt
Using this technique we have saved ourselves some typing, but where a group command
or subshell really shines is with pipelines. When constructing a pipeline of commands, it
is often useful to combine the results of several commands into a single stream. Group
commands and subshells make this easy:
{ ls -l; echo "Listing of foo.txt"; cat foo.txt; } | lpr
Here we have combined the output of our three commands and piped them into the input
of lpr to produce a printed report.
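The subshell form combines streams the same way, substituting parentheses for braces. Here is a self-contained sketch (the file names and contents are our own, and the combined output goes to a file rather than a printer):

```shell
#!/bin/bash
# Sketch: a subshell combining the output of several commands
# into a single stream, redirected to one file.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello" > foo.txt
(echo "header"; echo "Listing of foo.txt"; cat foo.txt) > output.txt
cat output.txt
```

Note one syntactic difference: unlike braces, the parentheses need no spaces around the command list and no trailing semicolon before the closing parenthesis.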
In the script that follows, we will use group commands and look at several programming
techniques that can be employed in conjunction with associative arrays. This script,
called array-2, when given the name of a directory, prints a listing of the files in the
directory along with the names of the file's owner and group owner. At the end of the
listing, the script prints a tally of the number of files belonging to each owner and group.
Here we see the results (condensed for brevity) when the script is given the directory
/usr/bin:
[me@linuxbox ~]$ array-2 /usr/bin
/usr/bin/2to3-2.6                        root       root
/usr/bin/2to3                            root       root
/usr/bin/a2p                             root       root
/usr/bin/abrowser                        root       root
/usr/bin/aconnect                        root       root
/usr/bin/acpi_fakekey                    root       root
/usr/bin/acpi_listen                     root       root
/usr/bin/add-apt-repository              root       root
.
.
.
/usr/bin/zipgrep                         root       root
/usr/bin/zipinfo                         root       root
/usr/bin/zipnote                         root       root
/usr/bin/zip                             root       root
File owners:
daemon    :     1 file(s)
root      :  1394 file(s)

File group owners:
crontab   :     1 file(s)
daemon    :     1 file(s)
lpadmin   :     1 file(s)
mail      :     4 file(s)
mlocate   :     1 file(s)
root      :  1380 file(s)
shadow    :     2 file(s)
ssh       :     1 file(s)
tty       :     2 file(s)
utmp      :     2 file(s)
#!/bin/bash
# array-2: Use arrays to tally file owners
declare -A files file_group file_owner groups owners
if [[ ! -d "$1" ]]; then
    echo "Usage: array-2 dir" >&2
    exit 1
fi
for i in "$1"/*; do
    owner=$(stat -c %U "$i")
    group=$(stat -c %G "$i")
    files["$i"]="$i"
    file_owner["$i"]=$owner
    file_group["$i"]=$group
    ((++owners[$owner]))
    ((++groups[$group]))
done
# List the collected files
{ for i in "${files[@]}"; do
    printf "%-40s %-10s %-10s\n" \
        "$i" ${file_owner["$i"]} ${file_group["$i"]}
done } | sort
echo
# List owners
echo "File owners:"
{ for i in "${!owners[@]}"; do
    printf "%-10s: %5d file(s)\n" "$i" ${owners["$i"]}
done } | sort
echo
# List groups
echo "File group owners:"
{ for i in "${!groups[@]}"; do
    printf "%-10s: %5d file(s)\n" "$i" ${groups["$i"]}
done } | sort
Process Substitution
While they look similar and can both be used to combine streams for redirection, there is
an important difference between group commands and subshells. Whereas a group command executes all of its commands in the current shell, a subshell (as the name suggests)
executes its commands in a child copy of the current shell. This means that the environment is copied and given to a new instance of the shell. When the subshell exits, the copy
of the environment is lost, so any changes made to the subshell's environment (including
variable assignment) are lost as well. Therefore, in most cases, unless a script requires a
subshell, group commands are preferable to subshells. Group commands are both faster
and require less memory.
We saw an example of the subshell environment problem in Chapter 28, when we discovered that a read command in a pipeline does not work as we might intuitively expect. To
recap, if we construct a pipeline like this:
echo "foo" | read
echo $REPLY
The content of the REPLY variable is always empty because the read command is executed in a subshell, and its copy of REPLY is destroyed when the subshell terminates.
Because commands in pipelines are always executed in subshells, any command that assigns variables will encounter this issue. Fortunately, the shell provides an exotic form of
expansion called process substitution that can be used to work around this problem.
Process substitution is expressed in two ways:
For processes that produce standard output:
<(list)
or, for processes that intake standard input:
>(list)
where list is a list of commands.
To solve our problem with read, we can employ process substitution like this:
read < <(echo "foo")
echo $REPLY
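Because the loop now runs in the current shell, any variable it assigns survives after the loop ends. Here is a small sketch demonstrating that with a counter:

```shell
#!/bin/bash
# Sketch: a counter assigned inside a read loop survives, because the
# process substitution keeps the loop in the current shell.
count=0
while read -r line; do
    ((++count))
done < <(printf '%s\n' a b c)
echo "$count"   # → 3
```

With an ordinary pipeline (printf ... | while read ...), the same counter would still read 0 afterward, for the subshell reasons described above.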
Process substitution allows us to treat the output of a subshell as an ordinary file for purposes of redirection. In fact, since it is a form of expansion, we can examine its real
value:
[me@linuxbox ~]$ echo <(echo "foo")
/dev/fd/63
By using echo to view the result of the expansion, we see that the output of the subshell
is being provided by a file named /dev/fd/63.
Process substitution is often used with loops containing read. Here is an example of a
read loop that processes the contents of a directory listing created by a subshell:
#!/bin/bash
# pro-sub : demo of process substitution
while read attr links owner group size date time filename; do
	cat <<- EOF
		Filename:   $filename
		Size:       $size
		Owner:      $owner
		Group:      $group
		Modified:   $date $time
		Links:      $links
		Attributes: $attr
	EOF
done < <(ls -l | tail -n +2)
The loop executes read for each line of a directory listing. The listing itself is produced
on the final line of the script. This line redirects the output of the process substitution into
the standard input of the loop. The tail command is included in the process substitution
pipeline to eliminate the first line of the listing, which is not needed.
When executed, the script produces output like this:
[me@linuxbox ~]$ pro-sub | head -n 20
Filename:   addresses.ldif
Size:       14540
Owner:      me
Group:      me
Modified:   2009-04-02 11:12
Links:      1
Attributes: -rw-r--r--
Filename:   bin
Size:       4096
Owner:      me
Group:      me
Modified:   2009-07-10 07:31
Links:      2
Attributes: drwxr-xr-x
Filename:   bookmarks.html
Size:       394213
Owner:      me
Group:      me
Traps
In Chapter 10, we saw how programs can respond to signals. We can add this capability
to our scripts, too. While the scripts we have written so far have not needed this capability (because they have very short execution times, and do not create temporary files),
larger and more complicated scripts may benefit from having a signal handling routine.
When we design a large, complicated script, it is important to consider what happens if
the user logs off or shuts down the computer while the script is running. When such an
event occurs, a signal will be sent to all affected processes. In turn, the programs representing those processes can perform actions to ensure a proper and orderly termination of
the program. Let's say, for example, that we wrote a script that created a temporary file
during its execution. In the course of good design, we would have the script delete the file
when the script finishes its work. It would also be smart to have the script delete the file
if a signal is received indicating that the program was going to be terminated prematurely.
bash provides a mechanism for this purpose known as a trap. Traps are implemented
with the appropriately named builtin command, trap. trap uses the following syntax:
trap argument signal [signal...]
where argument is a string which will be read and treated as a command and signal is the
specification of a signal that will trigger the execution of the interpreted command.
Here is a simple example:
#!/bin/bash
# trap-demo : simple signal handling demo
trap "echo 'I am ignoring you.'" SIGINT SIGTERM
for i in {1..5}; do
    echo "Iteration $i of 5"
    sleep 5
done
This script defines a trap that will execute an echo command each time either the SIGINT or SIGTERM signal is received while the script is running. Execution of the program looks like this when the user attempts to stop the script by pressing Ctrl-c:
[me@linuxbox ~]$ trap-demo
Iteration 1 of 5
Iteration 2 of 5
I am ignoring you.
Iteration 3 of 5
I am ignoring you.
Iteration 4 of 5
Iteration 5 of 5
As we can see, each time the user attempts to interrupt the program, the message is
printed instead.
Constructing a string to form a useful sequence of commands can be awkward, so it is
common practice to specify a shell function as the command. In this example, a separate
shell function is specified for each signal to be handled:
#!/bin/bash
# trap-demo2 : simple signal handling demo
exit_on_signal_SIGINT () {
    echo "Script interrupted." >&2
    exit 0
}
exit_on_signal_SIGTERM () {
    echo "Script terminated." >&2
    exit 0
}
trap exit_on_signal_SIGINT SIGINT
trap exit_on_signal_SIGTERM SIGTERM
for i in {1..5}; do
    echo "Iteration $i of 5"
    sleep 5
done
This script features two trap commands, one for each signal. Each trap, in turn, specifies a shell function to be executed when the particular signal is received. Note the inclusion of an exit command in each of the signal-handling functions. Without an exit,
the script would continue after completing the function.
When the user presses Ctrl-c during the execution of this script, the results look like
this:
[me@linuxbox ~]$ trap-demo2
Iteration 1 of 5
Iteration 2 of 5
Script interrupted.
Temporary Files
One reason signal handlers are included in scripts is to remove temporary files
that the script may create to hold intermediate results during execution. There is
something of an art to naming temporary files. Traditionally, programs on Unix-like systems create their temporary files in the /tmp directory, a shared directory
intended for such files. However, since the directory is shared, this poses certain
security concerns, particularly for programs running with superuser privileges.
Aside from the obvious step of setting proper permissions for files exposed to all
users of the system, it is important to give temporary files non-predictable filenames. This avoids an exploit known as a temp race attack. One way to create a
non-predictable (but still descriptive) name is to do something like this:
tempfile=/tmp/$(basename $0).$$.$RANDOM
This will create a filename consisting of the program's name, followed by its
process ID (PID), followed by a random integer. Note, however, that the $RANDOM shell variable only returns a value in the range of 1-32767, which is not a
very large range in computer terms, so a single instance of the variable is not sufficient to overcome a determined attacker.
A better way is to use the mktemp program (not to be confused with the mktemp
standard library function) to both name and create the temporary file. The mktemp program accepts a template as an argument that is used to build the filename. The template should include a series of X characters, which are replaced
by a corresponding number of random letters and numbers. The longer the series
of X characters, the longer the series of random characters. Here is an example:
tempfile=$(mktemp /tmp/foobar.$$.XXXXXXXXXX)
This creates a temporary file and assigns its name to the variable tempfile.
The X characters in the template are replaced with random letters and numbers
so that the final filename (which, in this example, also includes the expanded
value of the special parameter $$ to obtain the PID) might be something like:
/tmp/foobar.6593.UOZuvM6654
For scripts that are executed by regular users, it may be wise to avoid the use of
the /tmp directory and create a directory for temporary files within the user's
home directory, with a line of code such as this:
[[ -d $HOME/tmp ]] || mkdir $HOME/tmp
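Putting traps and temporary files together, a cleanup handler might look like this sketch (the function name and file prefix are our own):

```shell
#!/bin/bash
# Sketch: remove the temporary file whether the script exits
# normally or is interrupted. Names here are illustrative.
tempfile=$(mktemp /tmp/cleanup-demo.$$.XXXXXXXXXX)
clean_up () {
    rm -f "$tempfile"
}
# EXIT fires on normal termination; the signals cover interruptions
trap clean_up EXIT SIGINT SIGTERM
echo "working" > "$tempfile"
# ... the rest of the script would use $tempfile here ...
```

Trapping EXIT as well as the signals means the file is removed in every case, so the script body never needs an explicit rm of its own.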
Asynchronous Execution
It is sometimes desirable to perform more than one task at the same time. We have seen
how all modern operating systems are at least multitasking if not multiuser as well.
Scripts can be constructed to behave in a multitasking fashion.
Usually this involves launching a script that, in turn, launches one or more child scripts
that perform an additional task while the parent script continues to run. However, when a
series of scripts runs this way, there can be problems keeping the parent and child coordinated. That is, what if the parent or child is dependent on the other, and one script must
wait for the other to finish its task before finishing its own?
bash has a builtin command to help manage asynchronous execution such as this. The
wait command causes a parent script to pause until a specified process (i.e., the child
script) finishes.
wait
We will demonstrate the wait command first. To do this, we will need two scripts, a parent script:
#!/bin/bash
# async-parent : Asynchronous execution demo (parent)
echo "Parent: starting..."
echo "Parent: launching child script..."
async-child &
pid=$!
echo "Parent: child (PID= $pid) launched."
echo "Parent: continuing..."
sleep 2
echo "Parent: pausing to wait for child to finish..."
wait $pid
echo "Parent: child is finished. Continuing..."
echo "Parent: parent is done. Exiting."
and a child script:
#!/bin/bash
# async-child : Asynchronous execution demo (child)
echo "Child: child is running..."
sleep 5
echo "Child: child is done. Exiting."
In this example, we see that the child script is very simple. The real action is being performed by the parent. In the parent script, the child script is launched and put into the
background. The process ID of the child script is recorded by assigning the pid variable
with the value of the $! shell parameter, which will always contain the process ID of the
last job put into the background.
The parent script continues and then executes a wait command with the PID of the child
process. This causes the parent script to pause until the child script exits, at which point
the parent script concludes.
When executed, the parent and child scripts produce the following output:
[me@linuxbox ~]$ async-parent
Parent: starting...
Parent: launching child script...
Parent: child (PID= 6741) launched.
Parent: continuing...
Child: child is running...
Parent: pausing to wait for child to finish...
Child: child is done. Exiting.
Parent: child is finished. Continuing...
Parent: parent is done. Exiting.
Named Pipes
In most Unix-like systems, it is possible to create a special type of file called a named
pipe. Named pipes are used to create a connection between two processes and can be
used just like other types of files. They are not that popular, but they're good to know
about.
There is a common programming architecture called client-server, which can make use of
a communication method such as named pipes, as well as other kinds of interprocess
communication such as network connections.
The most widely used type of client-server system is, of course, a web browser communicating with a web server. The web browser acts as the client, making requests to the
server and the server responds to the browser with web pages.
Named pipes behave like files, but actually form first-in first-out (FIFO) buffers. As with
ordinary (unnamed) pipes, data goes in one end and emerges out the other. With named
pipes, it is possible to set up something like this:
process1 > named_pipe
and
process2 < named_pipe
and it will behave as if:
process1 | process2
Here we use mkfifo to create a named pipe called pipe1:
[me@linuxbox ~]$ mkfifo pipe1
Using ls, we examine the file and see that the first letter in the attributes field is p, indicating that it is a named pipe.
To demonstrate how the named pipe works, we will need two terminal windows. In the first terminal, we send a directory listing into the pipe:
[me@linuxbox ~]$ ls -l > pipe1
After we press the Enter key, the command will appear to hang. This is because there is
nothing receiving data from the other end of the pipe yet. When this occurs, it is said that
the pipe is blocked. This condition will clear once we attach a process to the other end
and it begins to read input from the pipe. Using the second terminal window, we enter
this command:
[me@linuxbox ~]$ cat < pipe1
and the directory listing produced from the first terminal window appears in the second
terminal as the output from the cat command. The ls command in the first terminal
successfully completes once it is no longer blocked.
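The two-terminal experiment can also be compressed into a single script by putting the reader in the background, which is a convenient way to try named pipes out (the file paths here are our own):

```shell
#!/bin/bash
# Sketch: a named pipe demonstrated in one script. The background cat
# plays the role of the second terminal's reader.
fifo=/tmp/pipe_demo_$$
mkfifo "$fifo"
cat "$fifo" > /tmp/pipe_out_$$ &      # reader: opens the pipe and waits
reader_pid=$!
echo "hello through the pipe" > "$fifo"   # writer: unblocks the reader
wait "$reader_pid"                    # let the reader finish
result=$(cat /tmp/pipe_out_$$)
echo "$result"
rm -f "$fifo" /tmp/pipe_out_$$
```

If the reader were not started first, the writer's redirection would block, just as the ls -l command did in the first terminal above.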
Summing Up
Well, we have completed our journey. The only thing left to do now is practice, practice,
practice. Even though we covered a lot of ground in our trek, we barely scratched the surface as far as the command line goes. There are still thousands of command line programs left to be discovered and enjoyed. Start digging around in /usr/bin and you'll
see!
Further Reading
The Compound Commands section of the bash man page contains a full description of the group command and subshell notations.
The EXPANSION section of the bash man page contains a subsection covering process
substitution.
The Advanced Bash-Scripting Guide also has a discussion of process substitution:
http://tldp.org/LDP/abs/html/process-sub.html
Linux Journal has two good articles on named pipes. The first, from September
1997:
http://www.linuxjournal.com/article/2156
Index
A
a2ps command...................................................333
absolute pathnames................................................9
alias command.............................................50, 126
aliases.....................................................42, 50, 124
American National Standards Institute (see ANSI)
............................................................................160
American Standard Code for Information
Interchange (see ASCII).......................................17
anchors...............................................................247
anonymous FTP servers.....................................200
ANSI..................................................................160
ANSI escape codes....................................160, 164
ANSI.SYS..........................................................160
Apache web server.............................................118
apropos command................................................47
apt-cache command...........................................169
apt-get command.............................................168p.
aptitude command..............................................168
archiving............................................................230
arithmetic expansion..............70, 75, 367, 456, 464
arithmetic expressions..................70, 453, 464, 467
arithmetic operators.....................................70, 465
arithmetic truth tests...................................391, 464
arrays........................................................................
append values to the end..............................483
assigning values............................................479
associative............................................485, 488
creating.........................................................478
deleting.........................................................484
determine number of elements.....................482
finding used subscripts.................................483
index.............................................................478
multidimensional..........................................478
reading variables into...................................400
sorting...........................................................484
subscript.......................................................478
two-dimensional...........................................478
B
back references........................................263, 294p.
backslash escape sequences.................................78
backslash-escaped special characters.................156
backups, incremental..........................................234
basename command...........................................440
bash................................................................2, 124
man page........................................................48
basic regular expressions.....254, 262p., 292, 296, 306
bc command.......................................................473
Berkeley Software Distribution.........................331
bg command.......................................................116
binary.............................................93, 97, 341, 465
bit mask................................................................96
bit operators.......................................................469
Bourne, Steve.....................................................2, 6
brace expansion......................................71, 75, 451
branching............................................................381
break command..........................................412, 445
broken links..........................................................39
BSD style............................................................111
buffering.............................................................182
bugs............................................................422, 424
build environment..............................................346
bzip2 command..................................................229
C
C programming language...........341, 453, 468, 471
C++....................................................................341
cal command..........................................................4
cancel command.................................................338
carriage return....18, 77p., 157, 251p., 266, 298, 330
case compound command..................................429
case conversion..................................................462
cat command................................................57, 266
cd command.....................................................9, 11
CD-ROMs...............................................179p., 191
cdrecord command.............................................192
cdrtools...............................................................192
character classes...26p., 248, 250p., 253, 257, 289, 299
character ranges................................27, 249p., 299
chgrp command..................................................103
child process.......................................................108
chmod command..................................92, 105, 356
chown command........................................102, 105
Chrome...............................................................361
chronological sorting.........................................273
cleartext......................................................200, 202
client-server architecture....................................498
COBOL programming language........................341
collation order....................126, 251, 253, 289, 387
ASCII...................................................253, 387
dictionary......................................................251
traditional.....................................................253
comm command.................................................284
command history..............................................3, 83
command line...........................................................
arguments.....................................................436
editing.........................................................3, 79
expansion........................................................67
history.........................................................3, 84
interfaces................................................xvii, 28
command options.................................................14
command substitution............................73, 75, 451
commands................................................................
arguments...............................................14, 436
determining type.............................................43
documentation................................................44
executable program files........................42, 341
executing as another user...............................99
long options....................................................14
options............................................................14
comments...........................128, 134, 298, 355, 424
Common Unix Printing System.................329, 339
comparison operators.........................................470
compiler.............................................................341
compiling...........................................................340
completions..........................................................81
compound commands..............................................
case...............................................................429
for.................................................................450
if...................................................................381
until...............................................................413
while.............................................................410
(( ))................................................391, 406, 464
[[ ]]........................................................389, 406
compression algorithms.....................................227
conditional expressions..............................396, 420
configuration files..................................18, 21, 124
configure command...........................................346
constants.............................................................366
continue command.............................................412
control characters.......................................157, 266
control codes................................................77, 251
control operators......................................................
&&........................................................394, 406
||....................................................................394
controlling terminal............................................109
COPYING..........................................................344
copying and pasting.................................................
in vim............................................................145
on the command line......................................80
with X Window System....................................3
coreutils package.........................45, 48p., 279, 303
counting words in a file........................................62
cp command...................................28, 35, 131, 207
CPU.........................................................108p., 340
cron job...............................................................211
crossword puzzles..............................................247
csplit command..................................................304
CUPS..........................................................329, 339
current working directory......................................8
cursor movement..................................................79
cut command..............................................276, 461
D
daemon programs.......................................108, 118
Index
data compression................................................226
data redundancy.................................................226
data validation....................................................389
date command........................................................4
date formats........................................................273
dd command.......................................................190
Debian................................................................166
Debian Style (.deb)............................................167
debugging...................................................377, 424
declare command...............................................463
defensive programming.............................420, 424
delimiters..............................................76, 271, 274
dependencies..............................................168, 349
design..............................................................422p.
device drivers.............................................174, 341
device names......................................................182
device nodes.........................................................20
df command...................................................4, 379
diction................................................................342
dictionary collation order...................................251
diff command.....................................................284
Digital Restrictions Management (DRM)..........168
directories.................................................................
archiving.......................................................230
changing...........................................................9
copying...........................................................28
creating.....................................................28, 34
current working................................................8
deleting.....................................................31, 39
hierarchical.......................................................7
home.................................................21, 90, 379
listing..............................................................13
moving......................................................30, 36
navigating.........................................................7
OLDPWD variable............................126
parent................................................................8
PATH variable..............................................126
PWD variable...............................................127
removing...................................................31, 39
renaming...................................................30, 36
root...................................................................7
shared...........................................................103
sticky bit.........................................................98
synchronizing...............................................238
transferring over a network..........................238
viewing contents...............................................8
disk partitions.....................................................177
DISPLAY variable.............................................126
Dolphin................................................................27
dos2unix command............................................267
double quotes.......................................................75
dpkg command...................................................168
du command...............................................269, 379
Dynamic Host Configuration Protocol (DHCP)...199
E
echo command.....................................67, 125, 362
-e option..........................................................78
-n option.......................................................398
edge and corner cases.........................................423
EDITOR variable...............................................126
effective group ID................................................98
effective user ID...........................................98, 109
elif statement......................................................388
email...................................................................265
embedded systems.............................................341
empty variables..................................................457
encrypted tunnels...............................................206
encryption..........................................................290
end of file.....................................................59, 369
endless loop........................................................413
enscript command..............................................336
environment.........................................99, 124, 404
aliases...........................................................124
establishing...................................................127
examining.....................................................124
login shell.....................................................127
shell functions..............................................124
shell variables...............................................124
startup files...................................................127
subshells.......................................................491
variables.......................................................124
eqn command.....................................................318
executable files...................................................347
executable program files..............................42, 341
executable programs................................................
determining location.......................................43
PATH variable..............................................126
exit command.........................................5, 386, 407
exit status...................................................382, 386
expand command...............................................279
expansions............................................................67
arithmetic..........................70, 75, 367, 456, 464
brace.................................................71, 75, 451
command substitution......................73, 75, 451
delimiters........................................................76
errors resulting from.....................................418
history.......................................................84, 86
parameter..........................72, 75, 365, 371, 456
pathname..........................................68, 75, 451
tilde...........................................................69, 75
word-splitting..............................................74p.
expressions...............................................................
arithmetic........................70, 453, 464, 467, 479
conditional............................................396, 420
ext3.....................................................................188
extended regular expressions.............................254
Extensible Markup Language............................265
F
false command...................................................383
fdformat command.............................................190
fdisk command...................................................185
fg command........................................................116
FIFO...................................................................498
file command.......................................................17
file descriptor.......................................................56
file system corruption........................................182
File Transfer Protocol (FTP)..............................199
filenames............................................................221
case sensitive..................................................11
embedded spaces in................................12, 260
extensions.......................................................12
hidden.............................................................11
files...........................................................................
access..............................................................89
archiving...............................................230, 236
attributes.........................................................90
block special...................................................91
block special device.....................................212
changing file mode.........................................92
changing owner and group owner................102
character special.............................................91
character special device................................212
compression..................................................226
configuration..................................18, 124, 264
copying.....................................................28, 34
copying over a network................................199
creating empty................................................55
deb................................................................166
deleting.............................................31, 39, 218
determining contents......................................17
device nodes...................................................20
execution access.............................................90
expressions...................................................384
finding..........................................................209
hidden.............................................................11
iso image...................................................191p.
listing..........................................................8, 13
mode...............................................................91
moving......................................................30, 35
owner..............................................................92
permissions.....................................................89
read access......................................................90
regular...........................................................212
removing...................................................31, 39
renaming...................................................30, 35
rpm...............................................................166
shared library..................................................21
startup...........................................................127
sticky bit.........................................................98
symbolic links..............................................212
synchronizing...............................................238
temporary.....................................................495
text..................................................................17
transferring over a network..........199, 235, 238
truncating........................................................55
type.................................................................90
viewing contents.............................................17
write access....................................................90
filters....................................................................61
find command.............................................211, 234
findutils package................................................225
Firefox................................................................361
firewalls..............................................................196
first-in first-out...................................................498
floppy disks........................................176, 183, 189
flow control..............................................................
branching......................................................381
case compound command............................429
elif statement................................................388
endless loop..................................................413
for compound command...............................450
for loop.........................................................450
function statement........................................374
if compound command.................................381
looping..........................................................409
menu-driven.................................................406
multiple-choice decisions.............................429
reading files with while and until loops.......414
terminating a loop.........................................412
traps..............................................................493
until loop......................................................413
while loop.....................................................410
fmt command.....................................................309
focus policy............................................................4
fold command....................................................309
for compound command....................................450
for loop...............................................................450
Foresight............................................................166
Fortran programming language..................341, 453
free command.................................................5, 181
Free Software Foundation............................xix, xxi
fsck command....................................................189
ftp command..............................199, 207, 342, 370
FTP servers.................................................200, 370
FUNCNAME variable.......................................441
function statement..............................................374
G
gcc......................................................................342
gedit command...........................................114, 131
genisoimage command.......................................191
Gentoo................................................................166
getopts command...............................................449
Ghostscript.........................................................329
gid........................................................................89
global variables..................................................376
globbing...............................................................26
GNOME...............................2, 27, 40, 95, 131, 208
gnome-terminal......................................................2
GNU binutils package........................................452
GNU C Compiler...............................................342
GNU coreutils package...............45, 48p., 279, 303
GNU findutils package......................................225
GNU Project..........xix, xxi, 14, 225, 303, 342, 344
info command.................................................48
GNU/Linux..................................................xix, xxi
graphical user interfaces....................................xvii
grep command......................................62, 243, 403
groff....................................................................318
group commands................................................487
groups...................................................................89
effective group ID..........................................98
gid...................................................................89
primary group ID............................................89
setgid..............................................................98
GUI................xvii, 3, 27, 40, 79, 95, 127
gunzip command................................................227
gzip command..............................................50, 227
H
hard disks...........................................................176
hard links..................................................24, 33, 37
creating...........................................................37
listing..............................................................38
head command.....................................................63
header files.........................................................345
hello world program...........................................355
help command......................................................44
here documents..................................................369
here strings.........................................................404
hexadecimal.................................................93, 465
hidden files.....................................................11, 69
hierarchical directory structure..............................7
high-level programming languages....................341
history......................................................................
expansion..................................................84, 86
searching.........................................................84
history command..................................................84
home directories...................................................21
root account....................................................22
/etc/passwd.....................................................90
home directory...........................8, 11, 69, 100, 126
HOME variable..................................................126
hostname............................................................157
HTML........................265, 299, 319, 361, 371, 373
Hypertext Markup Language.............................265
I
I/O redirection (see redirection)...........................53
id command..........................................................89
IDE.....................................................................183
if compound command......................129, 418, 429
IFS variable........................................................402
ICMP ECHO_REQUEST..................................196
incremental backups...........................................234
info files...............................................................49
init......................................................................108
init scripts...........................................................108
inodes...................................................................37
INSTALL...........................................................344
installation wizard..............................................167
integers.....................................................................
arithmetic................................................70, 473
division...................................................71, 466
expressions...................................................388
interactivity........................................................397
Internal Field Separator......................................402
interpreted languages.........................................341
interpreted programs..........................................342
interpreter...........................................................341
iso images........................................................191p.
iso9660.......................................................180, 192
J
job control..........................................................115
job numbers........................................................115
jobspec................................................................116
join command.....................................................281
Joliet extensions.................................................192
Joy, Bill..............................................................137
K
kate command....................................................131
KDE.....................................2, 27, 40, 95, 131, 208
kedit command...................................................131
kernel...xvi, xixp., 46, 108, 118, 174, 183, 287, 350
key fields............................................................271
kill command......................................................117
killall command.................................................120
killing text............................................................80
Knuth, Donald....................................................318
Konqueror..............................................27, 95, 208
konsole...................................................................2
kwrite command.........................................114, 131
L
LANG variable...................................126, 251, 253
less command.................................17, 60, 238, 261
lftp command.....................................................202
libraries..............................................................341
LibreOffice Writer..............................................xxi
line continuation character.................................359
line editors..........................................................137
line-continuation character.................................298
linker..................................................................341
linking................................................................341
links..........................................................................
broken.............................................................39
creating...........................................................33
hard...........................................................24, 33
symbolic...................................................23, 34
Linux community...............................................166
Linux distributions.............................................166
CentOS.................................................167, 336
Debian...............................................166p., 340
Fedora......................................xix, 89, 167, 336
Foresight.......................................................166
Gentoo..........................................................166
Linspire.........................................................167
Mandriva......................................................167
OpenSUSE............................................xix, 167
packaging systems........................................166
PCLinuxOS..................................................167
Red Hat Enterprise Linux.............................167
Slackware.....................................................166
Ubuntu........................................xix, 166p., 336
Xandros........................................................167
M
machine language...............................................340
maintenance...............................358, 362, 364, 372
make command..................................................347
Makefile.............................................................347
man command......................................................45
man pages.....................................................45, 319
markup languages......................................265, 319
memory....................................................................
assigned to each process...............................109
displaying free..................................................5
Resident Set Size..........................................111
segmentation violation..................................119
usage.............................................................111
viewing usage...............................................121
virtual............................................................111
menu-driven programs.......................................406
meta key...............................................................81
meta sequences...................................................246
metacharacters....................................................246
metadata.....................................................167, 169
mkdir command.............................................28, 34
mkfifo command................................................498
mkfs command...........................................188, 190
mkisofs command..............................................192
mktemp command..............................................496
mnemonics.........................................................341
modal editor.......................................................139
monospaced fonts...............................................329
Moolenaar, Bram................................................137
more command.....................................................19
mount command.........................................178, 192
mount points.........................................21, 178, 180
mounting............................................................177
MP3....................................................................104
multi-user systems...............................................88
multiple-choice decisions...................................429
multitasking..........................................88, 108, 496
mv command..................................................30, 35
N
named pipes.......................................................498
nano command...................................................136
Nautilus..................................................27, 95, 208
netstat command................................................198
networking.........................................................195
anonymous FTP servers...............................200
default route..................................................199
Dynamic Host Configuration Protocol (DHCP)....199
encrypted tunnels..........................................206
examine network settings and statistics.......198
File Transfer Protocol (FTP)........................199
firewalls........................................................196
FTP servers...................................................200
Local Area Network.....................................199
loopback interface........................................199
man-in-the-middle attacks....................203
routers...........................................................198
secure communication with remote hosts....203
testing if a host is alive.................................196
tracing the route to a host.............................197
transferring files...........................................238
transporting files...........................................199
O
octal......................................................93, 465, 481
Ogg Vorbis.........................................................104
OLDPWD variable.............................................126
OpenOffice.org Writer................xxp., 18
OpenSSH............................................................203
operators...................................................................
arithmetic................................................70, 465
assignment....................................................467
binary............................................................419
comparison...................................................470
ternary...........................................................471
owning files..........................................................89
P
package files.......................................................167
package maintainers...........................................167
package management.........................................166
deb................................................................166
Debian Style (.deb).......................................167
finding packages...........................................169
high-level tools.............................................168
installing packages.......................................169
low-level tools..............................................168
package repositories.....................................167
Red Hat Style (.rpm)....................................167
removing packages.......................................170
RPM.............................................................166
updating packages........................................171
packaging systems.............................................166
page description language..................265, 320, 328
PAGER variable.................................................126
pagers...................................................................19
parameter expansion..............................72, 75, 456
parent directory......................................................8
parent process.....................................................108
passwd command...............................................106
passwords...........................................................106
paste command...................................................280
PATA..................................................................183
patch command..................................................287
patches................................................................285
PATH variable............................126, 129, 356, 374
pathname expansion...............................68, 75, 451
pathnames..........................................................260
absolute.............................................................9
completion......................................................81
relative..............................................................9
PDF............................................................321, 331
Perl programming language. 42, 243, 299, 341, 473
permissions........................................................354
PHP programming language..............................341
ping command....................................................196
pipelines...............................................60, 404, 491
in command substitution................................73
portability...........................................346, 380, 394
portable..............................................................380
Portable Document Format........................321, 331
Portable Operating System Interface.................255
positional parameters......................436, 457p., 460
POSIX.....................................192, 251, 254p., 394
character classes.....26p., 250p., 253, 257, 289, 299
PostScript...........................265, 320, 328, 333, 338
pr command...............................................313, 329
primary group ID.................................................89
printable characters............................................251
printenv command.......................................73, 124
printer buffers.....................................................181
printers.......................................................181, 183
buffering output............................................181
control codes................................................327
daisy-wheel...................................................327
device names................................................183
drivers...........................................................329
graphical.......................................................328
impact...........................................................327
laser..............................................................328
printf command..........................................314, 455
printing.....................................................................
determining system status............................336
history of......................................................326
Internet Printing Protocol.............................337
monospaced fonts.........................................327
preparing text................................................329
pretty.............................................................333
print queues..................................................336
proportional fonts.........................................328
queue............................................................337
spooling........................................................336
terminate print jobs.......................................338
viewing jobs.................................................337
process ID..........................................................109
process substitution............................................491
processes............................................................108
background...................................................115
child..............................................................108
controlling.....................................................113
foreground....................................................115
interrupting...................................................114
job control.....................................................115
killing............................................................117
nice...............................................................110
parent............................................................108
PID...............................................................109
process ID.....................................................109
SIGINT.........................................................494
signals...........................................................117
SIGTERM....................................................494
sleeping.........................................................110
state...............................................................110
stopping........................................................116
viewing.................................................109, 111
zombie..........................................................110
production use....................................................422
programmable completion...................................83
ps command.......................................................109
PS1 variable...............................................126, 156
PS2 variable.......................................................363
ps2pdf command................................................321
PS4 variable.......................................................426
pseudocode.................................................381, 409
pstree command.................................................121
PuTTY................................................................208
pwd command........................................................8
PWD variable.....................................................127
Python programming language..........................341
Q
quoting.................................................................74
double quotes..................................................75
escape character..............................................77
missing quote................................................417
single quotes...................................................76
R
RAID (Redundant Array of Independent Disks)....176
raster image processor........................................329
read command....................398, 408, 414, 422, 491
Readline...............................................................79
README.....................................................49, 344
redirection................................................................
blocked pipe.................................................499
group commands and subshells....................487
here documents.............................................369
here strings...................................................404
standard error..................................................55
standard input.........................................57, 370
standard output...............................................54
redirection operators................................................
&>...................................................................57
&>>................................................................57
<......................................................................59
<(list)............................................................491
<<..............................................................369p.
<<-................................................................370
<<<...............................................................404
>......................................................................54
>(list)............................................................491
>>...................................................................55
|.......................................................................60
regular expressions...............62, 243, 295, 389, 403
anchors.........................................................247
back references..................................263, 294p.
basic...........................254, 262p., 292, 296, 306
extended.......................................................254
relational databases............................................281
relative pathnames.................................................9
release early, release often.................................422
removing duplicate lines in a file.........................61
REPLY variable..........................................398, 491
report generator..................................................361
repositories.........................................................167
return command.........................................375, 386
reusable..............................................................380
RIP.....................................................................329
rlogin command.................................................202
rm command........................................................31
Rock Ridge extensions.......................................192
roff......................................................................318
ROT13 encoding................................................290
RPM...................................................................166
rpm command....................................................169
rsync command..................................................238
rsync remote-update protocol............................238
Ruby programming language.............................341
S
scalar variables...................................................478
Schilling, Jörg...................................................192
scp command.....................................................207
script command....................................................86
scripting languages.......................................42, 341
sdiff command....................................................304
searching a file for patterns..................................62
searching history..................................................84
Secure Shell.......................................................203
sed command.....................................290, 322, 461
set command..............................................124, 425
setgid....................................................................98
setuid............................................................98, 385
Seward, Julian....................................................229
sftp command.....................................................207
shared libraries.............................................21, 168
shebang......................................................355, 360
shell builtins.........................................................42
shell functions..............................42, 124, 374, 440
shell prompts 2, 9, 85, 100, 114, 126, 156, 204, 363
shell scripts.........................................................354
SHELL variable.................................................126
shell variables.....................................................124
shift command............................................439, 444
SIGINT..............................................................494
signals................................................................493
single quotes.........................................................76
Slackware...........................................................166
sleep command...................................................411
soft link................................................................23
sort command...............................................61, 267
sort keys.............................................................271
source code..............................166p., 174, 265, 340
source command........................................135, 357
source tree..........................................................343
special parameters......................................441, 458
split command....................................................304
SSH....................................................................203
ssh command..............................................203, 235
ssh program..........................................................88
Stallman, Richard.........xvi, xix, xxi, 131, 255, 342
standard error..............................................53p., 56
disposing of....................................................57
redirecting to a file.........................................55
standard input.......................................53, 370, 398
redirecting.......................................................57
standard output.....................................................53
appending to a file..........................................55
disposing of....................................................57
redirecting standard error to...........................56
redirecting to a file.........................................54
startup files.........................................................127
stat command.....................................................223
sticky bit...............................................................98
storage devices...................................................176
audio CDs.............................................180, 191
CD-ROMs.........................................179p., 191
creating file systems.....................................185
device names................................................182
disk partitions...............................................177
FAT32...........................................................185
floppy disks..........................................183, 189
formatting.....................................................185
LVM (Logical Volume Manager).................179
mount points.........................................178, 180
partitions.......................................................185
reading and writing directly.........................190
repairing file systems...................................189
unmounting...................................................181
USB flash drives...........................................190
stream editor.......................................................290
strings.......................................................................
expressions...................................................387
extract a portion of.......................................459
length of........................................................459
perform search and replace upon.................461
remove leading portion of............................460
remove trailing portion of............................460
${parameter:offset:length}...........................459
${parameter:offset}......................................459
strings command................................................452
stubs...........................................................377, 422
style....................................................................345
su command.........................................................99
subshells.....................................................404, 487
sudo command.............................................99, 101
Sun Microsystems..............................................137
superuser..........................................2, 90, 100, 120
symbolic links..........................................23, 34, 38
creating.....................................................38, 40
listing..............................................................38
symlink.................................................................23
syntax errors.......................................................416
syntax highlighting.....................................354, 359
T
tables..................................................................281
tabular data.................................................271, 317
tail command........................................................63
tape archive........................................................230
tar command.......................................................230
tarballs................................................................343
targets.................................................................347
Task Manager.....................................................113
Tatham, Simon...................................................208
tbl command...............................................318, 322
tee command........................................................64
Teletype..............................................................109
telnet command..................................................202
TERM variable...................................................127
terminal emulators.................................................2
terminal sessions......................................................
controlling terminal......................................109
effect of .bashrc............................................357
environment....................................................99
exiting...............................................................5
login shell...............................................99, 127
TERM variable.............................................127
using named pipes........................................499
virtual...............................................................5
with remote systems.......................................88
terminals..............................81, 87p., 160, 318, 327
ternary operator..................................................471
test cases.............................................................423
test command.............................384, 389, 410, 419
test coverage.......................................................423
testing..............................................................422p.
TEX....................................................................318
text........................................................................17
adjusting line length.....................................309
ASCII.............................................................17
carriage return..............................................267
comparing.....................................................283
converting MS-DOS to Unix........................289
counting words...............................................62
cutting...........................................................276
deleting duplicate lines.................................275
deleting multiple blank lines........................267
detecting differences.....................................284
displaying common lines..............................284
displaying control characters........................266
DOS format..................................................267
EDITOR variable.........................................126
editors...................................................130, 264
expanding tabs..............................................279
files.................................................................17
filtering...........................................................61
folding..........................................................309
formatting.....................................................305
formatting for typesetters.............................318
formatting tables...........................................322
joining...........................................................281
linefeed character.........................................267
lowercase to uppercase conversion..............289
numbering lines....................................267, 305
paginating.....................................................313
pasting..........................................................280
preparing for printing...................................329
removing duplicate lines................................61
rendering in PostScript.................................320
ROT13 encoded............................................290
searching for patterns.....................................62
sorting.....................................................61, 267
spell checking...............................................299
substituting...................................................294
substituting tabs for spaces...........................279
tab-delimited.................................................278
transliterating characters..............................288
Unix format..................................................267
viewing with less......................................17, 60
text editors..........................................130, 264, 288
emacs............................................................131
for writing shell scripts.................................354
gedit......................................................131, 354
interactive.....................................................288
kate.......................................................131, 354
kedit..............................................................131
kwrite............................................................131
line................................................................137
nano......................................................131, 136
pico...............................................................131
stream...........................................................290
syntax highlighting...............................354, 359
vi...................................................................131
vim................................................131, 354, 359
visual............................................................137
tilde expansion...............................................69, 75
tload command...................................................121
top command......................................................111
top-down design.................................................372
Torvalds, Linus............................................xvi, xxi
touch command.......................222p., 239, 349, 446
tr command........................................................288
traceroute command...........................................197
tracing................................................................425
transliterating characters....................................288
traps....................................................................493
troff command....................................................318
true command.....................................................383
TTY....................................................................109
type command......................................................43
typesetters..................................................318, 328
TZ variable.........................................................127
U
Ubuntu..................................89, 102, 166, 250, 357
umask command..........................................96, 105
umount command...............................................181
unalias command.................................................51
unary operator expected.....................................419
unary operators...................................................465
unexpand command...........................................279
unexpected token...............................................418
uniq command..............................................61, 275
Unix...................................................................xvii
Unix System V...................................................331
unix2dos command............................................267
unset command..................................................484
until compound command..................................413
until loop............................................................413
unzip command..................................................236
updatedb command............................................211
upstream providers.............................................167
uptime................................................................373
uptime command................................................379
USB flash drives........................................176, 190
Usenet................................................................290
USER variable...........................................125, 127
users.........................................................................
accounts..........................................................89
changing identity............................................99
changing passwords......................................106
effective user ID.....................................98, 109
home directory................................................90
identity............................................................89
password.........................................................90
setting default permissions.............................96
setuid..............................................................98
superuser..................................90, 92, 98p., 107
/etc/passwd.....................................................90
/etc/shadow.....................................................90
V
validating input..................................................404
variables...............................................72, 364, 456
assigning values....................................367, 467
constants.......................................................366
declaring...............................................364, 367
environment..................................................124
global............................................................376
local..............................................................376
names....................................................366, 459
scalar.............................................................478
shell..............................................................124
511
Index
vfat.....................................................................188
vi command........................................................136
vim command.............................................263, 359
virtual consoles......................................................5
Virtual Private Network.....................................206
virtual terminals.....................................................5
visual editors......................................................137
vmstat command................................................121
W
wait command....................................................496
wc command........................................................62
web pages...........................................................265
wget command...................................................202
What You See Is What You Get.........................327
whatis command..................................................47
which command.............................................43, 73
while compound command................................410
wildcards..................................26, 58, 67, 243, 250
wodim command................................................193
word-splitting..................................................74pp.
world....................................................................89
WYSIWYG........................................................327
X
X Window System...................................3, 88, 206
xargs command..................................................220
xload command..................................................121
xlogo command..................................................114
XML...................................................................265
Y
yanking text..........................................................80
yum command....................................................169
Z
zgrep command..................................................263
zip command......................................................236
zless command.....................................................50
--help option.........................................................45
.
./configure..........................................................346
.bash_history........................................................83
.bash_login.........................................................127
.bash_profile.......................................................127
.bashrc................................128, 130, 357, 380, 441
.profile................................................................127
.ssh/known_hosts...............................................205
(
(( )) compound command...........................464, 470
[
[ command.........................................................418
/
/............................................................................20
/bin.......................................................................20
/boot.....................................................................20
/boot/grub/grub.conf............................................20
/boot/vmlinuz.......................................................20
/dev.......................................................................20
/dev/cdrom.........................................................183
/dev/dvd..............................................................183
/dev/floppy.........................................................183
/dev/null...............................................................57
/etc........................................................................21
/etc/bash.bashrc..................................................128
/etc/crontab...........................................................21
/etc/fstab...............................................21, 177, 189
/etc/group.............................................................90
/etc/passwd.............................21, 90, 274, 279, 403
/etc/profile..................................................127, 129
/etc/shadow..........................................................90
/etc/sudoers..........................................................99
/lib........................................................................21
/lost+found...........................................................21
/media...................................................................21
/mnt......................................................................21
/opt.......................................................................21
/proc.....................................................................22
/root..............................................................22, 100
/sbin......................................................................22
/tmp..............................................................22, 496
/usr........................................................................22
/usr/bin.................................................................22
/usr/lib..................................................................22
/usr/local...............................................................22
/usr/local/bin........................................22, 350, 358
/usr/local/sbin.....................................................358
/usr/sbin................................................................22
/usr/share..............................................................22
/usr/share/dict.....................................................247
/usr/share/doc.................................................22, 49
/var.......................................................................23
/var/log.................................................................23
/var/log/messages...................................23, 64, 183
/var/log/syslog..............................................64, 183
$
$!................................................................442, 497
$((expression))...................................................464
${!array[@]}......................................................483
${!array[*]}........................................................483
${!prefix@}.......................................................459
${!prefix*}.........................................................459
${#parameter}....................................................459
${parameter,,}....................................................463
${parameter,}.....................................................463
${parameter:-word}...........................................457
${parameter:?word}...........................................458
${parameter:+word}..........................................458
${parameter:=word}..........................................457
${parameter//pattern/string}..............................461
${parameter/#pattern/string}.............................461
${parameter/%pattern/string}............................461
${parameter/pattern/string}...............................461
${parameter##pattern}.......................................460
${parameter#pattern}.........................................460
${parameter%%pattern}....................................460
${parameter%pattern}.......................................460
${parameter^}....................................................463
${parameter^^}..................................................463
$@..............................................................441, 449
$*................................................................441, 449
$#........................................................................437
$0........................................................................441