Cse IV Unix and Shell Programming 10cs44 Notes
PART – A
UNIT 1:
1. The UNIX Operating System, The UNIX Architecture and Command Usage, The File System 6 Hours
UNIT 2:
2. Basic File Attributes
UNIT 3:
3. The Shell, The Process, Customizing the environment 7 Hours
UNIT 4:
4. More File Attributes
PART – B
UNIT 5:
5. Filters using regular expressions
UNIT 6:
6. Essential Shell Programming 6 Hours
UNIT 7:
7. awk – An Advanced Filter 7 Hours
UNIT 8:
8. perl - The Master Manipulator 7 Hours
Dept of CSE,SJBIT
Unix and Shell programming 10CS44
Text Book
1. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw Hill, 2006.
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson, 2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Table of Contents
1 Unit 1 The Unix Operating System, The UNIX Architecture and Command Usage, The File System
2 Unit 2 Basic File Attributes 20-34
3 Unit 3 The Shell, The Process 35-62
4 Unit 4 More file attributes 63-77
5 Unit 5 Filters using regular expressions 78-89
6 Unit 6 Essential Shell Programming 90-124
7 Unit 7 awk – An Advanced Filter 125-146
8 Unit 8 perl - The Master Manipulator 147-160
UNIT 1
The Unix Operating System, The UNIX Architecture and Command Usage, The File System
6 Hours
Text Book
1. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw Hill, 2006.
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson, 2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Introduction
This chapter introduces you to the UNIX operating system. We first look at what an operating system is, and then proceed to discuss the different features of UNIX that have made it a popular operating system.
Objectives
What is an operating system (OS)?
Features of UNIX OS
A Brief History of UNIX OS, POSIX and Single Unix Specification (SUS)
1. What is an operating system (OS)?
An operating system (OS) is a resource manager. It takes the form of a set of software routines that allow users and application programs to access system resources (e.g. the CPU, memory, disks, modems, printers, network cards etc.) in a safe, efficient and abstract way.
For example, an OS ensures safe access to a printer by allowing only one application program to send data directly to the printer at any one time. An OS encourages efficient use of the CPU by suspending programs that are waiting for I/O operations to complete, to make way for programs that can use the CPU more productively. An OS also provides convenient abstractions (such as files rather than disk locations) which isolate application programmers and users from the details of the underlying hardware.
(Figure: layered view of a system: user applications run on top of the operating system, which in turn runs on the processor/hardware.)
The UNIX operating system allows complex tasks to be performed with a few keystrokes, but it does not warn the user about the consequences of a command.
Kernighan and Pike (The UNIX Programming Environment) lamented long ago that “as
the UNIX system has spread, the fraction of its users who are skilled in its application has
decreased.” However, the capabilities of UNIX are limited only by your imagination.
2. Features of UNIX OS
Several features of UNIX have made it popular. Some of them are:
Portable
UNIX can be installed on many hardware platforms. Its widespread use can be traced to the decision to develop it using the C language.
Multiuser
The UNIX design allows multiple users to concurrently share hardware and software.
Multitasking
UNIX allows a user to run more than one program at a time. In fact, more than one program can be running in the background while a user is working in the foreground.
Networking
While UNIX was developed to be an interactive, multiuser, multitasking system, networking is also incorporated into the heart of the operating system. Access to another system uses a standard communications protocol known as Transmission Control Protocol/Internet Protocol (TCP/IP).
Device Independence
UNIX treats input/output devices like ordinary files. The source or destination for file input and output is easily controlled through a UNIX design feature called redirection.
Utilities
UNIX provides a rich library of utilities that can be used to increase user productivity.
3. A Brief History of UNIX
Ken Thompson then teamed up with Dennis Ritchie, the author of the first C compiler, in 1973. They rewrote the UNIX kernel in C - this was a big step forward in terms of the system's portability - and released the Fifth Edition of UNIX to universities in 1974. The Seventh Edition, released in 1978, marked a split in UNIX development into two main branches: SYSV (System V) and BSD (Berkeley Software Distribution). BSD arose from the University of California at Berkeley, where Ken Thompson spent a sabbatical year. Its development was continued by students at Berkeley and other research institutions. SYSV was developed by AT&T and other commercial companies. UNIX flavors based on SYSV have traditionally been more conservative, but better supported, than BSD-based flavors.
Until recently, UNIX standards were nearly as numerous as its variants. In the early days, AT&T published a document called the System V Interface Definition (SVID). X/OPEN (now The Open Group), a consortium of vendors and users, had one too, in the X/Open Portability Guide (XPG). In the US, yet another set of standards, named the Portable Operating System Interface for Computer Environments (POSIX), was developed at the behest of the Institute of Electrical and Electronics Engineers (IEEE).
In 1998, X/OPEN and IEEE undertook an ambitious program of unifying the two standards. In 2001, this joint initiative resulted in a single specification called the Single UNIX Specification, Version 3 (SUSV3), which is also known as IEEE 1003.1:2001 (POSIX.1). In 2002, the International Organization for Standardization (ISO) approved SUSV3 and IEEE 1003.1:2001.
Some of the commercial UNIX flavors based on System V are:
IBM's AIX
Hewlett-Packard's HP-UX
SCO's Open Server Release 5
Silicon Graphics' IRIX
Conclusion
In this chapter we defined an operating system. We also looked at the history of UNIX and the features of UNIX that make it a popular operating system. We also discussed the convergence of different flavors of UNIX into the Single UNIX Specification (SUS) and the Portable Operating System Interface for Computer Environments (POSIX).
Introduction
This chapter introduces the UNIX architecture and the rich collection of the UNIX command set, with a specific discussion of command structure and usage of UNIX commands. We also look at the man command, used for obtaining online help on any UNIX command. Sometimes the keyboard sequences don't work, in which case you need to know what to do to fix them. The final topic of this chapter is troubleshooting some terminal problems.
Objectives
The UNIX Architecture
Locating Commands
Internal and External Commands
Command Structure and Usage
Flexibility of Command Usage
The man Pages, apropos and whatis
Troubleshooting terminal problems
1. The UNIX Architecture
(Figure: the UNIX architecture: users interact with the shell, the shell passes requests to the kernel through system calls, and the kernel communicates with the hardware.)
The UNIX architecture comprises two major components, viz. the shell and the kernel. The kernel interacts with the machine's hardware, and the shell with the user.
The kernel is the core of the operating system. It is a collection of routines written in C. It is loaded into memory when the system is booted and communicates directly with the hardware. User programs that need to access the hardware use the services of the kernel via system calls, and the kernel performs the job on behalf of the user. The kernel is also responsible for managing the system's memory, scheduling processes and deciding their priorities.
The shell performs the role of command interpreter. Even though there's only one kernel running on the system, there could be several shells in action, one for each user who's logged in. The shell is responsible for interpreting the meaning of metacharacters, if any, found on the command line before dispatching the command to the kernel for execution.
2. Locating Commands
All UNIX commands are single words like ls, cd, cat, etc. These names are in lowercase. These commands are essentially files containing programs, mainly written in C. Files are stored in directories, and so are the binaries associated with these commands. You can find the location of an executable program using the type command:
$ type ls
ls is /bin/ls
This means that when you execute the ls command, the shell locates this file in the /bin directory and makes arrangements to execute it.
The Path
The sequence of directories that the shell searches to look for a command is specified in
its own PATH variable. These directories are colon separated. When you issue a
command, the shell searches this list in the sequence specified to locate and execute it.
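The PATH mechanism described above can be inspected and changed from the shell. A minimal sketch, assuming a Bourne-style shell (the directory names shown will differ from machine to machine):

```shell
# Show the colon-separated list of directories the shell searches, in order.
echo "$PATH"

# Append a directory (here $HOME/bin, purely as an illustration) to the
# search list for the current session.
PATH=$PATH:$HOME/bin
export PATH
echo "$PATH"
```

Commands placed in the appended directory become runnable by name for the rest of the session, since the shell now includes that directory in its search.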
4. Command Structure
UNIX commands take the following general form:
verb [options] [arguments]
where verb is the command name, which can take a set of optional options and one or more optional arguments.
Commands, options and arguments have to be separated by spaces or tabs to enable the shell to interpret them as words. A contiguous string of spaces and tabs together is called whitespace. The shell compresses multiple occurrences of whitespace into a single whitespace.
Options
An option is preceded by a minus sign (-) to distinguish it from filenames.
Example: $ ls -l
There must not be any whitespace between - and l. Options are also arguments, but are given a special name because they are predetermined. Options can normally be combined with a single - sign, i.e., instead of using
$ ls -l -a -t
we can as well use
$ ls -lat
Because UNIX was developed by people who had their own ideas as to what options should look like, there are variations in the options. Some commands use + as an option prefix instead of -.
Filename Arguments
Many UNIX commands use a filename as an argument so that the command can take input from the file. If a command uses a filename as an argument, it will usually be the last argument, after all options.
Exceptions
Some commands in UNIX, like pwd, do not take any options or arguments. Some commands, like who, may or may not be specified with arguments. The ls command can run without arguments (ls), with only options (ls -l), with only filenames (ls f1 f2), or using a combination of both (ls -l f1 f2). Some commands compulsorily take options (cut). Some commands, like grep and sed, can take an expression as an argument, or a set of instructions as an argument.
Combining Commands
Instead of executing commands on separate lines, where each command is processed and executed before the next could be entered, UNIX allows you to specify more than one command in a single command line. Each command has to be separated from the other by a ; (semicolon).
wc sample.txt ; ls -l sample.txt
You can even group several commands together so that their combined output is redirected to a file.
(wc sample.txt ; ls -l sample.txt) > newfile
When a command line contains a semicolon, the shell understands that the command on each side of it needs to be processed separately. Here ; is known as a metacharacter.
Note: When a command overflows into the next line or needs to be split into multiple lines, just press enter, so that the secondary prompt (normally >) is displayed and you can enter the remaining part of the command on the next line.
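The grouping behaviour above can be sketched end to end. This is an illustrative run in a scratch directory (the filenames are examples, not from the text):

```shell
cd "$(mktemp -d)"                  # work in a throwaway scratch directory
printf 'hello world\n' > sample.txt

# Grouped commands: the subshell's combined output (the wc line followed
# by the ls line) is redirected to newfile as a whole.
(wc -l sample.txt ; ls sample.txt) > newfile

cat newfile
```

Without the parentheses, only the output of the last command before the redirection would go to the file; the grouping is what captures both.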
The man command displays the manual page of the specified command.
A pager is a program that displays one screenful of information and pauses for the user to view the contents. The user can make use of the internal commands of the pager to scroll up and scroll down the information. The two popular pagers are more and less. more is Berkeley's pager, a superior alternative to the original pg command. less is the standard pager used on Linux systems. less is modeled after a popular editor called vi and is more powerful than more, as it provides vi-like navigational and search facilities. We can use pagers with commands like ls | more. The man command is configured to work with a pager.
When you use the man command, it starts searching the manuals from section 1. If it locates a keyword in one section, it won't continue the search, even if the keyword occurs in another section. However, we can additionally provide the section number as an argument to man.
For example, passwd appears in section 1 and section 4. If we want the documentation of passwd in section 4, we use:
$ man 4 passwd OR $ man -s4 passwd (on Solaris)
User Commands                                          wc(1)
NAME
    wc - displays a count of lines, words and characters
    in a file
SYNOPSIS
    wc [-c | -m | -C] [-lw] [file ...]
DESCRIPTION
    The wc utility reads one or more input files and, by
    default, writes the number of newline characters,
    words and bytes contained in each input file to the
    standard output. The utility also writes a total count
    for all named files, if more than one input file is
    specified.
OPTIONS
    The following options are supported:
    -c  Count bytes.
    -m  Count characters.
    -C  Same as -m.
    -l  Count lines.
    -w  Count words delimited by white spaces or newline
        characters ...
OPERANDS
    The following operand is supported:
A man page is divided into a number of compulsory and optional sections. Every command doesn't need all sections, but the first three (NAME, SYNOPSIS and DESCRIPTION) are generally seen in all man pages. NAME presents a one-line introduction of the command. SYNOPSIS shows the syntax used by the command and DESCRIPTION provides a detailed description.
In the SYNOPSIS, an argument enclosed in rectangular brackets is optional; otherwise, the argument is required. The ellipsis (a set of three dots) implies that there can be more instances of the preceding word. The | means that only one of the options shown on either side of the pipe can be used.
All the options used by the command are listed in the OPTIONS section. There is a separate section named EXIT STATUS which lists possible error conditions and their numeric representation.
Note: You can use the man command to view its own documentation ($ man man). You can also set the pager to use with man ($ PAGER=less ; export PAGER). To find out which pager is being used by man, use $ echo $PAGER.
The following table shows the organization of man documentation.
Section  Subject (SVR4)                Subject (Linux)
1        User programs                 User programs
2        Kernel's system calls         Kernel's system calls
3        Library functions             Library functions
4        Administrative file formats   Special files (in /dev)
5        Miscellaneous                 Administrative file formats
6        Games                         Games
7        Special files (in /dev)       Macro packages and conventions
8        Administration commands       Administration commands
man -k: Searches a summary database and prints a one-line description of each matching command.
Example:
$ man -k awk
awk   awk(1)   - pattern scanning and processing language
nawk  nawk(1)  - pattern scanning and processing language
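The objectives also list apropos and whatis, which cover the same ground as man -k. A brief sketch (the output lines are illustrative and vary by system):

```shell
# apropos is equivalent to man -k: it performs a keyword search of the
# one-line summary database.
apropos awk

# whatis (equivalent to man -f) matches complete command names only and
# prints just their one-line descriptions.
whatis awk
```

On many systems the summary database must first be built (with catman or mandb) before apropos and whatis return results.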
m
9. When Things Go Wrong
Terminals and keyboards have no uniform behavioral pattern. Terminal settings directly
impact the keyboard operation. If you observe a different behavior from that expected,
when you press certain keystrokes, it means that the terminal settings are different. In
such cases, you should know which keys to press to get the required behavior. The
o
following table lists keyboard commands to try when things go wrong.
s.c
Keystroke Function
or
command
[Ctrl-h] Erases text
[Ctrl-c] or Interrupts a command
Delete
[Ctrl-d]
c
Terminates login session or a program that expects its input from
keyboard
vtu
[Ctrl-s] Stops scrolling of screen output and locks keyboard
[Ctrl-q] Resumes scrolling of screen output and unlocks keyboard
[Ctrl-u] Kills command line without executing it
[Ctrl-\] Kills running program but creates a core file containing the memory
image of the program
[Ctrl-z] Suspends process and returns shell prompt; use fg to resume job
[Ctrl-j] Alternative to [Enter]
w.
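When a key does not behave as listed, the stty command shows and changes the terminal settings that control these mappings. A minimal sketch (the specific mappings shown are examples, not universal defaults):

```shell
# Display all current terminal settings, including which keys are mapped
# to erase, interrupt, suspend, etc.
stty -a

# Map the erase function to [Ctrl-h] for this session.
stty erase '^H'

# Restore sane settings when the terminal is hopelessly confused.
stty sane
```

stty operates on the terminal attached to standard input, so it only works from an interactive session.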
Conclusion
In this chapter, we looked at the architecture of UNIX and the division of labor between two agencies, viz. the shell and the kernel. We also looked at the structure and usage of UNIX commands. The man documentation will be the most valuable source of documentation for UNIX commands. Also, keyboard sequences sometimes won't work as expected because of different terminal settings; we listed the remedial keyboard sequences to use when that happens.
Introduction
In this chapter we look at the types of files and filenaming conventions in UNIX. A discussion of the directory-handling commands, viz. cd, pwd, mkdir, rmdir and ls, will be made. Finally we look at some of the important directories contained under the UNIX file system.
Objectives
Types of files
UNIX Filenames
Directories and Files
Absolute and Relative Pathnames
pwd - print working directory
cd - change directory
mkdir - make a directory
rmdir - remove directory
The PATH environmental variable
ls - list directory contents
The UNIX File System
1. Types of files
A simple description of the UNIX system is this:
"On a UNIX system, everything is a file; if something is not a file, it is a process."
A UNIX system makes no difference between a file and a directory, since a directory is just a file containing names of other files. Programs, services, texts, images, and so forth, are all files. Input and output devices, and generally all devices, are considered to be files.
Most files are just files, called regular files; they contain normal data, for example text files, executable files or programs, input for or output from a program and so on. It is reasonably safe to suppose that everything you encounter on a UNIX system is a file.
Ordinary (Regular) File
A text file contains only printable characters, and each line is terminated with the newline character.
A binary file, on the other hand, contains both printable and nonprintable characters that cover the entire ASCII range. The object code and executables that you produce by compiling C programs are binary files. Sound and video files are also binary files.
Directory File
A directory contains no data, but keeps details of the files and subdirectories that it contains. A directory file contains one entry for every file and subdirectory that it houses. Each entry has two components, namely the filename and a unique identification number of the file or directory (called the inode number).
When you create or remove a file, the kernel automatically updates its corresponding directory by adding or removing the entry (filename and inode number) associated with the file.
Device File
All operations on devices are performed by reading or writing the file representing the device. It is advantageous to treat devices as files, as some of the commands used to access an ordinary file can be used with device files as well.
Device filenames are found in a single directory structure, /dev. A device file is not really a stream of characters. It is the attributes of the file that entirely govern the operation of the device. The kernel identifies a device from its attributes and uses them to operate the device.
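A quick illustrative look at a device file (using /dev/null, which exists on every UNIX system):

```shell
# Long-list the null device: the first character of the mode field is 'c',
# marking it as a character-type device file rather than an ordinary file.
ls -l /dev/null
```

Block devices such as disks show a leading 'b' instead, matching the file-type characters discussed later with the ls -l mode field.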
2. Filenames in UNIX
On a UNIX system, a filename can consist of up to 255 characters. Files may or may not have extensions and can consist of practically any ASCII character except the / and the null character. You are permitted to use control characters or other nonprintable characters in a filename; however, you should avoid using these characters while naming a file. It is recommended that only the following characters be used in filenames:
Alphabets and numerals.
The period (.), hyphen (-) and underscore (_).
UNIX imposes no restrictions on the extension. In all cases, it is the application that imposes the restriction. E.g., a C compiler expects C program filenames to end with .c, and Oracle requires SQL scripts to have the .sql extension.
A file can have any number of dots embedded in its name. A filename can also begin or end with a dot.
UNIX is case-sensitive; chap01, Chap01 and CHAP01 are three different filenames that can coexist in the same directory.
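Case-sensitive coexistence can be demonstrated directly (the filenames are the ones from the text; the scratch directory is an assumption of this sketch):

```shell
cd "$(mktemp -d)"           # work in a throwaway scratch directory
touch chap01 Chap01 CHAP01  # three names differing only in case
ls                          # all three coexist as distinct files
```

Note that this only holds on a case-sensitive filesystem, which is the norm on UNIX.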
Avoid using spaces in a filename.
UNIX organizes files in a tree-like hierarchical structure, with the root directory, indicated by a forward slash (/), at the top of the tree. See the figure below, in which part of the hierarchy of files and directories on the computer is shown.
(Figure: part of the UNIX directory hierarchy, with the root directory / at the top.)
4. Absolute and relative paths
A path, which is the way you need to follow in the tree structure to reach a given file, can
be described as starting from the trunk of the tree (the / or root directory). In that case, the
path starts with a slash and is called an absolute path, since there can be no mistake: only
one file on the system can comply.
Paths that don't start with a slash are always relative to the current directory. In relative
paths we also use the . and .. indications for the current and the parent directory.
E.g.,
$ echo $HOME
/home/kumar
What you see above is an absolute pathname, which is a sequence of directory names
starting from root (/). The subsequent slashes are used to separate the directories.
/home/frank/src
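The distinction between the two path forms can be walked through in the shell. A sketch using a scratch tree (the directory names mirror the text's example but are created here purely for illustration):

```shell
base=$(mktemp -d)              # scratch area standing in for the real tree
mkdir -p "$base/home/kumar"

cd "$base/home"                # absolute path: begins with / (starts from root)
cd kumar                       # relative path: resolved against the current directory
pwd                            # now ends in /home/kumar
cd ..                          # relative: .. moves up to the parent directory
pwd                            # back to the directory ending in /home
```

The same directory can thus be reached either by its full absolute pathname or by a shorter path relative to wherever you currently are.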
6. cd - change directory
You can change to a new directory with the cd (change directory) command. cd accepts both absolute and relative pathnames.
Syntax
cd [directory]
Examples
cd                              changes to the user's home directory
cd /                            changes directory to the system's root
cd ..                           goes up one directory level
cd ../..                        goes up two directory levels
cd /full/path/name/from/root    changes directory to the absolute path named (note the leading slash)
cd path/from/current/location   changes directory to the path relative to the current location (no leading slash)
7. mkdir - make a directory
Examples
mkdir patch                    Creates a directory patch under the current directory
mkdir patch dbs doc            Creates three directories under the current directory
mkdir pis pis/progs pis/data   Creates a directory tree with pis as a directory under the current directory and progs and data as subdirectories under pis
Note the order of specifying arguments in example 3. The parent directory should be specified before its subdirectories.
The system may refuse to create a directory for the following reasons:
1. The directory already exists.
2. There may be an ordinary file by the same name in the current directory.
3. The permissions set for the current directory don't permit the creation of files and directories by the user.
9. The PATH environmental variable
Environmental variables are used to provide information to the programs you use. We have already seen one such variable called HOME.
A command runs in UNIX by executing a disk file. When you specify a command like date, the system will locate the associated file from the list of directories specified in the PATH variable and then execute it. The PATH variable normally includes the current directory also.
Whenever you enter any UNIX command, you are actually specifying the name of an executable file located somewhere on the system. The system goes through the following steps to determine which program to execute:
1. Built-in commands (such as cd and history) are executed within the shell.
2. If you specify an absolute pathname (such as /bin/ls) or a relative pathname (such as ./myprog), the system executes the program from the specified directory.
3. Otherwise, the PATH variable is used.
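The lookup steps above can be observed with the type command introduced earlier; a quick sketch (the reported path for ls varies by system):

```shell
# Step 1: cd is a shell builtin, executed within the shell itself.
type cd

# Step 3: ls is an external command, located by searching the PATH
# directories in order; type reports the full pathname it found.
type ls
```

This is a handy way to find out whether a name you type runs inside the shell or as a separate program on disk.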
10. ls - list directory contents
The command to list your directories and files is ls. With options it can provide information about the size, type of file, permissions, and dates of file creation, change and access.
Syntax
ls [options] [argument]
Common Options
When no argument is used, the listing will be of the current directory. There are many very useful options for the ls command. A listing of some of them follows. When using the command, string the desired options together preceded by "-".
-a Lists all files, including those beginning with a dot (.).
-d Lists only names of directories, not the files in the directory
-F Indicates the type of entry with a trailing symbol: executables with *, directories with / and symbolic links with @
The mode field is given by the -l option and consists of 10 characters. The first character is one of the following:
CHARACTER   IF ENTRY IS A
d           directory
-           plain file
b           block-type special file
c           character-type special file
l           symbolic link
s           socket
The next 9 characters are in 3 sets of 3 characters each. They indicate the file access permissions: the first 3 characters refer to the permissions for the user, the next three for the users in the Unix group assigned to the file, and the last 3 to the permissions for other users on the system.
Designations are as follows:
r read permission
w write permission
x execute permission
- no permission
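Putting the two tables together, a mode string splits into one type character and three permission triplets. A sketch that slices an illustrative mode string with cut (the mode itself is an example, not from a real listing):

```shell
mode='-rwxr-x---'                                          # illustrative mode string
printf 'type:   %s\n' "$(printf %s "$mode" | cut -c1)"     # '-' : plain file
printf 'user:   %s\n' "$(printf %s "$mode" | cut -c2-4)"   # rwx : owner may do anything
printf 'group:  %s\n' "$(printf %s "$mode" | cut -c5-7)"   # r-x : group may read and execute
printf 'others: %s\n' "$(printf %s "$mode" | cut -c8-10)"  # --- : others have no access
```

Reading any ls -l line is just this decomposition applied to its first column.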
Examples
1. To list the files in a directory:
$ ls
2. To list all files in a directory, including the hidden (dot) files:
$ ls -a
3. To get a long listing:
$ ls -al
total 24
drwxr-sr-x 5 workshop acs 512 Jun  7 11:12 .
drwxr-xr-x 6 root     sys 512 May 29 09:59 ..
-rwxr-xr-x 1 workshop acs 532 May 20 15:31 .cshrc
-rw------- 1 workshop acs 525 May 20 21:29 .emacs
-rw------- 1 workshop acs 622 May 24 12:13 .history
-rwxr-xr-x 1 workshop acs 238 May 14 09:44 .login
-rw-r--r-- 1 workshop acs 273 May 22 23:53 .plan
-rwxr-xr-x 1 workshop acs 413 May 14 09:36 .profile
-rw------- 1 workshop acs  49 May 20 20:23 .rhosts
drwx------ 3 workshop acs 512 May 24 11:18 demofiles
Directory   Content
/bin        Common programs, shared by the system, the system administrator and the users.
/dev        Contains references to all the CPU peripheral hardware, which are represented as files with special properties.
/etc        The most important system configuration files are in /etc; this directory contains data similar to those in the Control Panel in Windows.
/home       Home directories of the common users.
/lib        Library files, including files for all kinds of programs needed by the system and the users.
/sbin       Programs for use by the system and the system administrator.
/tmp        Temporary space for use by the system, cleaned upon reboot, so don't use this for saving any work!
/usr        Programs, libraries, documentation etc. for all user-related programs.
/var        Storage for all variable files and temporary files created by users, such as log files, the mail queue, the print spooler area, space for temporary storage of files downloaded from the Internet, or to keep an image of a CD before burning it.
Conclusion
In this chapter we looked at the UNIX file system and the different types of files UNIX understands. We also discussed different commands that are specific to directory files, viz. pwd, mkdir, cd, rmdir and ls. These commands have no relevance to ordinary or device files. We also saw filenaming conventions in UNIX. The difference between absolute and relative pathnames was highlighted next. Finally, we described some of the important subdirectories contained under root (/).
UNIT 2
Basic File Attributes
Text Book
2. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw Hill, 2006.
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson, 2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Objectives
• Listing of a specific directory
• Ownership and group ownership
• Different file permissions
Listing File Attributes
The ls command is used to obtain a list of all filenames in the current directory. The output, in UNIX lingo, is often referred to as the listing. Sometimes we combine this option with other options for displaying other attributes, or for ordering the list in a different sequence. ls looks up the file's inode to fetch its attributes. It lists seven attributes of all files in the current directory, and they are:
The file type and its permissions are associated with each file. Links indicate the number of filenames maintained by the system. This does not mean that there are that many copies of the file. The file is created by the owner. Every user is attached to a group owner. The file size in bytes is displayed. The last modification time is the next field. If you change only the permissions or ownership of the file, the modification time remains unchanged. In the last field, it displays the filename.
For example,
$ ls -l
total 72
-rw-r--r-- 1 kumar metal 19514 may 10 13:45 chap01
-rw-r--r-- 1 kumar metal  4174 may 10 15:01 chap02
-rw-rw-rw- 1 kumar metal    84 feb 12 12:30 dept.lst
-rw-r--r-- 1 kumar metal  9156 mar 12  1999 genie.sh
drwxr-xr-x 2 kumar metal   512 may  9 10:31 helpdir
drwxr-xr-x 2 kumar metal   512 may  9 09:57 progs
Directories are easily identified in the listing by the first character of the first column, which here shows a d. The significance of the attributes of a directory differs a good deal from an ordinary file. To see the attributes of a directory rather than the files contained in it, use ls -ld with the directory name. Note that simply using ls -d will not list all subdirectories in the current directory. Strange though it may seem, ls has no option to list only directories.
File Ownership
When you create a file, you become its owner. Every owner is attached to a group owner. Several users may belong to a single group, but the privileges of the group are set by the owner of the file and not by the group members. When the system administrator creates a user account, he has to assign these parameters to the user:
The user-id (UID) - both its name and numeric representation
The group-id (GID) - both its name and numeric representation
File Permissions
UNIX follows a three-tiered file protection system that determines a file's access rights. It is displayed in the following format:
filetype, then three permission sets: owner (rwx), group owner (rwx) and others (rwx)
For example:
-rwxr-xr--
The first group (rwx) has all three permissions: the file is readable, writable and executable by the owner of the file. The second group (r-x) has a hyphen in the middle slot, which indicates the absence of write permission for the group owner of the file. The third group (r--) has the write and execute bits absent; this set of permissions is applicable to others.
You can set different permissions for the three categories of users - owner, group and others. It's important that you understand them, because a little learning here can be a dangerous thing. A faulty file permission is a sure recipe for disaster.
chmod: changing file permissions
When a file is created, its default permissions are determined by umask. Let us assume that the permissions of the created file are -rw-r--r--. Using the chmod command, we can change the file permissions and allow the owner to execute his file. The command can be used in two ways:
In a relative manner by specifying the changes to the current permissions
In an absolute manner by specifying the final permissions
Relative Permissions
chmod changes only the permissions specified in the command line and leaves the other permissions unchanged. Its syntax is:
chmod category operation permission filename(s)
The category is denoted by u (user), g (group), o (others) or a - all (ugo); the operation by + (assign) or - (remove); and the permission by r, w or x.
chmod u+x xstart
The command assigns (+) execute (x) permission to the user (u); other permissions remain unchanged. If initially the listing is
-rw---x--x 1 kumar metal 1906 sep 23:38 xstart
then it becomes
-rwx--x--x 1 kumar metal 1906 sep 23:38 xstart
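Relative chmod can be tried out safely in a scratch directory. A sketch assuming umask 022 (the filename follows the text's example; the scratch setup is an assumption of this sketch):

```shell
umask 022                # fix the default mask so the result is predictable
cd "$(mktemp -d)"        # throwaway scratch directory
touch xstart             # created as -rw-r--r-- under umask 022

chmod u+x xstart         # add execute for the user only:   -rwxr--r--
chmod go-rwx xstart      # strip group and others entirely: -rwx------
ls -l xstart
```

Each chmod touches only the categories named in its expression, leaving the rest of the mode untouched.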
Absolute Permissions
Here, we need not know the current file permissions. We can set all nine permissions explicitly. A string of three octal digits is used as an expression. The permissions of each category are represented by one octal digit, obtained by adding the values of its individual permissions:
0 --- no permissions
1 --x execute only
2 -w- write only
3 -wx write and execute
4 r-- read only
5 r-x read and execute
6 rw- read and write
7 rwx read, write and execute
We have three categories and three permissions for each category, so three octal digits can describe a file's permissions completely. The most significant digit represents the user and the least significant one represents others. chmod can use this three-digit string as the expression.
chmod 761 xstart
will assign all permissions to the owner, read and write permissions to the group and only execute permission to the others.
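The octal form can be verified the same way; a sketch in a scratch directory (the filename follows the text's example):

```shell
cd "$(mktemp -d)"
touch xstart

# 7 = rwx for the owner, 6 = rw- for the group, 1 = --x for others.
chmod 761 xstart
ls -l xstart             # the mode field reads -rwxrw---x
```

Unlike the relative form, the octal expression overwrites all nine permission bits at once, regardless of what they were before.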
777 signify all permissions for all categories, but still we can prevent a file from
s.c
being deleted. 000 signifies absence of all permissions for all categories, but still we can
delete a file. It is the directory permissions that determine whether a file can be deleted or
not. Only owner can change the file permissions. User can not change other user’s file’s
permissions. But the system administrator can do anything.
A file with permissions ---------- is simply useless, but the user can still delete it.
Conversely, -rwxrwxrwx grants everything to everyone; the UNIX system, by default, never
allows this situation, as you could never have a secure system. Hence, directory
permissions also play a very vital role here.
chmod -R a+x shell_scripts
This makes all the files and subdirectories found in the shell_scripts directory executable
by all users. When you know the shell metacharacters well, you will appreciate that the *
doesn't match filenames beginning with a dot. The dot is generally safer, but note that
both commands change the permissions of directories also.
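A minimal sketch of the recursive form (the shell_scripts directory and its contents are illustrative):

```shell
# chmod -R descends into the whole directory tree
mkdir -p shell_scripts/utils
touch shell_scripts/a.sh shell_scripts/utils/b.sh
chmod -R a+x shell_scripts     # every file and subdirectory gains execute
ls -l shell_scripts/a.sh       # now shows x for user, group and others
```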
Directory Permissions
It is possible that a file cannot be accessed even though it has read permission,
and can be removed even when it is write protected. The default permissions of a
directory are,
rwxr-xr-x (755)
A directory must never be writable by group and others
Example:
mkdir c_progs
ls –ld c_progs
drwxr-xr-x 2 kumar metal 512 may 9 09:57 c_progs
If a directory has write permission for group and others also, be assured that every
user can remove every file in the directory. As a rule, you must not make directories
universally writable unless you have definite reasons to do so.
On BSD and AT&T systems, there are two commands meant to change the
ownership of a file or directory: chown and chgrp. Let kumar be the owner and metal be the
group owner. If
sharma copies a file of kumar, then sharma will become its owner and he can manipulate
the attributes
chown
This command changes the owner of a file; for example, chown sharma note transfers
ownership of note to sharma.
ls -l note
Once ownership of the file has been given away to sharma, the user file
permissions that previously applied to kumar now apply to sharma. Thus, kumar can no
longer edit note, since there is no write privilege for group and others. He cannot get back
the ownership either. But he can copy the file to his own directory, in which case he
becomes the owner of the copy.
chgrp
This command changes the file’s group owner. No superuser permission is required.
chgrp dba dept.lst
ls -l dept.lst
-rw-r--r-- 1 kumar dba 139 jun 8 16:43 dept.lst
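A runnable sketch of chgrp: since you can normally assign only a group you belong to, this uses id -gn to pick the caller's primary group (the group dba in the text is illustrative):

```shell
touch dept.lst
chgrp "$(id -gn)" dept.lst   # id -gn prints your primary group name
ls -l dept.lst               # the group column shows the assigned group
```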
In this chapter we considered two important file attributes – permissions and
ownership. After we complete the first round of discussions related to files, we will take
up the other file attributes.
Source: Sumitabha Das, “UNIX – Concepts and Applications”, 4th edition, Tata
McGraw Hill, 2006
The vi Editor
To write and edit programs and scripts, we require an editor. UNIX provides the vi
editor, created by Bill Joy for BSD UNIX. Bram Moolenaar later improved it and
called it vim (vi improved), available on Linux.
vi Basics
To add some text to a file, we invoke,
vi <filename>
In all probability, the file doesn't exist, and vi presents you a full screen with the
filename shown at the bottom with the qualifier [New File]. The cursor is positioned at the
top, and all remaining lines of the screen show a ~; they are non-existent lines. The last line
is reserved for commands that you can enter to act on text. This line is also used by the
system to display messages. This is the command mode: the mode where you can
pass commands to act on text, using most of the keys of the keyboard. This is the default
mode of the editor, where every key pressed is interpreted as a command to run on text.
You will have to be in this mode to copy and delete text.
c
For text editing, vi uses 24 of the 25 lines that are normally available on the
terminal. To enter text, you must switch to the input mode. First press the key i, and you
are in this mode, ready to input text. Subsequent key depressions will then show up on the
screen as text input.
After text entry is complete, the cursor is positioned on the last character of the
last line. This is known as the current line, and the character where the cursor is stationed is
the current cursor position. If a word has been misspelled, use ctrl-w to erase the entire
word. The third mode, the ex (last line) mode, is used to handle files and perform
substitution; after such a command is run, you are back in the default command mode.
Now press the esc key to revert to command mode. Press it again and you will hear a
beep; a beep in vi indicates that a key has been pressed unnecessarily. Actually, the text
entered has not yet been saved on disk but exists in some temporary storage called a buffer.
To save the entered text, you must switch to the execute mode (the last line mode).
Invoke the execute mode from the command mode by entering a : (colon), which shows up
in the last line.
ctrl-l redraws the screen if the display gets garbled.
Input Mode – Entering and Replacing Text
:set showmode
Messages like INSERT MODE, REPLACE MODE, CHANGE MODE, etc will appear in
the last line.
Pressing i changes the mode from command to input mode, inserting text to the left of the
cursor. To append text to the right of the cursor position, we use a. I and A behave like i
and a, but at the line extremes: I inserts text at the beginning of the line, A appends text at
the end of the line. o opens a new line below the current line (and O opens one above it).
• r<letter> replacing a single character
• s<text/word> replacing text with s
• R<text/word> replacing text with R
• Press esc key to switch to command mode after you have keyed in text
COMMAND FUNCTION
i inserts text
a appends text
When you edit a file using vi, the original file is not disturbed as such; only a
copy of it is placed in a buffer. From time to time, you should save your work by
writing the buffer contents to disk to keep the disk file current. When we talk of saving a
file, we actually mean saving this buffer. You may also need to quit vi after, or without,
saving the buffer. Some of the save and exit commands of the ex mode are:
Command Action
:w saves file and remains in editing mode
:x saves and quits editing mode
:wq saves and quits editing mode
:w <filename> save as
:w! <filename> save as, but overwrites existing file
:q quits editing mode
:q! quits editing mode by rejecting changes made
:sh escapes to UNIX shell
:recover recovers file from a crash
Navigation
A command mode command doesn’t show up on screen but simply performs a function.
To move the cursor in four directions,
k moves cursor up
j moves cursor down
h moves cursor left
l moves cursor right
Word Navigation
Moving by one character is not always enough. You will often need to move faster
along a line. vi understands a word as a navigation unit which can be defined in two ways,
depending on the key pressed. If your cursor is a number of words away from your
desired position, you can use the word-navigation commands to go there directly. There
are three basic commands:
w moves the cursor forward by one word
b moves the cursor back by one word
e moves the cursor forward to the end of a word
Example: 4w moves the cursor four words forward.
To move to the extremes of a line, use 0 or | (beginning of line) and $ (end of line).
Scrolling
Faster movement can be achieved by scrolling text in the window using the
control keys:
ctrl-f scrolls a full page forward
ctrl-b scrolls a full page backward
ctrl-d scrolls half a page forward
ctrl-u scrolls half a page backward
Absolute Movement
Pressing ctrl-g makes the editor display the current line number, along with the total
number of lines, in the last line.
Editing Text
The editing facilities in vi are very elaborate and involve the use of operators, among them:
d delete
y yank (copy)
Deleting Text
Moving Text
p and P place text on right and left only when you delete parts of lines. But the same keys
get associated with “below” and “above” when you delete complete lines
Copying Text
yy copies current line
10yy copies current line & 9 lines below
Joining Lines
J to join the current line and the line following it
4J joins following 3 lines with current line
Undoing Last Editing Instructions
vim (LINUX) lets you undo and redo multiple editing instructions. u behaves
differently here; repeated use of this key progressively undoes your previous actions. You
could even have the original file in front of you. Further 10u reverses your last 10 editing
actions. The function of U remains the same.
You may overshoot the desired mark when you keep u pressed, in which case use
ctrl-r to redo your undone actions. Further, undoing with 10u can be completely reversed
with 10ctrl-r. The undoing limit is set by the execute mode command: set undolevels=n,
where n is set to 1000 by default.
The . (dot) command is used for repeating the last instruction in both editing and
command mode commands
For example:
2dd deletes 2 lines from the current line; to repeat this operation, type . (dot)
/ search forward
? search backward
/printf
The search begins forward to position the cursor on the first instance of the word
?pattern
Searches backward for the most previous instance of the pattern
The direction in which n repeats the search depends on the search command used. If you
used ?printf to search in the reverse direction in the first place, then n also follows the
same direction. In that case, N will repeat the search in the forward direction, and not n.
Command Function
/pat searches forward for pattern pat
?pat searches backward for pattern pat
n repeats search in same direction along which previous search was made
N repeats search in direction opposite to that along which previous search was
made
Substitution – search and replace
We can perform search and replace in execute mode using :s. Its syntax is,
:address/source_pattern/target_pattern/flags
Interactive substitution: sometimes you may like to selectively replace a string. In that
case, add the c parameter as the flag at the end:
:1,$s/director/member/gc
Each line is selected in turn, followed by a sequence of carets in the next line, just below
the pattern that requires substitution. The cursor is positioned at the end of this caret
sequence, waiting for your response.
The ex mode is also used for substitution. Both search and replace operations also
use regular expressions for matching multiple patterns.
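Outside vi, the same non-interactive global substitution can be sketched with the sed filter (the file board.txt and its contents are hypothetical):

```shell
printf 'director one\ndirector two\n' > board.txt
sed 's/director/member/g' board.txt    # prints the lines with the replacement made
```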
The features of vi editor that have been highlighted so far are good enough for a
beginner who should not proceed any further before mastering most of them. There are
many more functions that make vi a very powerful editor. Can you copy three words or
even the entire file using simple keystrokes? Can you copy or move multiple sections of
text from one file to another in a single file switch? How do you compile your C and Java
programs without leaving the editor? vi can do all this.
Source: Sumitabha Das, “UNIX – Concepts and Applications”, 4th edition, Tata
McGraw Hill, 2006
UNIT 3
Text Book
3. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
The Shell
Introduction
In this chapter we will look at one of the major components of the UNIX architecture – the
Shell. The shell acts as both a command interpreter and a programming facility. We
will look at the interpretive nature of the shell in this chapter.
Objectives
The Shell and its interpretive cycle
Pattern Matching – The wild-cards
Escaping and Quoting
Redirection – The three standard files
Filters – Using both standard input and standard output
/dev/null and /dev/tty – The two special files
Pipes
tee – Creating a tee
Command Substitution
Shell Variables
1. The shell and its interpretive cycle
The shell sits between you and the operating system, acting as a command interpreter. It
reads your terminal input and translates the commands into actions taken by the system.
The shell is analogous to command.com in DOS. When you log into the system you are
given a default shell. When the shell starts up it reads its startup files and may set
environment variables, command search paths, and command aliases, and executes any
commands specified in these files. The original shell was the Bourne shell, sh. Every
Unix platform will either have the Bourne shell, or a Bourne compatible shell available.
Numerous other shells are available. Some of the more well known of these may be on
your Unix system: the Korn shell, ksh, by David Korn; the C shell, csh, by Bill Joy; the
Bourne Again SHell, bash, from the Free Software Foundation's GNU project (ksh and
bash are both based on sh); and the T-C shell, tcsh, and the extended C shell, cshe, both
based on csh.
Even though the shell appears not to be doing anything meaningful when there is no
activity at the terminal, it swings into action the moment you key in something.
The following activities are typically performed by the shell in its interpretive cycle:
The shell issues the prompt and waits for you to enter a command.
After a command is entered, the shell scans the command line for metacharacters
and expands abbreviations (like the * in rm *) to recreate a simplified command
line.
It then passes on the command line to the kernel for execution.
The shell waits for the command to complete and normally can’t do any work
while the command is running.
After the command execution is complete, the prompt reappears and the shell
returns to its waiting role to start the next cycle. You are free to enter another
command.
2. Pattern Matching – The Wild-cards
A pattern framed with metacharacters is not interpreted literally; the shell
will expand it suitably before the command is executed.
The metacharacters that are used to construct the generalized pattern for matching
filenames belong to a category called wild-cards. The following table lists them:
Wild-Card Matches
* Any number of characters including none
? A single character
[ijk] A single character – either an i, j or k
[x-z] A single character that is within the ASCII range of characters x and z
[!ijk] A single character that is not an i,j or k (Not in C shell)
[!x-z] A single character that is not within the ASCII range of the characters x
and z (Not in C Shell)
{pat1,pat2…} Pat1, pat2, etc. (Not in Bourne shell)
Examples:
To list all files that begin with chap, use
$ ls chap*
To list all files whose filenames are six characters long and start with chap, use
$ ls chap??
Note: Both * and ? operate with some restrictions. For example, the * doesn't match all
files beginning with a . (dot) or the / of a pathname. If you wish to list all hidden
filenames in your directory having at least three characters after the dot, the dot must be
matched explicitly.
$ ls .???*
However, if the filename contains a dot anywhere but at the beginning, it need not be
matched explicitly.
Similarly, these characters don't match the / in a pathname.
- To match all filenames with a single-character extension but not the .c or .o files,
use *.[!co]
- To match all filenames that don’t begin with an alphabetic character,
use [!a-zA-Z]*
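The wild-cards above can be tried out in a scratch directory (the filenames are illustrative):

```shell
mkdir -p wcdemo && cd wcdemo
touch chap01 chap02 chapter a.c a.h .profile
ls chap*        # chap01 chap02 chapter
ls chap??       # chap01 chap02 (exactly two characters after chap)
ls *.[!co]      # a.h  (a.c is excluded; the extension must not be c or o)
ls .???*        # .profile (the leading dot matched explicitly)
```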
To copy all the C and Java source programs from another directory, we can delimit
the patterns with a comma and then put curly braces around them.
$ cp $HOME/prog_sources/*.{c,java} .
The Bourne shell requires two separate invocations of cp to do this job.
$ cp /home/srm/{project,html,scripts}/* .
The above command copies all files from three directories (project, html and scripts) to
the current directory.
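A sketch of the brace form (this is bash/ksh syntax, not plain Bourne sh, so bash is invoked explicitly; directory and file names are illustrative):

```shell
mkdir -p src dst
touch src/a.c src/b.java src/c.txt
bash -c 'cp src/*.{c,java} dst'   # copies only the .c and .java files
ls dst                            # a.c b.java
```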
3. Escaping and Quoting
Escaping is providing a \ (backslash) before the wild-card to remove (escape) its special
meaning.
For instance, if we have a file whose filename is chap* (remember, a file in UNIX can be
named with virtually any character except the / and the null character), it is dangerous to
remove it with the command rm chap*, as that will remove all files beginning with chap.
Hence, to suppress the special meaning of *, use the command rm chap\*
To list the contents of the file chap0[1-3], use
$ cat chap0\[1-3\]
A filename can contain a whitespace character also. Hence, to remove a file named
My Document.doc, which has a space embedded, a similar reasoning should be
followed:
$ rm My\ Document.doc
Quoting is enclosing the wild-card, or even the entire pattern, within quotes. Anything
within these quotes (barring a few exceptions) is left alone by the shell and not
interpreted.
When a command argument is enclosed in quotes, the meanings of all enclosed special
characters are turned off.
Examples:
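For instance, quoting makes the earlier rm example safe (a sketch; the file literally named chap* is created here for illustration):

```shell
touch 'chap*' chap1 chap2   # three files, one literally named chap*
rm 'chap*'                  # the quoted * is not expanded by the shell
ls chap*                    # chap1 and chap2 survive
```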
Standard error: The file (stream) representing error messages that emanate from the
command or shell, connected to the display.
The standard output can represent three possible destinations:
The terminal, the default destination.
A file using the redirection symbols > and >>.
As input to another program using a pipeline.
A file is opened by referring to its pathname, but subsequent read and write operations
identify the file by a unique number called a file descriptor. The kernel maintains a table
of file descriptors for every process running in the system. The first three slots are
generally allocated to the three standard streams as,
0 – Standard input
1 – Standard output
2 – Standard error
These descriptors are implicitly prefixed to the redirection symbols.
Examples:
Assuming file2 doesn’t exist, the following command redirects the standard output to file
myOutput and the standard error to file myError.
$ ls –l file1 file2 1>myOutput 2>myError
To redirect both standard output and standard error to a single file, use:
$ ls –l file1 file2 1>myOutput 2>&1
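The split of the two streams can be verified with a quick sketch (file1 is created; file2 deliberately does not exist):

```shell
touch file1
ls -l file1 file2 1>myOutput 2>myError   # ls writes to both streams
grep file1 myOutput                      # the listing landed in myOutput
grep file2 myError                       # the error message landed in myError
```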
1. Directory-oriented commands like mkdir, rmdir and cd, which don't read standard input
or write to standard output.
2. Commands like ls, pwd and who, which don't read standard input but write to
standard output.
3. Commands like lp that read standard input but don’t write to standard output.
4. Commands like cat, wc, cmp etc. that use both standard input and standard output.
Commands in the fourth category are called filters. Note that filters can also read directly
from files whose names are provided as arguments.
Example: To perform arithmetic calculations that are specified as expressions in input file
calc.txt and redirect the output to a file result.txt, use
$ bc < calc.txt > result.txt
/dev/tty: This file indicates one's terminal. In a shell script, you may wish to send the
output of some select statements explicitly to the terminal; in such cases you can redirect
these explicitly to /dev/tty inside the script.
7. Pipes
With piping, the output of a command can be used as input (piped) to a subsequent
command.
$ command1 | command2
Output from command1 is piped into input for command2.
This is equivalent to, but more efficient than:
$ command1 > temp
$ command2 < temp
$ rm temp
Examples
$ ls -al | more
$ who | sort | lpr
When a command needs to be ignorant of its source
If we wish to find total size of all C programs contained in the working directory, we can
use the command,
$ wc –c *.c
However, it also shows the usage for each file (the size of each file). We are not interested in
individual statistics, but a single figure representing the total size. To be able to do that,
we must make wc ignorant of its input source. We can do that by feeding the
concatenated output stream of all the .c files to wc –c as its input:
$ cat *.c | wc –c
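A self-contained sketch of the difference (the two .c files and their contents are illustrative):

```shell
mkdir -p cdir && cd cdir
printf 'main\n' > a.c         # 5 bytes
printf 'exit\n' > b.c         # 5 bytes
wc -c *.c                     # per-file counts plus a total line
cat *.c | wc -c               # the single figure: 10
```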
8. Creating a tee
tee is an external command that handles a character stream by duplicating its input. It
saves one copy in a file and writes the other to standard output. It is also a filter and
hence can be placed anywhere in a pipeline.
Example: The following command sequence uses tee to display the output of who and
saves this output in a file as well.
$ who | tee users.lst
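A minimal sketch with echo in place of who (who's output depends on the login session):

```shell
echo hello | tee copy.txt   # hello goes to the screen AND into copy.txt
cat copy.txt                # hello
```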
9. Command substitution
The shell enables the connecting of two commands in yet another way. While a pipe
enables a command to obtain its standard input from the standard output of another
command, the shell enables one or more command arguments to be obtained from the
standard output of another command. This feature is called command substitution.
Example:
$ echo Current date and time is `date`
Observe the use of backquotes around date in the above command. Here the output of the
command execution of date is taken as argument of echo. The shell executes the enclosed
command and replaces the enclosed command line with the output of the command.
Similarly the following command displays the total number of files in the working
directory.
$ echo “There are `ls | wc –l` files in the current directory”
Observe the use of double quotes around the argument of echo. If you use single quotes,
the backquote is not interpreted: the shell leaves anything enclosed in single quotes alone.
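The contrast can be sketched directly (the strings are illustrative):

```shell
echo "Count: `ls | wc -l`"    # backquotes are expanded inside double quotes
echo 'Count: `ls | wc -l`'    # inside single quotes they are printed literally
```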
A variable can be removed with unset and protected from reassignment by readonly.
Both are shell internal commands.
Note: In C shell, we use set statement to set variables. Here, there either has to be
whitespace on both sides of the = or none at all.
$ set count=5
$ set size = 10
$ file=$base$ext // assuming base=foo and ext=.c were assigned earlier
$ echo $file // prints foo.c
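The Bourne-family assignment rules above can be put together in one sketch (base and ext are the assumed earlier assignments):

```shell
base=foo                # no whitespace around = in Bourne-family shells
ext=.c
file=$base$ext
echo $file              # foo.c
readonly file           # now protected from reassignment
```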
Conclusion
In this chapter we saw the major interpretive features of the shell. The following is a
summary of activities that the shell performs when a command line is encountered at the
prompt.
Parsing: The shell first breaks up the command line into words using spaces
and tabs as delimiters, unless quoted. All consecutive occurrences of a space
or tab are replaced with a single space.
Variable evaluation: All $-prefixed strings are evaluated as variables, unless
quoted or escaped.
Command substitution: Any command surrounded by backquotes is executed
by the shell, which then replaces the standard output of the command into the
command line.
Redirection: The shell then looks for the characters >, < and >> to open the
files they point to.
Wild-card interpretation: The shell then scans the command line for wild-
cards (the characters *, ?, [ and ]). Any word containing a wild-card is
replaced by a sorted list of filenames that match the pattern. The list of these
filenames then forms the arguments to the command.
PATH evaluation: It finally looks for the PATH variable to determine the
sequence of directories it has to search in order to find the associated binary.
The Process
Introduction
A process is an OS abstraction that enables us to look at files and programs as their time
image. This chapter discusses processes, the mechanism of creating a process, different
states of a process and also the ps command with its different options. A discussion on
creating and controlling background jobs will be made next. We also look at three
commands viz., at, batch and cron for scheduling jobs. This chapter also looks at nice
command for specifying job priority, signals and time command for getting execution
time usage statistics of a command.
Objectives
Process Basics
ps: Process Status
Mechanism of Process Creation
Internal and External Commands
Process States and Zombies
Background Jobs
nice: Assigning execution priority
Processes and Signals
Job Control
at and batch: Execute Later
cron command: Running Jobs Periodically
time: Timing Usage Statistics at process runtime
1. Process Basics
UNIX is a multiuser and multitasking operating system. Multiuser means that several
people can use the computer system simultaneously (unlike a single-user operating
system, such as MS-DOS). Multitasking means that UNIX, like Windows NT, can work
on several tasks concurrently; it can begin work on one task and take up another before
the first task is finished.
When you execute a program on your UNIX system, the system creates a special
environment for that program. This environment contains everything needed for the
system to run the program as if no other program were running on the system. Stated in
other words, a process is created. A process is a program in execution. A process is said
to be born when the program starts execution and remains alive as long as the program is
active. After execution is complete, the process is said to die.
The kernel is responsible for the management of the processes. It determines the time and
priorities that are allocated to processes so that more than one process can share the CPU
resources.
Just as files have attributes, so have processes. These attributes are maintained by the
kernel in a data structure known as process table. Two important attributes of a process
are:
1. interactive: Initiated and controlled through a terminal session
2. batch: Typically a series of processes scheduled for execution at a specified point
in time
3. daemon: Typically initiated at boot time to perform operating system functions on
demand, such as LPD, NFS, and DNS
The Shell Process
As soon as you log in, a process is set up by the kernel. This process represents the login
shell, which can be either sh(Bourne Shell), ksh(korn Shell), bash(Bourne Again Shell) or
csh(C Shell).
The parent normally waits for the child to terminate. However, a parent can be told not to
wait for the child to terminate, and may continue to spawn other processes; the init
process is an example of such a parent process.
Examples
$ ps
PID TTY TIME CMD
4245 pts/7 00:00:00 bash
5314 pts/7 00:00:00 ps
The output shows the header specifying the PID, the terminal (TTY), the cumulative
processor time (TIME) that has been consumed since the process was started, and the
process name (CMD).
$ ps -f
UID PID PPID C STIME TTY TIME COMMAND
root 14931 136 0 08:37:48 ttys0 0:00 rlogind
sartin 14932 14931 0 08:37:50 ttys0 0:00 -sh
sartin 15339 14932 7 16:32:29 ttys0
They are spawned during system startup and some of them start when the system goes
into multiuser mode. These processes are known as daemons because they are called
without a specific request from a user. To list them use,
$ ps –e
PID TTY TIME CMD
0 ? 0:34 sched
1 ? 41:55 init
23274 Console 0:03 sh
272 ? 2:47 cron
7015 term/12 20:04 vi
Fork: A new process is created with the fork system call, which creates a copy of the
parent process; the child gets its own PID.
Exec: The forked child overwrites its own image with the code and data of the
new program. This mechanism is called exec, and the child process is said to exec
a new program, using one of the family of exec system calls. The PID and PPID
of the exec’d process remain unchanged.
Wait: The parent then executes the wait system call to wait for the child to
complete. It picks up the exit status of the child and continues with its other
functions. Note that a parent need not decide to wait for the child to terminate.
To get a better idea of this, let us explain with an example. When you enter ls to look at
the contents of your current working directory, UNIX does a series of things to create an
environment for ls and then run it:
The shell has UNIX perform a fork. This creates a new process that the shell will
use to run the ls program.
The shell has UNIX perform an exec of the ls program. This replaces the shell
program and data with the program and data for ls and then starts running that
new program.
The ls program is loaded into the new process context, replacing the text and data
of the shell.
The ls program performs its task, listing the contents of the current directory. In
the meanwhile, the shell executes wait system call for ls to complete.
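The fork-exec mechanism can be observed with a one-line sketch: the shell fork-execs a child shell, which reports its own PID and its parent's PID:

```shell
echo "shell PID: $$"
sh -c 'echo "child PID: $$  parent PID: $PPID"'   # a fork-exec'd child
# The child's PID differs from the shell's; its PPID points back at its parent.
```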
When a process is forked, the child has a different PID and PPID from its parent.
However, it inherits most of the attributes of the parent. The important attributes that are
inherited are:
User name of the real and effective user (RUID and EUID): the owner of the
process. The real owner is the user issuing the command, the effective user is the
one determining access to system resources. RUID and EUID are usually the
same, and the process has the same access rights the issuing user would have.
Real and effective group owner (RGID and EGID): The real group owner of a
process is the primary group of the user who started the process. The effective
group owner is usually the same, except when SGID access mode has been
applied to a file.
The current directory from where the process was run.
The file descriptors of all files opened by the parent process.
Environment variables like HOME, PATH.
The inheritance here means that the child has its own copy of these parameters and thus
can alter the environment it has inherited. But the modified environment is not available
to the parent process.
When the system moves to multiuser mode, init forks and execs a getty for every
active communication port.
Each one of these getty’s prints the login prompt on the respective terminal and then
goes off to sleep.
When a user tries to log in, getty wakes up and fork-execs the login program to verify
login name and password entered.
On successful login, login execs the process representing the login shell.
init goes off to sleep, waiting for the children to terminate. The processes getty and
login overlay themselves.
When the user logs out, it is intimated to init, which then wakes up and spawns
another getty for that line to monitor the next login.
A change of directory cannot be handled by spawning a child: if this were allowed,
after the child died, control would revert to the parent and the original directory
would be restored. Hence, cd is implemented as an internal command.
At any instance of time, a process is in a particular state. A process after creation is in the
runnable state. Once it starts running, it is in the running state. When a process requests
for a resource (like disk I/O), it may have to wait. The process is said to be in waiting or
sleeping state. A process can also be suspended by pressing a key (usually Ctrl-z).
When a process terminates, the kernel performs clean-up, assigns any children of the
exiting process to be adopted by init, and sends the death of a child signal to the parent
process, and converts the process into the zombie state.
A process in zombie state is not alive; it does not use any resources nor does any work.
But it is not allowed to die until the exit is acknowledged by the parent process.
It is possible for the parent itself to die before the child dies. In such case, the child
becomes an orphan and the kernel makes init the parent of the orphan. When this
adopted child dies, init waits for its death.
Automatic (batch) processes are not connected to a terminal; they need no user
or other manual interaction and can run in parallel with other active processes.
Interactive processes are initialized and controlled through a terminal session. In other
words, there has to be someone connected to the system to start these processes; they are
not started automatically as part of the system functions. These processes can run in the
foreground, occupying the terminal that started the program, and you can't start other
applications as long as this process is running in the foreground.
There are two ways of starting a job in the background – with the shell’s & operator and
the nohup command.
When you group commands and run them with the & operator, all the
commands are run in the background, not just the last command.
The shell remains the parent of the background process.
A job started with & alone does not survive logout, because, when you log out, the
shell is killed and hence its children are also killed. The
UNIX system provides the nohup statement which, when prefixed to a command, permits
execution of the process even after the user has logged out. You must use the & with it as
well.
If you try to run a command with nohup and haven’t redirected the standard error, UNIX
automatically places any error messages in a file named nohup.out in the directory from
which the command was run.
In the following command, the sorted file and any error messages are placed in the file
nohup.out.
$ nohup sort sales.dat &
1252
Sending output to nohup.out
Note that the shell has returned the PID (1252) of the process.
When the user logs out, the child turns into an orphan. The kernel handles such situations
by reassigning the PPID of the orphan to the system’s init process (PID 1) - the parent of
all shells. When the user logs out, init takes over the parentage of any process run with
nohup. In this way, you can kill a parent (the shell) without killing its child.
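A sketch of background execution; the special parameter $! holds the PID the shell reports:

```shell
sleep 5 &                  # run in the background; the shell prints the PID
echo "background PID: $!"  # $! stores the PID of the last background job
kill $!                    # the stored PID can be used later, e.g. to kill the job
```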
Additional Points
When you run a command in the background, the shell disconnects the standard input
from the keyboard, but does not disconnect its standard output from the screen. So,
output from the command, whenever it occurs, shows up on screen. It can be confusing if
you are entering another command or using another program. Hence, make sure that both
standard output and standard error are redirected suitably.
$ find . -name "*.log" -print > log_file 2> err.dat &
OR
$ find . -name "*.log" -print > log_file 2> /dev/null &
Important:
1. You should relegate time-consuming or low-priority jobs to the background.
2. If you log out while a background job is running, it will be terminated.
Processes are scheduled for execution from a queue; how the execution is scheduled
depends on the priority assigned to the process. The idea behind nice is that background
jobs should demand less attention from the system.
A high nice value implies a lower priority. A program with a high nice number is friendly
to other programs, other users and the system; it is not an important job. The lower the
nice number, the more important a job is and the more resources it will take without
sharing them.
Example:
$ nice wc -l hugefile.txt
OR $ nice wc -l hugefile.txt &
The default nice value is set to 10.
We can specify the nice value explicitly with the -n number option, where number is an
offset to the default. If the -n number argument is present, the priority is incremented by
that amount, up to a limit of 20.
Example: $ nice -n 5 wc -l hugefile.txt &
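The effect can be sketched as follows (ps -o nice is assumed to be available, as on most modern systems; the base value is usually 0 for an interactive shell):

```shell
# Without nice, a child process inherits the shell's nice value
sh -c 'ps -o nice= -p $$'

# With nice -n 5, the child runs with its nice value raised by 5,
# i.e. at a lower scheduling priority
nice -n 5 sh -c 'ps -o nice= -p $$'
```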
8. Killing Processes with Signals
When you execute a command, one thing to keep in mind is that commands do not run in
a vacuum. Many things can happen during a command execution that are not under the
control of the command. The user of the command may press the interrupt key or send a
kill command to the process, or the controlling terminal may become disconnected from
the system. In UNIX, any of these events can cause a signal to be sent to the process. The
default action when a process receives a signal is to terminate.
When a process ends normally, the program returns its exit status to the parent. This exit
status is a number returned by the program providing the results of the program's
execution.
Sometimes, you want or need to terminate a process.
The following are some reasons for stopping a process:
It’s using too much CPU time.
It’s running too long without producing the expected output.
It’s producing too much output to the screen or to a disk file.
If the process to be stopped is a background process, use the kill command to get out of
these situations. To stop a command that isn’t in the background, press <ctrl-c>.
Issuing the kill command sends a signal to a process. The default signal is SIGTERM
signal (15). UNIX programs can send or receive more than 20 signals, each of which is
represented by a number. (Use kill -l to list all signal names and numbers)
If the process ignores the signal SIGTERM, you can kill it with SIGKILL signal (9) as,
$ kill -9 123 OR $ kill -s KILL 123
The system variable $! stores the PID of the last background job. You can kill the last
background job without knowing its PID by specifying $ kill $!
Note: You can kill only those processes that you own; You can’t kill processes of
other users. To kill all background jobs, enter kill 0.
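The sequence can be sketched with a harmless background job (sleep stands in for a runaway process):

```shell
sleep 100 &                          # a long-running background job
pid=$!                               # $! holds the PID of the last background job

kill "$pid"                          # polite request: the default SIGTERM (15)
kill -9 "$pid" 2>/dev/null || true   # sure kill: SIGKILL (9), in case SIGTERM was ignored

wait "$pid" 2>/dev/null || true      # reap the terminated process
echo "process $pid terminated"
```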
9. Job Control
A job is a name given to a group of processes that is typically created by piping a series
of commands using pipeline character. You can use job control facilities to manipulate
jobs. You can use job control facilities to,
1. Relegate a job to the background (bg)
2. Bring it back to the foreground (fg)
3. List the active jobs (jobs)
4. Suspend a foreground job ([Ctrl-z])
5. Kill a job (kill)
The following examples demonstrate the different job control facilities.
Assume a process is taking a long time. You can suspend it by pressing [Ctrl-z].
[1] + Suspended wc -l hugefile.txt
A suspended job is not terminated. You can now relegate it to background by,
$ bg
You can start more jobs in the background any time:
$ sort employee.dat > sortedlist.dat &
[2] 530
$ grep ‘director’ emp.dat &
[3] 540
10. at and batch: Execute Later
The at command schedules a job for one-time execution at a specified time. The commands to run are entered at the at> prompt, and the input is terminated with [Ctrl-d]:
$ at 14:23 fri
at> lp /usr/sales/reports/*
at> echo "Files printed, Boss!" | mailx -s "Job done" boss
[Ctrl-d]
commands will be executed using /usr/bin/bash
job 1041198880.a at Fri Oct 12 14:23:00 2007
The above job prints all files in the directory /usr/sales/reports and sends a user named
boss some mail announcing that the print job was done.
All at jobs go into a queue known as the at queue. at shows the job number, the date and time
of scheduled execution. This job number is derived from the number of seconds elapsed
since the Epoch. A user should remember this job number to control the job.
$ at 1 pm today
at> echo “^G^GLunch with Director at 1 PM^G^G” >
/dev/term/43
The above job will display the following message on your screen (/dev/term/43) at 1:00
PM, along with two beeps(^G^G).
Lunch with Director at 1 PM
To see which jobs you scheduled with at, enter at -l. Working with the preceding
examples, you may see the following results:
job 756603300.a at Tue Sep 11 01:00:00 2007
job 756604200.a at Fri Sep 14 14:23:00 2007
The following forms show some of the keywords and operations permissible with at
command:
at hh:mm Schedules job at the hour (hh) and minute (mm) specified, using a
24-hour clock
at hh:mm month day year Schedules job at the hour (hh), minute (mm), month, day,
and year specified
The batch command also schedules jobs for later execution, but the jobs are executed as soon as the system load permits. To sort a collection of files, print the results, and notify the user named boss that the job is done, enter the following commands:
$ batch
sort /usr/sales/reports/* | lp
echo “Files printed, Boss!” | mailx -s”Job done” boss
The system returns the following response:
job 7789001234.b at Fri Sep 7 11:43:09 2007
The date and time listed are the date and time you pressed <Ctrl-d> to complete the batch
command. When the job is complete, check your mail; anything that the commands
normally display is mailed to you. Note that any job scheduled with batch command goes
into a special at queue.
11. cron: Running jobs periodically
cron program is a daemon which is responsible for running repetitive tasks on a regular
schedule. It is a perfect tool for running system administration tasks such as backup and
system logfile maintenance. It can also be useful for ordinary users to schedule regular
tasks including calendar reminders and report generation.
If there’s nothing to do, cron “goes to sleep” and becomes inactive; it “wakes up” every
minute, however, to see if there are commands to run.
cron looks for instructions to be performed in a control file in
/var/spool/cron/crontabs
After executing them, it goes back to sleep, only to wake up the next minute.
First, place the instructions in an ordinary file. Next, use the crontab command to place
the file in the directory containing crontab files. crontab will create a file with the same
filename as the user name and place it in the /var/spool/cron/crontabs directory.
You can see the contents of your crontab file with crontab -l and remove them with
crontab -r.
The cron system is managed by the cron daemon. It gets information about which
programs and when they should run from the system's and users' crontab entries. The
crontab files are stored in the file /var/spool/cron/crontabs/<user> where <user> is the
login-id of the user. Only the root user has access to the system crontabs, while each user
should only have access to his own crontabs.
A crontab entry consists of five time fields followed by the command to be executed. The time-field options are as follows:
Field Range
-----------------------------------------------------------------------------------------------
minute 00 through 59 Number of minutes after the hour
o
hour 00 through 23 (midnight is 00)
day-of-month 01 through 31
month-of-year 01 through 12
s.c
day-of-week 01 through 07 (Monday is 01, Sunday is 07)
-----------------------------------------------------------------------------------------------
The first five fields are time option fields. You must specify all five of these fields. Use
an asterisk (*) in a field if you want to ignore that field.
Examples:
00-10 17 * 3,6,9,12 5 find / -newer .last_time -print > backuplist
In the above entry, the find command will be executed every minute in the first 10
minutes after 5 p.m. every Friday of the months March, June, September and December
of every year.
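Some further illustrative crontab entries (the script and user names here are hypothetical; this is a configuration fragment, one job per line, following the day-of-week convention given above):

```
# min   hour  day  month  weekday  command
00      01    *    *      *        /home/kumar/backup.sh
30      17    *    *      5        echo "Weekly report due" | mailx kumar
00,30   *     *    *      *        /home/kumar/checkmail.sh
```

The first entry runs a backup at 1:00 a.m. every day, the second mails a reminder at 5:30 p.m. every Friday, and the third runs checkmail.sh every half hour. Note that a list of values in a field is separated by commas.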
The time command, prefixed to any command line, executes the command and displays its time usage. The sum of user time and sys time actually represents the CPU time. This could be significantly less than the real time on a heavily loaded system.
Conclusion
In this chapter, we saw an important abstraction of the UNIX operating system viz.,
processes. We also saw the mechanism of process creation, the attributes inherited by the
child from the parent process as well as the shell’s behavior when it encounters internal
commands, external commands and shell scripts. This chapter also discussed background
jobs, creation and controlling jobs as well as controlling processes using signals. We
finally described three commands viz., at, batch and cron for process scheduling, with a
discussion of time command for obtaining time usage statistics of process execution.
Introduction
The UNIX environment can be highly customized by manipulating the settings of the
shell. Commands can be made to change their default behavior, environment variables
can be redefined, the initialization scripts can be altered to obtain a required shell
environment. This chapter discusses different ways and approaches for customizing the
environment.
Objectives
The Shell
Environment Variables
Common Environment Variables
Command Aliases (bash and korn)
Command History Facility (bash and korn)
In-Line Command Editing (bash and korn)
Miscellaneous Features (bash and korn)
The Initialization Scripts
The Shell
The UNIX shell is both an interpreter as well as a scripting language. An interactive shell
turns noninteractive when it executes a script.
Bourne Shell – This shell was developed by Steve Bourne. It is the original UNIX shell.
It has strong programming features, but it is a weak interpreter.
C Shell – This shell was developed by Bill Joy. It has improved interpretive features, but
it wasn’t suitable for programming.
Korn Shell – This shell was developed by David Korn. It combines best features of the
Bourne and C shells. It has features like aliases and command history, but it lacks some features of Bash.
Bash Shell – This shell was developed by the GNU project. It is the standard shell of Linux systems and combines the best features of the Bourne, C and Korn shells.
Environment Variables
We already mentioned a couple of environment variables, such as PATH and HOME.
Until now, we only saw examples in which they serve a certain purpose to the shell. But
there are many other UNIX utilities that need information about you in order to do a good
job.
What other information do programs need apart from paths and home directories? A lot
of programs want to know about the kind of terminal you are using; this information is
stored in the TERM variable. The shell you are using is stored in the SHELL variable,
the operating system type in OS and so on. A list of all variables currently defined for
your session can be viewed by entering the env command.
The environment variables are managed by the shell. As opposed to regular shell
variables, environment variables are inherited by any program you start, including
another shell. New processes are assigned a copy of these variables, which they can read,
modify and pass on in turn to their own child processes.
The set statement displays all variables available in the current shell, but the env command
displays only environment variables. Note that env is an external command and runs in a
child process.
There is nothing special about the environment variable names. The convention is to use
uppercase letters for naming one.
The following table shows some of the common environment variables.
HISTSIZE size of the shell history file in number of lines
HOME path to your home directory
HOSTNAME local host name
LOGNAME login name
MAIL location of your incoming mail folder
MANPATH paths to search for man pages
PATH search paths for commands
PS1 primary prompt
PS2 secondary prompt
PWD present working directory
SHELL current shell
TERM terminal type
UID user ID
The command search path (PATH): The PATH variable instructs the shell about the
route it should follow to locate any executable command.
Your home directory (HOME): When you log in, UNIX normally places you in a
directory named after your login name. This is called the home directory or login
directory. The home directory for a user is set by the system administrator while creating
users (using useradd command).
mailbox location and checking (MAIL and MAILCHECK): The incoming mails for a
user are generally stored at /var/mail or /var/spool/mail and this location is available in
the environment variable MAIL. MAILCHECK determines how often the shell checks
the file for arrival of new mail.
The prompt strings (PS1, PS2): The prompt that you normally see (the $ prompt) is the
shell’s primary prompt specified by PS1. PS2 specifies the secondary prompt (>). You
can change the prompt by assigning a new value to these environment variables.
Shell used by the commands with shell escapes (SHELL): This environment variable
specifies the login shell as well as the shell that interprets the command if preceded with
a shell escape.
Variables used in Bash and Korn
The Bash and korn prompt can do much more than displaying such simple information as
your user name, the name of your machine and some indication about the present
working directory. Some examples are demonstrated next.
$ PS1='[$PWD] '
[/home/srm] cd progs
[/home/srm/progs] _
Bash and Korn also support a history facility that treats a previous command as an event
and associates it with a number. This event number is represented as !.
$ PS1='[!] '          $ PS1='[! $PWD] '
[42] _                [42 /home/srm/progs] _
$ PS1='\h> '          // \h denotes the host name of the machine (bash)
saturn> _
Aliases
Bash and korn support the use of aliases that let you assign shorthand names to frequently
used commands. Aliases are defined using the alias command. Here are some typical
aliases that one may like to use:
alias lx='/usr/bin/ls -lt'
You can also use aliasing to redefine an existing command so it is always invoked with
certain options. For example:
alias cp='cp -i'
Command History
Bash and Korn support a history feature that treats a previous command as an event and
associates it with an event number. Using this number you can recall previous commands,
edit them if required and reexecute them.
The history command displays the history list showing the event number of every
previously executed command. With bash, the complete history list is displayed, while
with korn, the last 16 commands. You can specify a numeric argument to specify the
number of previous commands to display, as in, history 5 (in bash) or history -5 (korn).
By default, bash stores all previous commands in $HOME/.bash_history and korn stores
them in $HOME/.sh_history. When a command is entered and executed, it is appended to
the list maintained in the file.
Bash uses the ! symbol to recall previous commands from the history list. The following examples demonstrate the use of this symbol with the corresponding description.
$ !38 The command with event number 38 is displayed and executed (Use r 38 in korn)
$ !38:p The command is displayed. You can edit and execute it
$ !! Repeats previous command (Use r in korn)
$ !-2 Executes command prior to the previous one ( r -2 in korn)
$ r cp doc=txt Repeats the last cp command, replacing the string doc with txt (korn)
$ cd $_ Changes to the directory that was the last argument of the previous command
Variable HISTFILE determines the filename that saves the history list. Bash uses two
variables HISTSIZE for setting the size of the history list in memory and HISTFILESIZE
for setting the size of disk file. Korn uses HISTSIZE for both the purposes.
In-line Command Editing
To use vi-style in-line command editing, make the following setting:
set -o vi
Command line editing features greatly enhance the value of the history list. You can use
them to correct command line errors and to save time and effort in entering commands by
modifying previous commands. It also makes it much easier to search through your
command history list, because you can use the same search commands you use in vi.
1. Using set –o
The set statement by default displays the variables in the current shell, but in Bash and
Korn, it can make several environment settings with –o option.
File Overwriting (noclobber): The shell's > symbol overwrites (clobbers) an existing
file, and to prevent such accidental overwriting, use the noclobber argument:
set -o noclobber
Now, if you redirect output of a command to an existing file, the shell will respond with a
message that says it “cannot overwrite existing file” or “file already exists”. To override
this protection, use a | after the >, as in:
head -n 5 emp.dat >| file1
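The behaviour can be sketched as follows (run in a scratch directory; the filenames are illustrative):

```shell
cd "$(mktemp -d)"             # scratch directory

set -o noclobber              # refuse to clobber existing files with >
echo first > file1            # file1 does not exist yet: created normally

if ! echo second > file1 2>/dev/null; then
    echo "overwrite refused"  # the shell rejects the redirection
fi

echo third >| file1           # >| deliberately overrides noclobber
cat file1
```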
Accidental Logging out (ignoreeof): The [Ctrl-d] key combination has the effect of
terminating the standard input as well as logging out of the system. In case you
accidentally pressed [Ctrl-d] twice while terminating the standard input, it will log you
off! The ignoreeof keyword offers protection from accidental logging out:
set -o ignoreeof
Note that you can then log out only by using the exit command.
A set option is turned off with set +o keyword. To reverse the noclobber feature, use
set +o noclobber
2. Tilde Substitution
The ~ acts as a shorthand representation for the home directory. A configuration file
like .profile that exists in the home directory can be referred to both as $HOME/.profile
and ~/.profile.
You can also toggle between the directory you switched to most recently and your current
directory. This is done with the ~- symbols (or simply -, a hyphen). For example, either
of the following commands change to your previous directory:
cd ~- OR cd –
The Initialization Scripts
The settings made at the command prompt apply only to the current session; they are lost when you log out. To make them permanent, use certain startup scripts. The startup scripts are executed when the user logs in. The initialization scripts in different shells are listed below:
.profile (Bourne shell)
.profile and .kshrc (Korn shell)
.bash_profile (or .bash_login) and .bashrc (Bash)
.login and .cshrc (C shell)
The Profile
When logging into an interactive login shell, login will do the authentication, set the
environment and start your shell. In the case of bash, the next step is reading the general
profile from /etc, if that file exists. bash then looks for ~/.bash_profile, ~/.bash_login and
~/.profile, in that order, and reads and executes commands from the first one that exists
and is readable. If none exists, /etc/bashrc is applied.
When a login shell exits, bash reads and executes commands from the file,
~/.bash_logout, if it exists.
The profile contains commands that are meant to be executed only once in a session. It
can also be used to customize the operating environment to suit user requirements. Every
time you change the profile file, you should either log out and log in again, or you can
execute it by using a special command (the dot command):
$ . .profile
The rc File
Normally the profiles are executed only once, upon login. The rc files are designed to be
executed every time a separate shell is created. There is no rc file in Bourne, but bash and
korn use one. This file is defined by an environment variable BASH_ENV in Bash and
ENV in Korn.
export BASH_ENV=$HOME/.bashrc
export ENV=$HOME/.kshrc
Korn automatically executes .kshrc during login if ENV is defined. Bash merely ensures
that a sub-shell executes this file. If the login shell also has to execute this file then a
separate entry must be added in the profile:
. ~/.bashrc
The rc file is used to define command aliases, variable settings, and shell options. Some
sample entries of an rc file are
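For instance (the aliases and option settings below are typical choices rather than requirements):

```shell
alias l='ls -lt'         # shorthand for a long listing by modification time
alias cp='cp -i'         # always prompt before overwriting with cp
set -o noclobber         # prevent accidental file overwriting with >
set -o vi                # vi-style in-line command editing
PS1='[\h \w] '           # prompt showing host name and working directory (bash)
```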
The rc file will be executed after the profile. However, if the BASH_ENV or ENV
variables are not set, the shell executes only the profile.
Conclusion
In this chapter, we looked at the environment-related features of the shells, and found
weaknesses in the Bourne shell. Knowledge of Bash and Korn only supplements your
knowledge of Bourne and doesn’t take anything away. It is always advisable to use Bash
or Korn as your default login shell, as their rich features (aliases, history and in-line
command editing) result in a more fruitful experience.
UNIT 4
4. More file attributes, Simple filters 7 Hours
Text Book
4. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Apart from permissions and ownership, a UNIX file has several other attributes,
and in this chapter, we look at most of the remaining ones. A file also has properties
related to its time stamps and links. It is important to know how these attributes are
interpreted when applied to a directory or a device.
This chapter also introduces the concepts of file system. It also looks at the inode,
the lookup table that contains almost all file attributes. Though a detailed treatment of
the file systems is taken up later, knowledge of its basics is essential to our understanding
of the significance of some of the file attributes. The study of basic file attributes has already covered ls -l to display file attributes (properties), listing of a specific directory, ownership and group ownership, and the different file permissions. ls -l provides attributes like permissions, links, owner, group owner, size, date and the file name.
File Systems and inodes
The hard disk is split into distinct partitions, with a separate file system in each
partition. Every file system has a directory structure headed by root.
n partitions = n file systems = n separate root directories
All attributes of a file except its name and contents are available in a table – inode
(index node), accessed by the inode number. The inode contains the following attributes
of a file:
• File type
• File permissions
• Number of links
• The UID of the owner and the GID of the group owner
• File size in bytes
• Date and time of last modification, last access and last change of the inode
• An array of pointers that keep track of all disk blocks used by the file
Please note that neither the name of the file nor the inode number is stored in the inode.
To know inode number of a file:
ls -il tulec05
Here, 9059 is the inode number, and no other file can have the same inode number in the
same file system.
Hard Links
The link count is displayed in the second column of the listing. This count is normally 1,
but the following files have two links,
All attributes seem to be identical, but the files could still be copies. It’s the link count
that seems to suggest that the files are linked to each other. But this can only be
confirmed by using the –i option to ls.
ls -li backup.sh restore.sh
478274 -rwxr-xr-- 2 kumar metal 163 jul 13 21:36 backup.sh
478274 -rwxr-xr-- 2 kumar metal 163 jul 13 21:36 restore.sh
A file is linked with the ln command, which takes two filenames as arguments (like cp).
The command can create both a hard link and a soft link, and has syntax similar to the
one used by cp. The following command links emp.lst with employee:
ln emp.lst employee
The -i option to ls shows that they have the same inode number, meaning that
they are actually one and the same file:
The link count, which is normally one for unlinked files, is shown to be two. You
can increase the number of links by adding the third file name emp.dat as:
ln emp.lst emp.dat
You can link multiple files, but then the destination filename must be a directory. A file is
considered to be completely removed from the file system when its link count drops to
zero. ln returns an error when the destination file exists. Use the –f option to force the
removal of the existing link before creation of the new one
ln foo.txt input_files
It creates a link in the directory input_files. With this link available, your existing
programs will continue to find foo.txt in the input_files directory. It is more convenient to
do this than to modify all the programs to point to the new path. Links provide some protection
against accidental deletion, especially when they exist in different directories. Because of
links, we don’t need to maintain two programs as two separate disk files if there is very
little difference between them. A file’s name is available to a C program and to a shell
script. A single file with two links can have its program logic make it behave in two
different ways depending on the name by which it is called.
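A sketch of hard linking in a scratch directory (the filenames follow the text's example, the data is a stand-in):

```shell
cd "$(mktemp -d)"            # scratch directory

echo "sample data" > emp.lst
ln emp.lst employee          # hard link: two names, one inode

ls -li emp.lst employee      # same inode number, link count 2 for both

rm emp.lst                   # drops the link count back to 1 ...
cat employee                 # ... but the data survives under the other name
```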
We can’t have two linked filenames in two file systems and we can’t link a
directory even within the same file system. This can be solved by using symbolic links
(soft links).
Symbolic Links
Unlike the hard link, a symbolic link doesn't have the file's contents, but
simply provides the pathname of the file that actually has the contents.
ln -s note note.sym
ls -li note note.sym
Here, the first character l in the listing indicates a symbolic link, and -> indicates that
note.sym contains the pathname of the file note. The size of the symbolic link is only 4 bytes;
it is the length of the pathname of note.
It’s important that this time we indeed have two files, and they are not identical.
Removing note.sym won’t affect us much because we can easily recreate the link. But if
we remove note, we would lose the file containing the data. In that case, note.sym would
point to a nonexistent file and become a dangling symbolic link.
Symbolic links can also be used with relative pathnames. Unlike hard links, they
can also span multiple file systems and also link directories. If you have to link all
filenames in a directory to another directory, it makes sense to simply link the directories.
Like other files, a symbolic link has a separate directory entry with its own inode number.
This means that rm can remove a symbolic link even if it points to a directory.
A symbolic link has an inode number separate from the file that it points to. In
most cases, the pathname is stored in the symbolic link and occupies space on disk.
However, Linux uses a fast symbolic link which stores the pathname in the inode itself
provided it doesn’t exceed 60 characters.
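A sketch of symbolic linking, including the dangling-link case described above (scratch directory; filenames follow the text's example):

```shell
cd "$(mktemp -d)"            # scratch directory

echo "note data" > note
ln -s note note.sym          # note.sym stores only the pathname "note"

ls -li note note.sym         # different inodes; listing shows note.sym -> note
cat note.sym                 # follows the link to note's contents

rm note                      # note.sym now points to a nonexistent file
cat note.sym 2>/dev/null || echo "dangling symbolic link"
```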
The Directory
A directory has its own permissions, owners and links. The significance of the file
attributes change a great deal when applied to a directory. For example, the size of a
directory is in no way related to the size of files that exists in the directory, but rather to
the number of files housed by it. The higher the number of files, the larger the directory
size. Permission acquires a different meaning when the term is applied to a directory.
ls -l -d progs
drwxr-xr-x 2 kumar metal 320 may 9 09:57 progs
The default permissions are different from those of ordinary files. The user has all
permissions, and group and others have read and execute permissions only. The
permissions of a directory also impact the security of its files. To understand how that can
happen, we must know what permissions for a directory really mean.
Read permission
Read permission for a directory means that the list of filenames stored in that
directory is accessible. Since ls reads the directory to display filenames, if a directory's
read permission is removed, ls won't work. Consider removing the read permission first
from the directory progs:
ls -ld progs
Write permission
We can’t write to a directory file. Only the kernel can do that. If that were
possible, any user could destroy the integrity of the file system. Write permission for a
directory implies that you are permitted to create or remove files in it. To try that out,
restore the read permission and remove the write permission from the directory before
you try to copy a file to it.
cp emp.lst progs
• The write permission for a directory determines whether we can create or remove
files in it because these actions modify the directory
• Whether we can modify a file depends on whether the file itself has write
permission. Changing a file doesn't modify its directory entry
Execute permission
If a single directory in a pathname lacks execute permission, it
can't be searched for the name of the next directory. That's why the execute privilege of
a directory is often referred to as the search permission. A directory has to be searched
for the next directory, so the cd command won’t work if the search permission for the
directory is turned off.
When we create files and directories, the permissions assigned to them depend on
the system’s default setting. The UNIX system has the following default permissions for
all files and directories:
rw-rw-rw- (octal 666) for regular files
rwxrwxrwx (octal 777) for directories
The default is transformed by subtracting the user mask from it to remove one or
more permissions. We can evaluate the current value of the mask by using umask without
arguments,
$ umask
022
This becomes 644 (666 - 022) for ordinary files and 755 (777 - 022) for directories. Setting
umask 000 indicates that we are not subtracting anything, and the default permissions will
remain unchanged. Note that changing the system-wide default permission settings is
possible using chmod but not by umask.
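The arithmetic can be verified directly (scratch directory; the names are illustrative):

```shell
cd "$(mktemp -d)"            # scratch directory

umask 022                    # mask to subtract from the defaults
: > ordinary.txt             # new file:      666 - 022 = 644
mkdir subdir                 # new directory: 777 - 022 = 755

ls -l  ordinary.txt          # -rw-r--r--
ls -ld subdir                # drwxr-xr-x
```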
A UNIX file has three time stamps associated with it. Among them, two are:
• Time of last file modification ls -l
• Time of last access ls –lu
The access time is displayed when ls -l is combined with the -u option. Knowledge of
file's modification and access times is extremely important for the system administrator.
Many of the tools used by them look at these time stamps to decide whether a particular
file will participate in a backup or not.
To set the modification and access times to predefined values, we have:
touch options expression filename(s)
When touch is used without options and expression, both times are set to the current time
and the file is created if it doesn't exist. To set the times to predefined values, the touch
command (without options but with an expression) can be used. The expression
consists of MMDDhhmm (month, day, hour and minute).
touch 03161430 emp.lst ; ls -l emp.lst
ls -lu emp.lst
It is possible to change the two times individually. The -m and -a options change the
modification and access times, respectively:
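A sketch (on many modern systems the timestamp expression must be introduced with -t; emp.lst here is just a scratch file):

```shell
cd "$(mktemp -d)"                 # scratch directory

touch emp.lst                     # both times set to now; file created if absent
touch -t 03161430 emp.lst         # MMDDhhmm: March 16, 14:30, current year
ls -l  emp.lst                    # shows the new modification time
ls -lu emp.lst                    # shows the access time

touch -m -t 04011000 emp.lst      # -m changes only the modification time
touch -a -t 05011200 emp.lst      # -a changes only the access time
```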
find: Locating Files
find recursively examines a directory tree to look for files matching some criteria,
and then takes some action on the selected files. It has a difficult command line, and if
you have ever wondered why UNIX is hated by many, then you should look up the
cryptic find documentation. However, find is easily tamed if you break up its arguments
into three components:
find path_list selection_criteria action
• Recursively examines all files specified in path_list
• It then matches each file for one or more selection-criteria
s.c
• It takes some action on those selected files
The path_list comprises one or more subdirectories separated by white space. There can
also be a host of selection_criteria that you use to match a file, and multiple actions to
dispose of the file. This makes the command difficult to use initially, but it is a program
that every user must master since it lets him make file selection under practically any
condition.
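The three components can be sketched on a small scratch tree (the names are illustrative):

```shell
cd "$(mktemp -d)"                 # scratch directory
mkdir -p progs docs
touch progs/a.sh docs/b.txt docs/c.sh

# path_list: .    selection criteria: -name "*.sh"    action: -print
find . -name "*.sh" -print        # lists progs/a.sh and docs/c.sh
```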
Source: Sumitabha Das, “UNIX – Concepts and Applications”, 4th edition, Tata
McGraw Hill, 2006
SIMPLE FILTERS
Filters are commands which accept data from standard input, manipulate it, and
write the results to standard output. Filters are the central tools of the UNIX tool kit, and
each filter performs a simple function. Some commands use a delimiter, such as the pipe (|) or colon (:).
Many filters work well with delimited fields, and some simply won't work without them.
The piping mechanism allows the standard output of one filter to serve as the standard input of
another. A filter reads data from standard input when used without a filename as
argument, and from the file otherwise.
The Simple Database
Several UNIX commands are demonstrated using a simple database file (emp.lst). Each line of this file has six fields separated by five delimiters, and the details of one employee are stored in a single line. This text file is designed in fixed format and contains a personnel database of 15 lines, where each field is separated by the delimiter |.
$ cat emp.lst
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
9876 | jai sharma | director | production | 12/03/50 | 7000
5678 | sumit chakrobarty | d.g.m. | marketing | 19/04/43 | 6000
2365 | barun sengupta | director | personnel | 11/05/47 | 7800
5423 | n.k.gupta | chairman | admin | 30/08/56 | 5400
1006 | chanchal singhvi | director | sales | 03/09/38 | 6700
6213 | karuna ganguly | g.m. | accounts | 05/06/62 | 6300
1265 | s.n. dasgupta | manager | sales | 12/09/63 | 5600
pr : paginating files
We know that,
cat dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
pr command adds suitable headers, footers and formatted text. pr adds five lines of
margin at the top and bottom. The header shows the date and time of last modification of
the file along with the filename and page number.
pr dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
…blank lines…
pr options
The different options for the pr command are:
-o n offsets the lines by n spaces and increases the left margin of the page
pr +10 chap01 begins printing from page 10
pr -l 54 chap01 sets the page length to 54 lines
head: Displaying the Beginning of a File
The command displays the top of the file. It displays the first 10 lines of the file
when used without an option.
head emp.lst
tail: Displaying the End of a File
This command displays the end of the file. It displays the last 10 lines of the file
when used without an option.
tail emp.lst
-n to specify a line count
tail -n 3 emp.lst
displays the last three lines of the file. We can also address lines from the
beginning of the file instead of the end. The +count option allows to do that, where count
represents the line number from where the selection should begin.
Use tail -f when we are running a program that continuously writes to a file, and we want to see how the file is growing. We have to terminate this command with the interrupt key.
cut: splitting the file vertically
cut is used for splitting the file vertically. head -n 5 emp.lst | tee shortlist selects the first five lines of emp.lst and saves them to shortlist. We can cut by using the -c option with a list of column numbers, delimited by a comma (cutting columns).
The expression 55- indicates column number 55 to end of line. Similarly, -3 is the same
as 1-3.
Most files don’t contain fixed length lines, so we have to cut fields rather than columns
(cutting fields).
cut -d \| -f 2,3 shortlist > cutlist1
will display the second and third fields of shortlist and save the output in cutlist1. Here | is escaped to prevent the shell from interpreting it as the pipeline character.
cut -d \| -f 1,4- shortlist > cutlist2
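The two cutting modes can be sketched together (shortlist is rebuilt here with two |-delimited lines so the commands run on their own):

```shell
# stand-in for shortlist: |-delimited lines like emp.lst
printf '2233|a.k.shukla|g.m|sales\n9876|jai sharma|director|production\n' > shortlist

# cutting columns: characters 1 to 4 of every line (the emp-id)
cut -c 1-4 shortlist

# cutting fields: -d names the delimiter, -f picks the fields
cut -d '|' -f 2,3 shortlist > cutlist1
cut -d '|' -f 1,4- shortlist > cutlist2
cat cutlist1
```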
paste – pasting files
When we cut with cut, it can be pasted back with the paste command, vertically rather
than horizontally. We can view two files side by side by pasting them. In the previous
topic, cut was used to create the two files cutlist1 and cutlist2 containing two cut-out
portions of the same file.
paste cutlist1 cutlist2
We can specify one or more delimiters with -d:
paste -d "|" cutlist1 cutlist2
Here each field will be separated by the delimiter |. Even though paste uses at least two files for concatenating lines, the data for one file can be supplied through the standard input.
Let us consider that the file addressbook contains the details of three persons:
cat addressbook
Sorting is the ordering of data in ascending or descending sequence. The sort command
orders a file and by default, the entire line is sorted
sort shortlist
This default sorting sequence can be altered by using certain options. We can also sort on one or more keys (fields) or use a different ordering rule.
sort options
The important sort options are:
-k m.n starts sort on nth column of mth field
-u removes repeated lines
-n sorts numerically
-r reverses sort order
-f folds lowercase to equivalent uppercase
-m list merges sorted files in list
-c checks if file is sorted
-o flname places output in file flname
We can also specify a character position within a field to be the beginning of the sort, as shown above (sorting on columns).
sort -n numfile
When sort acts on numerals, strange things can happen: when we sort a file containing only numbers, we get a curious result, because the sort is lexical by default. This can be overridden by the -n (numeric) option.
sort –o sortedlist –k 3 shortlist
sort –c shortlist
sort –t “|” –c –k 2 shortlist
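The three commands above combine -t, -k, -n and -r; a self-contained sketch with a throwaway file (the contents are made up):

```shell
# throwaway |-delimited file (hypothetical contents)
printf '3|charlie\n1|alpha\n2|bravo\n' > demo.lst

# sort on the second field, with | as the field separator
sort -t '|' -k 2 demo.lst

# reverse numeric sort on the first field
sort -t '|' -k 1 -n -r demo.lst
```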
uniq: locating repeated and nonrepeated lines
When we concatenate or merge files, we will face the problem of duplicate entries creeping in. We saw how sort removes them with the -u option. UNIX offers a special tool to handle these lines: the uniq command. Consider a sorted dept.lst that includes repeated lines:
cat dept.lst
uniq dept.lst
simply fetches one copy of each line and writes it to the standard output. Since uniq
requires a sorted file as input, the general procedure is to sort a file and pipe its output to
uniq. The following pipeline also produces the same output, except that the output is saved in a file:
sort dept.lst | uniq - uniqlist
tr: translating characters
The tr filter manipulates the individual characters in a line. It translates characters using one or two compact expressions.
It takes input only from standard input, it doesn’t take a filename as argument. By default,
it translates each character in expression1 to its mapped counterpart in expression2. The
first character in the first expression is replaced with the first character in the second
expression, and similarly for the other characters.
c
exp1='|/' ; exp2='~-'
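With the two expressions above, every | becomes ~ and every / becomes - (a sketch on one emp.lst-style line):

```shell
exp1='|/' ; exp2='~-'
# tr reads only standard input, so the line is piped in
echo '2233|a.k.shukla|g.m|sales|12/12/52|6000' | tr "$exp1" "$exp2"
# prints 2233~a.k.shukla~g.m~sales~12-12-52~6000
```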
UNIT 5
5. Filters using regular expressions, 6 Hours
Text Book
5. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
We often need to search a file for a pattern, either to see the lines containing (or
not containing) it or to have it replaced with something else. This chapter discusses two
important filters that are specially suited for these tasks – grep and sed. grep takes care of
all search requirements we may have. sed goes further and can even manipulate the individual characters in a line. In fact sed can do several things, some of them quite well.
grep: searching for a pattern
grep scans the file / input for a pattern and displays lines containing the pattern, the line numbers or the filenames where the pattern occurs. It's a command from a special family in UNIX for handling search requirements.
grep options pattern filename(s)
grep "sales" emp.lst
will display lines containing sales from the file emp.lst. Patterns with and without quotes are both possible, though it's generally safe to quote the pattern. Quoting is mandatory when the pattern involves more than one word. grep silently returns the prompt when the pattern can't be located.
grep president emp.lst
When grep is used with multiple filenames, it displays the filenames along with the
output.
grep options
grep is one of the most important UNIX commands, and we must know the options that POSIX requires grep to support. Linux supports all of these options.
grep -n 'marketing' emp.lst
displays each matching line prefixed with its line number.
grep -c 'director' *.lst
will print filenames prefixed to the line count.
grep -l 'manager' *.lst
displays only the names of the files where the pattern has been found.
All the above three patterns are stored in a separate file, pattern.lst. It is tedious to specify each pattern separately with the -e option, so grep can take its patterns from a file with the -f option:
grep -f pattern.lst emp.lst
Basic Regular Expressions
The basic regular expression character subset uses an elaborate metacharacter set, overshadowing the shell's wild-cards, and can perform amazing matches.
grep supports basic regular expressions (BRE) by default and extended regular
expressions (ERE) with the –E option. A regular expression allows a group of characters
enclosed within a pair of [ ], in which the match is performed for a single character in the
group.
grep "[aA]g[ar][ar]wal" emp.lst
A single pattern has matched two similar strings. The pattern [a-zA-Z0-9] matches a single alphanumeric character. When we use a range, make sure that the character on the left of the hyphen has a lower ASCII value than the one on the right.
Negating a class: the caret (^) can be used to negate the character class. When the character class begins with this character, all characters other than the ones grouped in the class are matched.
The *
The asterisk refers to the immediately preceding character: * indicates zero or more occurrences of the previous character.
g* matches nothing (the null string) or g, gg, ggg, etc.
grep "[aA]gg*[ar][ar]wal" emp.lst
Notice that we don't need to use the -e option three times to get the same output!
The dot
A dot matches a single character. The shell uses the ? character for the same purpose.
Specifying pattern locations (^ and $)
Most of the regular expression characters are used for matching patterns, but there are two that can match a pattern at the beginning or end of a line. Anchoring a pattern is often necessary when it can occur in more than one place in a line, and we are interested in its occurrence only at a particular location.
grep “^[^2]” emp.lst
It is possible that some of these special characters actually exist as part of the text. In that case we need to escape these characters:
To look for the literal pattern g*, we use g\*
To look for [, we use \[
To look for .*, we use \.\*
# ?include +<stdio.h>
sed – The Stream Editor
sed is a multipurpose tool which combines the work of several filters. sed uses
instructions to act on text. An instruction combines an address for selecting lines, with
an action to be taken on them.
sed options ‘address action’ file(s)
sed supports only the BRE set. The address specifies either one line number to select a single line, or a set of two lines to select a group of contiguous lines. The action specifies printing, inserting, deleting, or substituting text.
Line Addressing
sed '3q' emp.lst
is just similar to head -n 3 emp.lst: it selects the first three lines and quits.
The p command prints the selected lines as well as, by default, all lines. To suppress this default behavior, we use -n whenever we use the p command.
sed –n ‘1,2p
7,9p
$p’ emp.lst
Selecting multiple groups of lines
Using Multiple Instructions (-e and –f)
There is adequate scope for using the -e and -f options whenever sed is used with multiple instructions.
sed –n –e ‘1,2p’ –e ‘7,9p’ –e ‘$p’ emp.lst
Let us consider,
cat instr.fil
1,2p
7,9p
$p
Use the -f option to direct sed to take its instructions from a file:
sed –n –f instr.fil emp.lst
Context Addressing
sed -n '/sa[kx]s*ena/p
/gupta/p' emp.lst
We can also use ^ and $, as part of the regular expression syntax.
sed -n '/50.....$/p' emp.lst
Writing selected lines to a file (w)
We can use the w command to write the selected lines to a separate file. Line addressing is also possible: for example, we can save the first 500 lines in foo1 and the rest in foo2.
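A sketch of w with line addressing (foo.txt, foo1 and foo2 are stand-in names; a six-line file is used here instead of 500 lines):

```shell
# make a six-line stand-in file
printf 'l1\nl2\nl3\nl4\nl5\nl6\n' > foo.txt

# write lines 1-3 to foo1 and the remaining lines to foo2
sed -n '1,3w foo1
4,$w foo2' foo.txt

cat foo1      # l1 l2 l3
cat foo2      # l4 l5 l6
```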
Text Editing
sed supports inserting (i), appending (a), changing (c) and deleting (d) commands
for the text.
$ sed '1i\
> #include <stdio.h>\
> #include <unistd.h>
> ' foo.c > $$
Will add two include lines in the beginning of foo.c file. Sed identifies the line without
the \ as the last line of input. Redirected to $$ temporary file. This technique has to be
followed when using the a and c commands also. To insert a blank line after each line of
the file is printed (double spacing text), we have,
sed 'a\
' emp.lst
sed –n ‘/director/!p’ emp.lst > olist
Selects all lines except those containing director, and saves them in olist
Substitution (s)
Substitution is the most important feature of sed, and this is one job that sed does
exceedingly well.
[address]s/expression1/expression2/flags
Only the first instance of | in a line has been replaced. We need to use the g flag to replace all instances in the line.
sed also uses regular expressions for the patterns to be substituted. To replace all occurrences of agarwal, aggarwal and agrawal with simply Agarwal, we have:
sed 's/[aA]gg*[ar][ar]wal/Agarwal/g' emp.lst
We can also use ^ and $ with the same meaning. To add a 2 prefix to all emp-ids:
sed ‘s/^/2/’ emp.lst | head –n 1
sed ‘s/$/.00/’ emp.lst | head –n 1
2233 | a.k.shukla | gm | sales | 12/12/52 | 6000.00
sed ‘s/<I>/<EM>/g
s/<B>/<STRONG>/g
s/<U>/<EM>/g’ form.html
An instruction processes the output of the previous instruction, as sed is a stream editor
and works on data stream
sed ‘s/<I>/<EM>/g
s/<EM>/<STRONG>/g’ form.html
When a g is used at the end of a substitution instruction, the change is performed globally along the line. Without it, only the leftmost occurrence is replaced. When there is a group of instructions to execute, you should place these instructions in a file instead and use sed with the -f option.
2233|a.k.shukla|g.m|sales|12/12/52|6000
9876|jai sharma|director|production|12/03/50|7000
5678|sumit chakrobarty|dgm|mrking|19/04/43|6000
Consider the below three lines, which do the same job. The // representing an empty regular expression is interpreted to mean that the search and substituted patterns are the same.
Three additional types of expressions are:
The repeated patterns - &
The interval regular expression (IRE) – { }
The tagged regular expression (TRE) – ( )
The repeated pattern - &
The & makes the entire source pattern appear at the destination also.
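A sketch: & stands for whatever the source pattern matched, so a word can be prefixed without retyping it:

```shell
# prefix "The" to the matched word director
echo '9876|jai sharma|director|production' |
    sed 's/director/The &/'
# prints 9876|jai sharma|The director|production
```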
The interval RE - { }
sed and grep use the IRE, which uses an integer to specify how many times the preceding character can occur. The IRE uses an escaped pair of curly braces and takes three forms:
ch\{m\}   the character ch can occur exactly m times
ch\{m,\}  at least m times
ch\{m,n\} between m and n times
The values of m and n can't exceed 255. Let teledir.txt maintain landline and mobile phone numbers. To select only mobile numbers, use the IRE to indicate that a digit can occur 10 times.
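A sketch with a two-line stand-in for teledir.txt (the names and numbers are made up):

```shell
# one 7-digit landline and one 10-digit mobile number
printf 'barun 2345678\nkaruna 9876543210\n' > teledir.txt

# \{10\} matches ten consecutive digits, so only the mobile line shows
grep '[0-9]\{10\}' teledir.txt
# prints karuna 9876543210
```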
The Tagged Regular Expression (TRE)
You have to identify the segments of a line that you wish to extract, and enclose each segment with a matched pair of escaped parentheses. To extract a number, use \([0-9]*\). To extract non-alphabetic characters, use \([^a-zA-Z]*\).
Every grouped pattern automatically acquires the numeric label \n, where n signifies the nth group from the left.
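A sketch of tagging: \(...\) marks the groups and \1, \2 refer back to them, here swapping the two halves of a comma-separated name (hypothetical input):

```shell
echo 'sharma,jai' |
    sed 's/\(.*\),\(.*\)/\2 \1/'
# prints jai sharma
```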
UNIT 6
6. Essential Shell Programming, 6 Hours
Text Book
6. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Definition:
Shell is an agency that sits between the user and the UNIX system.
Description:
Shell is the one which understands all user directives and carries them out. It processes
the commands issued by the user. The content is based on a type of shell called Bourne
shell.
Shell Scripts
When groups of command have to be executed regularly, they should be stored in a file,
and the file itself executed as a shell script or a shell program by the user. A shell
program runs in interpretive mode. It is not compiled into a separate executable file as with a C program; instead each statement is loaded into memory when it is to be executed. Hence shell scripts run slower than programs written in a high-level language. .sh is used as an extension for shell scripts; however, the use of the extension is not mandatory.
Shell scripts are executed in a separate child shell process, which may or may not be the same as the login shell.
Example: script.sh (the echo commands are reconstructed from the output shown below)
#!/bin/sh
# script.sh: Sample Shell Script
echo "Welcome to Shell Programming"
echo "Today's date: `date`"
echo "This month's calendar:"
cal
echo "My Shell: $SHELL"
The # character indicates a comment in a shell script, and all the characters that follow the # symbol are ignored by the shell. However, this does not apply to the first line, which begins with #!. This is because it is the interpreter line, which always begins with #! followed by the pathname of the shell to be used for running the script. In the above example the first line indicates that we are using a Bourne shell.
To run the script we need to first make it executable. This is achieved by using the chmod
command as shown below:
$ chmod +x script.sh
Then invoke the script name as:
$ script.sh
Once this is done, we can see the following output :
Welcome to Shell Programming
Today’s date: Mon Oct 8 08:02:45 IST 2007
This month’s calendar:
October 2007
Su Mo Tu We Th Fr Sa
1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31
My Shell: /bin/sh
As stated above the child shell reads and executes each statement in interpretive mode.
We can also explicitly spawn a child of your choice with the script name as argument:
sh script.sh
Note: Here the script requires neither an executable permission nor an interpreter line.
read: Making Scripts Interactive
The read statement is the shell's internal tool for making scripts interactive (i.e. taking
input from the user). It is used with one or more variables. Inputs supplied with the
standard input are read into these variables. For instance, the use of statement like
read name
causes the script to pause at that point to take input from the keyboard. Whatever is
entered by you will be stored in the variable name.
Example: A shell script that uses read to take a search string and filename from the
terminal.
#!/bin/sh
# emp1.sh: Interactive version, uses read to accept two inputs
#
echo "Enter the pattern to be searched: \c"    # No newline
read pname
echo "Enter the file to be used: \c"           # use echo -e in bash
read fname
echo "Searching for pattern $pname from the file $fname"
grep "$pname" "$fname"
echo "Selected records shown above"
Running the above script, specifying the inputs when the script pauses twice:
$ emp1.sh
Enter the pattern to be searched : director
Enter the file to be used: emp.lst
Searching for pattern director from the file emp.lst
Shell scripts also accept arguments from the command line. Therefore they can be run non-interactively and be used with redirection and pipelines. The arguments are assigned to special shell variables, represented by $1, $2, etc., similar to the C command-line arguments argv[0], argv[1], etc. The following table lists the different shell parameters.
$*     Complete set of positional parameters as a single string
"$@"   Each quoted string treated as a separate argument
$?     Exit status of the last command
$$     PID of the current shell
$!     PID of the last background job
Table: shell parameters
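The parameters in the table can be seen with a throwaway two-line script (params.sh is a hypothetical name):

```shell
# params.sh: print a few special parameters
cat > params.sh <<'EOF'
echo "script: $0, first: $1, second: $2"
echo "count: $#, all: $*"
EOF
sh params.sh one two
# prints
# script: params.sh, first: one, second: two
# count: 2, all: one two
```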
Example 1:
$ cat foo
If foo does not exist, cat returns a nonzero exit status. The shell variable $? stores this status.
Example 2:
The exit status is used to devise program logic that branches into different paths depending on the success or failure of a command.
Logical Operators && and ||
The shell provides two operators that allow conditional execution: && and ||.
Usage:
cmd1 && cmd2
cmd1 || cmd2
&& delimits two commands; cmd2 is executed only when cmd1 succeeds. With ||, cmd2 is executed only when cmd1 fails.
Example1:
$ grep 'director' emp.lst && echo "Pattern found"
Output:
9876  Jai Sharma  Director  Productions
2356  Rohit       Director  Sales
Pattern found
Example 2:
Example 3:
grep “$1” $2 || exit 2
echo “Pattern Found Job Over”
The if Conditional
The if statement makes two way decisions based on the result of a condition. The
following forms of if are available in the shell:

Form 1:
if command is successful
then
    execute commands
fi

Form 2:
if command is successful
then
    execute commands
else
    execute commands
fi

Form 3:
if command is successful
then
    execute commands
elif command is successful
then
    ...
else
    ...
fi

If the command succeeds, the statements within if are executed; otherwise the statements in the else block are executed (if else is present).
Example (the grep step is reconstructed from the runs shown below):
#!/bin/sh
# emp3.sh: checks /etc/passwd for the pattern supplied as argument
if grep "^$1" /etc/passwd ; then
    echo "Pattern Found"
else
    echo "Pattern Not Found"
fi
Output1:
$ emp3.sh ftp
ftp: *.325:15:FTP User:/Users1/home/ftp:/bin/true
Pattern Found
Output2:
$ emp3.sh mail
Pattern Not Found
While: Looping
To carry out a set of instructions repeatedly, the shell offers three features, namely while, until and for.
Syntax:
while condition is true
do
Commands
done
The commands enclosed by do and done are executed repeatedly as long as condition is
true.
vtu
Example:
#!/bin/sh
ans=y
while [ "$ans" = "y" ]
do
    echo "Enter the code and description : \c" > /dev/tty
    read code description
    echo "$code|$description" >> newlist
    echo "Enter any more [Y/N] : \c" > /dev/tty
    read any
    case $any in
        Y* | y*) ans=y ;;
        N* | n*) ans=n ;;
        *)       ans=y ;;
    esac
done
Input:
Enter the code and description : 03 analgestics
Enter any more [Y/N] :y
Enter the code and description : 04 antibiotics
Enter any more [Y/N] : [Enter]
Enter the code and description : 05 OTC drugs
Enter any more [Y/N] : n
Output:
$ cat newlist
03 | analgestics
04 | antibiotics
05 | OTC drugs
test doesn't display any output but simply returns a value that sets the parameter $?.
Numeric Comparison
Operator Meaning
-eq Equal to
-ne Not equal to
-gt Greater than
-ge Greater than or equal to
-lt Less than
-le Less than or equal
Table: Operators
Operators always begin with a - (hyphen), followed by a two-character word, and are enclosed on either side by whitespace. Numeric comparison in the shell is confined to integer values only; decimal values are simply truncated.
Ex:
$ x=5; y=7; z=7.2
$ test $x -lt $y ; echo $?
0        True
The script emp.sh uses test in an if-elif-else-fi construct (Form 3) to evaluate the shell parameter $#:
#!/bin/sh
# emp.sh: using test, $0 and $# in an if-elif-else-fi construct
#
if test $# -eq 0 ; then
    echo "Usage : $0 pattern file" > /dev/tty
elif test $# -eq 2 ; then
    grep "$1" $2 || echo "$1 not found in $2" > /dev/tty
else
    echo "You didn't enter two arguments" > /dev/tty
fi
It displays the usage when no arguments are input, runs grep if two arguments are entered
and displays an error message otherwise.
Run the script four times and redirect the output every time
$emp31.sh>foo
Usage : emp.sh pattern file
$emp31.sh ftp>foo
You didn’t enter two arguments
$emp31.sh henry /etc/passwd>foo
Henry not found in /etc/passwd
$emp31.sh ftp /etc/passwd>foo
ftp:*:325:15:FTP User:/user1/home/ftp:/bin/true
Shorthand for test
[ and ] can be used instead of test. The following two forms are equivalent:
test $x -eq $y
and
[ $x -eq $y ]
String Comparison
Test command is also used for testing strings. Test can be used to compare strings with
the following set of comparison operators as listed below.
Test True if
s1=s2 String s1=s2
s1!=s2 String s1 is not equal to s2
-n stg String stg is not a null string
-z stg String stg is a null string
Example:
#!/bin/sh
# emp1.sh: checks user input for null values, finally runs emp.sh developed previously
#
if [ $# -eq 0 ] ; then
    echo "Enter the string to be searched :\c"
    read pname
    if [ -z "$pname" ] ; then
        echo "You have not entered the string" ; exit 1
    fi
    echo "Enter the filename to be used :\c"
    read flname
    if [ ! -n "$flname" ] ; then
        echo "You have not entered the flname" ; exit 2
    fi
    emp.sh "$pname" "$flname"
else
    emp.sh $*
fi
Output1:
$emp1.sh
Enter the string to be searched :[Enter]
You have not entered the string
Output2:
$ emp1.sh
Enter the string to be searched :root
Enter the filename to be used :/etc/passwd
root:x:0:1:Super-user:/:/usr/bin/bash
When we run the script with arguments emp1.sh bypasses all the above activities and
calls emp.sh to perform all validation checks
$emp1.sh jai
You didn’t enter two arguments
9878|jai sharma|director|sales|12/03/56|70000
Because $* treats jai and sharma as separate arguments, $# makes a wrong argument count. The solution is to replace $* with "$@" (with quotes) and then run the script.
File Tests
test can be used to check various file attributes, like the file type (file, directory or symbolic link) or its permissions (read, write, execute, SUID, etc.).
Example:
$ ls –l emp.lst
-rw-rw-rw- 1 kumar group 870 jun 8 15:52 emp.lst
$ [ -f emp.lst ] ; echo $?        Ordinary file
0
$ [ -x emp.lst ] ; echo $?        Not an executable
1
$ [ ! -w emp.lst ] || echo "False that file is not writable"
False that file is not writable
Example: filetest.sh (the tests after the first condition follow the file-test table below)
#!/bin/sh
#
if [ ! -e $1 ] ; then
    echo "File does not exist"
elif [ ! -r $1 ] ; then
    echo "File is not readable"
elif [ ! -w $1 ] ; then
    echo "File is not writable"
else
    echo "File is both readable and writable"
fi
Output:
$ filetest.sh emp3.lst
Test True if
-f file File exists and is a regular file
-r file File exists and readable
-w file File exists and is writable
-x file File exists and is executable
-d file File exists and is a directory
-s file File exists and has a size greater than zero
-e file File exists (Korn & Bash Only)
-u file   File exists and has SUID bit set
-k file   File exists and has sticky bit set
-L file File exists and is a symbolic link (Korn & Bash Only)
f1 –nt f2 File f1 is newer than f2 (Korn & Bash Only)
f1 –ot f2 File f1 is older than f2 (Korn & Bash Only)
f1 –ef f2 File f1 is linked to f2 (Korn & Bash Only)
Table: file-related Tests with test
The case Conditional
The case statement is the second conditional offered by the shell. It doesn't have a parallel in perl, though C's switch is similar. The statement matches an expression against more than one alternative, and uses a compact construct to permit multiway branching. case also handles string tests, but in a more efficient manner than if.
Syntax:
case expression in
    pattern1) commands1 ;;
    pattern2) commands2 ;;
    pattern3) commands3 ;;
    ...
esac
case first matches expression with pattern1. If the match succeeds, it executes commands1, which may be one or more commands. If the match fails, then pattern2 is matched, and so forth. Each command list is terminated with a pair of semicolons, and the entire construct is closed with esac (reverse of case).
Example:
#!/bin/sh
#
echo " Menu\n
1. List of files\n2. Processes of user\n3. Today's Date
4. Users of system\n5. Quit\nEnter your option: \c"
read choice
case "$choice" in
    1) ls -l ;;
    2) ps -f ;;
    3) date ;;
    4) who ;;
    5) exit ;;
    *) echo "Invalid option"
esac
Output
$ menu.sh
Menu
1. List of files
2. Processes of user
3. Today’s Date
4. Users of system
5. Quit
Enter your option: 3
Mon Oct 8 08:02:45 IST 2007
Note:
case cannot handle relational and file tests, but it matches strings with compact code. It is very effective when the string is fetched by command substitution. case can also handle numbers, but treats them as strings.
case can also specify the same action for more than one pattern, for instance to test a user response for both y and Y (or n and N).
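A sketch of case driven by command substitution, matching the day name cut out of date's output:

```shell
# the first field of date's output is the day name, e.g. Mon
day=`date | cut -d " " -f 1`
case "$day" in
    Sat | Sun) echo "It's the weekend" ;;
    *)         echo "Weekday: $day" ;;
esac
```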
Example:
echo "Do you wish to continue? [y/n]: \c"
read ans
case "$ans" in
    Y | y) ;;
    N | n) exit ;;
esac
Wild-Cards: case uses them:
case has a superb string matching feature that uses wild-cards. It uses the filename
matching metacharacters *, ? and character class (to match only strings and not files in
the current directory).
Example:
case "$ans" in
    [Yy][eE]*) ;;        Matches YES, yes, Yes, yEs, etc.
esac
expr: Computation and String Handling
The Bourne shell uses the expr command to perform computations. This command combines the following two functions:
Performs arithmetic operations on integers
Manipulates strings
Computation:
expr can perform the four basic arithmetic operations (+, -, *, /), as well as modulus (%)
functions.
Examples:
$ x=3 y=5
$ expr 3 + 5
8
$ expr $x - $y
-2
$ expr 3 \* 5        Note: \ is used to prevent the shell from interpreting * as a metacharacter
15
$ expr $y / $x
1
$ expr 13 % 5
3
Example:
$ x=5
$ x=`expr $x + 1`
$ echo $x
6
String Handling:
expr is also used to handle strings. For manipulating strings, expr uses two expressions separated by a colon (:). The string to be worked upon is placed on the left of the colon, and a regular expression on its right. Depending on the composition of the expression, expr can perform the following three functions:
1. Determine the length of the string.
2. Extract a substring.
3. Locate the position of a character in a string.
1. Length of the string:
The regular expression .* matches the entire string, so expr prints the number of characters matched, i.e. the length of the string.
Example 1:
$ expr "abcdefgh" : '.*'
8
Example 2:
while echo "Enter your name: \c" ; do
    read name
    if [ `expr "$name" : '.*'` -gt 20 ] ; then
        echo "Name is very long"
    else
        break
    fi
done
2. Extracting a substring:
expr can extract a string enclosed by the escaped characters \( and \).
Example:
$ st=2007
$ expr "$st" : '..\(..\)'
07                Extracts the last two characters.
3. Locating the position of a character:
expr can return the location of the first occurrence of a character inside a string.
Example:
$ stg=abcdefgh ; expr "$stg" : '[^d]*d'
4                Extracts the position of the character d.
$0: Calling a Script by Different Names
There are a number of UNIX commands that can be used to call a file by different names and do different things depending on the name by which the file is called. $0 can also be used to call a script by different names.
Example:
#!/bin/sh
#
lastfile=`ls -t *.c | head -1`
command=$0
exe=`expr $lastfile : '\(.*\).c'`
case $command in
    *runc) $exe ;;
    *vic)  vi $lastfile ;;
    *comc) cc -o $exe $lastfile &&
           echo "$lastfile compiled successfully" ;;
esac
After saving this script as comc.sh, create three links to it:
ln comc.sh comc
ln comc.sh runc
ln comc.sh vic
Output:
$ comc
hello.c compiled successfully.
While: Looping
To carry out a set of instructions repeatedly, the shell offers three features, namely while, until and for.
Syntax:
while condition is true
do
    commands
done
The commands enclosed by do and done are executed repeatedly as long as the condition is true.
Example:
#!/bin/sh
ans=y
while [ "$ans" = "y" ]
do
    echo "Enter the code and description : \c" > /dev/tty
    read code description
    echo "$code|$description" >> newlist
    echo "Enter any more [Y/N] : \c" > /dev/tty
    read any
    case $any in
        Y* | y*) ans=y ;;
        N* | n*) ans=n ;;
        *)       ans=y ;;
    esac
done
Input:
Enter the code and description : 03 analgestics
Enter any more [Y/N] :y
Enter the code and description : 04 antibiotics
Enter any more [Y/N] : [Enter]
Enter the code and description : 05 OTC drugs
Enter any more [Y/N] : n
Output:
$ cat newlist
03 | analgestics
04 | antibiotics
05 | OTC drugs
for: Looping with a List
Syntax:
for variable in list
do
    commands
done
list here comprises a series of character strings. Each string is assigned to the variable specified.
Example:
Output:
Sources of list:
List from variables: Series of variables are evaluated by the shell before
executing the loop
Example:
$ for var in $PATH $HOME; do echo "$var" ; done
Output:
/bin:/usr/bin:/home/local/bin
/home/user1
Example:
List from wildcards: Here the shell interprets the wildcards as filenames.
Example:
Example: emp.sh
#!/bin/sh
for pattern in "$@"; do
    grep "$pattern" emp.lst || echo "Pattern $pattern not found"
done
Output:
$ emp.sh 9876 "Rohit"
9876  Jai Sharma  Director  Productions
2356  Rohit       Director  Sales
basename: Changing Filename Extensions
Example 1:
$ basename /home/user1/test.pl
Output:
test.pl
basename can also strip a suffix: when a second argument naming the suffix is supplied, basename removes it from the end of the first.
Example 2:
Output:
test2
The set statement assigns positional parameters $1, $2 and so on, to its arguments. This is used for picking up individual fields from the output of a program.
Example 1:
$ set 9876 2345 6213
$
This assigns the value 9876 to the positional parameters $1, 2345 to $2 and 6213 to $3. It
also sets the other parameters $# and $*.
Example 2:
$ set `date`
$ echo $*
vtu
Mon Oct 8 08:02:45 IST 2007
Example 3:
shift: shifting arguments left
shift transfers the contents of a positional parameter to its immediate lower-numbered one. This is done as many times as the statement is called. When called once, $2 becomes $1, $3 becomes $2 and so on.
Example 1:
$ echo “$@” $@ and $* are interchangeable
Mon Oct 8 08:02:45 IST 2007
$ echo $1 $2 $3
Mon Oct 8
$shift
$echo $1 $2 $3
Mon Oct 8 08:02:45
$ shift 2            Shifts 2 places
$ echo $1 $2 $3
08:02:45 IST 2007
Example 2: emp.sh
#!/bin/sh
case $# in
    0|1) echo "Usage: $0 file pattern(s)" ; exit ;;
    *)   fname=$1
         shift
         for pattern in "$@" ; do
             grep "$pattern" $fname || echo "Pattern $pattern not found"
         done ;;
esac
Output:
$ emp.sh emp.lst
Usage: emp.sh file pattern(s)
In order for set to interpret a leading - and null output produced by UNIX commands, the -- option is used. If -- is not used, a - in the output is treated as an option and set will interpret it wrongly. In the case of null output, all shell variables are displayed instead of null.
Example:
$set `ls –l chp1`
Output:
-rwxr-xr-x: bad options
Example2:
$set `grep usr1 /etc/passwd`
The corrections to be made to get the correct output are:
$set -- `ls –l chp1`
$set -- `grep usr1 /etc/passwd`
The here document (<<)
Example:
mailx kumar << MARK
... three lines of message text, one of them embedding `date` ...
MARK
The string MARK is the delimiter. The shell treats every line following the command and delimited by MARK as input to the command. kumar at the other end will see three lines of message text with the date inserted by the command. The word MARK itself doesn't show up.
A shell script can be made to work non-interactively by supplying inputs through here
document.
Example:
m
$ search.sh << END
> director
>emp.lst
>END
Output:
Enter the pattern to be searched: Enter the file to be used: Searching for director from file
emp.lst
Normally, shell scripts terminate whenever the interrupt key is pressed. This is not good programming practice, because a lot of temporary files may be left behind on disk. The trap statement lets you specify the things you want done when a script receives a signal. The trap statement is normally placed at the beginning of the shell script and uses two lists:
trap 'command_list' signal_list
When a script is sent any of the signals in signal_list, trap executes the commands in command_list. The signal list can contain the integer values or names (without the SIG prefix) of one or more signals, the ones used with the kill command.
Example: To remove all temporary files named after the PID number of the shell:
trap 'rm $$* ; echo "Program Interrupted" ; exit' HUP INT TERM
trap is a signal handler. It first removes all files expanded from $$*, echoes a message
and finally terminates the script when signals SIGHUP (1), SIGINT (2) or SIGTERM(15)
are sent to the shell process running the script.
A script can also be made to ignore the signals by using a null command list.
Example:
trap '' 1 2 15
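A minimal sketch of trap as a cleanup handler; signal 0 (exit) is used here so the handler fires without an interrupt key, and cleanup.sh is a hypothetical name:

```shell
cat > cleanup.sh <<'EOF'
#!/bin/sh
# remove the PID-named temp file and announce cleanup on any exit
trap 'rm -f /tmp/junk$$ ; echo "Cleaned up"' 0
touch /tmp/junk$$
echo "Working..."
EOF
sh cleanup.sh
# prints Working... then Cleaned up
```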
Programs
1)
#!/bin/sh
IFS="|"
while echo "enter dept code:\c"; do
    read dcode
    set -- `grep "^$dcode" << limit
01|ISE|22
02|CSE|45
03|ECE|25
04|TCE|58
limit`
    case $# in
        3) echo "dept name :$2 \n emp-id :$3\n" ;;
        *) echo "invalid code" ; continue ;;
    esac
done
Output:
$valcode.sh
Enter dept code:88
Invalid code
2)
#!/bin/sh
x=1
while [ $x -le 10 ]; do
    echo "$x"
    x=`expr $x + 1`
done
This script prints its arguments and computes their sum:
#!/bin/sh
sum=0
for I in "$@" ; do
    echo "$I"
    sum=`expr $sum + $I`
done
echo "sum is $sum"
3)
#!/bin/sh
for I in `cat list`; do
    echo "string is $I"
    x=`expr "$I" : '.*'`
    echo "length is $x"
done
4)
This is a non-recursive shell script that accepts any number of arguments and prints them
in a reverse order.
#!/bin/sh
if [ $# -lt 2 ]; then
echo "please enter 2 or more arguments"
exit
fi
for x in "$@"
do
y=$x" "$y
done
echo "$y"
Run1:
[root@localhost shellprgms]# sh sh1a.sh 1 2 3 4 5 6 7
7 6 5 4 3 2 1
Run2:
[root@localhost shellprgms]# sh sh1a.sh this is an argument
argument an is this
5)
The following shell script accepts 2 file names and checks whether the permissions of these files
are identical; if they are not identical, it outputs each filename followed by its permissions.
#!/bin/sh
if [ $# -lt 2 ]
then
echo "invalid number of arguments"
exit
fi
str1=`ls -l $1|cut -c 2-10`
str2=`ls -l $2|cut -c 2-10`
if [ "$str1" = "$str2" ]
then
echo "the file permissions are the same: $str1"
else
echo " Different file permissions "
fi
6) This shell script finds the largest file in the directory given as argument.
#!/bin/sh
if [ $# -gt 2 ]
then
echo "usage: sh flname dir"
exit
fi
if [ -d $1 ]
then
ls -lR $1|grep -v ^d|cut -c 34-43,56-69|sort -n|tail -1>fn1
echo "file name is `cut -c 10- fn1`"
echo " the size is `cut -c -9 fn1`"
else
echo "invalid dir name"
fi
7) This shell script accepts valid log-in names as arguments and prints their
corresponding home directories. If no arguments are specified, it prints a suitable error
message.
#!/bin/sh
if [ $# -lt 1 ]
then
echo " Invalid Arguments....... "
exit
fi
for x in "$@"
do
grep -w "^$x" /etc/passwd | cut -d ":" -f 1,6
done
Run1:
[root@localhost shellprgms]# sh 4a.sh root
root:/root
Run2:
[root@localhost shellprgms]# sh 4a.sh
Invalid Arguments.......
8) This shell script finds and displays all the links of a file specified as the first argument
to the script. The second argument, which is optional, can be used to specify the directory
in which the search is to begin. If this second argument is not present, the search
begins in the current working directory.
#!/bin/bash
if [ $# -eq 0 ]
then
echo "Usage: sh 8a.sh [file1] [dir1(optional)]"
exit
fi
if [ -f $1 ]
then
dir="."
if [ $# -eq 2 ]
then
dir=$2
fi
inode=`ls -i $1 | awk '{print $1}'`
echo "Hard links of $1 are"
find $dir -inum $inode -print
echo "Soft links of $1 are"
find $dir -lname "*$1" -print
fi
Run1:
[root@localhost shellprgms]$ sh 5a.sh hai.c
Hard links of hai.c are
./hai.c
Soft links of hai.c are
./hai_soft
9) This shell script displays the calendar for the current month with the current date replaced by
* or ** depending on whether the date has one digit or two digits.
#!/bin/bash
n=`date +%d`
echo " Today's date is : `date +%d%h%y` ";
cal > calfile
if [ $n -gt 9 ]
then
sed "s/$n/\**/g" calfile
else
sed "s/$n/\*/g" calfile
fi
Output (last lines):
22 23 24 25 26 27 28
29 30 31
10) This shell script implements terminal locking. It prompts the user for a password and, after
accepting it, prompts for confirmation. If the two match, it locks the terminal and asks for the password;
when the password matches, the terminal is unlocked.
trap '' 1 2 3 5 20
clear
echo -e "\nenter password to lock terminal:"
stty -echo
read keynew
stty echo
echo -e "\nconfirm password:"
stty -echo
read keyold
stty echo
if [ $keyold = $keynew ]
then
echo "terminal locked!"
while [ 1 ]
do
echo "retype the password to unlock:"
stty -echo
read key
if [ $key = $keynew ]
then
stty echo
echo "terminal unlocked!"
stty sane
exit
fi
echo "invalid password!"
done
else
echo "passwords do not match!"
fi
Run:
[root@localhost shellprgms]# sh 13.sh
enter password:
confirm password:
terminal locked!
retype the password to unlock:
invalid password!
retype the password to unlock:
terminal unlocked!
-------------------------------------------------------------------------------------------------
UNIT 7
awk – An Advanced Filter
Text Book
7. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Simple awk Filtering
awk is not just a command, but a programming language too. In other words, the awk utility
is a pattern scanning and processing language. It searches one or more files to see if they
contain lines that match specified patterns and then performs associated actions, such as
writing the line to the standard output or incrementing a counter each time it finds a
match.
Syntax:
awk option 'selection_criteria {action}' file(s)
Here, selection_criteria filters input and selects lines for the action component to act
upon. The selection_criteria is enclosed within single quotes and the action within
curly braces. Together, the selection_criteria and action form an awk program.
Example: $ awk '/manager/ { print }' emp.lst
In the above example, /manager/ is the selection_criteria which selects lines that are
processed in the action section i.e. {print}. Since the print statement is used without any
field specifiers, it prints the whole line.
Note: If no selection_criteria is used, then the action applies to all lines of the file.
Since printing is the default action of awk, any one of the following three forms can be
used:
awk '/manager/' emp.lst
awk '/manager/ { print }' emp.lst
awk '/manager/ { print $0 }' emp.lst      # $0 specifies the complete line
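To see the default action concretely, here is a self-contained run (the emp.lst records below are made-up sample data, not the book's file):

```shell
# Build a small pipe-delimited emp.lst (illustrative sample data)
cat > emp.lst <<'EOF'
2356|Rohit|manager|sales
5683|Rakesh|manager|marketing
1234|Rahul|clerk|accounts
EOF

# Default action: print every record that contains "manager"
awk '/manager/' emp.lst
```

This prints the two manager records and skips the clerk's line.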
awk uses regular expressions in sed style for pattern matching.
Example: $ awk -F "|" '/[mM]anager/ { print $1, $2, $3, $4 }' emp.lst
Output:
2356 Rohit Manager Sales
5683 Rakesh Manager Marketing
In the above example, the comma (,) is used to delimit field specifications to ensure that each
field is separated from the other by a space, so that the program produces readable
output.
Note: We can also specify the number of lines we want using the built-in variable NR as
illustrated in the following example:
Example: awk -F "|" 'NR==2, NR==4 { print NR, $2, $3, $4 }' emp.lst
Output:
2 Jai Sharma Manager Productions
3 Rahul Accountant Productions
4 Rakesh Clerk Productions
printf: Formatting Output
The printf statement can be used with awk to format the output. awk accepts most of
the formats used by the printf function of C.
Example: awk -F "|" '/[kK]u?[ar]/ { printf "%3d %-20s %-12s\n", NR, $2, $3 }' emp.lst
Output:
  4 R Kumar              Manager
  8 Sunil kumaar         Accountant
  4 Anil Kummar          Clerk
Here, the name and designation have been printed in fields 20 and 12 characters wide
respectively.
The print and printf statements can be separately redirected with the > and | symbols. Any
command or filename that follows these redirection symbols should be enclosed within
double quotes.
Example 1: use of |
printf "%3d %-20s %-12s\n", NR, $2, $3 | "sort"
Variables and expressions can be used with awk as with any programming language.
Here, an expression consists of strings, numbers and variables combined by operators.
Example: (x+2)*y, x-15, x/y, etc.
Note: awk does not have any data types, and every expression is interpreted either as a
string or a number. However, awk has the ability to make conversions whenever required.
A variable is an identifier that references a value. To define a variable, you only have to
name it and assign it a value. The name can only contain letters, digits, and underscores,
and may not start with a digit. Case distinctions in variable names are important: Salary
and salary are two different variables. awk allows the use of user-defined variables
without declaring them, i.e. variables are deemed to be declared when they are used for
the first time.
Example: X = "4"
x = "3"
print X        # prints 4
print x        # prints 3
Strings in awk are enclosed within double quotes and can contain any character. awk
strings can include escape sequences, octal values and even hex values. Octal values are
preceded by \ and hex values by \x. Strings that do not consist of numbers have a numeric
value of 0.
Example 1: z = "Hello"
print z        # prints Hello
String concatenation can also be performed. awk does not provide any operator for this;
however, strings can be concatenated by simply placing them side by side.
Example 2: x = "UNIX"
y = "LINUX"
print x "&" y  # prints UNIX&LINUX
A numeric and a string value can also be concatenated.
Example: l = "8" ; m = 2 ; n = "Hello"
print l m      # prints 82 by converting m to a string
print l - m    # prints 6 by converting l to a number
print m + n    # prints 2 by converting n to numeric 0
Expressions also have true and false values associated with them. A nonempty string or
any positive number has a true value.
awk also provides the comparison operators >, <, >=, <=, ==, !=, etc.
Example: awk -F "|" '$3 != "manager" && $3 != "chairman" { print $2, $3, $6 }' emp.lst
Output:
Rakesh Clerk 6000
The above example illustrates the use of the != and && operators. Here all the employee
records other than those of the manager and chairman are displayed.
~ and !~ : The Regular Expression Operators
Note:
The operators ~ and !~ work only with field specifiers like $1, $2, etc.
For instance, to locate g.m.s, a simple match like $3 ~ /g.m./ does not display the expected output,
because the string g.m. is embedded in d.g.m. or c.g.m.
To avoid such unexpected output, awk provides the two anchors ^ and $ that indicate the
beginning and end of the field respectively. So the above match should be modified
to anchor the pattern at the start of the field: $3 ~ /^g\.m/.
The following table depicts the comparison and regular expression matching operators.
Operator   Significance
<          Less than
<=         Less than or equal to
==         Equal to
!=         Not equal to
>=         Greater than or equal to
>          Greater than
~          Matches a regular expression
!~         Does not match a regular expression
Table 1: Comparison and regular expression matching operators.
Number Comparison:
awk has the ability to handle numbers (integer and floating type). Relational tests or
comparisons can also be performed on them.
Example: awk -F "|" '$6 > 7500 { print $2, $3, $6 }' emp.lst
In the above example, the details of employees getting a salary greater than 7500 are
displayed.
Example: awk -F "|" '$6 > 7500 || $5 ~ /1980$/ { print $2, $3, $5, $6 }' emp.lst
In the above example, the details of employees getting a salary greater than 7500, or whose
year of birth is 1980, are displayed.
Number Processing
Numeric computations can be performed in awk using the arithmetic operators +, -, /,
*, % (modulus). One of the main features of awk w.r.t. number processing is that it can
handle even decimal numbers, which is not possible in the shell.
Example: $ awk -F "|" '$3 == "manager" {
> printf "%-20s %-12s %d %d\n", $2, $3, $5, $5*0.4 }' emp.lst
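A quick way to see awk's decimal handling (the figures are sample values): unlike expr, awk computes fractional results directly:

```shell
# expr cannot handle decimals, but awk can: compute 40% of each amount
printf '7500\n6000\n' | awk '{ printf "%.2f\n", $1 * 0.4 }'
```

This prints 3000.00 and 2400.00, one per line.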
Variables
awk allows the user to use variables of their choice. You can now print a serial number,
using the variable kount, and apply it to those directors drawing a salary exceeding 6700:
$ awk -F "|" '$3 == "director" && $6 > 6700 {
kount = kount + 1
printf "%3d %-20s %-12s %d\n", kount, $2, $3, $6 }' empn.lst
The initial value of kount was 0 (by default). That's why the first line is correctly
assigned the number 1. awk also accepts the C-style incrementing forms:
kount++
kount += 2
printf "%3d\n", ++kount
THE -f OPTION: STORING awk PROGRAMS IN A FILE
You should hold large awk programs in a separate file and provide them with the
.awk extension for easier identification. Let's first store the previous program in the file
empawk.awk:
$ cat empawk.awk
Observe that this time we haven't used quotes to enclose the awk program. You
can now use awk with the -f filename option to obtain the same output:
awk -F "|" -f empawk.awk empn.lst
If you have to print something before processing the first line, for example a heading, then the BEGIN
section can be used gainfully. Similarly, the END section is useful in printing some totals
after processing is over.
The BEGIN and END sections are optional and take the form
BEGIN {action}
END {action}
These two sections, when present, are delimited by the body of the awk program. You
can use them to print a suitable heading at the beginning and the average salary at the
end. Store this program in a separate file, empawk2.awk.
Like the shell, awk also uses the # for providing comments. The BEGIN section
prints a suitable heading, offset by two tabs (\t\t), while the END section prints the
average pay (tot/kount) for the selected lines. To execute this program, use the -f option:
awk -F "|" -f empawk2.awk empn.lst
Like all filters, awk reads standard input when the filename is omitted. We can make awk
behave like a simple scripting language by doing all work in the BEGIN section. This is
how you perform floating point arithmetic:
$ awk 'BEGIN { printf "%f\n", 22/7 }'
3.142857
This is something that you can't do with expr. Depending on the version of awk, the
prompt may or may not be returned, which means that awk may still be reading
standard input. Use [Ctrl-d] to return the prompt.
BUILT-IN VARIABLES
awk has several built-in variables. They are all assigned automatically, though it
is also possible for a user to reassign some of them. You have already used NR, which
signifies the record number of the current line. We'll now have a brief look at some of the
other variables.
The FS Variable: as stated elsewhere, awk uses a contiguous string of spaces as the
default field delimiter. FS redefines this field separator, which in the sample database
happens to be the |. When used at all, it must occur in the BEGIN section so that the body
of the program knows its value before it starts processing:
BEGIN { FS="|" }
This is an alternative to the -F option, which does the same thing.
The OFS Variable: when you used the print statement with comma-separated arguments,
each argument was separated from the other by a space. This is awk's default output field
separator, and it can be reassigned using the variable OFS in the BEGIN section:
BEGIN { OFS="~" }
When you reassign this variable with a ~ (tilde), awk will use this character for delimiting
the print arguments. This is a useful variable for creating lines with delimited fields.
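A one-line sketch of the effect of reassigning OFS (sample input only):

```shell
# OFS replaces the space that print normally puts between comma-separated arguments
echo "a b c" | awk 'BEGIN { OFS="~" } { print $1, $2, $3 }'
```

This prints a~b~c instead of a b c.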
The NF Variable: NF comes in quite handy for cleaning up a database of lines that don't
contain the right number of fields. By using it on a file, say emp.lst, you can locate those
lines not having 6 fields, which have crept in due to faulty data entry:
awk -F "|" 'NF != 6' emp.lst
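A minimal sketch of this cleanup technique, using a three-field sample stream instead of emp.lst:

```shell
# NF counts the fields in each record; records with the wrong count are flagged
printf 'one|two|three\nbad|line\n' | awk -F "|" 'NF != 3'
```

Only the malformed second record (bad|line) is printed.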
The FILENAME Variable: FILENAME stores the name of the current file being
processed. Like grep and sed, awk can also handle multiple filenames in the command
line. By default, awk doesn't print the filename, but you can instruct it to do so.
With FILENAME, you can devise logic that does different things depending on the file
that is processed.
ARRAYS
An array is also a variable except that this variable can store a set of values or
elements. Each element is accessed by a subscript called the index. awk arrays are
different from the ones used in other programming languages in many respects:
They are not formally defined. An array is considered declared the
moment it is used.
Array elements are initialized to zero or an empty string unless initialized
explicitly.
Arrays expand automatically.
The index can be virtually anything: it can even be a string.
In the program empawk3.awk, we use arrays to store the totals of the basic pay, da, hra
and gross pay of the sales and marketing people. Assume that the da is 25%, and hra 50%,
of basic pay. Use the tot[] array to store the totals of each element of pay, and also the
gross pay:
Note that this time we didn't match the patterns sales and marketing specifically in a field.
We could afford to do that because the patterns occur only in the fourth field, and there's
no scope here for ambiguity. When you run the program, it outputs the average of the two
elements of pay.
C programmers will find the syntax quite comfortable to work with, except that awk
simplifies a number of things that require explicit specifications in C. There are no type
declarations, no initialization and no statement terminators.
Associative arrays
Even though we used integers as subscripts in the tot[] array, awk doesn't treat
array indexes as integers. awk arrays are associative, where information is held as key-
value pairs. The index is the key that is saved internally as a string. When we set an array
element using mon[1]="mon", awk converts the number 1 to a string. There's no
specified order in which the array elements are stored. As the following example suggests,
the index "1" is different from "01":
$ awk 'BEGIN {
mon[1] = "jan" ; mon["1"] = "JAN" ; mon["01"] = "January" ;
printf("mon[\"1\"] is also %s\n", mon["1"]);
printf("But mon[\"01\"] is %s\n", mon["01"]);
}'
mon["1"] is also JAN
But mon["01"] is January
There are two important things to be learned from this output. First, the setting with index
"1" overwrites the setting made with index 1; accessing an array element with subscript 1
or "1" actually locates the element with subscript "1". Also note that mon["1"] is
different from mon["01"].
FUNCTIONS
awk has several built-in functions, performing both arithmetic and string
operations. The arguments are passed to a function in C-style, delimited by commas and
enclosed by a matched pair of parentheses. Even though awk allows the use of functions with
and without parentheses (like printf and printf()), POSIX discourages the use of functions
without parentheses.
Some of these functions take a variable number of arguments, and one (length) uses no
arguments as a variant form. The functions are adequately explained here so you can
confidently use them in perl, which often uses identical syntaxes.
There are two arithmetic functions which a programmer will expect awk to offer. int
calculates the integral portion of a number (without rounding off), while sqrt calculates the
square root of a number. awk also has some of the common string handling functions you
can hope to find in any language. These are:
length: it determines the length of its argument, and if no argument is present, the entire
line is assumed to be the argument. You can use length (without any argument) to locate
lines whose length exceeds 1024 characters:
awk -F "|" 'length > 1024' empn.lst
You can use length with a field as well. The following program selects those people who
have short names:
awk -F "|" 'length($2) < 11' empn.lst
index(s1, s2): it determines the position of a string s2 within a larger string s1. This
function is especially useful in validating single-character fields. If a field takes the
values a, b, c, d or e, you can use this function to find out whether this single-character
field can be located within the string abcde:
x = index("abcde", "b")
This returns the value 2.
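The three string functions covered so far can be tried together in a BEGIN block (sample strings only):

```shell
awk 'BEGIN {
    print length("sengupta")          # length of a string: 8
    print index("abcde", "b")         # position of "b" in "abcde": 2
    print substr("11/05/1947", 7, 4)  # 4 characters starting at position 7: 1947
}'
```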
substr(stg, m, n): it extracts a substring from a string stg. m represents the starting point
of extraction and n indicates the number of characters to be extracted. Because string
values can also be used for computation, the returned string from this function can be
used to select those born between 1946 and 1951:
awk -F "|" 'substr($5, 7, 2) > 45 && substr($5, 7, 2) < 52' empn.lst
2365|barun sengupta|director|personel|11/05/47|7800|2365
3564|sudhir ararwal|executive|personnel|06/07/47|7500|2365
4290|jaynth Choudhury|executive|production|07/09/50|6000|9876
9876|jai sharma|director|production|12/03/50|7000|9876
You can never get this output with either sed or grep, because regular expressions can
never match the numbers between 46 and 51. Note that awk does indeed possess a
mechanism of identifying the type of expression from its context. It identified the date
field string for using substr and then converted it to a number for making a numeric
comparison.
split(stg, arr, ch): it breaks up a string stg on the delimiter ch and stores the fields in an
array arr[]. Here's how you can convert the date field to the format YYYYMMDD.
You can also do it with sed, but this method is superior because it explicitly picks up the
fifth field, whereas sed would transform the only date field that it finds.
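The date conversion mentioned above can be sketched as follows (the century prefix 19 is an assumption for this sample data):

```shell
# split() breaks dd/mm/yy on "/" into an array; reassemble as YYYYMMDD
echo "12/03/50" | awk '{ split($1, d, "/"); print "19" d[3] d[2] d[1] }'
```

For the sample date 12/03/50 this prints 19500312.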
system: you may want to print the system date at the beginning of the report. For running a
UNIX command within awk, you'll have to use the system function. Here are two
examples:
BEGIN {
system("tput clear")     # Clears the screen
system("date")           # Executes the UNIX date command
}
awk has practically all the features of a modern programming language. It has
conditional structures (the if statement) and loops (while and for). They all execute a body
of statements depending on the success or failure of the control command. This is simply
a condition that is specified in the first line of the construct.
Function              Description
int(x)                returns the integer value of x
sqrt(x)               returns the square root of x
length                returns the complete length of the line
length(x)             returns the length of x
substr(stg, m, n)     returns the portion of string of length n, starting from position m in string stg
index(s1, s2)         returns the position of string s2 in string s1
split(stg, arr, ch)   splits string stg into array arr using ch as delimiter; returns the number of fields
system("cmd")         runs the UNIX command cmd and returns its exit status
The if statement can be used when the && and || are found to be inadequate for
certain tasks. Its behavior is well known to all programmers. The statement here takes the
form:
if (condition is true) {
statement
} else {
statement
}
Like in C, none of the control flow constructs need to use curly braces if there's
only one statement to be executed. But when there are multiple actions to take, the
statements must be enclosed within a pair of curly braces. Moreover, the control command
must be enclosed in parentheses.
Most of the addresses that have been used so far reflect the logic normally used in
the if statement. In a previous example, you selected lines where the basic pay
exceeded 7500, by using the condition as the selection criteria:
$6 > 7500 {
An alternative form of this logic places the condition inside the action component
rather than the selection criteria. But this form requires the if statement:
{ if ($6 > 7500) print }
if can be used with the comparison operators and the special symbols ~ and !~ to match a
regular expression. When used in combination with the logical operators || and &&, awk
programming becomes quite easy and powerful. Some of the earlier pattern matching
expressions are rephrased in the following, this time in the form used by if:
if ( NR >= 3 && NR <= 6 )
if ( $3 == "director" || $3 == "chairman" )
if ( $3 ~ /^g.m/ )
if ( $2 !~ /[aA]gg?[ar]+wal/ )
if ( $2 ~ /[cC]ho[wu]dh?ury|sa[xk]s?ena/ )
To illustrate the use of the optional else statement, let's assume that the dearness
allowance is 25% of basic pay when the latter is less than 6000, and 1000 otherwise. The
if-else structure that implements this logic looks like this:
if ( $6 < 6000 )
da = 0.25*$6
else
da = 1000
You can even replace the above if construct with a compact conditional structure:
da = $6 < 6000 ? 0.25*$6 : 1000
This is the form that C and perl use to implement the logic of a simple if-else.
When each branch holds multiple statements, they must be enclosed in curly braces:
if ( $6 < 6000 ) {
hra = 0.50*$6
da = 0.25*$6
} else {
hra = 0.40*$6
da = 1000
}
awk supports two loops – for and while. They both execute the loop body as long
as the control command returns a true value. for has two forms. The easier one
resembles its C counterpart. A simple example illustrates the first form:
This form consists of three components; the first component initializes the value of k,
the second checks the condition with every iteration, while the third sets the increment
used for every iteration. for is useful for centering text, and the following example uses
awk with echo in a pipeline to do that:
$ echo "
> Income statement\nfor\nthe month of august, 2002\nDepartment : Sales" |
> awk '{ for (k=1 ; k < (55 - length($0))/2 ; k++)
> printf "%s", " "
> printf "%s\n", $0 }'
Income statement
for
the month of August, 2002
Department : Sales
The loop here uses the first printf statement to print the required number of spaces (page
width assumed to be 55). The line is then printed with the second printf statement,
which falls outside the loop. This is a useful routine which can be used to center titles
that normally appear at the beginning of a report.
The second form of for is used with associative arrays:
for ( k in array )
commands
Here, k is the subscript of the array. Because k can also be a string, we can use this
loop to print all environment variables. We simply have to pick up each subscript of the
ENVIRON array:
$ nawk 'BEGIN {
> for ( key in ENVIRON )
> print key "=" ENVIRON[key]
> }'
LOGNAME=praveen
MAIL=/var/mail/Praveen
PATH=/usr/bin::/usr/local/bin::/usr/ccs/bin
TERM=xterm
HOME=/home/praveen
SHELL=/bin/bash
Because the index is actually a string, we can use any field as index. We can even use
elements of the array as counters. Using our sample database, we can display the count of
the employees, grouped according to the designation (the third field). You can use the
string value of $3 as the subscript of the array kount[]:
$ awk -F "|" '{ kount[$3]++ }
> END { for ( desig in kount )
> print desig, kount[desig] }' empn.lst
g.m 4
chairman 1
executive 2
director 4
manager 2
d.g.m 2
The program here analyzes the database to present a break-up of the employees, grouped on their
designation. The array kount[] takes as its subscript the non-numeric values g.m., chairman,
executive, etc. for is invoked in the END section to print the subscript (desig) and the
number of occurrences of the subscript (kount[desig]). Note that you don't need to sort the
input file to print the report!
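The grouping technique above can be sketched on a three-record sample stream (the output is piped through sort only because for-in returns subscripts in no specified order):

```shell
# Count records grouped by the third field, without sorting the input
printf 'a|x|manager\nb|y|clerk\nc|z|manager\n' |
awk -F "|" '{ kount[$3]++ }
END { for (desig in kount) print desig, kount[desig] }' | sort
```

This prints clerk 1 and manager 2.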
The while loop has a simpler role: it repeatedly executes the loop body as long as its control
command succeeds. For example, the previous for loop used for centering text can be
easily replaced with a while construct:
k=0
while (k < (55 - length($0))/2) {
printf "%s", " "
k++
}
print $0
The loop here prints a space and increments the value of k with every iteration. The
condition (k < (55 - length($0))/2) is tested at the beginning of every iteration, and the
loop body is executed only if the test succeeds. In this way, the line is filled with a string of
spaces before the text is printed with print $0.
Note that the length function has been used with an argument ($0), which awk understands
to be the entire line. Since length, in the absence of arguments, uses the entire line
anyway, $0 can be omitted. Similarly, print $0 may also be replaced by simply print.
Programs
1) awk script to delete duplicate lines in a file.
BEGIN { i=1; }
{
flag=1;
for(j=1; j<i && flag; j++)
{
if( x[j] == $0 )
flag=0;
}
if(flag)
{
x[i]=$0;
printf "%s\n", x[i];
i++;
}
}
Run1:
Input:
world
world
hello
this
is
this
Output:
world
hello
this
is
2) awk script to print the transpose of a matrix.
BEGIN{
system("tput clear")
count=0
}
{
split($0,a);
for(j=1;j<=NF;j++)
{ count=count+1
arr[count]=a[j]
}
K=NF
}
END{
printf("Transpose\n");
for(j=1;j<=K;j++)
{
for(i=j;i<=count;i+=K)
printf("%s ", arr[i]);
printf("\n");
}
}
Output:
Transpose
1 4
2 5
3 6
3) awk script that folds long lines into 40 columns. Thus any line that exceeds 40
characters must be broken after the 40th, and is to be continued with the residue. The input
is to be supplied through a text file created by the user.
BEGIN{
start=1; }
{ len=length;
for(i=$0; length(i)>40; len-=40)
{
print substr(i,1,40) "\\"
i=substr(i,41,len);
}
print i; }
Run1:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaaaaa
aaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\
aaaaaaaaa
4) This is an awk program to provide extra spaces at the end of the line so that the line
length is maintained as 127.
awk '{ y=127 - length($0)
printf "%s", $0
if(y > 0)
for(i=0;i<y;i++)
printf "%s", " "
printf "\n"
}' foo
5) A file contains a fixed number of fields in the form of space-delimited numbers. This is
an awk program to print each line along with the total of its row.
awk '{ split($0,a)
for (i=1;i<=NF;i++) {
row[NR]+=a[i]
}
printf "%s ", $0
printf "%d\n", row[NR]
}' foo
----------------------------------------------------------------------------------
UNIT 8
Text Book
8. “UNIX – Concepts and Applications”, Sumitabha Das, 4th Edition, Tata McGraw
Hill, 2006.
(Chapters 1.2, 2, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19).
Reference Books
UNIX and Shell Programming, Behrouz A. Forouzan and Richard F. Gilberg, Thomson,
2005.
Unix & Shell Programming, M.G. Venkateshmurthy, Pearson Education, 2005.
Perl – The Master Manipulator
Introduction
The following sections tell you what Perl is, the variables and operators in perl, the string
handling functions. The chapter also discusses file handling in perl as also the lists, arrays
and associative arrays (hashes) that have made perl a popular scripting language. One or
two lines of code in perl accomplish what many lines of code do in a high-level language. We
finally discuss writing subroutines in perl.
Objectives
perl preliminaries
The chop function
Variables and Operators
String handling functions
Specifying filenames in a command line
$_(Default Variable)
$. (Current Line Number) and .. (The Range Operator)
Lists and Arrays
ARGV[]: Command Line Arguments
foreach: Looping Through a List
split: Splitting into a List or Array
join: Joining a List
dec2bin.pl: Converting a Decimal Number to Binary
grep: Searching an Array for a Pattern
Associative Arrays
Regular Expressions and Substitution
File Handling
Subroutines
Conclusion
1. Perl preliminaries
Perl: Perl stands for Practical Extraction and Reporting Language. The language was
developed by Larry Wall.
Perl is a simple yet useful programming language that provides the convenience of shell
scripts and the power and flexibility of high-level programming languages. Perl programs
are interpreted and executed directly, just as shell scripts are; however, they also contain
control structures and operators similar to those found in the C programming language.
This gives you the ability to write useful programs in a very
short time.
A perl program runs in a special interpretive model; the entire script is compiled
internally in memory before being executed. Script errors, if any, are generated before
execution. Unlike awk, printing isn't perl's default action. Like C, all perl statements end
with a semicolon. Perl statements can either be executed on the command line with the -e
option or placed in .pl files. In Perl, anytime a # character is recognized, the rest of the
line is treated as a comment.
#!/usr/bin/perl
# Script: sample.pl – Shows the use of variables
#
print("Enter your name: ");
$name=<STDIN>;
print("Enter a temperature in Centigrade: ");
$centigrade=<STDIN>;
$fahr=$centigrade*9/5 + 32;
print "The temperature in Fahrenheit is $fahr\n";
print "Thank you $name for using this program.";
There are two ways of running a perl script. One is to assign execute (x) permission to
the script file (chmod +x filename) and run it by specifying the script filename. The other is to
use the perl interpreter at the command line followed by the script name. In the second case,
we don't have to use the interpreter line viz., #!/usr/bin/perl.
2. The chop function
When you run the above program, the variable $name will contain the input entered as well as the newline
character that was entered by the user. In order to remove the \n from the input variable,
we use chop($name).
Example: chop($var); will remove the last character contained in the string specified by
the variable var.
Note that you should use the chop function whenever you read a line from the keyboard or a
file, unless you deliberately want to retain the newline character.
Comparison Operators
Perl supports operators similar to C for performing numeric comparison. It also provides
operators for performing string comparison, unlike C where we have to use either
strcmp() or strcmpi() for string comparison. They are listed next.
Numeric comparison    String comparison
==                    eq
!=                    ne
>                     gt
<                     lt
>=                    ge
<=                    le
Concatenating and Repeating Strings
Perl provides three operators that operate on strings:
The . operator, which joins two strings together;
The x operator, which repeats a string; and
The .= operator, which joins and then assigns.
Example:
$a = "Info" . "sys";    # $a is now "Infosys"
$x = "microsoft"; $y = ".com"; $x = $x . $y;    # $x is now "microsoft.com"
This join operation is also known as string concatenation.
The x operator (the letter x) makes n copies of a string, where n is the value of the right
operand:
Example:
$a = "R" x 5;    # $a is now "RRRRR"
The .= operator combines the operations of string concatenation and assignment:
Example:
$a = "VTU";
$a .= " Belgaum";    # $a is now "VTU Belgaum"
m
5. Specifying Filenames in Command Line
Unlike awk, perl provides specific functions to open a file and perform I/O operations on
it. We will look at them in a subsequent section. However, perl also supports special
o
symbols that perform the same functionality. The diamond operator, <> is used for
reading lines from a file. When you specify STDIN within the <>, a line is read from the
standard input.
s.c
Example:
1. perl –e ‘print while (<>)’ sample.txt
2. perl –e ‘print <>’ sample.txt
In the first case, the file opening is implied and <> is used in scalar context (reading one
line).
In the second case, the loop is also implied but <> is interpreted in list context (reading
all lines).
The following script will print all Guptas and Agarwals/Aggarwals (matched using an ERE)
contained in a file that is specified as a command line parameter along with the script
name.
#!/usr/bin/perl
printf("%30s", "LIST OF EMPLOYEES\n");
while(<>) {
    print if /\bGupta|Ag+[ar][ar]wal/ ;
}
6. The Default Variable $_
By default, any function that accepts a scalar variable can have its argument omitted. In
this case, Perl uses $_, which is the default scalar variable. chop, <> and pattern matching
operate on $_ by default, the reason why we did not specify it explicitly in the print
statement in the previous script. The $_ is an important variable, which makes the perl
script compact.
chop(<STDIN>);
In this case, a line is read from standard input and assigned to default variable $_, of
which the last character (in this case a \n) will be removed by the chop() function.
Note that you can reassign the value of $_, so that you can use the functions of perl
without specifying either $_ or any variable name as argument.
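A minimal sketch of $_ at work (the sample string is an assumption for the example):

```perl
#!/usr/bin/perl
$_ = "sample line\n";   # $_ can be reassigned like any scalar
chop;                    # no argument, so chop works on $_
print;                   # print also defaults to $_; displays "sample line"
```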
7. $. (Current Line number) And .. (The range operator)
$. is the current line number. It is used to represent a line address and to select lines from
anywhere.
Example:
perl -ne 'print if ($. < 4)' in.dat # is similar to head -n 3 in.dat
perl -ne 'print if ($. > 7 && $. < 11)' in.dat # is similar to sed -n '8,10p'
.. is the range operator.
Example:
perl -ne 'print if (1..3)' in.dat # Prints lines 1 to 3 from in.dat
perl -ne 'print if (8..10)' in.dat # Prints lines 8 to 10 from in.dat
You can also use compound conditions for selecting multiple segments from a file.
Example: if ((1..2) || (13..15)) { print ;} # Prints lines 1 to 2 and 13 to 15
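The same selection works inside a script. The sketch below reads from an in-memory filehandle (a perl 5.8+ feature used here only to keep the example self-contained; a regular file behaves the same way):

```perl
#!/usr/bin/perl
# Open a string reference as if it were a four-line file
open($fh, "<", \"line1\nline2\nline3\nline4\n") or die;
$selected = "";
while (<$fh>) {
    $selected .= $_ if 2..3;   # the range operator tests $., the current line number
}
print $selected;               # prints line2 and line3
```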
8. Lists and Arrays
Perl allows us to manipulate groups of values, known as lists or arrays. These lists can be
assigned to special variables known as array variables, which can be processed in a
variety of ways.
A list is a collection of scalar values enclosed in parentheses. The following is a simple
example of a list:
(1, 5.3, "hello", 2)
This list contains four elements, each of which is a scalar value: the numbers 1 and 5.3,
the string "hello", and the number 2.
To indicate a list with no elements, just specify the parentheses: ()
You can use different ways to form a list. Some of them are listed next.
Lists can also contain scalar variables:
(17, $var, "a string")
Arrays
Perl allows you to store lists in special variables designed for that purpose. These
variables are called array variables. Note that arrays in perl need not contain similar type
of data. Also arrays in perl can dynamically grow or shrink at run time.
@array = (1, 2, 3); # Here, the list (1, 2, 3) is assigned to the array variable @array.
Since Perl uses @ and $ to distinguish array variables from scalar variables, the same
name can be used in an array variable and in a scalar variable:
$var = 1;
@var = (11, 27.1, "a string");
Here, the name var is used in both the scalar variable $var and the array variable @var.
These are two completely separate variables. You retrieve the value of the scalar
variable by specifying $var, and the element of the array at index 1 as $var[1].
s.c
Following are some of the examples of arrays with their description.
@x = (27); # list containing one element
@y = @x; # assign one array variable to another
@x = (2, 3, 4);
@y = (1, @x, 5); # the list (2, 3, 4) is substituted for @x, and the resulting list
# (1, 2, 3, 4,5) is assigned to @y.
$len = @y; # When used as an rvalue of an assignment, @y evaluates to the
# length of the array.
$last_index = $#y; # $# prefix to an array signifies the last index of the array.
Note that $ARGV[0], the first element of the @ARGV array variable, does not contain
the name of the program. This is a difference between Perl and C.
The splice function can do everything that shift, pop, unshift and push can do. It uses
up to four arguments to add or remove elements at any location in the array. The second
argument is the offset from where the insertion or removal should begin. The third
argument represents the number of elements to be removed. If it is 0, elements have to be
added. The new replaced list is specified by the fourth argument (if present).
splice(@list, 5, 0, 6..8); # Adds at 6th location, list becomes 1 2 3 4 5 6 7 8 9
splice(@list, 0, 2); # Removes from beginning, list becomes 3 4 5 6 7 8 9
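The four simpler functions that splice generalizes operate only at the two ends of the array; a quick sketch:

```perl
#!/usr/bin/perl
@stack = (1, 2, 3);
push(@stack, 4);          # appends at the end: @stack is (1, 2, 3, 4)
$last = pop(@stack);      # removes and returns the last element, 4
unshift(@stack, 0);       # inserts at the beginning: @stack is (0, 1, 2, 3)
$first = shift(@stack);   # removes and returns the first element, 0
print "@stack\n";         # prints "1 2 3"
```

push/pop treat the array as a stack, while shift/unshift work at the front, which is handy for consuming @ARGV one argument at a time.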
10. foreach: Looping Through a List
The foreach construct is used to loop through a list. Its general form is:
foreach $var (@arr) {
statements
}
Example: To iterate through the command line arguments (that are specified as numbers)
and find their square roots,
foreach $number (@ARGV) {
    print("The square root of $number is " .
          sqrt($number) . "\n");
}
You can even use the following code segment for performing the same task. Here note
the use of $_ as a default variable.
foreach (@ARGV) {
    print("The square root of $_ is " . sqrt($_) . "\n");
}
Another Example
#!/usr/bin/perl
@list = ("This", "is", "a", "list", "of", "words");
foreach $temp (@list) {
    print("$temp\n");
}
The current element of the list being used as the counter is stored in a special scalar
variable, which in this case is $temp. This variable is special because it is only defined
for the statements inside the foreach loop.
11. split: Splitting into a List or Array
There are two important array handling functions in perl that are very useful in CGI
programming, viz., split and join.
split breaks up a line or expression into fields. These fields are assigned either to
variables or an array.
Syntax:
($var1, $var2, $var3, ...) = split(/sep/, str);
@arr = split(/sep/, str);
It splits the string str on the pattern sep. Here sep can be a regular expression or a literal
string. str is optional, and if absent, $_ is used as default. The fields resulting from the
split are assigned to a set of variables, or to an array.
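A small sketch of split and its counterpart join (the sample strings are assumptions for the example):

```perl
#!/usr/bin/perl
($day, $month, $year) = split(/-/, "25-12-2023");   # assigns "25", "12" and "2023"
@fields = split(/:/, "root:x:0:0");                  # a 4-element array
$date = join("/", $day, $month, $year);              # join reverses the process: "25/12/2023"
print "$date\n";
```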
join, the counterpart of split, combines the elements of a list into a single string
using the separator specified as its first argument. The following script converts an
input decimal number into its binary equivalent. The logic is to repeatedly divide the
number by two, collect the remainders, and finally print the reverse of all the
collected remainders. The script is as follows:
#!/usr/bin/perl
foreach $num (@ARGV) {
$temp = $num;
until ($num == 0) {
$bit = $num % 2;
unshift(@bit_arr, $bit);
$num = int($num/2);
}
$binary_num = join("", @bit_arr);
print("Binary form of $temp is $binary_num\n");
splice(@bit_arr, 0, $#bit_arr+1);
}
The output of the above script (assuming script name is dec2bin.pl) is,
$ dec2bin.pl 10
Binary form of 10 is 1010
$ dec2bin.pl 8 12 15 10
Binary form of 8 is 1000
Binary form of 12 is 1100
Binary form of 15 is 1111
Binary form of 10 is 1010
$
Perl also supports associative arrays (hashes), whose elements are accessed using
strings rather than integer subscripts. When you define an associative array, you
specify the scalar values you want to use to access the elements of the array. For
example, here is a definition of a simple associative array:
%fruits=("apple", 9, "banana", 23, "cherry", 11);
It alternates the array subscripts and values in a comma-separated list, i.e., it is
basically a set of key-value pairs, where you can refer to a value by specifying its key.
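Individual elements are then accessed with $ and curly braces around the key; a sketch using the array defined above:

```perl
#!/usr/bin/perl
%fruits = ("apple", 9, "banana", 23, "cherry", 11);
print $fruits{"banana"}, "\n";   # a value is accessed by its key: prints 23
$fruits{"mango"} = 5;             # a new key-value pair can be added at any time
delete $fruits{"apple"};          # and an existing pair removed with delete
```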
Normally, keys returns the key strings in a random sequence. To order the list
alphabetically, use sort function with keys.
1. foreach $key (sort(keys %region)) { # sorts on keys in the associative array, region
2. @key_list = reverse sort keys %region; # reverse sorts on keys in assoc. array, region
The s function: Substitution
You can use the =~ operator to substitute one string for another:
$val =~ s/a+/xyz/; # replace a, aa, aaa, etc., with xyz
$val =~ s/a/b/g; # replace all a's with b's; the g flag makes the
# substitution global
Here, the s prefix indicates that the pattern between the first / and the second is to be
replaced by the string between the second / and the third.
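In scalar context, s/// returns the number of substitutions made, which is convenient for counting matches; a sketch (the sample string is an assumption):

```perl
#!/usr/bin/perl
$text  = "aaa b aa c a";
$count = ($text =~ s/a+/x/g);   # $text becomes "x b x c x"
print "$count substitutions\n"; # prints "3 substitutions"
```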
\w matches a word character, same as [A-Za-z0-9_].
\W doesn't match a word character, same as [^a-zA-Z0-9_].
\s matches any whitespace (any character not visible on the screen); it is
equivalent to [ \r\t\n\f].
perl accepts the IRE and TRE used by grep and sed, except that the curly braces
and parentheses are not escaped.
For example, to locate lines longer than 512 characters using IRE:
perl -ne 'print if /.{513,}/' filename # Note that we didn't escape the curly braces
perl supports in-place editing of files; without it, you would have to
redirect output to a temporary file and then rename it back to the original file.
To edit multiple files in-place, use the -i option.
perl -p -i -e "s/<B>/<STRONG>/g" *.html *.htm
The above statement changes all instances of <B> in all HTML files to <STRONG>. The
files themselves are rewritten with the new output. If in-place editing seems a risky thing
to do, you can back the files up before undertaking the operation:
perl -p -i.bak -e "tr/a-z/A-Z/" foo[1-4]
This first backs up foo1 to foo1.bak, foo2 to foo2.bak and so on, before converting all
lowercase letters in each file to uppercase.
17. File Handling
To access a file on your UNIX file system from within your Perl program, you must
perform the following steps:
1. First, your program must open the file. This tells the system that your Perl program
wants to access the file.
2. Then, the program can either read from or write to the file, depending on how you
have opened the file.
3. Finally, the program can close the file. This tells the system that your program no
longer needs access to the file.
A file is opened with the open function:
open(INFILE, "desig.dat"); # Opens the file in read mode
Here, INFILE is the file handle. The second argument is the pathname. If only the filename is
supplied, the file is assumed to be in the current working directory.
open(OUTFILE, ">report.dat"); # Opens the file in write mode
open(OUTFILE, ">>report.dat"); # Opens the file in append mode
The following script demonstrates file handling in perl. This script copies the first three
lines of one file into another.
#!/usr/bin/perl
open(INFILE, "desig.dat") || die("Cannot open file");
open(OUTFILE, ">desig_out.dat");
while(<INFILE>) {
    last if $. > 3;   # stop after the first three lines
    print OUTFILE;    # with no list, print writes $_ to the file handle
}
close(INFILE);
close(OUTFILE);
18. File Tests
perl supports tests on files, similar to those provided by the shell's test statement
and even the find command that we have already seen. You can perform tests on filenames to
see whether the file is a directory file or an ordinary file, whether the file is readable,
executable or writable, and so on. Some of the file tests are listed next, along with a
description of what they do.
if -d filename True if file is a directory
if -e filename True if this file exists
if -f filename True if it is a file
if -l filename True if file is a symbolic link
if -s filename True if it is a non-empty file
if -w filename True if file writeable by the person running the program
if -x filename True if this file executable by the person running the program
if -z filename True if this file is empty
if -B filename True if this is a binary file
if -T filename True if this is a text file
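The sketch below creates a small scratch file (the filename is an assumption) and applies a few of the tests to it:

```perl
#!/usr/bin/perl
$file = "filetest_demo.txt";               # hypothetical scratch filename
open(FH, ">$file") || die("Cannot create $file");
print FH "some data\n";
close(FH);
print "exists\n"    if -e $file;           # the file now exists
print "ordinary\n"  if -f $file;           # and is an ordinary file, not a directory
print "non-empty\n" if -s $file;           # with a size greater than zero
unlink($file);                              # remove the scratch file again
```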
19. Subroutines
The use of subroutines results in a modular program. We already know the advantages of
a modular approach (code reuse, ease of debugging and better readability).
Frequently used segments of code can be stored in separate sections, known as
subroutines. The general form of defining a subroutine in perl is:
sub procedure_name {
    statements
}
Example: The following is a routine to read a line of input from a file and break it into
words.
sub get_words {
$inputline = <>;
@words = split(/\s+/, $inputline);
}
Note: The subroutine name must start with a letter, and can then consist of any number of
letters, digits, and underscores. The name must not be a keyword.
Precede the name of the subroutine with & to tell perl to call the subroutine.
The following example uses the previous subroutine get_words to count the number of
occurrences of the word “the”.
#!/usr/bin/perl
$thecount = 0;
&get_words; # Call the subroutine
while ($words[0] ne "") {
for ($index = 0; $words[$index] ne ""; $index += 1) {
$thecount += 1 if $words[$index] eq "the";
}
&get_words;
}
Return Values
In perl subroutines, the last value seen by the subroutine becomes the subroutine's return
value. That is the reason why we could refer to the array variable @words in the calling
routine.
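A sketch of a subroutine that returns an explicit value; the @_ array, which holds the arguments passed to a subroutine, is used here as an assumption beyond what the text above covers:

```perl
#!/usr/bin/perl
sub square {
    ($n) = @_;    # @_ holds the arguments passed to the subroutine
    $n * $n;      # the last value evaluated becomes the return value
}
$result = &square(7);
print "$result\n";   # prints 49
```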
Conclusion
Perl is a programming language that allows you to write programs that manipulate files,
strings, integers, and arrays quickly and easily. perl is a superset of grep, tr, sed, awk and
the shell. perl also has functions for inter-process communication. perl helps in
developing minimal code for performing complex tasks. The UNIX spirit lives in perl.
perl is popularly used as a CGI scripting language.