Unix Text Processing
Here are some examples of using the text-manipulation utilities found on Unix (and available on some other platforms as well). awk and perl can both express full programs, but I use them primarily as short one-liners, which lets them be piped to and from other Unix programs. Each of these tools has capabilities that make it the better choice in certain situations, as I have tried to demonstrate below. I don't claim any of these examples as original to me; references are at the bottom of the page. I have collected this information over the course of several years, during which I have used Sun Solaris and various flavors of Linux. Note that the versions of these tools included with Solaris don't entirely match the GNU versions, so some of what you see below may need tinkering to work.

The philosophy of Unix utilities is to develop a tool that is very good at doing one specific thing. The results of these tools can be sent to another tool via the pipe (i.e., the | character), as shown in several examples below. So, one program's output becomes the next program's input.

Tools covered: awk, cat, csplit, cut, find, fmt, fold, grep, head, join, nl, paste, perl, sdiff, sed, sort, split, tail, uniq, wc
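As a small illustration of this pipe philosophy, the following pipeline chains four of the tools listed above; the sample text is inlined with printf so nothing here depends on an existing file:

```shell
# Count the distinct words in some text, one tool per step:
# tr puts one word per line, sort groups duplicates together,
# uniq collapses them, and wc -l counts what is left.
printf 'the quick fox\nthe lazy dog\n' |
    tr ' ' '\n' |   # one word per line
    sort |          # group identical words
    uniq |          # drop duplicates
    wc -l           # count distinct words
```

Here "the" appears twice but is counted once, so the pipeline prints 5.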
sed

From the man page: "Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors."

1. Double space infile and send the output to outfile

   sed G <infile >outfile

I use the input/output notation shown above. In many, if not all, cases it is fine to leave out the less-than sign, e.g.,

   sed G infile >outfile

2. Double space a file which already has blank lines in it. The output file should contain no more than one blank line between lines of text.

   sed '/^$/d;G' <infile >outfile

3. Triple space a file

   sed 'G;G' <infile >outfile

4. Undo double-spacing (assumes even-numbered lines are always blank)

   sed 'n;d' <infile >outfile

5. Insert a blank line above every line which matches regex ("regex" represents a regular expression)

   sed '/regex/{x;p;x;}' <infile >outfile

6. Print the line immediately before regex, but not the line containing regex

   sed -n '/regex/{g;1!p;};h' <infile >outfile

7. Print the line immediately after regex, but not the line containing regex

   sed -n '/regex/{n;p;}' <infile >outfile

8. Insert a blank line below every line which matches regex

   sed '/regex/G' <infile >outfile

9. Insert a blank line above and below every line which matches regex

   sed '/regex/{x;p;x;G;}' <infile >outfile

10. Convert DOS newlines (CR/LF) to Unix format

   sed 's/^M$//' <infile >outfile   # in bash/tcsh, to get ^M press Ctrl-V then Ctrl-M

11. Print only those lines matching the regular expression (similar to grep)

   sed -n '/some_word/p' infile
   sed '/some_word/!d' infile

12. Print those lines that do not match the regular expression (similar to grep -v)

   sed -n '/regexp/!p' infile
   sed '/regexp/d' infile

13. Skip the first two lines (start at line 3) and then alternate between printing 5 lines and skipping 3 for the entire file

   sed -n '3,${p;n;p;n;p;n;p;n;p;n;n;n;}' <infile >outfile

Notice that there are five p's in the sequence, representing the five lines to print. The three lines to skip between each set of printed lines are represented by the n;n;n at the end of the sequence.

14. Delete trailing whitespace (spaces, tabs) from the end of each line

   sed 's/[ \t]*$//' <infile >outfile

15. Substitute (find and replace) foo with bar on each line

   sed 's/foo/bar/' <infile >outfile    # replaces only 1st instance in a line
   sed 's/foo/bar/4' <infile >outfile   # replaces only 4th instance in a line
   sed 's/foo/bar/g' <infile >outfile   # replaces ALL instances in a line

16. Replace each occurrence of the hexadecimal character 92 with an apostrophe:

   sed "s/\x92/'/g" <old_file.txt >new_file.txt

17. Print the section of the file between two regular expressions (inclusive)

   sed -n '/regex1/,/regex2/p' <old_file.txt >new_file.txt

18. Combine the line containing REGEX with the line that follows it

   sed -e 'N' -e 's/REGEX\n/REGEX/' <old_file.txt >new_file.txt
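A few of the one-liners above can be tried directly on inline input (the behavior shown here is GNU sed; Solaris sed may differ, as noted in the introduction):

```shell
# One-liner 11: print only lines matching a pattern, like grep
printf 'alpha\nbravo\ncharlie\n' | sed -n '/bravo/p'   # prints: bravo

# One-liner 15: replace the first instance of foo on each line
printf 'foo foo\n' | sed 's/foo/bar/'                  # prints: bar foo

# One-liner 8: insert a blank line below each matching line
printf 'a\nb\n' | sed '/a/G'                           # a, blank line, b
```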
perl

perl can do anything sed and awk can do, but not always as easily as shown in the examples above.

1. Replace OLDSTRING with NEWSTRING in the file(s) in FILELIST (e.g., file1 file2 or *.txt)

   perl -pi.bak -e 's/OLDSTRING/NEWSTRING/g' FILELIST

The options used are:

   -e   allows a one-line script to be run from the command line
   -i   files are edited in place; in the example above, the .bak extension is placed on the original files
   -p   causes the script to be placed in a while loop that iterates over the filename arguments

2. The full perl program to do the same as the one-liner (without creating backup copies) is

   #!/usr/bin/perl
   # perlexample.pl
   while (<>) {
       s/OLDSTRING/NEWSTRING/g;
       print;
   }

Run it using

   ./perlexample.pl FILELIST

3. Remove the carriage returns required by DOS text files from files on the Unix system

   perl -pi.bak -e 's/\r$//g' FILELIST
Assorted Utilities
Some of the examples below use the following files:

file1

Tom 123 Main
Dick 4787 West
Harry 98 North
Sue 1035 Cooper

file2

Tom programmer
Dick lawyer
Harry artist
ga.txt

The Gettysburg Address
Gettysburg, Pennsylvania
November 19, 1863

Four score and seven years ago our fathers brought forth on this continent,
a new nation, conceived in Liberty, and dedicated to the proposition that
all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any
nation so conceived and so dedicated, can long endure. We are met on a great
battlefield of that war. We have come to dedicate a portion of that field,
as a final resting place for those who here gave their lives that that nation
might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate - we can not consecrate - we
can not hallow - this ground. The brave men, living and dead, who struggled
here, have consecrated it, far above our poor power to add or detract. The
world will little note, nor long remember what we say here, but it can never
forget what they did here. It is for us the living, rather, to be dedicated
here to the unfinished work which they who fought here have thus far so
nobly advanced. It is rather for us to be here dedicated to the great task
remaining before us - that from these honored dead we take increased devotion
to that cause for which they gave the last full measure of devotion - that we
here highly resolve that these dead shall not have died in vain - that this
nation, under God, shall have a new birth of freedom - and that government
of the people, by the people, for the people, shall not perish from the earth.

Source: The Collected Works of Abraham Lincoln, Vol. VII, edited by Roy
P. Basler.
In the examples using these files, the percent sign (%) at the beginning of a line represents the command prompt. Comments describing what is happening follow the pound sign (#).
grep

grep prints the lines of a file that match a search string (string can be a regular expression)

   grep -i string some_file             # print the lines containing string, regardless of case
   grep -v string some_file             # print the lines that don't contain string
   grep -E "string1|string2" some_file  # print the lines that contain string1 or string2

find

find has many parameters for restricting what it finds, but I only demonstrate here how to use it to recursively search from the current location for files containing the_word. More examples of using find.

   find . -type f -print | xargs grep the_word 2>/dev/null
   find . -type f -exec grep 'the_word' {} \; -print

In the first example, the results of the find command are piped to grep; xargs is used to pass the filenames to grep. STDERR (the errors) is discarded by using 2>/dev/null. The second example shows how to grep each file by using find's -exec option.
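To see the find | xargs grep pipeline work end to end, a throwaway directory tree can be built first; the grep -l flag (list only the names of matching files) keeps the output short. The /tmp path and file names below are just for this sketch:

```shell
# Build a tiny tree: one file contains the_word, one does not.
mkdir -p /tmp/findgrep_demo/sub
echo 'the_word appears here' > /tmp/findgrep_demo/a.txt
echo 'nothing of interest'   > /tmp/findgrep_demo/sub/b.txt

# Recursively list the files that contain the_word.
find /tmp/findgrep_demo -type f -print | xargs grep -l the_word 2>/dev/null
# prints: /tmp/findgrep_demo/a.txt
```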
Operations on entire files

cat

cat concatenates files and prints them on the standard output

   % cat -E file2    # display file2, showing $ at the end of each line
   Tom programmer$
   Dick lawyer$
   Harry artist$
Operate on fields within a line

cut

cut prints selected parts of lines from a file
   % cut -c1-10 file2    # cut characters 1 through 10 from file2
   Tom progra
   Dick lawye
   Harry arti

   % cut -d" " -f2 file1  # cut the second column (-f2); use a space as the delimiter (-d" ")
   123
   4787
   98
   1035

   ls *.txt | cut -c1-3 | xargs mkdir   # create directories with the names of the first three letters of each .txt file
paste merge lines of files, separated by tabs. The columns of the input files are placed side-by-side with each other.
   % paste file1 file2
   Tom 123 Main	Tom programmer
   Dick 4787 West	Dick lawyer
   Harry 98 North	Harry artist
   Sue 1035 Cooper
join join lines of two files on a common field (files should be sorted by common field)
   % join -a2 -a1 -o 1.1,1.2,2.2 -e "" file1 file2
   Tom 123 programmer
   Dick 4787 lawyer
   Harry 98 artist
   Sue 1035
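The join options above deserve a note: -a1/-a2 also print lines from file1/file2 that have no match, -o picks which fields to output (file.field), and -e substitutes a string for missing fields. A self-contained sketch using throwaway files in /tmp (names are hypothetical); note the inputs are sorted on the join field, as join requires:

```shell
# Two small files keyed by name, already sorted on field 1.
printf 'Ann 1035\nTom 123\n' > /tmp/join_left    # name, number
printf 'Tom programmer\n'    > /tmp/join_right   # name, job

# Keep unmatched lines from the left file (-a1), output three fields,
# and fill missing fields with "-" (-e only takes effect when -o is given).
join -a1 -o 1.1,1.2,2.2 -e '-' /tmp/join_left /tmp/join_right
# prints:
# Ann 1035 -
# Tom 123 programmer
```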
sort

sort sorts the lines of text files

   % sort -n +1 file1    # perform a numeric sort (-n) by the second column
   Harry 98 North
   Tom 123 Main
   Sue 1035 Cooper
   Dick 4787 West

(The +1 field notation is the old syntax; with current GNU sort, use sort -n -k2 file1.)
Use lensort to sort by line length; use chunksort to sort paragraphs separated by a blank line.

uniq

uniq displays unique lines from a sorted file
   cat SOMEFILE | sort | uniq   # this could have been done more easily with: sort SOMEFILE | uniq
   uniq -c filename             # prefix lines by the number of occurrences
   uniq -d filename             # display the lines that are not unique
   uniq -D filename             # print all duplicate lines
   uniq -i filename             # ignore differences in case when comparing
   uniq -s N filename           # avoid comparing the first N characters
   uniq -u filename             # only print unique lines
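One of the most common sort/uniq combinations is a frequency count: sort groups identical lines, uniq -c counts each group, and a final numeric reverse sort puts the most frequent first. Inline sample data here:

```shell
# Frequency count of the input lines, most common first.
printf 'apple\npear\napple\napple\npear\nplum\n' |
    sort | uniq -c | sort -nr
# apple is listed first with count 3, then pear (2), then plum (1)
```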
To perform these operations on multiple files, it is often helpful to create a simple shell script to operate on the appropriate files.
12. Remove the sequence numbers and timestamps from a closed-caption (.srt) file, leaving only the caption text. A sample of the input:

   1
   00:00:30,063 --> 00:00:33,066
   [Woman] "I BUSIED MYSELF TO THINK OF A STORY...

   2
   00:00:33,066 --> 00:00:37,570
   "WHICH WOULD SPEAK TO THE MYSTERIOUS FEARS OF OUR NATURE...

   3
   00:00:37,570 --> 00:00:39,572
   "AND AWAKEN...

   % sed -n '3,${/^$/,/:/!p}' < 3370betrayed.cc > 3370betrayed.cc.clean
   % head 70273mary_shelleys_frankenstein.cc.clean
   [Woman] "I BUSIED MYSELF TO THINK OF A STORY...
   "WHICH WOULD SPEAK TO THE MYSTERIOUS FEARS OF OUR NATURE...
   "AND AWAKEN...

13. Search for lines containing ::0038:: or ::0148:: or ::0187::, use sed to replace the :: field delimiters with a %, and then perform a numerical sort on the second column. Note that egrep is equivalent to grep -E.

   $ egrep "::0038::|::0148::|::0187::" ratings.dat | sed 's/::/%/g' | sort -t% +1 -n > matchratings.txt

14. Determine the disk usage of each subdirectory of the current directory, sort in descending order, and format for readability

   $ du -s * | sort -nr | awk '{printf("%8.0f KB %s\n", $1, $2)}'
   29223820 KB bob
   23038660 KB tom
   19999376 KB sue
   11010288 KB andy

15. For columns 3-6125, find those columns that have some value other than '0,' and count the number of occurrences
   #!/bin/sh
   for col in $(seq 3 6125); do
       echo "column $col"
       awk '{print $'$col'}' allshots2nd10minutes.shots | grep -vc "0,"
   done
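The loop above rescans the data file once per column, which is thousands of passes. Assuming the goal is a per-column count of fields that do not contain '0,', awk can do the same work in a single pass; inline sample data stands in for the .shots file here, and the column range is shortened to match it:

```shell
# Single-pass version: cnt[i] counts fields in column i not containing "0,"
# (the same substring test grep -vc "0," performs).  For the original use
# case, read the real data file instead and widen the loop bounds.
printf '1, 2, 0, 5,\n1, 0, 0, 7,\n' |
    awk '{ for (i = 3; i <= NF; i++) if ($i !~ /0,/) cnt[i]++ }
         END { for (i = 3; i <= 4; i++) printf "column %d %d\n", i, cnt[i]+0 }'
# prints:
# column 3 0
# column 4 2
```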
16. Print column 51 followed by the line number for this value, sorted by the values from column 51

   $ awk '{print $51 "\t" FNR}' allshots2nd510thIframessparse.shots | sort

17. Extract the 6th column from all but the last line of somefile

   $ head -n -1 somefile | awk '{print $6}'

18. Print all but the first column of somefile

   $ awk -f remove_first_column.awk somefile

where the file remove_first_column.awk consists of the following:
   # remove_first_column.awk
   BEGIN { ORS = "" }
   {
       for (i = 2; i <= NF; i++)
           print $i " "
       print "\n"
   }
19. The first line of file1 contains header information, which we don't want. file2 lacks the column headers and therefore contains one less line than file1. Extract all but the first line of file1 and combine it with the columns of file2 to create file3, with the vertical bar (|) as the delimiter between the columns of each.

   $ tail -n +2 file1 | paste -d'|' - file2 > file3

20. Delete the lines up to and including the regular expression (REGEX)

   $ sed '1,/REGEX/d' somefile.txt

21. Delete the lines up to, but not including, the regular expression (REGEX)

   $ sed -e '/REGEX/p' -e '1,/REGEX/d' somefile.txt

22. Delete all newlines (this turns the entire document into a single line)

   $ tr -d '\n' < somefile.txt

23. Combine groups of nonblank lines into single lines, where each group is separated by a single blank line. This works by first changing each blank line to XXXXX; second, each newline is replaced by a space; third, each XXXXX is replaced with a newline in order to separate the original groups into lines.

   $ cat somefile.txt
   this is the
   first section of
   the file

   this is the
   second section of
   the file

   this is the
   third section of
   the file
   $ sed 's/^$/XXXXX/' somefile.txt | tr '\n' ' ' | sed 's/XXXXX/\n/g' | sed 's/^ //'
   this is the first section of the file
   this is the second section of the file
   this is the third section of the file
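The same grouping can be done in one step with awk's paragraph mode: setting RS to the empty string makes awk read blank-line-separated records, and a single gsub turns each record's internal newlines into spaces. Inline input shown here; point it at the file instead for real use:

```shell
# RS="" -> each blank-line-separated group is one record;
# gsub flattens the record, print emits it as a single line.
printf 'this is the\nfirst section\n\nthis is the\nsecond section\n' |
    awk 'BEGIN { RS = "" } { gsub(/\n/, " "); print }'
# prints:
# this is the first section
# this is the second section
```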
24. Remove non-alphabetic characters and convert uppercase to lowercase

   $ tr -cs "[:alpha:]" " " < somefile.txt | tr "[:upper:]" "[:lower:]"
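Replacing the space with a newline in the first tr turns this into one word per line, which combines naturally with the uniq examples earlier to produce a word-frequency list. Inline text here; redirect from ga.txt to run it on the sample file:

```shell
# Letters only, lowercased, one word per line, counted, most common first.
printf 'The cat, the dog, THE end\n' |
    tr -cs "[:alpha:]" "\n" |
    tr "[:upper:]" "[:lower:]" |
    sort | uniq -c | sort -nr
# "the" tops the list with a count of 3
```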
References
1. GNU core utilities
2. Using the GNU text utilities
3. awk one-liners
4. The GNU Awk User's Guide
5. Awk: Dynamic Variables
6. How to Use Awk (Hartigan)
7. sed one-liners
8. sed scripts
9. Sed - An Introduction
10. Perl one-liners
11. Perl one-liners
12. Perl regular expressions
13. Unix Power Tools, 2nd Ed., O'Reilly
14. Linux Cookbook, 2nd Ed., No Starch Press
15. Unix in a Nutshell, 3rd Ed., O'Reilly
16. John & Ed's Miscellaneous Unix Tips
17. Classic Shell Scripting, O'Reilly (great overview of the Unix philosophy of combining small tools that are each very good at a specific thing)