Sunday, 14 December 2008

Mastering File Manipulation in Unix

***sed and grep: two powerful filters for editing lines of files***

sed syntax 1 : sed options 'address action' file(s)

some useful sed commands:

In line addressing :

1)sed -n -e '1,2p' emp.txt - prints the first 2 lines of the file.
(the -e option introduces an instruction and can be repeated to supply several; the -n option suppresses sed's default printing of every line, so printed lines do not appear twice)
(always use the -n option when the p (print) action is used)

2)sed -n '1,2!p' emp.txt - prints every line except lines 1 and 2.
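
To try the two line-addressing commands above, here is a small sketch. The file contents are made up for illustration; only the sed commands come from the list above.

```shell
# Build a small sample file (hypothetical contents) in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'line one' 'line two' 'line three' 'line four' > "$tmp/emp.txt"

# Print only lines 1 and 2; -n stops sed from printing every line by default.
first_two=$(sed -n '1,2p' "$tmp/emp.txt")

# Print everything except lines 1 and 2 (! negates the address).
rest=$(sed -n '1,2!p' "$tmp/emp.txt")

# Without -n, the two addressed lines come out twice: 4 + 2 = 6 lines.
doubled=$(sed '1,2p' "$tmp/emp.txt" | wc -l | tr -d ' ')

printf '%s\n' "$first_two" '---' "$rest" '---' "$doubled"
rm -rf "$tmp"
```

The last command shows why -n and p go together: drop -n and the first two lines are printed once by default and once by p.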

In Context addressing :

1)sed -n '/director/p' emp.lst - prints lines containing director string.

2)sed -n '/sriram/,/manager/p' emp.lst - prints the group of lines from the first line containing sriram through the next line containing manager.

3)sed -n '1,/echo/p' - prints from line 1 up to and including the first later line containing echo in the script.
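
A quick sketch of context addressing on a sample file. The emp.lst rows below are invented for illustration, and the third command uses clerk instead of echo so that it fits the same file.

```shell
tmp=$(mktemp -d)
# Hypothetical pipe-delimited employee list.
cat > "$tmp/emp.lst" <<'EOF'
1001|sriram|director|sales
1002|anil|clerk|sales
1003|kumar|manager|admin
1004|ravi|director|hr
1005|sunil|manager|hr
EOF

# Every line containing "director".
directors=$(sed -n '/director/p' "$tmp/emp.lst")

# From the line containing "sriram" through the next line containing "manager".
range=$(sed -n '/sriram/,/manager/p' "$tmp/emp.lst")

# A line number and a pattern can be mixed: line 1 through the first "clerk" line.
mixed=$(sed -n '1,/clerk/p' "$tmp/emp.lst")

printf '%s\n' "$directors" '---' "$range" '---' "$mixed"
rm -rf "$tmp"
```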

writing selected lines to a file:

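1) sed -n '/echo/w echo.txt' - writes lines containing "echo" to the file echo.txt.
(-n is used here as well, to suppress printing all lines of the file on the terminal; keep no space after the file name, because everything up to the end of the line is taken as the name)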

2) sed -n '/echo/w echo.txt
/while/w whiles.txt
/if/w if.txt' - writes the echo lines, while lines and if lines into three separate files.

3) sed -n '1,500w file1
501,$w file2' - writes the first 500 lines to file1 and the rest to file2.
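
The w examples above can be tried with a sketch like the following. The script contents are hypothetical, and the split point is 2 instead of 500 to keep the sample short.

```shell
tmp=$(mktemp -d)
# Hypothetical four-line script to filter.
printf '%s\n' 'echo start' 'while true' 'if test -f x' 'echo done' > "$tmp/script.sh"

# One sed pass, three output files. Each w command sits on its own line
# because the file name runs to the end of the line.
(cd "$tmp" && sed -n '/echo/w echo.txt
/while/w whiles.txt
/if/w if.txt' script.sh)

# Split by line number: lines 1-2 go to file1, the rest to file2.
(cd "$tmp" && sed -n '1,2w file1
3,$w file2' script.sh)

echo_lines=$(cat "$tmp/echo.txt")
while_lines=$(cat "$tmp/whiles.txt")
if_lines=$(cat "$tmp/if.txt")
part1=$(cat "$tmp/file1")
part2=$(cat "$tmp/file2")
printf '%s\n' "$echo_lines" '---' "$part1" '---' "$part2"
rm -rf "$tmp"
```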

Text editing insert-i,append-a,change-c and delete-d :

1) sed '/echo/d' > noecho.txt - selects all lines except those containing "echo" and saves them in noecho.txt
(this is the same as grep -v "echo"; don't use -n with the d (delete) action)

2) sed '/^[[:space:]]*$/d' - removes all blank lines (empty, or holding only spaces and tabs) from the file
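
Here is a sketch of the delete action, plus the insert, append and change actions named in the heading, which the list above does not illustrate. The sample text and the inserted comment lines are my own; the i, a and c commands are written in the portable form, with the text on the line after the backslash.

```shell
tmp=$(mktemp -d)
# Hypothetical script with one empty line and one line holding only a space and a tab.
printf 'echo hello\n\n \t\ndate\n' > "$tmp/script.sh"

# d deletes matching lines; everything else is printed, so no -n here.
no_echo_count=$(sed '/echo/d' "$tmp/script.sh" | wc -l | tr -d ' ')

# Delete empty and whitespace-only lines.
no_blank=$(sed '/^[[:space:]]*$/d' "$tmp/script.sh")

# i inserts before the addressed line, a appends after it, c replaces it.
edited=$(printf 'echo hello\ndate\n' | sed '/echo/i\
# greeting
/echo/a\
# timestamp
/date/c\
date -u')

printf '%s\n' "$no_blank" '---' "$edited"
rm -rf "$tmp"
```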

Substitutions :

sed syntax 2 : [address]s/expression1/expression2/flags

1) sed '1,5s/director/manager/' emp.lst - replaces the first occurrence of director with manager on each of the first five lines.

Note : In sed, "s" with no address applies to every line, i.e. it is the same as 1,$s

2)sed 's/|/:/g' ma.done - replace all pipes with colons.(g - global flag)

3)sed '1,5s/^/#/' - comments first five lines of script.

Note: when ^ or $ is used alone as the source pattern, nothing is replaced; the target pattern is simply placed at that location (the beginning or the end of the line).

4)sed 's/[Aa]gg*[ar][ar]wal/Agarwal/g' emp.lst - replaces aggrawal,agarwal,agrawal with "Agarwal"

Note: regular expressions make sed far more powerful.

5)sed 's/ *|/|/g' - removes the spaces immediately before every pipe, i.e. trims trailing spaces from the |-delimited fields.

6)sed 's/|//g' - removes pipe from all lines.
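
The substitutions above can be checked on a single sample line. This is only a sketch; the record below is invented and repeats "director" on purpose to show the effect of the g flag.

```shell
# Hypothetical pipe-delimited record with trailing spaces in the fields.
line='1001 |anil  |director |director'

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/director/manager/')

# With g, every pipe on the line becomes a colon.
colons=$(printf '%s\n' "$line" | sed 's/|/:/g')

# Remove the spaces that sit just before each pipe.
trimmed=$(printf '%s\n' "$line" | sed 's/ *|/|/g')

# ^ as the source pattern: put # at the start of lines 1 and 2 only.
commented=$(printf '%s\n' a b c | sed '1,2s/^/#/')

# One regular expression covers three spellings.
names=$(printf '%s\n' aggrawal agarwal agrawal | sed 's/[Aa]gg*[ar][ar]wal/Agarwal/g')

printf '%s\n' "$first_only" "$colons" "$trimmed" "$commented" "$names"
```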

Interval Regular expression :

sed syntax 3 :

ch\{m\} - the character ch occurs exactly m times in succession.
ch\{m,n\} - ch occurs between m and n times.
ch\{m,\} - ch occurs at least m times.

Note: ch can be a literal character,a . (dot) or a class of characters (e.g:[0-9] )

1) grep '[0-9]\{10\}' mobiledir.txt - fetches lines containing a run of at least 10 digits (such as mobile numbers) from mobiledir.txt.

2) ls -l | sed -n '/^.\{2,5\}w/p' - prints the lines of ls output in which a 'w' (write) follows the first 2 to 5 characters, i.e. sits anywhere from the 3rd to the 6th position.

drwx------+ 3 sperumal ???????? 0 Oct 18 13:01 2090
-rwx------+ 1 sperumal ???????? 49355 Oct 10 17:57 test1.done
-rw-r--r-- 1 sperumal mkgroup-l-d 1459 Oct 18 23:49 passwd
-rwx-w----+ 1 sperumal ???????? 20137 Oct 17 14:03 site.dat
-rwx-w----+ 1 sperumal ???????? 11211 Oct 17 14:47 site1.dat
-rwx-w----+ 1 sperumal ???????? 20143 Oct 17 13:48 stream.dat
-rwx-w----+ 1 sperumal ???????? 11181 Oct 17 14:56 stream1.dat
-rwx-w----+ 1 sperumal ???????? 20139 Oct 17 14:43 tran.dat
-rwx-w----+ 1 sperumal ???????? 11319 Oct 17 15:06 tran1.dat
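
A sketch of the interval expressions on fixed input, so the result does not depend on a live ls listing. The phone directory entries and permission strings are hypothetical.

```shell
tmp=$(mktemp -d)
# Hypothetical phone directory: only the first entry has 10 digits in a row.
printf '%s\n' 'anil 9876543210' 'ravi 12345' 'office 040-2345' > "$tmp/mobiledir.txt"

# Lines with a run of 10 digits.
mobiles=$(grep '[0-9]\{10\}' "$tmp/mobiledir.txt")

# Number of lines with a run of at least 5 digits.
five_plus=$(grep -c '[0-9]\{5,\}' "$tmp/mobiledir.txt")

# Permission strings: keep those with a w in positions 3 to 6.
perms=$(printf '%s\n' 'drwx------' '-rw-r--r--' '-r-xr-xr-x' '-r---w----' '-r--r---w-' |
  sed -n '/^.\{2,5\}w/p')

printf '%s\n' "$mobiles" "$five_plus" "$perms"
rm -rf "$tmp"
```

The last sample string is dropped because its only w is the 9th character, too far from the start of the line for \{2,5\}.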

Tagged Regular Expression : (To break a line into groups and extract one or more of these groups)

sed syntax 4 : sed 's/\(expression1\)\(expression2\)/\2 \1/' filename

echo `date` | sed 's/\([0-9]*\):\([0-9]*\):\([0-9]*\)/\3,\2,\1/'

output : Tue Oct 28 20,55,18 IST 2008

actual date : Tue Oct 28 18:55:20 IST 2008

Explanation : I want to print the output of the date command with hh:mm:ss transformed to ss,mm,hh.
For this, each part of the time is enclosed in \( and \) in the source pattern; the labels \1, \2 and \3 in the target pattern refer to those groups in order, so writing them as \3,\2,\1 reverses them.
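
The same substitution can be tried on a fixed string, so the result is repeatable. This sketch reuses the date shown above; the second command is my own variation that keeps only two of the groups.

```shell
# Fixed sample string (taken from the text above) instead of live date output.
stamp='Tue Oct 28 18:55:20 IST 2008'

# Reverse the three tagged groups: hh:mm:ss becomes ss,mm,hh.
swapped=$(printf '%s\n' "$stamp" | sed 's/\([0-9]*\):\([0-9]*\):\([0-9]*\)/\3,\2,\1/')

# Groups can also be extracted: match the whole line, keep only hh and mm.
hhmm=$(printf '%s\n' "$stamp" | sed 's/.* \([0-9]*\):\([0-9]*\):[0-9]* .*/\1:\2/')

printf '%s\n' "$swapped" "$hhmm"
```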

grep syntax : grep options "pattern" file(s)

BREs used in grep or sed :

g* - zero or more occurrences of the previous character (g here)
g$ - g at the end of the line
. - any single character
^g - g at the beginning of the line
[1-3] - a digit between 1 and 3 (inclusive)
[abc] - a single character a, b or c
[^abc] - a caret as the first character inside [] negates the class (any character other than a, b or c)
[^a-zA-Z] - a non-alphabetic character
^echo$ - a line consisting of echo only
^$ - a line containing nothing (an empty line)
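
A few of the BREs above, tried with grep on a five-line sample. The sample lines are hypothetical.

```shell
tmp=$(mktemp -d)
# Five lines: "echo", "echo hello", an empty line, "gag" and "123".
printf '%s\n' 'echo' 'echo hello' '' 'gag' '123' > "$tmp/sample.txt"

only_echo=$(grep -c '^echo$' "$tmp/sample.txt")     # lines that are exactly "echo"
empty=$(grep -c '^$' "$tmp/sample.txt")             # empty lines
starts_g=$(grep '^g' "$tmp/sample.txt")             # lines beginning with g
digits=$(grep '[1-3]' "$tmp/sample.txt")            # lines with a digit from 1 to 3
non_alpha=$(grep '[^a-zA-Z]' "$tmp/sample.txt")     # lines with a non-letter
# g* also matches zero occurrences, so on its own it matches every line.
star_all=$(grep -c 'g*' "$tmp/sample.txt")

printf '%s\n' "$only_echo" "$empty" "$starts_g" "$digits" "$non_alpha" "$star_all"
rm -rf "$tmp"
```

Note that the empty line is not selected by [^a-zA-Z]: a negated class still needs one character to match.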

EREs used in grep or sed (with grep -E or sed -E) :

g+ - matches one or more occurrences of character g
g? - matches zero or one occurrences of character g
sri|ram - matches sri or ram
(sri|ram)dhar - matches either sridhar or ramdhar
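
A sketch of the EREs above. It assumes a grep and a sed that accept the -E option (GNU and BSD versions do); the input words are made up.

```shell
# Alternation inside a group: sridhar or ramdhar, but not giridhar.
names=$(printf '%s\n' sridhar ramdhar giridhar | grep -E '(sri|ram)dhar')

# Plain alternation.
either=$(printf '%s\n' sri ram dhar | grep -E 'sri|ram')

# + needs at least one g; ? allows at most one.
plus=$(printf '%s\n' a ag agg | grep -E '^ag+$')
optional=$(printf '%s\n' a ag agg | grep -E '^ag?$')

# With sed -E the group needs no backslashes and is still available as \1.
prefixes=$(printf '%s\n' sridhar ramdhar | sed -E 's/(sri|ram)dhar/\1/')

printf '%s\n' "$names" "$either" "$plus" "$optional" "$prefixes"
```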

