Sed stands for Stream Editor. It is a powerful utility for manipulating text and files. As an example of what "sed" can do, there is a file attached to this page named "6_29.cpp" that has line numbers at the beginning of each line. We can use the following command to remove the line numbers.
cat 6_29.cpp | sed -r 's/^ *[0-9]+//g'
The regular expression "^ *[0-9]+" matches any number of spaces at the beginning of the line followed by at least one digit, and the substitution replaces that match with nothing. The "-r" option turns on extended regular expressions, so "+" does not need a backslash.
The next command removes C-style block comments:
cat 6_29.cpp | sed -r '/\/\*/,/\*\//d'
The command looks for a line matching "/*" and then a later line matching the end pattern "*/", and deletes these lines along with any lines in between.
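Since the attached file is not reproduced on this page, here is a minimal sketch that runs both commands on a made-up listing. The contents of "sample" are an assumption for illustration only, and GNU sed is assumed for the "-r" option.

```shell
# Hypothetical numbered listing standing in for 6_29.cpp.
sample=' 1 /* demo
 2    comment */
 3 int main() {
 4   return 0;
 5 }'

# Strip the leading line numbers, then delete the block comment.
cleaned=$(printf '%s\n' "$sample" \
  | sed -r 's/^ *[0-9]+//' \
  | sed -r '/\/\*/,/\*\//d')

printf '%s\n' "$cleaned"
```

Only the three lines of code should remain, each without its line number.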
The sed substitution command takes a string of the form
s/../../
The "s" states that we are using the substitution command. We specify the pattern and what to replace the pattern with. The part between the first two slashes is the pattern, and the part between the second and third slash is the replacement string. This is one use of sed, and we shall see other ways that sed can manipulate text.
[amittal@hills sed]$ echo "Lemon tree" | sed 's/tree/juice/'
Lemon juice
We do not have to use the forward slash as a separator. Almost any single character will work, as long as the same character is used in all three places. Using the question mark:
[amittal@hills sed]$ echo "Lemon tree" | sed 's?tree?juice?'
Lemon juice
Using the underscore:
$ echo "Lemon tree" | sed 's_tree_juice_'
Lemon juice
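A different separator is most useful when the pattern itself contains slashes. The sketch below uses a made-up path (the variable "path" and its value are assumptions for illustration) and shows the same substitution written both ways.

```shell
path="/usr/local/bin/tool"

# With "/" as the separator every slash in the pattern must be escaped.
escaped=$(echo "$path" | sed 's/\/usr\/local/\/opt/')

# With "|" as the separator the slashes need no escaping.
plain=$(echo "$path" | sed 's|/usr/local|/opt|')

echo "$escaped"
echo "$plain"
```

Both commands should print "/opt/bin/tool". The second form is simply easier to read.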
The expression below replaces every run of letters that starts with "t", whether the "t" begins a word or sits inside one. The replacement string "juice " ends with a space, so the output contains doubled spaces.
$ echo "Lemon tree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
Lemon juice  juice  juice
$ echo "Lemon atree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
Lemon ajuice  juice  juice
Exercises
1) What does the following do ?
$ echo "this is something for tom." | sed -r 's/^t/T/' | sed -r 's/ t/ T/'
2)
The problem with the command below is that it changes the words beginning with "t" but also changes a word if "t" is in the middle of the word. Change it so that only words that begin with the letter "t" are modified. Spaces should be preserved as in the original string.
echo "temon its tree tank top" | sed -r 's/t[a-zA-Z]+/juice /g'
Solutions
2)
echo "temon its tree tank top" | sed -r 's/^t[a-zA-Z]+/juice/g' | sed -r 's/ t[a-zA-Z]+/ juice/g'
We can also use the pipe "|" as an "or" inside a single expression, but then the space in front of each matched word is lost:
echo "temon its tree tank top" | sed -r 's/(^t[a-zA-Z]+| t[a-zA-Z]+)/juice/g'
The "&" symbol in the replacement stands for the matched string.
$ echo "Lemon tree" | sed -r 's/tree/& &/'
Lemon tree tree
The rest of the string that is not matched stays the same.
$ echo "Lemon 5-6" | sed -r 's/[+-]/ & /'
Lemon 5 - 6
In the above, when the first "+" or "-" symbol is found in the input string we place spaces around it.
[amittal@hills sed]$ echo "123 abc" | sed -r 's/[0-9]+/& &/'
123 123 abc
The pattern that was matched was "123" and that got repeated with "& &" .
[amittal@hills sed]$ echo "123 abc" | sed -r 's/[0-9]+/(&)/'
(123) abc
The above line puts brackets around the number "123". What if we wanted to get rid of the word "abc" and keep only the number, repeated as "123 123"? We could do something like:
echo "123 abc" | sed -r 's/ [a-zA-Z]+//' | sed -r 's/[0-9][0-9]*/& &/'
$ ./sed2.sh
123 123
We can do this in a better way, because sed lets us capture part of the match and refer to it in the replacement.
Exercises:
1) Place the command
echo "123 abc" | sed -r 's/[0-9]+/& &/'
in a shell script and then run the shell script. This method has the advantage that the command is saved in a text file, where it can be edited and kept for future reference.
Using "()" and "\1"
We can use parentheses "()" in the pattern and "\number" in the replacement to isolate and select particular parts of the match.
[amittal@hills sed]$ echo "123 abc" | sed -r 's/(^[0-9]+) .*/\1/'
123
In the above example the brackets capture the number and the rest of the line is matched by the pattern " .*". The replacement contains only "\1", so the captured number is kept while the rest of the line is dropped.
The first pair of brackets "()" is referenced as "\1", the next pair as "\2", and so on. We get an error if the replacement refers to a group number that has no matching pair of round brackets in the pattern.
$ echo "123 abc" | sed -r 's/^[0-9]+ .*/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS
We are missing the round brackets in the pattern.
$ echo "This is a lemon tree" | sed -r 's/(is) (a)/\2 \1/'
This a is lemon tree
In the above the captured groups are "is" and "a".
The below line shows how we can switch the first and the second word.
[amittal@hills PartOfPattern]$ echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+)/\2 \1/'
are We in a unix scripting class.
What if we wanted to grab the second word only from the above example:
echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+).*/\2/'
There is usually more than one way to write something.
echo "We are in a unix scripting class." | sed -r 's/^[A-Za-z]+ ([A-Za-z]+).*/\1/'
$ echo "We are in a unix scripting class." | sed -r 's/(^[A-Za-z]+) ([A-Za-z]+)/\2/'
are in a unix scripting class.
We are replacing the first 2 words by just the second word.
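The same technique works with more than two groups. As a minimal sketch, the command below rearranges a date; the variable names and the sample date are assumptions for illustration, and "|" is used as the separator so the slashes in the replacement need no escaping.

```shell
iso="2024-03-15"

# \1 is the year, \2 the month, \3 the day.
dmy=$(echo "$iso" | sed -r 's|^([0-9]{4})-([0-9]{2})-([0-9]{2})$|\3/\2/\1|')

echo "$dmy"
```

This should print "15/03/2024".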
We can also use "\1" inside the pattern itself, where it matches a second copy of whatever the first group captured.
[amittal@hills PartOfPattern]$ echo "This This contains a mistake." | sed -r 's/([A-Za-z]+) \1/\1/'
This contains a mistake.
[amittal@hills PartOfPattern]$ echo "This contains contains a mistake." | sed -r 's/([A-Za-z]+) \1/\1/'
This contains a mistake.
Removing duplicated words at the beginning and end of the line:
echo "This contains a mistake. This" | sed -r 's/(^[A-Za-z]+)(.*)\1$/\1\2/'
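Removing a later duplicate of the first word. The spaces on both sides of the removed word remain, so the output has a doubled space:
$ echo "This contains This a mistake." | sed -r 's/(^[A-Za-z]+)(.*)\1/\1\2/'
This contains  a mistake.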
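One caveat with the pattern "([A-Za-z]+) \1" is that it can also match inside words: in "This is", the "is" at the end of "This" is followed by " is". The sketch below shows this and one possible remedy using "\b" (word boundary), which is a GNU sed extension. The variable names are assumptions for illustration.

```shell
naive='s/([A-Za-z]+) \1/\1/'
bounded='s/\b([A-Za-z]+) \1\b/\1/'

# The naive pattern wrongly treats "is is" inside "This is" as a duplicate.
false_hit=$(echo "This is a test." | sed -r "$naive")

# With word boundaries the sentence is left alone...
kept=$(echo "This is a test." | sed -r "$bounded")

# ...while a real duplicated word is still removed.
fixed=$(echo "This This contains a mistake." | sed -r "$bounded")

echo "$false_hit"
echo "$kept"
echo "$fixed"
```

The first line of output should be the damaged "This a test.", and the other two should be correct sentences.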
Exercises
1)
Assume we have the string "We are in a unix scripting class." Write sed commands to do the following:
Switch the first and last word.
Switch the first and third word.
Switch the first and third word and remove the second word.
echo "We are in a unix scripting class." | sed -r 'TODO'
Output should be as:
class. are in a unix scripting We
in are We a unix scripting class.
in We a unix scripting class.
-n and p
The "-n" option means that lines are not automatically printed to the console.
Ex:
data.txt
This is a test.
The dog is chasing the cat.
A test is coming up.
Are we having fun in this class ?
sed -n 's/test/Test/' data.txt
The "-n" option suppresses the output so we don't get any output printed to the console at all. If we use the "p" flag then the lines that match will get printed out.
[amittal@hills Flags]$ sed -n 's/test/Test/p' data.txt
This is a Test.
A Test is coming up.
The "-n" option suppressed the lines that would normally get printed out, and the "p" flag prints out the lines where a substitution was made. What if we use only the "p" flag and not the "-n" option?
[amittal@hills Flags]$ sed 's/test/Test/p' data.txt
This is a Test.
This is a Test.
The dog is chasing the cat.
A Test is coming up.
A Test is coming up.
Are we having fun in this class ?
All the lines in the file "data.txt" get printed out, and the lines where a substitution was made get printed a second time.
We can combine the "-n" option with the "p" command to simply print the lines that match, without replacing anything.
sed -rn '/([a-z]+) \1/p'
The above will print the lines that contain a duplicate word. In this way the sed command is working like a grep.
$ cat data.txt | sed -rn '/fun/p'
Are we having fun in this class ?
The above command prints the lines that have the word "fun" in them.
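To check the comparison with grep, the sketch below recreates "data.txt" in a temporary file (the temporary file and the variable names are assumptions for illustration) and runs both tools with the same pattern.

```shell
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
This is a test.
The dog is chasing the cat.
A test is coming up.
Are we having fun in this class ?
EOF

# -n turns off automatic printing; p prints only the matching lines.
with_sed=$(sed -n '/test/p' "$tmp")
with_grep=$(grep 'test' "$tmp")
rm -f "$tmp"

printf '%s\n' "$with_sed"
```

Both variables should hold the same two lines.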
Exercises
1) What does the below print ?
cat 6_29.cpp | sed -nr 's/([0-9]+)/\1/p'
Flag g
The flag "g" will make the replacements globally.
[amittal@hills Flags]$ echo "Testing the Tesla car." | sed 's/Tes/TES/'
TESting the Tesla car.
Normally the substitution is done on the first pattern match that sed found. Using the "g" flag causes the replacements to occur throughout the line.
[amittal@hills Flags]$ echo "Testing the Tesla car." | sed 's/Tes/TES/g'
TESting the TESla car.
[amittal@hills Flags]$ echo "Testing the Tesla car." | sed -r 's/[^ ]+/(&)/g'
(Testing) (the) (Tesla) (car.)
The above line uses the negated character class "[^ ]", which matches any character that is not a space, so "[^ ]+" matches a whole word.
Exercise
1)
echo "suden unflatering noncommital subcommitee" | sed -r 'TODO'
Complete the sed command above so that each "d" is replaced by "dd" and each single "t" is replaced by "tt".
2)
Remove both the duplicated words.
This This contains a mistake mistake
Specifying the occurrence
We can specify which occurrence of the pattern the substitution should be applied to.
[amittal@hills Flags]$ echo "Testing the Tesla car." | sed -r 's/[^ ]+/(&)/4'
Testing the Tesla (car.)
In the above line we are stating that the substitution should apply to the 4th match only, which here is the 4th word. We can combine the number with the "g" flag to apply the substitution to the nth occurrence and every one after it.
[amittal@hills Flags]$ echo "Testing the Tesla car." | sed -r 's/[^ ]+/(&)/2g'
Testing (the) (Tesla) (car.)
In the above line we are stating that the substitution applies to the 2nd word and beyond.
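The occurrence number counts matches of the pattern, not words, so it works for any pattern. A minimal sketch with a made-up comma-separated string (the variable names are assumptions; combining a number with "g" is a GNU sed feature):

```shell
csv="a,b,c,d"

# Replace only the 2nd comma.
second_only=$(echo "$csv" | sed 's/,/;/2')

# Replace the 2nd comma and every comma after it.
second_onward=$(echo "$csv" | sed 's/,/;/2g')

echo "$second_only"
echo "$second_onward"
```

This should print "a,b;c,d" and then "a,b;c;d".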
Exercises:
1) Write a sed command to work on the file "data.txt" to keep just the first 3 words in each line.
Solution
1)
$ cat data.txt | sed -r 's/[^ ]+//4g'
This removes the 4th and later words but leaves behind the spaces that separated them.
Writing the output to a file
The "w" flag writes the lines on which a substitution was made to a file. Contents of the file "even.txt":
22 Even number
23 Odd Number
24 Even Number
25 Odd Number
sed -n 's/^[0-9]*[02468] /&/w even' even.txt
Contents of the file "even" :
22 Even number
24 Even Number
This can also be done using the "p" flag and redirection. The output file must not be the same as the input file:
sed -n 's/^[0-9]*[02468] /&/p' even.txt > even
You can also combine options such as:
sed -n -r 's/^[0-9]*[02468] /&/w even' even.txt
sed -nr 's/^[0-9]*[02468] /&/w even' even.txt
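The sketch below compares the "w" flag with the "p" flag plus redirection in a scratch directory. The directory and the output file name "even_redirect" are assumptions for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' '22 Even number' '23 Odd Number' '24 Even Number' '25 Odd Number' \
  > "$workdir/even.txt"

# Run in a subshell so the relative file names resolve inside the scratch directory.
(
  cd "$workdir" || exit 1
  # "w even" writes each line with a successful substitution to the file "even".
  sed -n 's/^[0-9]*[02468] /&/w even' even.txt
  # The "p" flag prints the same lines, and the shell redirects them to a file.
  sed -n 's/^[0-9]*[02468] /&/p' even.txt > even_redirect
)

written=$(cat "$workdir/even")
redirected=$(cat "$workdir/even_redirect")
rm -rf "$workdir"

printf '%s\n' "$written"
```

Both files should contain the same two even-numbered lines.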
[amittal@hills Flags]$ var1=`sed -n 's/^[0-9]*[02468] /&/p' even.txt`
[amittal@hills Flags]$ echo $var1
22 Even number 24 Even Number
[amittal@hills Flags]$ echo "$var1"
22 Even number
24 Even Number
When we print the value of "$var1" without the quotes, the shell splits it into words at whitespace (including the newlines), and echo prints those words separated by single spaces.
Exercises
1) Using the file "even.txt" place the values 22,23,24,25 in a variable "var1" . There should only be spaces between the numbers . Write your commands in a script and execute the script. Do not hard code the numbers.
$ ./ex_even.sh
22 23 24 25
Ignoring case
$ echo "cAt ate the fish" | sed -r 's/cat/Cat/i'
Cat ate the fish
echo "Abc" | sed -n '/abc/I p'
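We can use the capital letter "I" to ignore case. With the "s" command either "i" or "I" works as a flag, but after an address pattern it must be the capital "I". The second sed command above does not substitute but merely searches for a pattern in a manner similar to grep.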
Exercises
1) What does the below do
$ echo "a dog jumps A fence" | sed -n 's/a/A/2ipw data'
Multiple Commands
Instead of pipes we can use the "-e" option to give multiple commands. The commands are applied to each line in the order given.
[amittal@hills Flags]$ echo "cab is coming" | sed -e 's/a/A/g' -e 's/c/C/g'
CAb is Coming
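As a quick check, the sketch below runs the same two substitutions three ways: with a pipe, with "-e", and with a semicolon between the commands in a single script string. The variable names are assumptions for illustration.

```shell
input="cab is coming"

piped=$(echo "$input" | sed 's/a/A/g' | sed 's/c/C/g')
with_e=$(echo "$input" | sed -e 's/a/A/g' -e 's/c/C/g')
with_semicolon=$(echo "$input" | sed 's/a/A/g; s/c/C/g')

echo "$piped"
echo "$with_e"
echo "$with_semicolon"
```

All three should print "CAb is Coming". The "-e" and semicolon forms start only one sed process.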
Exercises:
1) What does the below print ?
$ echo "cab is coming" | sed -e 's/a/A/g' -e 's/A/C/ig'
Using Multiple FileNames
Sed can work with multiple files at the same time. Assume we have the following files.
File: "f1.txt"
#12 This is the first line in file f1.
#Abc This is the second line in file f1
File: "f2.txt"
#132 This is a line in f2.
#Abcdef This is another line in f2
$ sed -r 's/^#[^ ]+ //' f1.txt f2.txt
Output:
This is the first line in file f1.
This is the second line in file f1
This is a line in f2.
This is another line in f2
Exercises:
1) Use the above sed command to save the output in a variable "var1" and output the contents of the variable.
Using Sed to work with a script in a file
If we have many commands we can place the commands in a file and use the "-f" option to run the sed command.
Contents of the "myscript" file:
# sed comment - This script changes lower case vowels to upper case
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
[amittal@hills FileBased]$ echo "cat is sitting on the roof" | sed -f myscript
cAt Is sIttIng On thE rOOf
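Options such as "-r" can be combined with "-f". The sketch below writes a small, hypothetical script file to a temporary location (its name and contents are assumptions for illustration) and uses it to tidy up spacing.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
# Hypothetical script: tidy up spacing
s/^ +//
s/ +$//
s/ +/ /g
EOF

# -r applies to the regular expressions inside the script file as well.
tidy=$(echo "   too   many    spaces  " | sed -r -f "$script")
rm -f "$script"

echo "[$tidy]"
```

This should print "[too many spaces]": leading and trailing spaces are removed and runs of spaces are squeezed to one.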
Exercises
1) Place your sed command in a file to increment a 2 digit number so that each digit gets converted to the one higher with 9 getting converted to 0 .
45 -> 56
91 -> 02
99 -> 00
00 -> 11
Printing a specific line using Sed
Let's create a file called "data.txt" containing the following 10 lines.
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
To print out the 5th line we can use the command:
[amittal@hills LineNumbers]$ cat data.txt | sed -n 5p
Line 5
To print out lines 3 to 5 we can use:
[amittal@hills LineNumbers]$ cat data.txt | sed -n 3,5p
Line 3
Line 4
Line 5
We can also specify that a pattern should apply to a specific line.
[amittal@hills LineNumbers]$ cat data.txt | sed -n '3 s/3/31/p'
Line 31
[amittal@hills LineNumbers]$ cat data.txt | sed -n '4 s/3/3/p'
Nothing is printed, because line 4 does not contain the string "3".
Applying a range of line numbers
[amittal@hills LineNumbers]$ cat data.txt | sed -n '1,4 s/3/31/p'
Line 31
The above states that sed should look at lines 1 to 4 and apply the substitute operation wherever a match for the string "3" is found.
$ cat data.txt | sed '1,4 s/Line/LINE/'
LINE 1
LINE 2
LINE 3
LINE 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
The above states that sed should look at lines 1 to 4 and change "Line" to "LINE".
The "$" sign stands for the last line of the file, so "3,$" means from line 3 to the end.
$ cat data.txt | sed '3,$ s/Line/LINE/'
Line 1
Line 2
LINE 3
LINE 4
LINE 5
LINE 6
LINE 7
LINE 8
LINE 9
LINE 10
Searching for a range by pattern :
[amittal@hills LineNumbers]$ sed '/3/,/5/ s/Line//' data.txt
Line 1
Line 2
3
4
5
Line 6
Line 7
Line 8
Line 9
Line 10
The substitute command is applied from the first line that matches the first pattern through the next line that matches the second pattern.
We can also combine a line number with a pattern.
$ cat data.txt | sed '2,/4/ s/Line/LINE/'
Line 1
LINE 2
LINE 3
LINE 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
The above states that sed should start at line 2 and continue up to the line that contains the pattern "4".
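The different kinds of ranges can be compared side by side. The sketch below uses "seq" to generate the numbers 1 to 10 as input instead of "data.txt". The helper "join_lines" is a hypothetical convenience that puts the output on one line.

```shell
# join_lines is a hypothetical helper that joins its input lines with spaces.
join_lines() { paste -s -d ' ' -; }

by_number=$(seq 1 10 | sed -n '3,5p' | join_lines)       # line 3 to line 5
by_pattern=$(seq 1 10 | sed -n '/3/,/5/p' | join_lines)  # first "3" to next "5"
to_last=$(seq 1 10 | sed -n '8,$p' | join_lines)         # line 8 to the last line
no_end=$(seq 1 5 | sed -n '/4/,/9/p' | join_lines)       # end pattern never matches

echo "$by_number"
echo "$by_pattern"
echo "$to_last"
echo "$no_end"
```

The last case shows that when the end pattern is never found, the range runs to the end of the input.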
Exercises:
File: "data1.txt"
This is a test.
BEGIN The dog is chasing the cat.
A test is coming up.
Are we having fun in this class ? END
Some more lines.
Write a sed command that will place a "#" in the section marked BEGIN to END .
Deleting a line:
[amittal@hills LineNumbers]$ sed 3d data.txt
Line 1
Line 2
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Deleting by a range:
$ sed 1,3d data.txt
Deleting by a pattern:
[amittal@hills LineNumbers]$ sed '/3/ d' data.txt
Line 1
Line 2
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Deleting from line 5 to the end of the file:
$ sed '5,$ d' data.txt
Line 1
Line 2
Line 3
Line 4
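A common use of deleting by pattern is stripping comment lines and blank lines. A minimal sketch, where the contents of "config" are made up for illustration:

```shell
config='# settings

name=demo
# another comment
mode=fast'

# Delete lines that start with "#", then delete empty lines.
active=$(printf '%s\n' "$config" | sed -e '/^#/d' -e '/^$/d')

printf '%s\n' "$active"
```

Only the two lines "name=demo" and "mode=fast" should remain.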
Exercise
In this exercise we combine the line range with a pattern. Write a sed command to delete from line "1" to the pattern "3" .
Adding a line after a pattern match:
[amittal@hills LineNumbers]$ sed '/3/ a\ Add' data.txt
Line 1
Line 2
Line 3
Add
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Adding a line after a line number.
sed '3 a\ Add' data.txt
Adding a line at the end of the file.
sed '$ a\ Add' data.txt
Changing a line using the "c" command:
[amittal@hills LineNumbers]$ sed '/3/ c\ Change a line' data.txt
Line 1
Line 2
Change a line
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
Exercises
1) Write a shell script using sed and line ranges to create a file "data1.txt" with the following contents.
Line 6
Line 7
Line 8
Line 9
Line 10
Line 1
Line 2
Line 3
Line 4
Line 5
Adding a line number.
The "=" command prints the line number on its own line before each line.
$ sed = data.txt
1
Line 1
2
Line 2
...
File: "data2.txt"
Line a
Line b
Line c
Line d
Line e
Line f
Line g
Line h
Line i
Line j
$ sed -n '/c/ =' data2.txt
3
The above states that sed should match the line with "c" in it and print its line number.
Transforming Characters
$ sed 'y/ie/IE/' data.txt
LInE 1
LInE 2
LInE 3
...
We use the "y" command to state that "i" should be changed to "I" and "e" should be changed to "E". The two character lists must have the same length.
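The "y" command maps all characters in one pass, which is different from running several "s" commands one after another. The sketch below uses a made-up string of the letters A, C, G and T (the variable names and sample value are assumptions for illustration) to show the difference.

```shell
strand="AACG"

# y maps every character once: A->T, C->G, G->C, T->A.
mapped=$(echo "$strand" | sed 'y/ACGT/TGCA/')

# Chained s commands re-process earlier results, so letters get swapped back.
chained=$(echo "$strand" | sed 's/A/T/g; s/C/G/g; s/G/C/g; s/T/A/g')

echo "$mapped"
echo "$chained"
```

The "y" version should print "TTGC", while the chained version ends up with "AACC" because later substitutions undo or overwrite earlier ones.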
Exercise
1)
Place your sed command in a file to increment a 2 digit number so that each digit gets converted to the one higher with 9 getting converted to 0 .
45 -> 56
91 -> 02
99 -> 00
00 -> 11
Do the above using the "y" option with sed.
The "\u" option
$ echo "cat" | sed -r 's/.$/\u&/'
caT
$ echo "cat" | sed -r 's/.*/\u&/'
Cat
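The small "\u" turns the next character of the replacement into upper case, while the capital "\U" turns everything that follows into upper case. Both are GNU sed extensions.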
$ echo "cat" | sed -r 's/.*/\U&/'
CAT
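GNU sed also provides "\L" for lower case and "\E" to end a case conversion. The sketch below, which assumes GNU sed and uses made-up variable names, combines these with the "g" flag and with groups.

```shell
# \u with the g flag capitalizes the first letter of every word.
title=$(echo "lemon tree" | sed -r 's/[a-z]+/\u&/g')

# \U ... \E upper-cases only the first group.
first_upper=$(echo "lemon tree" | sed -r 's/([a-z]+) ([a-z]+)/\U\1\E \2/')

# \L lower-cases everything that follows.
lowered=$(echo "LEMON TREE" | sed -r 's/.*/\L&/')

echo "$title"
echo "$first_upper"
echo "$lowered"
```

This should print "Lemon Tree", "LEMON tree" and "lemon tree".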
Grouping
File: "data.txt"
BEGIN The dog is chasing the cat.
A test is coming up.
#Comment1
Are we having fun in this class ? END
Some more lines.
#Comment2
File: "1.sh"
sed -n '
/BEGIN/,/END/ {
s/#.*//
/^$/ d
p
}
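' data.txt
Grouping with "{" and "}" allows us to apply several sed commands to the same address or range. Here, within the BEGIN to END section, comments are stripped, lines left empty are deleted, and the remaining lines are printed.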
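To see the grouped commands run end to end, the sketch below recreates the sample file in a temporary location (the temporary file and variable names are assumptions for illustration) and captures the output.

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
BEGIN The dog is chasing the cat.
A test is coming up.
#Comment1
Are we having fun in this class ? END
Some more lines.
#Comment2
EOF

# Inside the BEGIN..END range: strip comments, drop emptied lines, print the rest.
section=$(sed -n '
/BEGIN/,/END/ {
  s/#.*//
  /^$/ d
  p
}
' "$data")
rm -f "$data"

printf '%s\n' "$section"
```

Three lines should be printed: the two text lines and the line ending in END. The comment inside the section is removed, and the lines after END are never printed because of "-n".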