sed

sed ("stream editor") is a tool that reads a file line by line and transforms the text, using a compact programming language that often fits on a single line. Sed offers a wide range of commands, but the most common one is substitution (s), which finds a pattern and replaces it with another string:

sed 's/patternA/patternB/' file.txt > output.txt

The command above replaces only the first occurrence of patternA on each line. If we want to change pattern A to B at every occurrence of patternA, we add the g (global) flag:

sed 's/patternA/patternB/g' file.txt > output.txt
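To make the difference concrete, here is a minimal sketch that runs both forms on the same input. The sample text and the variable names are made up for illustration.

```shell
# Hypothetical one-line sample with three occurrences of "cat".
line='cat cat cat'

# Without a flag, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/cat/dog/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/cat/dog/g')

echo "$first_only"   # dog cat cat
echo "$all"          # dog dog dog
```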

If we instead want to change only one specific occurrence on each line, we give its number in place of g. The following replaces only the fourth occurrence of patternA on each line:

sed 's/patternA/patternB/4' file.txt > output.txt
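A small sketch of the numeric flag, using a made-up input line. Note that the number selects which match on each line is replaced, not how many matches are replaced.

```shell
# Hypothetical sample line with four occurrences of "a".
line='a-a-a-a'

# The flag 2 replaces only the second match on each line.
second=$(printf '%s\n' "$line" | sed 's/a/X/2')

echo "$second"   # a-X-a-a
```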

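All of the previous commands print the entire file, changing only the specified patterns. We can also print only the lines that were changed, by combining the p flag with the -n option, which suppresses the default output (without -n, every changed line would be printed twice):

sed -n 's/patternA/patternB/p' file.txt > output.txt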
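The interaction between the p flag and the -n option is easy to check on a tiny input. This is only a sketch with invented sample lines: the p flag prints a line when a substitution was made on it, and -n turns off the automatic printing of every line.

```shell
# With -n, only the lines where a substitution happened are printed.
changed_only=$(printf 'apple\nbanana\ncherry\n' | sed -n 's/an/AN/p')

# Without -n, every line is printed once, and changed lines a second time.
with_default=$(printf 'apple\nbanana\ncherry\n' | sed 's/an/AN/p')

echo "$changed_only"   # bANana
echo "$with_default"   # apple, bANana, bANana, cherry (one per line)
```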


One interesting feature of sed is that almost any character can serve as the delimiter of the substitution command. Instead of always using the format 's/a/b/', we can use another character, such as | @ ^ ! =, as long as we use the same one in all three positions. This is extremely useful when changing paths in a file, since paths already contain the character /. For example, to change an imaginary path from /home/evomics/bin to /usr/bin, we could type:

sed 's@/home/evomics/bin@/usr/bin@g'

The same problem can also be solved by escaping the character, which forces it to be read as literal text rather than as a delimiter. For that, we put a backslash (\) in front of it:

sed 's/\/home\/evomics\/bin/\/usr\/bin/g'

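As you can see, sed commands can become quite difficult to read. It helps to find the delimiters first, and then read the command one part at a time to understand what it is doing:

sed 's/\/home\/evomics\/bin/\/usr\/bin/g'

Here the three unescaped / characters are the delimiters: the pattern is \/home\/evomics\/bin, the replacement is \/usr\/bin, and g is the flag.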
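As a quick check that the two spellings are equivalent, the sketch below applies both to the same line. The sample line is invented, and the commands use plain straight quotes, which is what the shell expects.

```shell
# Hypothetical line containing the path to be replaced.
line='PATH=/home/evomics/bin:/bin'

# Using @ as the delimiter, so the slashes need no escaping.
with_at=$(printf '%s\n' "$line" | sed 's@/home/evomics/bin@/usr/bin@g')

# Using / as the delimiter, so every slash in the path is escaped.
with_escapes=$(printf '%s\n' "$line" | sed 's/\/home\/evomics\/bin/\/usr\/bin/g')

echo "$with_at"        # PATH=/usr/bin:/bin
echo "$with_escapes"   # PATH=/usr/bin:/bin
```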


The beauty of sed is that the code can be kept to one-liners. We can apply several substitutions to the same file in a single command by separating them with a semicolon (;). The commands are applied to each line in the order they are written. For example:

sed 's/patternA/patternB/g;s/patternC/patternD/g' file.txt > output.txt
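The sketch below shows two equivalent ways of chaining substitutions, with a semicolon or with repeated -e options, and also that a later command sees the result of the earlier one. The inputs are made up for illustration.

```shell
line='red green blue'

# Two substitutions separated by a semicolon.
with_semicolon=$(printf '%s\n' "$line" | sed 's/red/cyan/;s/blue/yellow/')

# The same two substitutions given as separate -e expressions.
with_e=$(printf '%s\n' "$line" | sed -e 's/red/cyan/' -e 's/blue/yellow/')

# Commands run in order: the second one acts on the output of the first.
chained=$(printf 'a\n' | sed 's/a/b/;s/b/c/')

echo "$with_semicolon"   # cyan green yellow
echo "$with_e"           # cyan green yellow
echo "$chained"          # c
```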


Another important feature of sed is the ability to reuse the text that was matched. One way to do that is to use the symbol & in the replacement part, which stands for the entire text matched by the pattern.

sed 's/patternA/>&/g' file.txt > output.txt

The above command will transform every instance of patternA to >patternA (for example, creating fasta headers).
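A minimal sketch of the & symbol on invented sequence names. It also shows that a literal ampersand in the replacement has to be written as \&, since a bare & always expands to the matched text.

```shell
# Prefix every matched identifier with ">" to build FASTA-style headers.
headers=$(printf 'seq1\nseq2\n' | sed 's/seq[0-9]/>&/')

# To insert a literal ampersand, escape it with a backslash.
literal=$(printf 'salt and pepper\n' | sed 's/and/\&/')

echo "$headers"   # >seq1 and >seq2, one per line
echo "$literal"   # salt & pepper
```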


We can also save parts of the pattern as groups, using escaped parentheses, \( and \):

sed 's/\(pattern\)A/\1 B/g' file.txt > output.txt

The command above saves the string pattern, enclosed between \( and \), and refers to it in the replacement as \1. It changes every patternA to pattern B, adding a space as well; the space in the replacement does not need to be escaped. We can save up to nine groups in one pattern and refer to them as \1, \2, and so on up to \9.
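Groups become most useful when more than one is saved, for example to reorder pieces of a line. The following sketch uses an invented "Surname, Name" line and swaps the two parts.

```shell
# The documented example: keep "pattern", replace the trailing A with " B".
spaced=$(printf 'patternA\n' | sed 's/\(pattern\)A/\1 B/g')

# Two groups: \1 is the surname, \2 is the first name; swap them.
swapped=$(printf 'Darwin, Charles\n' | sed 's/\([A-Za-z]*\), \([A-Za-z]*\)/\2 \1/')

echo "$spaced"    # pattern B
echo "$swapped"   # Charles Darwin
```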


Finally, remember the power of regular expressions!

You can use regular expressions to match a whole family of strings that have something in common, and change all of them at once.
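As a small illustration, the sketch below uses a bracket expression and the * repetition operator to match a family of names that differ only in their number. The gene names are made up, and the patterns stick to basic regular expressions, which sed uses by default.

```shell
line='gene1 gene22 gene333'

# [0-9][0-9]* matches one or more digits, so all three names are matched.
renamed=$(printf '%s\n' "$line" | sed 's/gene[0-9][0-9]*/ID/g')

# Combined with a group, the digits can be kept while the prefix changes.
shortened=$(printf '%s\n' "$line" | sed 's/gene\([0-9][0-9]*\)/g\1/g')

echo "$renamed"     # ID ID ID
echo "$shortened"   # g1 g22 g333
```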