Sed
sed ("stream editor") is a tool that reads its input line by line and transforms the text using a compact command language that often fits on a single line. Sed offers a wide range of commands, but the most common one is substitution, in which we find a pattern and replace it with another string:
sed 's/patternA/patternB/' file.txt > output.txt
By default, only the first occurrence of patternA on each line is replaced. If we want to change every occurrence of patternA to patternB, we add the g (global) flag:
sed 's/patternA/patternB/g' file.txt > output.txt
If we instead want to change only one specific occurrence on each line, we give its number in place of g. The following replaces only the fourth patternA on each line:
sed 's/patternA/patternB/4' file.txt > output.txt
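To see how the flags differ, here is a small sketch run on a made-up line (the sample text and the variable names are assumptions for illustration, not part of file.txt). It compares no flag, the g flag, and a numeric flag:

```shell
# Hypothetical sample line with five occurrences of patternA.
line='patternA patternA patternA patternA patternA'

# No flag: only the first occurrence on the line is replaced.
first=$(printf '%s\n' "$line" | sed 's/patternA/patternB/')

# g flag: every occurrence on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/patternA/patternB/g')

# Numeric flag: only the fourth occurrence on the line is replaced.
fourth=$(printf '%s\n' "$line" | sed 's/patternA/patternB/4')

printf '%s\n' "$first" "$all" "$fourth"
```

The three output lines should differ only in which copies of patternA have become patternB.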
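All the previous commands print the entire file back, changing only the specified patterns. We can also print only the lines that were changed by combining the -n option, which suppresses the default output, with the p flag (without -n, the p flag would print each changed line twice):
sed -n 's/patternA/patternB/p' file.txt > output.txt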
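The p flag is normally paired with the -n option, which turns off the automatic printing of every line. The sketch below uses a made-up three-line input (an assumption for illustration) to show the difference:

```shell
# Hypothetical three-line input; only lines 1 and 3 contain patternA.
input='patternA one
no match here
patternA two'

# p without -n: every line is printed as usual, and each changed line
# is printed a second time (five lines of output in total).
with_dupes=$(printf '%s\n' "$input" | sed 's/patternA/patternB/p')

# -n plus p: only the lines where a substitution happened are printed.
only_changed=$(printf '%s\n' "$input" | sed -n 's/patternA/patternB/p')

printf '%s\n' "$only_changed"
```

With -n, the line "no match here" is dropped and each changed line appears exactly once.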
One interesting feature of sed is that almost any character can serve as the delimiter of the substitution command. Instead of always using the format 's/a/b/', we can use another character, such as | @ ^ ! =. This is extremely useful when changing paths in a file, since paths already contain the character /. For example, to change an imaginary path from /home/evomics/bin to /usr/bin, we could type:
sed 's@/home/evomics/bin@/usr/bin@g'
This could also be solved by escaping the character, that is, forcing it to be read as literal text. For that, we put a backslash (\) before each /:
sed 's/\/home\/evomics\/bin/\/usr\/bin/g'
As you can see, sed commands can become quite difficult to read. It helps to find the delimiters first and then read one part at a time (the command s, the pattern, the replacement, and the flags) to understand what the command is doing:
sed 's/\/home\/evomics\/bin/\/usr\/bin/g'
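As a quick check that both spellings do the same thing, the sketch below runs them on a made-up line (the sample line and variable names are assumptions) and labels each part of the command in the comments:

```shell
# Hypothetical line from a configuration file; $PATH stays literal
# because the string is single-quoted.
path_line='export PATH=/home/evomics/bin:$PATH'

# Parts of the command:  s @ pattern @ replacement @ flags
#                        s @ /home/evomics/bin @ /usr/bin @ g
with_at=$(printf '%s\n' "$path_line" | sed 's@/home/evomics/bin@/usr/bin@g')

# Same substitution with / as the delimiter and every / in the path escaped.
with_escapes=$(printf '%s\n' "$path_line" | sed 's/\/home\/evomics\/bin/\/usr\/bin/g')

printf '%s\n' "$with_at" "$with_escapes"
```

Both commands should give the same result; the version with @ is simply easier to read.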
The beauty of sed is that the code can be kept to one-liners. We can make several substitutions in the same file with a single command by separating them with the character ;. For example:
sed 's/patternA/patternB/g;s/patternC/patternD/g' file.txt > output.txt
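A small sketch of chaining, using made-up input lines (assumptions for illustration). It also shows the equivalent form with one -e option per command, and that the commands run in order, so a later substitution sees the result of an earlier one:

```shell
# Hypothetical sample line.
line='patternA and patternC'

# Two substitutions separated by ; in a single script.
semi=$(printf '%s\n' "$line" | sed 's/patternA/patternB/g;s/patternC/patternD/g')

# The same two substitutions, each passed with its own -e option.
multi_e=$(printf '%s\n' "$line" | sed -e 's/patternA/patternB/g' -e 's/patternC/patternD/g')

# Order matters: the second command acts on the output of the first,
# so patternA ends up as patternC here.
chained=$(printf 'patternA\n' | sed 's/patternA/patternB/;s/patternB/patternC/')

printf '%s\n' "$semi" "$multi_e" "$chained"
```

Keeping the order in mind helps avoid surprises when one replacement produces text that a later pattern also matches.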
Another important feature of sed is the ability to reuse the text that the pattern matched. One way to do that is to use the symbol & in the replacement, which stands for the entire matched text:
sed 's/patternA/>&/g' file.txt > output.txt
The above command will transform every instance of patternA to >patternA (useful, for example, for creating FASTA headers).
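Here is a minimal sketch of the & symbol on made-up sequence names (seq1 and seq2 are assumptions for illustration). It also shows that a backslash before & gives a literal ampersand:

```shell
# Turn each hypothetical sequence name into a FASTA-style header:
# & stands for whatever the pattern matched.
headers=$(printf 'seq1\nseq2\n' | sed 's/seq[0-9]*/>&/')

# Escaped as \&, the ampersand is inserted as a plain character.
literal=$(printf 'salt and pepper\n' | sed 's/and/\&/')

printf '%s\n' "$headers" "$literal"
```

The first command should print >seq1 and >seq2; the second should print "salt & pepper".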
We can also save groups of the pattern using escaped parentheses, \( and \):
sed 's/\(pattern\)A/\1 B/g' file.txt > output.txt
The command above saves the string pattern, enclosed in the parentheses, so that it can be referred to in the replacement as \1. The command changes every patternA to pattern B, adding a space as well; inside the single quotes the space does not need to be escaped. We can save up to nine groups and refer to them as \1, \2, and so on up to \9.
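The sketch below tries saved groups on made-up input (the sample strings are assumptions). A plain space in the replacement works as-is inside single quotes. The last command uses the -E option for extended regular expressions, where groups are written with unescaped parentheses; this assumes a sed that supports -E, as GNU and BSD sed do:

```shell
# One group: keep "pattern", replace the trailing A with " B".
spaced=$(printf 'patternA\n' | sed 's/\(pattern\)A/\1 B/g')

# Two groups: swap two words by referring to them as \2 and \1.
swapped=$(printf 'Homo sapiens\n' | sed 's/\([A-Za-z]*\) \([A-Za-z]*\)/\2 \1/')

# Same swap with extended regular expressions (assumes -E is available).
swapped_ere=$(printf 'Homo sapiens\n' | sed -E 's/([A-Za-z]+) ([A-Za-z]+)/\2 \1/')

printf '%s\n' "$spaced" "$swapped" "$swapped_ere"
```

The first line of output should be "pattern B", and both swap commands should print "sapiens Homo".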
Finally, remember the power of regular expressions!
You can use a regular expression to match a whole family of strings that have something in common, and change all of them at the same time.
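As a small illustration, the sketch below uses a bracket expression to match several made-up names at once (geneA, geneB, geneC, and the replacement locus are all assumptions), first replacing them outright and then keeping the part that differs with a saved group:

```shell
# Hypothetical line with three related names.
line='geneA geneB geneC'

# [A-C] matches any one of A, B, or C, so one command covers all three names.
renamed=$(printf '%s\n' "$line" | sed 's/gene[A-C]/locus/g')

# Combined with a saved group, the differing letter is kept.
tagged=$(printf '%s\n' "$line" | sed 's/gene\([A-C]\)/locus_\1/g')

printf '%s\n' "$renamed" "$tagged"
```

The output should be "locus locus locus" followed by "locus_A locus_B locus_C".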