grep
The command grep will print the lines matching a given pattern.
grep PATTERN file
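grep -e PATTERN file (-e marks PATTERN explicitly as the pattern; patterns are regular expressions by default, and -E enables extended regex syntax)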
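To make the difference between the default (basic) and extended regular expressions concrete, here is a minimal sketch. The file contents and the variable names (tmp, basic, extended) are made up for illustration.

```shell
# Throwaway file with hypothetical content.
tmp=$(mktemp)
printf 'gene1\ngene22\nexon3\n' > "$tmp"

# Default (basic regex): a plain string matches wherever it occurs in a line.
basic=$(grep 'gene' "$tmp")

# -E (extended regex): '+' means "one or more" and '|' means "or".
extended=$(grep -E 'gene[0-9]+|exon' "$tmp")

echo "$basic"      # gene1 and gene22
echo "$extended"   # all three lines

rm -f "$tmp"
```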
The flags of grep change what is reported about a pattern:
Counting the lines that contain a string (lines, not individual occurrences):
grep -c 'string' file
Printing only the lines not containing the string:
grep -v 'string' file
Printing line numbers of matches with string:
grep -n 'string' file
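The three flags above can be tried side by side on a small file. This is only a sketch: the file contents and variable names are invented for the example. Note that -c reports matching lines, so a line containing the string twice is counted once.

```shell
# Throwaway file with hypothetical content; line 1 contains the string twice.
tmp=$(mktemp)
printf 'geneA geneA\ngeneB\ngeneA\n' > "$tmp"

count=$(grep -c 'geneA' "$tmp")      # number of matching lines
inverse=$(grep -v 'geneA' "$tmp")    # lines NOT containing the string
numbered=$(grep -n 'geneA' "$tmp")   # matching lines prefixed with "N:"

echo "$count"      # 2
echo "$inverse"    # geneB
echo "$numbered"   # 1:geneA geneA  and  3:geneA

rm -f "$tmp"
```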
We can also narrow down what is printed and what counts as a match:
Printing only the matched string, not the whole line:
grep -o 'string' file
Printing only the lines that consist entirely of the string:
grep -x 'string' file
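A short sketch of how -o and -x behave, on made-up sequences. Because -o prints each match on its own line, piping it to wc -l is one way to count individual occurrences rather than matching lines. The tr call only strips the padding that some wc implementations add.

```shell
# Throwaway file with hypothetical sequences.
tmp=$(mktemp)
printf 'ACGTACGT\nACGT\nTTACGTTT\n' > "$tmp"

# -o prints every (non-overlapping) match on its own line.
occurrences=$(grep -o 'ACGT' "$tmp" | wc -l | tr -d ' ')

# -x keeps only lines that match the pattern from start to end.
exact=$(grep -x 'ACGT' "$tmp")

echo "$occurrences"   # 4
echo "$exact"         # ACGT

rm -f "$tmp"
```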
We can also run grep on multiple files and report only the file names, using the -l and -L options.
Lists files containing the string:
grep -l 'string' /home/genomics/*txt
Lists files not containing the string:
grep -L 'string' /home/genomics/*txt
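The following sketch shows -l and -L on two throwaway files. The directory, file names and contents are hypothetical. -L is not part of POSIX but is available in both GNU and BSD grep.

```shell
# Temporary directory with two hypothetical text files.
dir=$(mktemp -d)
printf 'sample with BRCA1\n' > "$dir/a.txt"
printf 'no hit here\n' > "$dir/b.txt"

# -l lists files with at least one match; -L lists files with none.
with=$(grep -l 'BRCA1' "$dir"/*.txt)
without=$(grep -L 'BRCA1' "$dir"/*.txt)

basename "$with"      # a.txt
basename "$without"   # b.txt

rm -rf "$dir"
```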
Example of grep on a fastq file. Find a specific sequence among your reads:
grep "ACGT" file.fastq
If we inspect the FASTQ headers (using head) and see that they start with @A, we can print all the sequence identifiers by anchoring the pattern to the start of the line:
grep "^@A" file.fastq
Adding the -c flag counts the header lines, which gives the number of reads (a quality line that happens to start with @A would be counted too):
grep -c "^@A" file.fastq
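Counting headers with grep can overcount, because @ is also a valid quality character and a quality line may begin with it. A hedged alternative, assuming every record occupies exactly four lines (no wrapped sequences), is to divide the line count by four. The reads below are invented for illustration.

```shell
# Two hypothetical reads; the quality line of the second one starts with '@A'.
tmp=$(mktemp)
printf '@A001 read1\nACGT\n+\nIIII\n@A002 read2\nTTGA\n+\n@AII\n' > "$tmp"

# Header-based count: also picks up the quality line '@AII'.
by_grep=$(grep -c '^@A' "$tmp")

# Line-based count: total lines divided by 4 (assumes 4-line records).
by_lines=$(( $(wc -l < "$tmp") / 4 ))

echo "$by_grep"    # 3
echo "$by_lines"   # 2

rm -f "$tmp"
```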
Example of grep on a fasta file. Print all the sequence headers:
grep ">" file.fasta
We can also count the number of sequences by piping the output to wc -l:
grep ">" file.fasta | wc -l
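Piping to wc -l and using grep -c give the same number, since both count the matching lines. A small sketch on a made-up FASTA file (the tr call only strips the padding that some wc implementations add):

```shell
# Hypothetical FASTA file with two sequences.
tmp=$(mktemp)
printf '>seq1\nACGT\n>seq2\nGGCC\nTTAA\n' > "$tmp"

piped=$(grep '>' "$tmp" | wc -l | tr -d ' ')   # count via the pipe
direct=$(grep -c '>' "$tmp")                   # count via -c

echo "$piped"    # 2
echo "$direct"   # 2

rm -f "$tmp"
```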
Example of grep on a VCF file. Print every line that is not a header or comment line (those start with #):
grep -v "^#" file.vcf
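A # can in principle also appear inside a data line, so anchoring the pattern to the start of the line is the safer habit. The sketch below uses a hypothetical four-line VCF in which one record carries a # in its INFO column (the NOTE field is invented for the example).

```shell
# Hypothetical VCF: two header lines and two records, one with '#' in INFO.
tmp=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tNOTE=a#b\n' >> "$tmp"
printf 'chr1\t200\t.\tC\tT\t60\tPASS\tDP=10\n' >> "$tmp"

# Unanchored: drops every line containing '#', including the first record.
loose=$(grep -vc '#' "$tmp")

# Anchored: drops only lines that START with '#'.
anchored=$(grep -vc '^#' "$tmp")

echo "$loose"      # 1
echo "$anchored"   # 2

rm -f "$tmp"
```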