Option 1:
cat file1name file2name | sort | uniq > outputfilename
this will sort in ascending order
to sort in descending order add the -r option to sort:
cat file1name file2name | sort -r | uniq > outputfilename
Option 2:
sort file1name file2name | uniq -u > diffLines
sort file1name file2name | uniq -d > duplicates
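A quick illustration of the difference between uniq -u and uniq -d (the file names and contents below are made up):

```shell
# toy input files, just for illustration
printf 'apple\nbanana\n'  > file1name
printf 'banana\ncherry\n' > file2name

# -u keeps lines that occur exactly once across both files
sort file1name file2name | uniq -u    # apple, cherry
# -d keeps lines that occur more than once
sort file1name file2name | uniq -d    # banana
```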
How to compare two files line by line?
comm -1 -2 <(sort first.txt) <(sort second.txt)
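comm compares two sorted files column by column; -1 -2 suppresses the lines unique to each file, leaving only the lines common to both. A small sketch with made-up contents (the <( ) process substitution needs bash or zsh):

```shell
printf 'a\nb\nc\n' > first.txt
printf 'b\nc\nd\n' > second.txt

# -1 drops lines only in first.txt, -2 drops lines only in second.txt,
# so only the lines present in both files remain
comm -1 -2 <(sort first.txt) <(sort second.txt)    # prints b and c
```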
Count the lines in a file (cut strips the filename from wc's output):
wc -l myfile.txt | cut -d' ' -f1
or capture the count in a variable:
count=`wc -l myfile.txt | cut -d' ' -f1`
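Alternatively, reading the file from stdin makes wc omit the filename, so no cut is needed (myfile.txt below is a throwaway example):

```shell
printf 'one\ntwo\nthree\n' > myfile.txt

# reading from stdin means wc prints no filename;
# the arithmetic expansion strips the padding some wc versions add
count=$(( $(wc -l < myfile.txt) ))
echo "$count"    # 3
```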
sed substitute command: match a regular expression and replace it with something else.
Changing "<www" to "<http://www" in the "mappingbased_properties_en.nt" file and saving the result as "mappingbased_properties_en_fixed.nt":
sed s/"<www"/"<http:\/\/www"/ mappingbased_properties_en.nt > mappingbased_properties_en_fixed.nt
Replace every "/" with "," (the trailing g makes the substitution global within each line):
sed -e 's/\//,/g' results.csv > results-all.csv
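A note on the trailing g: without it, sed replaces only the first match on each line; with it, every match. For example:

```shell
echo 'a/b/c' | sed 's/\//,/'     # only the first slash: a,b/c
echo 'a/b/c' | sed 's/\//,/g'    # every slash: a,b,c
```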
split -l 1000 file.nt
this will split the file into separate files of 1000 lines each (named xaa, xab, ...)
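For example, splitting a throwaway 2500-line file yields two 1000-line chunks and one with the remaining 500 lines:

```shell
seq 2500 > file.nt       # a made-up 2500-line file
split -l 1000 file.nt    # produces xaa, xab, xac
wc -l xaa xab xac        # 1000, 1000, 500
```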
add ">" at the end of each line ($ is a regex for the end of the line):
sed 's/$/>/' myfile
This will not modify the file. To modify the file add option -i:
sed -i 's/$/>/' myfile
OR
save it to another file:
sed 's/$/>/' myfile > anotherfile
add "<" at the beginning of each line (^ is a regex for the beginning of the line):
sed 's/^/</' myfile
This will not modify the file. To modify the file add option -i:
sed -i 's/^/</' myfile
OR
save it to another file:
sed 's/^/</' myfile > anotherfile
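The two substitutions can also be combined in one sed call, wrapping each line in <...> (myfile below is a throwaway example):

```shell
printf 'foo\nbar\n' > myfile

# two substitutions separated by ';': prepend '<', then append '>'
sed 's/^/</; s/$/>/' myfile    # <foo>, <bar>
```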
add an extension (here .n3) to every file in the dir:
for i in *.*; do mv "$i" "$i.n3"; done
merge content of file1 with all other files in your dir (e.g. add a fixed header with file1 content at the top of each file in your dir):
for i in *.n3; do cat file1 "$i" > "$i.withhead.n3"; done
convert ASCII escape sequences to UTF-8 in all .n3 files (ascii2uni comes with the uni2ascii package):
for i in *.n3; do ascii2uni "$i" > "$i.utf8.n3"; done
e.g. if you have a string with ...."..."..."..".....
and need to convert it to:
...."...'...'..".....
do this (the '\'' sequence ends the single-quoted string, emits a literal apostrophe, and reopens it):
for i in *.n3; do sed 's/\(".*\)"\(.*\)"\(.*"\)/\1'\''\2'\''\3/g' "$i" > "$i.quotesfixed.n3"; done
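A self-contained check of this quote-to-apostrophe substitution on a single sample string:

```shell
# the two inner double quotes become apostrophes
echo '"abc"def"ghi"' | sed 's/\(".*\)"\(.*\)"\(.*"\)/\1'\''\2'\''\3/g'
# "abc'def'ghi"
```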
replace every backslash with a space in all .n3 files:
for i in *.n3; do sed 's/\\/ /g' "$i" > "$i.bfixed.n3"; done
e.g. if you have a string with ...."..."..".....
and need to convert it to:
....".....".....
i.e. remove the middle one
for i in *.n3; do sed 's/\(".*\)"\(.*"\)/\1\2/g' "$i" > "$i.3qfixed.n3"; done
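The same idea on a single sample string, with a variant that deletes the middle quote outright:

```shell
# the middle double quote is removed
echo '"abc"def"' | sed 's/\(".*\)"\(.*"\)/\1\2/g'
# "abcdef"
```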
count the number of lines of each file with extension .txt in the specified dir:
find /thepathtothedir -maxdepth 1 -name "*.txt" -print0 | xargs -0 -n 1 wc -l
to put that list into a file use:
find . -maxdepth 1 -name "*.txt" -print0 | xargs -0 -n 1 wc -l > lines.txt
Count the files with 0 lines:
grep "^0 " lines.txt | wc -l
or with exactly 10 lines (the trailing space keeps 100, 1000, etc. from matching):
grep "^10 " lines.txt | wc -l
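A quick sanity check with a made-up lines.txt (grep -c is shorthand for grep | wc -l):

```shell
printf '0 empty.txt\n10 ten.txt\n100 big.txt\n' > lines.txt

grep -c "^0 "  lines.txt    # 1  (only the zero-line file)
grep -c "^10 " lines.txt    # 1  (the trailing space excludes 100)
grep -c "^10"  lines.txt    # 2  (without the space, 100 also matches)
```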
http://www.tutorialspoint.com/unix/unix-regular-expressions.htm
Input file: expected.constraints in a form:
name Person:0.9 Organisation:0.1
awk '{print $1,$2}' expected.constraints | tr ':' ' ' | sort -k 3 -rn | column -t | less
Output:
name Person 0.9
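Step by step on a one-line made-up input: awk keeps the first two columns, tr splits the label:score pair, and sort -rn orders by the numeric third column:

```shell
printf 'name Person:0.9 Organisation:0.1\n' > expected.constraints

awk '{print $1,$2}' expected.constraints | tr ':' ' ' | sort -k 3 -rn
# name Person 0.9
```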
It is very simple to read a CSV file using AWK, e.g.:
awk -F "\"*,\"*" '{print $1 "\t" $2 "\t" $3}' test.csv
where your test.csv file looks like this:
first,second,third
1,2,3
4,5,6
7,8,9
If instead the command prints only the first line:
first,second,third
it is very likely that you need to fix your line endings as they might be CR.
To check your line endings do:
file test.csv
If what you get is:
test.csv: ASCII text, with CR line terminators
you will need to remove CR line terminators.
You can use the dos2unix package for that. If you don't have it installed, you can get it using brew (Homebrew should not be run with sudo):
brew install dos2unix
Since the file has CR line terminators, convert it with mac2unix (part of the dos2unix package):
mac2unix test.csv
mac2unix: converting file test.csv to Unix format...
Now run your awk script again, and it will work.
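Putting it together, the AWK CSV reader on the sample file from above (output is tab-separated):

```shell
printf 'first,second,third\n1,2,3\n4,5,6\n' > test.csv

# -F "\"*,\"*" treats a comma with optional surrounding quotes as the separator
awk -F "\"*,\"*" '{print $1 "\t" $2 "\t" $3}' test.csv
```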
---
rewrite a TSV, replacing the first column with two zero columns and a fixed label:
awk -F'\t' 'BEGIN{OFS="\t"}{print 0,0,"somelabel",$2,$3}' input.tsv > output.tsv
---
awk -F'\t' '{print $3}' input.tsv |cut -d' ' -f 3-
Assume you have the format
0 label some text with words
The command above will print 'some text with words'.
To strip all spaces and tabs, use tr -d '[:blank:]'.
For example, in a file that reads
TOPIC#: CARS
the below command will extract CARS without any spaces:
topicid=`grep "TOPIC#:" "$i" | awk -F':' '{print $2}' | tr -d '[:blank:]' `
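A self-contained version of the TOPIC# extraction (the input file name below is made up; in the original loop "$i" would supply it):

```shell
printf 'TOPIC#: CARS\nsome other line\n' > topicfile.txt

# grep picks the line, awk takes everything after the ':',
# tr -d '[:blank:]' strips the leftover spaces and tabs
topicid=$(grep "TOPIC#:" topicfile.txt | awk -F':' '{print $2}' | tr -d '[:blank:]')
echo "$topicid"    # CARS
```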