Texts




$ tee  # copies standard input to standard output and also writes it to zero or more files. Useful for saving an intermediate stage of a pipeline to a file while the pipeline keeps running.
$ crontab -l | tee crontab-backup.txt | sed 's/old/new/' | crontab -    # back up the crontab, edit it, install the result ("-" makes crontab read stdin)
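A minimal sketch of the same idea that does not touch your real crontab. printf stands in for the first stage, and the file names stage1.txt and final.txt are made up for illustration:

```shell
cd "$(mktemp -d)"   # scratch directory so nothing real is overwritten
# tee saves the unmodified stream to stage1.txt and passes it on to sed
printf 'old one\nold two\n' | tee stage1.txt | sed 's/old/new/' > final.txt
cat stage1.txt   # the intermediate stage: still "old ..."
cat final.txt    # the end of the pipeline: "new ..."
```

tee overwrites its output files; tee -a appends to them instead.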


sed (stream editor; regular-expression substitution)
sed -e 's/oldstuff/newstuff/g' inputFileName > outputFileName    # g = replace every match on a line, not just the first
echo Sunday Monday | sed 's/day/night/g'   -> prints Sunnight Monnight
sed as a filter in a pipeline:
generate_data | sed -e 's/x/y/g'
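A short sketch of the substitutions above. printf stands in for the hypothetical generate_data command, and the in-place edit assumes GNU sed:

```shell
cd "$(mktemp -d)"   # scratch directory
# Without g only the first match on each line is replaced.
echo Sunday Monday | sed 's/day/night/'    # Sunnight Monday
echo Sunday Monday | sed 's/day/night/g'   # Sunnight Monnight
# sed as a filter: printf plays the role of generate_data.
printf 'xax\nbxb\n' | sed -e 's/x/y/g'     # yay, byb
# Redirecting into the input file itself (> inputFileName) empties it before
# sed reads it; GNU sed -i edits the file in place instead (BSD sed needs -i '').
printf 'oldstuff here\n' > inputFileName
sed -i 's/oldstuff/newstuff/g' inputFileName
cat inputFileName                          # newstuff here
```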

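sed doesn't support \d: use [0-9] or [[:digit:]] instead. In basic regular expressions (sed's default), alternation is written \| (a GNU extension) instead of |; with sed -E (extended regular expressions) a plain | works.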
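A quick sketch of those workarounds; the \| form assumes GNU sed:

```shell
echo 'a1b22' | sed 's/[0-9][0-9]*/N/g'         # aNbN   ([0-9] instead of \d)
echo 'a1b22' | sed 's/[[:digit:]]\{1,\}/N/g'   # aNbN   (POSIX class plus interval)
echo 'cat dog' | sed 's/cat\|dog/pet/g'        # pet pet  (GNU basic-regex alternation)
echo 'cat dog' | sed -E 's/cat|dog/pet/g'      # pet pet  (extended regex: plain |)
```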

$ cd code/resources/test/kbaoutput
$ cat fx* xf* > submission.txt
$ vim submission.txt
    :%s/\n#/#/g             # join each comment line onto the end of the previous line
    :w! submission.txt.oneLiner
$ wc -l submission.txt.oneLiner                        # total number of documents
$ sort -k8 submission.txt.oneLiner > submission.txt.oneLiner.sorted            # sort by date-hour (field 8 onward)
$ uniq submission.txt.oneLiner.sorted > submission.txt.oneLiner.sorted.uniq    # drop adjacent duplicate lines
$ wc -l submission.txt.oneLiner.sorted.uniq              # total number of unique slot values
$ vim submission.txt.oneLiner.sorted.uniq
    :%s/#/\r#/g                  # move each comment back onto its own line
    :w! submission.txt.oneLiner.sorted.uniq.ready                      # READY FOR SUBMISSION
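The two vim steps can also be done non-interactively. This is a sketch assuming GNU sed and assuming the file alternates record lines with comment lines that start with '#'; the sample records are made up, with the date-hour in field 8 as the sort above expects. Like the vim version, the split step breaks a record at any '#' it contains.

```shell
cd "$(mktemp -d)"   # scratch directory
export LC_ALL=C     # plain byte-order sorting, independent of the user's locale
# Made-up sample: each record is followed by a '#' comment line.
cat > submission.txt <<'EOF'
t1 r1 100 E1 500 1 1 2013-01-02-05 Affiliate
#comment A
t1 r2 101 E2 500 1 1 2013-01-01-03 Contact
#comment B
t1 r1 100 E1 500 1 1 2013-01-02-05 Affiliate
#comment A
EOF
# Like :%s/\n#/#/g : load the whole file into sed's buffer, then join '#' lines to the line before.
sed ':a;N;$!ba;s/\n#/#/g' submission.txt > submission.txt.oneLiner
wc -l < submission.txt.oneLiner                    # 3 documents
sort -k8 submission.txt.oneLiner > submission.txt.oneLiner.sorted
uniq submission.txt.oneLiner.sorted > submission.txt.oneLiner.sorted.uniq
wc -l < submission.txt.oneLiner.sorted.uniq        # 2 unique lines
# Like :%s/#/\r#/g : in GNU sed, \n in the replacement inserts a newline.
sed 's/#/\n#/g' submission.txt.oneLiner.sorted.uniq > submission.txt.oneLiner.sorted.uniq.ready
cat submission.txt.oneLiner.sorted.uniq.ready
```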

To see which slots we found values for:
$ cut -d" " -f9 submission.txt.oneLiner.sorted.uniq | sort | uniq >  slotNames     # the slot names we found
The full list of slots is here: http://trec-kba.org/trec-kba-2013.shtml#ssf

To get which entities we found slot values for:
$ cut -d" " -f4 submission.txt.oneLiner.sorted.uniq | sort | uniq >  entities
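To count how many values were found per slot, not just which slots appear, a sketch adds uniq -c. The sample lines and the output file name slotCounts are made up; only the field positions (4 = entity, 9 = slot) follow the commands above:

```shell
cd "$(mktemp -d)"   # scratch directory
# Made-up one-line-per-document sample.
cat > sample.txt <<'EOF'
t1 r1 100 E1 500 1 1 2013-01-01-00 Affiliate
t1 r2 101 E1 500 1 1 2013-01-01-01 Contact
t1 r3 102 E2 500 1 1 2013-01-01-02 Affiliate
EOF
# uniq -c prefixes each distinct line with its count; sort -rn puts the most frequent first.
cut -d" " -f9 sample.txt | sort | uniq -c | sort -rn > slotCounts
cat slotCounts
# sort -u is shorthand for sort | uniq.
cut -d" " -f4 sample.txt | sort -u > entities
cat entities
```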



wc
-l  count lines, e.g. find . -name '*.xml' | wc -l   counts the files found
-L  print the length of the longest line (GNU wc)
-c  count bytes (use -m to count characters; they differ for multibyte text such as UTF-8)
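A small sketch of the options above on a made-up two-line file; -L assumes GNU wc:

```shell
cd "$(mktemp -d)"   # scratch directory
printf 'hello\nab\n' > a.txt
wc -l < a.txt    # 2  lines
wc -L < a.txt    # 5  length of the longest line
wc -c < a.txt    # 9  bytes: "hello\n" + "ab\n"
wc -m < a.txt    # 9  characters (same as bytes for plain ASCII)
mkdir d
touch d/1.xml d/2.xml d/notes.txt
find . -name '*.xml' | wc -l   # 2 files found
```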


tail -f filename          # print the last lines of a file and keep printing new lines as they are appended; handy for large, growing logs such as catalina.out
tail -f run?Log.txt       # follow every file matching run?Log.txt (run0Log.txt, run1Log.txt, ...); ? matches exactly one character
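tail -f keeps running, so this sketch uses tail -n 1 to show which files the run?Log.txt pattern picks up and how tail labels output from several files. The log file names are made up:

```shell
cd "$(mktemp -d)"   # scratch directory
printf 'a\nb\n' > run0Log.txt
printf 'c\nd\n' > run1Log.txt
printf 'e\n'    > run10Log.txt   # two characters after "run": not matched by ?
echo run?Log.txt                 # run0Log.txt run1Log.txt
# With several files, tail prints a "==> name <==" header before each one.
tail -n 1 run?Log.txt
```

Swapping -n 1 for -f follows the same set of files.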

XML Formatting on the Command Line

Use xmllint (on Debian/Ubuntu: $ sudo apt-get install libxml2-utils), then run:
$ xmllint --format --output pretty.xml ugly.xml
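A sketch on a tiny made-up document, assuming xmllint is installed as described above:

```shell
cd "$(mktemp -d)"   # scratch directory
printf '<a><b>1</b><c/></a>' > ugly.xml
# --format re-indents the tree; --output writes to a file instead of stdout.
xmllint --format --output pretty.xml ugly.xml
cat pretty.xml      # one element per line, nested elements indented
```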

cut
$ cut -f2,5 *ptt  # take the 2nd and 5th columns (TAB-separated by default; -d sets another delimiter)
$ cut -c1-20  *ptt   # take characters 1 through 20
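A sketch of field and character selection on made-up input:

```shell
printf 'a\tb\tc\td\te\tf\n' | cut -f2,5              # b<TAB>e
printf 'a,b,c\n' | cut -d, -f2                       # b  (comma as delimiter)
printf 'abcdefghijklmnopqrstuvwxyz\n' | cut -c1-20   # abcdefghijklmnopqrst
```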

reorder columns:
$ tail -n+3 *ptt | cut -f1 > locs      # skip the first two lines (output starts at line 3)
$ tail -n+3 *ptt | cut -f5 > genes
$ paste genes locs genes | head        # paste joins the files side by side, one column per file
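The detour through separate files is needed because cut always emits fields in their original order, so cut -f5,1 cannot swap columns. A sketch on a made-up TAB-separated file (sample.ptt is hypothetical), with a one-step awk alternative for comparison:

```shell
cd "$(mktemp -d)"   # scratch directory
printf 'Header line 1\nHeader line 2\n1..100\t+\t33\tp1\tgeneA\n200..300\t-\t33\tp2\tgeneB\n' > sample.ptt
tail -n +3 sample.ptt | cut -f5,1 | head -n 1   # 1..100<TAB>geneA : still file order
tail -n +3 sample.ptt | cut -f1 > locs
tail -n +3 sample.ptt | cut -f5 > genes
paste genes locs genes                          # geneA<TAB>1..100<TAB>geneA ...
# Same reordering in one step: awk prints fields in any order, TAB in and out.
tail -n +3 sample.ptt | awk -F'\t' -v OFS='\t' '{print $5, $1, $5}'
```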

nl
$ nl a.txt          # number the lines (blank lines are left unnumbered by default)
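A sketch of the default numbering versus numbering every line, on made-up input with one blank line (assuming GNU nl and cat defaults: 6-column numbers followed by a TAB):

```shell
printf 'a\n\nb\n' | nl        # blank line skipped, so "b" is line 2
printf 'a\n\nb\n' | nl -b a   # -b a numbers all lines, so "b" is line 3
printf 'a\n\nb\n' | cat -n    # cat -n also numbers every line
```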

sort
$ sort a.txt
$ sort -r a.txt   # reverse
$ sort -R a.txt # random order (GNU; identical lines end up next to each other)
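A sketch of two points the flags above leave out: plain sort compares text, so numbers need -n, and shuf (GNU coreutils) shuffles every line independently, unlike sort -R. The input is made up:

```shell
printf '10\n9\n2\n' | sort       # 10 2 9   (text order)
printf '10\n9\n2\n' | sort -n    # 2 9 10   (numeric order)
printf '10\n9\n2\n' | sort -rn   # 10 9 2   (numeric, reversed)
printf 'a\nb\na\n' | shuf        # random permutation; the two a's may be split up
```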

split # split large files into pieces
$ split -d -l 1000 a.txt    # 1000 lines per piece; -d gives numeric suffixes (x00, x01, ...)
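A sketch on a made-up 2500-line file, assuming GNU split: the pieces get the default prefix x unless a prefix is given after the input name, and concatenating them restores the original.

```shell
cd "$(mktemp -d)"   # scratch directory
seq 2500 > a.txt
split -d -l 1000 a.txt          # x00, x01 (1000 lines each), x02 (500 lines)
wc -l x*
split -d -l 1000 a.txt part_    # same split, named part_00, part_01, part_02
cat x* | cmp - a.txt && echo "pieces reassemble to the original"
```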
