Bash Commands


Debug

To check the value of a list of variables:

 for i in FYEAR FMNTH FDAY FHH FMM ; do eval "echo -e \"\\t$i is \$`eval echo $i`\"" ; done

To debug an entire script, use this at the head of the file:

 #!/bin/bash -xv

or run the script with

 bash -xv script.sh

or place these markers within the script to selectively turn debugging on and off:

 set -x
 set +x

set -x turns tracing on and set +x turns it off again. For instance:

set -x
   # The latest log file
   LASTLOG="$( ls -1 $LDIR/*$STEM.$QNUM 2>/dev/null | tail -n 1 )"
 
   # Get the update time of the log file
   if [ -s "$LASTLOG" ]
     then AGELOG="$( date -r $LASTLOG +%s )" ; AGEDIFF=$[$[NOW-AGELOG]/60]
     else AGEDIFF=?
   fi
set +x

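Bash often keeps going even when something goes wrong, and the results are unpredictable. Use the set command to make it stricter; each setting takes effect from the point in the script where it appears.

To make a script exit as soon as a command fails (returns a non-zero status), add the line below. Commands tested by if, while, && or || are exempt.

 set -e

To make a script fail when it uses a variable that has not been set, use

 set -u

To make a pipeline fail when any command in it fails, and not just the last one, use

 set -o pipefail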
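Here is a minimal sketch of what each setting changes. The helper outcome is made up for this example. It runs a snippet in a fresh bash process and reports whether the snippet succeeded. A fresh process is needed because set -e is ignored inside code whose exit status is being tested, such as an if condition.

```shell
#!/bin/bash
# Show what set -e, set -u and set -o pipefail change.
# outcome (a made-up helper) runs a snippet in a separate bash process
# and prints "succeeds" or "fails" depending on its exit status.
outcome() {
  if bash -c "$1" >/dev/null 2>&1 ; then echo "succeeds" ; else echo "fails" ; fi
}

outcome 'false ; echo "keeps going"'              # succeeds: echo runs last
outcome 'set -e ; false ; echo "keeps going"'     # fails: stops at false
outcome 'unset V ; echo "$V"'                     # succeeds: $V is just empty
outcome 'unset V ; set -u ; echo "$V"'            # fails: V is not set
outcome 'false | true'                            # succeeds: status of true
outcome 'set -o pipefail ; false | true'          # fails: false is counted
```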

To check if your script is sane, you can also use ShellCheck.

To see what's going wrong in a compiled program, such as one written in C, try strace. It lists the system calls the program makes:

  strace userhdhomerun

You might find a file or dependency is missing.

Process control

See Process management

Many scripts take a number of days ago as input. For instance, the backup-images-hourly script copies the images from a given day, N days ago, from Roma to CSX:/tv and CSX:/sweep. There is a crontab on Roma with this line:

backup-images-hourly 1 x

To run this script for a sequence of days on the command line (or for that matter in a crontab), you can use this in a for loop:

for DAY in {20..01} ; do backup-images-hourly $DAY ca ; done

Note you can also add a number outside of this sequence, like this:

for DAY in {20..01} 204 ; do echo $DAY ; done

which adds 204 as the last number in the sequence. Further numbers can be added, separated by spaces.

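However, if you need one of the numbers to be a variable, use the seq command (on OSX, the GNU coreutils version is called gseq). Brace expansion happens before variables are expanded, so {$N..1} does not work: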

N=20 ; for DAY in `seq -w $N -1 1` ; do backup-images-hourly $DAY ca ; done

This command generates a sequence of commands of the form

backup-images-hourly 20 ca
backup-images-hourly 19 ca

and so on to yesterday (the -w pads the number with leading zeros).
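If seq is not available, bash's arithmetic for loop also accepts a variable as its bound. This is a minimal sketch. The helper countdown is made up, and echo stands in for backup-images-hourly. printf %02d always pads to two digits, so it matches seq -w only for N up to 99.

```shell
#!/bin/bash
# Count down from N to 1 with two-digit zero padding, without seq.
# countdown is a made-up helper; brace expansion cannot take a variable,
# but the arithmetic for loop can.
countdown() {
  local n=$1 i day
  for (( i = n ; i >= 1 ; i-- )) ; do
    printf -v day '%02d' "$i"   # 9 becomes 09
    echo "$day"
  done
}

N=3
for DAY in $( countdown "$N" ) ; do
  echo "backup-images-hourly $DAY ca"   # echo instead of running the script
done
```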

For loops can also take as input a simple list:

for i in z ab ac ad ae af ; do mkfs.xfs -f /dev/sd"$i"1 ; sleep 5 ; done

Combine with chopping strings -- in this case, strings that look like 2008_01s:

for i in * ; do mv "$i" "${i%s}" ; done   # renames 2008_01s to 2008_01
for i in * ; do echo "${i#2}" ; done      # shows 008_01s

Note that "for i in $(ls -1)" and "for i in `ls -1`" behave the same way. In both, the output is split into words at spaces and newlines, and $i takes one word at a time, so a file name containing spaces is broken into pieces. Quoting the substitution, as in for i in "$(ls -1)", is different: the whole list is assigned to $i at once. Looping over a glob such as * gives one complete file name at a time.
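The ${i%s} and ${i#2} forms above are parameter expansions. % removes the shortest match of a pattern from the end of the value, and # removes it from the start. %% and ## remove the longest match instead. Here is a quick demonstration on a sample file name (the name is just an example):

```shell
#!/bin/bash
# Trim patterns from a variable's value with parameter expansion
f=2008-06-02_1000_KTLA_Jerry_Springer_Show.avi   # sample name

echo "${f%.avi}"   # -> 2008-06-02_1000_KTLA_Jerry_Springer_Show
echo "${f%%_*}"    # -> 2008-06-02 (longest _* removed from the end)
echo "${f#*_}"     # -> 1000_KTLA_Jerry_Springer_Show.avi (shortest *_ from the start)
echo "${f##*_}"    # -> Show.avi (longest *_ removed from the start)
```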

Scripts run from crontab typically start at regular intervals, but often you don't want two instances of the same script to run simultaneously. To protect against this, you can use flock in crontab (the simplest solution), or this stanza in the script itself:

if ps -ww | grep "$0" | grep -v grep | grep -v $$ > /dev/null ; then
  terminate "DUPLICATE RUNNING"
fi

Note that this method is not recommended and often fails. One script can show up as several processes (subshells, for instance), each with its own PID, so the script may match itself.

You can also use trap; avi2h264-bulk and -list may implement a trap. Or you could try this:

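[ -e ~/.${SELF}.pid ] && exit 1
trap 'rm -f ~/.${SELF}.pid' EXIT     # remove the pid file whenever the script exits
trap 'exit 1' HUP INT QUIT TERM      # turn these signals into a normal exit
echo $$ > ~/.${SELF}.pid

Set SELF to the name of the script, for instance SELF=$(basename "$0"). Signal 9 (KILL) cannot be trapped, so a script killed with kill -9 leaves its pid file behind. Also, the test and the echo are two separate steps, so two copies started at the same moment can both get through. Greg has some better proposals. Directory reservation is successfully implemented by the mpg2h264-daemon script, which also successfully runs under flock in cron (see capture machines).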
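Here is a minimal sketch of directory reservation used as a lock. mkdir either creates the directory or fails, in a single atomic step, so there is no gap between checking and claiming. acquire_lock, release_lock, and the lock path are made up for this example.

```shell
#!/bin/bash
# Use a directory as a lock: mkdir fails if the directory already exists,
# and creating it is a single atomic step.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

release_lock() {
  rmdir "$1"
}

# Typical use at the top of a script (the lock path is hypothetical):
# LOCKDIR=/tmp/$(basename "$0").lock
# acquire_lock "$LOCKDIR" || exit 1
# trap 'release_lock "$LOCKDIR"' EXIT
```

Under cron, the util-linux flock command gives the same protection from the crontab line itself, for example flock -n /tmp/myscript.lock /usr/local/bin/myscript (paths hypothetical). The -n option makes flock give up at once if the lock is taken.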

For loops can also use a file as input --

for FIL in `cat list` ; do
 # Reconstruct the path from a file name
 FFIL=/tv$(echo ${FIL%%_*} | sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\/\1\/\1-\2\/\1-\2-\3\//')
done

This can also be used to edit a succession of files from a list.

For loops can be used to feed a list of parameter values to grep; in the case of metadata tags, use three backslashes to escape the pipe symbol:

for i in TPF\\\| HED\\\| ; do TAG="$( egrep $i $FIL.txt )" ; echo $TAG ; done

See #sed for ideas on manipulating lines in metadata files.

Redirection

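Programs send normal output to "standard output" and error messages to "standard error". Either can be redirected to a file (see redirection and redirection explained). Some utilities have quirks.

Redirect all output, both standard output and standard error, to a file, such as /dev/null: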

&> /dev/null

Merge mp4info's error output with its regular output, so that you can grep them:

if [ "$( mp4info *$ID.mp4 2>&1 | grep open )" != "" ] ; then rm -f *$ID.mp4 ; fi

In contrast, MP4Box doesn't need this redirection; just use a plain pipe:

if [ "$( MP4Box -info *$ID.mp4 | grep open )" != "" ] ; then rm -f *$ID.mp4 ; fi

See dl-CampaignAds.

Note this clever way to assign multiple variables from a single output:

read one two <<< $(date +'%d %m %y')
read STARTSEC ENDSEC <<< $(echo $LIN | egrep -o '[0-9.]{14}')
IFS="|" read STARTSEC ENDSEC o <<< "$s"

Elegant -- and can likely be extended.

If the delimiter is not a space, see these excellent instructions -- note the variable must be quoted, but not echoed:

 IFS="|" read TS1 TS2 PT TXT <<< "$LIN"

where the variables can be extracted on the fly, here by incremental line number:

 IFS="|" read START2 o o TYPE <<< "$( sed -n "$[LNUM+1] p" $FIL )"

or somewhat more clumsily:

 while IFS='|' read -r SDAT EDAT PTAG CSTYL TXT ; do
   echo "$TXT"   # all processing has to happen within the while loop
 done <<< "$LIN"

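A more flexible solution in this case is to read the fields into an array:

 IFS='|' read -r -a FLD <<< "$LIN"

The fields are then ${FLD[0]}, ${FLD[1]}, and so on. The sed approach, FLD=( $( echo "$LIN" | sed -e 's/|/\n/g' ) ), also works, but it splits fields that contain spaces into several array elements.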
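Here is a minimal sketch of reading a pipe-delimited line into an array with read -a. It uses a sample caption line in the format shown in the sed section; the field meanings are assumed from that example.

```shell
#!/bin/bash
# Split a pipe-delimited line into an array, keeping spaces inside fields
LIN="20130615100024.089|20130615100025.457|CC1|RU2|HEADQUARTERS IN ATLANTA."

IFS='|' read -r -a FLD <<< "$LIN"

echo "${#FLD[@]} fields"     # 5 fields
echo "Style: ${FLD[3]}"      # RU2 (arrays count from 0)
echo "Text: ${FLD[4]}"       # HEADQUARTERS IN ATLANTA. (spaces intact)
```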

Interaction

To give the user a choice, use read. A simple forced choice:

read -p "Press y [Enter] to approve or just [Enter] to cancel. " RESPONSE
if [[ "$RESPONSE" = [Yy] ]] ; then echo "Approved" ; else echo "Cancelled" ; fi

A choice that defaults to no after a timeout:

echo -en "\tDo you want to adjust the timestamp by an hour?"
read -p "Press y within 10 seconds to approve or any other key to skip. " -n 1 -t 10 -s RESPONSE
if [[ "$RESPONSE" = [Yy] ]] ; then DIF=$[ $NEG$DIF + 3600 ] ; fi

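To time out to yes instead, while still letting the user say no, ignore read's exit status (it is greater than 128 after a timeout). Then treat any answer other than n or N as yes, including no answer at all.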
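Here is a minimal sketch of a prompt that times out to yes. ask_yes_default is a made-up name. It relies on read leaving the variable empty when it times out or reaches the end of input, and it treats anything except n or N as yes.

```shell
#!/bin/bash
# Ask a yes/no question that counts as yes if nobody answers in time.
# ask_yes_default is a made-up helper; its arguments are PROMPT and TIMEOUT.
ask_yes_default() {
  local prompt=${1:-"Proceed? [Y/n] "} timeout=${2:-10} response=""
  # read fails on a timeout (status above 128) or at end of input;
  # response is then empty, which counts as yes
  read -r -p "$prompt" -n 1 -t "$timeout" response || true
  echo
  [[ $response != [Nn] ]]
}

# Example, following the timestamp prompt above:
# if ask_yes_default "Adjust the timestamp by an hour? [Y/n] " 10 ; then
#   echo "adjusting"
# fi
```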

For more, see bash read builtin command and Asking yes-no questions (Linux journal).

Echo a counter in place, one tabstop in

tput ht ; echo -en $n ; tput cr

Save and recall the cursor position:

tput sc ; tput rc

tput commands control cursor movement.

tar and zip

To zip files, use something like

 zip -r `date +%F`.zip images/

To search a zip file:

 zipgrep string *.zip

Or a gzip file:

 zgrep cartago *z

To unzip, use

 unzip file.zip

tar is useful when you need to copy a very large number of small files, such as txt files or thumbnails. tar packs all the files into a single stream before they're copied, so the process is sped up significantly compared to mv, cp, or rsync.

You can copy files within the same machine:

tar cf - /db/tv/2009/ | tar xf - -C /tvspare/

Note that the files will be copied to /tvspare/db/tv/2009/ when you give this command.

You can also copy between machines; in either case, there is no intermediate tarball:

tar cf - -T $SX | ssh x tar xf - -C /tv/$DIR

$SX here is the name of a file that lists the files to be moved. This is used in the backup-images-hourly script and other scripts.

Copy a full directory, but not the path:

ssh -C $ca "tar cf - -C /imagesd/$DDIR $FIL.hq" | tar -xf - -C "$POOL/"

If you want to lock the file first by creating the directory, you change the syntax a bit:

if [ "$( mkdir $POOL/$FIL.hq 2> /dev/null; echo $? )" = "1" ] ; then continue ; fi
ssh -C $ca "tar cf - -C /imagesd/$FIL.hq ." | tar -xf - -C "$POOL/$FIL.hq/"

On the source, note the . to indicate all files inside the directory.

rsync mv ssh

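How to synchronize files of a specific type:

rsync -avz --include='*.txt' --include='*.htm' --include='*.html' \
--exclude='*' source/ target/

Note the space before the backslash. Without it, the two lines are joined into a single word. Also, --exclude='*' excludes directories as well, so this only copies matching files at the top level of source/. To descend into subdirectories, add --include='*/' before the exclude; add -m (--prune-empty-dirs) to skip directories that end up empty.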

See also /db/nosyncc for an example of how to place the filtering information in a parameter file. This is what is used in rsync-cc:

rsync -aKvn --delete --max-delete=$MAXDEL --exclude-from=/db/nosyncc \
/db/tv/$YEAR/* cartago:/$DISK/$YEAR/

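In -aKvn, K (--keep-dirlinks) treats symlinks to directories on the target as real directories. The n makes it a dry run that only lists what would be done; drop it to actually copy. --delete deletes files on the target that are not on the source, but --max-delete=$MAXDEL limits how many are deleted per session. The filter file restricts the copy to files with the extension txt. Other scripts should also use this syntax.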

Include partial relative path -- use -R with /./ indicating start:

rsync /tv/./$DDIR/$FIL.{txt,len} fsteen@hoffman2.idre.ucla.edu:/$HDIR/ocr/ -aRv

Copy updated files only -- and let the script expand the networks and rsync expand the file types:

rsync ca:$SRC/{24h,La-1}/$DAY*ES*\{mp4,txt\} /tv/$DIR/ -auvn 2>/dev/null | egrep 'mp4|txt'

The script produces the following command (from cc-integrate-mmm):

rsync 'ca:/mnt/csa01/Individual/CAS/24h/2014-03-22*ES*{mp4,txt}' \
 'ca:/mnt/csa01/Individual/CAS/La-1/2014-03-22*ES*{mp4,txt}' /tv/2014/2014-03/2014-03-22/ -auvn

To move a file to a new location without overwriting an existing file of the same name, use -n:

mv -n $FIL $DIR

To move the file, but keep a numbered version of any existing file at the target:

mv --backup=numbered $FIL $FIL.telxcc

Note you need GNU mv for this, so on OSX use gmv.

To ssh through a different port, add "Port 443" to /etc/ssh/sshd_config on the server and restart sshd (done on cartago). You can then set up an alias in your ~/.ssh/config:

Host ca
   User espana
   HostName 164.67.183.179
   Port 443
 
scp -p -P 443 <filename> espana@164.67.183.179:~/ES/<network>
 
rsync -e 'ssh -p 443' <filename> espana@164.67.183.179:~/ES/<network>/ -avp 

On hoffman2, to turn off strict host key checking in ssh, use

 ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no peter@192.168.0.100

To create a tunnel by forwarding a port through a bridge node, issue

 ssh -N -L 47408:164.67.183.179:22 login3

where 47408 is an arbitrary unused local port (a number above 32000), 164.67.183.179:22 is the destination machine and its regular ssh port, and login3 is the bridge node.

To copy a file through a forwarded port, connect to localhost on that port. rsync cannot copy directly between two remote hosts, so one side must be local:

 rsync -e 'ssh -p 62182' tna@localhost:$FIL $TARGET

Such tunnels are the basis for our main task pipeline on hoffman2.

How to send e-mail

To send an e-mail within a script, use this syntax:

mail -s "Subject of the e-mail" fsteen@ucla.edu <$FILE

Where $FILE is the name of a file that becomes the body of the e-mail. Alternatively,

echo "$OUTPUT" | mail -s "Subject" fsteen@ucla.edu

Or if the message spans several lines, you can do this:

#!/bin/bash
mail -s "Failure notice" "TNA <tna@commstds.ucla.edu>" "Prof Steen <steen@commstds.ucla.edu>" << EOT
Dear Professor Steen,

We regret the Television News Archive suffered a massive and irrecoverable failure on $(date).

Please make appropriate arrangements.

Sincerely yours,
Mail Transfer Agent
EOT

This syntax should be used in the sweep script to send warning e-mails when there is a double recording failure.

A tweak of this function is to send an e-mail as an SMS message to a cell phone. For instructions on finding the SMS portal, see http://www.makeuseof.com/tag/email-to-sms/

For example, to send an e-mail as an SMS message to David, set the e-mail address as <number>@vtext.com, where <number> is David's cell phone number.

The SMS variant could be turned on in special cases where we're not close to a computer, but can still take action if we find out something has gone wrong.

Indirection

It is sometimes useful to use the value assigned to a variable as a variable in turn -- this is known as "indirection". Let's say you have a list of two machines, lucca and prato, in the file called "list". A script assigns each line in the list to a variable in turn:

for i in `cat list` ; do
  NODE="$i"
done

Now imagine you want a particular value assigned to each machine name, for instance the number of audio drops. You can then issue:

  eval "$NODE=$DROPS"

within the for loop. $NODE here has some value, either "lucca" or "prato". The eval command will place the number of drops into the variable $lucca and $prato in turn.

Or use the same trick to display a list of variables and their values (from rename-thumbnails):

for i in FYEAR FMNTH FDAY FHH FMM ; do eval "echo -e \"\\t$i is \$`eval echo $i`\"" ; done

Bash can also do this directly, with indirect parameter expansion:

 param="parade"; parade="long"; echo ${!param}
 long

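Confusingly, a similar syntax lists the names of all variables that begin with a given prefix. It comes in two forms, which differ only inside double quotes, where ${!pa@} expands each name to a separate word: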

 echo ${!pa*}
 parade param
 echo ${!pa@}
 parade param

For details, see Indirection.

For more complex assignments, use an array. An associative array is a particularly elegant and robust way of handling indirection.
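Here is a minimal sketch of the associative-array approach for the drop counts above. It needs bash 4 or later, so it will not run on the stock bash 3.2 on OSX. The machine names come from the example; the numbers are made up.

```shell
#!/bin/bash
# Store a value per machine name in an associative array (bash 4 or later)
declare -A DROPS

DROPS[lucca]=3        # sample values
DROPS[prato]=0

NODE=lucca
echo "$NODE has ${DROPS[$NODE]} drops"

# List every machine and its value; "${!DROPS[@]}" gives the keys,
# in no guaranteed order
for NODE in "${!DROPS[@]}" ; do
  echo "$NODE: ${DROPS[$NODE]}"
done
```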

Escape characters

$ echo "A quote is \", backslash is \\, backtick is \`."
A quote is ", backslash is \, backtick is `.
$ X=5
$ echo "A few spaces are    ; dollar is \$. \$X is ${X}."
A few spaces are    ; dollar is $. $X is 5.

Execute shell commands with bash

#!/bin/bash
# use backticks " ` ` " to execute shell command
echo `uname -o`

Alternatively, you can use

echo "$( uname -o )"

If you just want to run one command, and it's a line in a file, you can find the command with tail (or head, or grep) and run it by enclosing it in backticks:

`tail -n 1 ~/commands`

This only works reliably for simple commands. The output is split into words and run, but quotes, pipes, redirections, and variables in the line are not interpreted. To run the line as if you had typed it, use eval:

eval "$( tail -n 1 ~/commands )"

To list or cat files by more than one extension:

cat {*blck,*nav}
ls -l {*txt,*mp4}

The bash shell has many keyboard shortcuts.

For instance, Alt + . inserts the last argument of the previous command; press it again to step back through earlier commands. For more, see how to insert arguments.

Execute delayed commands

To set up a command or script to execute later, you can use cron (crontab -e) for recurring jobs, or at for occasional jobs to be done at specific future times.

The simplest setup is to use echo -- make sure the string is quoted:

echo "rsync /tvspare/db-backup/2010 ca:/tvspare/db-backup/ -a" | at now + 10 minutes

Or you can create a script to run the job at the time set, for example (at least OSX needs the time at the end):

at -f /usr/local/bin/dv2xvid 5:00
at -f record.sh now + 10 minutes

To see the at queue, use atq:

atq

To remove a job from the queue, give atrm its job number (as listed by atq):

atrm 2

For an example, see cc-integrate.

Conditionals

If-then:

if [ "$( hostname -s )" = "cartago" ] ; then echo hi ; fi

You don't always need the brackets. if tests the exit status of any command, so this also works (-q keeps grep from printing the matching line):

if grep -q cartago computer-list ; then echo hi ; fi

And/or:

if [ "$( hostname -s )" = "cartago" -o "$( hostname -s )" = "roma" ] ; then echo hi ; fi

Grouped conditions:

if [ "$LNA1" = "" ] && [ "$( hostname -s )" = "cartago" -o "$( hostname -s )" = "roma" ]

Or within one set of brackets with escaped parentheses:

if [ $FIXDIFF -lt $[LEN+10] -a \( $FIXDIFF -gt 15 -o $FIXDIFF -lt -15 \) ]

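Double brackets allow pattern (partial) matching, with || for logical or and && for and, as in the examples below.

Outside the brackets, && and || can replace if and then:

[[ 3 -lt 4 ]] && echo "true"

Inside [[ ]], < compares strings alphabetically, so [[ 10 < 4 ]] is true as well. Use -lt, or (( 3 < 4 )), to compare numbers.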
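This small sketch runs both kinds of comparison on the same pair of numbers. compare is a made-up name.

```shell
#!/bin/bash
# Compare two values as strings with [[ < ]] and as numbers with (( < ))
compare() {
  local a=$1 b=$2
  if [[ $a < $b ]] ; then echo "string: $a sorts before $b"
  else echo "string: $a does not sort before $b" ; fi
  if (( a < b )) ; then echo "number: $a is less than $b"
  else echo "number: $a is not less than $b" ; fi
}

compare 10 4
```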

Or more simply, for instance to check if you have write access to a file:

 [ -w "$OUT" ] || echo -e "\n\tNo permission to output to $OUT\n"

If the file may or may not exist already, try appending nothing to it instead (this creates the file if it is missing):

 if ! : >> "$OUT" ; then echo -e "\n\tNo permission to output to $OUT.\n" ; exit 1 ; fi

You can usefully match against character types:

 if [[ $LIN == [A-Z]* ]]

These can be nested with single parentheses, which should not be escaped:

if [[ -f $POOL/$FIL.mp4 && ( ( -d $POOL/$FIL.hq && -f $POOL/$FIL.ocr ) \
 || $FIL == *_TV5_* || $FIL == *_KMEX_* ) ]]

Test if an input is an integer:

  # Define the class of integers
  INT='^[0-9]+$'
  if [[ $2 =~ $INT ]] 
    then MAXDEL=$2
    else echo -e "\n\tThe parameter must be an integer.\n" ; exit
  fi

Double pipe to execute something only if a command returns non-zero, for instance to test a lock:

mkdir test 2>/dev/null || echo "No luck -- test already exists"
No luck -- test already exists

See also mispipe in #moreutils.

Conversely, double ampersand to execute something only if a command returns zero:

mkdir test && cp file test/

A condition can be complicated -- as in this while condition from hoffman2's fetch-daemon.sh:

# Wait a minute if more than five hq files are waiting to be processed
   while [ "$( A=( `ls -1d $POOL/*.{hq,ocr,OCRed}` ) ; n=0
     for l in `seq 0 $[${#A[@]}-1]` ; do F="${A[$l]}" F=${F%.*} F=${F##*/}
       if [ "$( echo "${A[@]}" | grep $F.hq )" = "" ] ; then continue
         elif [ "$( echo "${A[@]}" | grep $F.OCRed )" ] ; then continue
         elif [ "$( echo "${A[@]}" | grep $F.ocr )" ] ; then continue
         else n=$[n+1]
       fi ; done ; echo $n )" -gt 5 ] ; do

Note that you can define an array and test its content as part of a simple comparison -- the output of the condition is something like "8 -gt 5".

Compare numbers

To test whether a number is in a particular range, as in blacklisting a series of nodes:

 ((HOST>=2211 && HOST<=2220)) && qdel $QNUM

To compare decimals, the simplest is to use bc's comparison function; the output is 1 if true and 0 if false:

echo "scale=3; .009 > .010" | bc

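Alternatively, you can use awk. awk exits with status 1 when the comparison is true, so the || branch runs when $NUMBER is below .04:

NUMBER=0.1009; awk "BEGIN{exit ($NUMBER < .04)}" || echo Lower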
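Here is a sketch of the awk approach wrapped in a function (is_less is a made-up name). Passing the numbers with -v keeps them out of the program text. exit !(a < b) flips the status, so success means the comparison is true.

```shell
#!/bin/bash
# Compare two decimal numbers with awk; bash arithmetic handles integers only.
# is_less A B succeeds (status 0) when A < B.
is_less() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a < b) }'
}

is_less 0.009 0.010 && echo "0.009 is less than 0.010"
is_less 1.8 0.5 || echo "1.8 is not less than 0.5"
```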

source

Arithmetic

To do basic arithmetic operations, use one of these methods.

Display the difference in the number of words in two files

echo $[ $( cat $FIL | wc -w ) - $( cat ${FIL%.*}.old | wc -w ) ]

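or, with the standard $(( )) syntax, which has the same effect ($[ ] is deprecated, and the bash manual warns it will be removed in a future version):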

echo $(( $( cat $FIL | wc -w ) - $( cat ${FIL%.*}.old | wc -w ) ))

Make a counter

n=0 
while true; do
  n=$[n+1]
done
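This is a bounded version of the counter, written with $(( )) and (( )). It also shows that bash arithmetic is integer-only, which is why bc is needed for decimals. count_up is a made-up name.

```shell
#!/bin/bash
# Count up to a limit with integer arithmetic; count_up is a made-up helper
count_up() {
  local n=0
  while (( n < $1 )) ; do
    n=$(( n + 1 ))
  done
  echo "$n"
}

count_up 5            # prints 5
echo $(( 7 / 2 ))     # prints 3: integer division drops the fraction
echo $(( 7 % 2 ))     # prints 1: the remainder
```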

For calculations involving decimals, use bc:

N="Some string ending in a number such as 3"
echo "scale = 5; ( ${N##* } + 1 ) / 1.333 " | bc

Floating-point comparisons (quote the greater-than sign, so the shell passes it to bc instead of treating it as a redirection):

while [ "$( echo $( cat /proc/loadavg | cut -d" " -f1 | bc ) '>' 1.8 | bc )" -eq 1 ]
  do sleep 60
done

Add a column of numbers (elegant):

paste -sd+ infile|bc

Rounding with awk:

DIF="$( echo "$TIM1 - $TIM0" | bc | awk '{printf "%.0f\n", $1}' )"

ls

The list command has several useful flags, some of which have been added to the .bashrc alias list (q.v.). Here's how to list directories only:

ls -d */

In addition, you can use simple filters, for instance to list two file types:

ls -l {*txt,*avi}

Last six files of two extensions:

l {*f,*4} -h | tail -n 6

List files with one of a list of values:

l 2008-08-21_0{0,1,3}00_US_CNN* -d

List files within a range of values:

l 2008-08-21_0{0..3}00_US_CNN* -d

In addition to the * wildcard, meaning any value (including nothing), you can use the ? wildcard, meaning a single character. For instance,

l ????-??-??_?3??*

works for listing files that fit this pattern.

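Brace expansion happens before variables are expanded. To use a variable for the filter list inside the curly brackets, you therefore need eval, which makes bash parse the line a second time:

FILTER=_NO_,TV5,WWW
eval "ls -1 *{$FILTER}*mp4"

The shell expands these patterns before the command runs, so they work with any command, including cp, mv, scp, and rsync. For remote paths, escape or quote the patterns so that the remote shell does the expansion, as in the rsync example above.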
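This alternative sketch avoids eval. It splits the filter list into an array and globs each entry. list_filtered is a made-up name, and the *...*mp4 pattern follows the example above. nullglob drops patterns that match nothing. The function switches nullglob off again afterwards, even if it was on before.

```shell
#!/bin/bash
# List *VALUE*mp4 files for each value in a comma-separated filter list,
# without eval. list_filtered is a made-up helper.
list_filtered() {
  local -a parts matches=()
  local f
  IFS=',' read -r -a parts <<< "$1"
  shopt -s nullglob                # a pattern with no match expands to nothing
  for f in "${parts[@]}" ; do
    matches+=( *"$f"*mp4 )         # the quoted part is matched literally
  done
  shopt -u nullglob
  if (( ${#matches[@]} )) ; then printf '%s\n' "${matches[@]}" ; fi
}

FILTER=_NO_,TV5,WWW
list_filtered "$FILTER"
```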

Note that you should not parse the output of ls in scripts at all. Rather than

 for FIL in $( ls -1 *Hardball*txt 2>/dev/null )

-- which splits any file name that contains spaces into separate words, so $FIL does not always hold one complete file name -- just use

 for FIL in *Hardball*txt ; do

For details see Parsing ls.
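Here is a minimal sketch of the glob loop. count_matches is a made-up helper. With nullglob set, a pattern that matches nothing expands to nothing, so the loop body is skipped instead of running once with the literal pattern *Hardball*txt.

```shell
#!/bin/bash
# Loop over matching files with a glob instead of parsing ls.
# count_matches is a made-up helper that counts *Hardball*txt files.
count_matches() {
  local n=0 FIL
  shopt -s nullglob                # no match: the loop body never runs
  for FIL in *Hardball*txt ; do
    n=$(( n + 1 ))                 # "$FIL" is one whole name, spaces included
  done
  shopt -u nullglob
  echo "$n"
}

count_matches
```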

tee

Used to split output; see for instance the channel script, or this example:

    wget -O - http://example.com/dvd.iso | tee dvd.iso | sha1sum > dvd.sha1

For examples and syntax, see

info coreutils 'tee invocation'

The `>(command)' syntax in the following example relies on a feature of modern shells called "process substitution" -- check it out, it could well be useful:

    wget -O - http://example.com/dvd.iso \
      | tee >(sha1sum > dvd.sha1) \
            >(md5sum > dvd.md5) \
      > dvd.iso

In this example, the file is sent to two different hash utilities at the same time as it's written to file.

So on hoffman2 you might do this to list some processes and also count them:

 date ; myjobs | grep OCR-4c-24l | tee >(wc -l)

grep

Pick out lines that contain a date and time matching the CSA naming convention:

cat list | egrep '[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{4}'

Select lines that contain either one string or another:

PMON=$( date +%Y-%m )
LMON=$( date -d "-1 month" +%Y-%m )
egrep "$PMON|$LMON" ~/tmp/pool

See also egrep -o below to extract a substring.

Grep a string for the presence of an array element:

MONTH="$( for n in `seq 0 $[${#B[@]}-1]` ; do  if [ "$( echo "$F" | grep -w "${B[$n]}" )" \
!= "" ] ; then echo -e "${B[$n]}" ; fi ; done )"

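Advanced: how to grep for any of the values in an array:

echo ${Array[*]}| tr ' ' '\n' | fgrep -f - filename.txt

The array values become the list of fixed-string patterns, which -f - reads from standard input. filename.txt is the file being searched (not tested). tr splits values that contain spaces into separate patterns; printf '%s\n' "${Array[@]}" keeps them whole.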
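This sketch keeps values with spaces whole. grep_any is a made-up name. printf prints one argument per line, grep -F treats each line as a fixed string, and -f - reads them from standard input (GNU grep). An empty list is refused, since an empty pattern would match every line.

```shell
#!/bin/bash
# Print the lines of a file that contain any of the given values.
# Usage: grep_any FILE VALUE...   (grep_any is a made-up helper)
grep_any() {
  local file=$1
  shift
  (( $# )) || return 1                  # no values: refuse, don't match all
  printf '%s\n' "$@" | grep -F -f - "$file"
}

# Example with an array (values are illustrative):
# Array=( "CNN" "Fox News" )
# grep_any filename.txt "${Array[@]}"
```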

Find all files that do not contain a certain string

 grep -L ^DUR */*.txt

See also head and tail, tac and rev.

awk

Designed to "select particular records in a file and perform operations upon them."

Find and display the line with the largest number of fields

awk '{if (NF > max) {max = NF; line=$0}} END{print line}' $FIL

Keep the first line starting with a certain string and remove any later ones, in this case a duplicated LBT line (sponge, from moreutils, writes the result back to the file):

awk '/^LBT/&&c++ {next} 1' $FIL | sponge $FIL

Split a file into multiple files at every occurrence of the pattern START:

awk '/START/{x="F"++i;}{print > x;}' file2
awk '/\[Event /{x="F"++i;}{print > x;}' 2014.PGN

sed

On OSX, use gsed.

Append a line at the end of a file, like echo string >> filename:

sed -i '$a hello world' filename

Add a line to a file, following a line with a known content

sed '/old line match/a this is the new line' < oldfile > newfile

where "old line match" might pick out a standard URL header line

sed "/^URL|/a TTS|$TTS"

or insert the line before the match:

sed "/^END|/i $SEG" $FIL

or add an extension to all lines in a list that start with a slash (file names):

 sed -r 's/(\/.*)/\1.txt/' removed-recordings >new

Find the line number of a matching string -- in this case the video length (use sed to grep):

LOC="$( sed -rn '/[0-1]{1}:[0-9]{2}:[0-9]{2}.[0-9]{2,3}/=' $FIL.txt )"

The excellent 2009 tutorial on appending, inserting, replacing and counting lines with sed says that this command (the -n =) only picks out the first line number, but my version of sed on roma lists them all. To print just the number of the first matching line, use (see sed one-liners):

sed -n '/^CC/=' $FIL | sed '1q'

For metadata tags, you may need to use three backslashes to escape the pipe symbol; cf. #Process control.

Capture the string you find, printing only the matching part of matching lines:

sed -rn 's/.*([0-1]{1}:[0-9]{2}:[0-9]{2}.[0-9]{2,3}).*/\1/p' $FIL.txt

Display a line that matches a regex, using parentheses and a pipe for alternatives (here mp4 or txt):

echo 2013-07-13_1700_RT.txt | sed -rn '/^[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{4}_RT\.(mp4|txt)/p'

The string described only needs to match part of the line, and will display the entire line:

crontab -l | sed -rn "/channel\ $NWK\ [0-9]{1,3}\ \"$SHW\".*/p"

Or display the line only if it does not match. A matching line is replaced by an empty string, so you still get a blank line; sed -rn '/pattern/!p' prints nothing at all for a match:

echo 2013-07-13_1700_RT.txt | sed -r 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{4}_RT.txt//'

Or display the line after the match:

sed -n '/regexp/{n;p;}'

You can even run a nested command inside the sed command:

sed -n "/$(readlink `which $CX`)$/{n;p;}"

Get the content of a line by number, say 5:

sed -n "5 p" $FIL.txt

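This can be used to work around the fact that for loops split lines that contain spaces into words. The other workarounds are complicated and somewhat error-prone, but this works:

for N in {1..30} ; do sed -n "$N p" filename ; done

Note that this rereads the file once per line. For long files, while IFS= read -r LINE ; do ... ; done < filename reads each line whole in a single pass.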

-- for instance to turn a list into a wiki menu:

for i in {1..26} ; do TAG=`sed -n "$i p" i` ; echo ":* $TAG" ; done

-- or for converting a list of embedded start times to UTC in a digitizing project:

for N in {1..6} ; do read VRC PullDate Week ShowDate From To Start Show <<< $( sed -n "$N p" CutList.txt ) ; 
  echo -e "\t$VRC $PullDate $Week $ShowDate $From $To $Start 
     \t$( eval "date -ud 'TZ=\"America/Los_Angeles\" $ShowDate $Start' +%Y-%m-%d_%H%M" )_US_$Show" ; 
done 

Find the line numbers of lines with some matching string and collect those lines only (from mplayer-slave-check):

for N in `sed -n "/^% /=" < filename` ; do
  # Get the content of the tag
  TAG="$( sed -n "$N p" filename )"
done

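With -n, sed prints nothing by default, and the p flag on a substitution prints only the lines where the substitution was made. This gives you a way to clip something out of a string: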

 wc -w 2013-07-25_1700_US_MSNBC_News_Live.wc1 | gsed -n "s/$FIL.wc1//p"

The result is the word count 7971 clipped out of wc'd output "7971 2013-07-25_1700_US_MSNBC_News_Live.wc1".

Replace a string in all lines except lines that contain either Brexit or Grexit:

sed -E '/(Brexit|Grexit)/!s/America/USA/g' test

Replace a line by number (the replacement string in this case starts with %):

sed '6 c% Start Justice Mayor-home-raided' $FIL.seg

Replace a matching string in a particular line by number -- in this example line 11:

sed -i "$LineNum s/$TS1/$TOP/" $FIL
sed -i '11 s/20160831060100/20160831060001/' 2016-08-31_0600_US_KNBC_Channel_4_News_at_11PM.txt

Replace a string in the content of a variable instead of in a file:

 sed -r 's/.*Published\ on([^<]*).*/\1/' <<< "$HED"

Replace a line that matches part of a string (so use -r):

sed -r "/LBT\|/cLBT|$LBT" $FIL.txt

Or some explicit regular expression:

crontab -l | sed -r "/^([0-9 ]{5}[ \*]{6}\tmpg2h264-daemon)/c \
`date -d '+1 min' +%M\ %H` * * *\tmpg2h264-daemon" > crontab-new

Or require the beginning of the line and keep part of the line:

sed -r 's/^TTL\|(.*)_(.*)/TTL\|\1 - \2/' $F

Or use parentheses to keep part of the line in a global replace:

sed -i -r 's/^(.)/\/tv\/2009\/2009-03\/2009-03-20\/\1/g' $FIL

Replace strings in a file using a simple global replace:

sed -r 's/--/\ --\ /g' $FIL

Replace a string in multiple files (add g to replace every occurrence on a line, not just the first):

sed -i -r 's/windows/x11/' *R

Convert an uppercase SNAKE_CASE to Camel_Snake -- (^|_) is a way to say "beginning or underscore"; \U will uppercase and \l will lowercase:

 sed -r "s/(^|_)(.)/\1\U\2/g" <<< ${TTL,,}

Show a selection of lines -- say, from 1 to 2 and then 4:

sed -n -e 1,2p -e 4p somefile.txt

This is used in cc-change-TYPE-tags to ensure consecutive tag matches. See also more on line extraction.

Cut a range of lines from a file -- lines 5-10 and 12 are cut and a backup of the original created in $FIL.bak:

sed -i.bak -e '5,10d;12d' $FIL

Or with variables for the cut points, where $i holds a string such as $FIL:2345,3456:

sed -i.bak -e "${i#*:}d" "${i%:*}"

Excerpt lines between one string and another -- like copy and paste (script twin):

sed -n '/CRONTAB FOR/,/WEEKDAY SCHEDULE/p' file

Take a line with delimiters and place the field values in an array (used in cc-tag-commercials). Note that fields containing spaces are also split into separate elements:

FLD=( $( echo "$LIN" | sed -e 's/|/\n/g' ) )

Insert a line after a given line number, in this case 3

sed "3a$TAG" < oldfile > newfile

Insert a line before a given line number, in this case 1

sed "1i$TAG" < oldfile > newfile

Insert the line on the first match only (cf. cc-tag-commercials):

sed -i "1,/^$MATCHING_STRING/ {/^$MATCHING_STRING/i\
$NEWLINE
}" $FIL.txt

This can also be used to prepend FIL1 to FIL2 (cf. mplayer-slave-move):

NB="$( wc -l $FIL1 )" NB=${NB% *} NB=$[NB-1] OFS=$IFS IFS=$'\n'
A=( $(cat "$FIL1") ) IFS=$OFS  # read the file into an array
for n in `seq $NB -1 0` ; do   # write each line in reverse order
  LN="${A[$n]}" ; sed -i "1i$LN" $FIL2
done

Delete a line or lines by number (use the -i switch to edit the file in place):

sed -i 4d file
sed -e '5,10d;12d' file

Delete a line that contains a string unless it also contains another string -- in this case delete the line if it contains "News" unless it starts with #

sed -e '/.*\"News\".*/{/^#/!d;}' file

With variables, add a space after the exclamation point (used in schedule script). This also keeps an interactive shell from treating !d inside double quotes as a history expansion:

 sed -r "/$NWK.*\"?$SHOW\"?.*/{/^#/! d;}" file

Note that variable and command substitution are carried out within double quotes, but not within single quotes.

Add the string "% Video length" to the video length number, if it is missing:

sed -r 's/(^[0-1]{1}:[0-9]{2}:[0-9]{2}.[0-9]{2})/% Video length \1/' $FIL > $FIL.1

Remove blank lines with no spaces

sed '/^$/d' input.txt > output.txt

Or blank lines with any number of spaces:

sed '/^\s*$/d'

Match a string until some character -- in this case underscore for NWK -- useful for non-ascii strings:

eval $( echo "$FIL" | $SED -r 's/[0-9_-]{15}_([A-Z]{2})_([^_]*)_(.*)/COUNTRY=\1 NWK=\2 SHW=\3/' )

Delete everything between two characters or strings, including those characters or strings. In this case, that is everything between the string >> and the character colon; the parentheses are optional:

sed -ri 's/>>([^:]*)://g' $FIL
sed -ri 's/>>[^:]*://g' $FIL

Write -ri rather than -ir. GNU sed reads -ir as -i with the backup suffix r, so it leaves a backup file ending in r and does not turn on extended regular expressions.

To avoid a greedy match, match every character that is not a colon ([^:]*) up to the colon, and then remove the colon too.

Similarly, remove square brackets and everything inside them. Note that only the first square bracket is escaped, and again the parentheses are optional:

 sed -ri 's/\[([^]]*)]//g'
 sed -ri 's/\[[^]]*]//g'

Similarly, remove parentheses and their contents; protect the parentheses characters in square brackets:

 sed -ri 's/[(][^)]*[)]//g'

Take a string of uppercase letters and capitalize the first letter after underscore, hyphen, and space:

sed 's/[^ _-]*/\L\u&/g'

Change the extension of a file (for checking inventory during conversion):

line=/mnt/roma/2008/2008-06/2008-06-02/2008-06-02_1000_KTLA_Jerry_Springer_Show.avi
echo $line | gsed -r s/'(.*)\.avi/\1.mpg/'
/mnt/roma/2008/2008-06/2008-06-02/2008-06-02_1000_KTLA_Jerry_Springer_Show.mpg

Pick out part of a string by matching a complex pattern -- say, the number of seconds in a video (first example) or the amount of space left on a hard drive (second example):

mp4info $FIL.mp4 | grep video | $SED -r s/'.*\ ([0-9]{1,4}\.[0-9]{1,4}\ secs).*/\1/'
df -h | grep tv1 | sed -r 's/([a-z,0-9\/-]+)( )+([a-z,0-9,A-Z]+)( )+([a-z,0-9,A-Z]+)( )+([a-z,0-9,A-Z]+).*/\7/'

Optional string -- there may or may not be a period after the month variable:

DAY="$( echo "$F" | sed -r "s/.*$MONTH.?\ ([0-9]{1,2}).*/\1/" )"

Swap the order of two strings -- start out with this, the output of 'grep URL *txt':

 2014-01-08_1048_AF_Tolo_TOLOnews.txt:URL|http://youtube.com/watch?v=uML1YENQKe8
 grep URL *txt | sed 's/\([0-9a-zA-Z_.-]*\):URL|\([0-9a-zA-Z:/._?=-]*\)/\2 \1/'
 http://youtube.com/watch?v=uML1YENQKe8 2014-01-08_1048_AF_Tolo_TOLOnews.txt

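That seems very tedious, having to list every single character used! Note that you're not using sed -r here, but escaping the parentheses instead. With -r, a shorter pattern matches everything up to the first colon: grep URL *txt | sed -r 's/([^:]*):URL\|(.*)/\2 \1/'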

Extract a series of matching substrings and use them in a single variable:

 RDIR="$( echo "$FIL" | sed -r 's/.*([0-9]{4})-([0-9]{2})-([0-9]{2}).*/\1\/\1-\2\/\1-\2-\3/' )"

Extract a series of matching substrings using a range, and use them in a single variable:

 BASE="$( echo $FIL | sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2}\_[0-9]{4}\_[a-zA-Z0-9-]{1,25}).*/\1/' )"

Extract a series of matching substrings and define them as variables:

 FIL=2008-06-02_1000_KTLA_Jerry_Springer_Show.mp4
 eval $( echo $FIL | sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2}).*/YEAR=\1 MONTH=\2 DAY=\3/' )
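This eval-free alternative is a sketch using bash's =~ operator, which stores the parenthesized groups in the BASH_REMATCH array. It uses the same example file name.

```shell
#!/bin/bash
# Pull the date fields out of a file name with bash's own regex matching;
# the parenthesized groups land in the BASH_REMATCH array.
FIL=2008-06-02_1000_KTLA_Jerry_Springer_Show.mp4
RE='^([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{4})_'

if [[ $FIL =~ $RE ]] ; then            # leave $RE unquoted so it acts as a regex
  YEAR=${BASH_REMATCH[1]} MONTH=${BASH_REMATCH[2]} DAY=${BASH_REMATCH[3]}
  TIME=${BASH_REMATCH[4]}
  echo "$YEAR $MONTH $DAY $TIME"       # 2008 06 02 1000
else
  echo "No date in $FIL" >&2
fi
```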

Convert an IP address listed in /etc/hosts to a short name:

 NAME=$( grep $line /etc/hosts | sed -r 's/.*[\t\ ](.{3,8})$/\1/' )

Sed here turns on regular expressions with '-r', then does a substitution command, 's', followed by the first of the three dividers, '/'. The '.*' means 'any number of any characters' -- that is to say, match anything. The expression '[\t\ ]' means 'either a TAB character or a space'. The content of the parenthesis, '(.{3,8})', means 'between three and eight characters' -- and they have to be at the end of the line, signaled by '$'. The '\1' then echoes what matches the parenthesis, in this case the short host name.

Change the order of fields in a delimited file:

 L="20130615100024.089|20130615100025.457|CC1|RU2|HEADQUARTERS IN ATLANTA."
 echo $L | sed -r 's/(.*)\|(.*)\|(.*)\|(.*)\|(.*)/\1|\2|\3|\5|Style=\4/'
 20130615100024.089|20130615100025.457|CC1|HEADQUARTERS IN ATLANTA.|Style=RU2

The same for a whole file:

 sed -r 's/(.*)\|(.*)\|(.*)\|(.*)\|(.*)/\1|\2|\3|\5|Style=\4/' infile > outfile
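awk can do the same job by splitting on the delimiter, so there is no need for a regex group per field. Here is a sketch; restyle is a hypothetical function name:

```bash
# Hypothetical helper: split on |, print the fields in a new order,
# and join them again with | (OFS)
restyle() {
  awk -F'|' -v OFS='|' '{ print $1, $2, $3, $5, "Style=" $4 }' "$@"
}

L="20130615100024.089|20130615100025.457|CC1|RU2|HEADQUARTERS IN ATLANTA."
echo "$L" | restyle
```

For a whole file, use restyle infile > outfile.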

Use eval to place a date command inside a sed substitution string:

cat some-file | eval "$( sed -r s/'(.*flv).*([0-9]{2}).*/echo \1\t`date -d \2 +%F`/' )"

Remove * and pipe to a new file -- but note this catches only one * per line!

 sed -r 's/(.*)(\*)(.*)/\1\3/' $FIL.orig >$FIL.nostars
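To remove every * on a line, add the g (global) flag so sed repeats the substitution for each match. Here is a short sketch; strip_stars is a hypothetical name:

```bash
# Hypothetical helper: delete every asterisk, not just the first one per line
strip_stars() {
  sed 's/\*//g' "$@"
}

echo 'a*b*c**' | strip_stars      # abc
```

With files, this becomes strip_stars $FIL.orig > $FIL.nostars.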

Remove all lines that contain an *:

 $ sed '/\*/d' input.txt > output.txt

Sort and format text in columns

To pad individual strings with spaces, use the rpad function (see hoffman's cleanup.sh):

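# Custom trailing space padding (http://techknack.net/bash-rpad-lpad-and-center-pad/)
function rpad {
 local word="$1"
 local pad="${3:- }"    # padding character; defaults to a space (an empty one would loop forever)
 while [ ${#word} -lt "$2" ]; do
   word="$word$pad"
 done
 echo "$word"
}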

Then use it like this to align output in columns. The third argument is the padding character, here an escaped space (\ ), but other characters work too:

echo -e "\t`rpad $NODE 11 \ `"
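printf can produce the same space padding without a loop: %-11s left-justifies its argument in a field at least 11 characters wide. Here is a sketch in which NODE is just a sample value. printf only pads with spaces, so keep rpad when you need another fill character.

```bash
NODE=roma
# %-11s pads NODE with trailing spaces to a width of 11 characters
printf '\t%-11s|\n' "$NODE"
# bash's printf -v stores the padded value in a variable instead of printing it
printf -v padded '%-11s' "$NODE"
echo "[$padded]"
```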

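For whole tables, enclose the command that generates the text output in parentheses, and pipe the result to the column utility (from util-linux or BSD, not a bash builtin). The -t switch arranges the output in a table, and -s sets the input delimiters. Note that -s TAB sets them to the letters T, A and B rather than to a tab character, so write -s $'\t' if you really want to split on tabs. In this case the output is also sorted in reverse numerical order on column 2 (this is the .bashrc function dol on roma):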

# List the number of conversions done by each of the machines
function dol () {  
if [ -z "$1" ] ; then AGO=0 ; else AGO=$1 ; fi
echo -e "\n\t     Division of labor during the last $AGO days and today"
echo -e   "\t(Number of files converted from mpgts to mp4 by each machine)\n"
(for i in `cat /usr/local/bin/computer-list-tna` ; do
  echo -en "\t\t\t$i\t" ; spool $AGO | grep -c $i
 done ) | column -s TAB -t | sort -nrk 2 ; echo ""
}

See also 12.4. Text Processing Commands (Advanced Bash Scripting Guide).

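Or pipe a preformatted block through column (note that -c sets the output width in characters, not the number of columns):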

# Output    
 ( echo -e "\
 \tSTART tags \t  $STARTT\t  $STARTV
 \tEND tags   \t  $ENDT  \t  $ENDV  
 \tTotal      \t  $TN  \t  $VN\n" ) | column -c 3 

Use printf to align numbers in columns -- this is what gives you the most control:

 printf "\t%s\t%4s\t%5s\n" START\ tags $STARTT $STARTV
 printf "\t%s\t%4s\t%5s\n" END\ tags $ENDT $ENDV
 printf "\t%s\t\t%4s\t%5s\n" Total $TN $VN
 START tags       112      56
 END tags          92      74
 Total            204     130
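Tabs depend on the terminal's tab stops, which is why the Total line above needs an extra \t. Fixed field widths avoid that. The sketch below uses sample values copied from the output above. %-12s left-justifies the label, and %5s and %6s right-justify the numbers.

```bash
STARTT=112 STARTV=56 ENDT=92 ENDV=74 TN=204 VN=130   # sample values
fmt='%-12s%5s%6s\n'    # label in 12 columns, then numbers in 5 and 6 columns
printf "$fmt" 'START tags' "$STARTT" "$STARTV"
printf "$fmt" 'END tags'   "$ENDT"   "$ENDV"
printf "$fmt" 'Total'      "$TN"     "$VN"
```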

You can get the number of lines and columns in the current terminal easily from tput:

 lines=$(tput lines)
 columns=$(tput cols)

Use this to scale the display to the terminal used.

Sort a file on the last word of each line when the lines vary in the number of words. In this case, the last word is a number which may be a decimal:

for i in `cat $DIR/$FIL.nav | egrep -o '([0-9.]{1,99})$' | sort -n ` ; do
  grep $i $DIR/$FIL.nav >> $DIR/$FIL.nav.0
done ; mv $DIR/$FIL.nav.0 $DIR/$FIL.nav

There doesn't seem to be a more elegant solution in bash; possibly arrays could be used.
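The loop also has a weakness: grep $i matches substrings and treats the dot as a wildcard, so a line can be copied more than once. A decorate-sort-undecorate pipeline avoids the loop. This is a sketch under the assumption that the last field is always a number; sort_by_last_word is a hypothetical name.

```bash
# Hypothetical helper: sort lines numerically on their last word.
# 1. awk copies the last field ($NF) to the front, followed by a tab
# 2. sort -n sorts on that leading number
# 3. cut -f2- strips the copied field again
sort_by_last_word() {
  awk '{ print $NF "\t" $0 }' "$@" | sort -n | cut -f2-
}

printf 'a b 3.5\nc 10\nd e f 2\n' | sort_by_last_word
```

Applied to the example above, this would be sort_by_last_word $DIR/$FIL.nav | sponge $DIR/$FIL.nav.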

Sort the words in a file and remove duplicates:

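 tr -s '[:space:]' '\n' < /tmp/i | sort -u

Here tr turns each run of whitespace into a newline, so sort sees one word per line. (tsort, a topological sort, also prints one word per line, but it fails on input with an odd number of words.)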

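Deduplicate without sorting. The expression !x[$0]++ is true only the first time awk sees a given line, so each line is printed once, in its original order: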

 awk ' !x[$0]++' $FIL

Sort successively by specified columns -- first by 6 and then by 7 (-k6,7 would instead define a single key running from column 6 to column 7):

 myjobs | sort -k6,6 -k7,7

Reuse the same line:

 # Progress (\r returns to left margin and \033[0K erases to end of line)
 echo -en "\r\t$FIL\033[0K"

head tail sponge

Keep only the first five lines:

head -n 5 $FIL

All but the first five lines:

tail -n +6 $FIL

Last five lines:

tail -n 5 $FIL

All but the last five lines (GNU head only; BSD and OSX head do not accept a negative count):

head -n -5 $FIL
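Off-by-one errors are easy here. tail -n +K starts printing at line K, so skipping the first five lines needs +6. The sketch below checks this with seq. It assumes GNU head for the negative count, and the sed form works with BSD tools too.

```bash
seq 10 | tail -n +6     # prints 6 through 10
seq 10 | head -n -5     # prints 1 through 5 (GNU head)
seq 10 | sed '1,5d'     # prints 6 through 10: delete lines 1-5, keep the rest
```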

See #sed for picking out a set of numbered lines, consecutive or not.

Select a number of stanzas from a file -- each stanza starts with "[Event", and the array lists the line number of the start of each stanza:

a=( $(sed -rn '/\[Event /=' $PGN ) ) ; head -n +${a[$END]} $PGN | tail -n +${a[$[BEG-1]]} > ${PGN%.*}-$BEG-$END.pgn
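The head/tail version keeps the [Event line that starts the following stanza, and it fails when END is the last stanza in the file. The sketch below has awk count the [Event lines as it reads and print stanzas BEG through END. select_stanzas is a hypothetical name, and BEG, END and PGN are the variables from the line above.

```bash
# Hypothetical helper: print stanzas beg..end (1-based) of a file
# in which each stanza starts with a line beginning "[Event ".
select_stanzas() {
  local beg=$1 end=$2 file=$3
  awk -v b="$beg" -v e="$end" '
    /^\[Event / { n++ }     # a new stanza starts here
    n > e       { exit }    # past the last wanted stanza: stop reading
    n >= b                  # inside the range: print the line
  ' "$file"
}

# Usage with the variables from the example above:
# select_stanzas "$BEG" "$END" "$PGN" > "${PGN%.*}-$BEG-$END.pgn"
```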

Pick out the last timestamp before line number $N:

head -n $N $FIL.txt | tac |\
egrep -o -m1 '([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2}):([0-9]{2}):([0-9]{2})'

Prepend lines to a file using sponge -- it soaks up the content before writing it:

(echo "text to prepend"; cat file.txt) | sponge file.txt
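sponge comes from moreutils and may not be installed everywhere. A temporary file does the same job. This is a sketch; prepend is a hypothetical name. Copying back with cat >, rather than mv, keeps the original file's permissions, since mktemp creates its file with mode 0600.

```bash
# Hypothetical helper: prepend one line of text to a file without sponge
prepend() {
  local text=$1 file=$2 tmp
  tmp=$(mktemp) || return 1
  # Build the new content in the temp file, then copy it over the original
  { printf '%s\n' "$text" ; cat "$file" ; } > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage: prepend "text to prepend" file.txt
```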

Sponge also works on a remote system -- first find the line number, then use head to include lines up to that number:

RemHed=$( ssh $TARGET "/opt/local/bin/gsed -n '/WEEKDAY SCHEDULE/=' ~/.crontab/$TARGET-$TIM |\
  /opt/local/bin/gsed '1q'" )
ssh $TARGET "( head -n $RemHed ~/.crontab/$TARGET-$TIM ; cat ~/.crontab/$SOURCE-$TIM ) \
 | /opt/local/bin/sponge ~/.crontab/$SOURCE-$TIM"

Note that a command run through ssh is non-interactive and does not read .profile (and often not .bashrc either), so its PATH is often incomplete; the workaround is to give the full path for an executable.

Even simpler, instead of using head you can use sed to include text from one string to another:

ssh $TARGET "( /opt/local/bin/gsed -n '/CRONTAB FOR/,/WEEKDAY SCHEDULE/p' $STORE/$TARGET-$TIM ;\
cat $STORE/$SOURCE-$TIM ) | /opt/local/bin/sponge $STORE/$SOURCE-$TIM"

tac - concatenate and print files in reverse

rev - reverse the order of characters in all lines of a file or files

find

See Find command tutorial.

Delete empty files from a day directory:

find . -maxdepth 1 -empty -type f -print0 | xargs -0 rm

Find files smaller than 100 bytes:

find ~/netapp/stripped/* -size -100c

Determine whether a file is present (any extension); the -printf '%P\n' action prints each name without the leading ./:

find -maxdepth 1 -name "*$FIL.*" -printf '%P\n'

Change permissions on all directories, follow symlinks:

find -L /tv -type d -exec chmod 0755 {} \;

Change permissions on all non-directory files, follow symlinks:

find -L /tv -type f -exec chmod 0644 {} \;

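Delete all files older than a certain number of days (here, files last modified more than 208 days ago; -print0 and xargs -0 keep names with spaces intact):

find . -mtime +208 -type f -print0 | xargs -0 rm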

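Count the number of files in a directory without triggering "Argument list too long". This counts everything below $DIR, subdirectories included; add -maxdepth 1 for the top level only, or -type f for files only: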

find ./$DIR -mindepth 1 | wc -l

Note that find outputs its results in directory order, not sorted, so you will often need to pipe them through sort.

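If the following commands don't do anything when run from the /db directory, the likely cause is the unquoted pattern *txt. When the current directory contains a file whose name ends in txt, the shell expands the pattern before find sees it. Quoting the pattern, as in -name '*txt', makes the commands work from any directory. The commands are also sensitive to locale -- the sorts work on en_US only.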

To find all closed captioning files:

find /db/tv/* -name '*txt'

Count the number of files in the archive:

find /db/tv/* -name '*txt' | grep -c txt

To find all instances of a particular program:

find /db/tv/* -name '*Leno*txt'

To search all closed captioning files for some string, ignoring case:

find /db/tv/* -name '*txt' | xargs grep -i TSUNAMI

Allow one spelling mistake:

glimpse -i -1 TSUNAMI

To limit the search to a particular year or month:

find /db/tv/2005/2005-04* -name '*txt' | xargs grep -i STORM

To limit the search to a particular network:

find /db/tv/* -name '*txt' | xargs grep -i TSUNAMI | grep ABC

To see three lines of context before and after:

find /db/tv/2005/2005-04* -name '*txt' | xargs grep -C 3 BAGHDAD

To find two words within five lines of each other:

find /db/tv/* -name '*txt' | xargs grep -C 5 TASER | grep UCLA

Punctuation:

find /db/tv/2006/2006-11* -name '*txt' | xargs grep -i "U\.C\.L\.A\."

To show the number of times a word occurs in each file and sort by frequency, highest first:

find /db/tv/* -name '*txt' | xargs grep -i TSUNAMI -c | sort -t : -k 2 -n -r | more

Show the line number where the match is found:

find /db/tv/* -name '*txt' | xargs grep -i TSUNAMI -n

Count the lines mentioning one word within a couple of lines of another (on a pipe, grep -c prints a single total, so there is nothing left to sort):

find /db/tv/* -name '*txt' | xargs grep -i UCLA -C2 | grep -i taser -c

Count how many times UCLA is mentioned in the news:

find /db/tv/* -name '*txt' | xargs grep -c -i UCLA

Display the individual programs that mentioned UCLA the most times:

find /db/tv/* -name '*txt' | xargs grep -i UCLA -c | sort -t : -k 2 -n -r | more
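The sketch below combines several of the commands above and is safe with any file name. find runs grep itself through -exec ... {} +, so there is no xargs and no word splitting. -H makes grep print the file name even when it is given a single file, and files with no matches are filtered out before sorting. count_mentions is a hypothetical name, and the directory argument is whatever part of the archive you want to search.

```bash
# Hypothetical helper: count case-insensitive matching lines per *txt file,
# most mentions first, skipping files without any match.
count_mentions() {
  local word=$1 dir=$2
  find "$dir" -type f -name '*txt' -exec grep -Hic -- "$word" {} + |
    grep -v ':0$' |
    sort -t : -k 2,2nr
}

# Example: count_mentions UCLA /db/tv/2006 | more
```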

ps

Find the process ID (PID):

PID="$( ps -C mpg2h264-daemon -o pid= )"

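Process ID of one of two processes on a remote machine. The awk field is written \$1 because the whole remote command is in double quotes, and the local shell would otherwise replace $1 before ssh sends the command:

PID="$( ssh $SYS "ps x | grep -v grep | grep $FIL | egrep 'handbrake|ffmpeg' |\
awk '{ print \$1 }'" )"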

moreutils

chronic: runs a command quietly unless it fails (0.43)
combine: combine the lines in two files using boolean operations
errno: look up errno names and descriptions (0.47)
ifdata: get network interface info without parsing ifconfig output
isutf8: check if a file or standard input is utf-8
ifne: run a command if the standard input is not empty
mispipe: pipe two commands, returning the exit status of the first
parallel: run multiple jobs at once
pee: tee standard input to pipes
sponge: soak up standard input and write to a file
ts: timestamp standard input
vidir: edit a directory in your text editor
vipe: insert a text editor into a pipe
zrun: automatically uncompress arguments to command

Its web page is here: http://kitenet.net/~joey/code/moreutils/

numutils and shuf

average: calculate the average of numbers
bound: find the boundary numbers (min and max) of input
interval: show the numeric intervals between each number in a sequence
normalize: normalize a set of numbers between 0 and 1 by default
numgrep: like normal grep, but for sets of numbers
numprocess: do mathematical operations on numbers
numsum: add up all the numbers
random: generate a random number from a given expression
range: generate a set of numbers in a range expression
round: round each number according to its value

See http://suso.suso.org/programs/num-utils/ (installed on cartago 2013-04-14 from ~/software/tar/numutils)
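Where num-utils is not installed, awk covers the two most common cases. This is a sketch assuming one number per line in the first column; numsum_awk and average_awk are hypothetical names, not part of num-utils.

```bash
# Hypothetical stand-ins for numsum and average
numsum_awk()  { awk '{ s += $1 } END { print s + 0 }' "$@"; }
average_awk() { awk '{ s += $1 } END { if (NR) print s / NR }' "$@"; }

printf '1\n2\n3\n4\n' | numsum_awk    # 10
printf '1\n2\n3\n4\n' | average_awk   # 2.5
```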

For random choices, you can also use shuf -- for instance, to pick ca three times as often as roma:

STOR=$( shuf -n1 -e ca ca ca roma )

or

UA=$( shuf -e -n1 \
"Mozilla/5.0 (X11; U; Linux x86_64; en-us) AppleWebKit/531.2+ (KHTML, like Gecko) Version/5.0 Safari/531.2+" \
"Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120501 Firefox/12.0 SeaMonkey/2.9.1 Lightning/1.4" \
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5" \
)

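The -e switch makes shuf treat each command-line argument as an input line (without -e, shuf reads lines from a file or standard input), so it is needed here, and strings containing spaces stay whole.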

stat

Find the size of a file:

stat -L -c%s\ %n $FIL (follow symbolic links, print size, a space, and name)

This command can also tell you about ownership, modification times, and so on.

Count the number of files whose file size starts with a 1:

stat -c%s * | grep -c ^1

Count the number of empty files in a list:

stat -c%s `cat list` | grep -c ^0

Expand the path of a list of files and count the non-empty files:

stat -c%s `cat list |\
sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2})(.*)/\/tv\/\1\/\1-\2\/\1-\2-\3\/\1-\2-\3\4/'` |\
grep -vc ^0

See ca:/tvspare/images/check for a more reasonable implementation.

Compare a file size to a fixed number of bytes:

if [ "$( stat -c%s /tv/$DDIR/$FIL.txt )" -le "100" ] 

Compare two file sizes -- the file names are array elements (thumbnails-keyframes-roma-sweep):

if [ "$( stat -c%s ${MATCH[0]} )" -le "$( stat -c%s ${MATCH[1]} )" ]
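stat -c%s is GNU-specific; BSD and OSX stat use -f%z instead. wc -c works with both. Here is a sketch; filesize is a hypothetical name.

```bash
# Hypothetical helper: file size in bytes, portable across GNU and BSD
filesize() {
  local n
  n=$(wc -c < "$1") || return 1
  echo $(( n ))      # arithmetic expansion drops the leading blanks BSD wc prints
}

f=$(mktemp) ; printf 'abc' > "$f"
if [ "$( filesize "$f" )" -le 100 ] ; then echo "$f is 100 bytes or less" ; fi
```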

xargs

See man page and usage.

The simplest type of use is to feed arguments directly to a new command -- in this case, to list each word on a separate line:

cat file | xargs -n1

Perform a calculation on a column -- note the use of xargs -I {} to signal that the value will be needed in the placeholder {}:

cat c* | cut -f3 | xargs -I {} echo "scale = 2; 1 / {}" | bc

This is a very powerful function we've not been using.
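The same calculation can also run in a single awk process instead of one echo per line plus bc. In the sketch below the input lines are made up. printf rounds (2/3 prints 0.67), whereas bc with scale = 2 truncates (.66) and omits the leading zero.

```bash
# Reciprocal of the third tab-separated column, to two decimals
printf 'a\tb\t4\nc\td\t3\n' | cut -f3 | awk '{ printf "%.2f\n", 1 / $1 }'
```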

Change ownership recursively (source):

find . -user old_user -group old_group -print0 | xargs -0 chown -v -h new_user:new_group

Find the duration of a video file

Duration can be given in the format 0:00:00 (LEN1) or 0000 seconds (SECS); cf. thumbnails-check.

On Linux, you can use ffprobe for both mp4 and avi files:

LEN1="$( ffprobe -show_files -pretty $FIL 2>/dev/null | grep duration \
| $SED -r s/'.*([0-9]{1}:[0-9]{1,2}:[0-9]{1,2}).*/\1/' )"

On OSX and Linux, you can use mp4info for mp4 files:

SECS="$( mp4info $FIL.EXT | grep video \
| $SED -r s/'.*\ ([0-9]{1,4}\.[0-9]{1,4}\ secs).*/\1/' | cut -d"." -f1 )"
LEN1="$( echo $( date -d "+$SECS seconds"\ $(date +%F) +%H:%M:%S ) )"

and tcprobe (from transcode) for avi (and likely mpg) files:

LEN1="$( tcprobe -i $FIL.EXT 2> /dev/null | grep duration \
| $SED -r s/'.*([0-9]{1}:[0-9]{1,2}:[0-9]{1,2}).*/\1/' )"
SECS="$(echo $($DAT -u -d 1970-01-01\ $LEN1 +%s))"

Date manipulations

Read about the secrets of dates in "info coreutils date".

Many of these functions can be seriously simplified if you use the new Debian package dateutils (upstream) -- as of 2013-10-29 in testing, but not yet in stable.

Change the time

To change the time of a file to a day earlier:

 touch -r dn2013-1128.mp4 -d '-1 day' dn2013-1128.mp4

To change only the modification time, leaving the access time alone, add --time=mtime (or -m).

Relative time

Get the number of days ago from a filename (cf. daysago):

   DAY="$[$[$(date +%s)-$(date -d "${FIL%%_*}" +%s)]/86400]"

Subtract a day from today:

   YESTERDAY=$(date -d "-1 day" +%Y-%m-%d)

This can be pushed quite far, as in the tape-timestamp script, which adds running seconds to any past time:

   START=$(date +%s)
   LAPSED=$[$(date +%s)-$START]
   date -d "+$LAPSED seconds"\ $YEAR-$MONTH-$DAY\ $HOUR:$MIN +%F\ %H:%M:%S

Or add a number of days to some past date, $DAY, in the format 2009-10-02:

   date -d "+2 days"\ $DAY +%F

Or add a number of seconds to the current time:

   date -d "+$LAPSED seconds"\ $(date +%F)\ $(date +%H:%M:%S) +%F_%H:%M:%S

This syntax also works (leave out "ago" for a future time):

   date --date='1 minute ago' +%s

For instance in a time comparison -- wait until the condition is satisfied that an image is older than a minute:

 until [ "$(date --date='1 minute ago' +%s)" -gt "$( date -r `ls -1 $WORK/$FIL.img/png/* | tail -n1` +%s )" ] ; do sleep 18.81 ; done

Just as usefully, convert a duration in seconds to a duration in hours and minutes (this wraps around after 24 hours; see convertsecs below for longer durations):

   echo $(date -d "+$LAPSED seconds"\ $(date +%F) +%H\ hours,\ %M\ minutes,\ and\ %S\ seconds)

Create an initial time based on the file name, adding :00 for the seconds:

TIME1="$( echo $FIL | sed -r s/'([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})([0-9]{2}).*/\1\ \2:\3:00/' )"

Add a number of seconds to the initial time:

TIME2=$( $DAT -d "+$SECS seconds"\ "$TIME1" +%F\ %H:%M:%S)

Time arithmetic

For adding or subtracting time from a past or present time, you can just do this sort of thing:

  date -d "+1 day"\ 2011-12-09 +%F
  date -d "+1 minute"\ "$LBT 12:00" +%Y-%m-%d\ %H:%M  (where $LBT is 2009-09-02)

Without specifying the date format, you can add and subtract time, using eval to get around the quoting requirements:

  STIM=$( eval "date -d '$STIM +1 min'" )

You can compare two regular YYYY-MM-DD dates by simply removing the hyphens (see #Chop strings):

  if [ ${START//-/} -lt ${STOP//-/} ] ; then 

Or you can use these test operators, supported by bash's [ and [[, to compare the modification times of two files:

  if [ "$F.txt" -nt "$F.json" ] ; then  => newer than
  if [ "$F.txt" -ot "$F.json" ] ; then  => older than

For more complex operations, first convert to unix seconds.

Convert hours and minutes to seconds for arithmetic operations (this uses UTC):

   CUT1="$(date -ud 1970-01-01\ 00:28:46 +%s)"

Unix seconds back to minutes and seconds:

   echo $(date -ud "+$CUT1 seconds"\ $(date +%F) +%M:%S)

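If the number of seconds is more than one day, use this function. It assigns with $(( )) rather than (( )), because a (( )) command whose result is 0 returns a failure status and would stop a script running under set -e:

   convertsecs() {
    local h m s
    h=$(( $1 / 3600 ))
    m=$(( ($1 % 3600) / 60 ))
    s=$(( $1 % 60 ))
    printf "%02d:%02d:%02d\n" $h $m $s
   }

   convertsecs 90061    # 25:01:01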

Convert a date to the number of seconds since the unix date epoch (which is 1970-01-01 00:00:00 UTC), in this case by extracting it from a filename:

BTIM="$( $DAT -ud "$( echo $FIL | \
$SED -r s/'([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})([0-9]{2}).*/\1\ \2:\3:00/' )" +%s)"

You can then do math with this number. To convert it back to UTC time, you can use the @ shortcut (it means "unix seconds since epoch") -- the second line is equivalent:

$DAT -ud @$BTIM +%Y-%m-%d\ %H:%M:%S  (cf. cc-generate-missing-txt-files-new-style)
$DAT -ud "$BTIM seconds"\ 1970-01-01 +%Y-%m-%d\ %H:%M:%S

Check the relative age of a file:

if [ $[ $( $DAT +%s ) - $( $DAT -r /tmp/SumDIFF +%s ) ] -lt 60 ]
 then echo "The file is less than a minute old" ; fi

Bash interprets a number with a leading zero as octal, so values like 08 and 09 cause errors; if the number is decimal, prefix it with 10# to force base 10:

SUM=$[ $TS1 - 10#${TS1: -2} ]
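Here is a short demonstration with made-up values. Without the 10# prefix, bash rejects 08 with a "value too great for base" error, because 8 is not an octal digit.

```bash
m=08
# $(( m + 1 )) would fail here, because 08 is read as an (invalid) octal number
echo $(( 10#$m + 1 ))               # 9
TS1=1009
echo $(( TS1 - 10#${TS1: -2} ))     # 1000: subtracts the last two digits, 09
```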

tv tree

Reconstruct a base directory address from a filename (no initial or final slash):

   DDIR="$(echo $FIL | sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2}).*/\1\/\1-\2\/\1-\2-\3/')"

Show no output unless the file name actually contains the date:

  DDIR="$(echo $FIL | $SED -rn 's/([0-9]{4})-([0-9]{2})-([0-9]{2}).*/\1\/\1-\2\/\1-\2-\3/p' )"

Or do the same using variables:

eval $( echo "$1" | sed -r 's/.*([0-9]{4})-([0-9]{2})-([0-9]{2}).*/YEAR=\1 MONTH=\2 DAY=\3/' )
DIR="/tv/$YEAR/$YEAR-$MONTH/$YEAR-$MONTH-$DAY/"

Or simply chop the string (this doesn't check that the strings fit the date pattern):

DIR="${FIL:0:4}/${FIL:0:7}/${FIL:0:10}"

Sed can also be used to insert a character into a position in a string:

echo 2330 | sed -e 's/^.\{2\}/&:/'
23:30

Get the number of days ago from a file name:

DAY="$[$[$(date +%s)-$(date -ud "${F:0:10}" +%s)]/86400]"

Reconstruct a base directory address from a date (includes initial and final slash):

   DIR="$(echo $DAT | sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\/\1\/\1-\2\/\1-\2-\3\//')"

Note that the "-r" (extended regex) switch doesn't work with the sed on OSX; use -E instead, which both BSD and GNU sed accept, or install gsed (and gdate).

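Check that a date is valid, say inside a loop as in pull-tv. date exits with a non-zero status on an invalid date, which is more reliable than searching its (possibly translated) error message for "invalid":

   if ! $DAT -d "$YM-$i" +%F > /dev/null 2>&1 ; then continue ; fi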

Time zones

See also Time zone converter and GNU date invocation.

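Display the current time in a different time zone (zdump expects a tz database name such as Europe/Paris; an abbreviation like CEST is not a zone name):

 zdump Europe/Paris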

Convert a remote local time to UTC, independently of the time zone setting of the host:

date -ud 'TZ="America/New_York" 2012-01-03 09:00' '+%Y-%m-%d %H:%M %Z %z'

The double quotes around the zone name are mandatory. To substitute variables, either escape them inside an outer pair of double quotes or use eval:

T="2004-10-31 06:30"
date -ud "TZ=\"Europe/Paris\" $T"
eval "date -ud 'TZ=\"Europe/Paris\" $T'"

So directly to a variable:

NEW=$( eval "$DAT -ud 'TZ=\"America/New_York\" $DAY $TIM' +%Y-%m-%d_%H%M" )

This is the most reliable way to convert a remote local broadcast time to the CSA's UTC format.

If you need to add a number of seconds, place the timezone first:

eval "$DAT -ud 'TZ=\"$TMZ\" +$LEN seconds $DAY $TIM' +%Y%m%d%H%M%S"

If we have the timezone in LBT, we can of course use that instead:

date -ud 'CET 2012-01-03 09:00' '+%Y-%m-%d %H:%M %Z %z'

However, we then depend on the time zone abbreviation being correct for that particular date to get daylight saving time right. It's typically safer to use the location explicitly.

Convert an old-style timestamp to the new-style UTC time (see mplayer-slave-final):

date -ud @`date -d "2011-06-07 07:00:01" +%s` +%Y%m%d%H%M%S

That is to say, convert the local broadcast time (in the filename) to UTC; the result below is for a host set to US Pacific time:

LBT=2008-10-11_1300
date -ud @`date -d "${LBT/_/ }" +%s` +%Y-%m-%d_%H%M
2008-10-11_2000

Convert UTC to a remote local time:

export TZ=America/New_York
date -d 'UTC 2011-07-22 06:00:00' '+%Y-%m-%d %H:%M:%S %Z'
2011-07-22 02:00:00 EDT
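export TZ changes the time zone for the rest of the shell session. Prefixing a single command with TZ=... changes it for that command only. The sketch below uses GNU date; utc_to_local is a hypothetical name.

```bash
# Hypothetical helper: show a UTC time as local time in a given zone,
# setting TZ for this one date command only.
utc_to_local() {
  # $1 = tz database zone, e.g. America/New_York
  # $2 = UTC time as "YYYY-MM-DD HH:MM:SS"
  TZ=$1 date -d "$2 UTC" '+%Y-%m-%d %H:%M:%S %Z'
}

utc_to_local America/New_York '2011-07-22 06:00:00'   # 2011-07-22 02:00:00 EDT
```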

Say for a file correctly named in UTC, but recorded in Prague:

 export TZ=Europe/Prague
 T="${FIL:0:10} ${FIL:11:2}:${FIL:13:2}"
 LBT="$( eval $DAT -d \"UTC $T\" \'+%Y-%m-%d %H:%M:%S %Z\' ) Europe/Prague"

Find the local equivalent to a time in another timezone:

date -d 'CEST 2011-07-22 1500' '+%Y-%m-%d %H:%M:%S %Z'
2011-07-22 06:00:00 PDT

This tells you that 3pm on the 22nd in Central European Summer Time is 6am on the 22nd in Pacific Daylight Time (the output is in the host's own time zone). Even better, as it automatically handles daylight saving time:

date --date='TZ="Europe/Paris" 2004-10-31 06:30'

Here too the double quotes around the zone name are mandatory; to substitute variables, escape them inside outer double quotes or use eval:

T="2004-10-31 06:30"
eval "date -d 'TZ=\"Europe/Paris\" $T'"

This was for a short time useful for naming files recorded elsewhere, but we now use UTC names.

Find the current time in UTC:

 date -u '+%Y-%m-%d %H:%M:%S %Z'

To get the local time equivalent of a UTC time, start with the UTC time (in the filename in this case):

UTC=`echo $FIL | cut -d_ -f1-2`

Then convert it to unix seconds, using -u to tell date that the string is in UTC:

 PST=`$DAT -ud "${UTC/_/ }" +%s`

This is simply the moment of the broadcast in unix seconds. Formatting it without -u shows that same moment in the host's local time zone:

 PST="$( $DAT -d @$PST '+%Y-%m-%d %H:%M:%S %Z' )"

Combine the two steps into one (the lines show two different solutions for quoting):

 $DAT -d @`$DAT -ud "${UTC/_/ }" +%s` '+%Y-%m-%d %H:%M:%S %Z'
 LBT="$( $DAT -d @`$DAT -ud "${UTC/_/ }" +%s` +%Y-%m-%d\ %H:%M:%S\ %Z )"

This gives you the local broadcast time of a UTC filename (cf. cc-generate-missing-txt-files-new-style).

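It might in some cases be useful to set a temporary timezone (save the old value first and restore it afterwards):

 OLDTZ=$TZ ; export TZ=UTC; echo "UTC: `date +\"%F %R (%Z)\"`" ; export TZ=$OLDTZ

For a single command, prefixing it with the assignment, as in TZ=UTC date +"%F %R (%Z)", changes the time zone for that command only.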

See unix date (Wikipedia).

For adjusting the time by a fixed number of hours, use this:

 date -d "+9 hours 2011-07-22 06:00" '+%Y-%m-%d %H:%M:%S'

For instance, generate broadcast time from file name:

 for i in `ls -1 *NRK*t` ; do 
   D=${i%%_*} ; H=${i:11:2}
   date -d "+9 hours $D $H:00" '+%Y-%m-%d %H:%M:%S'
 done

Add the new timestamp into the header:

 for i in `ls -1 *NRK*t` ; do 
   D=${i%%_*} ; H=${i:11:2}
   sed -i "/CMT|Norwegian National Archives/a \
    LBT|`date -d "+9 hours $D $H:00" '+%Y-%m-%d %H:%M:%S'` CEST" $i
 done

This does not of course track the actual time of a remote location.

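This seems to work fine for deriving the local broadcast time from a filename. Note that the tz database name for California is America/Los_Angeles; there is no America/San_Francisco, and an unknown zone name may silently be treated as UTC:

LBT=$( eval "$DAT -d 'TZ=\"America/Los_Angeles\" $DAY $TIM' '+%Y-%m-%d %H:%M:00'")" America/Los_Angeles"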

Chop strings

See a great list, including some undocumented features, and String manipulation (tldp).

Note that these chop operations can also be performed directly on array elements.

Chop the last character off a string:

${TS2%?}

Keep the first 15 characters, the date and time of a filename:

${FIL:0:15}

Interestingly, cutting from the end is also something GNU cut can handle, like this:

echo abcde123456 | cut -c6- --complement

Without the --complement flag you get 123456 -- with it, you get abcde. Still, to cut this way you have to know how far from the beginning you need to go, which you may not know.

You can also use this trick with cut, using rev to reverse the order of characters in each line:

echo ab:cd:ef | rev | cut -d: -f1 | rev
ef

Sed can of course do the same thing -- here by number of characters:

echo 123456 | sed 's/..$//'
1234

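You can also count from the end with a negative offset -- for instance, to keep only the last seven characters. The space before the minus sign is required; without it, ${f:-7} is the default-value expansion: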

f=01:01:50.548000000 ; echo ${f: -7}
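To drop the last seven characters instead, bash 4.2 and later accept a negative length. A suffix pattern of seven ? wildcards does the same in any POSIX shell. Here is a short sketch with the same value:

```bash
f=01:01:50.548000000
echo "${f: -7}"        # the last seven characters: 8000000
echo "${f:0:-7}"       # everything but the last seven (bash 4.2+): 01:01:50.54
echo "${f%???????}"    # the same, by removing a seven-character suffix
echo "${f:-7}"         # no space: default-value expansion, prints f unchanged
```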

Or with tail, using the character flag:

echo -n $foo | tail -c 3

If the string is a file name with path, you can extract either the path or the file name:

ls -1d $RDIR/*.{len,mpg,reserved} 2>/dev/null | xargs -n 1 basename
ls -1d $RDIR/*.{len,mpg,reserved} 2>/dev/null | xargs -n 1 dirname

Subtract one string from another

STAMP="2008-08-08_0500_CNN_American_Morning_2008-08-08_05:00:01"
FIL="2008-08-08_0500_CNN_American_Morning"
echo ${STAMP#$FIL}
_2008-08-08_05:00:01

Subtract a string plus something else from another string (nice)

echo ${STAMP#$FIL'_'}
2008-08-08_05:00:01

Match the longest possible string

$ FIL=foodforthought.jpg
$ echo ${FIL##*fo}
rthought.jpg

Remove trailing spaces

LIN="${LIN%"${LIN##*[![:space:]]}"}"

Strip path

FIL="${FIL##*/}"

Match the shortest possible string

$ echo ${FIL#*fo}
odforthought.jpg

This can also be used to cut a string in the middle -- for instance

DRV="$( ssh chi "realpath /tv/$( date +%Y )/$( date +%Y-%m )/$( date +%Y-%m-%d )" )"
echo $DRV (shows it is /mnt/2011_02_25/2011/2011-04/2011-04-24)
echo  ${DRV%/????/*} (picks out the mount point, /mnt/2011_02_25)

In OSX, use this for realpath:

greadlink -f $FIL

Strip the file name

DIR="${FIL%/*}"

Substitution (wow!) -- replace space with underscore or vice versa

echo Converting "$FIL" to "${FIL/ /_}"
echo "${TIM/_/ }"

Useful for renaming without worrying about extensions:

for i in *Test.mpg ; do mv $i ${i/Test/News} ; done

Replace all instances

DAT=2011-12-20
echo ${DAT//-//}
 2011/12/20

Substitution of letters in a string -- say you get WFFBW:

for DEV in $( seq 0 $[ ${#STATUS}-1 ] ) ; do
 for D in ${STATUS:$DEV:1} ; do
   D="${D//W/Working}"
   D="${D//F/Failed}"
   D="${D//B/Busy}"
   echo -e "\tCard $DEV \t$D"
 done
done

Remove newlines:

 dt=${dt//$'\n'/}   # all newlines.
 dt=${dt%$'\n'}   # a trailing newline.

Or insert a zero after a pattern (note the glob pattern in the ls command):

for i in `ls -1 Ch[0-9][!0-9]*` ; do mv $i ${i/Ch/Ch0} ; done

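Advanced chopping to reliably extract an extension. ${name0:+word} expands to word only if name0 is set and non-empty, so a name like .bashrc (where name0 is empty) gets no extension, and ${ext:-.} prints a lone dot when there is no extension. The local declarations keep name, name0 and ext from overwriting variables of the same name in the calling script: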

function ext()
{
 local name=${1##*/}
 local name0="${name%.*}"
 local ext=${name0:+${name#$name0}}
 echo "${ext:-.}"
}

Don't chop, just match straight: does a string contain a substring?

 FOO="somesubstring"
 if [[ $FOO =~ "sub" ]] ; then echo "true"; else echo "false"; fi
 if [[ $( date +%H:%M ) =~ "09:0" ]]

Or use a glob pattern with == instead of the =~ regex operator:

  if [[ $FOO == *sub* ]] ; then echo "true"; else echo "false"; fi

That allows you to negate:

   if [[ $FOO != *sub* ]] ; then echo "true"; else echo "false"; fi

The method is characterized as a "true bashism," meaning it won't work in other shells. You can combine it -- get the content of a line by number of a file, and see if it partially matches a string:

 if [[ "$( sed -n "$[N-1]p" < $FIL.tag )" =~ "ANS_TIME_POSITION" ]]

This may also work -- for details, see string comparisons:

 case $SYSA in
   durga )  if [[ $FIL == *CNN* ]] ; then continue ; fi ;;
   sita  )  if [[ $FIL != *CNN* ]] ; then continue ; fi ;;
   kali  )  if [[ $FIL == *ABC* ]] ; then continue ; fi ;;
   radha )  if [[ $FIL != *ABC* ]] ; then continue ; fi ;;
 esac

For excellent examples, see bash-hackers.

You can also use egrep -o to pick out a substring through matching, for instance to determine whether the present working directory is a month directory:

   DIR="$( pwd | egrep -o '[0-9]{4}-[0-9]{2}$' )"

This is used in the check-mp4-txt script and could be used elsewhere. See man egrep for details; the -o switch tells it to output only the matching substrings.

You can also select a sequence of characters from a string:

SYS=pratomagno ; echo ${SYS:3:3}| tr 'a-z' 'A-Z'
TOM

The first number in ${SYS:3:3} is the starting offset (counting from 0) and the second the length of the sequence. Unlike cut -b this lets you constrain a cut from both ends in one operation. For instance, use this to rename downloaded files like dn2006-0524_512kb.mp4,

for i in `ls -1` ; do mv $i ${i:2:7}-${i:9:2}_0900_WWW_DemocracyNow.mp4 ; done

You can use this to simplify scripts that rely on cut -b -- for the first time you have a tool that double-chops, and on the fly, without the need to create a new variable.

You can also use a variable, such as the string length, inside the chop command:

d=/tv/2011-04/2011-04-14
echo ${d:4:${#d}}

skips the first four characters (/tv/), since the offset counts from 0, and keeps the rest; the length ${#d}, here 22, is simply more than enough.

Or select a sequence starting from the end of a string:

echo ${SYS: -4}  

Locate a character's position within a string:

STATUS=00010
echo -e "\n\tCard `expr index $STATUS 1` is jammed\n"

to get "Card 4 is jammed".
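expr index is a GNU coreutils extension. The sketch below does the same in plain bash by measuring the part of the string before the first match. char_index is a hypothetical name, and like expr index it prints 0 when the character is absent.

```bash
# Hypothetical helper: 1-based position of the first occurrence of a character
char_index() {
  local str=$1 ch=$2 prefix
  [[ $str == *"$ch"* ]] || { echo 0 ; return ; }
  prefix=${str%%"$ch"*}          # everything before the first match
  echo $(( ${#prefix} + 1 ))
}

STATUS=00010
echo -e "\n\tCard $( char_index "$STATUS" 1 ) is jammed\n"
```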

Change case (needs bash 4, so "sudo port install bash" on OSX):

 $ param="parade"
 $ echo ${param^}      (uppercases the first letter -- or use sed 's/^./\U&/')
 Parade
 $ echo ${param^^}     (uppercases all letters)
 PARADE
 $ param="parade"
 $ echo ${param~}      (reverses the first letter)
 Parade
 $ echo ${param~~}     (reverses all letters)
 PARADE
 $ param="PARADE"
 $ echo ${param,}      (lowercases the first letter)
 pARADE
 $ echo ${param,,}     (lowercases all letters)
 parade

Convert an uppercase SNAKE_CASE to Camel_Snake:

 sed -r "s/(^|_)(.)/\1\U\2/g" <<< ${TTL,,}

For more, see manipulate strings and String manipulation (Advanced Bash Scripting Guide).

Arrays

There are index arrays and associative arrays (new in bash 4).

Index arrays

Create an index array:

aa=(0 1 2 3 4)

Or read lines from a file, implicit delimiter is newline (needs bash 4):

 readarray aa < $FIL        # Include newline
 readarray -t aa < $FIL     # Exclude newline

Or read lines from the output of a command -- this formats the output of a file list on hoffman2:

  readarray -t FILES < <( ssh $NODE "ls -1d /work/pond/*.reserved" )
   
  for f in `seq 0 $[${#FILES[@]}-1]` ; do echo "" ; FIL=${FILES[$f]%.*} ; FIL=${FIL##*/}
    SRV="$( ssh $NODE "egrep -o 'wd.' ${FILES[$f]}/*" | cut -d":" -f2 )"
    readarray -t FLS < <( ssh $NODE "ls --color -Ahl /work/pond" | grep $FIL )
    for i in `seq 0 $[${#FLS[@]}-1]` ; do echo -e "\t$SRV \t${FLS[$i]}" ; done
  done

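Pipe the output of a command into an array and print the array one element per line (this splits on any whitespace, not only on newlines; use readarray -t to keep whole lines):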

MYJOBS=( `myjobs | grep fsteen` )
printf -- '%s\n' "${MYJOBS[@]}"

Put the content of a variable into an array:

aa=( $FIL )

Place the variable into a numbered array member -- note no parens in this case:

NUM=1 BEG=1334602860
BEG[$NUM]="$( $DAT -d @$BEG +%Y%m%d%H%M%S )" ; echo ${BEG[$NUM]}

Take a line of values separated by variable amounts of space:

6996987 0.00000 mpg2h264-7 fsteen       r     01/10/2013 11:09:29 msa.q@n2189                        1

and place each of the values in an array (tr strips away the extra spaces)

 OFS=$IFS IFS=$'\n' ; for WAITJOB in `grep -h 'r' $MYJOBS | grep fsteen` ; do IFS=$OFS
   aa=($(echo "$WAITJOB" | tr -s ' ' ' ' ))
 done

Note you have to switch IFS twice. The individual variables can then be referenced by number, for instance

 echo ${aa[5]}
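read -a can split a line into an array without tr and without touching the global IFS, because the default IFS already treats runs of spaces as one separator. In the sketch below, the sample line is the one above, and the loop reads from it instead of from grep so that it runs on its own.

```bash
line='6996987 0.00000 mpg2h264-7 fsteen       r     01/10/2013 11:09:29 msa.q@n2189                        1'
read -ra aa <<< "$line"
echo "${aa[5]}"          # 01/10/2013

# Inside a loop, IFS= applies to that read only, so nothing needs restoring
while IFS= read -r WAITJOB ; do
  read -ra aa <<< "$WAITJOB"
  echo "${aa[0]} started ${aa[5]} ${aa[6]}"
done <<< "$line"
```

To use it on the job list, replace <<< "$line" with < <( grep -h 'r' $MYJOBS | grep fsteen ).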

Place a string of variables into a variable first, and then put the content into an array to keep newlines:

B="$( for i in {01..12} ; do gdate -d "2010-$i-01" +%B ; gdate -d "2010-$i-01" +%b ; done )" A=( $B )

Create an array of lines even when there are spaces between strings (mplayer-slave-prep):

OFS=$IFS IFS=$'\n' V=( $( egrep "^% " ${FIL%\_*}.nav | grep visual ) ) IFS=$OFS

Add a string to a numbered array element of an array (the first time creates the array):

  errMsg[errMsgIndex]=$Msg

Read the output of a function or utility into an array line by line, so you can refer to it repeatedly without having to run the function again (/home/tna/.bashrc function ff; see also hoffman's ~/bin/i.sh)

 OFS=$IFS IFS=$'\n'
 for FIL in `ls $DIR/*4` ; do A=( `mp4info $FIL` )
   if [ "$( echo "${A[@]}" | grep Lavf )" != "" ] ; then
     if [ "$( echo "${A[@]}" | grep 52.78.3 )" != "" ]
       then echo -e "\n\t${FIL##*/} -- avidemux\n"
       else echo -e "\n\t${FIL##*/} -- ffmpeg\n"
     fi
     echo -e "\n\t${A[3]#* }\n\t${A[4]#*   }"  # #Chop strings with TAB
   fi
 done ; IFS=$OFS

Since you can #Chop strings with array elements, this is a very powerful way to handle strings!

Read a file into an array, one element per word (use readarray -t for one element per line):

A=( $( < $FIL.txt ) )

Use a variable to name an array (cannot start with a number):

eval "N$i=( $( sed '1,3'd $DIR/drivemap2-$(date -d "-$i day" +%F) | cut -f5 ) )"

Get the number of elements in the variable-named array:

eval "array=(\${N$i[@]})" ; echo -e "\t$i\t${#array[@]}"
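Since bash 4.3, a nameref (declare -n, or local -n inside a function) gives a variable-named array a fixed alias, with no eval needed. Here is a sketch; the name arr is arbitrary.

```bash
i=3
declare -n arr="N$i"     # arr is now another name for the array N3
arr=( a b c )            # this assigns to N3
echo "${#arr[@]}"        # 3
echo "${N3[1]}"          # b
```

Note that unset arr would unset N3 itself; unset -n arr removes only the alias.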

Get the number of elements in the array:

N="${#aa[@]}"

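Show one array element at a time ("${!aa[@]}" expands to the indexes that actually exist, so there is no off-by-one and gaps left by unset are skipped):

for n in "${!aa[@]}" ; do echo "${aa[$n]}" ; done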

Show all the elements in the array:

echo ${aa[*]}

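Shuffle the elements in the array (quoting "${aa[@]}" keeps elements that contain spaces whole):

shuf -e "${aa[@]}"

Rewrite the shuffled array, one element per output line:

readarray -t aa < <( shuf -e "${aa[@]}" )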

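Remove the second array element (quote it, since an unquoted aa[1] is also a glob that would match a file named aa1; the remaining elements keep their indexes):

unset 'aa[1]'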

Remove the array:

unset aa

For more, see Arrays (Advanced Bash Scripting Guide).

Associative arrays

Associative arrays are new in bash 4. They are particularly useful for handling #Indirection and string mapping.

$ declare -A fullNames
$ fullNames=( ["lhunath"]="Maarten Billemont" ["greycat"]="Greg Wooledge" )
$ echo "Current user is: $USER.  Full name: ${fullNames[$USER]}."
Current user is: lhunath.  Full name: Maarten Billemont.

With the same syntax as for indexed arrays, you can iterate over the keys of associative arrays:

$ for user in "${!fullNames[@]}"
> do echo "User: $user, full name: ${fullNames[$user]}."; done
User: lhunath, full name: Maarten Billemont.
User: greycat, full name: Greg Wooledge.
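Associative arrays also make tallies easy. A minimal sketch that counts files per network. The file names are invented examples that follow the Red Hen naming pattern used elsewhere on this page, and the network is taken to be the fourth underscore-separated field:

```shell
# Sketch (bash 4+): count files per network with an associative array.
# The file names are invented examples in the Red Hen naming pattern.
declare -A count
files=( 2016-09-27_0900_US_FOX-News_Fox_and_Friends.txt
        2016-09-27_1000_US_FOX-News_Fox_and_Friends.txt
        2016-09-27_1000_US_CNN_Newsroom.txt )

for f in "${files[@]}" ; do
  IFS=_ read -r _ _ _ net _ <<< "$f"              # 4th underscore-separated field
  count[$net]=$(( ${count[$net]:-0} + 1 ))
done

# Key order in "${!count[@]}" is not guaranteed
for net in "${!count[@]}" ; do echo -e "$net\t${count[$net]}" ; done
```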

Character sets

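Some files contain bytes that are not valid UTF-8, and as a consequence unix utilities treat them as binary files. This means they're not searchable by default: grep, for instance, reports only that the binary file matches instead of printing the matching lines.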

The script cc-find-binary on roma identifies the lines in a transcript that contain non-UTF-8 characters, making them easy to fix. It uses these commands:

To determine the character set in a file, issue

  file -bi $FIL

If you get something like "application/octet-stream; charset=binary" there is likely an odd character hidden in the file.

To quickly identify the location of the wayward characters in the file, issue

 iconv -f utf-8 -t utf-8//IGNORE $FIL > /tmp/${FIL##*/}

and then run a diff on the original and the converted file. If the problem characters were junk, which is often the case, use the converted file. Or convert in place in one step (sponge is from the moreutils package):

 iconv -f utf-8 -t utf-8//IGNORE $FIL | sponge $FIL
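To get the problem line numbers directly, without the successive head approach below, each line can be tested with iconv, which exits with a nonzero status on input that is not valid UTF-8. A sketch; the function name bad_utf8_lines is hypothetical, and a glibc-style iconv is assumed:

```shell
# Sketch: print the numbers of lines that are not valid UTF-8.
# bad_utf8_lines is a hypothetical helper; it relies on iconv's exit status.
bad_utf8_lines() {
  local n=0 line
  while IFS= read -r line || [ -n "$line" ] ; do
    n=$(( n + 1 ))
    # iconv returns nonzero when it meets an invalid byte sequence
    printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1 || echo "$n"
  done < "$1"
}
```

For example, for N in $( bad_utf8_lines $FIL ) ; do nano +$N $FIL ; done. It starts one iconv per line, so it suits transcript-sized files rather than huge ones.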

Another method to locate the odd characters is to try

 strings $FIL

Or, more clumsily, copy successively more lines with

 head -n $number $F > /tmp/$F

and test each result until you find the line number. This loop does that for every file listed in illegal_characters.txt, opening nano at each problem line, and then cleans out common problem characters:

 for F in `cat illegal_characters.txt` ; do
    for i in $(seq 1 `sed -n '/^END/=' $F`) ; do head -n $i $F > /tmp/$F
        if [ "`file -bi /tmp/$F | grep binary`" ] ; then nano +$i $F ; fi
    done
    sed -i "s/\xEF\xBF\xBD//g" $F
    for a in 00 02 03 06 0F 1B 19 ; do sed -i "s/\x$a//g" $F ; done
 done

You can also use isutf8 from the moreutils package:

  isutf8 $FIL  (check if a file or standard input is utf-8)

If you know the culprit -- say you know it's 0xc2 from a python exception:

  xxd -u $FIL | grep -E ' C2|C2 '

Or in two steps, if you want to inspect the hex dump (as you would in hexedit):

  xxd -u $FIL > $FIL.hex
  grep -E ' C2|C2 ' $FIL.hex

The utility iconv converts from one character set to another.

Recode latin-1 to utf-8:

 recode l1..u8 $F

or recode only the files that isutf8 reports as not valid UTF-8 (it prints nothing for valid files):

 for F in *seg ; do if [ "$( isutf8 $F )" != "" ] ; then recode l1..u8 $F ; fi ; done

In Windows-1252 (cp1252), the MS Windows variant of latin1/ISO-8859-1, the hex codes 0x91 and 0x92 are the left and right curly single quotes, and 0x92 also serves as the curly apostrophe. To replace 0x92 with a straight apostrophe:

 $SED -i "s/\x92/'/g" $F

Or better, convert all characters from the cp1252 encoding to UTF-8:

 iconv -f CP1252 -t UTF-8 $FIL | sponge $FIL
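Running the CP1252 conversion on a file that is already UTF-8 would garble its multibyte characters. A guard that converts only files failing a UTF-8 check may therefore help. A sketch; fix_cp1252 is a hypothetical name, and it writes through a temporary file instead of sponge:

```shell
# Sketch: convert a file from CP1252 to UTF-8 only if it is not already valid UTF-8.
# fix_cp1252 is a hypothetical helper; a temp file replaces sponge.
fix_cp1252() {
  local f=$1 tmp
  if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1 ; then
    return 0                        # already valid UTF-8: leave it alone
  fi
  tmp=$(mktemp) || return 1
  if iconv -f CP1252 -t UTF-8 "$f" > "$tmp" ; then
    cat "$tmp" > "$f"               # overwrite in place, keeping the file's permissions
  fi
  rm -f "$tmp"
}
```

Running it twice on the same file is harmless, because the second run finds valid UTF-8 and does nothing.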

Remove the DEL character:

 $SED -i -e 's/\x7F//g' $F

Remove the byte order mark:

 $SED -i -e '1s/^\xef\xbb\xbf//' $F

These are incorporated into cc-extract.

Remove all non-ASCII:

perl -i.bak -pe 's/[^[:ascii:]]//g' $FIL

The -i.bak option keeps the original as $FIL.bak, which can be useful.

Compare files

Use comm to selectively display differences between two sorted files -- for instance, only the lines unique to the second file:

 comm -13 $FIL.ccx.out{1,2}
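comm expects both inputs to be sorted. On unsorted files it can report shared lines as unique, so sorting on the fly with process substitution is safer. A sketch with made-up file contents:

```shell
# Sketch: comm needs sorted input, so sort both files on the fly.
A=$(mktemp) ; B=$(mktemp)
printf 'cherry\napple\nbanana\n' > "$A"
printf 'banana\ndate\napple\n'  > "$B"

comm -13 <(sort "$A") <(sort "$B")   # lines only in the second file: date
comm -23 <(sort "$A") <(sort "$B")   # lines only in the first file: cherry
comm -12 <(sort "$A") <(sort "$B")   # lines in both: apple, banana
```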

diff compares the files line by line in order rather than as sets of lines, so it often produces very different results -- for instance, show the differences side by side in two columns:

 diff $FIL.ccx.out{1,2} --suppress-common-lines -yw

Show only the different lines in the first file:

 diff --changed-group-format='%<' --unchanged-group-format= $FIL.ccx.out{1,2} 

Get a line count (xargs strips leading blanks):

 DIFf="$( diff --changed-group-format='%<' --unchanged-group-format='' \
$FIL.ccx.out{2,1} | wc -l | xargs )"

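Another way to get a count without the file name is to pipe the file into wc. BSD/macOS wc still pads the number with leading blanks, which the xargs trick above removes: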

 Lines="$( cat ${FIL%.*}.csa | wc -l )"
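A portable way to get a bare number from both GNU and BSD/macOS wc is to let shell arithmetic discard the padding. A sketch using a throwaway file:

```shell
# Sketch: $(( )) turns a padded count such as "       3" into plain 3.
F=$(mktemp)
printf 'one\ntwo\nthree\n' > "$F"
Lines=$(( $(wc -l < "$F") ))   # redirect so wc prints no file name
echo "[$Lines]"                # [3]
rm -f "$F"
```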

Use cmp for byte-by-byte comparisons:

 cmp $FIL.ccx.out{1,2}

Image size

Get the size of an image with php

php -r "print_r(getimagesize('2013-01-08_0600_FR_TV5_Le_Journal-000004.jpg'));"

Get image or video size with ffprobe (part of ffmpeg):

SIZ="$( ffprobe $FFIL 2>&1 | grep Stream | grep Video | sed -r 's/.*\ ([0-9]{3,4}x[0-9]{3,4})\ .*/\1/' )"
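The sed pattern above depends on the layout of ffprobe's human-readable report. For instance, it misses a size that is followed directly by a comma. ffprobe can instead be asked for just the width and height of the first video stream. A sketch; video_size is a hypothetical helper name:

```shell
# Sketch: ask ffprobe for the first video stream's dimensions only.
# video_size is a hypothetical helper; prints e.g. 1920x1080
video_size() {
  ffprobe -v error -select_streams v:0 \
          -show_entries stream=width,height \
          -of csv=s=x:p=0 "$1"
}
```

Usage: SIZ="$( video_size "$FFIL" )". This should also work on JPEG stills, which ffprobe reads as a one-frame video stream.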

Check the frame width of several images:

for i in $( ls -1d *q ) ; do echo -e "\t`identify -format "%w" $i/*-000002.jpg`\t$i"; done

Used in scripts such as imagesd and mpg2h264-daemon.

nano

To replace spaces with a TAB or vice versa, use verbatim input:

Ctrl-W Ctrl-r space space
Esc-V TAB

Esc-V (Meta-V) turns on verbatim input, which inserts the literal value of the next key.

The text editor nano can be used to remove line breaks within paragraphs as follows:

 nano -r99999 filename

Press Ctrl-j to justify the text to the given line width, and save. Then use

 xclip -selection clipboard filename   (only works in X-windows)
 pbcopy < filename                     (OSX)

to copy the unwrapped file to the clipboard for pasting (for instance into a wiki).

See also nano made easy.

To edit a list of files:

for i in `cat LOCKUP` ; do nano $i ; done

If you need to use part of the line only, for instance cutting out a final :4 as generated by a sort count (see /db/find):

for i in `cat LOCKUP` ; do nano ${i%:*} ; done

To keep nano from hard-wrapping long lines, start it with

 nano -w

and use ctrl-j to justify if needed.

python

Call a python script from bash

python smt2csv.py $FIL | cut -d"," -f10-11 > $FIL.csv

Or read into an array

aa=( $( python smt2csv.py $FIL | cut -d"," -f10 ) )
echo ${aa[*]}

Embed python code within a bash script

Assign to a variable available to bash:

#!/bin/bash
 
TEST=$(python << 'END'
from pattern.en import wordnet
from pattern.en import ADJECTIVE
 
print(wordnet.synsets('happy', ADJECTIVE)[0].weight)
print(wordnet.synsets('sad', ADJECTIVE)[0].weight)
END
)
 
echo "$TEST"

or have Python write directly to a file:

#!/bin/bash
 
python << 'END'
 
# Load libraries
from pattern.en import wordnet
from pattern.en import ADJECTIVE
 
# Compute values
Sentiment = str(wordnet.synsets('happy', ADJECTIVE)[0].weight)
 
# Append to file, one value per line; "with" closes the file when done
with open('filename.txt', 'a') as f:
    f.write(Sentiment + "\n")
 
END
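Data can also flow the other way, from bash into the embedded Python. Quoting the here-document delimiter ('END') keeps bash from expanding $ and backquotes inside the Python code. python3 - reads the program from standard input and passes any further words to it as sys.argv. A sketch, assuming python3 is installed; the file name is only an example:

```shell
# Sketch: pass a bash value into embedded Python as a command-line argument.
# Assumes python3; the file name is only an example.
FIL="2016-09-27_0900_US_FOX-News_Fox_and_Friends.txt"

STEM=$(python3 - "$FIL" << 'END'
import sys
name = sys.argv[1]              # the first word after "-"
print(name.rsplit('.', 1)[0])   # strip the extension
END
)

echo "$STEM"
```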

sudo

Used to give user tna (a non-root user) access to privileged commands. Issue as root:

visudo

to edit the sudoers file. If a command is permitted for tna in the sudoers file, it can be used in scripts with sudo, for instance

sudo /sbin/modprobe

Include the whole path if it is not in tna's default path. If you get the error

sudo: no tty present and no askpass program specified

it is likely because the command is not listed in the sudoers file (or not listed with NOPASSWD). sudo then tries to ask for a password and has no terminal to ask on. See also this advice -- relevant as far as I can understand only if you actually do need to enter a password.

ssh -t

can be used to force the creation of a tty, again, I assume, in cases where you want to be prompted for a password in a script.

screen

qdbus

Used to control GUI applications from the command line.

kate through qdbus

Connect with x11:

ssh -X tna@roma
kate $FIL.txt

Get kate's PID:

qdbus | grep kate
org.kde.kate-8257

So in this case PID=8257.
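In a script, the PID can be extracted from the qdbus listing with sed. A sketch; kate_pid is a hypothetical helper that reads the service list on standard input, so it can be tried without a running kate:

```shell
# Sketch: extract kate's PID from the output of qdbus (service names, one per line).
# kate_pid is a hypothetical helper; it prints only the first match.
kate_pid() {
  sed -n 's/^[[:space:]]*org\.kde\.kate-\([0-9][0-9]*\)[[:space:]]*$/\1/p' | head -n 1
}

# With a live session:  PID=$( qdbus | kate_pid )
```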

Get a list of qdbus command options:

qdbus org.kde.kate-$PID

Commands for getting and setting document properties:

qdbus org.kde.kate-$PID /Kate/Document/1

Same information in xml format:

qdbus org.kde.kate-$PID /Kate/Document/1 Introspect 

This includes commands like "clear", "save", "cursorInText", "insertText", "insertTextLines", "insertLine", and "removeLine", but I don't know how they are called or used.

Check whether the document is empty (here it is not):

qdbus org.kde.kate-$PID /Kate/Document/1 Get org.kde.KTextEditor.Document empty
false

Get the number of lines in the text (say 2432):

qdbus org.kde.kate-$PID /Kate/Document/1 Get org.kde.KTextEditor.Document lines
2432

Get character encoding:

qdbus org.kde.kate-$PID /Kate/Document/1 Get org.kde.KTextEditor.Document encoding
ISO-8859-1

Output the full text in the remote shell (could be filtered and put back?):

qdbus org.kde.kate-$PID /Kate/Document/1 Get org.kde.KTextEditor.Document text

Save all the files (works):

qdbus org.kde.kate-$PID /kate/__KateMainWindow_1/actions/file_save_all trigger

Close all documents (prompts if modified):

qdbus org.kde.kate-$PID /DocumentManager org.kde.Kate.DocumentManager.closeAllDocuments

List of possible actions:

qdbus org.kde.kate-$PID /kate/__KateMainWindow_1/actions Introspect

Unsuccessful or less used commands

Open the file open dialogue in the current directory:

qdbus org.kde.kate-$PID /kate/__KateMainWindow_1/actions/file_open trigger

This does something, but it doesn't close the current document:

qdbus org.kde.kate-$PID /kate/__KateMainWindow_1/actions/view_close_current_space trigger

Is the document closed? (The command crashed kate when there was no document.)

qdbus org.kde.kate-$PID /DocumentManager org.kde.Kate.DocumentManager.closeDocument 1
false

Hide statusbar (accepts the command, but I see no effect):

qdbus org.kde.kate-$PID /kate/__KateMainWindow_1/actions/options_show_statusbar 1

Search for a particular method (say, print):

for part in `qdbus org.kde.kate-$PID`
  do qdbus org.kde.kate-$PID $part | grep method
done | grep -i print

wajig

Used for sysadmin work. See overview.

wmctrl

If you ssh -X tna@roma, your local desktop can be controlled from remote roma -- or of course remote applications running from roma:

tna@roma:~/t$ wmctrl -l
0x01c00180 -1 heron plasma-desktop
0x0360001f  0 heron x-nautilus-desktop
0x0340001c  0 heron roma : tna
0x03800024  0 heron 2006 – Dolphin
0x04200003  0 heron How to build a baby IV.Oct16-1.pdf
0x040000af  0 heron Hack and / - Automate your Desktop with wmctrl
0x03e31c59  0 heron Write: Intro: a communication model of perception
0x05a00190  0 heron 2011-09-22_Possibility.odg - LibreOffice Draw
0x05a00238  0 heron 2011-09-22_Timeline.odg - LibreOffice Draw
0x03e01326  0 heron Write: pdf request
0x03eccf00  0 heron Inbox - Main - Mozilla Thunderbird
0x0540005e  0 heron Web Accounts of various kinds – Konqueror
0x06200051  0 roma 2006-10-03_1000_CNN_Newsroom.txt [modified] – Kate

kml

On a Mac, Google Earth's location file is ~/Library/Application Support/Google Earth/myplaces.kml (and myplaces.backup.kml).

In Google Earth, save My Places to a new location; it will create a .kmz file. Move it to the new computer and issue:

cp myplaces.kmz myplaces.zip
unzip myplaces.zip
cp doc.kml myplaces.kml

The new computer will now load the new locations file.

Example: A use case

Suppose an approved Red Hen researcher, working under a Red Hen professor, tracks tweets by a very prominent person, and wonders whether there is an influence each day from the early morning broadcasts of the show "Fox and Friends" on the content of the tweets. For starters, how shall the researcher discover whether the Red Hen datasets hold the broadcasts of "Fox and Friends" for the relevant dates? The researcher can look manually through the Edge Search Engine, but that is very slow. Suppose the researcher does the legwork to find 5 days on which Red Hen does not have the holdings, and 5 days on which Red Hen does have the holdings, and produces a .csv file listing each date and the show.

In a spreadsheet this appears as a two-column table. The underlying ASCII file has this format:

$ cat TweetDates_txt.csv
10/15/2016,US_FOX-News_FOX_and_Friends.txt
8/6/2016,US_FOX-News_FOX_and_Friends.txt
7/16/2016,US_FOX-News_FOX_and_Friends.txt
5/21/2016,US_FOX-News_FOX_and_Friends.txt
5/21/2016,US_FOX-News_FOX_and_Friends.txt
9/27/2016,US_FOX-News_FOX_and_Friends.txt
7/5/2016,US_FOX-News_FOX_and_Friends.txt
6/27/2016,US_FOX-News_FOX_and_Friends.txt
5/11/2016,US_FOX-News_FOX_and_Friends.txt
10/24/2016,US_FOX-News_FOX_and_Friends.txt

This is a well-formed .csv file, but the dates are not in Red Hen format; we need YYYY-MM-DD. All .txt files for broadcasts are in Red Hen's /tv directory. The subdirectories of /tv are by year; sub-sub-directories are by month; and the sub-sub-sub-directories are by day.

The task is to change the .csv file into something that can be used in a Red Hen search.

First, let’s just get the dates:

$ cut -d, -f 1 TweetDates_txt.csv
10/15/2016
8/6/2016
7/16/2016
5/21/2016
5/21/2016
9/27/2016
7/5/2016
6/27/2016
5/11/2016
10/24/2016

But now, we want those dates in the right format for the directories and the filename, so

$ cut -d, -f 1 TweetDates_txt.csv | awk -v FS=/ -v OFS=- '{print $3,$1,$2}'
2016-10-15
2016-8-6
2016-7-16
2016-5-21
2016-5-21
2016-9-27
2016-7-5
2016-6-27
2016-5-11
2016-10-24

But single-digit months and days need a leading zero:

$ cut -d, -f 1 TweetDates_txt.csv | awk -v FS=/ -v OFS=-  '{print $3,$1,$2}' |sed 's/\<[0-9]\>/0&/g'
2016-10-15
2016-08-06
2016-07-16
2016-05-21
2016-05-21
2016-09-27
2016-07-05
2016-06-27
2016-05-11
2016-10-24

Now let’s save that output to a file:

$ cut -d, -f 1 TweetDates_txt.csv | awk -v FS=/ -v OFS=-  '{print $3,$1,$2}' |sed 's/\<[0-9]\>/0&/g' > TweetDates.lst
$ cat TweetDates.lst
2016-10-15
2016-08-06
2016-07-16
2016-05-21
2016-05-21
2016-09-27
2016-07-05
2016-06-27
2016-05-11
2016-10-24
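GNU date reads 10/15/2016 as month/day/year. With it, the cut, awk and sed steps can be folded into one loop that also adds the leading zeros. A sketch, assuming GNU coreutils date and the two-column layout shown above; csv_to_dates is a hypothetical name:

```shell
# Sketch: convert the M/D/YYYY first column to YYYY-MM-DD with GNU date.
# csv_to_dates is a hypothetical helper; assumes GNU date.
csv_to_dates() {
  local d rest
  while IFS=, read -r d rest ; do
    [ -n "$d" ] || continue        # skip blank lines
    date -d "$d" +%F               # %F is YYYY-MM-DD, zero-padded
  done < "$1"
}

# Usage: csv_to_dates TweetDates_txt.csv > TweetDates.lst
```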

Then, for every line in the file TweetDates.lst, we want to move into the appropriate directory and check whether there are .txt holdings for Fox_and_Friends.

We can do that with a “for” loop. The Red Hen bash script “day” moves us into the appropriate directory. The command ls -l lists files matching a specified pattern. The command cd moves us back to the home directory.

So our for loop will look like this:

for line in $(cat TweetDates.lst) ; do echo "$line" ; day "$line" ; ls -l *Fox_and_Friends*txt ; cd ; done

The command

echo "$line"

is included in the loop only so that each report in the output is labeled with the date (and so the /tv directory) it refers to.

Here’s what this for loop does: for each line in the file, it prints the line (a date) and moves to the directory for that date. It then checks for .txt files for Fox_and_Friends there, listing them if it finds any and printing an error if it doesn’t. Finally it moves back to the home directory and goes on to the next line, until all the lines are processed.

Like this:

$ for line in $(cat TweetDates.lst) ; do echo "$line" ; day "$line" ; ls -l *Fox_and_Friends*txt ; cd ; done
2016-10-15
ls: cannot access *Fox_and_Friends*txt: No such file or directory
2016-08-06
ls: cannot access *Fox_and_Friends*txt: No such file or directory
2016-07-16
ls: cannot access *Fox_and_Friends*txt: No such file or directory
2016-05-21
ls: cannot access *Fox_and_Friends*txt: No such file or directory
2016-05-21
ls: cannot access *Fox_and_Friends*txt: No such file or directory
2016-09-27
-rw-r--r-- 1 tna tna 100259 Sep 27 2016 2016-09-27_0900_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 114310 Sep 27 2016 2016-09-27_1000_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 108579 Sep 27 2016 2016-09-27_1100_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 104309 Sep 27 2016 2016-09-27_1200_US_FOX-News_Fox_and_Friends.txt
2016-07-05
-rw-r--r-- 1 tna tna 88412 Jul 5 2016 2016-07-05_0900_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 112623 Jul 5 2016 2016-07-05_1000_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 116701 Jul 5 2016 2016-07-05_1100_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 98611 Jul 5 2016 2016-07-05_1200_US_FOX-News_Fox_and_Friends.txt
2016-06-27
-rw-r--r-- 1 tna tna 86328 Jun 27 2016 2016-06-27_0900_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 96517 Jun 27 2016 2016-06-27_1000_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 121296 Jun 27 2016 2016-06-27_1100_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 112610 Jun 27 2016 2016-06-27_1200_US_FOX-News_Fox_and_Friends.txt
2016-05-11
-rw-r--r-- 1 tna tna 99340 May 29 2016 2016-05-11_0900_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 107321 May 29 2016 2016-05-11_1000_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 124084 May 29 2016 2016-05-11_1100_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 116057 May 29 2016 2016-05-11_1200_US_FOX-News_Fox_and_Friends.txt
2016-10-24
-rw-r--r-- 1 tna tna 91085 Oct 24 2016 2016-10-24_0900_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 112728 Oct 24 2016 2016-10-24_1000_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 105670 Oct 24 2016 2016-10-24_1100_US_FOX-News_Fox_and_Friends.txt
-rw-r--r-- 1 tna tna 101672 Oct 24 2016 2016-10-24_1200_US_FOX-News_Fox_and_Friends.txt

The report matches just what the researcher had already found by doing individual, manual searches in the Edge Search Engine. The difference is that this report took 1 second.

And if you had thousands of dates, you could check all of them in a matter of seconds.

(Note that the times given in the filenames are UTC. E.g., 0900 means 9am Coordinated Universal Time.)

In *nix and bash, there are always many ways to get the same output, and there is surely a more efficient way of doing this, but it’s reasonably efficient, given that it doesn’t check the entire tree, the way a recursive "find" command would. The search occurs in only the day directories signaled by the lines in the file TweetDates.lst.
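One such variant checks each day directory by its path instead of changing into it with day, and prints a count per date. A sketch under stated assumptions: the day directories are taken to be laid out as /tv/YYYY/YYYY-MM/YYYY-MM-DD (check this against the actual tree). The function name count_shows and its ROOT argument are hypothetical, added so the loop can be tried on a test tree:

```shell
# Sketch: report how many matching files each listed date has, without cd.
# ASSUMPTION: day directories are ROOT/YYYY/YYYY-MM/YYYY-MM-DD.
# count_shows is a hypothetical helper; arguments: ROOT PATTERN LISTFILE
count_shows() {
  local root=$1 pat=$2 list=$3 day dir files
  while read -r day ; do
    [ -n "$day" ] || continue
    dir="$root/${day:0:4}/${day:0:7}/$day"
    files=( "$dir"/$pat )                 # $pat unquoted so the glob expands
    [ -e "${files[0]}" ] || files=()      # no match: the glob stayed literal
    echo -e "$day\t${#files[@]}"
  done < "$list"
}

# Usage: count_shows /tv '*Fox_and_Friends*.txt' TweetDates.lst
```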

More precise searches would be refinements on this general strategy.

For example, the researcher could search only for F&F shows broadcast before a certain hour of the day, and that hour could vary with the date, provided the cut-off time was listed (in the right format) in the original .csv file.

In further steps, the researcher could use command-line tools to search for overlap in keywords or even whole phrases between the tweet and the transcript (or closed-captions) of the Fox & Friends broadcasts. And so on.

It is important to recognize that Red Hen uses UTC, Coordinated Universal Time. The researcher might accordingly find it useful to convert the time of the tweets to UTC before doing searches. Here's a way to do that in a unix shell.

Convert a remote local time to UTC, independent of the time zone setting of the host:

date -ud 'TZ="America/New_York" 2012-01-03 09:00' '+%Y-%m-%d %H:%M %Z %z'

The TZ="..." part has to reach date with its inner double quotes intact, so you cannot simply substitute variables into the single-quoted string above. eval is one way around this:

T="2004-10-31 06:30"
eval "date -ud 'TZ=\"Europe/Paris\" $T'"

So directly to a variable:

NEW=$( eval "$DAT -ud 'TZ=\"America/New_York\" $DAY $TIM' +%Y-%m-%d_%H%M" )

This is the most reliable way to convert a remote local broadcast time to the Red Hen's UTC format.

If you need to add a number of seconds, place the timezone first and the offset before the date and time:

eval "$DAT -ud 'TZ=\"$TMZ\" +$LEN seconds $DAY $TIM' +%Y%m%d%H%M%S"
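The eval can also be avoided by escaping the inner double quotes inside a double-quoted string. The sketch below wraps the conversion, with an optional offset in seconds, into a function. to_utc is a hypothetical name, and GNU date is assumed:

```shell
# Sketch: convert a local broadcast time to Red Hen's UTC file-name format.
# to_utc is a hypothetical helper; assumes GNU date.
# Usage: to_utc ZONE YYYY-MM-DD HH:MM [SECONDS_TO_ADD]
to_utc() {
  local tz=$1 day=$2 tim=$3 secs=${4:-0}
  # TZ="..." must come first; the offset goes before the date so that a
  # "+N" after the time is not read as a UTC offset
  date -ud "TZ=\"$tz\" +$secs seconds $day $tim" +%Y-%m-%d_%H%M
}

to_utc America/New_York 2012-01-03 09:00   # 2012-01-03_1400 (EST is UTC-5)
```

Passing a POSIX zone string such as EST5 instead of a zone name also works, which is handy on hosts without the tz database.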