Removing duplicate $PATH entries



A naive solution would be to use sort + uniq
echo $PATH | sort -u
This sorts the input, and makes the dupes consecutive, thereby reducing the problemĀ  to O(n * log(n)) complexity.
The solution is wrong because the order of $PATH entries is significant and should not change.




The shortest, and arguably best solution is to use awk(1)
set "/Fred:/Wilma:/Pebbles:/Barny:/Betty:/Bamm bamm:/Dino:/Baby Puss:/Wooly"

echo "/Dino:${1}:/Pebbles:${1}" | awk -F: '{for (i=1;i<=NF;i++) { if ( !x[$i]++ ) printf("%s:",$i); }}'
The awk code separates the input by colons, stores path entries in an associative array for fast look-up.
The awk solution is short and fast, and best of all (g)awk has chaining hash-table: amortized O(1).
Awk one-liner courtesy of Jarett Stevens (firstname _DOT_ lastname _AT_ gmail _DOT_ com)

The result:
/Dino:/Fred:/Wilma:/Pebbles:/Barny:/Betty:/Bamm bamm:/Baby Puss:/Wooly




In pure shell code it's a bit more complex.
OPATH=$PATH
PATH="/George:/Jane:/Elroy:/Judy:/Rosie:/Astro:/Orbity:/Spacely:/Cogswell"
PATH="/Astro:${PATH}:/Elroy:${PATH}"

  OIFS=$IFS
  IFS=':'
  for A in $PATH
  do
    OOIFS=$IFS
    IFS=$OIFS
    for B in ${*}
    do
      [ -z "$B" ] && continue
      [ "$A" = "$B" ] && continue 2
    done
    IFS=$OOIFS

    # By this point no dupe was found
    set $* $A
    
# Reconstruct the $PATH
if [ -z "$RET_VAL" ]
then RET_VAL="$A"
else RET_VAL="${RET_VAL}:${A}"
fi

done IFS=$OIFS shift $# echo $RET_VAL
Two loops are used; the outer-loop iterates the $PATH entries(A) , the inter loop searches positional parameters (B) of previously seen words (A), and stores unseen words. The positional parameters are used in the same way as an indexed array. For every PATH word(A) all previously seen stored entries(B) are searched, which means the code is O(n^2) complex.

The result:
/Astro:/George:/Jane:/Elroy:/Judy:/Rosie:/Orbity:/Spacely:/Cogswell



Modern shells (Bash and ksh93 ) have built-in associative arrays, which significantly reduce the complexity.
OPATH=$PATH
PATH="/George:/Jane:/Elroy:/Judy:/Rosie:/Astro:/Orbity:/Spacely:/Cogswell"
PATH="/Astro:${PATH}:/Elroy:${PATH}"


  if [ -z "$BASH" ]
    then typeset -A FOO # ksh93
    else declare -A FOO # bash
  fi
  OIFS=$IFS
  IFS=':'
  for A in ${PATH}
  do
    [ -z "${FOO[${A}]}" ] || continue

    # By this point no dupe was found
    FOO[${A}]=${#FOO[*]}

    # Reconstruct the $PATH
if [ -z "$RET_VAL" ] then RET_VAL="$A" else RET_VAL="${RET_VAL}:${A}" fi done IFS=$OIFS echo $RET_VAL
This example has only one loop, and one associative array for storage.
The result is a modest O(n * log(n)), but sadly not compatible with Posix shells.

The result:
/Astro:/George:/Jane:/Elroy:/Judy:/Rosie:/Orbity:/Spacely:/Cogswell





Note that in the two previous shell-code implementations the actual $PATH is modified, but doesn't impact the code execution.
All the code uses pure shell code, no external non-builtin commands. While the awk solution is superior, it would fail run in the same scenario unless executed with the absolute path to awk.
Because awk is included as part of the Xopen Single unix spec, you can rely that it's there.




Comments