
b64 and b85: base64 and ASCII85 encoding awk scripts

In another project I needed a self-contained shell script that extracted a bunch of images from a tar archive.  I didn't want to ship the archive as an extra file, so I included a hex dump of the tar file in the script and decoded it at run time.  Hex, however, is wasteful: the encoded data is twice the size of the original.  I looked at base64 MIME encoding, but that required utilities that may or may not exist on all target systems.  I didn't want my shell script to compile its own decoder either, because there was no guarantee that every target had a C compiler.  Doing it in pure Bourne shell was possible but very, very slow, so I looked at awk.

The problem with awk is that it cannot read raw binary data.  Solution: have awk run od (octal dump) to turn the binary input into numbers it can read as text.  The b64 script asks od for hexadecimal byte values, and b85 asks for decimal ones.
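Here is a minimal sketch of that trick on its own, separate from the scripts further down.  The function name byte_values is made up for this illustration.  It shows that once od has done the conversion, awk sees every byte (including NUL) as an ordinary field.

```shell
#!/bin/sh
# byte_values: hypothetical helper, not part of b64 or b85.
# od turns each input byte into a decimal number, which awk can then
# read as ordinary whitespace-separated fields.
byte_values() {
  od -v -t u1 | awk '{
    # Field 1 is the offset column that od prints; the bytes start at field 2.
    for (f = 2; f <= NF; f++) printf "%d\n", $f
  }'
}

# Four bytes: "H", "i", a NUL and a newline.
printf 'Hi\000\n' | byte_values    # prints 72, 105, 0 and 10, one per line
```

The scripts call od from inside awk with getline instead of putting it in front of awk in a pipeline, but the data awk receives is the same.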

This worked great.  I could then include a shell function that calls awk, pipe "here-document" data through the function, and pipe the output into tar.
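A rough sketch of what such a script can look like.  This is not the original project's script: the function name unpack64 is invented, the decoder is a loosely formatted copy of the decode64 loop, and the payload is a line of plain text standing in for the encoded tar file.

```shell
#!/bin/sh
# unpack64: hypothetical name for the embedded decoder function.
# It holds the same bit-shifting loop as decode64, but reads its input
# through awk's normal record loop instead of getline.
unpack64() {
  awk 'BEGIN { b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" }
  {
    for (i = 1; i <= length($0); i++) {
      c = index(b64, substr($0, i, 1))
      if (c--) {                       # characters outside the alphabet are skipped
        for (b = 0; b < 6; b++) {
          o = o * 2 + int(c / 32); c = (c * 2) % 64
          if (++obc == 8) { printf "%c", o; obc = 0; o = 0 }
        }
      }
    }
  }'
}

# Stand-in payload: the base64 of "Hello, world" plus a newline.
# In a real script this would be the encoded tar file, and the
# pipeline would end in "| tar xf -" instead of printing.
unpack64 <<'EOF'
SGVsbG8sIHdvcmxkCg==
EOF
```

Quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the payload.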

I then shortened the script as much as possible to reduce its size, so it's almost unreadable, I'm afraid.

After a while, I discovered an article about ASCII85 on Wikipedia and wrote an awk implementation of it using the same tricks I had used for base64.
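The arithmetic behind ASCII85 is easy to try by hand: four bytes are packed into one 32-bit number, which is then written as five base-85 digits, each digit being the character with code 33 plus the digit value.  A small sketch for a single group follows.  The helper name a85_group is made up, and it leaves out the "z" shortcut and the handling of a short final group.

```shell
#!/bin/sh
# a85_group: hypothetical helper that encodes exactly one 4-byte group.
# Arguments are the four byte values in decimal.
a85_group() {
  awk -v b1="$1" -v b2="$2" -v b3="$3" -v b4="$4" 'BEGIN {
    # Pack the four bytes into one 32-bit number, first byte most significant.
    v = ((b1 * 256 + b2) * 256 + b3) * 256 + b4
    # Peel off five base-85 digits, least significant first, and prepend
    # each one so the result reads most significant first.
    for (n = 0; n < 5; n++) {
      out = sprintf("%c", 33 + v % 85) out
      v = int(v / 85)
    }
    print out
  }'
}

a85_group 84 104 105 115    # the bytes of "This"; prints <+oue
```

The result matches the first five characters after "<~" in the b85 example output further down, which encodes a text starting with "This".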

You can find the scripts in the attachments, but they're pasted here for your perusing pleasure, or masochism.

As attached, both scripts are under one kilobyte each.

Note that if you plan on running these on Solaris, use the /usr/xpg4/bin versions of awk and od, as the default ones are quite brain-dead.
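One way to arrange that from a wrapper script is sketched below.  The helper name pick_tool and the file names in the last comment are made up.  I am assuming that putting /usr/xpg4/bin first in PATH is enough for the od call, since awk runs od through the shell.

```shell
#!/bin/sh
# pick_tool: hypothetical helper, not part of b64 or b85.
# Prints the XPG4 version of a tool when one is installed (as on Solaris)
# and the plain tool name otherwise.
pick_tool() {
  if [ -x "/usr/xpg4/bin/$1" ]; then
    echo "/usr/xpg4/bin/$1"
  else
    echo "$1"
  fi
}

AWK=$(pick_tool awk)

# The "#!/usr/bin/awk -f" line always runs /usr/bin/awk, so the wrapper
# names the interpreter explicitly.  For od, which awk starts through the
# shell, putting /usr/xpg4/bin first in PATH is assumed to be enough.
if [ -d /usr/xpg4/bin ]; then
  PATH=/usr/xpg4/bin:$PATH
  export PATH
fi

# Example call (file names are placeholders):
#   "$AWK" -f ./b64 < pictures.tar > pictures.b64
```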

b64:

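#!/usr/bin/awk -f
# Usage: b64 < input      encode
#        b64 d < input    decode

# Encode stdin.  od prints each byte as two hex digits; every hex digit
# contributes 4 bits, and every 6 collected bits become one output character.
function encode64() {
  while( ("od -v -t x1" | getline) > 0 ) {
    # Skip od's offset column: the hex digits start at column 9.
    for(c=9; c<=length($0); c++) {
      d=index("0123456789abcdef",substr($0,c,1));
      if(d--) {                                   # blanks give index 0 and are skipped
        for(b=1; b<=4; b++ ) {
          o=o*2+int(d/8); d=(d*2)%16;             # shift the top bit of d into o
          if(++obc==6) {
            printf "%s",substr(b64,o+1,1);
            if(++rc>75) { printf("\n"); rc=0; }   # wrap after 76 characters
            obc=0; o=0;
          }
        }
      }
    }
  }
  if(obc) {                                       # flush leftover bits, zero-filled
    while(obc++<6) { o=o*2; }
    printf "%s",substr(b64,o+1,1);
  }
  # "==" is written after every input, whatever its length.  decode64
  # ignores it, but a strict decoder may object when the input length
  # calls for "=" or for no padding at all.
  print "==";
}

# Decode stdin.  Each base64 character contributes 6 bits, and every
# 8 collected bits become one output byte.  Characters outside the
# alphabet (newlines, "=") are skipped.
# Some awks (gawk in a UTF-8 locale, for one) treat %c values above 127
# as characters rather than bytes; running with LC_ALL=C avoids that.
function decode64() {
  while( (getline < "/dev/stdin") > 0 ) {         # > 0 also stops on a read error
    for(i=1;i<=length($0);i++) {
      c=index(b64,substr($0,i,1));
      if(c--) {
        for(b=0;b<6;b++) {
          o=o*2+int(c/32); c=(c*2)%64;
          if(++obc==8) { printf "%c",o; obc=0; o=0; }
        }
      }
    }
  }
}

BEGIN {
  b64="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  if(ARGV[1]=="d") { decode64(); exit; }
  encode64();
}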
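
The b64 encoder always ends its output with "==".  For output that stricter decoders accept, the padding can follow the number of leftover bits instead.  What follows is a sketch of that idea, not the attached script: the function name b64_padded is made up, and od sits in front of awk in a pipeline so that the flush can live in an END rule.

```shell
#!/bin/sh
# b64_padded: hypothetical variant of encode64 with standard padding.
b64_padded() {
  od -v -t x1 | awk 'BEGIN {
    b64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  }
  {
    # Field 1 is the od offset; each remaining field is one byte in hex.
    for (f = 2; f <= NF; f++) {
      for (h = 1; h <= 2; h++) {
        d = index("0123456789abcdef", substr($f, h, 1)) - 1
        for (b = 0; b < 4; b++) {
          o = o * 2 + int(d / 8); d = (d * 2) % 16
          if (++obc == 6) {
            printf "%s", substr(b64, o + 1, 1)
            if (++rc > 75) { printf "\n"; rc = 0 }
            obc = 0; o = 0
          }
        }
      }
    }
  }
  END {
    if (obc) {
      # 2 leftover bits: the input length was 1 mod 3, so pad with "==".
      # 4 leftover bits: the input length was 2 mod 3, so pad with "=".
      pad = (obc == 2) ? "==" : "="
      while (obc++ < 6) o = o * 2
      printf "%s%s", substr(b64, o + 1, 1), pad
      rc++
    }
    if (rc) print ""      # end the last line unless it was just wrapped
  }'
}

printf 'ab' | b64_padded    # prints YWI=
```

An input whose length is a multiple of 3 leaves no bits over and gets no padding at all.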



b85:

#!/usr/bin/awk -f

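# l(): count one output character and break the line after 73 of them.
# t is never assigned, so "print t" writes just a newline.
function l(){if(++p>72){p=0;print t}}

# s(v,b): write the b bytes packed in v as b+1 base-85 digits.
# An all-zero group of 4 bytes is abbreviated to "z"; a short final
# group never is, since the decoder expands "z" to 4 bytes.
function s(v,b){
 if(v==0&&b>3) {
   printf(z);l()
 } else {
  # int() keeps v whole, so the digit lookup does not depend on how
  # substr() treats a fractional position.
  o=t;for(n=0;n<5;n++){o=o substr(h,(v%g)+1,1);v=int(v/g)}
  while(n>4-b){printf(c,substr(o,n--,1));l()}
 }
}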

function e(){
 printf("<~");
 while("od -vtu1"|getline){
  for(y=1;y<NF;){i+=($(++y)*m);m/=a;if(++k>3){s(i,k);k=i=0;m=j}}
 }
 if(k) s(i,k);
 p++;l();print "~>"
}

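# d(): decode.  Skip everything up to the line that starts with "<~",
# then turn each group of 5 digits back into 4 bytes until "~>".
# The getline results are compared with 0 so a read error ends the loops.
function d(){
 while(!f&&(getline<"/dev/stdin")>0){if(substr($0,1,2)=="<~") f=1}
 while(f){
  while(++p<=length($0)&&f){
   n=index(h,substr($0,p,1));if(substr($0,p,2)=="~>") f=0;
   if(n>g) {
    printf(c c c c,0,0,0,0)
   } else {
    if(n--){
     q=g*q+n;
     if(++i>4){
      while(--i){printf(c,(q/j)%a);q*=a}
      q=0
     }
    }
   }
  }
  p=0;if((getline<"/dev/stdin")<=0) f=0
 }
 # Short final group: pad it, then emit one byte fewer than it has digits.
 if(i) { q=++q*g^(5-i--); while(i--){printf(c,(q/j)%a);q=a*(q-(j*int(q/j)))} }
}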

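# z  "z", the shortcut for an all-zero group      c  "%c"
# a  256                                          g  85
# h  the 85 digits "!" to "u", with "z" appended as character 86
# m, j  256^3, the weight of the first byte of a group
# p  output column, starting at 2 to account for "<~"
# i  counts from 85 down to 0 while h is built, then is reused by e() and d()
BEGIN{
 z=h="z";c="%c";a=256;i=g=85;m=j=a^3;p=2;while(i){h=sprintf(c,32+i--) h}
 if(ARGV[1]=="d") d();
 else e()
}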


Example use:

B$ t="This is the input text used in the encoding comparisons that are to follow.  I'm making this long enough so that overhead will not be too significant in   the results.  ASCII85 encoding adds an extra two bytes at the beginning ( <~ ) and 2 at the end ( ~> ) so it has a little disadvantage over base64 mime encoding, but that is quickly nullified with any but the smallest input texts."

B$ echo "$t" | b64

VGhpcyBpcyB0aGUgaW5wdXQgdGV4dCB1c2VkIGluIHRoZSBlbmNvZGluZyBjb21wYXJpc29ucyB0
aGF0IGFyZSB0byBmb2xsb3cuICBJJ20gbWFraW5nIHRoaXMgbG9uZyBlbm91Z2ggc28gdGhhdCBv
dmVyaGVhZCB3aWxsIG5vdCBiZSB0b28gc2lnbmlmaWNhbnQgaW4gICB0aGUgcmVzdWx0cy4gIEFT
Q0lJODUgZW5jb2RpbmcgYWRkcyBhbiBleHRyYSB0d28gYnl0ZXMgYXQgdGhlIGJlZ2lubmluZyAo
IDx+ICkgYW5kIDIgYXQgdGhlIGVuZCAoIH4+ICkgc28gaXQgaGFzIGEgbGl0dGxlIGRpc2FkdmFu
dGFnZSBvdmVyIGJhc2U2NCBtaW1lIGVuY29kaW5nLCBidXQgdGhhdCBpcyBxdWlja2x5IG51bGxp
ZmllZCB3aXRoIGFueSBidXQgdGhlIHNtYWxsZXN0IGlucHV0IHRleHRzLgo==

B$ echo "$t" | b85

<~<+oue+DGm>FD,5.Bl7m4F<G[:G]Y'NF(Jl)Bl5&8BOr;tDI[TqBl7Q+@rH4'@<-('Df0V=F
D,*)+CT;%+EVNEAoDL%Dg*fV+A!qt+DkP&Bl7Q+FD,B0+Dbt6B-:c'Dfo]++EMHDFD,*)+E)F
7EbK#mA0?)1Cht53Dfd+2AKZ)5D]j+8B5VEqBk(RhF<G:8+<VeKBOr<,ATN!1FE9&W+@/pn8P
(m!+D#G#De*R"B-:VnA9/l%DBNM8FE1e4FE_XG@X3',F!+n5+EV:.+C\npBl7g&DJ((?+?Y)q
.3N&:A0<WM@<<W6BOr;tDIak<+FZKs.3N\M+DGp?BOPs)@3BB#FED>1+Co2-@:XOiDKK<"AKY
o7ATAo&@<6!<1a$XLD.Oi$DI[TqBl7Q7+C]J8+EV:*F<G:=+E;O<@r#n++Du=<Ch[KqARlp-B
ln#2@;^?5@Wcc8FD,5.F)>?%Ch7[0+DG_4F`\aJAU&<</d_~>

B$ echo "$t" | b85 | b85 d

This is the input text used in the encoding comparisons that are to follow.  I'm making this long enough so that overhead will not be too significant in   the results.  ASCII85 encoding adds an extra two bytes at the beginning ( <~ ) and 2 at the end ( ~> ) so it has a little disadvantage over base64 mime encoding, but that is quickly nullified with any but the smallest input texts.

Attachments:

b64 (1k), Danny Chouinard, Sep 11, 2010, 2:19 AM
b85 (1k), Danny Chouinard, Sep 11, 2010, 2:19 AM