
PETSCII to ASCII

This page describes a useful program that I wrote.  PET2ASC converts "vintage" PETSCII / ASCII-X text into one of several "respected" (well-known) character sets.
 
 
Synopsis
 
PET2ASC accepts a single file (or a directory of files) as input and generates an output file (or a directory of output files).  Each input file is read as either a lower/upper PETSCII file (see PETSCII) or an upper/graphic PETSCII file (see ASCII-X).  In either case, the output is written in the output format you select.
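
To give a feel for what reading a lower/upper PETSCII file involves, here is a minimal Python sketch. It is not the PET2ASC source (which is C++), and it only handles letters, basic punctuation and the line ending. It assumes the usual lower/upper layout in which 0x41-0x5A hold lowercase letters and 0xC1-0xDA (duplicated at 0x61-0x7A) hold uppercase letters. The function name petscii_lower_upper_to_ascii is hypothetical, and every other code is simply replaced with "?".

```python
def petscii_lower_upper_to_ascii(data: bytes) -> str:
    """Map lower/upper PETSCII bytes to ASCII text (illustrative only)."""
    out = []
    for b in data:
        if 0x41 <= b <= 0x5A:        # PETSCII lowercase a-z
            out.append(chr(b + 0x20))
        elif 0x61 <= b <= 0x7A:      # duplicate range for uppercase A-Z
            out.append(chr(b - 0x20))
        elif 0xC1 <= b <= 0xDA:      # PETSCII uppercase A-Z
            out.append(chr(b - 0x80))
        elif b == 0x0D:              # PETSCII line ending (CR)
            out.append("\n")
        elif 0x20 <= b <= 0x40:      # space, digits, punctuation, @ match ASCII
            out.append(chr(b))
        else:                        # assumption: anything else becomes "?"
            out.append("?")
    return "".join(out)

print(petscii_lower_upper_to_ascii(b"\xc8ELLO\x0d"))  # Hello, then a newline
```

Note how the bytes that look like "ELLO" in an ASCII editor come out as lowercase: that case swap is the most visible difference between the two character sets.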

Each generated output file has 3 characteristics:
  1. Translated character set (one of ASCII, ISO-8859-1, Unicode(16), or Windows-1252).
  2. Translated new-line (one of CR [CP/M, Mac9-, etc.], LF [Mac10+, Linux, UNIX, etc.], LF+CR [rare], or CR+LF [DOS/Win]).
  3. Encoding method ("raw byte", UTF-16LE, UTF-16BE, or UTF-8).
The program is designed to run from a "command prompt" (CMD.EXE or COMMAND.COM in Windows); it is not a graphical-user-interface (GUI) program, because it was designed for programmers/hackers.
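
The new-line style and the encoding method are independent choices, which the following Python sketch tries to show. It is only an illustration built on Python's standard codecs, not PET2ASC's own code; the helper name render and its parameters are hypothetical, and the input lines are assumed to be already translated text.

```python
# New-line styles from the list above, keyed by their byte sequence names.
NEWLINES = {"CR": "\r", "LF": "\n", "LF+CR": "\n\r", "CR+LF": "\r\n"}

def render(lines, newline="CR+LF", encoding="ascii"):
    """Join already-translated lines with a new-line style, then encode."""
    text = "".join(line + NEWLINES[newline] for line in lines)
    return text.encode(encoding)

print(render(["HI"]))                          # b'HI\r\n'
print(render(["HI"], "LF", "utf-16-le"))       # b'H\x00I\x00\n\x00'
print(render(["HI"], "CR", "utf-16-be"))       # b'\x00H\x00I\x00\r'
print(render(["5\u00a3"], "LF", "latin-1"))    # b'5\xa3\n'
print(render(["5\u00a3"], "LF", "utf-8"))      # b'5\xc2\xa3\n'
```

The last two lines show the same pound sign stored as one "raw byte" in ISO-8859-1 and as two bytes in UTF-8.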
 
 
Using PET2ASC
 
In its simplest form, PET2ASC only requires two parameters: an input file (or directory) and an output file (or directory).  NOTE: if the input is a directory, the output argument must also be a directory!

A third parameter "options" is allowed (but not required).  If present, it tells how to translate characters from the "source" to the "target".  Here are the available options:
  • v -- Verbose (list files processed within a directory).
  • l -- lower/upper PETSCII input [default]
  • g -- graphic/upper PETSCII input
  • B -- add BOM to start of 16-bit encoded output
  • a -- ASCII set output [default]
  • i -- ISO-8859 set output
  • I -- Unicode set output
  • w -- Win-1252 set output
  • b -- Byte encoded output [default]. Unicode set not allowed!
  • u -- (little endian) 16-bit encoded output
  • U -- (big endian) 16-bit encoded output
  • 8 -- UTF-8 encoded output.
  • c -- CP/M [Mac 9-] new line output (CR)
  • d -- DOS/Win new line output (CR+LF) [default]
  • m -- Acorn / RISC OS (LF+CR).
  • p -- POSIX/UNIX [Mac 10+] new line output (LF).
  • 0 -- Don't translate control codes [default]
  • 1 -- Also don't translate British pound (0x5c)
  • 2 -- Also don't translate left-arrow (0x5e)
  • 3 -- Don't translate any of the above (control, 0x5c, 0x5e)
  • 4 -- Also don't translate pi (0xff)
  • 5 -- Don't translate pi or pound (control, 0x5c, 0xff)
  • 6 -- Don't translate pi or left-arrow (control, 0x5e, 0xff)
  • 7 -- Don't translate any of the above (control, 0x5c, 0x5e, 0xff)
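
Options 0 through 7 read naturally as a three-bit mask: 1 covers the pound sign, 2 the left-arrow and 4 pi, with control codes always left untranslated. The Python sketch below decodes a digit on that assumption. It is inferred from the list above rather than taken from the PET2ASC source, and the names in it are hypothetical.

```python
# Assumed meaning of each bit in the 0-7 option digit.
UNTRANSLATED_BITS = {1: "pound (0x5c)", 2: "left-arrow (0x5e)", 4: "pi (0xff)"}

def untranslated(digit: int) -> list:
    """List what a 0-7 option digit leaves untranslated."""
    if not 0 <= digit <= 7:
        raise ValueError("option digit must be 0 through 7")
    items = ["control codes"]  # never translated, whatever the digit
    items += [name for bit, name in UNTRANSLATED_BITS.items() if digit & bit]
    return items

for d in range(8):
    print(d, ", ".join(untranslated(d)))
```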
You may view a list of OPTIONS with the command "pet2asc options".  You may view examples with "pet2asc examples".  You can view some trivial details about the trans-coding with "pet2asc details"  (see the links above for details about PETSCII and ASCII-X codes).
NOTE: multiple options are combined without any extra characters... for example, if you want an output of UTF-8 encoded Unicode (with default DOS/Win end-of-line), then try this:
pet2asc input.txt output.txt -8I
Also, Windows users should know that some programs will fail to correctly read text without a "BOM" (byte-order mark)... so another option (if the above fails) is:
pet2asc input.txt output.txt -B8I
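
For reference, a BOM is just a few fixed bytes at the very start of the file. The Python sketch below uses the constants from the standard codecs module to show them. The helper name with_bom is hypothetical, and this is not a description of how PET2ASC itself writes its output.

```python
import codecs

# Standard byte-order marks for the three Unicode encodings listed earlier.
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}

def with_bom(text: str, encoding: str) -> bytes:
    """Encode text and put the matching BOM in front of it."""
    return BOMS[encoding] + text.encode(encoding)

print(with_bom("A", "utf-8").hex(" "))      # ef bb bf 41
print(with_bom("A", "utf-16-le").hex(" "))  # ff fe 41 00
print(with_bom("A", "utf-16-be").hex(" "))  # fe ff 00 41
```

If a program shows stray characters at the top of a converted file, checking the first bytes against this table is a quick way to see whether a BOM is present.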
 
 
Source
 
What?  You don't like my PET2ASC program?  Well, fix it!  The source code is available here.  It was compiled with Visual C++, and because it uses a few MFC classes, a little work will be needed to port it to another platform (Linux, for example).


© H2Obsession, 2016