A variety of character sets have historically been used to represent
INTERCAL programs. Atari syntax was designed
specifically for use with ASCII-7, and all Atari-syntax-based
INTERCAL compilers accept that character set as
possible input. (C-INTERCAL also accepts Latin-1 and
UTF-8.) However, the story is more complicated with Princeton syntax;
the original Princeton compiler was designed to work with EBCDIC, but
because modern computers are often not designed to work with this
character set, other character sets are often used to represent it,
particularly Latin-1. The CLC-INTERCAL compiler accepts
Latin-1, a custom dialect of EBCDIC, Baudot, and a punched-card format
as input; C-INTERCAL can cope with Latin-1 Princeton
syntax, but for the other character sets, for other compilers, or just
for getting something human-readable, it’s useful to have a
conversion program. convickt is an INTERCAL character set conversion
program designed with these needs in mind.
The syntax for using convickt is
convickt inputset outputset [padding]
(that is, the input and output character sets are compulsory, but the parameter specifying what sort of padding to use is optional).
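As a rough illustration of the argument rules just described, the following Python sketch assembles a convickt command line. The helper name build_convickt_command is hypothetical, the character set names are passed through unchecked, and the three padding names are taken from the padding table in this section; nothing here is part of convickt itself.

```python
# Hypothetical helper: builds the argument list for
#   convickt inputset outputset [padding]
# The padding style names come from the table of padding options.
PADDING_STYLES = ("printable", "zero", "random")


def build_convickt_command(inputset, outputset, padding=None):
    """Return an argv list; inputset and outputset are compulsory."""
    if not inputset or not outputset:
        raise ValueError("inputset and outputset are compulsory")
    argv = ["convickt", inputset, outputset]
    if padding is not None:
        if padding not in PADDING_STYLES:
            raise ValueError(f"unknown padding style: {padding!r}")
        argv.append(padding)
    return argv
```

The resulting list is in the form that Python's subprocess.run accepts as its first argument.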
The following values for inputset and outputset are permissible:
Latin-1, or to give it its official name ISO-8859-1, is the character set most commonly used for transmitting CLC-INTERCAL programs, and is therefore nowadays the most popular character set for Princeton syntax programs. Because it is identical to ASCII-7 in all codepoints that don’t have the high bit set, most of the characters in it can be read by most modern editors and terminals. It is also far more likely to be supported by modern editors than EBCDIC, Baudot, or punched cards, all of which have fallen into relative disuse since 1972, and it is the only input character set that C-INTERCAL supports for Princeton syntax programs. It uses 8-bit characters.
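The overlap between Latin-1 and ASCII-7 can be checked with a short Python sketch that uses only the standard codecs. The helper is_plain_ascii is a hypothetical name introduced for illustration, and the sample statement is just a convenient ASCII-only string.

```python
# Every byte with the high bit clear (0x00-0x7F) decodes to the same
# character under ASCII and under Latin-1 (ISO-8859-1).
low_bytes = bytes(range(0x80))
same_in_both = low_bytes.decode("ascii") == low_bytes.decode("latin-1")


def is_plain_ascii(data: bytes) -> bool:
    """Return True if no byte in data has the high bit set."""
    return all(byte < 0x80 for byte in data)


# An ASCII-only line is byte-for-byte identical in both encodings.
sample = "DO .1 <- #1".encode("latin-1")

# The yen sign lies outside ASCII-7; Latin-1 encodes it as byte 0xA5.
yen = "\u00a5".encode("latin-1")
```

A file for which is_plain_ascii returns True reads the same whether it is treated as ASCII-7 or as Latin-1.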
EBCDIC is an 8-bit character set that was an alternative to ASCII
in 1972, and is the character set used by the original Princeton
compiler. Unfortunately, there is no single standard version; the
version of EBCDIC used by convickt is the one that
CLC-INTERCAL uses. It is the default input character
set that CLC-INTERCAL uses (although more recent
versions of CLC-INTERCAL instead try to guess the
input character set based on the input program).
Baudot is a 5-bit character set with shift codes; therefore when
storing it in a file on an 8-bit computer, padding is needed to
fill in the remaining three bits. The standard Baudot character set
does not contain all the characters needed by
INTERCAL; therefore, CLC-INTERCAL
uses repeated shift codes to add two more sets of characters.
convickt uses the CLC-INTERCAL version of
Baudot, so as to be able to translate programs designed for that
compiler; however, standard Baudot is also accepted as input if it
contains no redundant shift codes, and if the input contains no
characters not in standard Baudot, the output will be written so
that it is both correct standard Baudot and correct
CLC-INTERCAL Baudot for those characters.
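The general idea of a 5-bit code with shift states can be sketched in Python as follows. This is only an illustration under stated assumptions: the two shift code values follow the common ITA2 convention, the tables are a tiny subset chosen for the example, the decoder is assumed to start in the letters set, and the extra character sets that CLC-INTERCAL reaches through repeated shift codes are not modelled. None of this is taken from convickt's own tables.

```python
# Assumed shift codes (common ITA2 convention), not convickt's tables.
LTRS, FIGS = 0x1F, 0x1B

# Deliberately tiny, illustrative character tables.
LETTERS = {0x01: "E", 0x03: "A", 0x04: " "}
FIGURES = {0x01: "3", 0x03: "-", 0x04: " "}


def decode_shifted(codes):
    """Decode 5-bit codes, switching tables whenever a shift code appears."""
    table = LETTERS  # assumption: decoding starts in the letters set
    out = []
    for code in codes:
        code &= 0x1F  # ignore the three padding bits of each byte
        if code == LTRS:
            table = LETTERS
        elif code == FIGS:
            table = FIGURES
        else:
            out.append(table[code])
    return "".join(out)
```

Because the top three bits are masked off, the same decoder works whichever padding style produced the bytes.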
The Atari option causes convickt to attempt a limited
conversion to or from Atari syntax; this uses ASCII-7 as the
character set, but also tries to translate between Atari and
Princeton syntax at the character level, which is sometimes but not
always effective. For instance, ? is translated from
Atari to Princeton as a yen sign, and from Princeton to Atari as a
whirlpool (@); this sort of behaviour is often capable
of translating expressions automatically, but will fail when
characters outside ASCII-7 (Atari) or Latin-1 (Princeton) are used,
and will not, for instance, translate a Princeton V, backspace, -
into Atari ?, but instead leave it untouched. ASCII-7 is a 7-bit
character set, so on an 8-bit computer, there is one bit of padding
that needs to be generated; note, however, that it is usual nowadays
to clear the top bit when transmitting ASCII-7, which the
‘printable’ and ‘zero’ padding styles will
do, but the ‘random’ style may not do.
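The character-level translation described above can be pictured with a minimal Python sketch. It includes only the two mappings named in the text; the table and function names are hypothetical, and convickt's real tables are not reproduced here.

```python
# Only the two mappings mentioned in the text; illustrative, not complete.
ATARI_TO_PRINCETON = {"?": "\u00a5"}  # ? becomes a yen sign
PRINCETON_TO_ATARI = {"?": "@"}       # ? becomes a whirlpool


def translate_chars(text, table):
    """Translate one character at a time; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


# A multi-character Princeton overstrike (V, backspace, -) has no
# single-character entry in the table, so it is left untouched.
overstrike = "V\b-"
unchanged = translate_chars(overstrike, PRINCETON_TO_ATARI)
```

Working one character at a time is what makes this kind of translation cheap, and also what stops it from recognising sequences such as the overstrike.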
When using a character set where not all bits in each byte are specified, a third argument can be given to specify what sort of padding to use for the top bits of each character. There are three options for this:
Option | Meaning |
---|---|
printable | Keep the output in the range 32-126 where possible |
zero | Zero the high bits in the output |
random | Pad with random bits (avoiding all-zero bytes) |
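One way to picture the three padding styles is the Python sketch below, which pads a value of a given bit width out to a full byte. The function name pad, the rule used to pick a printable byte (the lowest suitable one), and the fallback to zero padding are all assumptions made for illustration; the text does not say how convickt chooses among the possible bytes.

```python
import random


def pad(value, bits, style, rng=random):
    """Pad a `bits`-wide value to 8 bits using the named padding style."""
    value &= (1 << bits) - 1
    # Every byte whose low `bits` bits equal `value`.
    candidates = [(high << bits) | value for high in range(1 << (8 - bits))]
    if style == "zero":
        return value
    if style == "printable":
        for byte in candidates:
            if 32 <= byte <= 126:
                return byte
        return value  # assumption: fall back to zero padding
    if style == "random":
        # Avoid producing an all-zero byte.
        return rng.choice([byte for byte in candidates if byte != 0])
    raise ValueError(f"unknown padding style: {style!r}")
```

For example, a 5-bit value of 0x03 stays 0x03 under zero padding and becomes 0x23 under this sketch's printable rule; in every style the low bits are preserved, so masking them off recovers the original value.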
Note that not all conversions are possible. If a character cannot be converted, it will normally be converted to a NUL byte (which is invalid in every character set); note that this will prevent round-tripping, because NUL is interpreted as end-of-input if given in the input. There is one exception; if the character that could not be converted is a tab character, it will be converted to the other character set’s representation of a space character, if possible, because the two characters have the same meaning in INTERCAL (the only difference is if the command is a syntax error that’s printed as an error message). (The exception exists to make it possible to translate existing INTERCAL source code into Baudot.)
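The fallback rules in the preceding paragraph can be sketched in Python as follows. The conversion tables here are toy dictionaries invented for the example, and the function name and its tab and space parameters are hypothetical; only the behaviour (NUL for unconvertible characters, tab treated as space, NUL ending the input) follows the text.

```python
NUL = 0x00


def convert(data, table, tab=0x09, space=0x20):
    """Convert bytes through `table`, applying the fallback rules."""
    out = bytearray()
    for byte in data:
        if byte == NUL:
            break  # NUL is interpreted as end-of-input
        if byte in table:
            out.append(table[byte])
        elif byte == tab and space in table:
            out.append(table[space])  # tab becomes the target's space
        else:
            out.append(NUL)  # unconvertible character
    return bytes(out)


# Toy tables (not real character set data): 'A' and space only.
FORWARD = {0x41: 0x03, 0x20: 0x04}
BACKWARD = {0x03: 0x41, 0x04: 0x20}

lossy = convert(b"A~A", FORWARD)               # '~' has no mapping
round_trip = convert(lossy, BACKWARD, tab=None)
```

Here lossy contains a NUL in the middle, so converting it back stops at that byte and yields only the first character, which is the round-tripping problem noted above.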