

Appendix B convickt

A variety of character sets have historically been used to represent INTERCAL programs. Atari syntax was designed specifically for use with ASCII-7, and all Atari-syntax-based INTERCAL compilers accept that character set as input. (C-INTERCAL also accepts Latin-1 and UTF-8.) The story is more complicated with Princeton syntax: the original Princeton compiler was designed to work with EBCDIC, but because modern computers are often not designed to work with that character set, other character sets, particularly Latin-1, are often used to represent Princeton syntax programs instead. The CLC-INTERCAL compiler accepts Latin-1, a custom dialect of EBCDIC, Baudot, and a punched-card format as input. C-INTERCAL can cope with Princeton syntax in Latin-1, but for the other character sets, for other compilers, or just for getting something human-readable, it’s useful to have a conversion program. convickt is an INTERCAL character set conversion program designed with these needs in mind.

The syntax for using convickt is

convickt inputset outputset [padding]

(that is, the input and output character sets are compulsory, but the parameter specifying what sort of padding to use is optional).
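
As a sketch of how this syntax might be driven from a script, the following Python helper builds an argument list for convickt and rejects names that are not documented in this appendix. The helper name convickt_argv is hypothetical. The sketch only constructs the command; it assumes nothing about how convickt reads its input or writes its output.

```python
# Hypothetical helper: builds an argv list for convickt from the
# character set and padding names documented in this appendix.
CHARSETS = ("latin1", "ebcdic", "baudot", "atari")
PADDINGS = ("printable", "zero", "random")


def convickt_argv(inputset, outputset, padding=None):
    """Return ['convickt', inputset, outputset], plus padding if given."""
    for name in (inputset, outputset):
        if name not in CHARSETS:
            raise ValueError(f"unknown character set: {name!r}")
    argv = ["convickt", inputset, outputset]
    if padding is not None:
        if padding not in PADDINGS:
            raise ValueError(f"unknown padding style: {padding!r}")
        argv.append(padding)   # the third argument is optional
    return argv
```

For example, convickt_argv("latin1", "baudot", "printable") gives the argument list for converting Latin-1 to Baudot with printable padding.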

The following values for inputset and outputset are permissible:

latin1

Latin-1, or to give it its official name ISO-8859-1, is the character set most commonly used for transmitting CLC-INTERCAL programs, and is therefore nowadays the most popular character set for Princeton syntax programs. It uses 8-bit characters. Because it is identical to ASCII-7 in all codepoints that don’t have the high bit set, most of its characters can be read by most modern editors and terminals, and it is far more likely to be supported by modern editors than EBCDIC, Baudot, or punched cards, all of which have fallen into relative disuse since 1972. It is also the only input character set that C-INTERCAL supports for Princeton syntax programs.

ebcdic

EBCDIC is an 8-bit character set that was an alternative to ASCII in 1972, and is the character set used by the original Princeton compiler. Unfortunately, there is no single standard version; the version of EBCDIC used by convickt is the one that CLC-INTERCAL uses. It is the default input character set for CLC-INTERCAL (although more recent versions of CLC-INTERCAL instead try to guess the input character set from the input program).
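
The lack of a single standard can be seen with the EBCDIC code pages that ship with Python. The sketch below compares two of them, cp037 and cp500. Neither is claimed to match the CLC-INTERCAL dialect that convickt uses; they are only stand-ins to show that the name EBCDIC alone does not determine a byte value.

```python
# Two IBM EBCDIC code pages from Python's standard codecs. These are
# illustrative stand-ins, not the CLC-INTERCAL dialect used by convickt.
CODE_PAGES = ("cp037", "cp500")


def ebcdic_bytes(char):
    """Map each code page name to the byte it uses for `char`."""
    return {page: char.encode(page) for page in CODE_PAGES}


def differs(char):
    """True if the code pages disagree on the encoding of `char`."""
    return len(set(ebcdic_bytes(char).values())) > 1


# 'A' and '9' agree across the two pages; '!' does not.
for ch in "A9!":
    print(repr(ch), ebcdic_bytes(ch), "differs" if differs(ch) else "same")
```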

baudot

Baudot is a 5-bit character set with shift codes; therefore, when storing it in a file on an 8-bit computer, padding is needed to fill in the remaining three bits. The standard Baudot character set does not contain all the characters needed by INTERCAL, so CLC-INTERCAL uses repeated shift codes to add two more sets of characters. convickt uses the CLC-INTERCAL version of Baudot, so as to be able to translate programs designed for that compiler. However, standard Baudot is also accepted as input if it contains no redundant shift codes; and if the input contains no characters outside standard Baudot, the output is written so that it is both correct standard Baudot and correct CLC-INTERCAL Baudot for those characters.
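
The shift mechanism can be sketched as a small state machine. In the Python sketch below, the shift code values and the handful of table entries are assumptions made up for illustration, not the code assignments of standard Baudot or of CLC-INTERCAL, and the two extra CLC-INTERCAL character sets are not modelled.

```python
# Illustrative 5-bit decoder with two shift states. The code values and
# table contents below are assumptions made up for this sketch.
LETTERS_SHIFT = 0x1F   # assumed code that selects the letters set
FIGURES_SHIFT = 0x1B   # assumed code that selects the figures set

TABLES = {
    "letters": {0x01: "E", 0x03: "A", 0x04: " "},
    "figures": {0x01: "3", 0x03: "-", 0x04: " "},
}


def decode(codes, low_bits=5):
    """Decode 5-bit codes, ignoring whatever padding sits in the high bits."""
    mask = (1 << low_bits) - 1
    current = "letters"          # assume decoding starts in the letters set
    out = []
    for byte in codes:
        code = byte & mask       # drop the padding bits
        if code == LETTERS_SHIFT:
            current = "letters"
        elif code == FIGURES_SHIFT:
            current = "figures"
        else:
            # Codes missing from this toy table are emitted as NUL.
            out.append(TABLES[current].get(code, "\0"))
    return "".join(out)
```

Because the same 5-bit code stands for a different character in each shift state, the meaning of a byte depends on the shift codes that precede it, which is why redundant shift codes matter for compatibility.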

atari

This option causes convickt to attempt a limited conversion to or from Atari syntax. It uses ASCII-7 as the character set, but also tries to translate between Atari and Princeton syntax at the character level, which is sometimes but not always effective. For instance, ? is translated from Atari to Princeton as a yen sign, and from Princeton to Atari as a whirlpool (@). This sort of behaviour is often capable of translating expressions automatically, but it will fail when characters outside ASCII-7 (Atari) or Latin-1 (Princeton) are used, and it will not, for instance, translate a Princeton V, backspace, - into Atari ?, but will instead leave it untouched. ASCII-7 is a 7-bit character set, so on an 8-bit computer there is one bit of padding that needs to be generated. Note, however, that it is usual nowadays to clear the top bit when transmitting ASCII-7; the ‘printable’ and ‘zero’ padding styles will do this, but the ‘random’ style may not.
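
The character-level nature of this conversion can be sketched in Python using only the two mappings mentioned above. The table and function names are hypothetical, and the real tables in convickt presumably cover more characters. The point is that each character is mapped on its own, so a multi-character sequence such as V, backspace, - passes through unchanged.

```python
# Only the two mappings stated in the text are included here.
# Table and function names are hypothetical.
ATARI_TO_PRINCETON = str.maketrans({"?": "\u00a5"})   # '?' -> yen sign
PRINCETON_TO_ATARI = str.maketrans({"?": "@"})        # '?' -> whirlpool


def atari_to_princeton(text):
    """Translate character by character; unmapped characters are kept."""
    return text.translate(ATARI_TO_PRINCETON)


def princeton_to_atari(text):
    """Translate character by character; unmapped characters are kept."""
    return text.translate(PRINCETON_TO_ATARI)
```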

When using a character set where not all bits in each byte are specified, a third argument can be given to specify what sort of padding to use for the top bits of each character. There are three options for this:

Option     Meaning
printable  Keep the output in the range 32-126 where possible
zero       Zero the high bits in the output
random     Pad with random bits (avoiding all-zero bytes)
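
One way to satisfy these three descriptions is sketched below in Python. How convickt actually chooses the padding bits is not specified here, so the function pad and its search order for the printable style are assumptions made for illustration.

```python
import random


def pad(value, bits, style="printable", rng=random):
    """Fill the top (8 - bits) bits of a byte whose low bits hold `value`.

    Hypothetical sketch: `bits` is 5 for Baudot or 7 for ASCII-7.
    """
    if style not in ("printable", "zero", "random"):
        raise ValueError(f"unknown padding style: {style!r}")
    if not 0 < bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given number of bits")
    spare = 8 - bits
    if spare == 0 or style == "zero":
        return value                      # high bits (if any) stay clear
    if style == "printable":
        # Try each high-bit pattern until the byte lands in 32..126.
        for high in range(1 << spare):
            candidate = (high << bits) | value
            if 32 <= candidate <= 126:
                return candidate
        return value                      # not possible; leave bits clear
    # style == "random": retry until the whole byte is non-zero.
    while True:
        candidate = (rng.getrandbits(spare) << bits) | value
        if candidate != 0:
            return candidate
```

In every style the low bits are left untouched, so a reader that masks off the padding recovers the original code.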

Note that not all conversions are possible. If a character cannot be converted, it will normally be converted to a NUL byte (which is invalid in every character set); this prevents round-tripping, because a NUL in the input is interpreted as end-of-input. There is one exception, which exists to make it possible to translate existing INTERCAL source code into Baudot: if the character that could not be converted is a tab character, it will be converted to the other character set’s representation of a space character, if possible. The two characters have the same meaning in INTERCAL; the only difference arises when the command is a syntax error, in which case its text is printed as part of an error message.
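
These fallback rules can be sketched in Python as follows. The function convert and its table argument are hypothetical; the sketch assumes an ASCII-compatible source character set (so that tab is 0x09 and space is 0x20) and does not reproduce any real conversion table.

```python
TAB, SPACE, NUL = 0x09, 0x20, 0x00   # assumes an ASCII-compatible source


def convert(data, table):
    """Convert bytes using `table`, a dict of source byte -> target byte.

    `table` is a hypothetical stand-in for a real character set table.
    """
    out = bytearray()
    for byte in data:
        if byte == NUL:
            break                      # NUL in the input ends the input
        if byte in table:
            out.append(table[byte])
        elif byte == TAB and SPACE in table:
            out.append(table[SPACE])   # unconvertible tab becomes a space
        else:
            out.append(NUL)            # any other unconvertible character
    return bytes(out)
```

Feeding the output of a lossy conversion back in stops at the first NUL, which is the round-tripping problem described above.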

