Next: , Previous: , Up: Optimizer Idiom Language   [Index]


C.4 OIL Patterns

Patterns are simply OIL expressions; the expressions match either original INTERCAL input or expressions produced by earlier idioms. Each operator must match the same operator in the (possibly partially-optimised) input; the operands themselves are pattern templates specifying what operands in the input they can match.

One special simple form of match is possible: #NUMBER, where NUMBER is in decimal, matches a constant with that value. (Unlike in INTERCAL, this constant is not limited to being a onespot value; it is, however, limited to being at most twospot, as are all operands and intermediate values in OIL.)

Otherwise, an operand consists of the following parts, written in order:

  1. A character to specify the data type of what’s being matched. Usually, this will be _ to specify that any data type can be matched. In a few cases, you may want to use . or : to specify that you only want to match a onespot or twospot value respectively (that is, 16- or 32-bit). You can also use #; this specifies a value that can be any width, but must be known to be a constant with a known value at optimize time (either because it was hardcoded as a constant originally or because a constant was produced there by the optimizer, for instance via a constant folding optimization).
  2. Optionally, an expression in braces ({ and }). This expression is written in C, not OIL (as are all expressions in braces), and puts an extra condition on whether the pattern matches. The exact meaning of this will be explained later.
  3. A reference number, which must be one decimal digit long. A reference number of 0 causes the operand to be discarded immediately after matching; normally, you will want to specify a positive reference number. Two operands with the same reference number must be exactly the same for the pattern to match (for instance, both references to the same variable, or identical subexpressions). The reference number also allows the operand to be referenced by C expressions on other operands and by replacements. Reference numbers must be unique within the idiom (unless two or more reference numbers are deliberately the same so that the operands they reference have to be identical to produce a match), and they are scoped only as far as the containing idiom; they don’t carry from one idiom to another.

Note that syntax like #2 is ambiguous given what comes so far; the first interpretation is the one that is taken in practice, and if the second interpretation is wanted the operand should be expressed as #{1}2, using a no-op braced expression to tell them apart. This particular no-op is recommended because it’s detected and optimized by the OIL compiler.

Braced expressions, which must be written purely in C, add extra conditions; they must return nonzero to allow a possible match or zero to prevent one. They can reference the following variables and functions:

c
cNUMBER

This accesses a calculation made automatically by the compiled OIL program to identify which bits of the operand can possibly be set, and which ones cannot be. c by itself refers to the operand to which the braced expression is attached; if a number is given, it refers to another node (the number is interpreted as a reference number). The actual value of c is a 32-bit unsigned integer, each bit of which is true, or 1, if there is any chance that the corresponding bit of the operand might be 1, and false, or 0, if it’s known for certain that the corresponding bit of the operand is 0.

For instance:

_{!(c&4294901760LU)}1

The constant given here is FFFF0000 when expressed in hexadecimal; the point is that the expression matches any operand that is known to have a value no greater than 65535. Unless the operand is the argument to a unary AND, this check generally makes more sense than explicitly specifying . rather than _, because it will identify both 16- and 32-bit values as long as they’re small enough to fit into a onespot variable. This code could, for instance, be used to check that an argument to a mingle must be small enough before optimising it (this is important because an optimisation shouldn’t optimise an error – in this case, an overflowing mingle – into a non-error).

x
xNUMBER

x is like c, and refers to operands in the same way, except that it can only refer to an operand marked with #. It holds the value of that constant (a 32-bit unsigned integer), which will be known to the optimizer at optimize time. One common use of this is to detect whether a constant happens to be a power of 2, although there are many other possibilities that may be useful.

r

When inside a loop, r is the value of the loop counter. (It’s almost certainly a mistake if you have a loop but don’t reference the loop counter at least once, and usually at least twice, within the loop.) See OIL Loops.

and16
and32
or16
or32
xor16
xor32
iselect
mingle

These are all functions with one argument (apart from iselect and mingle, which each take two arguments); they exist so that INTERCAL operators can be used by C expressions. They all take unsigned longs as input and output, even if they are onespot operators. Note that it’s entirely possible for these to cause a compile-time error if used on invalid arguments (such as mingling with an operand over 65535), or to silently truncate an invalid argument down to the right number of bits; both of these should be avoided if possible, so the optimiser should check first to make sure that it doesn’t use any of these functions on invalid arguments.

xselx

This function returns its argument selected with itself; so xselx(c) is shorthand for iselect(c,c). When the argument is very complicated, this can save a lot of space in the original OIL program.

setbitcount

This function simply returns the number of bits with value 1 in its argument. This is sometimes useful with respect to various select-related optimisations, and can be a useful alternative to having to take logarithms in various situations.

smudgeright
smudgeleft

The smudgeright function returns its argument but with all the bits less significant than the most significant bit with value 1 set to 1; likewise, smudgeleft returns its argument with all the bits more significant than the least significant bit with value 1 set to 1.

Note that all OIL calculation is done internally using unsigned 32-bit numbers, and C expressions you write should do the same. The practical upshot of this is that you should write LU after any constant you write in C code; if you don’t do this, you are reasonably likely to get compiler warnings, and the resulting program may not work reliably, although the OIL compiler itself will not complain.

Here’s a more complicated example of an optimizer operand:

#{!(x&2863311530LU)&&iselect(x,1431655765LU)==
  xselx(iselect(x,1431655765LU))}3

It helps to understand this if you know that 2863311530 in hexadecimal is AAAAAAAA and 1431655765 in hexadecimal is 55555555. (It’s worth putting a comment with some frequently-used decimal constants in an OIL input file to help explain what these numbers mean and make the code more maintainable.) The operand matches any constant integer which has no bits in common with AAAAAAAA, and for which if any bit in common with 55555555 is set, all less significant bits in common with that number are also set.


Next: , Previous: , Up: Optimizer Idiom Language   [Index]