
Jax: Scanner Generator Reference


2.3 Reference

This section summarizes the features jax provides, and those it does not.

2.3.1 Methods and variables defined by jax

Jax embeds the following methods and variables into the generated java file when it processes a jax specification.
int jax_next_token() throws IOException
This starts the matching process. It returns the integer value returned from an action, or -1 when EOF is reached on the input stream. If an action executes break; instead of returning a value, this function continues scanning from where it left off instead of returning.
void init(InputStream inp) throws IOException
This primes the lexer. It must be called before the first call to jax_next_token(); otherwise the scanner's behavior is undefined.
String jax_text()
This returns a string containing the section of the input matched by the regular expression.
void jax_switch_state()
This switches the scanner into a new state. It is only useful if the file contains %state directives.
int jax_cur_line
This is a variable that keeps track of the current line if you use the %line directive.
int jax_cur_char
This is a variable pointing to the current character position in the input file if you use the %char directive.
In addition, there are a few internal methods that are intended for use only by the lexer itself.
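The calling sequence described above (init() first, then jax_next_token() until it returns -1, reading jax_text() after each match) can be sketched as follows. This is only an illustration: StubLexer is a hand-written stand-in, not jax output, and the token codes WORD and NUMBER, the class names, and the scan() helper are all assumptions made so the example runs by itself. A real scanner would be the class jax generates from your specification.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a jax-generated scanner. It only mimics the
// documented method and variable names; it is not produced by jax.
class StubLexer {
    static final int WORD = 1;    // hypothetical token codes that the
    static final int NUMBER = 2;  // actions in a specification might return

    private InputStream in;
    private int lookahead;        // next unread character, or -1 at EOF
    private final StringBuilder text = new StringBuilder();
    int jax_cur_line = 1;         // mirrors the variable kept for %line

    void init(InputStream inp) throws IOException {
        in = inp;
        lookahead = in.read();    // prime the scanner with one character
    }

    int jax_next_token() throws IOException {
        // Whitespace is skipped without returning, much as an action that
        // executes break; would keep the scanner going.
        while (lookahead != -1 && Character.isWhitespace(lookahead)) {
            if (lookahead == '\n') jax_cur_line++;
            lookahead = in.read();
        }
        if (lookahead == -1) return -1;   // EOF, as documented
        text.setLength(0);
        boolean digits = Character.isDigit(lookahead);
        while (lookahead != -1 && !Character.isWhitespace(lookahead)
                && Character.isDigit(lookahead) == digits) {
            text.append((char) lookahead);
            lookahead = in.read();
        }
        return digits ? NUMBER : WORD;
    }

    String jax_text() { return text.toString(); }
}

class LexerDemo {
    // Drives the scanner: init() once, then jax_next_token() until -1.
    static List<String> scan(String input) {
        try {
            StubLexer lex = new StubLexer();
            byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
            lex.init(new ByteArrayInputStream(bytes));
            List<String> out = new ArrayList<>();
            int tok;
            while ((tok = lex.jax_next_token()) != -1) {
                out.add(tok + ":" + lex.jax_text());
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(scan("abc 42\nxyz"));  // prints [1:abc, 2:42, 1:xyz]
    }
}
```

Note that jax_text() is read straight after each call to jax_next_token(), since the next call replaces the matched text.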

2.3.2 Options

The driver for jax is the class sbktech.tools.jax.driver. Calling it without any arguments prints a synopsis.
java sbktech.tools.jax.driver [-lexFile outputFileName] [-i] inputFileName
Hand it an input file, and it will create a file called lexer.java by default. You can override the output file name with the -lexFile option.

The -i option generates a case insensitive scanner. The case of the matched text is preserved, so jax_text() still returns the original text.
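To make "case insensitive, but the case of the matched text is preserved" concrete, here is a small sketch that uses java.util.regex rather than jax. It is an analogy only: the keyword "begin", the class name, and the matchKeyword() helper are assumptions for illustration, and a scanner generated with -i is not claimed to work this way internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // Matches the keyword "begin" at the start of the input, ignoring case.
    // Returns the matched text exactly as it appeared in the input,
    // or null if the input does not start with the keyword.
    static String matchKeyword(String input) {
        Pattern p = Pattern.compile("begin", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The pattern is lower case, the input is mixed case, and the
        // returned text keeps the input's case.
        System.out.println(matchKeyword("BeGiN x := 1"));  // prints BeGiN
    }
}
```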

2.3.3 Formal Syntax

For those who care, and despite anything mentioned in the rest of the document, this is the grammar that jax understands.
st ::=
  (VERBATIM) ? ((((lexStatement | stateStatement) | LINE_DIRECTIVE) | CHAR_DIRECTIVE)) + (VERBATIM) ?
lexStatement ::=
  PATTERN or_expr PATTERN (VERBATIM) ? SEMI
or_expr ::=
  cat_expr (OR cat_expr) *
cat_expr ::=
  singleton (singleton) *
singleton ::=
  (((DOT | CHAR) | fullccl) | PAREN_OPEN or_expr PAREN_CLOSE) (((STAR | PLUS) | QMARK)) ?
fullccl ::=
  SQUARE_OPEN (CARET) ? ccl SQUARE_CLOSE
ccl ::=
  (((CHAR DASH CHAR | CHAR) | DOT)) *
stateStatement ::=
  STATE (NAME) +
This grammar was generated from the jell specification for jax.
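As a reading aid for the pattern part of the grammar (or_expr, cat_expr, singleton, fullccl and ccl), here is a minimal recursive-descent recognizer with one method per rule. The grammar above only names tokens, so the concrete spellings used here are assumptions: OR is |, STAR + QMARK are * + ?, DOT is ., the brackets and parentheses are the usual ones, CARET is ^ directly after [, DASH is - inside brackets, and CHAR is any other character. Jax's real tokenizer, including any escape syntax, may differ.

```java
class PatternRecognizer {
    private final String s;
    private int pos = 0;

    private PatternRecognizer(String s) { this.s = s; }

    // True if the whole string can be derived from or_expr.
    static boolean accepts(String pattern) {
        PatternRecognizer r = new PatternRecognizer(pattern);
        return r.orExpr() && r.pos == r.s.length();
    }

    private boolean at(char c) { return pos < s.length() && s.charAt(pos) == c; }

    // CHAR outside brackets: assumed to be anything that is not an operator.
    private boolean atChar() {
        return pos < s.length() && "|()[]*+?.".indexOf(s.charAt(pos)) < 0;
    }

    // CHAR inside brackets: assumed to be anything except ] - and .
    private boolean atClassChar() {
        return pos < s.length() && "]-.".indexOf(s.charAt(pos)) < 0;
    }

    // or_expr ::= cat_expr (OR cat_expr)*
    private boolean orExpr() {
        if (!catExpr()) return false;
        while (at('|')) {
            pos++;
            if (!catExpr()) return false;
        }
        return true;
    }

    // cat_expr ::= singleton (singleton)*
    private boolean catExpr() {
        if (!singleton()) return false;
        while (at('.') || at('[') || at('(') || atChar()) {
            if (!singleton()) return false;
        }
        return true;
    }

    // singleton ::= (DOT | CHAR | fullccl | PAREN_OPEN or_expr PAREN_CLOSE)
    //               (STAR | PLUS | QMARK)?
    private boolean singleton() {
        if (at('.') || atChar()) {
            pos++;
        } else if (at('[')) {
            if (!fullCcl()) return false;
        } else if (at('(')) {
            pos++;
            if (!orExpr() || !at(')')) return false;
            pos++;
        } else {
            return false;
        }
        if (at('*') || at('+') || at('?')) pos++;
        return true;
    }

    // fullccl ::= SQUARE_OPEN (CARET)? ccl SQUARE_CLOSE
    // ccl     ::= (CHAR DASH CHAR | CHAR | DOT)*
    private boolean fullCcl() {
        pos++;                                   // consume [
        if (at('^')) pos++;
        while (pos < s.length() && !at(']')) {
            if (at('.')) { pos++; continue; }    // DOT
            if (!atClassChar()) return false;    // stray DASH
            pos++;                               // CHAR
            if (at('-')) {                       // CHAR DASH CHAR
                pos++;
                if (!atClassChar()) return false;
                pos++;
            }
        }
        if (!at(']')) return false;              // unterminated class
        pos++;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts("a(b|c)*[^0-9.]+"));  // prints true
        System.out.println(accepts("a|"));               // prints false
    }
}
```

The recognizer only answers yes or no. It follows the rules literally, so an empty character class such as [] is accepted because ccl may be empty, while an empty pattern is rejected because cat_expr needs at least one singleton.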

2.3.4 Goofy quirks

This is a catalog of the ways I know jax's syntax differs from f/lex. Some of them are intentional, others are there because I'm too lazy to fix them, and others are bugs.

2.3.5 Known bugs and limitations

2.3.6 Jax can trigger javac bug

On very large scanners, jax can generate code that triggers a bug in the 1.02 JDK: the javac compiler fails with a UTFDataFormatException when reading the compiled scanner.

Jax will warn if it is about to generate such a scanner, and there are two (well, three if you count pestering Sun to fix it) solutions.

2.3.7 Other java lexers

This is only what I know of; please do mail me if you are aware of other additions to this list.

KB Sriram
Comments, bug reports: kbs@sbktech.org

Revised: Sat Sep 21 12:59:18 1996
URL: http://www.sbktech.org/jax-ref.html