|Tools| <Using other features | |Up to Jell: Parser Generator|

Jell: Parser Generator Reference

3.3 Reference

3.3.1 Options

The driver for jell is in the class sbktech.tools.jell.driver and if you call it with no arguments, it will print out a list of the options it accepts.
java sbktech.tools.jell.driver
 [-tokenPackage packageName]
 [-tokenFile fileName]
 [-tokenClass className]
 [-parserFile fileName]

 -tokenPackage: Package into which the token interface must be placed
                (default none)
 -tokenFile   : File into which the token interface must be placed
                (default token.java)
 -tokenClass  : Class name for token interface
                (default token)
 -parserFile  : File into which the parser is generated
                (default parser.java)
 -grammar     : Just print out the grammar alone, devoid of all actions

3.3.2 Formal Syntax

For those that care, despite everything that is mentioned in the rest of the document, this is the grammar that jell understands.
program ::=
  (ACTION) ? (((term_defn | nt_defn) | rule_stmt)) + (ACTION) ?
term_defn ::=
  TERMINAL (term_ref) * END
term_ref ::=
nt_defn ::=
  NONTERMINAL (nt_ref) * END
nt_ref ::=
  ID (SQ_OPEN arglist SQ_CLOSE) ? (COLON ID) ?
arglist ::=
  (ID (COMMA ID) *) ?
rule_stmt ::=
  var_ref (ACTION) ? DERIVES or_expression SEMI
or_expression ::=
  cat_expression (OR cat_expression) *
cat_expression ::=
  unary_expression (unary_expression) *
unary_expression ::=
  singleton (((PLUS | QMARK) | STAR)) ?
singleton ::=
  (USER_SELECT) ? primary (ACTION) ?
primary ::=
  ((var_ref | PAREN_OPEN or_expression PAREN_CLOSE) | EPSILON)
var_ref ::=
  ID (SQ_OPEN varglist SQ_CLOSE) ? (COLON ID) ?
varglist ::=
  (ID (COMMA ID) *) ?
This definition was generated from the jell specification file for jell's parser.

3.3.3 Internals

These are my quick (incomplete) notes on the internals, so that I still remember where I kludged what parts ;-)

Jell uses a straightforward algorithms to compute FIRST/FOLLOW sets for the grammar, and constructs a recursive descent parser from it. Error repair and recovery are implemented using these sets to determine how to skip and insert tokens.

The input is specified in an EBNF format which is converted into a tree, one tree for each rule. In the current implementation, there is only one rule permitted per nonterminal.

Once all the trees have been created, the FIRST set for each node is computed by traversing all the trees until no more elements are added to any FIRST set. These rules are recursively used to compute the follow set.

FIRST = { terminal }
Or node
FIRST = FIRST(left) U FIRST(right)
Cat node
FIRST = FIRST(left) if left is nullable, add FIRST(right) to FIRST
Star node
FIRST = FIRST(child)
Plus node
FIRST = FIRST(child)
Qmark node
FIRST = FIRST(child)
The FOLLOW set computation for each node is also standard, and proceeds very similarly to the FIRST set computation, except that the data flows in (mostly) a top down fashion. These rules are used to update all the follow sets until no more changes are possible.
Cat node
Add FIRST(right) to FOLLOW(left)
Add FOLLOW to FOLLOW(right)
if right is nullable, add FOLLOW to FOLLOW(left)
Or node
Add FOLLOW to FOLLOW(right)
Add FOLLOW to FOLLOW(left)
Star node
Add FIRST(child) to FOLLOW(child)
Add FOLLOW to FOLLOW(child)
Plus node
Add FIRST(child) to FOLLOW(child)
Add FOLLOW to FOLLOW(child)
Qmark node
Add FOLLOW to FOLLOW(child)
The next step is to walk down the tree for each rule top down and generate the code for each node encountered. **document this**

Error recovery is implemented by passing down the first set of the next node that can be reduced whenever entering a subrule. On getting an error, the list of tokens that are used as a stop set consist of the union of all these first sets in the current call stack, and the expected set of the token where the syntax error occurred. **expand**.

3.3.4 Other java parser generators

3.3.5 What was stolen from whom and where

This is the place where I 'fess up that all I've done is a feeble attempt to reimplement the ideas stolen from smart person so and so. I'm sure this list isn't complete, but heck, just mail me if I've forgotten anything and I'll pop it in here.

Why LL parsers
I lurked in the PCCTS newsgroup too long and got brainwashed. Terence Parr is too busy to get it to do Java yet, so I figured I'd do a silly simple one. Also, LR confuses me. I have to remember derivations go right to left, while my noggin keeps heading left to right. No, fixing the noggin is not a solution.
Error recovery
Ell from the *cocktail compiler construction toolkit has the most neat idea of passing error recovery stacks. I just had to steal this one right away.
Thanks to Scott Hudson's JavaCUP, I was able to quickly get a preliminary version of both jax and jell.

|Tools| <Using other features | |Up to Jell: Parser Generator|

KB Sriram
Comments, bug reports: kbs@sbktech.org

Revised: Sun May 26 09:41:07 1996
URL: http://www.sbktech.org/jell-ref.html