|Tools| <Sample usage | |Up to Jell: Parser Generator| |Reference >

Jell: Parser Generator Using other features


3.2 Using other features

Grammar specifications can have rules that make the language non LL(1). Sometimes, languages just have constructs that cannot be expressed as an LL(1) grammar. Jell provides hooks to deal with such cases. Parsers must also deal with errors in the input gracefully. Jell generates parsers which automatically provide error reports and recover from errors.

This section describes features jell provides to deal with both these problems.

3.2.1 Resolving ambiguous rules

Jell generates predictive parsers and will warn you about any rules that cannot be uniquely predicted with one token of lookahead. Jell will still generate a parser using some of the default strategies to resolve such conflicts explained below. In addition, you can explicitly aid the parse by adding additional code to pick one of the ambiguous rules.

One type of ambiguity is when there are two choices to continue parsing a rule, and the first token that can appear in each choice is the same. For instance, if we want a grammar to parse both declarations and assignment statements, and we write this set of rules for it.

stmt     ::= assignment | typeDecl  ;

typeDecl ::= ID varList SEMI ;
varList  ::= ID (COMMA ID) * ;

assignment ::= ID ASSIGN NUMBER;
Jell will give this warning for the grammar
Warning: or/or ambiguous for stmt on seeing { ID }
Both typeDecl and assignment start with an ID, so with one token of lookahead jell is unable to decide which rule to use. Jell will generate a parser anyway, and it will arbitrarily pick the first rule (in this case assignment) if it encounters an ID in its input.

Jell provides a way to embed directives to control the parsing for such cases through the %if{ directive, and also provides a peek() method that lets you peek into tokens further down the stream.

stmt    ::=
  (%if{ peek(2).type() == token.ASSIGN %} assignment) |
  typeDecl ;
The code within the %if{ .. %} is taken as a boolean expression and added as an extra conditional in the generated parser. When an ID is seen on the input, the parser also applies the conditional to decide which of the two rules to pick.

Another source of ambiguity is of the dangling else type.

stmt ::= IF expr THEN stmt (ELSE stmt)?
         |
         OTHER ;
On processing this grammar, jell generates the warning
Warning: ``?'' choices for stmt ambiguous on { ELSE }
The parser can either choose to reduce the else part to empty or it can try to expand the ELSE part of the rule. Jell will generate a parser that will always enter the ELSE portion, which corresponds to matching up else's with the closest if's, and is similar to how LR parsers deal with such ambiguities.

Similar ambiguities can arise for * or + choices, and in all cases, jell will generate a parser that will try to match as much of the input as possible before returning from the choices.

3.2.2 Error reports

Jell will automatically provide syntax error reports and employ some heuristics to recover from errors. If you give the parser generated by the expression grammar example the input
1 - (-2 3 + (-4 - - 5 )
It gives the following error messages
Line 1: 3: Syntax error, expected { PLUS MINUS MULT DIV RPAREN EOF }
Line 1: PLUS: restarting
Line 2: EOF: Syntax error, expected { RPAREN }
Line 2: EOF: restarting
Inserting RPAREN
On detecting an error, jell first reports the location of the error and the local set of tokens that would have made sense for the parser. Next, it starts skipping tokens (maybe none) until it find one it can use to restart the parse. This new location is reported. Finally, if any tokens need to be inserted in order to "resynchronize" the parse, these are added and also reported.

All errors ultimately percolate down to the method int error(String s) which can be overridden if you wish to keep track of any statistics and so on. For finer grained control of error tracking, there are three methods that are used to handle each type of message generated by jell.

Syntax errors are first handed to the method void syntaxError(token cur_token, int expectedTokens[]). The integer array contains the set of token id's that were expected to be present.

Restart messages are handed to the method void restartMessage(token restartToken) which is given the token where the parse will resume.

Any tokens that are inserted are passed to the method void insertMessage(int insertedTokenId) which is given the id (the constant in the token interface) of the inserted token.

The strategy employed by jell to recover from errors isn't particularly smart, but does reasonably well on grammars for expressions and common language constructs like it-then-else or while statements.

3.2.3 Actions and errors

How does jell deal with actions or values returned from rules that encounter input containing errors?

Actions are not executed while skipping tokens, but a value returned from a rule may potentially be uninitialized because the parse failed for that portion of the rule. Therefore, remember to always initialize any values that can be returned from a rule (the java compiler also will warn about any uninitialized variables it detects) and always remember to check if a valid value has been returned from a rule.

3.2.4 Dealing with yacc constructs

Yacc is an LALR parser generator, and understands a larger set of grammars than jell. If you are working with an existing yacc grammar, you will probably find it worthwhile to check if Scott Hudsons *CUP parser generator is suitable for your application.

If you are used to working with yacc grammars, the most important problem is to avoid left recursion while writing jell grammars. Many yacc rules take the form

listOfSomeRule : listOfSomeRule someRule | someRule ;
The listOfSomeRule production is left recursive, and jell cannot handle it as it is written. Rewrite such rules using EBNF notation
listOfSomeRule ::= someRule + ;
If the yacc rule allowed listOfSomeRule to also reduce to an empty production, you could rewrite it as
listOfSomeRule ::= someRule * ;
Use the zero-or-one rule when appropriate, which generates more efficient parsers and is succint. A yacc rule of the form
rule : choicePart requiredPart | requiredPart ;
can be represented as
rule ::= choicePart ? requiredPart ;

3.2.5 Misc stuff

These are a few more factoids about jell specifications.
|Tools| <Sample usage | |Up to Jell: Parser Generator| |Reference >

KB Sriram
Comments, bug reports: kbs@sbktech.org

Revised: Thu Aug 1 22:55:19 1996
URL: http://www.sbktech.org/jell-ex2.html