|Tools| |Up to Jell: Parser Generator| |Using other features >

Jell: Parser Generator Sample usage


3.1 Sample usage

Jell is a simple parser generator that generates recursive descent parsers. Jell processes specifications and actions embedded in a skeleton java file and creates two java files which are then compiled to make the parser. The files generated are standalone in the sense that they require no runtime support classes from the jell system in order to do the parsing.

This section describes some toy examples which use jell to generate parsers.

3.1.1 Expression grammar

The next two sections describe a simple expression interpreter written with jell.

The first step is to write an LL(1) grammar for whatever needs to be parsed. A language for simple expressions can be defined as

expr ::= term ('+' term |'-' term) *
term ::= factor ('*' factor |'/' factor) *
factor ::= identifier | number | '-' factor | '(' expr ')'

A quick explanation about that notation. It uses a BNF like syntax where operators like + * ? and | have the same meaning as in regular expressions. The next step is to write the grammar in jell.

%{

import java.io.*;

public class parser
{
  TokenInputStream inp;

  public parser(TokenInputStream inp)
  { this.inp = inp; }

  token next_token()
  {
    try { return inp.nextToken(); }
    catch (IOException e)
      { e.printStackTrace(); return new simpletoken(token.EOF, 0); }
  }

  public static void main(String argv[]) throws IOException
  {
    parser p = new parser(new TokenInputStream(System.in));
    p.expr();
  }

%}

%terminal{ NUMBER PLUS MINUS MULT DIV LPAREN RPAREN %}

%nonterminal{ expr term factor %}

expr ::= term (PLUS term | MINUS term ) *  ;

term ::= factor (MULT factor |DIV factor ) *  ;

factor ::= NUMBER |
           MINUS factor |
           LPAREN expr RPAREN ;

%{

}
 
%}
There are main three portions to the file.

The header and trailer portions, which are enclosed within the {% ... %} sequences are included verbatim into the final file. These portions are intended to contain any support or initialization sequences for the parser, and in addition must define a method called token next_token() which is used by the parser to obtain the next token in the input.

In this case, we use a class that conveniently generates a stream of tokens, and you might want to take a look at jax or *JavaLex or the StreamTokenizer class as ways to implement such a stream.

The middle part of the specification declares terminals and nonterminals and associates rules with each nonterminal, and are quite similar to the extended BNF rules we wrote up for expressions. Notice the only difference is that symbolic names have replaced the terminals in the grammar.

To create a parser, we run jell on the specification file. From the root of the distribution, this is how this might work

% java sbktech.tools.jell.driver examples/expr.gram
This generates two files, parser.java and token.java.

parser.java contains the parser itself, with one method corresponding to each nonterminal. As you might expect, the parser enters a method whenever the corresponding nonterminal needs to be reduced, and returns when that nonterminal has been reduced.

token.java file defines an interface for a class called token which contains integer constants that identify the tokens used in the parser. It also contains one method int type() which the parser uses to identify the type of the token. The method next_token() that must be implemented should return objects that implement the token interface.

The parser starts the parse using the very first rule, and creates a method with the same name as the nonterminal. This method is the entry point to start the parse. In the example, the parse is initiated by calling the method expr() in the parser class.

3.1.2 Adding actions

Ok. So we can parse things, and now to actually be able to do anything useful, we have to embed code into it to do things at the right time.
%{

import java.io.*;

public class parser
{
  TokenInputStream inp;

  public parser(TokenInputStream inp)
  { this.inp = inp; }

  token next_token()
  {
    try { return inp.nextToken(); }
    catch (IOException e)
      { e.printStackTrace(); return new simpletoken(token.EOF, 0); }
  }

  public static void main(String argv[]) throws IOException
  {
    parser p = new parser(new TokenInputStream(System.in));
    System.out.println("Result is " + p.expr());
  }

%}
 
%terminal{ NUMBER:inttoken PLUS MINUS MULT DIV LPAREN RPAREN %}

%nonterminal{ expr:int term:int factor:int %}

expr:e  ::= term:t %{ e = t; %}
            (PLUS term:t1 %{e += t1; %} | MINUS term:t2 %{e -= t2; %} ) *  ;

term:t  ::= factor:f %{ t = f; %}
            (MULT factor:f1 %{ t *= f1; %} | DIV factor:f2 %{ t /= f2; %} ) * ;

factor:f %{ f = 0; %} ::=
            NUMBER:n %{ f = n.val(); %} |
            MINUS factor:f1 %{ f = -f1; %} |
            LPAREN expr:e %{ f = e; %} RPAREN ;

%{

}
 
%}
Code is embedded at places by placing them in a %{ ... %} sequence. Also, labels have been attached to symbols both in the rules and in the declarations portion. If a label appears in the declaration section for a nonterminal, it declares the type returned by the assoiciated function. For instance, the example says that the function for expr will return an integer. If a label appears in the terminal declaration portion, it declares the class returned by next_token for that terminal. The token returned for a NUMBER in this example is of type inttoken. If a label appears in a rule, it associates a variable to the object returned by that rule.

Notice that there is an action associated with the rule for factor that appears on the LHS of the definition. Such actions are assumed to be initialization sequences that are executed before the rest of the rule is parsed.

After compiling our parser as before, lets run it on a sample input

% java parser
3 - -2*4
^D
Result is 11
% 

3.1.3 Inheriting attributes

In the expression interpreter, the action computed at nodes worked bottom-up. Each rule computed a value and returned it to its caller. In turn, every rule only used values returned by its subrules.

At times, it is more convenient to pass a value into a rule. One situation is when parsing a set of declarations that look like

int fubar, glorp;
ZorkType x, y, z;
and we need to use that to fill up a symbol table with the variables declared and their associated types. A grammar to handle such declarations might look like
typeDecl ::= ID varList ';'
varList ::= ID ( ',' ID ) *
It is convenient to pass the first ID into the varList rule so it can update the symbol table and include the type of the variable. Part of a jell specification to do this might look like
%terminal{      ID:strToken COMMA SEMI %}
%nonterminal{   typeDecl varList[strToken] %}

typeDecl ::=
  ID:id varList[id] SEMI
  ;

varList[typeId] ::=
  ID:id          %{ Symbols.add(typeId, id); %}

  ( COMMA ID:id1 %{ Symbols.add(typeId, id1); %} ) *
  ;
The square brackets are how you can pass objects into a rule. When they appear in a nonterminal declaration, they specify the type of the object that is passed into the corresponding rule. When they appear in a rule, they label the variable that is associated with the object.

The nonterminal declaration says that the varList should be handed a strToken type whenever it appears in a rule. The typeDecl rule simply passes the value returned from the ID to varList. In the rule for varList itself, the value handed to it is labelled typeId, and the rest of the rules use this to update the symbol table as more ID's are parsed.


|Tools| |Up to Jell: Parser Generator| |Using other features >

KB Sriram
Comments, bug reports: kbs@sbktech.org

Revised: Sun May 26 09:41:08 1996
URL: http://www.sbktech.org/jell-ex1.html