rply

class rply.LexerGenerator

    A LexerGenerator represents a set of rules that match pieces of text that should either be turned into tokens or ignored by the lexer.
    Rules are added using the add() and ignore() methods:

        >>> from rply import LexerGenerator
        >>> lg = LexerGenerator()
        >>> lg.add('NUMBER', r'\d+')
        >>> lg.add('ADD', r'\+')
        >>> lg.ignore(r'\s+')
    The rules are passed to re.compile(). If you need additional flags, e.g. re.DOTALL, you can pass them to add() and ignore() as an additional optional parameter:

        >>> import re
        >>> lg.add('ALL', r'.*', flags=re.DOTALL)
    You can then build a lexer with which you can lex a string to produce an iterator yielding tokens:

        >>> lexer = lg.build()
        >>> iterator = lexer.lex('1 + 1')
        >>> iterator.next()
        Token('NUMBER', '1')
        >>> iterator.next()
        Token('ADD', '+')
        >>> iterator.next()
        Token('NUMBER', '1')
        >>> iterator.next()
        Traceback (most recent call last):
          ...
        StopIteration
    add(name, pattern, flags=0)

        Adds a rule with the given name and pattern. In case of ambiguity, the first rule added wins.
    build()

        Returns a lexer instance, which provides a lex method that must be called with a string and returns an iterator yielding Token instances.
    ignore(pattern, flags=0)

        Adds a rule whose matched value will be ignored. Ignored rules will be matched before regular ones.
class rply.ParserGenerator(tokens, precedence=[], cache_id=None)

    A ParserGenerator represents a set of production rules that define a sequence of terminals and non-terminals to be replaced with a non-terminal, and it can be turned into a parser.
    Parameters:
        - tokens – A list of token (terminal) names.
        - precedence – A list of tuples defining the order of operations to avoid ambiguity, each consisting of a string defining associativity (left, right, or nonassoc) and a list of token names that share that associativity and level of precedence.
        - cache_id – A string specifying an ID for caching.
    error(func)

        Sets the error handler, which is called with the state (if one was passed to the parser) and the token the parser errored on.

        Currently error handlers must raise an exception. If no error handler is defined, a rply.ParsingError will be raised.
    production(rule, precedence=None)

        A decorator that defines a production rule and registers the decorated function to be called with the terminals and non-terminals matched by that rule.

        A rule consists of a name, defining the non-terminal returned by the decorated function, and the sequence of non-terminals and terminals that it replaces:

            replacing_non_terminal : ATERMINAL non_terminal

        The name of the non-terminal replacing the sequence is on the left, separated from the sequence by a colon. The whitespace around the colon is required.
        Knowing this, we can define productions:

            pg = ParserGenerator(['NUMBER', 'ADD'])

            @pg.production('number : NUMBER')
            def expr_number(p):
                return BoxInt(int(p[0].getstr()))

            @pg.production('expr : number ADD number')
            def expr_add(p):
                return BoxInt(p[0].getint() + p[2].getint())
        If a state was passed to the parser, the decorated function is additionally called with that state as first argument.
class rply.ParsingError(message, source_pos)

    Raised by a parser if no production rule can be applied.
    getsourcepos()

        Returns the position in the source at which this error occurred.
class rply.Token(name, value, source_pos=None)

    Represents a syntactically relevant piece of text.

    Parameters:
        - name – A string describing the kind of text represented.
        - value – The actual text represented.
        - source_pos – A SourcePosition object representing the position of the first character in the source from which this token was generated.
    getsourcepos()

        Returns a SourcePosition instance, describing the position of this token's first character in the source.
    getstr()

        Returns the string represented by this token.
    gettokentype()

        Returns the type or name of the token.