Small, Fast S-Expression Library
API for a small, fast and portable s-expression parser library.
#include <stddef.h>
#include <stdio.h>
#include "faststack.h"
#include "cstring.h"
#include "sexp_memory.h"
#include "sexp_errors.h"
#include "sexp_ops.h"
typedef struct parser_event_handlers parser_event_handlers_t
Some users prefer an XML SAX-style parser, where events are triggered as certain parts of the s-expression are encountered, over parsing a full string and walking a potentially huge sexp_t structure. This structure contains a set of function pointers that the parser calls when it reaches the start or end of an expression and when it finishes reading an atom or binary data. NOTE: The parser_event_handlers struct that is a field in the continuation data structure is NOT freed by destroy_continuation, since structs for callbacks are ALWAYS malloc'd by the user, not the library.
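The event-driven style described above can be illustrated with a minimal, stand-alone sketch. It does not use the library: the demo_handlers_t struct, its field names, and demo_scan are hypothetical and only mimic the idea of a table of function pointers fired at expression start, expression end, and atom completion. Quoted strings and binary data are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical callback table. The field names are illustrative and are
   NOT taken from parser_event_handlers_t. */
typedef struct demo_handlers {
    void (*start_sexpr)(void *user);
    void (*end_sexpr)(void *user);
    void (*atom)(const char *text, size_t len, void *user);
} demo_handlers_t;

/* Walk the input once, firing callbacks instead of building a tree.
   Only parentheses and unquoted atoms are recognized. Returns 0 when the
   parentheses balance and -1 otherwise. */
int demo_scan(const char *s, const demo_handlers_t *h, void *user)
{
    int depth = 0;

    assert(s != NULL && h != NULL);
    while (*s != '\0') {
        if (*s == '(') {
            depth++;
            if (h->start_sexpr) h->start_sexpr(user);
            s++;
        } else if (*s == ')') {
            if (depth == 0) return -1;      /* stray closing paren */
            depth--;
            if (h->end_sexpr) h->end_sexpr(user);
            s++;
        } else if (isspace((unsigned char)*s)) {
            s++;
        } else {
            const char *start = s;          /* beginning of an atom */
            while (*s != '\0' && *s != '(' && *s != ')' &&
                   !isspace((unsigned char)*s))
                s++;
            if (h->atom) h->atom(start, (size_t)(s - start), user);
        }
    }
    return depth == 0 ? 0 : -1;
}

/* Example handlers that only count the events they receive. */
typedef struct demo_counts { int starts, ends, atoms; } demo_counts_t;

void demo_on_start(void *u) { ((demo_counts_t *)u)->starts++; }
void demo_on_end(void *u)   { ((demo_counts_t *)u)->ends++; }
void demo_on_atom(const char *t, size_t n, void *u)
{
    (void)t; (void)n;
    ((demo_counts_t *)u)->atoms++;
}
```

In this sketch the caller owns both the handler table and the user data, which mirrors the note above that callback structs belong to the user rather than to the library.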
A continuation is used by the parser to save and restore state between invocations to support partial parsing of strings. For example, if we pass the string "(foo bar)(goo car)" to the parser, we want to be able to retrieve each s-expression one at a time - it would be difficult to return all s-expressions at once without knowing how many there are in advance (this would require more memory management than we want...). So, by using a continuation-based parser, we can call it with this string and have it return a continuation when it has parsed the first s-expression. Once we have processed the s-expression (accessible through the last_sexpr field of the continuation), we can call the parser again with the same string and continuation, and it will be able to pick up where it left off.
We use continuations instead of a stateful parser so that multiple strings can be parsed concurrently by simply maintaining a set of continuations. Manipulating continuations by hand is required if the continuation-based parser is called directly. This is not recommended unless you are willing to deal with potential errors and to learn exactly how the continuation relates to the internals of the parser. A simpler approach is to use either the parse_sexp function, which simply returns an s-expression without exposing the continuations, or the iparse_sexp function, which allows iteratively popping one s-expression at a time from a string containing one or more s-expressions. Refer to the documentation for each parsing function for further details on behavior and usage.
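The resume-where-we-left-off idea can be sketched without the library. In the stand-alone example below, demo_cont_t and demo_next are hypothetical: the saved state is only a resume offset, whereas the real pcont_t carries far more (for example the last_sexpr field mentioned above). Quoted atoms are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical saved state: just the offset at which to resume. */
typedef struct demo_cont {
    size_t pos;
} demo_cont_t;

/* Find the next complete top-level "( ... )" in s, starting where the
   previous call stopped. On success, store its start and length, advance
   the saved offset, and return 1. Return 0 when no complete expression
   remains; the saved offset is left unchanged in that case. */
int demo_next(const char *s, demo_cont_t *c, const char **start, size_t *len)
{
    size_t i, begin = 0;
    int depth = 0;

    assert(s != NULL && c != NULL && start != NULL && len != NULL);
    for (i = c->pos; s[i] != '\0'; i++) {
        if (s[i] == '(') {
            if (depth == 0) begin = i;      /* top-level expression opens */
            depth++;
        } else if (s[i] == ')' && depth > 0) {
            depth--;
            if (depth == 0) {               /* top-level expression closes */
                *start = s + begin;
                *len = i - begin + 1;
                c->pos = i + 1;             /* resume after this expression */
                return 1;
            }
        }
    }
    return 0;
}
```

Because all state lives in the demo_cont_t passed by the caller, several strings can be scanned in an interleaved fashion simply by keeping one such struct per string.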
An s-expression is represented as a linked structure of elements, where each element is either an atom or list. An atom corresponds to a string, while a list corresponds to an s-expression. The following grammar represents our definition of an s-expression:
sexpr  ::= ( sx )
sx     ::= atom sxtail | sexpr sxtail | 'sexpr sxtail | 'atom sxtail | NULL
sxtail ::= sx | NULL
atom   ::= quoted | value
quoted ::= "ws_string"
value  ::= nws_string
An atom can be either a quoted string, which is a string (possibly containing whitespace) surrounded by double quotes, or a non-whitespace string that does not require surrounding quotes. An element representing an atom will have a type of value, with its data stored in the val field. An element of type list represents an s-expression corresponding to sexpr in the grammar, and will have a pointer to the head of the appropriate s-expression. Details regarding these fields and their values are given with the fields themselves. Notice that a single quote can appear directly before an s-expression or atom, similar to its use in LISP.
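As an illustration of the linked structure described above, here is a stand-alone sketch of how such elements could be laid out and traversed. The demo_sexp_t type is hypothetical: val and list follow the field names used in this text, while ty and next are assumptions made for the example and may not match sexp_t.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_VALUE, DEMO_LIST } demo_elt_t;

/* Hypothetical element layout, not the library's sexp_t. */
typedef struct demo_sexp {
    demo_elt_t ty;              /* assumed name for the element type  */
    const char *val;            /* atom text when ty == DEMO_VALUE    */
    struct demo_sexp *list;     /* head of nested s-expression        */
    struct demo_sexp *next;     /* assumed name: next sibling or NULL */
} demo_sexp_t;

/* Count the atoms reachable from e, including its siblings and every
   nested list. */
size_t demo_count_atoms(const demo_sexp_t *e)
{
    size_t n = 0;

    for (; e != NULL; e = e->next) {
        if (e->ty == DEMO_VALUE) {
            n++;
        } else {
            assert(e->ty == DEMO_LIST);
            n += demo_count_atoms(e->list);
        }
    }
    return n;
}
```

Under this layout, the expression (foo (bar baz)) is one list element whose list pointer leads to foo, whose next pointer leads to a second list element holding bar and baz.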
enum atom_t
For an element that represents a value, the value can be interpreted as a more specific type. A basic value is a simple string with no whitespace (and therefore no quotes required). A double quote value, or dquote, is one that contains characters (such as whitespace) that require quotation marks to contain the string. A single quote value, or squote, represents an element that is prefaced with a single tick mark. This can be either an atom or an s-expression, and the result is that the parser does not attempt to parse the element following the tick mark. It is simply stored as text. This is similar to the meaning of a tick mark in the Scheme or LISP family of programming languages. Finally, binary allows raw binary data to be stored within an atom. Note that if the binary type is used, the data is stored in bindata with the length in binlength. Otherwise, the data is stored in the val field, with val_used and val_allocated tracking the size of the value string and the total memory allocated for it.
enum elt_t
An element in an s-expression can be one of two types. A value represents an atom with an associated text value. A list represents an s-expression, and the element contains a pointer to the head element of the associated s-expression.
enum parsermode_t
Parser mode flag used by the continuation to toggle special parser behaviour.
void destroy_continuation(pcont_t *pc)
Destroy a continuation. This involves cleaning up what it contains, and cleaning up the continuation itself.
void destroy_sexp(sexp_t *s)
Given a sexp_t structure, free the memory it uses (and recursively free the memory used by all sexp_t structures that it references). Note that this will call the deallocation routine for sexp_t elements. This means that the memory isn't actually freed, but stored away in a cache of pre-allocated elements. This is an optimization that speeds up the parser by eliminating wasteful free and re-malloc calls. Note: If using inlined binary mode, this will free the data pointed to by the bindata field. So, if you care about the data after the lifetime of the s-expression, make sure to make a copy before cleaning up the sexpr.
pcont_t *init_continuation(char *str)
Create an initial continuation for parsing the given string.
sexp_t *new_sexp_atom(const char *buf, size_t bs, atom_t aty)
Allocate a new sexp_t element representing a value. The user must specify the precise type of the atom. This used to default to SEXP_BASIC, but this can lead to errors if the user did not expect this assumption. By explicitly passing in the atom type, the caller should ensure that the data in the buffer is valid given the requested atom type. For performance reasons, such checks are left to the caller if they are desired, and not performed in the library if they are not wanted.
sexp_t *new_sexp_binary_atom(char *data, size_t binlength)
Allocate a new sexp_t element representing a raw binary atom. This element will contain a pointer to the raw binary data provided, as well as the binary data length. The character atom fields will be NULL and the corresponding val length and allocation size will be set to zero since this element is carrying a binary pointer only.
sexp_t *new_sexp_list(sexp_t *l)
Allocate a new sexp_t element representing a list.
void print_pcont(pcont_t *pc, char *buf, size_t buflen)
Print the contents of the parser continuation stack to a buffer. This is useful if an expression is partially parsed and the caller realizes that something is wrong with it. With this routine, the caller can reconstruct the expression parsed so far and use it for error reporting. This works with fixed-size buffers allocated by the caller. There is currently no CSTRING-based version.
int print_sexp(char *loc, size_t size, const sexp_t *e)
Print a sexp_t struct as a string in the LISP style. If the buffer is large enough and the conversion is successful, the return value represents the length of the string contained in the buffer. If the buffer was too small, or some other error occurred, the return value is -1 and the contents of the buffer should not be assumed to contain any useful information. When the return value is -1, the caller should check the contents of sexp_errno for details on what error may have occurred.
int print_sexp_cstr(CSTRING **s, const sexp_t *e, size_t ss)
Print a sexp_t structure to a buffer, growing it as necessary instead of relying on fixed-size buffers like print_sexp. An important argument to tune for performance is ss, the buffer start size. The growsize used by the CSTRING routines should also be considered for tuning via the sgrowsize() function. This routine no longer requires the user to specify the growsize, and uses the current setting without changing it.
void reset_sexp_errno()
Reset the value of sexp_errno to SEXP_ERR_OK.
void sexp_cleanup(void)
In the event that someone wants the library to release ALL of the memory it keeps cached between calls, they can call this function to free it. If you don't call this, the caches will persist for the lifetime of the library user. Note that in the event of an error condition resulting in sexp_errno being set, the user might consider calling this to clean up any lingering memory.
sexp_t *sexp_t_allocate(void)
Return an allocated sexp_t. This structure may be an already allocated one from the stack or a new one if none are available. Use this instead of manually mallocing if you want to avoid excessive mallocs. Note: Mallocing your own expressions is fine - you can even use sexp_t_deallocate to deallocate them and put them in the pool. Also, if the stack has not been initialized yet, this does so.
void sexp_t_deallocate(sexp_t *s)
Given a malloc'd sexp_t element, put it back into the already-allocated element stack. This method will allocate a stack if one has not been allocated already.
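The caching behaviour of the allocate, deallocate, and cleanup routines can be illustrated with a stand-alone sketch. It is not the library's implementation: the library describes a stack of pre-allocated elements, while this example substitutes a simple intrusive free list, and demo_node_t, demo_allocate, demo_deallocate, and demo_cleanup are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pooled element; next_free links cached elements. */
typedef struct demo_node {
    struct demo_node *next_free;
    int payload;
} demo_node_t;

/* Cache of elements that were released but not returned to the heap. */
static demo_node_t *demo_free_list = NULL;

/* Reuse a cached element when one exists, otherwise fall back to malloc.
   Returns NULL only if malloc fails. */
demo_node_t *demo_allocate(void)
{
    demo_node_t *n = demo_free_list;

    if (n != NULL)
        demo_free_list = n->next_free;
    else
        n = malloc(sizeof *n);
    if (n != NULL) {
        n->next_free = NULL;
        n->payload = 0;
    }
    return n;
}

/* Push the element onto the cache instead of calling free. */
void demo_deallocate(demo_node_t *n)
{
    assert(n != NULL);
    n->next_free = demo_free_list;
    demo_free_list = n;
}

/* Return every cached element to the heap. */
void demo_cleanup(void)
{
    while (demo_free_list != NULL) {
        demo_node_t *n = demo_free_list;
        demo_free_list = n->next_free;
        free(n);
    }
}
```

The trade-off shown here is the same one the text describes: released elements stay resident until an explicit cleanup call, in exchange for avoiding repeated malloc and free calls during parsing.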
Global value indicating the most recent error condition encountered. This value can be reset to SEXP_ERR_OK by calling reset_sexp_errno().