XML sources (and sinks) can be used with pipelines. While XML processing is too general for providing useful general-purpose filters and sinks (ad-hoc stages need to be written), the library provides, though, a xml.etree.ElementTree-based source stage. This page shows this stage, as well as an example of an ad-hoc filter stage.
The source stage dagpype.parse_xml() parses XML files, and emits a stream of xml.etree.ElementTree elements with the events causing their emission:
def parse_xml(stream, events = ('end',)):
"""
Parses XML. Yields a sequence of (event, elem) pairs, where event
is the event for yielding the element (e.g., 'end' for tag end),
and elem is a xml.etree.ElementTree element whose tag and text can be
obtained through elem.tag and elem.text, respectively.
Arguments:
stream -- Either a stream, e.g., as returned by open(), or a name of a file.
Keyword Arguments:
events -- Tuple of xml.etree.ElementTree events (default ('end',))
"""
Filters can be used to process this stream.
Consider the meteo.xml XML file.
Using dagpype.parse_xml() and dagpype.to_list(), here’s a list of all the events and elements:
>>> parse_xml('meteo.xml') | to_list()
[('end', <Element 'day' at 0x20b3ea0>), ('end', <Element 'wind' at 0x20b3f30>),
('end', <Element 'hail' at 0x20b3f90>), ('end', <Element 'rain' at 0x20c9030>),
('end', <Element 'day' at 0x20c9060>), ('end', <Element 'wind' at 0x20c9090>),
('end', <Element 'hail' at 0x20c90c0>), ('end', <Element 'rain' at 0x20c90f0>),
('end', <Element 'day' at 0x20c9120>), ('end', <Element 'wind' at 0x20c9150>),
('end', <Element 'hail' at 0x20c9180>), ('end', <Element 'rain' at 0x20c91b0>),
('end', <Element 'day' at 0x20c91e0>), ('end', <Element 'wind' at 0x20c9210>),
('end', <Element 'hail' at 0x20c9240>), ('end', <Element 'rain' at 0x20c9270>),
('end', <Element 'day' at 0x20c92a0>), ('end', <Element 'wind' at 0x20c92d0>),
('end', <Element 'hail' at 0x20c9300>), ('end', <Element 'rain' at 0x20c9330>),
('end', <Element 'stuff' at 0x20b3ed0>)]
To find the average ‘rain’ values, for example, we can do the following:
>>> parse_xml('meteo.xml') | \
... filter(lambda event, elem : float(elem.text), pre = lambda event, elem: elem.tag == 'rain') | \
... ave()