Datasets and ndarrays
Dataset objects provide read, read-write, and write access to raster data files
and are obtained by calling rasterio.open(). That function mimics Python’s
built-in open(), and the dataset objects it returns mimic Python file objects.
>>> import rasterio
>>> dataset = rasterio.open('tests/data/RGB.byte.tif')
>>> dataset
<open RasterReader name='tests/data/RGB.byte.tif' mode='r'>
>>> dataset.name
'tests/data/RGB.byte.tif'
>>> dataset.mode
'r'
>>> dataset.closed
False
If you attempt to access a nonexistent path, rasterio.open() does the same
thing as open(), raising an exception immediately.
>>> open('/lol/wut.tif')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '/lol/wut.tif'
>>> rasterio.open('/lol/wut.tif')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: no such file or directory: '/lol/wut.tif'
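Because both functions fail at open time, a missing path can be handled with an ordinary try/except around the call. The following is a minimal sketch in Python 3 syntax that uses only the built-in open(); the helper name try_open is hypothetical, and the exception class raised by rasterio.open() may differ between Rasterio versions, so check the version you use.

```python
def try_open(path):
    """Return an open file object, or None if the path cannot be opened."""
    try:
        return open(path)
    except IOError as exc:  # IOError is an alias of OSError in Python 3
        print("could not open %s: %s" % (path, exc))
        return None


# The path from the session above does not exist, so None comes back.
result = try_open('/lol/wut.tif')
```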
Attributes
In addition to the file-like attributes shown above, a dataset has a number
of other read-only attributes that help explain its role in spatial information
systems. The driver attribute gives you the name of the GDAL format driver
used. The height and width are the number of rows and columns of the raster
dataset, and shape is a (height, width) tuple as used by Numpy. The count
attribute tells you the number of bands in the dataset.
>>> dataset.driver
u'GTiff'
>>> dataset.height, dataset.width
(718, 791)
>>> dataset.shape
(718, 791)
>>> dataset.count
3
What makes geospatial raster datasets different from other raster files is that their pixels map to regions of the Earth. A dataset has a coordinate reference system and an affine transformation matrix that maps pixel coordinates to coordinates in that reference system.
>>> dataset.crs
{u'units': u'm', u'no_defs': True, u'ellps': u'WGS84', u'proj': u'utm', u'zone': 18}
>>> dataset.affine
Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0)
To get the x, y world coordinates for the upper left corner of any pixel,
take the product of the affine transformation matrix and the tuple (col, row).
>>> col, row = 0, 0
>>> dataset.affine * (col, row)
(101985.0, 2826915.0)
>>> col, row = dataset.width, dataset.height
>>> dataset.affine * (col, row)
(339315.0, 2611485.0)
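The product above is ordinary affine arithmetic. As a minimal sketch in plain Python, assuming the six coefficients are ordered (a, b, c, d, e, f) as they appear in the Affine output shown earlier, the mapping is x = a*col + b*row + c and y = d*col + e*row + f. The function name pixel_to_world is hypothetical and is not part of Rasterio.

```python
def pixel_to_world(coeffs, col, row):
    """Map a (col, row) pixel position to (x, y) world coordinates."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Coefficients copied from the dataset.affine output above.
coeffs = (300.0379266750948, 0.0, 101985.0,
          0.0, -300.041782729805, 2826915.0)

upper_left = pixel_to_world(coeffs, 0, 0)       # corner of the first pixel
lower_right = pixel_to_world(coeffs, 791, 718)  # (width, height) of the dataset
```

Passing (width, height) gives the lower right corner of the whole dataset, since that pixel position lies one step past the last column and row.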
Reading data
Datasets generally have one or more bands (or layers). Following the GDAL convention, these are indexed starting with the number 1. The first band of a file can be read like this:
>>> dataset.read_band(1)
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
The returned object is a 2-dimensional Numpy ndarray. The representation of that array at the Python prompt is just a summary; the GeoTIFF file that Rasterio uses for testing has 0 values in the corners, but has nonzero values elsewhere.
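Since band numbers start at 1 while Python sequences start at 0, code that keeps per-band values in a list or tuple has to shift the index by one. A minimal sketch in Python 3 syntax, with a hypothetical helper name and made-up band labels:

```python
def value_for_band(values, bidx):
    """Look up the entry for 1-based band index `bidx` in a 0-based sequence."""
    if not 1 <= bidx <= len(values):
        raise IndexError("band index %d out of range" % bidx)
    return values[bidx - 1]


labels = ('red', 'green', 'blue')  # illustrative labels, not read from a file
first = value_for_band(labels, 1)  # band 1 maps to position 0
```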

The indexes, Numpy data types, and nodata values of all of a dataset’s bands
are available from its indexes, dtypes, and nodatavals attributes.
>>> for i, dtype, ndval in zip(dataset.indexes, dataset.dtypes, dataset.nodatavals):
...     print i, dtype, ndval
...
1 <type 'numpy.uint8'> 0.0
2 <type 'numpy.uint8'> 0.0
3 <type 'numpy.uint8'> 0.0
To close a dataset, call its close() method.
>>> dataset.close()
>>> dataset
<closed RasterReader name='tests/data/RGB.byte.tif' mode='r'>
After it’s closed, data can no longer be read.
>>> dataset.read_band(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: can't read closed raster file
This is the same behavior as Python’s file.
>>> f = open('README.rst')
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file
As Python file objects can, Rasterio datasets can manage the entry into
and exit from runtime contexts created using a with statement. This
ensures that files are closed no matter what exceptions may be raised within
the block.
>>> with rasterio.open('tests/data/RGB.byte.tif', 'r') as one:
...     with rasterio.open('tests/data/RGB.byte.tif', 'r') as two:
...         print two
...     print one
...     print two
...
<open RasterReader name='tests/data/RGB.byte.tif' mode='r'>
<open RasterReader name='tests/data/RGB.byte.tif' mode='r'>
<closed RasterReader name='tests/data/RGB.byte.tif' mode='r'>
>>> print one
<closed RasterReader name='tests/data/RGB.byte.tif' mode='r'>
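The guarantee described above, that the object is closed even when the block raises, can be checked with an ordinary Python file. This is a minimal sketch in Python 3 syntax using only the standard library; the temporary file is a stand-in for a raster dataset, and Rasterio itself is not involved.

```python
import os
import tempfile

# Create a small scratch file to open; it stands in for a dataset.
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path) as f:
        # Simulate a failure in the middle of the block.
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass

# The with statement closed the file before the exception propagated.
closed_after_error = f.closed

os.remove(path)  # clean up the scratch file
```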
Writing data
Opening a file in writing mode is a little more complicated than opening a text file in Python. The dimensions of the raster dataset, the data types, and the specific format must be specified.
>>> with rasterio.open
Writing data mostly works as with a Python file. There are a few format-specific differences. TODO: details.