Datasets and ndarrays

Dataset objects provide read, read-write, and write access to raster data files and are obtained by calling rasterio.open(). That function mimics Python’s built-in open() and the dataset objects it returns mimic Python file objects.

>>> import rasterio
>>> dataset = rasterio.open('tests/data/RGB.byte.tif')
>>> dataset
<open RasterReader name='tests/data/RGB.byte.tif' mode='r'>
>>> dataset.name
'tests/data/RGB.byte.tif'
>>> dataset.mode
'r'
>>> dataset.closed
False

If you attempt to access a nonexistent path, rasterio.open() does the same thing as open(), raising an exception immediately.

>>> open('/lol/wut.tif')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '/lol/wut.tif'
>>> rasterio.open('/lol/wut.tif')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: no such file or directory: '/lol/wut.tif'
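
Because the exception is raised at open time, a missing file can be handled before any reading is attempted. The following is a minimal sketch using only Python's built-in open(). The helper name read_or_none is hypothetical, and the example assumes that the path /lol/wut.tif does not exist on your system. Going by the behavior described above, a call to rasterio.open() could be guarded in the same way.

```python
def read_or_none(path):
    """Hypothetical helper: return the file's bytes, or None if it cannot be opened."""
    try:
        f = open(path, 'rb')
    except IOError:
        # open() fails immediately; nothing has been read at this point.
        return None
    try:
        return f.read()
    finally:
        # Close the file whether or not read() succeeded.
        f.close()


# Assumes this path does not exist.
result = read_or_none('/lol/wut.tif')
print(result)
```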

Attributes

In addition to the file-like attributes shown above, a dataset has a number of other read-only attributes that help explain its role in spatial information systems. The driver attribute gives you the name of the GDAL format driver used. The height and width are the number of rows and columns of the raster dataset, and shape is a (height, width) tuple as used by Numpy. The count attribute tells you the number of bands in the dataset.

>>> dataset.driver
u'GTiff'
>>> dataset.height, dataset.width
(718, 791)
>>> dataset.shape
(718, 791)
>>> dataset.count
3

What makes geospatial raster datasets different from other raster files is that their pixels map to regions of the Earth. A dataset has a coordinate reference system and an affine transformation matrix that maps pixel coordinates to coordinates in that reference system.

>>> dataset.crs
{u'units': u'm', u'no_defs': True, u'ellps': u'WGS84', u'proj': u'utm', u'zone': 18}
>>> dataset.affine
Affine(300.0379266750948, 0.0, 101985.0,
       0.0, -300.041782729805, 2826915.0)

To get the x, y world coordinates for the upper left corner of any pixel, take the product of the affine transformation matrix and the tuple (col, row).

>>> col, row = 0, 0
>>> dataset.affine * (col, row)
(101985.0, 2826915.0)
>>> col, row = dataset.width, dataset.height
>>> dataset.affine * (col, row)
(339315.0, 2611485.0)
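
The product above can also be spelled out by hand. The following is a minimal sketch in plain Python, not Rasterio's implementation. The helper name pixel_to_world is hypothetical, and the six coefficients are copied from the dataset.affine output shown earlier.

```python
# Coefficients (a, b, c, d, e, f) copied from the dataset.affine output.
A, B, C = 300.0379266750948, 0.0, 101985.0
D, E, F = 0.0, -300.041782729805, 2826915.0


def pixel_to_world(col, row):
    """Hypothetical helper: apply the affine transformation to (col, row).

    x = a * col + b * row + c
    y = d * col + e * row + f
    """
    x = A * col + B * row + C
    y = D * col + E * row + F
    return x, y


# Upper left corner of the upper left pixel.
ul = pixel_to_world(0, 0)
# Lower right corner of the dataset: (width, height) = (791, 718).
lr = pixel_to_world(791, 718)
print(ul)
print(lr)
```

The e coefficient is negative because row numbers increase downward while y coordinates in this reference system increase upward.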

Reading data

Datasets generally have one or more bands (or layers). Following the GDAL convention, these are indexed starting with the number 1. The first band of a file can be read like this:

>>> dataset.read_band(1)
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

The returned object is a 2-dimensional Numpy ndarray. The representation of that array at the Python prompt is just a summary; the GeoTIFF file that Rasterio uses for testing has 0 values in the corners, but has nonzero values elsewhere.

http://farm6.staticflickr.com/5032/13938576006_b99b23271b_o_d.png

The indexes, Numpy data types, and nodata values of all of a dataset’s bands are available from its indexes, dtypes, and nodatavals attributes.

>>> for i, dtype, nodataval in zip(dataset.indexes, dataset.dtypes, dataset.nodatavals):
...     print i, dtype, nodataval
...
1 <type 'numpy.uint8'> 0.0
2 <type 'numpy.uint8'> 0.0
3 <type 'numpy.uint8'> 0.0
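
Band indexes start at 1, while Python sequences start at 0, so code that stores bands in a list has to translate between the two. The following is a minimal sketch of that bookkeeping. The bands list and the get_band helper are hypothetical stand-ins that use plain nested lists and are not part of Rasterio.

```python
# Hypothetical in-memory stand-in: three tiny 2 x 2 bands stored in a list.
bands = [
    [[1, 1], [1, 1]],  # band 1
    [[2, 2], [2, 2]],  # band 2
    [[3, 3], [3, 3]],  # band 3
]


def get_band(bidx):
    """Hypothetical helper: look up a band by its 1-based index."""
    if not 1 <= bidx <= len(bands):
        raise IndexError('band index out of range: %r' % (bidx,))
    # Shift the 1-based band index to a 0-based list position.
    return bands[bidx - 1]


# The 1-based indexes of the bands, analogous to the indexes attribute.
indexes = list(range(1, len(bands) + 1))
print(indexes)
print(get_band(1))
```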

To close a dataset, call its close() method.

>>> dataset.close()
>>> dataset
<closed RasterReader name='tests/data/RGB.byte.tif' mode='r'>

After it’s closed, data can no longer be read.

>>> dataset.read_band(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: can't read closed raster file

This is the same behavior as that of Python’s file objects.

>>> f = open('README.rst')
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file

Like Python file objects, Rasterio datasets can manage the entry into and exit from runtime contexts created using a with statement. This ensures that files are closed no matter what exceptions may be raised within the block.

>>> with rasterio.open('tests/data/RGB.byte.tif', 'r') as one:
...     with rasterio.open('tests/data/RGB.byte.tif', 'r') as two:
...         print two
...     print one
...     print two
...
<open RasterReader name='tests/data/RGB.byte.tif' mode='r'>
<open RasterReader name='tests/data/RGB.byte.tif' mode='r'>
<closed RasterReader name='tests/data/RGB.byte.tif' mode='r'>
>>> print one
<closed RasterReader name='tests/data/RGB.byte.tif' mode='r'>
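
The guarantee that a with statement closes its object even when an exception is raised comes from Python's context manager protocol. The following is a minimal sketch of that protocol. The FakeDataset class is a hypothetical stand-in and does not show how Rasterio implements its datasets.

```python
class FakeDataset(object):
    """Hypothetical stand-in for a dataset; not part of Rasterio."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # The returned object is the one bound by "as" in the with statement.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when an exception propagates.
        self.close()
        # Returning False lets any exception continue to propagate.
        return False


try:
    with FakeDataset('example.tif') as ds:
        raise RuntimeError('something went wrong inside the block')
except RuntimeError:
    pass

# The object was closed even though the block raised.
print(ds.closed)
```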

Writing data

Opening a file in writing mode is a little more complicated than opening a text file in Python. The dimensions of the raster dataset, the data types, and the specific format must be specified.

>>> with rasterio.open

Writing data mostly works as with a Python file. There are a few format-specific differences. TODO: details.