Masks¶
In using Rasterio, you’ll encounter two different kinds of masks. One is the the valid data mask from GDAL, an unsigned byte array with the same number of rows and columns as the dataset in which non-zero elements indicate that the corresponding data elements are valid. Other elements are invalid, or nodata elements. The other kind of mask is the mask in Numpy’s [masked arrays](http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html), which have the inverse sense: True values in a masked array’s mask indicate that the corresponding data elements are invalid. With care, you can safely navigate this divide.
Consider Rasterio’s RGB.byte.tif test dataset. It has 718 rows and 791 columns of pixels. Each pixel has 3 8-bit (uint8) channels or bands. It has a trapezoid of image data within a rectangular background of 0,0,0 value pixels.

Metadata in the dataset declares that values of 0 shall be interpreted as invalid data or nodata pixels. In, e.g., merging the image with adjacent scenes, we’d like to ignore the nodata pixels and have only valid image data in our final mosaic.
Let’s use the rio-insp command to look at the two kinds of masks and their inverse relationship in the context of RGB.byte.tif.
$ rio insp tests/data/RGB.byte.tif
Rasterio 0.19.0 Interactive Inspector (Python 2.7.9)
Type "src.meta", "src.read_band(1)", or "help(src)" for more information.
>>> src.shape
(718, 791)
>>> src.count
3
>>> src.dtypes
['uint8', 'uint8', 'uint8']
>>> src.nodatavals
[0.0, 0.0, 0.0]
Reading dataset masks¶
For every band of a dataset there is a mask. These masks can be had as arrays
using the dataset’s read_masks()` method. Below, msk
is the valid data
mask corresponding to the first dataset band.
>>> msk = src.read_masks(1)
>>> msk
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
This array is a valid data mask in the sense of GDAL RFC 15. The 0 values in its
corners represent nodata regions. Zooming in on the interior of the mask
array shows the 255
values that indicate valid data regions.
>>> m[200:250,200:250]
array([[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
...,
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255],
[255, 255, 255, ..., 255, 255, 255]], dtype=uint8)
Displayed using Matplotlib’s imshow(), the mask looks like this:

Wait, what are these 0 values in the mask interior? This is an example of
a problem inherent in 8-bit raster data: lack of dynamic range. The dataset
creator has said that 0 values represent missing data (see the
nodatavals
property in the first code block of this document), but some of
the valid data have values so low they’ve been rounded during processing to
zero. This can very easily happen in scaling 16-bit data to 8 bits. There’s
no magic nodata value bullet for this. Using 16 bits per band helps, but you
really have to be careful with 8-bit per band datasets and their nodata values.
Writing masks¶
Writing a mask that applies to all dataset bands is just as straightforward:
pass an ndarray with True
(or values that evaluate to True
to indicate
valid data and False
to indicate no data to write_mask()
. Consider a
copy of the test data opened using rio-insp in “r+” (update) mode.
$ rio insp copy.tif --mode r+
Rasterio 0.19.0 Interactive Inspector (Python 2.7.9)
Type "src.meta", "src.read_band(1)", or "help(src)" for more information.
>>>
To mark that all pixels of all bands are valid (i.e., to override nodata metadata values that can’t be unset), you’d do this.
No data have been altered, nor have the dataset’s nodata values been changed. A new band has been added to the dataset to store the valid data mask. By default it is saved to a “sidecar” GeoTIFF alongside the dataset file. When such a .msk GeoTIFF exists, Rasterio will ignore the nodata metadata values and return mask arrays based on the .msk file.
$ ls -l copy.tif*
-rw-r--r--@ 1 sean staff 1713704 Mar 24 14:19 copy.tif
-rw-r--r-- 1 sean staff 916 Mar 24 14:25 copy.tif.msk
Can Rasterio help fix buggy nodata masks like the ones in RGB.byte.tif? It certainly can. Consider a fresh copy of that file. This time we’ll read all 3 band masks (based on the nodata values, not a .msk GeoTIFF) and show them as an RGB image (with the help of numpy.dstack()):
$rio insp copy.tif --mode r+
Rasterio 0.19.0 Interactive Inspector (Python 2.7.9)
Type "src.meta", "src.read_band(1)", or "help(src)" for more information.
>>> msk = src.read_masks()
>>> show(np.dstack(msk))

Colored regions appear where valid data pixels don’t quite coincide. This is, again, an artifact of scaling data down to 8 bits per band. We’ll begin by constructing a new mask array from the logical conjunction of the three band masks we’ve read.
>>> new_msk = (msk[0] & msk[1] & msk[2])
>>> show(new_msk)

Now we’ll use sieve() to shake out the small buggy regions of the mask. I’ve
found the right value for the size
argument empirically.
>>> from rasterio.features import sieve
>>> sieved_msk = sieve(new_msk, size=800)
>>> show(sieved_msk)

Last thing to do is write that sieved mask back to the dataset.
>>> src.write_mask(sieved_msk)
The result is a properly masked dataset that allows some 0 value pixels to be considered valid.
Numpy masked arrays¶
If you want, you can read dataset bands as numpy masked arrays.
$ rio insp tests/data/RGB.byte.tif
Rasterio 0.19.0 Interactive Inspector (Python 2.7.9)
Type "src.meta", "src.read_band(1)", or "help(src)" for more information.
>>> blue = src.read(1, masked=True)
>>> blue.mask
array([[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
...,
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True]], dtype=bool)
As mentioned earlier, this mask is the inverse of the GDAL band mask. To get a mask conforming to GDAL RFC 15, simply do this:
>>> msk = (~blue.mask * 255).astype('uint8')
You can rely on this Rasterio identity for any integer value N
.
>>> (~src.read(N, masked=True).mask * 255 == src.read_masks(N)).all()
True