Skip to content
Advertisement

Dealing with masked coordinate arrays in pcolormesh

I’m working on visualizing some climate model output. The computation is done on a projected latitude/longitude grid. Since the model is simulating sea ice, all land grid cells are masked. The standard tools for plotting geographical information in Python are Basemap and Cartopy, both of which use matplotlib routines. In particular, pcolormesh is the obvious choice for plotting. If there was no land mask, it would be simple:

X = longitude
Y = latitude
C = variable

fig, ax = plt.subplots()
plt.pcolormesh(X,Y,C)

Although C is allowed to be a masked array, pcolormesh can’t handle masked arrays on X and Y. So how can I get around this?

For a simple example, take:

n = 100
X,Y = np.meshgrid(np.linspace(1,5,n),np.linspace(1,5,n))
C = np.sin(X*Y)
fig, ax = plt.subplots()
plt.pcolormesh(X,Y,C)

color plot of C

Now imagine we have a mask:

X[50:60,:] = np.nan
X[:,50:60] = np.nan
Y[50:60,:] = np.nan
Y[:,50:60] = np.nan
C[50:60,:] = np.nan
C[:,50:60] = np.nan

The first idea I had to get around this was to just select the valid entries and reshape X, Y, and C:

M = np.isnan(X)
X_valid = X[~M]
Y_valid = Y[~M]
C_valid = C[~M]
X_valid.shape = (81,100)
Y_valid.shape = (81,100)
C_valid.shape = (81,100)
plt.pcolormesh(X_valid, Y_valid, C_valid)

color sin plot, screwy Like many naive approaches, this doesn’t work.

Ideally the resulting plot would just be blank where the mask was. How can this be done?

Advertisement

Answer

I see two issues with your “naive” approaches.

Firstly, you generally shouldn’t set the coordinate arrays X and Y to nan, only the value of the function to plot. Most plotting functions (both matplotlib and others) automatically treat these as missing values, drawing blanks in their stead (setting the coordinates to nan, on the other hand, might interfere with internal routines involving interpolation and whatnot).

However, this still won’t work with pcolor(mesh). But that’s OK, since I also disagree with your statement that it “is the obvious choice for plotting”. In my opinion, pcolor(mesh) is mostly suitable for plotting matrices. For non-trivial plots like yours, something like plt.contourf should work wonders. It also inherently includes interpolation, making your plot prettier. It also treats nan data points as we’d expect it:

n = 100
X, Y = np.meshgrid(np.linspace(1, 5, n), np.linspace(1, 5, n))
C = np.sin(X*Y)
C[50:60,:] = np.nan
C[:,50:60] = np.nan

fig, ax = plt.subplots()
n_levels = 100  # number of contour levels to plot
ax.contourf(X, Y, C, n_levels)

Results before (left) and after (right) of masking:

before after

Note that contourf stands for “filled contour plot”, and works by computing level curves of your input data set. This means that in order to obtain a smooth and pretty plot, you need dense contour lines, this is why I chose 100 lines to plot. For your specific case, you should consider explicitly defining the level values using the levels keyword argument.


Update:

In comments you clarified that your data set is a given, so you have to treat the missing values in X and Y too. This is hard, since your input mesh has holes in it, and you can only make up for this if you have a very precise idea of how the problem looks like.

In your example, there are full regions missing from the coordinates along each dimension. This is the best scenario, since the remaining data points could’ve been generated by a meshgrid call, only with smaller coordinate vectors along each dimension.

In this very simple case, an easy remedy is what you yourself tried: throw away the nan values. You almost had it right, but if you take an array of shape (100,100) and cut out 10-10 from each dimension, you end up with an array of shape (90,90) rather than (81,100). This is why your figure looked so jumpy. If you do it with a proper shape, the result is much better:

n = 100
X, Y = np.meshgrid(np.linspace(1, 5, n), np.linspace(1, 5, n))
C = np.sin(X*Y)
X[50:60, :] = np.nan
X[:, 50:60] = np.nan
Y[50:60, :] = np.nan
Y[:, 50:60] = np.nan
C[50:60, :] = np.nan
C[:, 50:60] = np.nan

endshape = (90, 90)  # needs to be known a priori!

inds = np.logical_not(np.isnan(X) | np.isnan(Y) | np.isnan(C))
X_plot = np.reshape(X[inds], endshape)
Y_plot = np.reshape(Y[inds], endshape)
C_plot = np.reshape(C[inds], endshape)

fig, ax = plt.subplots()
n_levels = 100  # number of contour levels to plot
ax.contourf(X_plot, Y_plot, C_plot, n_levels)

result from naive method

The result is clearly off near the missing data: interpolation performed by contourf (or pcolormesh if you use that) will try to fill in the gaps, distorting your data. You might consider manually plotting a white patch over the missing data points, but still then you’ll get some distortion along the edges. And note that we had to know ourselves how the missing points were distributed.

For a more fool-proof and general solution, I’d try to guess the underlying mesh. By this I mean that you should take every unique value that occurs in X and Y, and reconstruct your function over this full mesh. This is based on the much weaker assumption that the original data was located on a rectangular grid, but no other assumptions are needed. This will not be helpful in your particular case when full bands are missing from the data, but they will help if you have patches of nans in the data. So I’m giving a solution for this case as well.

Here’s an implementation using scipy.interpolate.griddata to reconstruct the mesh (using interpolation might be overkill, especially since we’re throwing away part of the result, but the other option is to loop over the full dataset, and I don’t feel like doing that):

import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as interp

n = 100
X, Y = np.meshgrid(np.linspace(1, 5, n), np.linspace(1, 5, n))
C = np.sin(X*Y)

# poke a hole into the data
X[40:60, 40:60] = np.nan
Y[40:60, 40:60] = np.nan
C[40:60, 40:60] = np.nan

# indices where nobody is nan
inds = np.logical_not(np.isnan(X) | np.isnan(Y) | np.isnan(C))
X_notnan = X[inds]
Y_notnan = Y[inds]
C_notnan = C[inds]

# construct new mesh
X_vals = np.unique(X[inds])
Y_vals = np.unique(Y[inds])
X_plot, Y_plot = np.meshgrid(X_vals, Y_vals)

# use nearest-neighbour interpolation to match the two meshes
C_plot = interp.griddata(np.array([X_notnan, Y_notnan]).T, C_notnan,
                         (X_plot, Y_plot), method='nearest')
# fill in the nans in C
C_plot[np.logical_not(inds)] = np.nan

fig, ax = plt.subplots()
n_levels = 100  # number of contour levels to plot
ax.contourf(X_plot, Y_plot, C_plot, n_levels)

This solution will break if the reduced meshes without nans have smaller dimensions than the originals, i.e. if there are full rows or columns of nans in the data. However, if this is not the case, then it will give you a pretty result like this:

final result

This also means that if you guess the X and Y values in your original problem along a single line, for instance by knowing that both meshes are equidistant, then you can fix the first row of X and the first column of Y, and use the latest code above: it should generate the full mesh for you, producing a result similar to the first figure in this post.

Advertisement