Skip to content
Advertisement

Calculate mean value for each pixel of a sum of Xarray DataArrays

I am trying to calculate a fog frequency map based on a number of geoTIFFs that I have read as Xarray DataArrays using the rioxarray.open_rasterio function in Python 3.10. Each “pixel” can have one of the following values: 1 = fog, 0 = no fog, -9999 = no data. The end goal is to calculate a new DataArray that contains the ratio of the number “fog” pixels to the number of pixel with either “fog” or “no fog”.

For this I want to write a for-loop that creates the sum of “fog” and “no_fog” entries per pixel while excluding the “no data” pixels. Then it should divide the pixel values of the sum DataArray by the number of pixels that were used in the calculation of each individual sum. So, if for a single pixel there are the following values: 0, 1, 1, -9999, 0, and -9999 the loop should create a sum of 2 and divide it by 4, creating a fog frequency of 0.5 or 50%.

So far, I have only been able to calculate the sum of all input DataArrays, without excluding the “no data” pixels using this code:

JavaScript

I tried to exclude the “no data” values using fog_map = fog_map.where(fog_map >= 0), but this left me with a fog_sum geoTIFF, where each pixel had the value 1.79769e+308

This is an example of what the output of a fog_map_YYYYMMDD.tif DataArray looks like, before applying the fog_map.where(fog_map >= 0) function:

JavaScript

And after applying the function:

JavaScript

Any help is greatly appreciated!

Advertisement

Answer

If your data is this small it’s probably fastest and easiest to concat all the data, drop the -9999 values using the da.where() code you suggest, then just take the mean over the concatenated dimension:

JavaScript

Your data is already float (though maybe it shouldn’t be?) so replacing -9999 with np.nan (a float) isn’t hurting you.

But if you’re facing memory constraints and want to accumulate the stats iteratively I’d use xarray.where(da >= 0, 1, 0) to accumulate your denominator:

JavaScript

xr.where is like xr.DataArray.where but you define the value returned if true explicitly.

As an aside, the value you were seeing, 1.797e308, is the largest value that can be held by a float64. So you’re probably just looking at the GeoTiff encoding of np.inf after a divide by zero problem. Make sure to mask the data where your denominator is zero and handle it the way you’re intending for locations where there is never any valid data.

Advertisement