I am trying to calculate a fog frequency map based on a number of geoTIFFs that I have read as Xarray DataArrays using the rioxarray.open_rasterio
function in Python 3.10. Each “pixel” can have one of the following values: 1 = fog, 0 = no fog, -9999 = no data. The end goal is to calculate a new DataArray that contains the ratio of the number “fog” pixels to the number of pixel with either “fog” or “no fog”.
For this I want to write a for-loop that creates the sum of “fog” and “no_fog” entries per pixel while excluding the “no data” pixels. Then it should divide the pixel values of the sum DataArray by the number of pixels that were used in the calculation of each individual sum. So, if for a single pixel there are the following values: 0, 1, 1, -9999, 0, and -9999 the loop should create a sum of 2 and divide it by 4, creating a fog frequency of 0.5 or 50%.
So far, I have only been able to calculate the sum of all input DataArrays, without excluding the “no data” pixels using this code:
# open all fog maps and create a list: folder = "E:/Jasper/Studium/BA_Thesis/MODIS_data/MODIS_2021_data/2021_06/fog_frequency" list_of_maps = glob.glob(folder + '/fog_map*.tif', recursive=True) # all files that start with "fog_map" # make list with all different filenames (dates) in this folder: maps = [] # initialize empty list for all file names for i in range(0, np.size(list_of_maps)): # files naming convention "fog_map_YYYYMMDD_HHMMSS.tif": maps.append(list_of_maps[i].split('fog_map_')[1][0:8]) # find out how many dates are in the folder: maps = np.unique(maps) # remove duplicates from array print(maps) print('ndata from {} different dates in this foldern'.format(np.size(maps))) # create fog_sum xarray dataArray to have something to start out with and later subtract it again: fog_sum = rioxarray.open_rasterio("E:/Jasper/Studium/BA_Thesis/MODIS_data/MODIS_2021_data/2021_06/fog_frequency/fog_map_20210601.tif") fog_sum_subtract = rioxarray.open_rasterio("E:/Jasper/Studium/BA_Thesis/MODIS_data/MODIS_2021_data/2021_06/fog_frequency/fog_map_20210601.tif") # add all fog maps: for i in range(0, np.size(list_of_maps)): # open data sets: fog_map = rioxarray.open_rasterio(list_of_maps[i], engine='rasterio') # fog_map = fog_map.where(fog_map >= 0) fog_sum = fog_sum + fog_map # subtract original fog map and export as geoTIFF: fog_sum = fog_sum - fog_sum_subtract fog_sum.rio.to_raster("E:/Jasper/Studium/BA_Thesis/MODIS_data/MODIS_2021_data/2021_06/fog_frequency/fog_sum.tif", driver="GTiff")
I tried to exclude the “no data” values using fog_map = fog_map.where(fog_map >= 0)
, but this left me with a fog_sum geoTIFF, where each pixel had the value 1.79769e+308
This is an example of what the output of a fog_map_YYYYMMDD.tif
DataArray looks like, before applying the fog_map.where(fog_map >= 0)
function:
<xarray.DataArray (band: 1, y: 412, x: 388)> [159856 values with dtype=float32] Coordinates: * band (band) int32 1 * x (x) float64 -92.49 -92.49 -92.48 ... -89.02 -89.02 -89.01 * y (y) float64 2.0 1.991 1.982 1.973 ... -1.674 -1.683 -1.692 spatial_ref int32 0 Attributes: STATISTICS_MAXIMUM: 1 STATISTICS_MEAN: -858.62379891903 STATISTICS_MINIMUM: -9999 STATISTICS_STDDEV: 2801.4551987932 STATISTICS_VALID_PERCENT: 100 scale_factor: 1.0 add_offset: 0.0
And after applying the function:
<xarray.DataArray (band: 1, y: 412, x: 388)> array([[[ 0., 0., 0., ..., 0., 0., 0.], [ 0., 0., 0., ..., 0., nan, 0.], [ 0., 0., 0., ..., 0., 0., 0.], ..., [ 0., 0., 0., ..., nan, nan, nan], [ 0., 0., 0., ..., nan, nan, nan], [ 0., 0., 0., ..., nan, nan, nan]]], dtype=float32) Coordinates: * band (band) int32 1 * x (x) float64 -92.49 -92.49 -92.48 ... -89.02 -89.02 -89.01 * y (y) float64 2.0 1.991 1.982 1.973 ... -1.674 -1.683 -1.692 spatial_ref int32 0 Attributes: STATISTICS_MAXIMUM: 1 STATISTICS_MEAN: -858.62379891903 STATISTICS_MINIMUM: -9999 STATISTICS_STDDEV: 2801.4551987932 STATISTICS_VALID_PERCENT: 100 scale_factor: 1.0 add_offset: 0.0
Any help is greatly appreciated!
Advertisement
Answer
If your data is this small it’s probably fastest and easiest to concat all the data, drop the -9999 values using the da.where()
code you suggest, then just take the mean over the concatenated dimension:
fog_maps = [] for i in range(0, np.size(list_of_maps)): # open data sets: fog_map = rioxarray.open_rasterio(list_of_maps[i], engine='rasterio') fog_maps.append(fog_map) all_fog = xr.concat(fog_maps, dim="file") all_fog = all_fog.where(all_fog >= 0) mean_fog = all_fog.mean(dim="file")
Your data is already float (though maybe it shouldn’t be?) so replacing -9999 with np.nan (a float) isn’t hurting you.
But if you’re facing memory constraints and want to accumulate the stats iteratively I’d use xarray.where(da >= 0, 1, 0)
to accumulate your denominator:
# add all fog maps: data_count = 0 fog_sum = 0 for i in range(0, np.size(list_of_maps)): # open data sets: fog_map = rioxarray.open_rasterio(list_of_maps[i], engine='rasterio') data_count += xr.where(fog_map >= 0), 1, 0) fog_sum += fog_map.where(fog_map > 0, 0) fog_mean = (fog_sum / data_count).where(data_count > 0)
xr.where
is like xr.DataArray.where
but you define the value returned if true explicitly.
As an aside, the value you were seeing, 1.797e308
, is the largest value that can be held by a float64
. So you’re probably just looking at the GeoTiff encoding of np.inf
after a divide by zero problem. Make sure to mask the data where your denominator is zero and handle it the way you’re intending for locations where there is never any valid data.