Skip to content
Advertisement

Extract values from xarray dataset using geopandas multilinestring

I have a few hundred geopandas multilinestrings that trace along an object of interest (one line each week over a few years tracing the Gulf Stream) and I want to use those lines to extract values from a few other xarray datasets to know sea surface temperature, chlorophyll-a, and other variables along this path each week.

I’m unsure though how exactly to use these geopandas lines to extract values from the xarray datasets. I have thought about breaking them into points and grabbing the dataset values at each point but that seems a bit cumbersome. Is there any straightforward way to do this operation?

Advertisement

Answer

Breaking the lines into points and then extracting the point is quite straightforward actually!

import geopandas as gpd
import numpy as np
import shapely.geometry as sg
import xarray as xr

# Setup an example DataArray:
y = np.arange(20.0)
x = np.arange(20.0)

da = xr.DataArray(
    data=np.random.rand(y.size, x.size),
    coords={"y": y, "x": x},
    dims=["y", "x"],
)

# Setup an example geodataframe:
gdf = gpd.GeoDataFrame(
    geometry=[
        sg.LineString([(0.0, 0.0), (5.0, 5.0)]),
        sg.LineString([(10.0, 10.0), (15.0, 15.0)]),
    ]
)

# Get the centroids, and create the indexers for the DataArray:
centroids = gdf.centroid
x_indexer = xr.DataArray(centroids.x, dims=["point"])
y_indexer = xr.DataArray(centroids.y, dims=["point"])

# Grab the results:
da.sel(x=x_indexer, y=y_indexer, method="nearest")
<xarray.DataArray (point: 2)>
array([0.80121949, 0.34728138])
Coordinates:
    y        (point) float64 3.0 13.0
    x        (point) float64 3.0 13.0
  * point    (point) int64 0 1

The main thing is to decide on which point you’d like to sample, or how many points, etc.

Note that the geometry objects in the geodataframe also have an interpolation method, if you’d like draw values at specific points along the trajectory:

https://shapely.readthedocs.io/en/stable/manual.html#object.interpolate

In such a case, .apply can come in handy:

gdf.geometry.apply(lambda geom: geom.interpolate(3.0))

0      POINT (2.12132 2.12132)
1    POINT (12.12132 12.12132)
Name: geometry, dtype: geometry
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement