I tried to run a spatial join between a list of assets and a river basin dataset that you can find at the link below: https://datasets.wri.org/dataset/aqueduct-global-flood-risk-maps?msclkid=630fc948b63611ec9931936b22cf4990
My first approach was a join in the EPSG:4326 projection, and it works fine.
```python
rfd = r"C:Users~aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')

test = ['Unit 1', 'Unit 2']
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']

df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')

joined = gpd.sjoin(gdf, wri_rfr, how='inner')
len(joined)
```
Both assets get a match in the join.
In a second approach, I try to create a 500 m buffer around my assets using a meter-based CRS (EPSG:3006) and then run the join… but it returns no results.
```python
rfd = r"C:Users~aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')

test = ['Unit 1', 'Unit 2']
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']

df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')

gdf = gdf.to_crs({'init': 'epsg:3006'})
gdf.geometry = gdf.geometry.buffer(500)
gdf = gdf.loc[gdf.is_valid]
wri_rfr_3006 = wri_rfr.to_crs({'init': 'epsg:3006'})
wri_rfr_3006 = wri_rfr_3006.loc[wri_rfr_3006.is_valid]

joined = gpd.sjoin(gdf, wri_rfr_3006, how='inner')
len(joined)
```
It returns no joins.
What am I missing here? Why would the results be different?
Answer
- have coded up data sourcing of the shape files (see "data sourcing" below)
- take a look at the documentation https://epsg.io/3006 : this CRS is for Sweden, so locations in Borneo and Australia become meaningless when expressed in meters on a Swedish grid
- have instead taken the approach of working out the UTM CRS of each point, buffering it, then converting back to epsg:4326
- with the buffered point geometries you can now spatially join, as an inappropriate CRS for global geometry is no longer used
```python
test = ["Unit 1", "Unit 2"]
test_lat = ["0.176095", "-24.193790"]
test_lon = ["117.495523", "150.370650"]

df = pd.DataFrame()
df["Name"] = test
df["Latitude"] = test_lat
df["Longitude"] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]))
gdf = gdf.set_crs("epsg:4326")


# work out UTM CRS for each point, then buffer it and return in the original CRS
def buffer_meter(g, crs="epsg:6666", buffer=50):
    t = gpd.GeoDataFrame(geometry=[g], crs=crs)
    return t.to_crs(t.estimate_utm_crs()).buffer(buffer).to_crs(crs).values[0]


# buffer the points
gdf["geometry"] = gdf["geometry"].apply(buffer_meter, crs=gdf.crs, buffer=500)

# now join
gpd.sjoin(gdf, wri_rfr, how="inner")
```
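The per-point approach matters because the two test points fall in different UTM zones, so no single metric CRS suits both. A minimal sketch (assuming GeoPandas >= 0.9, where `estimate_utm_crs` was added) shows what each point resolves to:

```python
import geopandas as gpd
from shapely.geometry import Point

# the two test points from the answer, as (lon, lat)
pts = gpd.GeoDataFrame(
    geometry=[Point(117.495523, 0.176095), Point(150.370650, -24.193790)],
    crs="epsg:4326",
)

# each point resolves to a different UTM zone (50N for Borneo,
# 56S for eastern Australia), which is why buffer_meter works on
# one geometry at a time instead of reprojecting the whole frame
zones = [
    gpd.GeoDataFrame(geometry=[g], crs="epsg:4326").estimate_utm_crs().to_epsg()
    for g in pts.geometry
]
print(zones)  # [32650, 32756]
```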
data sourcing
```python
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib
import geopandas as gpd
import pandas as pd

# download data sets
urls = [
    "http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/498319f7-992a-4447-94b4-c62d8f1daa38/download/aqueductglobalfloodriskdatabycountry20150304.zip",
    "http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/471ef133-939c-4ca6-9b1c-5f81b5251c2b/download/aqueductglobalfloodriskdatabyriverbasin20150304.zip",
    "http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/dd90c26a-edf2-46e4-be22-4273ab2344d0/download/aqueductglobalfloodriskdatabystate20150304.zip",
]
dfs = {}
for url in urls:
    f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
    if not f.exists():
        r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
        with open(f, "wb") as fd:
            for chunk in r.iter_content(chunk_size=128):
                fd.write(chunk)
    zfile = ZipFile(f)
    zfile.extractall(f.stem)
    dfs[f.stem] = gpd.read_file(list(f.parent.joinpath(f.stem).glob("*.shp"))[0])

wri_rfr = dfs["aqueductglobalfloodriskdatabyriverbasin20150304"]
```