I tried to run a spatial join between a list of assets and a river basin dataset that you can find at the link below: https://datasets.wri.org/dataset/aqueduct-global-flood-risk-maps?msclkid=630fc948b63611ec9931936b22cf4990
My first approach was a join in the EPSG:4326 projection, and it works fine.
```python
rfd = r"C:Users~aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')

test = ['Unit 1', 'Unit 2']
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']

df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')

joined = gpd.sjoin(gdf, wri_rfr, how='inner')
len(joined)
```
Both assets get a match in the join.
In a second approach, I try to create a 500 m buffer around my assets using a meter-based CRS (EPSG:3006) and then run the join… but it returns no results.
```python
rfd = r"C:Users~aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')

test = ['Unit 1', 'Unit 2']
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']

df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')

gdf = gdf.to_crs({'init': 'epsg:3006'})
gdf.geometry = gdf.geometry.buffer(500)
gdf = gdf.loc[gdf.is_valid]
wri_rfr_3006 = wri_rfr.to_crs({'init': 'epsg:3006'})
wri_rfr_3006 = wri_rfr_3006.loc[wri_rfr_3006.is_valid]

joined = gpd.sjoin(gdf, wri_rfr_3006, how='inner')
len(joined)
```
It returns no joins.
What am I missing here? Why would the results be different?
Answer
- have coded up data sourcing of the shape files (see "data sourcing" below)
- take a look at the documentation https://epsg.io/3006 : this CRS is for Sweden, so locations in Borneo and Australia become meaningless when expressed in meters on a Swedish grid
- have instead taken the approach of working out the UTM CRS of each point, buffering it, then converting back to epsg:4326
- with the buffered point geometries you can now spatially join, as an inappropriate CRS for global geometry is no longer used
```python
test = ["Unit 1", "Unit 2"]
test_lat = ["0.176095", "-24.193790"]
test_lon = ["117.495523", "150.370650"]

df = pd.DataFrame()
df["Name"] = test
df["Latitude"] = test_lat
df["Longitude"] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]))
gdf = gdf.set_crs("epsg:4326")


# work out UTM CRS for each point, then buffer it and return in the original CRS
def buffer_meter(g, crs="epsg:6666", buffer=50):
    t = gpd.GeoDataFrame(geometry=[g], crs=crs)
    return t.to_crs(t.estimate_utm_crs()).buffer(buffer).to_crs(crs).values[0]


# buffer the points
gdf["geometry"] = gdf["geometry"].apply(buffer_meter, crs=gdf.crs, buffer=500)

# now join
gpd.sjoin(gdf, wri_rfr, how="inner")
```
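The per-point approach matters because the two test points fall in different UTM zones, so no single metric CRS suits both. A minimal sketch (assuming GeoPandas >= 0.9, where `estimate_utm_crs` was added) shows what each point resolves to:

```python
import geopandas as gpd
from shapely.geometry import Point

# the two test points from the answer, as (lon, lat)
pts = gpd.GeoDataFrame(
    geometry=[Point(117.495523, 0.176095), Point(150.370650, -24.193790)],
    crs="epsg:4326",
)

# each point resolves to a different UTM zone (50N for Borneo,
# 56S for eastern Australia), which is why buffer_meter works on
# one geometry at a time instead of reprojecting the whole frame
zones = [
    gpd.GeoDataFrame(geometry=[g], crs="epsg:4326").estimate_utm_crs().to_epsg()
    for g in pts.geometry
]
print(zones)  # [32650, 32756]
```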
data sourcing
```python
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib
import geopandas as gpd
import pandas as pd

# download data sets
urls = [
    "http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/498319f7-992a-4447-94b4-c62d8f1daa38/download/aqueductglobalfloodriskdatabycountry20150304.zip",
    "http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/471ef133-939c-4ca6-9b1c-5f81b5251c2b/download/aqueductglobalfloodriskdatabyriverbasin20150304.zip",
    "http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/dd90c26a-edf2-46e4-be22-4273ab2344d0/download/aqueductglobalfloodriskdatabystate20150304.zip",
]
dfs = {}
for url in urls:
    f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
    if not f.exists():
        r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
        with open(f, "wb") as fd:
            for chunk in r.iter_content(chunk_size=128):
                fd.write(chunk)
    zfile = ZipFile(f)
    zfile.extractall(f.stem)
    dfs[f.stem] = gpd.read_file(list(f.parent.joinpath(f.stem).glob("*.shp"))[0])

wri_rfr = dfs["aqueductglobalfloodriskdatabyriverbasin20150304"]
```