import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon, Point
import numpy as np
I define a polygon:
polygon = Polygon([(0,0),(0,1),(1,1),(1,0)])
and create a list of random points:
np.random.seed(42)
points = [Point([np.random.uniform(low=-1,high=1), np.random.uniform(low=-1,high=1)]) for _ in range(1000)]
I want to know which points are within the polygon. I create a GeoDataFrame with a column called points, by first converting the points list to a GeoSeries:
gdf = gpd.GeoDataFrame(dict(points=gpd.GeoSeries(points)))
Then simply do:
gdf.points.within(polygon)
which returns a pandas.core.series.Series of booleans, indicating which points are within the polygon.
However, if I create the GeoDataFrame from a plain list instead of a GeoSeries object:
gdf = gpd.GeoDataFrame(dict(points=points))
and then do:
gdf.points.within(polygon)
I get:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-171-831eddc859a1> in <module>()
----> 1 gdf.points.within(polygon)

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'within'
In the examples given on the geopandas.GeoDataFrame page, a GeoDataFrame is created from a list of shapely.geometry.Point objects, not from a GeoSeries:
import geopandas
from shapely.geometry import Point
d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
When do I need to convert my lists to GeoSeries first, and when can I keep them as lists when creating GeoDataFrames?
Answer
On the docs for geopandas.GeoDataFrame, where you got your example, there’s a little note:
Notice that the inferred dtype of ‘geometry’ columns is geometry.
You can observe this yourself:
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> gpd.GeoDataFrame({'geometry': [Point(0,0)]}).dtypes
geometry    geometry
dtype: object
>>> gpd.GeoDataFrame({'geometryXXX': [Point(0,0)]}).dtypes
geometryXXX    object
dtype: object
From the docs for geopandas.GeoSeries:
A Series object designed to store shapely geometry objects.
…so it makes sense that it would try to convert the objects it’s created with to the geometry dtype. In fact, when you try to create a GeoSeries with non-shapely objects, you’ll get a warning:
>>> gpd.GeoSeries([1,2,3])
<ipython-input-53-ca5248fcdaf8>:1: FutureWarning: You are passing non-geometry data to the GeoSeries constructor. Currently, it falls back to returning a pandas Series. But in the future, we will start to raise a TypeError instead.
  gpd.GeoSeries([1,2,3])
…which, as the warning says, will become an error in the future.
Since you’re not creating a GeoSeries object (you’re using a list instead), and since the column is not called geometry, the GeoDataFrame gives the column the most general dtype that can hold those objects: object. Because the column is of dtype object and not geometry, you can’t call geometry-specific methods such as within on it.
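To check which situation you are in before calling a geometry method, you can inspect the column's dtype programmatically. The following is a minimal sketch: the helper name is_geometry_column is made up for illustration, and it assumes GeometryDtype can be imported from geopandas.array (the module named in the dtype repr shown in the answer).

```python
import geopandas as gpd
from geopandas.array import GeometryDtype
from shapely.geometry import Point


def is_geometry_column(frame, name):
    """Hypothetical helper: True if frame[name] has the geometry dtype."""
    return isinstance(frame[name].dtype, GeometryDtype)


pts = [Point(0, 0), Point(0, 1)]

# Column built from a GeoSeries: keeps the geometry dtype.
from_geoseries = gpd.GeoDataFrame({"points": gpd.GeoSeries(pts)})

# Column built from a plain list under a name other than 'geometry':
# per the behaviour described above, this is expected to be dtype object.
from_list = gpd.GeoDataFrame({"points": pts})

for label, frame in [("GeoSeries", from_geoseries), ("list", from_list)]:
    print(label, frame["points"].dtype, is_geometry_column(frame, "points"))
```

If the helper returns False for a column, geometry methods such as within will not be available on it until the column is converted.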
If you need to use a list, you’ve two simple choices.
Method 1. Pass the geometry= keyword argument to GeoDataFrame():
>>> gdf = gpd.GeoDataFrame({'points': [Point(0,0), Point(0,1)]}, geometry='points')
>>> gdf['points'].dtypes
<geopandas.array.GeometryDtype at 0x12882a1c0>
>>> gdf['points'].within
<bound method GeoPandasBase.within of 0    POINT (0.00000 0.00000)
1    POINT (0.00000 1.00000)
Name: points, dtype: geometry>
Method 2. Use astype like you’d do with a normal dataframe:
>>> gdf = gpd.GeoDataFrame({'points': [Point(0,0), Point(0,1)]})
>>> gdf['points'].dtype
dtype('O')
>>> gdf['points'].within
...
AttributeError: 'Series' object has no attribute 'within'
>>> gdf['points'] = gdf['points'].astype('geometry')
>>> gdf['points'].dtype
<geopandas.array.GeometryDtype at 0x122189e20>
>>> gdf['points'].within
<bound method GeoPandasBase.within of 0    POINT (0.00000 0.00000)
1    POINT (0.00000 1.00000)
Name: points, dtype: geometry>
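Applied to the setup from the question, Method 1 might look like the sketch below. It keeps the points as a plain list and only adds geometry='points'. The random points are generated with np.random.default_rng rather than np.random.seed, so the exact coordinates differ from the question's; that substitution is mine, not part of the original post.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Unit square, as in the question.
polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])

# 1000 random points in [-1, 1) x [-1, 1), kept as a plain Python list.
rng = np.random.default_rng(42)
coords = rng.uniform(low=-1, high=1, size=(1000, 2))
points = [Point(x, y) for x, y in coords]

# geometry='points' tells GeoDataFrame to treat this column as geometry,
# so no explicit GeoSeries conversion is needed.
gdf = gpd.GeoDataFrame({"points": points}, geometry="points")

# Boolean Series: True where the point lies inside the polygon.
inside = gdf["points"].within(polygon)
n_inside = int(inside.sum())
print(f"{n_inside} of {len(gdf)} points fall inside the polygon")
```

Method 2 would work the same way here: build the frame without geometry= and then assign gdf['points'].astype('geometry') back to the column before calling within.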