
When do I need to use a GeoSeries when creating a GeoDataFrame, and when is a list enough?

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon, Point
import numpy as np

I define a polygon:

polygon = Polygon([(0,0),(0,1),(1,1),(1,0)])

and create a list of random points:

np.random.seed(42)
points = [Point([np.random.uniform(low=-1,high=1),
                 np.random.uniform(low=-1,high=1)]) for _ in range(1000)]

I want to know which points are within the polygon. I create a GeoDataFrame with a column called points, first converting the points list to a GeoSeries:

gdf = gpd.GeoDataFrame(dict(points=gpd.GeoSeries(points)))

Then simply do:

gdf.points.within(polygon)

which returns a pandas.core.series.Series of booleans, indicating which points are within the polygon.
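To get from that boolean mask to the actual points, you can index the column with it. This is a minimal sketch that rebuilds the same seeded random points; the names `mask` and `inside` are just illustrative.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon

polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
np.random.seed(42)
points = [Point([np.random.uniform(low=-1, high=1),
                 np.random.uniform(low=-1, high=1)]) for _ in range(1000)]

# Passing a GeoSeries keeps the geometry dtype, so .within is available on the column
gdf = gpd.GeoDataFrame(dict(points=gpd.GeoSeries(points)))

mask = gdf.points.within(polygon)   # boolean pandas Series, one entry per point
inside = gdf.points[mask]           # GeoSeries holding only the points inside the polygon
print(len(inside), "of", len(gdf), "points are inside the polygon")
```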

However, if I create the GeoDataFrame from a list rather than a GeoSeries object:

gdf = gpd.GeoDataFrame(dict(points=points))

and then do:

gdf.points.within(polygon)

I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-171-831eddc859a1> in <module>()
----> 1 gdf.points.within(polygon)

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'within'

In the examples on the geopandas.GeoDataFrame page, a GeoDataFrame is created from a list of shapely.geometry.Point objects, not a GeoSeries:

import geopandas
from shapely.geometry import Point
d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")

When do I need to convert my lists to GeoSeries first, and when can I keep them as lists when creating GeoDataFrames?


Answer

In the docs for geopandas.GeoDataFrame, where you got your example, there’s a short note:

Notice that the inferred dtype of ‘geometry’ columns is geometry.

You can observe this yourself:

>>> import geopandas as gpd

>>> gpd.GeoDataFrame({'geometry': [Point(0,0)]}).dtypes
geometry    geometry
dtype: object

>>> gpd.GeoDataFrame({'geometryXXX': [Point(0,0)]}).dtypes
geometryXXX    object
dtype: object

From the docs for geopandas.GeoSeries:

A Series object designed to store shapely geometry objects.

…so it makes sense that a GeoSeries tries to convert the objects it is created with to the geometry dtype. In fact, if you try to create a GeoSeries from non-shapely objects, you’ll get a warning:

>>> gpd.GeoSeries([1,2,3])
<ipython-input-53-ca5248fcdaf8>:1: FutureWarning:     You are passing non-geometry data to the GeoSeries constructor. Currently,
    it falls back to returning a pandas Series. But in the future, we will start
    to raise a TypeError instead.
  gpd.GeoSeries([1,2,3])

…which, as the warning says, will become an error in the future.


Since you’re not creating a GeoSeries object (you’re using a list instead), and since the column is not called geometry, the GeoDataFrame gives that column the most general dtype that can hold the objects: object. Because the column has dtype object rather than geometry, selecting it returns a plain pandas Series, so you can’t call geometry-specific methods such as within.
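A quick way to see this rule in action is to compare three constructions side by side: a GeoSeries, a plain list in a column called points, and a plain list in a column called geometry. This is a minimal sketch with two made-up points; the plain-list behavior shown is what the transcript above describes, and other geopandas versions could in principle differ.

```python
import geopandas as gpd
from shapely.geometry import Point

points = [Point(0, 0), Point(0.5, 0.5)]

# 1. GeoSeries passed in: the 'points' column keeps the geometry dtype
from_geoseries = gpd.GeoDataFrame({'points': gpd.GeoSeries(points)})
# 2. Plain list in a column not named 'geometry': stays object dtype
from_list = gpd.GeoDataFrame({'points': points})
# 3. Plain list in a column named 'geometry': inferred as geometry dtype
from_named_list = gpd.GeoDataFrame({'geometry': points})

for label, col in [('GeoSeries', from_geoseries['points']),
                   ('list', from_list['points']),
                   ("list named 'geometry'", from_named_list['geometry'])]:
    # type() shows whether selecting the column hands back a GeoSeries or a plain Series
    print(f"{label:>22}: {type(col).__name__}, dtype={col.dtype}, "
          f"has .within: {hasattr(col, 'within')}")
```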

If you need to use a list, you have two simple options.

Method 1. Pass the geometry= keyword argument to GeoDataFrame():

>>> gdf = gpd.GeoDataFrame({'points': [Point(0,0), Point(0,1)]}, geometry='points')
>>> gdf['points'].dtypes
<geopandas.array.GeometryDtype at 0x12882a1c0>
>>> gdf['points'].within
<bound method GeoPandasBase.within of 0    POINT (0.00000 0.00000)
1    POINT (0.00000 1.00000)
Name: points, dtype: geometry>
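Applied to the data from the question, Method 1 could look like this sketch (assuming the same seeded random points). Because geometry='points' also makes that column the active geometry, the spatial methods work on the frame itself as well as on the column; `mask_column` and `mask_frame` are illustrative names.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon

polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
np.random.seed(42)
points = [Point([np.random.uniform(low=-1, high=1),
                 np.random.uniform(low=-1, high=1)]) for _ in range(1000)]

# geometry='points' converts the list column and sets it as the active geometry
gdf = gpd.GeoDataFrame({'points': points}, geometry='points')

mask_column = gdf['points'].within(polygon)  # method on the GeoSeries column
mask_frame = gdf.within(polygon)             # same test, using the active geometry column
print(mask_column.sum(), "points inside; both masks agree:",
      mask_column.tolist() == mask_frame.tolist())
```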

Method 2. Use astype, as you would with a normal DataFrame:

>>> gdf = gpd.GeoDataFrame({'points': [Point(0,0), Point(0,1)]})
>>> gdf['points'].dtype
dtype('O')
>>> gdf['points'].within
...
AttributeError: 'Series' object has no attribute 'within'

>>> gdf['points'] = gdf['points'].astype('geometry')
>>> gdf['points'].dtype
<geopandas.array.GeometryDtype at 0x122189e20>
>>> gdf['points'].within
<bound method GeoPandasBase.within of 0    POINT (0.00000 0.00000)
1    POINT (0.00000 1.00000)
Name: points, dtype: geometry>
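For the question’s own data, Method 2 means building the frame from the list exactly as before and converting the column afterwards. A minimal sketch, assuming the same seeded random points; the final comparison against the GeoSeries approach is only there to show that both routes give the same mask.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon

polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
np.random.seed(42)
points = [Point([np.random.uniform(low=-1, high=1),
                 np.random.uniform(low=-1, high=1)]) for _ in range(1000)]

# Built from a plain list, as in the question: 'points' starts with object dtype
gdf = gpd.GeoDataFrame(dict(points=points))

# Convert the column in place; selecting it now returns a GeoSeries
gdf['points'] = gdf['points'].astype('geometry')

inside_mask = gdf.points.within(polygon)
# Same result as the GeoSeries-based construction from the question
same_as_geoseries = inside_mask.tolist() == gpd.GeoSeries(points).within(polygon).tolist()
print(inside_mask.sum(), "points inside; matches GeoSeries approach:", same_as_geoseries)
```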
User contributions licensed under: CC BY-SA