I have a problem that I would like to solve with dataframes. I have a dataframe called “representative points”, whose index represents a cluster, with this structure:
       lon  lat
    0   76    3
    1   45    1
    2   32    4
On the other hand, I have a dataset in which each row is a point together with the cluster it belongs to. In this case the index does not mean anything important.
       lon  lat  cluster
    0   32   13        1
    1   45   13        2
    2   13   13        3
I would like to add to the “representative points” dataframe a column with the number of points that belong to each cluster. Any idea how I can do it?
Answer
I think you may need something like this:
    import pandas as pd

    a = pd.DataFrame({"lat": [76, 45, 32], "lon": [12, 34, 56]})
    b = pd.DataFrame({"lat": [32, 45, 13], "lon": [13, 13, 13], "cluster": [1, 2, 3]})

    # Expose the index of a as a column so it can be used as the merge key.
    a["cluster"] = a.index
    # Count the points in each cluster of b.
    grouped = b.groupby("cluster").size().reset_index(name="counts")
    res = a.merge(grouped, on="cluster", how="outer")
how="outer" means that the result keeps both the indices from a that have no counts and the clusters from b that have no corresponding index in a. If you need something else, use “left”, “right” or “inner” instead.
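If the goal is only to annotate the representative points, a left merge is probably the closest fit. The following is a minimal sketch, assuming the data from the question; the names `rep`, `points`, `per_cluster` and `res_left` are made up for illustration, and filling the missing counts with 0 is an assumption about what an empty cluster should show.

```python
import pandas as pd

# Data taken from the question; the index of rep is the cluster label.
rep = pd.DataFrame({"lon": [76, 45, 32], "lat": [3, 1, 4]})
points = pd.DataFrame(
    {"lon": [32, 45, 13], "lat": [13, 13, 13], "cluster": [1, 2, 3]}
)

# Expose the index as a column so it can serve as the merge key.
rep["cluster"] = rep.index

# Number of points per cluster, as a two-column frame (cluster, counts).
per_cluster = points.groupby("cluster").size().reset_index(name="counts")

# how="left" keeps every row of rep and drops clusters that appear only in points.
res_left = rep.merge(per_cluster, on="cluster", how="left")

# Clusters without any points come back as NaN; treat them as 0 (assumption).
res_left["counts"] = res_left["counts"].fillna(0).astype(int)

print(res_left)
```

With this data, cluster 0 has no points and ends up with a count of 0, while cluster 3, which exists only in `points`, does not appear in the result.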