
How can I add a column to a dataframe with counts taken from another dataframe?

I have a problem involving two dataframes. The first one, called “representative points”, has one row per cluster, and its index represents the cluster. It has this structure:

    lon   lat
0    76    3
1    45    1
2    32    4

I also have a second dataframe in which each row is a point together with the cluster it belongs to. In this case the index does not mean anything important.

   lon  lat  cluster
0   32   13        1
1   45   13        2
2   13   13        3

I would like to add a column to the “representative points” dataframe with the number of points that belong to each cluster. Any idea how I can do it?



I think you need something like this:

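import pandas as pd

# Representative points from the question: the index is the cluster.
a = pd.DataFrame({"lon": [76, 45, 32], "lat": [3, 1, 4]})
# Individual points, each labelled with the cluster it belongs to.
b = pd.DataFrame({"lon": [32, 45, 13], "lat": [13, 13, 13], "cluster": [1, 2, 3]})

# Expose the index as a column so it can be used as the merge key.
a["cluster"] = a.index
# Number of points in each cluster.
grouped = b.groupby("cluster").size().reset_index(name="counts")
res = a.merge(grouped, on="cluster", how="outer")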

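how="outer" means that you keep both the rows of a that have no count and the clusters of b that have no corresponding index in a; the values missing on either side are filled with NaN. If you need something else, use "left", "right" or "inner" instead. For example, how="left" keeps every row of a and drops clusters that appear only in b.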
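If the goal is exactly one row per representative point, a left join is probably what you want. The following is a minimal sketch, assuming the index of a is the cluster id (as stated in the question) and that a cluster with no points should get a count of 0 rather than NaN. That zero-fill is my assumption, not something the question asks for. It uses DataFrame.join instead of merge so the extra "cluster" column is not needed.

```python
import pandas as pd

# Data from the question; the index of `a` is the cluster id.
a = pd.DataFrame({"lon": [76, 45, 32], "lat": [3, 1, 4]})
b = pd.DataFrame({"lon": [32, 45, 13], "lat": [13, 13, 13], "cluster": [1, 2, 3]})

# Points per cluster, as a Series indexed by cluster id.
counts = b.groupby("cluster").size().rename("counts")

# Left join on the index of `a`: every representative point is kept,
# and cluster 3 (which has no row in `a`) is dropped.
res = a.join(counts, how="left")

# Clusters without any points come back as NaN; treat them as zero
# (assumption) and restore the integer dtype.
res["counts"] = res["counts"].fillna(0).astype(int)

print(res)
```

With the sample data, cluster 0 has no points and gets a count of 0, while clusters 1 and 2 each get a count of 1.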
