I have a problem that I would like to solve with dataframes. I have a dataframe called “representative points” whose index represents a cluster. It has this structure:
       lon  lat
    0   76    3
    1   45    1
    2   32    4
On the other hand, I have a dataset of points, each with the cluster it belongs to. In this case the index does not mean anything important.
       lon  lat  cluster
    0   32   13        1
    1   45   13        2
    2   13   13        3
What I would like to do is add a column to the “representative points” dataframe with the number of points that belong to each cluster. Any idea how I can do it?
Answer
I think you may need something like this:
    import pandas as pd

    a = pd.DataFrame({"lat": [76, 45, 32], "lon": [12, 34, 56]})
    b = pd.DataFrame({"lat": [32, 45, 13], "lon": [13, 13, 13], "cluster": [1, 2, 3]})

    # The index of a is the cluster id, so expose it as a column to merge on
    a["cluster"] = a.index
    # Count the points in each cluster of b
    grouped = b.groupby("cluster").size().reset_index(name="counts")
    res = a.merge(grouped, on="cluster", how="outer")
how="outer" means that you keep both the rows of a whose cluster has no points in b (their counts will be NaN) and the clusters from b that have no corresponding index in a. If you need something else, use "left", "right" or "inner" instead.
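If you only want the rows of a and would rather see 0 than NaN for empty clusters, here is a minimal sketch of an alternative that skips the merge. It assumes, as in the question, that the index of a is the cluster id. The column name counts is just carried over from the code above, and using reindex is my own substitution, so treat it as one possible way to do it.

```python
import pandas as pd

# Same toy data as in the answer above.
a = pd.DataFrame({"lat": [76, 45, 32], "lon": [12, 34, 56]})
b = pd.DataFrame({"lat": [32, 45, 13], "lon": [13, 13, 13], "cluster": [1, 2, 3]})

# Count the points per cluster; the result is a Series indexed by cluster id.
counts = b.groupby("cluster").size()

# Align the counts with a's index (assumed to be the cluster id).
# Clusters with no points get 0; clusters missing from a (here 3) are dropped.
a["counts"] = counts.reindex(a.index, fill_value=0)

print(a)
```

This gives roughly what how="left" followed by filling the missing counts with 0 would give, without adding an extra cluster column to a.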