Firstly I apologise for how poorly worded the title is.
I have 2 pandas dataframes. X_train and crunched_X_train.
crunched_X_train is derived from X_train by using groupby() on a column named id.
I then did some stuff on crunched_X_train such that I created a new column in this dataframe named label. For each id in crunched_X_train there was a label in that row. label is a number between 0 and 3.
I want to add this label column to the original X_train such that for every id in X_train there will be a label on that row which is got from crunched_X_train.
I’ll try to explain better with the tables.
I have omitted columns from the tables which don’t matter here.
crunched_X_train:
id label 1 2 2 0 3 1 4 3 5 2
I want X_train to look like, with new column label:
id label 1 2 1 2 1 2 . . 2 0 2 0 3 1 3 1
I hoped I could do it like this but it throws
ValueError: Can only compare identically-labeled Series objects
X_train["label"] = crunched_X_train["label"].loc[crunched_X_train["id"] == X_train["id"]]
I then tried this iterative approach, which appears to be working. But is bad practise and takes forever.
for index, row in X_train.iterrows(): row["label"] = crunched_X_train["label"].loc[crunched_X_train["id"] == row["id"]]
Any help is appreciated. Thanks.
Advertisement
Answer
After the groupby, the id column is the index of crunched_X_train
. Then you can just that index to join to the original dataframe (where id
is a column):
X_train.join(crunched_X_train["label"], on="id")
assuming that crunched_X_train
has id
as the index and that X_train
has the a column named id
.
See the pandas docs here: pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html