Skip to content
Advertisement

How create a column in dataframe1 based on a column created in dataframe2, which is derived by using groupby() on dataframe1

Firstly I apologise for how poorly worded the title is.

I have 2 pandas dataframes. X_train and crunched_X_train.
crunched_X_train is derived from X_train by using groupby() on a column named id.
I then did some stuff on crunched_X_train such that I created a new column in this dataframe named label. For each id in crunched_X_train there was a label in that row. label is a number between 0 and 3. I want to add this label column to the original X_train such that for every id in X_train there will be a label on that row which is got from crunched_X_train.

I’ll try to explain better with the tables.
I have omitted columns from the tables which don’t matter here.
crunched_X_train:

JavaScript

I want X_train to look like, with new column label:

JavaScript

I hoped I could do it like this but it throws
ValueError: Can only compare identically-labeled Series objects

JavaScript

I then tried this iterative approach, which appears to be working. But is bad practise and takes forever.

JavaScript

Any help is appreciated. Thanks.

Advertisement

Answer

After the groupby, the id column is the index of crunched_X_train. Then you can just that index to join to the original dataframe (where id is a column):

JavaScript

assuming that crunched_X_train has id as the index and that X_train has the a column named id.

See the pandas docs here: pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement