Group by a column in a pandas DataFrame in Python

Question

I have a simple df with two columns, a and b. I want to group the values of b by column a. Here is a simple example: … Desired output is: … Any input would be greatly appreciated!

Accepted Answer

Here's a way to do what you want. First, group by column 'a'. Normally groupby is used to calculate per-group aggregates, for example:

    df.groupby('a')['b'].mean()

but in this case we want to keep the individual values of b associated with each a. Iterating over the grouped column yields (key, values) pairs, so you can use:

    [(a, list(b)) for a, b in df.groupby('a')['b']]

which returns

    [(1, [10, 50]), (2, [20, 60]), (3, [30]), (4, [40])]

Converting this to a dataframe almost gets us there:

    df2 = pd.DataFrame([(a, list(b)) for a, b in df.groupby('a')['b']],
                       columns=['a', 'temp'])

       a      temp
    0  1  [10, 50]
    1  2  [20, 60]
    2  3      [30]
    3  4      [40]

The temp column can be split into separate columns with to_list (shorter lists are padded with NaN):

    df3 = pd.DataFrame(df2['temp'].to_list())

        0     1
    0  10  50.0
    1  20  60.0
    2  30   NaN
    3  40   NaN

Rejoin the two dataframes:

    df2.join(df3)

       a      temp   0     1
    0  1  [10, 50]  10  50.0
    1  2  [20, 60]  20  60.0
    2  3      [30]  30   NaN
    3  4      [40]  40   NaN

Then clean up: drop the temp column, rename the numbered columns, and deal with the integers that were cast to floats in the last column because of the NaNs. I'm sure there's a cleaner way to do this, but hopefully this gets you started!
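The answer leaves the final clean-up to the reader. Below is a minimal sketch of one way to finish it, assuming a hypothetical frame `df` whose values are chosen to match the outputs shown above, and hypothetical output column names `b1`, `b2`. The helper names `spread_by_list` and `spread_by_pivot` are also illustrative only. The sketch uses pandas' nullable `Int64` dtype so the padded column keeps integers instead of floats. For comparison, `spread_by_pivot` swaps in a different technique that is not from the answer: numbering rows within each group with `groupby(...).cumcount()` and reshaping with `pivot`.

```python
import pandas as pd

# Hypothetical sample frame: values chosen to match the outputs shown above.
df = pd.DataFrame({'a': [1, 2, 3, 4, 1, 2],
                   'b': [10, 20, 30, 40, 50, 60]})


def spread_by_list(df: pd.DataFrame) -> pd.DataFrame:
    """The answer's list-based steps plus a clean-up (hypothetical helper)."""
    # One row per key of 'a', holding the list of that key's 'b' values.
    df2 = pd.DataFrame([(a, list(b)) for a, b in df.groupby('a')['b']],
                       columns=['a', 'temp'])
    # Expand the lists into columns 0, 1, ...; shorter lists are padded with NaN.
    df3 = pd.DataFrame(df2['temp'].to_list())
    # Hypothetical output names b1, b2, ... instead of 0, 1, ...
    df3.columns = [f'b{i + 1}' for i in df3.columns]
    # Nullable Int64 stores the padding as <NA>, so the values stay integers.
    df3 = df3.astype('Int64')
    # Drop the helper list column; join aligns on the shared 0..n-1 index.
    return df2.drop(columns='temp').join(df3)


def spread_by_pivot(df: pd.DataFrame) -> pd.DataFrame:
    """Alternative (not from the answer): number rows per group, then pivot."""
    # cumcount gives 0, 1, ... within each 'a' group; +1 makes it 1-based.
    pos = df.groupby('a').cumcount() + 1
    wide = df.assign(pos=pos).pivot(index='a', columns='pos', values='b')
    wide.columns = [f'b{p}' for p in wide.columns]
    # Same nullable-integer fix, then turn the 'a' index back into a column.
    return wide.astype('Int64').reset_index()


print(spread_by_list(df))
```

Both functions should produce the same frame with columns a, b1, b2, with missing entries shown as <NA> rather than NaN. The pivot version avoids the intermediate list column, which may be easier to follow for larger frames.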


Answer