
pandas data frame, group by multiple cols and put other columns’ contents in one

The goal is to organize the data based on multiple columns and put contents of other columns in one cell, for example,



To organize the table based on name/type/link, expect to get:


I know it is possible to group the data frame by multiple columns,


But I don’t know how to deal with the other two columns and put them in one cell. Thank you for any suggestions.



You can use groupby(...)[column].agg(...), since this is a reduction/aggregation. To aggregate, you simply want to join the contents into a single string, so you can use something like " ".join as your aggregation function, like so:
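A minimal sketch of this approach, using made-up sample data: the values below are assumptions, and "id" and "subs" are assumed to hold strings, since " ".join only accepts strings (convert with .astype(str) first if they are numeric).

```python
import pandas as pd

# Hypothetical sample data; the column names follow the discussion,
# the values are invented for illustration.
df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4"],
        "name": ["a", "a", "b", "b"],
        "type": ["x", "x", "y", "y"],
        "link": ["u1", "u1", "u2", "u2"],
        "subs": ["s1", "s2", "s3", "s4"],
    }
)

# Group by the three key columns and join the remaining values
# of each group into one space-separated string per column.
new_df = (
    df.groupby(["name", "type", "link"], as_index=False)[["id", "subs"]]
      .agg(" ".join)
)

print(new_df)
```

Each (name, type, link) combination ends up as one row, with the "id" and "subs" values of that group joined into a single cell.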



  • passing as_index=False to the groupby call puts the columns we grouped by, ['name', 'type', 'link'], back into the dataframe as regular columns (if we didn’t specify it, they would make up the Index)
  • in the line .groupby(['name','type', 'link'], as_index=False)[["id", "subs"]] we don’t NEED to select the columns, since they’re the only remaining columns in our dataframe. The snippet will run without selecting them manually; I just prefer explicit > implicit

At the end of the day, data in this shape are mainly useful to look at. Packing multiple values into a single string in one cell of a table makes those values very difficult to work with later.

To reinstate the ordering of your original dataframe you can:

  • set "id" as the index (as your original dataframe had)
  • call reindex to reorder the columns of new_df to match those of the original df
  • ALTERNATIVELY, you can simply select the columns using bracket notation with the previous df.columns

Note that you’ll also get the “correct” ordering by just calling new_df.set_index("id"); however, the methods I showed are more explicit.
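The reordering steps can be sketched as follows. The frame here is hypothetical: it uses "id" as the index, and its column order is deliberately chosen to differ from the grouped output so that the effect of reindex is visible.

```python
import pandas as pd

# Hypothetical original frame with "id" as the index (values invented).
df = pd.DataFrame(
    {
        "id": ["1", "2", "3", "4"],
        "subs": ["s1", "s2", "s3", "s4"],
        "name": ["a", "a", "b", "b"],
        "type": ["x", "x", "y", "y"],
        "link": ["u1", "u1", "u2", "u2"],
    }
).set_index("id")

# Same aggregation as before; reset_index() turns "id" into a column first.
new_df = (
    df.reset_index()
      .groupby(["name", "type", "link"], as_index=False)[["id", "subs"]]
      .agg(" ".join)
)

# Option 1: set "id" as the index, then reindex the columns
# to follow the original column order.
restored = new_df.set_index("id").reindex(columns=df.columns)

# Option 2: select the columns with bracket notation instead.
restored_alt = new_df.set_index("id")[df.columns]

print(restored)
```

Both options give the same result here. One difference worth knowing: reindex fills a column with NaN if it is missing from new_df, while bracket selection raises a KeyError.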
