
Getting a dictionary of lists containing the elements of a column, using groupby

I have a dataframe that looks like this, with 1 string column and 1 int column.

import random
import pandas as pd
columns=['EG','EC','FI', 'ED', 'EB', 'FB', 'FCY', 'ECY', 'FG', 'FUR', 'E', '[ED']
choices_str = random.choices(columns, k=200)
choices_int = random.choices(range(1, 8), k=200)
my_df = pd.DataFrame({'column_A': choices_str, 'column_B': choices_int})


I would like to end up with a dictionary of lists holding all the values of column_B grouped by column_A: one key per column_A value, mapped to the list of its column_B values.


To achieve this, I used a groupby to count the occurrences of each column_B value for each column_A value:

group_by = my_df.groupby(['column_A','column_B'])['column_B'].count().unstack().fillna(0).T
group_by


I then used some list comprehensions to build the list for each column_A value by hand and added them to the dictionary. Is there a more direct way to get this with a groupby?
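The by-hand step itself is not shown in the question. A minimal sketch of one way it might look, rebuilding each list from a count table like `group_by`, is below. It uses a small hand-made frame (hypothetical values, standing in for the random data) so the result is reproducible, and builds the count table with `.size()`, which gives the same counts as the question's `.count()` call. Note that this route loses the original row order: each list comes out sorted by column_B value.

```python
import pandas as pd

# Small, deterministic stand-in for the random data (hypothetical values)
my_df = pd.DataFrame({
    'column_A': ['EG', 'EC', 'EG', 'FI', 'EC', 'EG'],
    'column_B': [3, 1, 5, 2, 1, 3],
})

# Count table: rows are column_B values, columns are column_A values,
# and missing combinations are filled with 0
group_by = my_df.groupby(['column_A', 'column_B']).size().unstack(fill_value=0).T

# Hypothetical reconstruction of the "by hand" step: repeat each column_B value
# as many times as it was counted for that column_A value
lists_by_hand = {
    a: [b for b, n in group_by[a].items() for _ in range(int(n))]
    for a in group_by.columns
}
print(lists_by_hand)
```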


Answer

I am not aware of a way to do this within the groupby statement itself, but you could try something like this instead:

import random
import pandas as pd
columns=['EG','EC','FI', 'ED', 'EB', 'FB', 'FCY', 'ECY', 'FG', 'FUR', 'E', '[ED']
choices_str = random.choices(columns, k=200)
choices_int = random.choices(range(1, 8), k=200)
my_df = pd.DataFrame({'column_A': choices_str, 'column_B': choices_int})

# Map each unique column_A value to the list of its matching column_B values
final_dict = {val: my_df.loc[my_df['column_A'] == val, 'column_B'].values.tolist() for val in my_df['column_A'].unique()}

This one-line dict comprehension iterates over the unique column_A values; for each one, it selects all the matching column_B values as a list and stores that list in the dictionary under the column_A value as key.
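For comparison, pandas can also build the same dictionary straight from a groupby: `apply(list)` on the grouped column_B collects each group's values into a list, and `Series.to_dict()` turns the result into a dict keyed by column_A. A minimal sketch, again on a small hand-made frame (hypothetical values) so the output is predictable, comparing it with the dict comprehension above:

```python
import pandas as pd

# Small, deterministic stand-in for the random data (hypothetical values)
my_df = pd.DataFrame({
    'column_A': ['EG', 'EC', 'EG', 'FI', 'EC', 'EG'],
    'column_B': [3, 1, 5, 2, 1, 3],
})

# Dict comprehension from the answer: one boolean-mask lookup per unique column_A value
final_dict = {val: my_df.loc[my_df['column_A'] == val, 'column_B'].values.tolist()
              for val in my_df['column_A'].unique()}

# groupby alternative: collect each group's column_B values (in row order) into a list,
# then convert the resulting Series of lists into a dict keyed by column_A
grouped_dict = my_df.groupby('column_A')['column_B'].apply(list).to_dict()

print(final_dict)
print(grouped_dict)
```

Both dictionaries hold the same lists, in the original row order within each list. Only the key order differs: `groupby` sorts its keys by default (passing `sort=False` keeps first-appearance order), and dict equality ignores key order anyway. The groupby version should also avoid re-scanning the frame once per unique column_A value.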

User contributions licensed under: CC BY-SA