I have a dataframe that looks like this, with 1 string column and 1 int column.
import random
import pandas as pd

columns = ['EG', 'EC', 'FI', 'ED', 'EB', 'FB', 'FCY', 'ECY', 'FG', 'FUR', 'E', '[ED']
choices_str = random.choices(columns, k=200)
choices_int = random.choices(range(1, 8), k=200)
my_df = pd.DataFrame({'column_A': choices_str, 'column_B': choices_int})
In the end, I would like to get a dictionary of lists that stores all the values of column_B, grouped by column_A.
To achieve this, I used a groupby to get the number of occurrences of each column_B value:
group_by = my_df.groupby(['column_A', 'column_B'])['column_B'].count().unstack().fillna(0).T
group_by
I then used some list comprehensions to build the list for each column_A value by hand and add it to the dictionary. Is there a way to get this more directly using a groupby?
Answer
I am not aware of a method that can achieve this within the groupby
statement itself, but you could try something like this as an alternative:
import random
import pandas as pd

columns = ['EG', 'EC', 'FI', 'ED', 'EB', 'FB', 'FCY', 'ECY', 'FG', 'FUR', 'E', '[ED']
choices_str = random.choices(columns, k=200)
choices_int = random.choices(range(1, 8), k=200)
my_df = pd.DataFrame({'column_A': choices_str, 'column_B': choices_int})

# For each distinct column_A value, collect the matching column_B values into a list
final_dict = {
    val: my_df.loc[my_df['column_A'] == val, 'column_B'].values.tolist()
    for val in my_df['column_A'].unique()
}
This dict comprehension is a one-liner: for each distinct column_A value,
it selects all the column_B values in the matching rows, converts them
to a list, and stores that list in the dict with the column_A value as
its key.
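
Since the question asks specifically about groupby, here is a minimal sketch of a groupby-based variant that should give the same kind of dictionary. It assumes the goal is a list of the raw column_B values per column_A key (not the counts). The small hand-written frame is illustrative data only, not the random frame from the question.

```python
import pandas as pd

# Small illustrative frame (made-up data) using the same column names as above
my_df = pd.DataFrame({
    'column_A': ['EG', 'EC', 'EG', 'FI', 'EC', 'EG'],
    'column_B': [3, 1, 7, 2, 5, 3],
})

# Variant 1: collect each group's column_B values into a list, then convert
# the resulting Series (indexed by column_A) into a plain dict
final_dict = my_df.groupby('column_A')['column_B'].apply(list).to_dict()

# Variant 2: iterate over the groups directly; .tolist() yields plain Python ints
final_dict_alt = {
    key: values.tolist()
    for key, values in my_df.groupby('column_A')['column_B']
}

print(final_dict)
print(final_dict_alt)
```

Both variants scan the frame once instead of building one boolean mask per distinct column_A value. Note that groupby sorts the keys by default, so the key order may differ from the order produced by unique(), and the lists in the first variant may hold NumPy integers rather than plain Python ints.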