I have a DataFrame and want to convert it into a dictionary of sets.
To be specific, my DataFrame and the result I want are shown below:
  month  date
0   JAN     1
1   JAN     1
2   JAN     1
3   FEB     2
4   FEB     2
5   FEB     3
6   MAR     1
7   MAR     2
8   MAR     3
My goal:
dict = {'JAN' : {1}, 'FEB' : {2,3}, 'MAR' : {1,2,3}}
I also wrote the code below, but I am not sure it is suitable. The real data is large, so I would like to know any tips or a more efficient (faster) way to do this.
import pandas as pd

# 'date' values match the table above (FEB has dates 2, 2, 3)
df = pd.DataFrame({'month': ['JAN', 'JAN', 'JAN', 'FEB', 'FEB', 'FEB', 'MAR', 'MAR', 'MAR'],
                   'date': [1, 1, 1, 2, 2, 3, 1, 2, 3]})
df_list = df.values.tolist()
monthSet = ['JAN', 'FEB', 'MAR']
inst_id_dict = {}
for i in df_list:
    monStr = i[0]
    if monStr in monthSet:
        inst_id = i[1]
        # create the set on first sight of the month, then add the date
        inst_id_dict.setdefault(monStr, set()).add(inst_id)
Answer
Let’s try grouping on the “month” column, then aggregating with GroupBy.unique. This gives a dictionary of lists:
df.groupby('month', sort=False)['date'].unique().map(list).to_dict()
# {'JAN': [1], 'FEB': [2, 3], 'MAR': [1, 2, 3]}
Or, if you’d prefer a dictionary of sets, use GroupBy.agg:
df.groupby('month', sort=False)['date'].agg(set).to_dict() # {'JAN': {1}, 'FEB': {2, 3}, 'MAR': {1, 2, 3}}
Another idea is to build the dict iteratively (don’t worry, despite using a loop this is likely to outperform the groupby option):
out = {}
for m, d in df.drop_duplicates(['month', 'date']).to_numpy():
    out.setdefault(m, set()).add(d)
out
# {'JAN': {1}, 'FEB': {2, 3}, 'MAR': {1, 2, 3}}
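Whether the loop really beats groupby depends on your data, so it is worth measuring. Below is a minimal sketch that checks both approaches return the same dictionary and then times them. The function names `with_groupby` and `with_loop` and the enlarged frame `big` are made up for illustration, and the sample is just the question's table repeated, so treat the timings as a rough guide only.

```python
import timeit

import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'month': ['JAN', 'JAN', 'JAN', 'FEB', 'FEB', 'FEB', 'MAR', 'MAR', 'MAR'],
    'date': [1, 1, 1, 2, 2, 3, 1, 2, 3],
})


def with_groupby(frame):
    # Collect each month's dates into a set via GroupBy.agg.
    return frame.groupby('month', sort=False)['date'].agg(set).to_dict()


def with_loop(frame):
    # Drop repeated (month, date) pairs first, then fill the dict row by row.
    out = {}
    for m, d in frame.drop_duplicates(['month', 'date']).to_numpy():
        out.setdefault(m, set()).add(d)
    return out


expected = {'JAN': {1}, 'FEB': {2, 3}, 'MAR': {1, 2, 3}}
assert with_groupby(df) == expected
assert with_loop(df) == expected

# Hypothetical larger frame for timing: the sample repeated 10,000 times.
big = pd.concat([df] * 10_000, ignore_index=True)
for fn in (with_groupby, with_loop):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f'{fn.__name__}: {seconds:.4f} s for 5 runs')
```

Run it against a frame shaped like your real data (number of distinct months, share of duplicate rows) before picking one, since the loop's advantage comes mostly from how much drop_duplicates shrinks the input.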