I have a dataframe and want to convert it into a dictionary of sets.
To be specific, my dataframe and the result I want are shown below:
      month  date
    0   JAN     1
    1   JAN     1
    2   JAN     1
    3   FEB     2
    4   FEB     2
    5   FEB     3
    6   MAR     1
    7   MAR     2
    8   MAR     3
My goal:
    dict = {'JAN' : {1}, 'FEB' : {2,3}, 'MAR' : {1,2,3}}
I also wrote the code below, but I am not sure it is suitable. In reality the data is large, so I would like to know any tips or a more efficient (faster) way to do this.
    import pandas as pd

    # Same data as the table above (FEB has dates 2, 2, 3)
    df = pd.DataFrame({'month' : ['JAN','JAN','JAN','FEB','FEB','FEB','MAR','MAR','MAR'],
                       'date' : [1, 1, 1, 2, 2, 3, 1, 2, 3]})
    df_list = df.values.tolist()

    monthSet = ['JAN','FEB','MAR']
    inst_id_dict = {}
    for i in df_list:
        monStr = i[0]
        # Only keep rows whose month is in monthSet
        if monStr in monthSet:
            inst_id = i[1]
            # Create the set on first sight of the month, then add the date
            inst_id_dict.setdefault(monStr, set()).add(inst_id)
Answer
Let's try grouping on the "month" column, then aggregating with GroupBy.unique, which gives a dictionary of lists of unique values:

    df.groupby('month', sort=False)['date'].unique().map(list).to_dict()
    # {'JAN': [1], 'FEB': [2, 3], 'MAR': [1, 2, 3]}
Or, if you'd prefer a dictionary of sets, use GroupBy.agg:
    df.groupby('month', sort=False)['date'].agg(set).to_dict()
    # {'JAN': {1}, 'FEB': {2, 3}, 'MAR': {1, 2, 3}}
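The loop in the question also keeps only the months listed in `monthSet`, which the one-liners above do not do. A minimal sketch of carrying that filter over, assuming the same `df` as in the question; the names `wanted`, `mask` and `result`, and the choice of a two-month subset, are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'month': ['JAN', 'JAN', 'JAN', 'FEB', 'FEB', 'FEB', 'MAR', 'MAR', 'MAR'],
                   'date': [1, 1, 1, 2, 2, 3, 1, 2, 3]})

# Hypothetical subset of months to keep (plays the role of monthSet)
wanted = ['JAN', 'FEB']

# Boolean mask: True for rows whose month is in the wanted list
mask = df['month'].isin(wanted)

# Filter first, then group and collect each month's dates into a set
result = df[mask].groupby('month', sort=False)['date'].agg(set).to_dict()
print(result)  # {'JAN': {1}, 'FEB': {2, 3}}
```

Filtering before grouping keeps the unwanted rows out of the groupby altogether, which should matter more as the data grows.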
Another idea is to build the dict iteratively (don't worry: despite the loop, this is likely to outspeed the groupby option):
    out = {}
    for m, d in df.drop_duplicates(['month', 'date']).to_numpy():
        out.setdefault(m, set()).add(d)

    out
    # {'JAN': {1}, 'FEB': {2, 3}, 'MAR': {1, 2, 3}}
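Which option is faster depends on your data, so it is worth measuring rather than guessing. Here is a rough sketch of how you might time the two approaches with `timeit`. The synthetic frame `big`, its size, and the helper names `with_groupby` and `with_loop` are assumptions for illustration, not part of the question:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 100,000 rows, 3 months, dates 1..31
rng = np.random.default_rng(0)
big = pd.DataFrame({'month': rng.choice(np.array(['JAN', 'FEB', 'MAR']), size=100_000),
                    'date': rng.integers(1, 32, size=100_000)})


def with_groupby(frame):
    # Group by month and collect each group's dates into a set
    return frame.groupby('month', sort=False)['date'].agg(set).to_dict()


def with_loop(frame):
    # Drop duplicate (month, date) pairs first, then build the dict by hand
    out = {}
    for m, d in frame.drop_duplicates(['month', 'date']).to_numpy():
        out.setdefault(m, set()).add(d)
    return out


# Both approaches should produce the same dictionary of sets
assert with_groupby(big) == with_loop(big)

for fn in (with_groupby, with_loop):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

The timings will vary with the number of rows, the number of distinct months and dates, and your pandas version, so run it on a sample that looks like your real data.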