How to convert dataframe into dictionary of sets?

I have a DataFrame and want to convert it into a dictionary of sets.

Specifically, my DataFrame looks like this:

    month   date
0   JAN       1
1   JAN       1
2   JAN       1
3   FEB       2
4   FEB       2
5   FEB       3
6   MAR       1
7   MAR       2
8   MAR       3

My goal:

dict = {'JAN' : {1}, 'FEB' : {2,3}, 'MAR' : {1,2,3}}

I wrote the code below, but I am not sure it is suitable. In reality, the data is large, so I would like to know any tips or a more efficient (faster) way to do this.

import pandas as pd

# Sample data matching the table above
df = pd.DataFrame({'month' : ['JAN','JAN','JAN','FEB','FEB','FEB','MAR','MAR','MAR'],
                   'date'  : [1, 1, 1, 2, 2, 3, 1, 2, 3]})
df_list = df.values.tolist()

monthSet = {'JAN', 'FEB', 'MAR'}  # months to keep
inst_id_dict = {}
for i in df_list:
    monStr = i[0]
    if monStr in monthSet:
        inst_id = i[1]
        # Create the set on first sight of a month, then add the date
        inst_id_dict.setdefault(monStr, set()).add(inst_id)

Answer

Let’s try grouping on the “month” column, then aggregating with GroupBy.unique, which gives a dictionary of lists:

df.groupby('month', sort=False)['date'].unique().map(list).to_dict()
#  {'JAN': [1], 'FEB': [2, 3], 'MAR': [1, 2, 3]}

Or, if you’d prefer a dictionary of sets, use GroupBy.agg:

df.groupby('month', sort=False)['date'].agg(set).to_dict()
# {'JAN': {1}, 'FEB': {2, 3}, 'MAR': {1, 2, 3}}
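
The code in the question also keeps only the months listed in monthSet, which the one-liners above do not do. A minimal sketch of one way to add that filter, assuming the sample data from the question; the name `wanted_months` and its contents are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'month': ['JAN', 'JAN', 'JAN', 'FEB', 'FEB', 'FEB', 'MAR', 'MAR', 'MAR'],
    'date':  [1, 1, 1, 2, 2, 3, 1, 2, 3],
})

# Hypothetical subset of months to keep (plays the role of monthSet in the question)
wanted_months = ['JAN', 'MAR']

# Drop unwanted rows first, then collect the dates of each month into a set
filtered = df[df['month'].isin(wanted_months)]
result = filtered.groupby('month', sort=False)['date'].agg(set).to_dict()

print(result)
```

Filtering before grouping means the groupby only sees the rows that end up in the result.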

Another idea is to build the dict iteratively (don’t worry, despite using a loop this is likely to be faster than the groupby option):

out = {}
for m, d in df.drop_duplicates(['month', 'date']).to_numpy():
    out.setdefault(m, set()).add(d)

out
# {'JAN': {1}, 'FEB': {2, 3}, 'MAR': {1, 2, 3}}
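
Which option is faster depends on the data, so it may be worth measuring on your own. The following is a rough timing sketch, not a rigorous benchmark: the synthetic data, its size, and the helper names `with_groupby` and `with_loop` are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: row count and value ranges are arbitrary assumptions
rng = np.random.default_rng(0)
n_rows = 100_000
big = pd.DataFrame({
    'month': rng.choice(['JAN', 'FEB', 'MAR'], size=n_rows),
    'date': rng.integers(1, 29, size=n_rows),
})

def with_groupby(frame):
    # Group by month and collect each group's dates into a set
    return frame.groupby('month', sort=False)['date'].agg(set).to_dict()

def with_loop(frame):
    # Deduplicate (month, date) pairs, then build the dict row by row
    out = {}
    for m, d in frame.drop_duplicates(['month', 'date']).to_numpy():
        out.setdefault(m, set()).add(d)
    return out

# Both approaches should agree before their timings are compared
assert with_groupby(big) == with_loop(big)

for fn in (with_groupby, with_loop):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

The numbers will vary with the number of rows, the number of distinct months, and the pandas version, so treat the output as a guide for your own data rather than a general result.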
