I have a DataFrame df_data:
CustID  MatchID  LocationID  isMajor   # Major is 1 and Minor is 0
1       11111    324         0
1       11111    324         0
1       11111    324         0
1       22222    490         0
1       33333    675         1
2       44444    888         0
I have a function with parameters like this:
def compute_something(list_minor=None, list_major=None):
    pass
Explanation of the parameters: for CustID = 1 the parameters should be list_minor = [3, 1] (order does not matter) and list_major = [1]. LocationID = 324 appears 3 times and LocationID = 490 appears once; both have isMajor = 0, so their counts go into the same list. LocationID = 675 appears once with isMajor = 1. Similarly, CustID = 2 has list_minor = [1] and list_major = [] (if a customer has no major or minor data, I should pass []).
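To make the expected values concrete, here is a hand-rolled sketch in plain Python. It is not the groupby solution being asked for, only an illustration of the counting rule. The helper name expected_parameters is made up for this example, and the rows are copied from the table above.

```python
from collections import Counter

# Rows copied from the table above: (CustID, MatchID, LocationID, isMajor).
rows = [
    (1, 11111, 324, 0),
    (1, 11111, 324, 0),
    (1, 11111, 324, 0),
    (1, 22222, 490, 0),
    (1, 33333, 675, 1),
    (2, 44444, 888, 0),
]


def expected_parameters(cust_id):
    """Hypothetical helper: count rows per LocationID, split by isMajor."""
    visits = Counter(
        (location_id, is_major)
        for cust, _match_id, location_id, is_major in rows
        if cust == cust_id
    )
    list_minor = [n for (_loc, is_major), n in visits.items() if is_major == 0]
    list_major = [n for (_loc, is_major), n in visits.items() if is_major == 1]
    return list_minor, list_major


minor_1, major_1 = expected_parameters(1)
minor_2, major_2 = expected_parameters(2)
```

For CustID = 2 there are no major rows, so the second list comes back empty, which is the [] case described above.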
This is my program:
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0]
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'IsMajor'])
df_parameter = pd.DataFrame()
df_parameter['parameters'] = df_data.groupby(['CustID', 'MatchID', 'IsMajor'])['LocationID'].nunique()
But the result in df_parameter['parameters'] is wrong:
                        parameters
CustID MatchID IsMajor
1      11111   0                 1   # should be 3
       22222   0                 1
       33333   1                 1
2      44444   0                 1
Can I get the parameters I explained above with groupby and pass them to the function?
Answer
How about:
(df.groupby(['CustID','isMajor', 'MatchID']).size()
.groupby(level=[0,1]).agg(set)
.unstack('isMajor')
)
Output:
isMajor       0    1
CustID
1        {1, 3}  {1}
2           {1}  NaN
Update: try this version, which uses a single groupby and returns lists instead of sets:
(df.groupby(['CustID','isMajor'])['MatchID']
.apply(lambda x: x.value_counts().agg(list))
.unstack('isMajor')
)
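The question also asks how to pass the result to the function. A minimal sketch, assuming the column is named isMajor (as in the snippets here) and that compute_something takes plain lists: the body of compute_something below is a placeholder that only echoes its arguments, and .tolist() is used in place of .agg(list) purely as a matter of taste. A customer with no rows for one class falls back to [] through Series.get.

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df = pd.DataFrame(data, columns=["CustID", "MatchID", "LocationID", "isMajor"])


def compute_something(list_minor=None, list_major=None):
    # Placeholder body (assumption): echo the arguments so the call is visible.
    return list_minor, list_major


# One list of per-MatchID row counts for every (CustID, isMajor) pair.
counts = (
    df.groupby(["CustID", "isMajor"])["MatchID"]
    .apply(lambda x: x.value_counts().tolist())
)

results = {}
for cust_id in df["CustID"].unique():
    per_cust = counts.loc[cust_id]       # Series indexed by isMajor (0 and/or 1)
    list_minor = per_cust.get(0, [])     # [] when the customer has no minor rows
    list_major = per_cust.get(1, [])     # [] when the customer has no major rows
    results[int(cust_id)] = compute_something(list_minor, list_major)
```

Lists are used rather than sets because a set would merge equal counts: two locations visited once each would give {1} instead of [1, 1]. If the counts should be per LocationID, as the question describes, swap "MatchID" for "LocationID"; on this sample data both give the same result.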
Also, groupby with two keys can be slow. In that case, you can concatenate the keys into a single key and group on that:
keys = df['CustID'].astype(str) + '_' + df['isMajor'].astype(str)
(df.groupby(keys)['MatchID']
   .apply(lambda x: x.value_counts().agg(list))
)
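With a concatenated key the result is indexed by strings such as '1_0', so the two original keys have to be split back out before looking anything up. A rough sketch of that step, assuming both key columns hold integers and using the same sample data as the question; whether this is actually faster than a two-key groupby depends on the data and is worth timing.

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df = pd.DataFrame(data, columns=["CustID", "MatchID", "LocationID", "isMajor"])

# Single string key per row, e.g. "1_0" for CustID 1 / isMajor 0.
keys = df["CustID"].astype(str) + "_" + df["isMajor"].astype(str)

counts = df.groupby(keys)["MatchID"].apply(lambda x: x.value_counts().tolist())

# Split "CustID_isMajor" back into two integer index levels.
counts.index = pd.MultiIndex.from_tuples(
    [tuple(int(part) for part in key.split("_")) for key in counts.index],
    names=["CustID", "isMajor"],
)
```

After this, counts.loc[(1, 0)] returns the minor list for CustID 1, the same way it would with the two-key groupby.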