count_freq data
3 [['58bcd029', 2, 'expert'],
['58bcd029', 2, 'user'],
['58bcd029', 2, 'expert']]
2 [['58bcd029', 2, 'expert'],
['58bcd029', 2, 'expert']]
1 [['1ee429fa', 1, 'expert']]
I want to count how many times 'expert' and 'user' occur in each row's list of lists. After getting those counts, I want to store the ids belonging to each role in separate lists. I tried converting the lists to a dictionary and counting by key, but that did not work. Can anyone help me do this?
I want the dataframe in this format:
count_freq  count_expert  ids                      count_user  ids
3           2             ['58bcd029','58bcd029']  1           ['58bcd029']
2           2             ['58bcd029','58bcd029']  0           []
1           1             ['1ee429fa']             0           []
Answer
One solution might be:
import pandas as pd
data = pd.DataFrame({
'col': [[['58bcd029', 2, 'expert'],
['58bcd029', 2, 'user'],
['58bcd029', 2, 'expert']],
[['58bcd029', 2, 'expert'],
['58bcd029', 2, 'expert']],
[['1ee429fa', 1, 'expert']]]
})
print(data)
col
0 [[58bcd029, 2, expert], [58bcd029, 2, user], [...
1 [[58bcd029, 2, expert], [58bcd029, 2, expert]]
2 [[1ee429fa, 1, expert]]
# Count each role by flattening the row's nested list and counting matching items.
data['count_expert'] = data['col'].apply(lambda x: [item for sublist in x for item in sublist].count('expert'))
data['count_user'] = data['col'].apply(lambda x: [item for sublist in x for item in sublist].count('user'))
# Collect the ids per role; set() drops duplicates, so each id appears at most once per row.
data['ids_expert'] = data['col'].apply(lambda x: list(set([sublist[0] for sublist in x if sublist[2] == 'expert'])))
data['ids_user'] = data['col'].apply(lambda x: list(set([sublist[0] for sublist in x if sublist[2] == 'user'])))
# For the purpose of illustration, I just selected these columns, but `col` is also there.
print(data[['count_expert', 'count_user', 'ids_expert', 'ids_user']])
count_expert count_user ids_expert ids_user
0 2 1 [58bcd029] [58bcd029]
1 2 0 [58bcd029] []
2 1 0 [1ee429fa] []
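Note that the output above lists each id only once per row, because the answer wraps the ids in set(). The format requested in the question keeps one id per matching entry (for example ['58bcd029','58bcd029'] for two experts). A minimal sketch of that variant follows, assuming every inner list has the layout [id, number, role] as in the sample data. The helper name ids_for_role is hypothetical and not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    'col': [[['58bcd029', 2, 'expert'],
             ['58bcd029', 2, 'user'],
             ['58bcd029', 2, 'expert']],
            [['58bcd029', 2, 'expert'],
             ['58bcd029', 2, 'expert']],
            [['1ee429fa', 1, 'expert']]]
})


def ids_for_role(entries, role):
    # Hypothetical helper: return the id (first element) of every entry whose
    # role (third element) matches, keeping duplicates and original order.
    return [entry[0] for entry in entries if entry[2] == role]


for role in ('expert', 'user'):
    # One id list per row for this role.
    data[f'ids_{role}'] = data['col'].apply(lambda entries: ids_for_role(entries, role))
    # The count is the length of that id list, so the two columns always agree.
    data[f'count_{role}'] = data[f'ids_{role}'].apply(len)

print(data[['count_expert', 'ids_expert', 'count_user', 'ids_user']])
```

Deriving the count from the id list compares only the role field, so an id that happened to equal 'expert' or 'user' would not be miscounted, which could happen when flattening the whole nested list.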