count_freq  data
3           [['58bcd029', 2, 'expert'], ['58bcd029', 2, 'user'], ['58bcd029', 2, 'expert']]
2           [['58bcd029', 2, 'expert'], ['58bcd029', 2, 'expert']]
1           [['1ee429fa', 1, 'expert']]
I want to count the 'expert' and 'user' entries in every list, for every row of the data frame. After getting the counts of experts and users, I want to store the respective ids in separate lists. I have tried converting the lists into a dictionary and counting by key, but it is not working. Can anyone help me do this?
I want the dataframe in this format:
count_freq  count_expert  ids                      count_user  ids
3           2             ['58bcd029','58bcd029']  1           ['58bcd029']
2           2             ['58bcd029','58bcd029']  0           []
1           1             ['1ee429fa']             0           []
Answer
One solution might be:
import pandas as pd

data = pd.DataFrame({
    'col': [[['58bcd029', 2, 'expert'], ['58bcd029', 2, 'user'], ['58bcd029', 2, 'expert']],
            [['58bcd029', 2, 'expert'], ['58bcd029', 2, 'expert']],
            [['1ee429fa', 1, 'expert']]]
})
print(data)

                                                 col
0  [[58bcd029, 2, expert], [58bcd029, 2, user], [...
1     [[58bcd029, 2, expert], [58bcd029, 2, expert]]
2                            [[1ee429fa, 1, expert]]

# Flatten each row's nested list, then count how often each role appears.
data['count_expert'] = data['col'].apply(lambda x: [item for sublist in x for item in sublist].count('expert'))
data['count_user'] = data['col'].apply(lambda x: [item for sublist in x for item in sublist].count('user'))
# Collect the unique ids (first element) of the entries with each role.
data['ids_expert'] = data['col'].apply(lambda x: list(set([sublist[0] for sublist in x if sublist[2] == 'expert'])))
data['ids_user'] = data['col'].apply(lambda x: list(set([sublist[0] for sublist in x if sublist[2] == 'user'])))

# For the purpose of illustration, I just selected these columns, but `col` is also there.
print(data[['count_expert', 'count_user', 'ids_expert', 'ids_user']])

   count_expert  count_user  ids_expert    ids_user
0             2           1  [58bcd029]  [58bcd029]
1             2           0  [58bcd029]          []
2             1           0  [1ee429fa]          []
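Note that the answer's `set(...)` collapses repeated ids, whereas the format requested in the question repeats an id once per matching entry (for example ['58bcd029','58bcd029']). A minimal sketch of a variant that keeps the duplicates might look like the following. The helper name `ids_for_role` is hypothetical, and the column names `count_freq` and `data` are assumed from the question's table. It also compares only the third element of each entry, so an id that happened to equal 'expert' would not be miscounted.

```python
import pandas as pd

# Sample frame rebuilt from the question; column names are assumed from its table.
data = pd.DataFrame({
    'count_freq': [3, 2, 1],
    'data': [
        [['58bcd029', 2, 'expert'], ['58bcd029', 2, 'user'], ['58bcd029', 2, 'expert']],
        [['58bcd029', 2, 'expert'], ['58bcd029', 2, 'expert']],
        [['1ee429fa', 1, 'expert']],
    ],
})


def ids_for_role(entries, role):
    """Hypothetical helper: ids of entries whose third element equals `role`.

    Duplicates are kept, one id per matching entry.
    """
    return [entry[0] for entry in entries if entry[2] == role]


for role in ('expert', 'user'):
    # Build the id lists first; the count is then just the length of each list.
    ids = data['data'].apply(lambda entries: ids_for_role(entries, role))
    data[f'count_{role}'] = ids.apply(len)
    data[f'ids_{role}'] = ids

print(data[['count_freq', 'count_expert', 'ids_expert', 'count_user', 'ids_user']])
```

Deriving the count from the id list keeps the two columns consistent by construction. If unique ids are wanted instead, wrapping the helper's result in `list(set(...))` as in the answer restores that behaviour.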