
How to get a count of a specific element in a nested list in Python

count_freq   data
3            [['58bcd029', 2, 'expert'], 
              ['58bcd029', 2, 'user'], 
             ['58bcd029', 2, 'expert']]
2            [['58bcd029', 2, 'expert'], 
             ['58bcd029', 2, 'expert']]
1            [['1ee429fa', 1, 'expert']]

I want to count the occurrences of 'expert' and 'user' in each row of the DataFrame, across every inner list in that row. After getting the counts, I want to store the corresponding ids in separate lists. I tried converting the lists into a dictionary and counting by key, but that did not work. Can anyone help me do this?

I want the dataframe in this format:

count_freq   count_expert  ids_expert               count_user  ids_user
3            2             ['58bcd029','58bcd029']  1           ['58bcd029']
2            2             ['58bcd029','58bcd029']  0           []
1            1             ['1ee429fa']             0           []


Answer

One solution might be:

import pandas as pd

data = pd.DataFrame({
    'col': [[['58bcd029', 2, 'expert'],
             ['58bcd029', 2, 'user'],
             ['58bcd029', 2, 'expert']],
            [['58bcd029', 2, 'expert'],
             ['58bcd029', 2, 'expert']],
            [['1ee429fa', 1, 'expert']]]
})

print(data)
                                                 col
0  [[58bcd029, 2, expert], [58bcd029, 2, user], [...
1     [[58bcd029, 2, expert], [58bcd029, 2, expert]]
2                            [[1ee429fa, 1, expert]]



data['count_expert'] = data['col'].apply(lambda x: [item for sublist in x for item in sublist].count('expert'))
data['count_user'] = data['col'].apply(lambda x: [item for sublist in x for item in sublist].count('user'))
data['ids_expert'] = data['col'].apply(lambda x: list(set([sublist[0] for sublist in x if sublist[2] == 'expert'])))
data['ids_user'] = data['col'].apply(lambda x: list(set([sublist[0] for sublist in x if sublist[2] == 'user'])))


# For illustration, only these columns are selected; `col` is still in the DataFrame.
print(data[['count_expert', 'count_user', 'ids_expert', 'ids_user']])

   count_expert  count_user  ids_expert    ids_user
0             2           1  [58bcd029]  [58bcd029]
1             2           0  [58bcd029]          []
2             1           0  [1ee429fa]          []
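
Note that the answer above wraps the ids in set(), so repeated ids collapse to one entry (and the order of distinct ids is not guaranteed), whereas the format requested in the question keeps duplicates. If the duplicates are wanted, a minimal sketch along the following lines may help. It assumes every inner list has the shape [id, number, role]; the helper name ids_for_role is hypothetical and not part of pandas.

```python
import pandas as pd

data = pd.DataFrame({
    'count_freq': [3, 2, 1],
    'col': [
        [['58bcd029', 2, 'expert'], ['58bcd029', 2, 'user'], ['58bcd029', 2, 'expert']],
        [['58bcd029', 2, 'expert'], ['58bcd029', 2, 'expert']],
        [['1ee429fa', 1, 'expert']],
    ],
})


def ids_for_role(records, role):
    """Hypothetical helper: ids of all records whose third field equals `role`.

    Assumes each record looks like [id, number, role]. Duplicates are kept.
    """
    return [rec[0] for rec in records if rec[2] == role]


for role in ('expert', 'user'):
    # Collect the ids first, then derive the count from the list length,
    # so the count and the id list always agree.
    data[f'ids_{role}'] = data['col'].apply(lambda recs: ids_for_role(recs, role))
    data[f'count_{role}'] = data[f'ids_{role}'].apply(len)

print(data[['count_freq', 'count_expert', 'ids_expert', 'count_user', 'ids_user']])
```

Comparing only the third field also avoids a corner case of the flatten-and-count approach, which would count an id that happened to equal 'expert' or 'user'.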
User contributions licensed under: CC BY-SA