
Is there a way to get the count of every element in lists stored as rows in a data frame?

Hi, I’m using pandas to display and analyze a CSV file. Some columns have ‘object dtype’ and are displayed as lists, so I used ‘literal_eval’ to convert the rows of a column named ‘sdgs’ into actual lists. My problem is how to use ‘groupby’, or any other approach, to display the count of every unique element stored in these lists, especially since many elements are shared between the lists.

import pandas as pd
from ast import literal_eval

df = pd.read_csv("../input/covid19-public-media-dataset/covid19_articles_20220420.csv")
df.dropna(subset=['sdgs'], inplace=True)      # drop rows with missing sdgs
df = df[df.astype(str)['sdgs'] != '[]']       # drop rows whose sdgs list is empty
df.sdgs = df.sdgs.apply(literal_eval)         # parse the string "[...]" into a real list
df.reset_index(drop=True, inplace=True)
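
For context, here is a minimal sketch of what ‘literal_eval’ does to a single cell; the string value below is made up for illustration and is not from the actual dataset:

from ast import literal_eval

cell = "['climate_action', 'good_health']"  # hypothetical raw cell text as read from the CSV
parsed = literal_eval(cell)                 # -> ['climate_action', 'good_health'], a real Python list
print(type(parsed), parsed)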

This is a sample of the data; my problem is about the last column, ‘sdgs’.

This is an example of the elements I want to count

Thank you


Answer

Given this example data:

import pandas as pd

df = pd.DataFrame({'domain': ['a', 'a', 'b', 'c'], 
                   'sdgs': [['Just', 'a', 'sentence'], ['another', 'sentence'], 
                            ['a', 'word', 'and', 'a', 'word'], ['nothing', 'here']]})
print(df)
  domain                     sdgs
0      a      [Just, a, sentence]
1      a      [another, sentence]
2      b  [a, word, and, a, word]
3      c          [nothing, here]

To get the element count across all the lists in the sdgs column, you can concatenate the lists with Series.agg and tally the elements with collections.Counter:

import collections

word_counts = collections.Counter(df['sdgs'].agg(sum))  # sum concatenates the lists; Counter tallies each element
print(word_counts)
Counter({'a': 3, 'sentence': 2, 'word': 2, 'Just': 1, 'another': 1, 
         'and': 1, 'nothing': 1, 'here': 1})
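
Since the question mentions groupby, here is a minimal alternative sketch that keeps the counts inside pandas, using DataFrame.explode so each list element gets its own row; value_counts then tallies the elements, overall or per domain:

# One row per list element, then count each unique element
overall = df.explode('sdgs')['sdgs'].value_counts()
print(overall)

# The same idea per group, e.g. counts of each element within each domain
per_domain = df.explode('sdgs').groupby('domain')['sdgs'].value_counts()
print(per_domain)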