Hi, I'm using pandas to display and analyze a CSV file. Some columns were 'object dtype' and were displayed as lists, so I used 'literal_eval' to convert the rows of a column named 'sdgs' to lists. My problem is how to use 'groupby', or any other approach, to display the count of every unique element stored in these lists, especially since many elements are shared between the lists.
import pandas as pd
from ast import literal_eval

df = pd.read_csv("../input/covid19-public-media-dataset/covid19_articles_20220420.csv")
df.dropna(subset=['sdgs'], inplace=True)
df = df[df.astype(str)['sdgs'] != '[]']   # drop rows whose sdgs value is an empty list
df.sdgs = df.sdgs.apply(literal_eval)     # parse the string representations into real lists
df.reset_index(drop=True, inplace=True)
This is a sample of the data; my problem is about the last column, 'sdgs', whose lists contain the elements I want to count.
Thank you
Answer
Given this example data:
import pandas as pd

df = pd.DataFrame({'domain': ['a', 'a', 'b', 'c'],
                   'sdgs': [['Just', 'a', 'sentence'],
                            ['another', 'sentence'],
                            ['a', 'word', 'and', 'a', 'word'],
                            ['nothing', 'here']]})
print(df)
  domain                     sdgs
0      a      [Just, a, sentence]
1      a      [another, sentence]
2      b  [a, word, and, a, word]
3      c          [nothing, here]
To get the word count across all the lists in the sdgs column, you can concatenate the lists with Series.agg and use collections.Counter:
import collections

# agg(sum) concatenates all the lists in the column into one flat list,
# and Counter tallies the occurrences of each element.
word_counts = collections.Counter(df['sdgs'].agg(sum))
print(word_counts)
Counter({'a': 3, 'sentence': 2, 'word': 2, 'Just': 1, 'another': 1, 'and': 1, 'nothing': 1, 'here': 1})
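If you prefer to stay within pandas, here is a minimal alternative sketch (assuming pandas >= 0.25, which introduced Series.explode): explode gives each list element its own row, and value_counts tallies the unique elements.

import pandas as pd

df = pd.DataFrame({'domain': ['a', 'a', 'b', 'c'],
                   'sdgs': [['Just', 'a', 'sentence'],
                            ['another', 'sentence'],
                            ['a', 'word', 'and', 'a', 'word'],
                            ['nothing', 'here']]})

# explode() flattens each list into one row per element;
# value_counts() then counts how often each element appears.
word_counts = df['sdgs'].explode().value_counts()
print(word_counts)

This produces the same counts (a: 3, sentence: 2, word: 2, and so on) as a pandas Series, which also makes it easy to sort, filter, or plot the result.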