I have a dataframe (df
) like this:
id col 1 [A, B, C, C] 2 [B, C, D] 3 [C, D, E]
And, I have list like this:
l = ["A", "C", "F"]
For each element in l
, I want to count the unique rows they appear in df
.
'A': 1, 'C': 3, 'F': 0
But I’m not getting the part where I can check if the value exists in the list-column
of the dataframe.
d = {} for i in l: df_tmp = df[i.isin(df['col'])]['id'] ## wrong, showing error, isin is not a string attribute d[i] = len(df_tmp)
Anyway I can fix this? Or is there a more cleaner/efficient way?
N.B. There is a similar question Frequency counts for a pandas column of lists, but it is different as I have an external list to check the frequency.
Advertisement
Answer
Here we are using apply method that applies given function to each element of the column (in our case it is the function that tests whether an element belongs to the list or not), then we sum True
values, i.e. rows in which we found requested values and eventually save it to the dictionary. And we do it for all requested letters. I have not tested performance of this solution.
import pandas as pd df = pd.DataFrame([ {'id': 1, 'col': ['A', 'B', 'C', 'C']}, {'id': 2, 'col': ['B', 'C', 'D']}, {'id': 3, 'col': ['C', 'D', 'E']}]) letters = ["A", "C", "D", "F"] res = {v: df['col'].apply(lambda x: v in x).sum() for v in letters} # output # {'A': 1, 'C': 3, 'D': 2, 'F': 0}