Skip to content
Advertisement

Count frequencies (unique rows) from a pandas list type column

I have a dataframe (df) like this:

id  col
1  [A, B, C, C]
2  [B, C, D]
3  [C, D, E]

And, I have list like this:

l = ["A", "C", "F"]

For each element in l, I want to count the unique rows they appear in df.

'A': 1, 'C': 3, 'F': 0

But I’m not getting the part where I can check if the value exists in the list-column of the dataframe.

d = {}
for i in l:
    df_tmp = df[i.isin(df['col'])]['id'] ## wrong, showing error, isin is not a string attribute
    d[i] = len(df_tmp)

Anyway I can fix this? Or is there a more cleaner/efficient way?

N.B. There is a similar question Frequency counts for a pandas column of lists, but it is different as I have an external list to check the frequency.

Advertisement

Answer

Here we are using apply method that applies given function to each element of the column (in our case it is the function that tests whether an element belongs to the list or not), then we sum True values, i.e. rows in which we found requested values and eventually save it to the dictionary. And we do it for all requested letters. I have not tested performance of this solution.

    import pandas as pd
    
    df = pd.DataFrame([
        {'id': 1, 'col': ['A', 'B', 'C', 'C']},
        {'id': 2, 'col': ['B', 'C', 'D']},
        {'id': 3, 'col': ['C', 'D', 'E']}])
    
    letters = ["A", "C", "D", "F"]
    res = {v: df['col'].apply(lambda x: v in x).sum()
           for v in letters}

    # output
    # {'A': 1, 'C': 3, 'D': 2, 'F': 0}
Advertisement