Current dataframe is as follows:
import pandas as pd

df = pd.read_csv('filename.csv', delimiter=',')
print(df)

idx  uniqueID  String
0    1         'hello'
1    1         'goodbye'
2    1         'happy'
3    2         'hello'
4    2         'happy'
5    3         'goodbye'
6    3         'hello'
7    3         'hello'
8    4         'goodbye'
9    5         'goodbye'
Expected Output:
{'hello': 2, 'goodbye': 3}

Here 'hello' is counted from idx 0 & 3, and 'goodbye' from idx 5, 8 & 9. In the actual dataset there are more than two starting strings. I'm thinking of potentially using pandas .groupby() and .where() to filter for the first time a uniqueID occurs, then group by the string? Not entirely sure.
Question: How do I get the counts of the 'starting string' (the String at the first occurrence of each uniqueID) with better performance?
Thus far, I've been doing this with a simple for loop over the data and if/else checks (roughly as sketched at the end of this question), but this is incredibly slow with a large dataframe.
I'm curious whether there are any functions built into pandas, or another library, that would reduce the overall runtime.
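The loop is roughly along these lines (an illustrative reconstruction, not the exact original code):

import pandas as pd

df = pd.read_csv('filename.csv', delimiter=',')

# Track which uniqueIDs have been seen; count the String of each
# first occurrence. Slow because iterrows() builds a Series per row.
seen = set()
counts = {}
for _, row in df.iterrows():
    uid = row['uniqueID']
    if uid not in seen:  # first time this uniqueID appears
        seen.add(uid)
        counts[row['String']] = counts.get(row['String'], 0) + 1
# counts -> {'hello': 2, 'goodbye': 3}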
Answer
Achieving better than O(N) is not possible, since every row has to be inspected at least once; the gain comes from replacing the Python-level loop with vectorized pandas operations.
You can drop_duplicates, then value_counts:
out = df.drop_duplicates('uniqueID')['String'].value_counts()
output:

goodbye    3
hello      2
Name: String, dtype: int64
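Note that drop_duplicates keeps the first row of each uniqueID by default (keep='first'), which matches the 'starting string' semantics here; written out explicitly:

out = df.drop_duplicates(subset='uniqueID', keep='first')['String'].value_counts()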
As a dictionary:
df.drop_duplicates('uniqueID')['String'].value_counts().to_dict()
output: {'goodbye': 3, 'hello': 2}
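Since the question mentioned groupby, an equivalent formulation (not benchmarked here, but also fully vectorized) is:

df.groupby('uniqueID', sort=False)['String'].first().value_counts().to_dict()
# {'goodbye': 3, 'hello': 2}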