
Increasing performance when searching a pandas dataframe: count occurrences of a starting string, grouped by unique identifier

Current dataframe is as follows:


Expected Output:

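As a sketch of what such a dataframe and its expected counts might look like (the column names uniqueID and start_string are assumptions, not from the original post):

```python
import pandas as pd

# Hypothetical example -- column names are assumed, not from the post.
# 'uniqueID' can repeat across rows; 'start_string' is the value to count.
df = pd.DataFrame({
    "uniqueID":     [1, 1, 2, 3, 3, 4, 5],
    "start_string": ["hello", "hello", "goodbye", "goodbye",
                     "goodbye", "hello", "goodbye"],
})

# Expected output -- each uniqueID is counted only at its first occurrence:
#   goodbye: 3  (IDs 2, 3, 5)
#   hello:   2  (IDs 1, 4)
```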

Question: How do I get the counts of the ‘starting string’, counting each uniqueID only at its first occurrence, with better performance?

Thus far, I’ve been doing this with a simple for loop over the data and if/else checks, but this is incredibly slow on a large dataframe.

I’m curious whether there are any functions built into pandas, or in another library, that would reduce the overall time it takes.
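The loop-based approach described here might look roughly like the following sketch (the column names uniqueID and start_string are assumptions):

```python
import pandas as pd

# Hypothetical data -- column names are assumed, not from the post.
df = pd.DataFrame({
    "uniqueID":     [1, 1, 2, 3, 3, 4, 5],
    "start_string": ["hello", "hello", "goodbye", "goodbye",
                     "goodbye", "hello", "goodbye"],
})

# Python-level loop: remember the IDs already seen and count the
# starting string only on each ID's first appearance.
counts = {}
seen = set()
for uid, s in zip(df["uniqueID"], df["start_string"]):
    if uid not in seen:
        seen.add(uid)
        counts[s] = counts.get(s, 0) + 1

print(counts)  # {'hello': 2, 'goodbye': 3}
```

The loop itself is O(N), but every iteration pays Python interpreter overhead, which is what makes it slow on large dataframes.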


Answer

Achieving better than O(N) is not possible, since every row has to be examined at least once; the gain comes from replacing the Python-level loop with vectorized pandas operations.

You can drop_duplicates (which by default keeps only each uniqueID’s first row) and then value_counts on the starting-string column; both run as vectorized pandas operations rather than a Python-level loop.
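A sketch of that chain, again assuming hypothetical columns uniqueID and start_string:

```python
import pandas as pd

# Hypothetical data -- column names are assumed, not from the post.
df = pd.DataFrame({
    "uniqueID":     [1, 1, 2, 3, 3, 4, 5],
    "start_string": ["hello", "hello", "goodbye", "goodbye",
                     "goodbye", "hello", "goodbye"],
})

# Keep only the first row of each uniqueID (keep='first' is the default),
# then count the starting strings among those rows.
counts = df.drop_duplicates(subset="uniqueID")["start_string"].value_counts()

# counts is a Series indexed by starting string: goodbye -> 3, hello -> 2
```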

To get the result as a dictionary, chain to_dict() onto the same expression:
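Appending to_dict() to the same chain (same hypothetical column names as above):

```python
import pandas as pd

# Hypothetical data -- column names are assumed, not from the post.
df = pd.DataFrame({
    "uniqueID":     [1, 1, 2, 3, 3, 4, 5],
    "start_string": ["hello", "hello", "goodbye", "goodbye",
                     "goodbye", "hello", "goodbye"],
})

# Drop duplicate IDs, count starting strings, and convert to a dict.
result = (df.drop_duplicates(subset="uniqueID")["start_string"]
            .value_counts()
            .to_dict())

print(result)  # {'goodbye': 3, 'hello': 2}
```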

output: {'goodbye': 3, 'hello': 2}

User contributions licensed under: CC BY-SA