What is the most efficient way of counting occurrences in pandas?

Question

I have a large (about 12M rows) DataFrame df:

The following ran in a timely fashion:

However, this is taking an unexpectedly long time to run:

What am I doing wrong here? Is there a better way to count occurrences in a large DataFrame?

A related command ran pretty well, so I really did not expect building this Occurrences_of_Words DataFrame to take very long.

Accepted Answer

I think df['word'].value_counts() should serve. By skipping the groupby machinery, you'll save some time. I'm not sure why count should be much slower than max. Both take some time to exclude missing values (compare with size, which counts every row whether or not it has missing values). In any case, value_counts has been specifically optimized to handle object dtype, like your words, so I doubt you'll do much better than that.
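A minimal sketch of the suggestion, assuming a toy DataFrame in place of the real 12M-row df (the data and the second column `value` are hypothetical). It shows that value_counts gives the same per-word counts as groupby(...).size(), and how groupby(...).count() differs by skipping missing values. It checks results only; it does not measure speed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df: a 'word' column of strings
# (object dtype) plus a hypothetical 'value' column containing missing values.
df = pd.DataFrame({
    "word": ["apple", "pear", "apple", "fig", "apple", "pear"],
    "value": [3, 1, np.nan, 2, 5, np.nan],
})

# Suggested approach: count occurrences directly on the Series, no groupby.
# The result is sorted by count in descending order.
counts = df["word"].value_counts()

# groupby equivalent: size() counts every row in each group (sorted by key).
sizes = df.groupby("word").size()

# count() instead counts only the non-missing entries of the selected column,
# so it can be smaller than size() when that column has NaNs.
non_null = df.groupby("word")["value"].count()

print(counts)
print(non_null)
```

Here counts and sizes hold the same numbers (apple 3, pear 2, fig 1), only in a different order, while non_null reports 2 for apple and 1 for pear because their NaN entries are skipped.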


Answer