
How to improve the computation speed of subsetting a pandas dataframe?

I have a large df (14*1’000’000) and I want to subset it. The calculation takes a lot of time, though, and I wonder how to improve the speed.

For each Name, I want to select the row with the lowest value of Total_time, ignoring zero values and picking only the first row if more than one row has the lowest value of Total_time. All of the selected rows should then be appended to a new dataframe, unique.

Is there a general mistake in my code that makes it inefficient?



Answer

In general, you don’t want to iterate over each row; vectorized pandas operations such as groupby are usually much faster.

If you want, for each Name, the smallest Total_time:

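A minimal sketch of one way to do this with groupby, assuming the columns are called Name and Total_time as in the question. The sample data and the Other column are made up for illustration only:

```python
import pandas as pd

# Hypothetical sample data; only Name and Total_time come from the question.
df = pd.DataFrame({
    "Name": ["a", "a", "a", "b", "b", "c"],
    "Total_time": [0, 3, 2, 5, 5, 0],
    "Other": [10, 11, 12, 13, 14, 15],
})

# Drop the zero values first, then take the minimum per Name in one pass.
nonzero = df[df["Total_time"] != 0]
min_times = nonzero.groupby("Name")["Total_time"].min()
print(min_times)
```

Names whose rows all have Total_time equal to zero (here "c") do not appear in the result at all.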

If you need the rest of the columns:

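One possible sketch for keeping the full rows uses idxmin, which returns the index label of the first occurrence of the minimum in each group, so ties resolve to the first row. It assumes the dataframe index is unique (call reset_index(drop=True) first if it is not). As before, the sample data and the Other column are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data; only Name and Total_time come from the question.
df = pd.DataFrame({
    "Name": ["a", "a", "a", "b", "b", "c"],
    "Total_time": [0, 3, 2, 5, 5, 0],
    "Other": [10, 11, 12, 13, 14, 15],
})

# Ignore zero values, then find the index label of the first minimum per Name.
nonzero = df[df["Total_time"] != 0]
idx = nonzero.groupby("Name")["Total_time"].idxmin()

# Select the complete rows (all columns) for those labels.
unique = df.loc[idx]
print(unique)
```

For "b", both rows have Total_time 5, and only the first of them is kept.
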
User contributions licensed under: CC BY-SA