How to improve the computation speed of subsetting a pandas dataframe?

I have a large df (14 columns × 1,000,000 rows) and I want to subset it. The calculation unsurprisingly takes a lot of time, and I wonder how to improve the speed.

What I want is, for each Name, the row with the lowest value of Total_time, ignoring zero values and keeping only the first row if more than one row shares that lowest Total_time. The results should then all be appended into a new dataframe unique.

Is there a general mistake in my code that makes it inefficient?

unique = pd.DataFrame([])
i = 0
for pair in df['Name'].unique():
    i = i + 1
    temp = df[df['Name'] == pair]
    temp2 = temp.loc[temp['Total_time'] != 0]  # note: was df['Total_time'], whose mask misaligns with temp
    lowest = temp2['Total_time'].min()
    temp3 = temp2[temp2['Total_time'] == lowest].head(1)
    unique = unique.append(temp3)
    print("finished " + pair + " " + str(i))


Answer

In general, you don’t want to iterate over each item.

If you just want each Name with its smallest time:

new_df = df[df["Total_time"] != 0].copy()  # you seem to be throwing away 0s
out = new_df.groupby("Name")["Total_time"].min()
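For example, on a small hypothetical frame with the question's `Name` and `Total_time` columns (sample values assumed), this yields a Series of per-group minima with the zeros excluded:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns.
df = pd.DataFrame({
    "Name": ["a", "a", "b", "b"],
    "Total_time": [0, 5, 3, 7],
})

new_df = df[df["Total_time"] != 0].copy()  # drop the zero rows first
out = new_df.groupby("Name")["Total_time"].min()
print(out)
# a -> 5 (the 0 was filtered out), b -> 3
```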

If you need the rest of the columns:

new_df.loc[new_df.groupby("Name")["Total_time"].idxmin()]
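Put together, a minimal runnable sketch (sample data assumed) that keeps the full row of the first minimum per Name, skipping zeros. `idxmin` returns the index label of the first occurrence of the minimum, which matches the question's "pick only the first one" requirement on ties:

```python
import pandas as pd

# Hypothetical data: "b" has a tied minimum (rows 1 and 3) to show
# that idxmin keeps the first occurrence.
df = pd.DataFrame({
    "Name":       ["a", "b", "a", "b", "a"],
    "Total_time": [0,   3,   5,   3,   9],
    "Other":      ["x1", "x2", "x3", "x4", "x5"],
})

new_df = df[df["Total_time"] != 0]
unique = new_df.loc[new_df.groupby("Name")["Total_time"].idxmin()]
print(unique)
# row 2 for "a" (its 0 is ignored), row 1 for "b" (first of the tie)
```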
User contributions licensed under: CC BY-SA