
Pandas: optimizing the performance of the loc function

I have a dataset with more than 50,000 rows. Some of the data is missing, and I'm using a nested loop with the loc function to fill in the missing values.

Dataset: [image]

What I'm doing is this: for the second row, I find the mean of all ratings for usa and the mean of all ratings for 1, add them, divide by 2, and use the result as the rating. In this case it will be 3.25. The code I've written:
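A nested loop with loc of the kind described above might look roughly like the following sketch. The column names `location`, `name` and `rating`, the sample values, and the helper `fill_with_loops` are all assumptions made for illustration; they are not taken from the original dataset or code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the second row (index 1) has a missing rating.
df = pd.DataFrame({
    "location": ["usa", "usa", "usa", "uk", "uk"],
    "name": [1, 1, 2, 1, 2],
    "rating": [3.0, np.nan, 4.0, 3.0, 5.0],
})

def fill_with_loops(frame):
    """Fill missing ratings with the average of the location mean and the name mean."""
    out = frame.copy()
    for loc_value in frame["location"].unique():
        for name_value in frame["name"].unique():
            # Each mean is recomputed with a boolean scan of the whole frame,
            # which is why this gets slow as the number of groups grows.
            loc_mean = frame.loc[frame["location"] == loc_value, "rating"].mean()
            name_mean = frame.loc[frame["name"] == name_value, "rating"].mean()
            mask = (
                (out["location"] == loc_value)
                & (out["name"] == name_value)
                & out["rating"].isna()
            )
            out.loc[mask, "rating"] = (loc_mean + name_mean) / 2
    return out

slow = fill_with_loops(df)
```

With this sample data the usa mean is 3.5 and the mean for name 1 is 3.0, so the missing rating becomes (3.5 + 3.0) / 2 = 3.25.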


Is there any way to optimize this? It takes a lot of time. I found a function called at, but it uses indexes, not conditions. I also looked at np.where, but I'm not sure how it would fit in this case (it either takes a lot of time or gives me an error).


Answer

Calculate the mean rating per location and per name separately first, then join both back to the original dataframe.
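A minimal sketch of that idea, assuming the columns are called `location`, `name` and `rating` (hypothetical names and sample data, since the real dataset is not shown). Each mean is computed once with groupby, joined back with merge, and used only where the rating is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the second row (index 1) has a missing rating.
df = pd.DataFrame({
    "location": ["usa", "usa", "usa", "uk", "uk"],
    "name": [1, 1, 2, 1, 2],
    "rating": [3.0, np.nan, 4.0, 3.0, 5.0],
})

# Mean rating per location and per name, computed once each (NaN is skipped).
loc_mean = df.groupby("location")["rating"].mean().rename("loc_mean").reset_index()
name_mean = df.groupby("name")["rating"].mean().rename("name_mean").reset_index()

# Left-join both lookup tables back onto the original rows.
merged = (
    df.merge(loc_mean, on="location", how="left")
      .merge(name_mean, on="name", how="left")
)

# Fill only the missing ratings with the average of the two means.
fallback = (merged["loc_mean"] + merged["name_mean"]) / 2
merged["rating"] = merged["rating"].fillna(fallback)

filled = merged.drop(columns=["loc_mean", "name_mean"])
```

For this sample the missing rating in the second row is filled with 3.25 and the other ratings are left unchanged. As for np.where from the question: it fits once the replacement values exist as a whole column, so `np.where(merged["rating"].isna(), fallback, merged["rating"])` could stand in for the `fillna` call here.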


User contributions licensed under: CC BY-SA