Pandas DataFrame: Fill NA values based on group mean

Question

I would like to fill the NA values of a Pandas DataFrame column with values taken from a groupby object. Let's illustrate with an example: we have a DataFrame whose columns include month, day, and temperature. We're simply measuring temperature multiple times a day for many months. Now, let's assume that for some of our records the temperature reading failed and we have an NA.

Accepted Answer

I'm not sure if this is the fastest approach, but instead of taking ~1 hour with apply, it takes ~20 seconds for more than 1M records. The code below has been updated to work on one or many columns.

    import pandas as pd

    local_avg_cols = ['temperature']  # can work with multiple columns

    # Create groupby's to get local averages
    local_averages = df.groupby(['month', 'day'])[local_avg_cols].mean()

    # Convert to DataFrame and prepare for merge
    local_averages = pd.DataFrame(local_averages, columns=local_avg_cols).reset_index()

    # Merge into original dataframe
    df = df.merge(local_averages, on=['month', 'day'], how='left', suffixes=('', '_avg'))

    # Now overwrite na values with values from new '_avg' col
    for col in local_avg_cols:
        df[col] = df[col].mask(df[col].isna(), df[col + '_avg'])

    # Drop new avg cols
    df = df.drop(columns=[col + '_avg' for col in local_avg_cols])

If anyone finds a more efficient way to do this (efficient in processing time, or just in readability), I'll unmark this answer and mark yours. Thank you!
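For comparison, here is a minimal sketch of a commonly used alternative that skips the merge: groupby(...).transform('mean') returns the group means aligned to the original rows, and fillna uses them only where a value is missing. It assumes the same month, day, and temperature columns as above. The sample data is made up purely for illustration, and this has not been timed against the merge version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the original question)
df = pd.DataFrame({
    "month": [1, 1, 1, 2, 2, 2],
    "day": [1, 1, 1, 1, 1, 1],
    "temperature": [10.0, np.nan, 14.0, 20.0, 22.0, np.nan],
})

local_avg_cols = ["temperature"]  # can hold several column names

# Per-group means, broadcast back to the shape and index of the original rows
group_means = df.groupby(["month", "day"])[local_avg_cols].transform("mean")

# Fill only the missing entries; existing readings are left untouched
df[local_avg_cols] = df[local_avg_cols].fillna(group_means)
```

Because transform keeps the original index, no temporary '_avg' columns have to be created and dropped. If every reading in a group is NA, the group mean is also NA and those rows stay unfilled.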

Answer