Replacing NaN values in timeseries Pandas dataframe with mean values

Question

I have a DataFrame with two columns, date and values. I want to replace the NaN values in it with mean values, but under a specific condition: each NaN should be replaced with the mean of the values from the same period (+/- 1 day) in the year that has values for that period. Value for 2021-02-04 should be: Because dates "2022-02-03",
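If the year is ignored and only month and day are compared (the reading the accepted answer adopts), the "+/- 1 day" condition comes down to building a set of month-day keys around each date. A minimal sketch, assuming pandas; `window_keys` is a hypothetical helper name, not a library function:

```python
import pandas as pd


def window_keys(date, days=1):
    # Hypothetical helper: "MM-DD" keys covering date +/- `days`,
    # with the year dropped so the window matches rows from any year.
    d = pd.Timestamp(date)
    rng = pd.date_range(d - pd.Timedelta(days=days), d + pd.Timedelta(days=days))
    return list(rng.strftime("%m-%d"))


print(window_keys("2021-02-04"))  # ['02-03', '02-04', '02-05']
```

Any row whose own "MM-DD" is in that list, regardless of its year, contributes to the mean.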

Accepted Answer

It's hard to say exactly what you want to do, but given your data:

```python
import pandas as pd
import numpy as np

dates = ["2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04", "2022-02-05", "2022-02-06",
         "2021-02-01", "2021-02-02", "2021-02-03", "2021-02-04", "2021-02-05", "2021-02-06"]
values = [3, 1, 6, 2, 5, 7, 3, None, 3, None, None, None]

df = pd.DataFrame({"date": dates,
                   "values": values
                   })
df = df.sort_values(by='date', ascending=False).reset_index(drop=True)
```

You can try something like this:

```python
def process_data(dates, values):
    new_values = np.copy(values)
    # Row positions that hold NaN and need filling
    indices = np.argwhere(np.isnan(values))
    # "MM-DD" key for every row, i.e. the date with the year stripped
    dates_without_year = ['-'.join(s.split('-')[1:]) for s in dates.astype(str)]
    for i, d in enumerate(dates):
        if i in indices:
            # All dates from d - 1 day to d + 1 day ...
            possible_dates = np.array(pd.Series(pd.date_range(d - pd.Timedelta(days=1), d + pd.Timedelta(days=1))), dtype='datetime64[D]')
            # ... reduced to "MM-DD" keys so that any year matches
            possible_dates = ['-'.join(s.split('-')[1:]) for s in possible_dates.astype(str)]
            # Original values of every row whose month-day falls in the window
            mean_values = values[np.argwhere(np.isin(dates_without_year, possible_dates))]
            # Average only the non-NaN values
            new_values[i] = np.mean(mean_values[~np.isnan(mean_values)])
    return new_values

df['values'] = process_data(np.array(df['date'].values, dtype='datetime64[D]'), df['values'].to_numpy())
```

```
          date    values
0   2022-02-06  7.000000
1   2022-02-05  5.000000
2   2022-02-04  2.000000
3   2022-02-03  6.000000
4   2022-02-02  1.000000
5   2022-02-01  3.000000
6   2021-02-06  6.000000
7   2021-02-05  4.666667
8   2021-02-04  4.000000
9   2021-02-03  3.000000
10  2021-02-02  3.200000
11  2021-02-01  3.000000
```

Take a close look at, for example, 2021-02-04, which had a NaN value. I disregard the years (as mentioned in the comments) and only look at the months and days, resulting in (6 + 2 + 5 + 3) / 4 = 4.0, since "2022-02-03", "2022-02-04", "2022-02-05", and "2021-02-03" have values of 6, 2, 5, and 3. The NaN entries for 2021-02-04 and 2021-02-05 fall in the same window but are skipped, and the means are always taken over the original values, so a filled value never feeds into another one.
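A pandas-only variant of the same idea, offered as a sketch rather than as the answer's method: it keys every row by its "MM-DD" string and, for each NaN row, averages the non-NaN values whose key falls in the +/- `days` window. `fill_with_window_mean` and its parameters are hypothetical names; it assumes the `date` column holds strings that `pd.to_datetime` can parse.

```python
import pandas as pd


def fill_with_window_mean(df, date_col="date", value_col="values", days=1):
    """Hypothetical helper: fill each NaN in `value_col` with the mean of the
    non-NaN values whose month-day lies within +/- `days` of that row's
    month-day, ignoring the year."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # "MM-DD" key for every row, so rows from different years can match
    keys = dates.dt.strftime("%m-%d")
    # Computed once, so only the original (unfilled) values are averaged
    known = out[value_col].notna()
    filled = out[value_col].copy()
    for idx in out.index[~known.to_numpy()]:
        d = dates.loc[idx]
        # Month-day keys for d - days ... d + days
        window = pd.date_range(d - pd.Timedelta(days=days),
                               d + pd.Timedelta(days=days)).strftime("%m-%d")
        # Mean of the known values inside the window (NaN if there are none)
        filled.loc[idx] = out.loc[keys.isin(window) & known, value_col].mean()
    out[value_col] = filled
    return out


# Usage with the answer's example data (unsorted; row order does not matter)
dates = ["2022-02-01", "2022-02-02", "2022-02-03", "2022-02-04", "2022-02-05", "2022-02-06",
         "2021-02-01", "2021-02-02", "2021-02-03", "2021-02-04", "2021-02-05", "2021-02-06"]
values = [3, 1, 6, 2, 5, 7, 3, None, 3, None, None, None]
df = pd.DataFrame({"date": dates, "values": values})
result = fill_with_window_mean(df)
print(result)
```

With this data it should give the same filled values as `process_data`, e.g. 4.0 for 2021-02-04 and 3.2 for 2021-02-02. Because the window is compared as month-day strings, a window that crosses New Year (12-31 to 01-01) still matches; if every value in a window is NaN, `Series.mean()` returns NaN and that row stays unfilled.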


Answer