Pandas DataFrame mean of data in columns occurring before certain date time

Question

I have a dataframe with client IDs and their expenses for 2014-2018. I want the mean of the expenses per ID, but only the years before a certain date may be taken into account when calculating the mean (so the 'Date' column dictates which columns can be taken into accou…

Accepted Answer

Solved: one possible answer to my own question.

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"ID":     [12, 96, 20, 73, 84, 26, 87, 64, 11, 34],
                       "y_2014": [100, 120, np.nan, 180, 110, 130, 170, 140, 80, 96],
                       "y_2015": [122, 159, 164, 421, 654, np.nan, 256, 754, 985, 65],
                       "y_2016": [324, 54, 687, 512, 913, 754, 843, 95, 184, 127],
                       "y_2017": [632, 452, 165, 184, 173, 124, 97, 101, 84, 130],
                       "y_2018": [np.nan, 541, 245, 953, 103, 207, 806, 541, 90, 421],
                       "Date": ['2016-03-08', '2015-04-09', '2016-02-15', '2018-05-01', '2017-08-04',
                                '2016-07-03', '2013-02-04', '2016-06-08', '2019-03-05', '2014-05-14']})

    # Subset from original df to calculate mean
    subset = df.loc[:, ['y_2014', 'y_2015', 'y_2016', 'y_2017', 'y_2018']]

    # An expense value is only available for the calculation of the mean when that year
    # has passed, therefore 2015-01-01 is chosen for the 'y_2014' column in the subset
    # etc., to check against the 'Date' column
    subset.columns = ['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01']

    # Compare every column label with every row's date (result has one row per ID)
    s = subset.columns.values < df.Date.values[:, None]
    t = s.astype(float)
    t[t == 0] = np.nan

    df['mean'] = (subset * t).mean(axis=1)
    print(df)

    # Additionally: gives the sum of expenses before the date in the 'Date' column
    df['sum'] = (subset * t).sum(axis=1)
    print(df)
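The same masking idea can also be written with real datetimes and `DataFrame.where` instead of multiplying by a NaN-filled float array. The following is only a sketch of that variant, not the answer's own method: it uses three rows taken from the data above, and the names `year_cols`, `cutoffs`, `dates`, `mask` and `eligible` are made up for illustration. It also assumes the year columns are always named `y_<year>`.

```python
import numpy as np
import pandas as pd

# Three rows taken from the answer's data, trimmed to three year columns.
df = pd.DataFrame({
    "ID": [12, 96, 87],
    "y_2014": [100, 120, 170],
    "y_2015": [122, 159, 256],
    "y_2016": [324, 54, 843],
    "Date": ["2016-03-08", "2015-04-09", "2013-02-04"],
})

year_cols = ["y_2014", "y_2015", "y_2016"]

# A year's expense counts only once that year has ended, so each column
# is mapped to January 1 of the following year (assumes 'y_<year>' names).
cutoffs = pd.to_datetime([f"{int(c[2:]) + 1}-01-01" for c in year_cols])
dates = pd.to_datetime(df["Date"])

# Boolean mask, one row per ID and one column per year:
# True where the year ended before that row's 'Date'.
mask = cutoffs.values[None, :] < dates.values[:, None]

# where() keeps values where the mask is True and puts NaN elsewhere.
eligible = df[year_cols].where(mask)

df["mean"] = eligible.mean(axis=1)             # NaN when no year qualifies
df["sum"] = eligible.sum(axis=1, min_count=1)  # NaN instead of 0 when no year qualifies

print(df)
```

For ID 12 (Date 2016-03-08) only 2014 and 2015 qualify, so the mean is (100 + 122) / 2 = 111. For ID 87 (Date 2013-02-04) no year qualifies, so the mean is NaN. Passing `min_count=1` to `sum` is a deliberate difference from the answer above, where such a row gets a sum of 0; drop the argument to reproduce that behaviour.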

Answer