I have two dataframes: df1
and df2
. I want to use aggregates to obtain the mean
and std
between the s_values in both dataframes and put those results in a new dataframe called new_df
in df1 =
statistics s_values year 1999 cigarette use 100 1999 cellphone use 310 1999 internet use 101 1999 alcohol use 100 1999 soda use 215
in df 2 =
statistics s_values year 1999 cigarette use 156 1999 cellphone use 198 1999 internet use 232 1999 alcohol use 243 1999 soda use 534
The result that I am trying to get would look something like this.
desired output new_df =
statistics difference mean std year 1999 cigarette use 56 .. .. 1999 cellphone use 112 .. .. 1999 internet use 78 .. .. 1999 alcohol use 143 .. .. 1999 soda use 319 .. ..
I have managed to build a dataframe with a column with the difference in values using the code
new_df = df1.assign(Value=(df1['s_values'] - df2['s_values].abs()) new_df.rename(columns={'s_values':'difference'}, inplace=True)
this gives me this output but I do not know how to add the columns for the aggregate mean and std
statistics difference year 1999 cigarette use 56 1999 cellphone use 112 1999 internet use 78 1999 alcohol use 143 1999 soda use 319
Any help is much appreciated
Advertisement
Answer
If i am understanding you right, you want to join the two dataframes and compute the mean and std dev
Can you try this?
df = df1.merge(df2, on= ['Year', 'statistics']) df['mean']=df[['difference_x', 'difference_y']].mean(axis=1) df['std'] = df[['difference_x', 'difference_y']].std(axis=1)
You could also try this if you want a groupby solution as mentioned in your comments
pd.concat([df1[['difference']], df2[['difference']]]).groupby(level=0).std()
pd.concat([df1[['difference']], df2[['difference']]]).groupby(level=0).mean()