I want to divide a dataframe column by a specific cell in the same dataframe.
I have a dataframe like this:
date type score
20201101 experiment1 30
20201101 experiment2 20
20201101 baseline 10
20201102 experiment1 60
20201102 experiment2 50
20201102 baseline 10
I want to compute the score_ratio by dividing the score by the ‘baseline’ score of that date.
date type score score_ratio
20201101 experiment1 30 3
20201101 experiment2 20 2
20201101 baseline 10 1
20201102 experiment1 60 6
20201102 experiment2 50 5
20201102 baseline 10 1
The score_ratio for (date, type) = (20201101, experiment1)
should be obtained by dividing its score by the score of (20201101, baseline)
. In this case, it should be 30 / 10 = 3. Similarly. for (20201101, experiment2)
, we should divide the score by the same thing, (20201101, baseline)
. For a different date, say (20201102, experiment1)
, it should be divided by the baseline of that date, (20201102, baseline)
.
How do I add this column with dataframe operations?
So far, I have this but am unsure of what expression I should be dividing by: df['score_ratio'] = df['score'].div(...)
Edit:
I get the error for the last line ValueError: Length of values does not match length of index
ID date type room score
0 id1 20201120 baseline 1 450.25
0 id2 20201120 experiment1 1 -3637.24
0 id3 20201121 baseline 1 200.00
1 id4 20201121 experiment1 1 300.00
2 id5 20201120 baseline 2 600.00
3 id6 20201120 experiment1 2 800.00
_df = df.set_index('date', 'room')
d = _df.query('type=="baseline"')
print(_df['score'].div(d['score']).values)
df['score_ratio'] = _df['score'].div(d['score']).values
Advertisement
Answer
#Mask all whose type is baseline into a new datframe and merge to the main df
g=pd.merge(df, df[df.type.eq('baseline')].drop(columns='type'),how='left', on='date', suffixes=('', '_right'))
#Calculate the score_ratio and drop the extra column acquired during merge
df=g.assign(score_ratio=g.score.div(g.score_right).astype(int)).drop(columns=['score_right'])
print(df)
date type score score_ratio
0 20201101 experiment1 30 3
1 20201101 experiment2 20 2
2 20201101 baseline 10 1
3 20201102 experiment1 60 6
4 20201102 experiment2 50 5
5 20201102 baseline 10 1
How it works
#New dataframe with baselines only
df1=df[df.type.eq('baseline')].drop(columns='type')
#Modified original dataframe with baselines added
g=pd.merge(df, df1,how='left', on='date', suffixes=('', '_right'))
#new column called score_ratio
g=g.assign(score_ratio=g.score.div(g.score_right).astype(int))
#drop column called score_right which was acquired during merge
g=g.drop(columns=['score_right'])