I have a following Pandas dataframe, where I want to change a value of ‘fmc’ column based on ‘time’, ‘samples’ and ‘uid’ columns.
Concept is as following:
For the same date
, if df.samples == 'C' & df.uid == 'Plot1'
, then corresponding row value of fmc * 0.4
similarly for the same date
, if df.samples == 'C' and df.uid == 'Plot2'
, then corresponding row value of fmc*0.8
For the same date
, if df.samples == 'E' & df.uid == 'Plot1'
, then corresponding row value of fmc * 0.4
similarly for the same date
, if df.samples == 'E' and df.uid == 'Plot2'
, then corresponding row value of fmc*0.15
For the same date
, if df.samples == 'ns' & df.uid == 'Plot1'
, then corresponding row value of fmc * 0.2
similarly for the same date
, if df.samples == 'ns' and df.uid == 'Plot2'
, then corresponding row value of fmc*0.05
I am new to python, so I apologize if I couldn’t explain well and please let me know if you need more clarification.
time samples uid fmc
0 2015-10-11 C Plot1 98.226352
1 2015-10-11 C Plot2 132.984817
2 2015-10-11 E Plot1 114.147964
3 2015-10-11 E Plot2 110.083699
4 2015-10-11 ns Plot1 113.258977
5 2015-10-11 ns Plot2 113.768023
6 2015-10-19 C Plot1 118.503214
7 2015-10-19 E Plot1 108.733209
8 2015-10-19 ns Plot1 59.316977
9 2015-10-27 C Plot1 104.977531
10 2015-10-27 C Plot2 121.213887
11 2015-10-27 E Plot1 129.575670
12 2015-10-27 E Plot2 118.639048
13 2015-10-27 ns Plot1 103.581065
14 2015-10-27 ns Plot2 102.278469
15 2015-11-17 C Plot1 103.820689
16 2015-11-17 C Plot2 117.333382
17 2015-11-17 E Plot1 143.418932
18 2015-11-17 E Plot2 160.342155
19 2015-11-17 ns Plot1 89.890484
Advertisement
Answer
This code:
import pandas as pd
data = [
['2015-10-11', 'C', 'Plot1', 98.226352 ],
['2015-10-11', 'C', 'Plot2', 132.984817 ],
['2015-10-11', 'E', 'Plot1', 114.147964 ],
['2015-10-11', 'E', 'Plot2', 110.083699 ],
['2015-10-11', 'ns', 'Plot1', 113.258977 ],
['2015-10-11', 'ns', 'Plot2', 113.768023 ],
['2015-10-19', 'C', 'Plot1', 118.503214 ],
['2015-10-19', 'E', 'Plot1', 108.733209 ],
['2015-10-19', 'ns', 'Plot1', 59.316977 ],
['2015-10-27', 'C', 'Plot1', 104.977531 ],
['2015-10-27', 'C', 'Plot2', 121.213887 ],
['2015-10-27', 'E', 'Plot1', 129.575670 ],
['2015-10-27', 'E', 'Plot2', 118.639048 ],
['2015-10-27', 'ns', 'Plot1', 103.581065 ],
['2015-10-27', 'ns', 'Plot2', 102.278469 ],
['2015-11-17', 'C', 'Plot1', 103.820689 ],
['2015-11-17', 'C', 'Plot2', 117.333382 ],
['2015-11-17', 'E', 'Plot1', 143.418932 ],
['2015-11-17', 'E', 'Plot2', 160.342155 ],
['2015-11-17', 'ns', 'Plot1', 89.890484]
]
df = pd.DataFrame(columns=['time', 'samples', 'uid', 'fmc'], data=data)
print (df.head(10))
df['result'] = df.apply(
lambda item:
(item.fmc * 0.4) if item.samples == 'C' and item.uid == 'Plot1' else
(item.fmc * 0.8) if item.samples == 'C' and item.uid == 'Plot2' else
(item.fmc * 0.4) if item.samples == 'E' and item.uid == 'Plot1' else
(item.fmc * 0.15)if item.samples == 'E' and item.uid == 'Plot2' else
(item.fmc * 0.2) if item.samples == 'ns'and item.uid == 'Plot1' else
(item.fmc * 0.05)if item.samples == 'ns'and item.uid == 'Plot2' else None,
axis=1
)
print(df.head(10))
Should produce this output:
time samples uid fmc
0 2015-10-11 C Plot1 98.226352
1 2015-10-11 C Plot2 132.984817
2 2015-10-11 E Plot1 114.147964
3 2015-10-11 E Plot2 110.083699
4 2015-10-11 ns Plot1 113.258977
5 2015-10-11 ns Plot2 113.768023
6 2015-10-19 C Plot1 118.503214
7 2015-10-19 E Plot1 108.733209
8 2015-10-19 ns Plot1 59.316977
9 2015-10-27 C Plot1 104.977531
time samples uid fmc result
0 2015-10-11 C Plot1 98.226352 39.290541
1 2015-10-11 C Plot2 132.984817 106.387854
2 2015-10-11 E Plot1 114.147964 45.659186
3 2015-10-11 E Plot2 110.083699 16.512555
4 2015-10-11 ns Plot1 113.258977 22.651795
5 2015-10-11 ns Plot2 113.768023 5.688401
6 2015-10-19 C Plot1 118.503214 47.401286
7 2015-10-19 E Plot1 108.733209 43.493284
8 2015-10-19 ns Plot1 59.316977 11.863395
9 2015-10-27 C Plot1 104.977531 41.991012
Process finished with exit code 0
Inspired by df.apply
, using axis=1
, and passing a lambda function containing full set of the conditions, you will have the expected values in result
column.
The apply
function will pass the dataframe’s columns (because axis=1
), to the lambda function as item
for each record in the series of values. The lambda function also, returns the corresponding result
value for each given record/item in the series, so we don’t need to worry about matching date/index values.
Reference for pandas.DataFrame.apply
here.