I have a following Pandas dataframe, where I want to change a value of ‘fmc’ column based on ‘time’, ‘samples’ and ‘uid’ columns.
Concept is as following:
For the same date
, if df.samples == 'C' & df.uid == 'Plot1'
, then corresponding row value of fmc * 0.4
similarly for the same date
, if df.samples == 'C' and df.uid == 'Plot2'
, then corresponding row value of fmc*0.8
For the same date
, if df.samples == 'E' & df.uid == 'Plot1'
, then corresponding row value of fmc * 0.4
similarly for the same date
, if df.samples == 'E' and df.uid == 'Plot2'
, then corresponding row value of fmc*0.15
For the same date
, if df.samples == 'ns' & df.uid == 'Plot1'
, then corresponding row value of fmc * 0.2
similarly for the same date
, if df.samples == 'ns' and df.uid == 'Plot2'
, then corresponding row value of fmc*0.05
I am new to python, so I apologize if I couldn’t explain well and please let me know if you need more clarification.
time samples uid fmc 0 2015-10-11 C Plot1 98.226352 1 2015-10-11 C Plot2 132.984817 2 2015-10-11 E Plot1 114.147964 3 2015-10-11 E Plot2 110.083699 4 2015-10-11 ns Plot1 113.258977 5 2015-10-11 ns Plot2 113.768023 6 2015-10-19 C Plot1 118.503214 7 2015-10-19 E Plot1 108.733209 8 2015-10-19 ns Plot1 59.316977 9 2015-10-27 C Plot1 104.977531 10 2015-10-27 C Plot2 121.213887 11 2015-10-27 E Plot1 129.575670 12 2015-10-27 E Plot2 118.639048 13 2015-10-27 ns Plot1 103.581065 14 2015-10-27 ns Plot2 102.278469 15 2015-11-17 C Plot1 103.820689 16 2015-11-17 C Plot2 117.333382 17 2015-11-17 E Plot1 143.418932 18 2015-11-17 E Plot2 160.342155 19 2015-11-17 ns Plot1 89.890484
Advertisement
Answer
This code:
import pandas as pd data = [ ['2015-10-11', 'C', 'Plot1', 98.226352 ], ['2015-10-11', 'C', 'Plot2', 132.984817 ], ['2015-10-11', 'E', 'Plot1', 114.147964 ], ['2015-10-11', 'E', 'Plot2', 110.083699 ], ['2015-10-11', 'ns', 'Plot1', 113.258977 ], ['2015-10-11', 'ns', 'Plot2', 113.768023 ], ['2015-10-19', 'C', 'Plot1', 118.503214 ], ['2015-10-19', 'E', 'Plot1', 108.733209 ], ['2015-10-19', 'ns', 'Plot1', 59.316977 ], ['2015-10-27', 'C', 'Plot1', 104.977531 ], ['2015-10-27', 'C', 'Plot2', 121.213887 ], ['2015-10-27', 'E', 'Plot1', 129.575670 ], ['2015-10-27', 'E', 'Plot2', 118.639048 ], ['2015-10-27', 'ns', 'Plot1', 103.581065 ], ['2015-10-27', 'ns', 'Plot2', 102.278469 ], ['2015-11-17', 'C', 'Plot1', 103.820689 ], ['2015-11-17', 'C', 'Plot2', 117.333382 ], ['2015-11-17', 'E', 'Plot1', 143.418932 ], ['2015-11-17', 'E', 'Plot2', 160.342155 ], ['2015-11-17', 'ns', 'Plot1', 89.890484] ] df = pd.DataFrame(columns=['time', 'samples', 'uid', 'fmc'], data=data) print (df.head(10)) df['result'] = df.apply( lambda item: (item.fmc * 0.4) if item.samples == 'C' and item.uid == 'Plot1' else (item.fmc * 0.8) if item.samples == 'C' and item.uid == 'Plot2' else (item.fmc * 0.4) if item.samples == 'E' and item.uid == 'Plot1' else (item.fmc * 0.15)if item.samples == 'E' and item.uid == 'Plot2' else (item.fmc * 0.2) if item.samples == 'ns'and item.uid == 'Plot1' else (item.fmc * 0.05)if item.samples == 'ns'and item.uid == 'Plot2' else None, axis=1 ) print(df.head(10))
Should produce this output:
time samples uid fmc 0 2015-10-11 C Plot1 98.226352 1 2015-10-11 C Plot2 132.984817 2 2015-10-11 E Plot1 114.147964 3 2015-10-11 E Plot2 110.083699 4 2015-10-11 ns Plot1 113.258977 5 2015-10-11 ns Plot2 113.768023 6 2015-10-19 C Plot1 118.503214 7 2015-10-19 E Plot1 108.733209 8 2015-10-19 ns Plot1 59.316977 9 2015-10-27 C Plot1 104.977531 time samples uid fmc result 0 2015-10-11 C Plot1 98.226352 39.290541 1 2015-10-11 C Plot2 132.984817 106.387854 2 2015-10-11 E Plot1 114.147964 45.659186 3 2015-10-11 E Plot2 110.083699 16.512555 4 2015-10-11 ns Plot1 113.258977 22.651795 5 2015-10-11 ns Plot2 113.768023 5.688401 6 2015-10-19 C Plot1 118.503214 47.401286 7 2015-10-19 E Plot1 108.733209 43.493284 8 2015-10-19 ns Plot1 59.316977 11.863395 9 2015-10-27 C Plot1 104.977531 41.991012 Process finished with exit code 0
Inspired by df.apply
, using axis=1
, and passing a lambda function containing full set of the conditions, you will have the expected values in result
column.
The apply
function will pass the dataframe’s columns (because axis=1
), to the lambda function as item
for each record in the series of values. The lambda function also, returns the corresponding result
value for each given record/item in the series, so we don’t need to worry about matching date/index values.
Reference for pandas.DataFrame.apply
here.