Skip to content
Advertisement

Changing values of one column based on the other three columns in pandas dataframe

I have a following Pandas dataframe, where I want to change a value of ‘fmc’ column based on ‘time’, ‘samples’ and ‘uid’ columns.

Concept is as following:

For the same date, if df.samples == 'C' & df.uid == 'Plot1', then corresponding row value of fmc * 0.4

similarly for the same date, if df.samples == 'C' and df.uid == 'Plot2', then corresponding row value of fmc*0.8

For the same date, if df.samples == 'E' & df.uid == 'Plot1', then corresponding row value of fmc * 0.4

similarly for the same date, if df.samples == 'E' and df.uid == 'Plot2', then corresponding row value of fmc*0.15

For the same date, if df.samples == 'ns' & df.uid == 'Plot1', then corresponding row value of fmc * 0.2

similarly for the same date, if df.samples == 'ns' and df.uid == 'Plot2', then corresponding row value of fmc*0.05

I am new to python, so I apologize if I couldn’t explain well and please let me know if you need more clarification.

       time        samples    uid                 fmc
0  2015-10-11        C       Plot1              98.226352
1  2015-10-11        C       Plot2             132.984817
2  2015-10-11        E       Plot1             114.147964
3  2015-10-11        E       Plot2             110.083699
4  2015-10-11        ns      Plot1             113.258977
5  2015-10-11        ns      Plot2             113.768023
6  2015-10-19        C       Plot1             118.503214
7  2015-10-19        E       Plot1             108.733209
8  2015-10-19        ns      Plot1              59.316977
9  2015-10-27        C       Plot1             104.977531
10 2015-10-27        C       Plot2             121.213887
11 2015-10-27        E       Plot1             129.575670
12 2015-10-27        E       Plot2             118.639048
13 2015-10-27        ns      Plot1             103.581065
14 2015-10-27        ns      Plot2             102.278469
15 2015-11-17        C       Plot1             103.820689
16 2015-11-17        C       Plot2             117.333382
17 2015-11-17        E       Plot1             143.418932
18 2015-11-17        E       Plot2             160.342155
19 2015-11-17        ns      Plot1              89.890484

Advertisement

Answer

This code:

import pandas as pd
data = [
    ['2015-10-11', 'C', 'Plot1',  98.226352 ],
    ['2015-10-11', 'C', 'Plot2', 132.984817 ],
    ['2015-10-11', 'E', 'Plot1', 114.147964 ],
    ['2015-10-11', 'E', 'Plot2', 110.083699 ],
    ['2015-10-11', 'ns', 'Plot1', 113.258977 ],
    ['2015-10-11', 'ns', 'Plot2', 113.768023 ],
    ['2015-10-19', 'C', 'Plot1', 118.503214 ],
    ['2015-10-19', 'E', 'Plot1', 108.733209 ],
    ['2015-10-19', 'ns', 'Plot1',  59.316977 ],
    ['2015-10-27', 'C', 'Plot1', 104.977531 ],
    ['2015-10-27', 'C', 'Plot2', 121.213887 ],
    ['2015-10-27', 'E', 'Plot1', 129.575670 ],
    ['2015-10-27', 'E', 'Plot2', 118.639048 ],
    ['2015-10-27', 'ns', 'Plot1', 103.581065 ],
    ['2015-10-27', 'ns', 'Plot2', 102.278469 ],
    ['2015-11-17', 'C', 'Plot1', 103.820689 ],
    ['2015-11-17', 'C', 'Plot2', 117.333382 ],
    ['2015-11-17', 'E', 'Plot1', 143.418932 ],
    ['2015-11-17', 'E', 'Plot2', 160.342155 ],
    ['2015-11-17', 'ns', 'Plot1',  89.890484]
]

df = pd.DataFrame(columns=['time', 'samples', 'uid', 'fmc'], data=data)

print (df.head(10))

df['result'] = df.apply(
                lambda item:
                   (item.fmc * 0.4) if item.samples == 'C' and item.uid == 'Plot1' else 
                   (item.fmc * 0.8) if item.samples == 'C' and item.uid == 'Plot2' else 
                   (item.fmc * 0.4) if item.samples == 'E' and item.uid == 'Plot1' else 
                   (item.fmc * 0.15)if item.samples == 'E' and item.uid == 'Plot2' else 
                   (item.fmc * 0.2) if item.samples == 'ns'and item.uid == 'Plot1' else 
                   (item.fmc * 0.05)if item.samples == 'ns'and item.uid == 'Plot2' else None,
                axis=1
            )

print(df.head(10))

Should produce this output:

         time samples    uid         fmc
0  2015-10-11       C  Plot1   98.226352
1  2015-10-11       C  Plot2  132.984817
2  2015-10-11       E  Plot1  114.147964
3  2015-10-11       E  Plot2  110.083699
4  2015-10-11      ns  Plot1  113.258977
5  2015-10-11      ns  Plot2  113.768023
6  2015-10-19       C  Plot1  118.503214
7  2015-10-19       E  Plot1  108.733209
8  2015-10-19      ns  Plot1   59.316977
9  2015-10-27       C  Plot1  104.977531
         time samples    uid         fmc      result
0  2015-10-11       C  Plot1   98.226352   39.290541
1  2015-10-11       C  Plot2  132.984817  106.387854
2  2015-10-11       E  Plot1  114.147964   45.659186
3  2015-10-11       E  Plot2  110.083699   16.512555
4  2015-10-11      ns  Plot1  113.258977   22.651795
5  2015-10-11      ns  Plot2  113.768023    5.688401
6  2015-10-19       C  Plot1  118.503214   47.401286
7  2015-10-19       E  Plot1  108.733209   43.493284
8  2015-10-19      ns  Plot1   59.316977   11.863395
9  2015-10-27       C  Plot1  104.977531   41.991012

Process finished with exit code 0

Inspired by df.apply, using axis=1, and passing a lambda function containing full set of the conditions, you will have the expected values in result column.

The apply function will pass the dataframe’s columns (because axis=1), to the lambda function as item for each record in the series of values. The lambda function also, returns the corresponding result value for each given record/item in the series, so we don’t need to worry about matching date/index values.

Reference for pandas.DataFrame.apply here.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement