Skip to content
Advertisement

Create a new column based on different columns

I have the below dataframe:

Col1 Col2 Col3 Col4 Col5  Col6  Col7  
1     A    T     1    AG   NBL   NH
2     A    T     1    NAG  BL    NH
3     A    M     2    NAG  NBL   HL 
4     NS   M     1    NAG  BL    NH
5     NS   T     1    NAG  NBL   HL 
6     A    M     2    NAG  NBL   HL

I want to create a new column based on this conditions:

IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'A1'

IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'A2'

IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'B1'

IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'B2'

IF Col7 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'H1'

IF Col6 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'H2'

I have many different conditions in the original df but I’m trying to create a way just to add this conditions. I’m working with phyton in jupyter

Advertisement

Answer

Since the logic here is pretty complicated, I would suggest putting your conditions inside a function, and using DataFrame.apply() to call that function for every row in your dataset. Here’s how I would translate your example into Pandas:

import pandas as pd

df = pd.read_csv("test.csv")


def classify(row):
    Col2 = row["Col2"]
    Col3 = row["Col3"]
    Col4 = row["Col4"]
    Col5 = row["Col5"]
    Col6 = row["Col6"]
    Col7 = row["Col7"]
    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'A1'

    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'A2'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'B1'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'B2'

    if Col7 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'H1'

    if Col6 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'H2'

    # No match
    return None


df["NewCol"] = df.apply(classify, axis=1)
print(df)

Note: I tried this function on your dataset, and I get the following result, which is probably not what you want:

   Col1 Col2 Col3  Col4 Col5 Col6 Col7 NewCol
0     1    A    T     1   AG  NBL   NH   None
1     2    A    T     1  NAG   BL   NH   None
2     3    A    M     2  NAG  NBL   HL   None
3     4   NS    M     1  NAG   BL   NH   None
4     5   NS    T     1  NAG  NBL   HL   None
5     6    A    M     2  NAG  NBL   HL   None

Inspecting the individual rows, they seem to be correct – none of them follow any of your rules. I’d suggest double checking that your rules are correct.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement