Skip to content
Advertisement

How to do line level logic in Pandas

I have a table that has a bunch of columns and I need to create a new column based on the row type but the logic will be different for each type of row.

My data looks like this:

type field1 field2 field3 field4
1 a b c 17
2 e f g 20
3 i j k 100

the logic for rows of type 1 is concatenating field1, field2, field3

the logic for rows of type 2 is concatenating field2, field3, field4

the logic for rows of type 3 is squaring field4

The super important part I would like to avoid coding each type manually as there are hundreds of different types each with their own distinct logic that will change constantly. We enforce strict SDLC so deploying updates would be a nightmare. Ideally I would put this logic into a SQL table somewhere and then just somehow use the data in my pandas logic but I don’t know how to do that sort of thing.

Example:

data = pd.read_sql(query) #above data
rules = pd.read_sql(query)
rules.head()
Type Rule
1 field1+field2+field3
2 field2+field3+field4
3 field4**2
for i in rules:
    data['output'] = data[filtered to i.type].apply(i.typeLogic)

data.head()
output
abc
fg20
10000

Advertisement

Answer

Merge the rules to the data and use eval method to evaluate rules according to type

# data
df = pd.DataFrame({'type': [1, 2, 3],
                   'field1': ['a', 'e', 'i'],
                   'field2': ['b', 'f', 'j'],
                   'field3': ['c', 'g', 'k'],
                   'field4': [17, 20, 100]})
# rules df
rules = pd.DataFrame({'type': [1, 2, 3],
                      'rule': ['field1+field2+field3', 'field2+field3+field4', 'field4**2']})

# merge the dfs to be able to do a rules lookup later
df = df.merge(rules, on='type')

# create a list in a loop
lst = []
for _, d in df.groupby("type"):
    # get the field columns
    f_cols = [c for c in d.columns if 'field' in c]
    # get the rule 
    r = d.rule.iat[0]
    # rules with + concatenates strings and ints, so convert such rows to string dtype
    if '+' in r:
        d[f_cols] = d[f_cols].astype(str)
    # evaluate the rule
    d['new'] = d[f_cols].eval(f"{r}", engine='python')
    # append to lst
    lst.append(d)
# concatenate all dfs in lst into a single df
res = pd.concat(lst)
res

enter image description here

Let me know if you have any questions.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement