I have a table with many columns, and I need to create a new column based on the row type, but the logic differs for each type of row.
My data looks like this:
type | field1 | field2 | field3 | field4 |
---|---|---|---|---|
1 | a | b | c | 17 |
2 | e | f | g | 20 |
3 | i | j | k | 100 |
The logic for rows of type 1 is concatenating field1, field2, field3.
The logic for rows of type 2 is concatenating field2, field3, field4.
The logic for rows of type 3 is squaring field4.
The super important part: I would like to avoid coding each type manually, as there are hundreds of different types, each with its own distinct logic that changes constantly. We enforce a strict SDLC, so deploying code updates would be a nightmare. Ideally I would put this logic into a SQL table somewhere and then somehow use that data in my pandas logic, but I don't know how to do that sort of thing.
Example:
```python
data = pd.read_sql(query)   # the data above
rules = pd.read_sql(query)  # the rules table
rules.head()
```
Type | Rule |
---|---|
1 | field1+field2+field3 |
2 | field2+field3+field4 |
3 | field4**2 |
```python
# pseudocode
for i in rules:
    data['output'] = data[filtered to i.type].apply(i.typeLogic)
data.head()
```
output |
---|
abc |
fg20 |
10000 |
Answer
Merge the rules into the data and use the `eval` method to evaluate each rule according to its type.
# data df = pd.DataFrame({'type': [1, 2, 3], 'field1': ['a', 'e', 'i'], 'field2': ['b', 'f', 'j'], 'field3': ['c', 'g', 'k'], 'field4': [17, 20, 100]}) # rules df rules = pd.DataFrame({'type': [1, 2, 3], 'rule': ['field1+field2+field3', 'field2+field3+field4', 'field4**2']}) # merge the dfs to be able to do a rules lookup later df = df.merge(rules, on='type') # create a list in a loop lst = [] for _, d in df.groupby("type"): # get the field columns f_cols = [c for c in d.columns if 'field' in c] # get the rule r = d.rule.iat[0] # rules with + concatenates strings and ints, so convert such rows to string dtype if '+' in r: d[f_cols] = d[f_cols].astype(str) # evaluate the rule d['new'] = d[f_cols].eval(f"{r}", engine='python') # append to lst lst.append(d) # concatenate all dfs in lst into a single df res = pd.concat(lst) res
Let me know if you have any questions.