Factor out the name of the DataFrame in Python pandas to get easier-to-read mathematical expressions




If you do mathematical operations with the columns of a pandas DataFrame (call it data), you have to write data every time you access a column. That is very annoying if you want easy-to-read mathematical formulas. So I am looking for a way to "factor out" the data variable name. Consider this simple example:

import pandas as pd
from numpy import sqrt

k = 3
data = pd.read_csv('data.dat',sep=',')

data['a4'] = data.a1 + data.a2
data['a5'] = sqrt(data.a3)*k

## Imagine much more complex mathematical operations


## Instead, I want something like this pseudocode:

## cd data
## a4 = a1 + a2
## a5 = sqrt(a3)*k
## end cd data

where data.dat contains:

a1,a2,a3
1,2,3
4,5,6
7,8,9

Answer

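You can use pandas.DataFrame.eval. Inside the expression string, columns are referenced by their bare names, and local Python variables are referenced with an @ prefix: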

>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9

>>> k = 3

>>> df = df.eval('a4 = a1 + a2')

>>> df = df.eval('a5 = a3**2 * @k')

>>> df
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
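The example above squares a3, whereas the question asks for sqrt(a3)*k. The pandas documentation lists sqrt among the math functions that eval expressions support, so the question's formula can be written almost verbatim. A self-contained sketch, assuming the contents of data.dat are inlined through io.StringIO instead of being read from disk:

```python
import io

import pandas as pd

# Contents of data.dat from the question, inlined so the sketch runs on its own
csv_text = """a1,a2,a3
1,2,3
4,5,6
7,8,9
"""
data = pd.read_csv(io.StringIO(csv_text))

k = 3

# Column names are used directly; @k refers to the local Python variable k
data = data.eval("a4 = a1 + a2")
# sqrt is one of the math functions eval understands inside the expression
data = data.eval("a5 = sqrt(a3) * @k")

print(data)
```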

If you want to do everything in one eval call, you can pass a multi-line string with one assignment per line:

>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9

>>> k = 3

>>> df.eval('''
     a4 = a1 + a2
     a5 = a3**2 * @k
   ''')
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
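Without inplace=True, df.eval returns a new DataFrame and leaves df untouched, so the multi-line call above only displays the result. A minimal sketch of both options, assuming the same three-column frame is built inline:

```python
import pandas as pd

df = pd.DataFrame({"a1": [1, 4, 7], "a2": [2, 5, 8], "a3": [3, 6, 9]})
k = 3

formulas = """
    a4 = a1 + a2
    a5 = a3**2 * @k
"""

# Default (inplace=False): eval returns a new DataFrame, df is not modified
result = df.eval(formulas)
print(list(df.columns))      # ['a1', 'a2', 'a3']
print(list(result.columns))  # ['a1', 'a2', 'a3', 'a4', 'a5']

# inplace=True adds the columns to df itself (eval then returns None)
df.eval(formulas, inplace=True)
print(list(df.columns))      # ['a1', 'a2', 'a3', 'a4', 'a5']
```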

# Alternatively, you can store the expression in a string first and then pass that string:
>>> expr = '''
     a4 = a1 + a2
     a5 = a3**2 * @k
   '''
>>> df.eval(expr)
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
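Keeping the formulas in a string also makes them reusable across DataFrames. The sketch below wraps them in a hypothetical helper, add_derived_columns (not part of pandas). Because @k is resolved in the scope that calls eval, the function's own k argument is picked up:

```python
import pandas as pd

# The formulas live in one place and can be applied to any frame with a1..a3
FORMULAS = """
    a4 = a1 + a2
    a5 = a3**2 * @k
"""


def add_derived_columns(frame: pd.DataFrame, k: float) -> pd.DataFrame:
    # Hypothetical helper: @k is looked up in this function's locals,
    # i.e. the k argument; returns a new DataFrame, frame is unchanged
    return frame.eval(FORMULAS)


df1 = pd.DataFrame({"a1": [1, 4, 7], "a2": [2, 5, 8], "a3": [3, 6, 9]})
df2 = pd.DataFrame({"a1": [10], "a2": [20], "a3": [2]})

print(add_derived_columns(df1, k=3))
print(add_derived_columns(df2, k=0.5))
```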


Source: stackoverflow