Factor out the DataFrame name in Python pandas to get more readable mathematical expressions

When you do mathematical operations on the columns of a pandas DataFrame (call it data), you have to write data repeatedly to access the columns, which is annoying if you want formulas that are easy to read. So I am looking for a way to "factor out" the data prefix. Consider this simple example:

import pandas as pd
from numpy import sqrt

k = 3
data = pd.read_csv('data.dat',sep=',')

data['a4'] = data.a1 + data.a2
data['a5'] = sqrt(data.a3)*k

## Imagine much more complex mathematical operations


## instead of this I want something like this pseudocode:

## cd data
## a4 = a1 + a2
## a5 = sqrt(a3)*k
## end cd data

Here data.dat contains:

a1,a2,a3
1,2,3
4,5,6
7,8,9

Answer

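You can use pandas.DataFrame.eval. Inside the expression string, column names are written bare, and local Python variables are referenced with an @ prefix: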

>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9

>>> k = 3

>>> df = df.eval('a4 = a1 + a2')

>>> df = df.eval('a5 = a3**2 * @k')

>>> df

   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
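
The question's second formula takes a square root rather than a square. As a minimal sketch, assuming the `sqrt` function that `eval` expressions accept and with the sample data built inline instead of read from data.dat, the original formulas could be written like this:

```python
import pandas as pd

# Sample data from the question, built inline instead of read from data.dat
df = pd.DataFrame({"a1": [1, 4, 7], "a2": [2, 5, 8], "a3": [3, 6, 9]})
k = 3

# Column names are used bare; @k refers to the local Python variable k
df = df.eval("a4 = a1 + a2")
df = df.eval("a5 = sqrt(a3) * @k")

print(df)
```

Each call returns a new DataFrame, which is why the result is assigned back to `df`.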

If you want to evaluate all the assignments in a single call, pass a multi-line string with one assignment per line:

>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9

>>> k = 3

>>> df.eval('''
     a4 = a1 + a2
     a5 = a3**2 * @k
   ''')
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243

# Alternatively, store the expression in a string and pass that string:
>>> expr = '''
     a4 = a1 + a2
     a5 = a3**2 * @k
   '''
>>> df.eval(expr)
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
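
Note that the multi-line calls above only display the returned DataFrame; `df` itself is unchanged unless the result is assigned. A sketch of one way to keep the new columns, assuming you prefer to modify `df` directly, is to pass `inplace=True`:

```python
import pandas as pd

# Sample data from the question, built inline instead of read from data.dat
df = pd.DataFrame({"a1": [1, 4, 7], "a2": [2, 5, 8], "a3": [3, 6, 9]})
k = 3

expr = """
a4 = a1 + a2
a5 = a3**2 * @k
"""

# With inplace=True the columns are added to df itself and eval returns None
result = df.eval(expr, inplace=True)

print(df)
```

Assigning the result instead (`df = df.eval(expr)`) has the same effect on the columns and leaves the original object untouched.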
User contributions licensed under: CC BY-SA