When you do mathematical operations on the columns of a pandas DataFrame (call it data), you have to write data over and over to access the columns, which is annoying if you want mathematical formulas that are easy to read. So I am looking for a way to “factor out” the data prefix. Consider this simple example:
import pandas as pd
from numpy import sqrt
k = 3
data = pd.read_csv('data.dat',sep=',')
data['a4'] = data.a1 + data.a2
data['a5'] = sqrt(data.a3)*k
## Imagine much more complex mathematical operations
## instead of this I want something like this pseudocode:
## cd data
## a4 = a1 + a2
## a5 = sqrt(a3)*k
## end cd data
where data.dat is:
a1,a2,a3
1,2,3
4,5,6
7,8,9
Answer
You can use pandas.DataFrame.eval:
>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9
>>> k = 3
>>> df = df.eval('a4 = a1 + a2')
>>> df = df.eval('a5 = a3**2 * @k')
>>> df
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
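The reassignment df = df.eval(...) is needed because eval returns a new DataFrame by default. If you would rather modify the frame directly, eval also accepts inplace=True. A minimal sketch, assuming the same three-column frame built inline rather than read from a file:

```python
import pandas as pd

# Same data as in the transcript above, built inline for illustration.
df = pd.DataFrame({"a1": [1, 4, 7], "a2": [2, 5, 8], "a3": [3, 6, 9]})
k = 3  # local variable, referenced inside the expression as @k

# inplace=True adds the new column to df itself, so no reassignment is needed.
df.eval("a4 = a1 + a2", inplace=True)
df.eval("a5 = a3**2 * @k", inplace=True)

print(df)
```

Which form to use is mostly a matter of taste; the reassigning form chains more easily, while the in-place form avoids repeating df on the left-hand side.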
If you want to do all the assignments in a single call, you can pass a multi-line string:
>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9
>>> k = 3
>>> df.eval('''
a4 = a1 + a2
a5 = a3**2 * @k
''')
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
Alternatively, you can store the expression in a string first and then pass that string to eval:
>>> expr = '''
a4 = a1 + a2
a5 = a3**2 * @k
'''
>>> df.eval(expr)
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
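The examples above use a3**2, whereas the question used sqrt(a3)*k. As far as I know, eval also understands a fixed set of math functions by name, sqrt among them, so the original formulas can be written almost verbatim. A sketch under that assumption, with the contents of data.dat inlined through io.StringIO so that it runs without the file:

```python
import io

import numpy as np
import pandas as pd

# Inline copy of the question's data.dat so the sketch is self-contained.
csv_text = "a1,a2,a3\n1,2,3\n4,5,6\n7,8,9\n"
data = pd.read_csv(io.StringIO(csv_text))

k = 3  # local variable, referenced inside the expression as @k

# Column names are used bare; sqrt is resolved by eval itself.
data = data.eval("""
a4 = a1 + a2
a5 = sqrt(a3) * @k
""")

# Cross-check against the explicit version from the question.
expected_a5 = np.sqrt(data["a3"]) * k
print(data)
print(np.allclose(data["a5"], expected_a5))
```

Functions outside that built-in set are not available inside the expression string, so for those you would have to fall back to the explicit data.column form.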