If you do mathematical operations with the columns of a pandas DataFrame (call it `data`), you repeatedly have to write `data.` to access the columns. That gets very annoying when you want readable mathematical formulas. So I am looking for a way to "factor out" the `data` keyword. Consider this simple example:
```python
import pandas as pd
from numpy import *

k = 3
data = pd.read_csv('data.dat', sep=',')
data['a4'] = data.a1 + data.a2
data['a5'] = sqrt(data.a3)*k
## Imagine much more complex mathematical operations

## instead of this I want something like this pseudocode:
## cd data
## a4 = a1 + a2
## a5 = sqrt(a3)*k
## end cd data
```
where `data.dat` is:

```
a1,a2,a3
1,2,3
4,5,6
7,8,9
```
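If you want to run this without creating `data.dat`, here is a self-contained version of the same example. It reads the sample data from an in-memory string. The `io.StringIO` wrapper and the `csv_text` name are additions for illustration and were not part of the original code:

```python
import io

import pandas as pd
from numpy import sqrt

# Contents of data.dat, inlined so no file on disk is needed (illustrative only).
csv_text = """a1,a2,a3
1,2,3
4,5,6
7,8,9
"""

k = 3
data = pd.read_csv(io.StringIO(csv_text), sep=",")

# The verbose style from the question: every column needs the `data.` prefix.
data["a4"] = data.a1 + data.a2
data["a5"] = sqrt(data.a3) * k
print(data)
```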
Answer
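You can use `pandas.DataFrame.eval`. Inside the expression string, column names are used directly, and a local Python variable is referenced with an `@` prefix (here `@k`):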
```python
>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9
>>> k = 3
>>> df = df.eval('a4 = a1 + a2')
>>> df = df.eval('a5 = a3**2 * @k')
>>> df
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
```
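The question's formula used `sqrt(a3)` rather than `a3**2`. Below is a sketch of that exact formula, assuming a pandas version whose `eval` supports the built-in math functions such as `sqrt` (recent versions do, with either the `numexpr` or the `python` engine). The DataFrame is built inline here instead of being read from `data.dat`:

```python
import pandas as pd

# Same values as data.dat, built inline for a self-contained example.
df = pd.DataFrame({"a1": [1, 4, 7], "a2": [2, 5, 8], "a3": [3, 6, 9]})
k = 3

# Column names are used bare; `sqrt` is one of the math functions eval
# understands, and `@k` pulls in the local Python variable k.
df = df.eval("a4 = a1 + a2")
df = df.eval("a5 = sqrt(a3) * @k")
print(df)
```

This reads almost exactly like the `cd data` pseudocode from the question. Only the `@` in front of non-column variables remains.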
If you want to do everything in a single call, pass a multi-line expression with one assignment per line:
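```python
>>> df
   a1  a2  a3
0   1   2   3
1   4   5   6
2   7   8   9
>>> k = 3
>>> df.eval('''
... a4 = a1 + a2
... a5 = a3**2 * @k
... ''')
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243

# Alternatively, store the expression in a string and pass the string:
>>> expr = '''
... a4 = a1 + a2
... a5 = a3**2 * @k
... '''
>>> df.eval(expr)
   a1  a2  a3  a4   a5
0   1   2   3   3   27
1   4   5   6   9  108
2   7   8   9  15  243
```

Note that in these two calls `df.eval` returns a new DataFrame and leaves `df` unchanged. To keep the new columns, assign the result back (`df = df.eval(expr)`) or pass `inplace=True`.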
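For longer formula sets, later lines of a multi-line expression can use columns created by earlier lines, and `inplace=True` writes the results straight into the DataFrame. The sketch below shows both. The column `a6` is a hypothetical extra step added only to demonstrate chaining, and the DataFrame is built inline rather than read from `data.dat`:

```python
import pandas as pd

df = pd.DataFrame({"a1": [1, 4, 7], "a2": [2, 5, 8], "a3": [3, 6, 9]})
k = 3

# One assignment per line; a6 (hypothetical) reuses a4 and a5 from the
# lines above it.
expr = """
a4 = a1 + a2
a5 = a3**2 * @k
a6 = a4 + a5
"""

# inplace=True adds the columns to df itself (eval then returns None).
df.eval(expr, inplace=True)
print(df)
```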