I have a dataframe with a timeindex and 3 columns containing the coordinates of a 3D vector:
x y z ts 2014-05-15 10:38 0.120117 0.987305 0.116211 2014-05-15 10:39 0.117188 0.984375 0.122070 2014-05-15 10:40 0.119141 0.987305 0.119141 2014-05-15 10:41 0.116211 0.984375 0.120117 2014-05-15 10:42 0.119141 0.983398 0.118164
I would like to apply a transformation to each row that also returns a vector
def myfunc(a, b, c): do something return e, f, g
but if I do:
df.apply(myfunc, axis=1)
I end up with a Pandas series whose elements are tuples. This is beacause apply will take the result of myfunc without unpacking it. How can I change myfunc so that I obtain a new df with 3 columns?
Edit:
All solutions below work. The Series solution does allow for column names, the List solution seem to execute faster.
def myfunc1(args): e=args[0] + 2*args[1] f=args[1]*args[2] +1 g=args[2] + args[0] * args[1] return pd.Series([e,f,g], index=['a', 'b', 'c']) def myfunc2(args): e=args[0] + 2*args[1] f=args[1]*args[2] +1 g=args[2] + args[0] * args[1] return [e,f,g] %timeit df.apply(myfunc1 ,axis=1) 100 loops, best of 3: 4.51 ms per loop %timeit df.apply(myfunc2 ,axis=1) 100 loops, best of 3: 2.75 ms per loop
Advertisement
Answer
Just return a list instead of tuple.
In [81]: df Out[81]: x y z ts 2014-05-15 10:38:00 0.120117 0.987305 0.116211 2014-05-15 10:39:00 0.117188 0.984375 0.122070 2014-05-15 10:40:00 0.119141 0.987305 0.119141 2014-05-15 10:41:00 0.116211 0.984375 0.120117 2014-05-15 10:42:00 0.119141 0.983398 0.118164 [5 rows x 3 columns] In [82]: def myfunc(args): ....: e=args[0] + 2*args[1] ....: f=args[1]*args[2] +1 ....: g=args[2] + args[0] * args[1] ....: return [e,f,g] ....: In [83]: df.apply(myfunc ,axis=1) Out[83]: x y z ts 2014-05-15 10:38:00 2.094727 1.114736 0.234803 2014-05-15 10:39:00 2.085938 1.120163 0.237427 2014-05-15 10:40:00 2.093751 1.117629 0.236770 2014-05-15 10:41:00 2.084961 1.118240 0.234512 2014-05-15 10:42:00 2.085937 1.116202 0.235327