Sometimes I end up with a series of tuples/lists when using Pandas. This is common when, for example, doing a group-by and passing a function that has multiple return values:
```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame(dict(x=np.random.randn(100),
                       y=np.repeat(list("abcd"), 25)))
out = df.groupby("y").x.apply(stats.ttest_1samp, 0)
print(out)
```
```
y
a    (1.3066417476, 0.203717485506)
b    (0.0801133382517, 0.936811414675)
c    (1.55784329113, 0.132360504653)
d    (0.267999459642, 0.790989680709)
dtype: object
```
What is the correct way to “unpack” this structure so that I get a DataFrame with two columns?
A related question is how I can unpack either this structure or the resulting dataframe into two Series/array objects. This almost works:
```python
t, p = zip(*out)
```
but then `t` is
```
(array(1.3066417475999257),
 array(0.08011333825171714),
 array(1.557843291126335),
 array(0.267999459641651))
```
and one needs to take the extra step of squeezing it.
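The squeezing step can be a single `np.array(..., dtype=float)` call, because NumPy stacks a sequence of 0-d arrays into a 1-D array. A minimal sketch, assuming `out` holds tuples of 0-d arrays as in the printout above; the hand-made Series below is a hypothetical stand-in with values rounded from the question's output:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `out`: (statistic, pvalue) tuples of 0-d arrays,
# values rounded from the question's output.
out = pd.Series(
    [(np.array(1.31), np.array(0.20)),
     (np.array(0.08), np.array(0.94)),
     (np.array(1.56), np.array(0.13)),
     (np.array(0.27), np.array(0.79))],
    index=pd.Index(list("abcd"), name="y"),
)

t, p = zip(*out)                # two tuples of 0-d arrays
t = np.array(t, dtype=float)    # stack the 0-d arrays into a 1-D float array
p = np.array(p, dtype=float)
```

This gives plain arrays; wrapping them as `pd.Series(t, index=out.index)` keeps the group labels if you need them.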
Answer
Maybe this is the most straightforward (most Pythonic, I guess) way:
```python
out = out.apply(pd.Series)
```
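To see what `apply(pd.Series)` produces, here is a small sketch on a hand-made Series (a hypothetical stand-in for the t-test output, values rounded from the question). Each tuple becomes one row and each tuple position becomes a column labelled `0`, `1`. The second construction, `pd.DataFrame(out.tolist(), index=out.index)`, is an alternative not taken from the answer; it builds the frame in one call instead of creating a Series per row, which tends to be faster on large inputs:

```python
import pandas as pd

# Hypothetical stand-in for `out`: a Series of (statistic, pvalue) tuples.
out = pd.Series(
    [(1.31, 0.20), (0.08, 0.94), (1.56, 0.13), (0.27, 0.79)],
    index=pd.Index(list("abcd"), name="y"),
)

# The answer's approach: every tuple is turned into a Series, i.e. one row.
wide = out.apply(pd.Series)

# Alternative: build the DataFrame directly from the list of tuples,
# reusing the original index so the group labels are kept.
wide_fast = pd.DataFrame(out.tolist(), index=out.index)

print(wide)
```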
If you want to rename the columns to something more meaningful, then:
```python
out.columns = ['Kstats', 'Pvalue']
```
If you don't want the default index name (`y`, inherited from the group-by key):
```python
out.index.name = None
```
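Putting the answer's three steps together, and covering the related question of getting two separate Series/arrays, a sketch might look like the following. The input Series is a hypothetical stand-in with values rounded from the question's output; selecting the two columns and calling `.to_numpy()` is one straightforward way to split the result, not something the answer itself shows:

```python
import pandas as pd

# Hypothetical stand-in for `out`: a Series of (statistic, pvalue) tuples.
out = pd.Series(
    [(1.31, 0.20), (0.08, 0.94), (1.56, 0.13), (0.27, 0.79)],
    index=pd.Index(list("abcd"), name="y"),
)

res = out.apply(pd.Series)            # tuples -> two columns (0, 1)
res.columns = ["Kstats", "Pvalue"]    # meaningful column names
res.index.name = None                 # drop the "y" index name

# Two Series sharing the group labels as their index...
t, p = res["Kstats"], res["Pvalue"]
# ...or plain 1-D NumPy arrays, with no squeezing step needed.
t_arr, p_arr = t.to_numpy(), p.to_numpy()
```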