I need to generate a pd.DataFrame with columns being composed by a list and a Multiindex object, and I need to do it before filling the final dataframe with data.
Say the columns are ['one', 'two'] and the multiindex obtained from from_product:
import pandas as pd col_21 = ['day', 'month'] col_22 = ['a', 'b'] mult_2 = pd.MultiIndex.from_product([ col_21, col_22 ])
I would like to get a list of columns which looks like this:
'one' | 'two' | ('day','a') | ('day','b') | ('month','a') | ('month','b')
One possible solution would be to use two different and separate Multiindex, one with a dummy column, both generate by from_product
col_11 = ['one', 'two'] col_12 = [''] col_21 = ['day', 'month'] col_22 = ['a', 'b'] mult_1 = pd.MultiIndex.from_product([ col_11, col_12 ]) mult_2 = pd.MultiIndex.from_product([ col_21, col_22 ])
How could I get to this?
(one, '') | (two, '') | ('day','a') | ('day','b') | ('month','a') | ('month','b')
I have tried several trivial solutions, but each gave me a different error or a wrong result
mult_1+mult_2 #TypeError: cannot perform __add__ with this index type: MultiIndex
pd.merge #TypeError: Can only merge Series or DataFrame objects, a <class 'list'> was passed
pd.MultiIndex.from_arrays([ mult_1, mult_2 ]) #NotImplementedError: isna is not defined for MultiIndex
Thank you for your advices
Advertisement
Answer
If possible, the best is create MultiIndex in index by columns one, two and then MultiIndex in columns by pairs – so not mixed non multiindex with multindex values:
col_21 = ['day', 'month']
col_22 = ['a', 'b']
mult_2 = pd.MultiIndex.from_product([ col_21, col_22 ])
one = range(5)
two = list('ABCDE')
mult_3 = pd.MultiIndex.from_arrays([ one, two], names=['one','two'])
df = pd.DataFrame(0, columns=mult_2, index=mult_3)
print (df)
day month
a b a b
one two
0 A 0 0 0 0
1 B 0 0 0 0
2 C 0 0 0 0
3 D 0 0 0 0
4 E 0 0 0 0
Use Index.append:
print (mult_1.append(mult_2))
MultiIndex([( 'one', ''),
( 'two', ''),
( 'day', 'a'),
( 'day', 'b'),
('month', 'a'),
('month', 'b')],
)
Or Index.union with sort=False:
print (mult_1.union(mult_2, sort=False))
MultiIndex([( 'one', ''),
( 'two', ''),
( 'day', 'a'),
( 'day', 'b'),
('month', 'a'),
('month', 'b')],
)