I need to generate a pd.DataFrame with columns being composed by a list and a Multiindex object, and I need to do it before filling the final dataframe with data.
Say the columns are ['one', 'two']
and the multiindex obtained from from_product
:
import pandas as pd col_21 = ['day', 'month'] col_22 = ['a', 'b'] mult_2 = pd.MultiIndex.from_product([ col_21, col_22 ])
I would like to get a list of columns which looks like this:
'one' | 'two' | ('day','a') | ('day','b') | ('month','a') | ('month','b')
One possible solution would be to use two different and separate Multiindex, one with a dummy column, both generate by from_product
col_11 = ['one', 'two'] col_12 = [''] col_21 = ['day', 'month'] col_22 = ['a', 'b'] mult_1 = pd.MultiIndex.from_product([ col_11, col_12 ]) mult_2 = pd.MultiIndex.from_product([ col_21, col_22 ])
How could I get to this?
(one, '') | (two, '') | ('day','a') | ('day','b') | ('month','a') | ('month','b')
I have tried several trivial solutions, but each gave me a different error or a wrong result
mult_1+mult_2 #TypeError: cannot perform __add__ with this index type: MultiIndex
pd.merge #TypeError: Can only merge Series or DataFrame objects, a <class 'list'> was passed
pd.MultiIndex.from_arrays([ mult_1, mult_2 ]) #NotImplementedError: isna is not defined for MultiIndex
Thank you for your advices
Advertisement
Answer
If possible, the best is create MultiIndex in index
by columns one
, two
and then MultiIndex in columns
by pairs – so not mixed non multiindex with multindex values:
col_21 = ['day', 'month'] col_22 = ['a', 'b'] mult_2 = pd.MultiIndex.from_product([ col_21, col_22 ]) one = range(5) two = list('ABCDE') mult_3 = pd.MultiIndex.from_arrays([ one, two], names=['one','two']) df = pd.DataFrame(0, columns=mult_2, index=mult_3) print (df) day month a b a b one two 0 A 0 0 0 0 1 B 0 0 0 0 2 C 0 0 0 0 3 D 0 0 0 0 4 E 0 0 0 0
Use Index.append
:
print (mult_1.append(mult_2)) MultiIndex([( 'one', ''), ( 'two', ''), ( 'day', 'a'), ( 'day', 'b'), ('month', 'a'), ('month', 'b')], )
Or Index.union
with sort=False
:
print (mult_1.union(mult_2, sort=False)) MultiIndex([( 'one', ''), ( 'two', ''), ( 'day', 'a'), ( 'day', 'b'), ('month', 'a'), ('month', 'b')], )