I have multiple dataframes and would like a dataframe that contains all column names from said multiple dataframes.
For example :
# Existing Dataframes df1 = df1_colA df1_colB df1_colC 0 1 2 3 1 4 5 6 2 7 8 9 df2 = df2_colA df2_colB df3_colC 0 10 11 12 1 13 14 15 2 16 17 18 df3 = df3_colA df3_colB df3_colC 0 30 31 32 1 33 34 35 2 36 37 38
I would like to get a dataframe like this :
names = df_names col_names 0 df1 df1_colA 1 df1 df1_colB 2 df1 df1_colC 3 df2 df2_colA 4 df2 df2_colB 5 df2 df2_colC 6 df3 df3_colA 7 df3 df3_colB 8 df3 df3_colC
Help would be very appreciated and thank you in advance!
Advertisement
Answer
If possible extract DataFrame names
fom columns names
use list comprehension with concat
and last for new column in first position use DataFrame.insert
with Series.str.extract
Series.str.extractall
for values from columnsnames before _
:
dfs = [df1, df2, df3] df = pd.concat([df.columns.to_frame(name='col_names') for df in dfs], ignore_index=True) df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_')) print (df) df_names col_names 0 df1 df1_colA 1 df1 df1_colB 2 df1 df1_colC 3 df2 df2_colA 4 df2 df2_colB 5 df3 df3_colC 6 df3 df3_colA 7 df3 df3_colB 8 df3 df3_colC
Similar ide with flatten list comprehension:
dfs = [df1, df2, df3] df = pd.DataFrame({'col_names': [x for df in dfs for x in df.columns]}) df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_')) print (df) df_names col_names 0 df1 df1_colA 1 df1 df1_colB 2 df1 df1_colC 3 df2 df2_colA 4 df2 df2_colB 5 df3 df3_colC 6 df3 df3_colA 7 df3 df3_colB 8 df3 df3_colC
Alternative is create dictionary of DataFrames and in dict comprehension use concat
, keys of dict create first level of MultiIndex
, so not necessary parse columns names:
dfs = {'df1':df1, 'df2':df2, 'df3':df3} df = (pd.concat({k:v.columns.to_frame(name='col_names') for k, v in dfs.items()}) .droplevel(1) .rename_axis('df_names') .reset_index()) print (df) df_names col_names 0 df1 df1_colA 1 df1 df1_colB 2 df1 df1_colC 3 df2 df2_colA 4 df2 df2_colB 5 df2 df3_colC 6 df3 df3_colA 7 df3 df3_colB 8 df3 df3_colC