Skip to content
Advertisement

How to return dataframe containing column names of multiple dataframe

I have multiple dataframes and would like a dataframe that contains all column names from said multiple dataframes.

For example :

# Existing Dataframes
df1 =
    df1_colA  df1_colB  df1_colC
0   1         2         3
1   4         5         6
2   7         8         9

df2 =
    df2_colA  df2_colB  df3_colC
0   10        11        12
1   13        14        15
2   16        17        18

df3 =
    df3_colA  df3_colB  df3_colC
0   30        31        32
1   33        34        35
2   36        37        38

I would like to get a dataframe like this :

names =
     df_names   col_names
0    df1        df1_colA
1    df1        df1_colB
2    df1        df1_colC
3    df2        df2_colA
4    df2        df2_colB
5    df2        df2_colC
6    df3        df3_colA
7    df3        df3_colB
8    df3        df3_colC

Help would be very appreciated and thank you in advance!

Advertisement

Answer

If possible extract DataFrame names fom columns names use list comprehension with concat and last for new column in first position use DataFrame.insert with Series.str.extractSeries.str.extractall for values from columnsnames before _:

dfs = [df1, df2, df3]
df = pd.concat([df.columns.to_frame(name='col_names') for df in dfs], ignore_index=True)
df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_'))
print (df)
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC

Similar ide with flatten list comprehension:

dfs = [df1, df2, df3]
df = pd.DataFrame({'col_names': [x for df in dfs for x in df.columns]})
df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_'))
print (df)
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df3  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC

Alternative is create dictionary of DataFrames and in dict comprehension use concat, keys of dict create first level of MultiIndex, so not necessary parse columns names:

dfs = {'df1':df1, 'df2':df2, 'df3':df3}
df = (pd.concat({k:v.columns.to_frame(name='col_names') for k, v in dfs.items()})
        .droplevel(1)
        .rename_axis('df_names')
        .reset_index())

print (df)
  df_names col_names
0      df1  df1_colA
1      df1  df1_colB
2      df1  df1_colC
3      df2  df2_colA
4      df2  df2_colB
5      df2  df3_colC
6      df3  df3_colA
7      df3  df3_colB
8      df3  df3_colC
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement