I have multiple dataframes and would like a dataframe that contains all column names from said multiple dataframes.
For example :
JavaScript
x
19
19
1
# Existing Dataframes
2
df1 =
3
df1_colA df1_colB df1_colC
4
0 1 2 3
5
1 4 5 6
6
2 7 8 9
7
8
df2 =
9
df2_colA df2_colB df3_colC
10
0 10 11 12
11
1 13 14 15
12
2 16 17 18
13
14
df3 =
15
df3_colA df3_colB df3_colC
16
0 30 31 32
17
1 33 34 35
18
2 36 37 38
19
I would like to get a dataframe like this :
JavaScript
1
12
12
1
names =
2
df_names col_names
3
0 df1 df1_colA
4
1 df1 df1_colB
5
2 df1 df1_colC
6
3 df2 df2_colA
7
4 df2 df2_colB
8
5 df2 df2_colC
9
6 df3 df3_colA
10
7 df3 df3_colB
11
8 df3 df3_colC
12
Help would be very appreciated and thank you in advance!
Advertisement
Answer
If possible extract DataFrame names
fom columns names
use list comprehension with concat
and last for new column in first position use DataFrame.insert
with Series.str.extract
Series.str.extractall
for values from columnsnames before _
:
JavaScript
1
15
15
1
dfs = [df1, df2, df3]
2
df = pd.concat([df.columns.to_frame(name='col_names') for df in dfs], ignore_index=True)
3
df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_'))
4
print (df)
5
df_names col_names
6
0 df1 df1_colA
7
1 df1 df1_colB
8
2 df1 df1_colC
9
3 df2 df2_colA
10
4 df2 df2_colB
11
5 df3 df3_colC
12
6 df3 df3_colA
13
7 df3 df3_colB
14
8 df3 df3_colC
15
Similar ide with flatten list comprehension:
JavaScript
1
15
15
1
dfs = [df1, df2, df3]
2
df = pd.DataFrame({'col_names': [x for df in dfs for x in df.columns]})
3
df.insert(0, 'df_names', df['col_names'].str.extract('^(.*)_'))
4
print (df)
5
df_names col_names
6
0 df1 df1_colA
7
1 df1 df1_colB
8
2 df1 df1_colC
9
3 df2 df2_colA
10
4 df2 df2_colB
11
5 df3 df3_colC
12
6 df3 df3_colA
13
7 df3 df3_colB
14
8 df3 df3_colC
15
Alternative is create dictionary of DataFrames and in dict comprehension use concat
, keys of dict create first level of MultiIndex
, so not necessary parse columns names:
JavaScript
1
18
18
1
dfs = {'df1':df1, 'df2':df2, 'df3':df3}
2
df = (pd.concat({k:v.columns.to_frame(name='col_names') for k, v in dfs.items()})
3
.droplevel(1)
4
.rename_axis('df_names')
5
.reset_index())
6
7
print (df)
8
df_names col_names
9
0 df1 df1_colA
10
1 df1 df1_colB
11
2 df1 df1_colC
12
3 df2 df2_colA
13
4 df2 df2_colB
14
5 df2 df3_colC
15
6 df3 df3_colA
16
7 df3 df3_colB
17
8 df3 df3_colC
18