Skip to content
Advertisement

Extract data from pandas dataframe columns with duplicate names

I have a dataframe which has duplicate column names:

Accepted    Accepted    Accepted    Reject    Accepted    Reject
ABC         IJK         JKL         XYJ       LMN         UIO
BCD         PQR         EFG         YVG       GHIJ        PLK

…and want to convert it into two dataframes; one only of “Accepted” columns and other for “Reject” Columns:

df1:

Accepted    Accepted    Accepted    Accepted
ABC         IJK         JKL         LMN     
BCD         PQR         EFG         GHIJ    

df2:

Reject    Reject
XYJ       UIO
YVG       PLK 

Tried:

df1=df["Accepted"]
df2=df["Reject"]

… but this only gives the first column matching this name.

Advertisement

Answer

If select one column with same name are selected all columns with same name in DataFrame:

df1 = df['Accepted']
df2 = df['Reject']

Then is possible deduplicate columns:

df1.columns = [f'{x}_{i}' for i, x in enumerate(df1.columns, 1)]
df2.columns = [f'{x}_{i}' for i, x in enumerate(df2.columns, 1)]

EDIT: If get only first column name it means there are not duplicated columns names, so is possible use DataFrame.filter:

df1 = df.filter(like='Accepted')
df2 = df.filter(like='Reject')
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement