How to merge multiple columns of a dataframe using regex?

Question

I have a df as follows. What I want to do is combine its columns according to two rules: if a column name, after removing _C{0~9}, _C{0~9}{0~9}, or _C{0~9}{0~9}{0~9}, is equal to another column name, the two columns can be combined. Let's take number_C1_E1, number_C2_E2, and number_C3_E1 as an example; here numb…

Accepted Answer

Use the same approach as in your previous question, but also compute a renamer for your columns:

    # Strip the _C<digits> part so that matching columns share one group key
    group = df.columns.str.replace(r'_C\d+', '', regex=True)
    # Map each group key to the first original column name in that group
    names = df.columns.to_series().groupby(group).first()
    out = (df.groupby(group, axis=1, sort=False).first()
             .rename(columns=names)
          )

Alternative:

    group = df.columns.str.replace(r'_C\d+', '', regex=True)
    # Keep the first original column name of each group as the new header
    out = (df.groupby(group, axis=1, sort=False).first()
             .set_axis(df.columns[~group.duplicated()], axis=1)
          )

Output:

      number_C1_E1 fruit_C11_E1 name_C111_E1 number_C2_E2 fruit_C22_E2 name_C222_E2
    0            1        apple          tom         None         None         None
    1            2       banana        jerry         None         None         None
    2            3    blueberry      anthony            3    blueberry      anthony
    3            4   strawberry        terry         None         None         None
    4            5   watermelon         paul         None         None         None
    5            6        peach       edward         None         None         None
    6            7       orange       reggie         None         None         None
    7            8        lemon     nicholas         None         None         None
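For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same idea. The small frame below is made up for illustration and is not the asker's data. Because recent pandas releases deprecate axis=1 in groupby, this sketch groups the transposed frame instead; that substitution is an assumption on my part, not part of the accepted answer.

```python
import pandas as pd

# Hypothetical sample data (not the asker's frame): two columns share the
# key "fruit_E1" once the _C<digits> part is stripped.
df = pd.DataFrame({
    "fruit_C11_E1": ["apple", None, "blueberry"],
    "fruit_C22_E2": [None, None, "blueberry"],
    "fruit_C33_E1": [None, "banana", None],
})

# Group key per column: the column name with _C<digits> removed.
group = df.columns.str.replace(r"_C\d+", "", regex=True)

# First original column name of each group, in order of first appearance.
first_names = df.columns[~group.duplicated()]

# Transpose so columns become rows, take the first non-null value per group,
# then transpose back. sort=False keeps the order of first appearance, which
# matches the order of first_names.
merged = df.T.groupby(group.to_numpy(), sort=False).first().T
merged.columns = first_names

print(merged)
```

Transposing converts everything to a common dtype, so on frames with mixed column types you may want to restore dtypes afterwards (for example with merged.infer_objects()).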

Answer