I have a df
which as following
import pandas as pd df = pd.DataFrame( {'number_C1_E1': ['1', '2', None, None, '5', '6', '7', '8'], 'fruit_C11_E1': ['apple', 'banana', None, None, 'watermelon', 'peach', 'orange', 'lemon'], 'name_C111_E1': ['tom', 'jerry', None, None, 'paul', 'edward', 'reggie', 'nicholas'], 'number_C2_E2': [None, None, '3', None, None, None, None, None], 'fruit_C22_E2': [None, None, 'blueberry', None, None, None, None, None], 'name_C222_E2': [None, None, 'anthony', None, None, None, None, None], 'number_C3_E1': [None, None, '3', '4', None, None, None, None], 'fruit_C33_E1': [None, None, 'blueberry', 'strawberry', None, None, None, None], 'name_C333_E1': [None, None, 'anthony', 'terry', None, None, None, None], } )
Here what I want to do is combine those columns and we have two rules:
- If a column removes
_C{0~9}
or_C{0~9}{0~9}
or_C{0~9}{0~9}{0~9}
is equal to another column, these two columns can be combined.
Let’s take
number_C1_E1
number_C2_E2
number_C3_E1
as an example, herenumber_C1_E1
andnumber_C3_E1
can be combined because they are bothnumber_E1
afterremoving _C{0~9}
.
- The two combined columns should get rid of the
None
values.
The desired result is
number_C1_1_E1 fruit_C11_1_E1 name_C111_1_E1 number_C2_1_E2 fruit_C22_1_E2 name_C222_1_E2 0 1 apple tom None None None 1 2 banana jerry None None None 2 3 blueberry anthony 3 blueberry anthony 3 4 strawberry terry None None None 4 5 watermelon paul None None None 5 6 peach edward None None None 6 7 orange reggie None None None 7 8 lemon nicholas None None None
Anyone has a good solution?
Advertisement
Answer
Use the same as your previous question, but also compute a renamer for your columns:
group = df.columns.str.replace(r'_Cd+', '', regex=True) names = df.columns.to_series().groupby(group).first() out = (df.groupby(group, axis=1, sort=False).first() .rename(columns=names) )
Alternative:
group = df.columns.str.replace(r'_Cd+', '', regex=True) out = (df.groupby(group, axis=1, sort=False).first() .set_axis(df.columns[~group.duplicated()], axis=1) )
Output:
number_C1_E1 fruit_C11_E1 name_C111_E1 number_C2_E2 fruit_C22_E2 name_C222_E2 0 1 apple tom None None None 1 2 banana jerry None None None 2 3 blueberry anthony 3 blueberry anthony 3 4 strawberry terry None None None 4 5 watermelon paul None None None 5 6 peach edward None None None 6 7 orange reggie None None None 7 8 lemon nicholas None None None