I have a df as follows:
import pandas as pd
df = pd.DataFrame(
    {'number_C1_E1': ['1', '2', None, None, '5', '6', '7', '8'],
     'fruit_C11_E1': ['apple', 'banana', None, None, 'watermelon', 'peach', 'orange', 'lemon'],
     'name_C111_E1': ['tom', 'jerry', None, None, 'paul', 'edward', 'reggie', 'nicholas'],
     'number_C2_E2': [None, None, '3', None, None, None, None, None],
     'fruit_C22_E2': [None, None, 'blueberry', None, None, None, None, None],
     'name_C222_E2': [None, None, 'anthony', None, None, None, None, None],
     'number_C3_E1': [None, None, '3', '4', None, None, None, None],
     'fruit_C33_E1': [None, None, 'blueberry', 'strawberry', None, None, None, None],
     'name_C333_E1': [None, None, 'anthony', 'terry', None, None, None, None],
    }
)
What I want to do is combine these columns according to two rules:
- If two column names become identical once _C{0~9}, _C{0~9}{0~9} or _C{0~9}{0~9}{0~9} (i.e. _C followed by one to three digits) is removed, those two columns can be combined.
  Take number_C1_E1, number_C2_E2 and number_C3_E1 as an example: number_C1_E1 and number_C3_E1 can be combined because both become number_E1 after removing _C{0~9}.
- The combined column should get rid of the None values, i.e. each row keeps the non-None value from whichever of the merged columns has one.
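The first rule is essentially a regular expression on the column names. A minimal sketch, assuming the tag is always _C followed by one to three digits and then either _ or the end of the name; TAG and group_key are hypothetical names used only for illustration:

```python
import re

# Assumption: the tag is "_C" plus 1-3 digits, followed by "_" or the end of the name.
TAG = re.compile(r'_C\d{1,3}(?=_|$)')


def group_key(column: str) -> str:
    """Return the column name with its _C<digits> tag removed (hypothetical helper)."""
    return TAG.sub('', column)


for col in ['number_C1_E1', 'number_C2_E2', 'number_C3_E1', 'fruit_C11_E1', 'name_C333_E1']:
    print(col, '->', group_key(col))
```

Columns with equal keys belong together: number_C1_E1 and number_C3_E1 both map to number_E1, while number_C2_E2 maps to number_E2 and stays on its own.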
The desired result is:
  number_C1_1_E1 fruit_C11_1_E1 name_C111_1_E1 number_C2_1_E2 fruit_C22_1_E2 name_C222_1_E2
0              1          apple            tom           None           None           None
1              2         banana          jerry           None           None           None
2              3      blueberry        anthony              3      blueberry        anthony
3              4     strawberry          terry           None           None           None
4              5     watermelon           paul           None           None           None
5              6          peach         edward           None           None           None
6              7         orange         reggie           None           None           None
7              8          lemon       nicholas           None           None           None
Does anyone have a good solution?
Answer
Use the same approach as in your previous question, but also compute a renamer for the columns:
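# Grouping key: the column name with its _C<digits> tag removed
group = df.columns.str.replace(r'_C\d+', '', regex=True)
# Renamer: map each key back to the first original column that produced it
names = df.columns.to_series().groupby(group).first()
# Take the first non-null value per row within each group of columns
# (note: groupby(axis=1) is deprecated since pandas 2.1)
out = (df.groupby(group, axis=1, sort=False).first()
       .rename(columns=names)
      )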
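On pandas versions where groupby(axis=1) is deprecated (2.1 and later), the same idea can be expressed by transposing, grouping the rows, and transposing back. This is a sketch rather than the answerer's exact code; it rebuilds the frame from the question so it runs on its own:

```python
import pandas as pd

# Frame from the question
df = pd.DataFrame({
    'number_C1_E1': ['1', '2', None, None, '5', '6', '7', '8'],
    'fruit_C11_E1': ['apple', 'banana', None, None, 'watermelon', 'peach', 'orange', 'lemon'],
    'name_C111_E1': ['tom', 'jerry', None, None, 'paul', 'edward', 'reggie', 'nicholas'],
    'number_C2_E2': [None, None, '3', None, None, None, None, None],
    'fruit_C22_E2': [None, None, 'blueberry', None, None, None, None, None],
    'name_C222_E2': [None, None, 'anthony', None, None, None, None, None],
    'number_C3_E1': [None, None, '3', '4', None, None, None, None],
    'fruit_C33_E1': [None, None, 'blueberry', 'strawberry', None, None, None, None],
    'name_C333_E1': [None, None, 'anthony', 'terry', None, None, None, None],
})

# Grouping key: column name without its _C<digits> tag, e.g. number_C3_E1 -> number_E1
group = df.columns.str.replace(r'_C\d+', '', regex=True)

# Renamer: for each key, the first original column name that produced it
names = df.columns.to_series().groupby(group).first()

# After .T the original columns are rows, so an ordinary groupby collects each group;
# GroupBy.first keeps the first non-null value and the second .T restores the layout.
out = (df.T
         .groupby(group, sort=False).first()
         .T
         .rename(columns=names))

print(out)
```

sort=False keeps the groups in order of first appearance, which is why the result starts with the _E1 columns.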
Alternative:
group = df.columns.str.replace(r'_C\d+', '', regex=True)
# Label each combined column with the first original name of its group
out = (df.groupby(group, axis=1, sort=False).first()
       .set_axis(df.columns[~group.duplicated()], axis=1)
      )
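For comparison, a variant without groupby: within each group, back-fill across the columns so the first column holds the first non-null value, then keep that column. This swaps in DataFrame.bfill(axis=1) in place of GroupBy.first; the frame below is only an excerpt (the first four rows of the three number columns from the question):

```python
import re

import pandas as pd

# Excerpt of the question's data: first four rows of the number columns
df = pd.DataFrame({
    'number_C1_E1': ['1', '2', None, None],
    'number_C2_E2': [None, None, '3', None],
    'number_C3_E1': [None, None, '3', '4'],
})

# Grouping key for every column, e.g. number_C3_E1 -> number_E1
keys = [re.sub(r'_C\d+', '', col) for col in df.columns]

combined = {}
for key in dict.fromkeys(keys):  # unique keys in first-appearance order
    cols = [col for col, k in zip(df.columns, keys) if k == key]
    # bfill(axis=1) fills each missing value with the next non-null value to its right,
    # so column 0 ends up with the first non-null value; keep it under the first name.
    combined[cols[0]] = df[cols].bfill(axis=1).iloc[:, 0]

out = pd.DataFrame(combined)
print(out)
```

The explicit loop over groups is easy to read and avoids groupby(axis=1) entirely; for a handful of columns its cost is negligible.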
Output:
  number_C1_E1 fruit_C11_E1 name_C111_E1 number_C2_E2 fruit_C22_E2 name_C222_E2
0            1        apple          tom         None         None         None
1            2       banana        jerry         None         None         None
2            3    blueberry      anthony            3    blueberry      anthony
3            4   strawberry        terry         None         None         None
4            5   watermelon         paul         None         None         None
5            6        peach       edward         None         None         None
6            7       orange       reggie         None         None         None
7            8        lemon     nicholas         None         None         None