How do I create a new column that joins the names of the columns holding non-NA values, on a per-row basis?
- Please note the duplicate index.
Code
import numpy as np
import pandas as pd
so_df = pd.DataFrame({"ma_1":[10,np.nan,13,15], "ma_2":[10,11,np.nan,15], "ma_3":[np.nan,11,np.nan,15]},index=[0,1,1,2])
Example DF
   ma_1  ma_2  ma_3
0  10.0  10.0   NaN
1   NaN  11.0  11.0
1  13.0   NaN   NaN
2  15.0  15.0  15.0
The desired output is a new column, col_names, that joins the names of the columns with non-NA values, as in the example below.
so_df["col_names"] = ["ma_1, ma_2","ma_2, ma_3","ma_1","ma_1, ma_2, ma_3"]

   ma_1  ma_2  ma_3         col_names
0  10.0  10.0   NaN        ma_1, ma_2
1   NaN  11.0  11.0        ma_2, ma_3
1  13.0   NaN   NaN              ma_1
2  15.0  15.0  15.0  ma_1, ma_2, ma_3
Answer
Try with dot, applied to the boolean mask from notna (df here is the question's so_df):
df['new'] = df.notna().dot(df.columns+',').str[:-1]
df
Out[77]:
   ma_1  ma_2  ma_3             new
0  10.0  10.0   NaN       ma_1,ma_2
1   NaN  11.0  11.0       ma_2,ma_3
1  13.0   NaN   NaN            ma_1
2  15.0  15.0  15.0  ma_1,ma_2,ma_3
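
Why this works, as a rough sketch: notna() gives a boolean mask, and the matrix product multiplies each True/False by the matching column-name string (True * "ma_1, " is "ma_1, ", False * "ma_1, " is "") and then "sums" the pieces, which for strings means concatenation. The version below uses ", " as the separator so the result matches the col_names column asked for in the question, and compares it with a plainer row-wise join. The names mask, via_dot and via_apply are only illustrative, and the apply variant is an alternative I am adding, not part of the original answer.

```python
import numpy as np
import pandas as pd

so_df = pd.DataFrame(
    {
        "ma_1": [10, np.nan, 13, 15],
        "ma_2": [10, 11, np.nan, 15],
        "ma_3": [np.nan, 11, np.nan, 15],
    },
    index=[0, 1, 1, 2],  # duplicate index label, as in the question
)

# Boolean mask: True where a value is present.
mask = so_df.notna()

# dot trick: each True picks up "<column name>, ", each False contributes "".
# The trailing ", " (two characters) is sliced off at the end.
via_dot = mask.dot(mask.columns + ", ").str[:-2]

# Alternative (assumption: slower on large frames, but easier to read):
# join the labels of the True entries row by row.
via_apply = mask.apply(
    lambda row: ", ".join(row.index[row.to_numpy()]), axis=1
)

so_df["col_names"] = via_dot
print(so_df)
```

Both variants work positionally, row by row, so the duplicate index label 1 should not cause any misalignment here.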