How do I create a new column which joins the column names for any non na values on a per row basis.
- Please note the duplicate index.
Code
so_df = pd.DataFrame({"ma_1":[10,np.nan,13,15],
             "ma_2":[10,11,np.nan,15],
             "ma_3":[np.nan,11,np.nan,15]},index=[0,1,1,2])
Example DF
ma_1 ma_2 ma_3 0 10.0 10.0 NaN 1 NaN 11.0 11.0 1 13.0 NaN NaN 2 15.0 15.0 15.0
Desired output is a new column which joins the column names for non na values as per col_names example below.
so_df["col_names"] = ["ma_1, ma_2","ma_2, ma_3","ma_1","ma_1, ma_2, ma_3"]
    ma_1    ma_2    ma_3    col_names
0   10.0    10.0    NaN     ma_1, ma_2
1   NaN     11.0    11.0    ma_2, ma_3
1   13.0    NaN     NaN     ma_1
2   15.0    15.0    15.0    ma_1, ma_2, ma_3
Advertisement
Answer
Try with dot
df['new'] = df.notna().dot(df.columns+',').str[:-1] df Out[77]: ma_1 ma_2 ma_3 new 0 10.0 10.0 NaN ma_1,ma_2 1 NaN 11.0 11.0 ma_2,ma_3 1 13.0 NaN NaN ma_1 2 15.0 15.0 15.0 ma_1,ma_2,ma_3