I have a DF with about 50 columns. 5 of them contain strings that I want to combine into a single column, separating the strings with commas but also keeping the spaces within each of the strings. Moreover, some values are missing (NaN). The last requirement would be to remove duplicates if they exist.
So I have something like this in my DF:
symptom_1 | symptom_2 | symptom_3 | symptom_4 | symptom 5 |
---|---|---|---|---|
muscle pain | super headache | diarrhea | Sore throat | Fatigue |
super rash | ulcera | super headache | | |
diarrhea | super diarrhea | | | |
something awful | something awful | | | |
And I need something like this:
symptom_1 | symptom_2 | symptom_3 | symptom_4 | symptom 5 | all_symptoms |
---|---|---|---|---|---|
muscle pain | super headache | diarrhea | Sore throat | Fatigue | muscle pain, super headache, diarrhea, Sore throat, Fatigue |
super rash | ulcera | super headache | | | super rash, ulcera, super headache |
diarrhea | super diarrhea | | | | diarrhea, super diarrhea |
something awful | something awful | | | | something awful |
I wrote the following function. It merges all the columns, but it does not respect the spaces within the original strings, which is a must.
    def merge_columns_into_one(DataFrame, columns_to_combine, new_col_name, drop_originals=False):
        DataFrame[new_col_name] = DataFrame[columns_to_combine].apply(
            lambda x: ','.join(x.dropna().astype(str)), axis=1)
        return DataFrame
Thanks in advance for your help!
Edit: while I'm writing this question the second markdown table appears just fine in the preview, but as soon as I post it the table loses its format. I hope you get the idea of what I'm trying to do. Otherwise I'd appreciate your feedback on how to fix the MD table.
Answer
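Just use the fillna(), apply() and rstrip() methods. Here df1 is presumably the subset of df that holds only the five symptom columns:

    df['all_symptoms'] = df1.fillna('').apply(pd.unique, axis=1).apply(', '.join).str.rstrip(', ')

Joining with ', ' rather than ',' gives the spacing shown in the desired output. Note that this relies on the missing values sitting at the end of each row, as they do in the sample data.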
Now if you print df, you will get your desired output:
symptom_1 | symptom_2 | symptom_3 | symptom_4 | symptom 5 | all_symptoms |
---|---|---|---|---|---|
muscle pain | super headache | diarrhea | Sore throat | Fatigue | muscle pain, super headache, diarrhea, Sore throat, Fatigue |
super rash | ulcera | super headache | | | super rash, ulcera, super headache |
diarrhea | super diarrhea | | | | diarrhea, super diarrhea |
something awful | something awful | | | | something awful |
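If the missing values can also appear in the middle of a row, a variant that drops NaN before joining may be safer than filling with empty strings. The following is only a sketch, not the method from the answer above: it reuses the function name from the question, assumes the columns are called symptom_1 to symptom_5, and rebuilds the sample data by hand. It returns a copy instead of modifying the input, which is a design choice of this sketch.

```python
import numpy as np
import pandas as pd


def merge_columns_into_one(df, columns_to_combine, new_col_name, drop_originals=False):
    """Join the non-missing, de-duplicated strings of each row with ', '."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out[new_col_name] = out[columns_to_combine].apply(
        # dropna() skips NaN wherever it occurs in the row;
        # drop_duplicates() keeps the first occurrence, so order is preserved
        lambda row: ", ".join(row.dropna().drop_duplicates().astype(str)),
        axis=1,
    )
    if drop_originals:
        out = out.drop(columns=columns_to_combine)
    return out


# Sample data modelled on the question (column names are assumed).
df = pd.DataFrame(
    {
        "symptom_1": ["muscle pain", "super rash", "diarrhea", "something awful"],
        "symptom_2": ["super headache", "ulcera", "super diarrhea", "something awful"],
        "symptom_3": ["diarrhea", "super headache", np.nan, np.nan],
        "symptom_4": ["Sore throat", np.nan, np.nan, np.nan],
        "symptom_5": ["Fatigue", np.nan, np.nan, np.nan],
    }
)
symptom_cols = [f"symptom_{i}" for i in range(1, 6)]

result = merge_columns_into_one(df, symptom_cols, "all_symptoms")
print(result["all_symptoms"].tolist())
```

Passing drop_originals=True returns only the combined column plus whatever other columns the DataFrame has. A row-wise apply is not the fastest option on large frames, but for five columns it is usually fine.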