So, I have the following data frame.
import numpy as np import pandas as pd df = pd.DataFrame({"col1": [4444,1111,np.nan, np.nan], "col2":["ABC", "BCD", "DEF", "FGH"]}) df
I want to combine the values from those two columns into one separated by an underscore. I don’t want to have underscores in column with nan values. So far I have tried this.
df['col3'] = df['col2']+"_"+df['col1'].fillna("").astype(str) df
But the result is unsatisfying.
I don’t want the trailing underscore for column without number. And I don’t want to have that 0, but I failed to convert them to integer without running into an error.
My expected result is like this.
["ABC_4444", "BCD_1111", "DEF", "FGH"]
I believe this should be pretty simple, but I am stuck. It’s probably important to mention that this code is part of a longer function. So I pretty much prefer a lighter computation and shorter code as to not make the function look even more complicated than it already is. How do I approach this? Thank you.
Advertisement
Answer
Other way via np.where()
check for NaN’s in column ‘col1’ and If found then put the values of ‘col2’ else put ‘col1′,’col2’ joined with ‘_’:
df['col3']=np.where( df['col1'].isna(), df['col2'], df['col2']+'_'+df['col1'].astype(str).str.strip('.0') )
Output of df
:
col1 col2 col3 0 4444.0 ABC ABC_4444 1 1111.0 BCD BCD_1111 2 NaN DEF DEF 3 NaN FGH FGH