Skip to content
Advertisement

How do I join two column values in string with empty values and integer in pandas?

So, I have the following data frame.

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [4444,1111,np.nan, np.nan], "col2":["ABC", "BCD", "DEF", "FGH"]})
df

I want to combine the values from those two columns into one separated by an underscore. I don’t want to have underscores in column with nan values. So far I have tried this.

df['col3'] = df['col2']+"_"+df['col1'].fillna("").astype(str)
df

But the result is unsatisfying.

enter image description here

I don’t want the trailing underscore for column without number. And I don’t want to have that 0, but I failed to convert them to integer without running into an error.

My expected result is like this.

["ABC_4444", "BCD_1111", "DEF", "FGH"]

I believe this should be pretty simple, but I am stuck. It’s probably important to mention that this code is part of a longer function. So I pretty much prefer a lighter computation and shorter code as to not make the function look even more complicated than it already is. How do I approach this? Thank you.

Advertisement

Answer

Other way via np.where()

check for NaN’s in column ‘col1’ and If found then put the values of ‘col2’ else put ‘col1′,’col2’ joined with ‘_’:

df['col3']=np.where(
                  df['col1'].isna(),
                  df['col2'],
                  df['col2']+'_'+df['col1'].astype(str).str.strip('.0')
                   )

Output of df:

    col1    col2    col3
0   4444.0  ABC     ABC_4444
1   1111.0  BCD     BCD_1111
2   NaN     DEF     DEF
3   NaN     FGH     FGH
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement