Creating another column in pandas df based on partially empty columns

I want to create a third column in my pandas dataframe that is based on cols 1 and 2. They are always matching, but I want to make it so that the third column takes whichever value is available. If I just go off of id1, sometimes it is blank, so the third col will end up being blank as well. I want it so that it will take whichever one isn’t blank to create the college name.

Original:

    id1     id2            
0   ID01   ID01             
1          ID03            
2   ID07                   
3   ID08   ID08            

Desired:

    id1     id2            college_name
0   ID01   ID01             College1
1          ID03             College3
2   ID07                    College7
3   ID08   ID08             College8

Also, one thing about this data frame is that I am pretty sure the first 2 columns either are an exact match or one of them is empty. I would like to double-check if there is an instance where id1 and id2 are completely different numbers in the same row. How should I do that?

Answer

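Backfill id1 from id2 along each row so that id1 is always populated, extract the digits, then convert to int (which drops the leading zero) and back to str. This assumes the blank cells are NaN, since bfill only fills missing values.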

Given:

    id1   id2
0  ID01  ID01
1   NaN  ID03
2  ID07   NaN
3  ID08  ID08

Doing:

df['college_name'] = 'College' + (df.bfill(axis=1)['id1']
                                    .str.extract(r'(\d+)', expand=False)
                                    .astype(int)
                                    .astype(str))

Output:

    id1   id2 college_name
0  ID01  ID01     College1
1   NaN  ID03     College3
2  ID07   NaN     College7
3  ID08  ID08     College8
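
If the blanks in the real data are empty strings rather than NaN (the question's printout suggests they might be), the backfill has nothing to fill. Here is a minimal sketch of one way to handle that. It assumes blanks are empty or whitespace-only strings, and it swaps in `fillna` in place of `bfill(axis=1)`. The sample frame and the names `ids`, `merged` and `number` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question; blanks are empty strings here (an assumption).
df = pd.DataFrame({"id1": ["ID01", "", "ID07", "ID08"],
                   "id2": ["ID01", "ID03", "", "ID08"]})

# Treat empty or whitespace-only cells as missing so they can be filled.
ids = df[["id1", "id2"]].replace(r"^\s*$", np.nan, regex=True)

# Take id1 where present, otherwise fall back to id2.
merged = ids["id1"].fillna(ids["id2"])

# Pull out the digits, drop leading zeros via int, and prepend the label.
number = merged.str.extract(r"(\d+)", expand=False).astype(int).astype(str)
df["college_name"] = "College" + number
```

Note that `astype(int)` raises an error if a row has both ids missing, so such rows would need to be dropped or handled separately first.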

To check for rows where the ids are different:

Given:

    id1   id2
0  ID01  ID01
1   NaN  ID03
2  ID07   NaN
3  ID08  ID98

Doing:

print(df[df.id1.ne(df.id2) & df.id1.notna() & df.id2.notna()])

Output:

    id1   id2
3  ID08  ID98
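
The same check can be wrapped in a small helper so it is easy to rerun on other frames. This is only a sketch: `find_mismatches` and its parameter names are hypothetical, and it also treats empty strings as missing, which is an assumption about the data.

```python
import numpy as np
import pandas as pd

def find_mismatches(frame, left="id1", right="id2"):
    """Return rows where both columns are filled in but disagree."""
    # Normalize empty or whitespace-only strings to NaN before comparing.
    ids = frame[[left, right]].replace(r"^\s*$", np.nan, regex=True)
    both_present = ids[left].notna() & ids[right].notna()
    return frame[both_present & ids[left].ne(ids[right])]

df = pd.DataFrame({"id1": ["ID01", np.nan, "ID07", "ID08"],
                   "id2": ["ID01", "ID03", np.nan, "ID98"]})
conflicts = find_mismatches(df)
print(conflicts)
```

An empty result means every row either matches or has at least one id missing.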
User contributions licensed under: CC BY-SA