I want to merge two columns of the same DataFrame, based on some specific conditions.
Consider the following DataFrame:
Number-first | Number-second |
---|---|
1 | NaN |
2 | 4C |
3A | 5 |
NaN | 6 |
NaN | 7 |
NaN | NaN |
The conditions are:
- If the Number-first column has an alphanumeric value and the Number-second column has a NaN value or '' (empty string), the Result column should only take the value from Number-first.
- If the Number-first column has a NaN or '' (empty string) value and the Number-second column has an alphanumeric value, the Result column should only take the value from Number-second.
- If the values in both columns are alphanumeric, the Result column should consist of the value from Number-first and the value from Number-second, separated by a '-'.
- If both columns have NaN or empty-string values, the result should be '' (an empty string).
The following would be the output for the above DataFrame:
Number-first | Number-second | Result |
---|---|---|
1 | NaN | 1 |
2 | 4C | 2 - 4C |
3A | 5 | 3A - 5 |
NaN | 6 | 6 |
NaN | 7 | 7 |
NaN | NaN | NaN |
I have been unsuccessful using the `.select` method with the above conditions.
Thanks in advance for the help!
Below is the code snippet with the conditions, which doesn't seem to work for me:
```python
conditions = [
    df['Number-first'].str.isalnum(),
    df['Number-second'].str.isalnum(),
    df['Number-first'].str.isalnum() & df['Number-second'].str.isalnum()]
```
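The snippet above likely fails for two reasons: `str.isalnum()` returns NaN for missing cells, so the condition arrays end up object-dtype rather than boolean, and `np.select` returns the choice for the first condition that matches, so the "both alphanumeric" case has to be listed first. A minimal sketch of a working version, assuming `.select` refers to `np.select` and that the missing cells are real NaN rather than the string "Nan":

```python
import numpy as np
import pandas as pd

# Sample data from the question, with missing cells as real NaN
df = pd.DataFrame({
    'Number-first': ['1', '2', '3A', np.nan, np.nan, np.nan],
    'Number-second': [np.nan, '4C', '5', '6', '7', np.nan],
})

# isalnum() gives NaN for missing cells; eq(True) turns those into False,
# so both masks are plain boolean Series ('' is not alphanumeric either)
first_ok = df['Number-first'].str.isalnum().eq(True)
second_ok = df['Number-second'].str.isalnum().eq(True)

# np.select returns the choice for the FIRST condition that is True,
# so the "both alphanumeric" case must be listed before the single ones
conditions = [first_ok & second_ok, first_ok, second_ok]
choices = [
    df['Number-first'] + ' - ' + df['Number-second'],
    df['Number-first'],
    df['Number-second'],
]

df['Result'] = np.select(conditions, choices, default=np.nan)
print(df)
```

Because `''.isalnum()` is `False`, empty strings fall through to `default` just like NaN; passing `default=''` instead would give the empty string that the last condition asks for.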
Answer
You can use the `Series.combine` function with a custom function to do this, like so:
```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Number-first': ['1', '2', '3A', np.nan, np.nan, np.nan],
    'Number-second': [np.nan, '4C', '5', '6', '7', np.nan],
})


def custom_combine(v1, v2):
    """Keep whichever value is present, or join both with ' - '."""
    if pd.isna(v1) and pd.isna(v2):
        return np.nan
    elif pd.isna(v1):
        return v2
    elif pd.isna(v2):
        return v1
    else:
        return f'{v1} - {v2}'


df['Result'] = (
    # ignore non-alphanumeric values ('' included) by turning them into NaN;
    # eq(True) maps the NaN that isalnum() returns for missing cells to False
    df.where(df.apply(lambda s: s.str.isalnum().eq(True)))
    .pipe(lambda d:
        d['Number-first'].combine(d['Number-second'], custom_combine)
    )
)

print(df)
```

```
  Number-first Number-second  Result
0            1           NaN       1
1            2            4C  2 - 4C
2           3A             5  3A - 5
3          NaN             6       6
4          NaN             7       7
5          NaN           NaN     NaN
```
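The masking step also covers the question's empty-string rules, since `''.isalnum()` is `False`. The sketch below checks this on hypothetical rows; instead of `custom_combine` it swaps in a row-wise `' - '.join(...)` over the surviving values, which returns `''` when nothing is left (the last condition as written) and would extend to more than two columns:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: '' should act like NaN, and '4-C' is not alphanumeric
df = pd.DataFrame({
    'Number-first': ['3A', '', '4-C', np.nan],
    'Number-second': ['5', '6', np.nan, ''],
})

# Keep only alphanumeric cells; everything else ('' and NaN included) becomes NaN
masked = df.where(df.apply(lambda s: s.str.isalnum().eq(True)))

# Join whatever survives in each row; a row with nothing left yields ''
df['Result'] = masked.apply(lambda row: ' - '.join(row.dropna()), axis=1)
print(df)
```

Here `'4-C'` is treated as missing because it is not alphanumeric; the question does not say how such values should be handled.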
Alternatively, you can take advantage of pandas' vectorized string methods:
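```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Number-first': ['1', '2', '3A', np.nan, np.nan, np.nan],
    'Number-second': [np.nan, '4C', '5', '6', '7', np.nan],
})

df['Result'] = (
    # ignore non-alphanumeric values by turning them into NaN
    df.where(df.apply(lambda s: s.str.isalnum().eq(True)))
    .pipe(lambda d:
        # NaN becomes '', so every row yields a string such as '1-' or '-6'
        d['Number-first'].str.cat(d['Number-second'], sep='-', na_rep='')
    )
    .str.strip('-')        # drop the separator left over when one side was missing
    .replace('', np.nan)   # both sides missing -> NaN
)

print(df)
```

```
  Number-first Number-second Result
0            1           NaN      1
1            2            4C   2-4C
2           3A             5   3A-5
3          NaN             6      6
4          NaN             7      7
5          NaN           NaN    NaN
```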
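The question's expected output separates the values with spaces (`2 - 4C`), which this version does not. A small variation, assuming that spacing is wanted, passes `sep=' - '` and strips both spaces and hyphens from the ends:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Number-first': ['1', '2', '3A', np.nan, np.nan, np.nan],
    'Number-second': [np.nan, '4C', '5', '6', '7', np.nan],
})

# Non-alphanumeric cells ('' included) become NaN
masked = df.where(df.apply(lambda s: s.str.isalnum().eq(True)))

df['Result'] = (
    masked['Number-first']
    .str.cat(masked['Number-second'], sep=' - ', na_rep='')
    # strip(' -') removes leading/trailing spaces and hyphens; this is safe
    # here because the masked values are alphanumeric and contain neither
    .str.strip(' -')
    .replace('', np.nan)   # both sides missing -> NaN
)
print(df)
```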