I need to copy and paste the previous non-empty value of a column based on a condition. I need to do it in the most efficient way because the number of rows is a couple of millions. Using for loop
will be computationally costly.
So it will be highly appreciated if somebody can help me in this regard.
|Col_A |Col_B | |--------|--------| |10.2.6.1| NaN | | NaN | 51 | | NaN | NaN | |10.2.6.1| NaN | | NaN | 64 | | NaN | NaN | | NaN | NaN | |10.2.6.1| NaN |
Based on the condition, whenever the Col_A
will have any value (not null) 10.2.6.1 in this example, the last seen value in Col_B
(51...64 respectively)
will be paste on that corresponding row where the Col_A
value is not null. And the dataset should look like this:
|Col_A |Col_B | |--------|--------| |10.2.6.1| NaN | | NaN | 51 | | NaN | NaN | |10.2.6.1| 51 | | NaN | 64 | | NaN | NaN | | NaN | NaN | |10.2.6.1| 64 |
I tried with this code below but it’s not working:
df.loc[df["Col_A"].notnull(),'Col_B'] = df.loc[df["Col_B"].notnull(),'Col_B']
Advertisement
Answer
You can forward-fill the NaN values using ffill
with the most recent non-NaN value.
If you want to keep the NaNs in Col_B
then simply create a new column (Col_C
) as follows:
df['Col_C'] = df['Col_B'].ffill()
Then replace the value in Col_B
where Col_A
has a value:
df.loc[df['Col_A'].notnull(), 'Col_B'] = df.loc[df['Col_A'].notnull(), 'Col_C'] df = df.drop(columns=['Col_C'])
Result:
Col_A Col_B 0 10.2.6.1 NaN 1 NaN 51.0 2 NaN NaN 3 10.2.6.1 51.0 4 NaN 64.0 5 NaN NaN 6 NaN NaN 7 10.2.6.1 64.0
The above can be simplified if you do not need to keep all NaN rows. For example, it’s possible to do:
df['Col_B'] = df['Col_B'].ffill() df = df.dropna()
Result:
Col_A Col_B 3 10.2.6.1 51.0 7 10.2.6.1 64.0