Skip to content
Advertisement

Replace values in one dataframe with values in second dataframe in Python

I have a large dataframe (DF1) that contains a variable containing UK postcode data. Inevitably there are some typos in the data. However, after some work with regular expressions, I have created a second database that contains corrected versions of the postcode data (but only for those rows where the original postcode was incorrect) – DF2. (N.B. the index values are not necessarily consecutive.)

JavaScript

The dataframe containing the corrected data is:

JavaScript

I want to combine the 2 databases such that the new values in the pcCorrected column in DF2 replace the old postcode values in the DF1 dataframe but, for other cells, the existing postcode values remain in tact. The final database should look like:

JavaScript

The databases are quite large (>1 million rows). Does this action have a name and what is the most efficient way to do this?

Advertisement

Answer

You can try merge by indexes , create mask by notnull and add new values by loc:

JavaScript
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement