Replace values in one dataframe with values in second dataframe in Python

Question

I have a large dataframe (DF1) with a column containing UK postcode data. Inevitably there are some typos in the data. However, after some work with regular expressions, I have created a second dataframe (DF2) that contains corrected versions of the postcode data, but only for those rows where the original postcode was incorrect. (N.B. the index values

Accepted Answer

You can merge the two dataframes on their indexes, build a mask with notnull, and assign the new values with loc:

    import pandas as pd

    # Left join on the index keeps every row of df1
    df = pd.merge(df1, df2, left_index=True, right_index=True, how='left')

    # True only where a corrected postcode exists
    mask = pd.notnull(df['pcCorrected'])
    print(mask)

    0     False
    2     False
    4     False
    6     False
    8     False
    10     True
    12     True
    14     True
    16    False
    18    False
    Name: pcCorrected, dtype: bool

    # Overwrite the remark and postcode only on the masked rows
    df.loc[mask, 'remark'] = 'Normal'
    df.loc[mask, 'postcode'] = df['pcCorrected']
    print(df[['id', 'postcode', 'remark']])

        id   postcode     remark
    0    1      L93AP     Normal
    2    2     LD38AH     Normal
    4    3    SO224ER     Normal
    6    4       SO21  Too short
    8    5    DN379HJ     Normal
    10   6     M210RH     Normal
    12   7     NP74SG     Normal
    14   8    SE136RZ     Normal
    16   9  BN251ESBN   Too long
    18  10    TD152EH     Normal
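Since the original df1 and df2 are not shown here, the following is a minimal, self-contained sketch of the same merge, mask, and loc steps. The sample rows and the 'Bad format' remark are hypothetical values made up for illustration; only the column names (id, postcode, remark, pcCorrected) come from the answer.

```python
import pandas as pd

# Hypothetical sample data: the real df1/df2 are not shown in the question.
df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "postcode": ["L93AP", "M21ORH", "NP7 4SG", "SO21"],
        "remark": ["Normal", "Bad format", "Bad format", "Too short"],
    },
    index=[0, 2, 4, 6],
)
# df2 holds corrections only for the rows that were fixed (assumed index 2 and 4).
df2 = pd.DataFrame({"pcCorrected": ["M210RH", "NP74SG"]}, index=[2, 4])

# Left join on the index so every row of df1 is kept.
df = pd.merge(df1, df2, left_index=True, right_index=True, how="left")

# True only where df2 supplied a corrected postcode.
mask = df["pcCorrected"].notna()

# Update only the masked rows; the others keep their original values.
df.loc[mask, "remark"] = "Normal"
df.loc[mask, "postcode"] = df.loc[mask, "pcCorrected"]

result = df[["id", "postcode", "remark"]]
print(result)
```

Rows without a correction (index 0 and 6 in this sketch) are left untouched, which is why the left join is used rather than an inner join.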

Answer