Replacing NaN values in a DataFrame row with values from other rows based on a (non-unique) column value

Question

I have a DataFrame similar to the following, where one column holds a non-unique value (in this case, address) and the other columns contain information about it. Some of the addresses appear more than once in the DataFrame, and some of those repeated rows are missing information. If a certain row…
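The situation can be reproduced with a small frame. The data below is hypothetical, because the question's own frame is cut off here: the addresses echo the answer's output, and missing values are stored as NaN. The sketch flags the rows whose address occurs more than once and that also have at least one missing value. Those are the rows that could be filled from another row with the same address.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "22 Milky Way" repeats, and some of its rows lack values
df = pd.DataFrame({
    "address": ["11 Star Street", "22 Milky Way", "88 Dark Drive",
                "33 Planet Place", "22 Milky Way", "22 Milky Way"],
    "val": [10.0, 20.0, np.nan, 20.0, np.nan, np.nan],
    "val2": [20.0, np.nan, np.nan, 40.0, 10.0, np.nan],
})

# True for every row whose address occurs more than once
repeated = df["address"].duplicated(keep=False)
# True for every row with at least one missing value
incomplete = df[["val", "val2"]].isna().any(axis=1)

# Rows that could be completed from another row with the same address
to_fill = df[repeated & incomplete]
print(to_fill)
```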

Accepted Answer

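There are a number of ways you can do this. The easiest is to group by address and forward-fill / back-fill (ffill / bfill) within each group.

    import numpy as np
    import pandas as pd

    # Treat empty strings as missing values
    df = df.replace('', np.nan)
    value_cols = [c for c in df.columns if c != 'address']
    # Within each address, fill gaps from earlier rows, then from later rows
    df[value_cols] = df.groupby('address')[value_cols].transform(lambda x: x.ffill().bfill())
    print(df)

               address   val  val2
    0   11 Star Street  10.0  20.0
    1     22 Milky Way  20.0  10.0
    2    88 Dark Drive   NaN   NaN
    3  33 Planet Place  20.0  40.0
    4     22 Milky Way  20.0  10.0
    5     22 Milky Way  20.0  10.0

transform returns a result aligned to the original index, so the rows keep their order and the address column is left untouched.

Another, and usually more performant, method is to take the first non-null value per address with groupby().first() and write it back with DataFrame.update, which aligns rows on the index. Starting again from the original df:

    # First non-null value of each column per address
    vals = df.replace('', np.nan).groupby('address').first()
    print(vals)

                     val  val2
    address
    11 Star Street   10.0  20.0
    22 Milky Way     20.0  10.0
    33 Planet Place  20.0  40.0
    88 Dark Drive     NaN   NaN

    # Index by address so update() can align rows; update() works in place
    # and does not copy NaN values from vals
    df = df.set_index('address')
    df.update(vals)
    print(df)

                    val val2
    address
    11 Star Street   10   20
    22 Milky Way     20   10
    88 Dark Drive
    33 Planet Place  20   40
    22 Milky Way     20   10
    22 Milky Way     20   10

88 Dark Drive has no values in any of its rows, so there is nothing to copy and its empty strings are left as they were. Note that update overwrites every row of an address with that group's first non-null values, so this method assumes rows sharing an address never disagree.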
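A self-contained sketch of the ffill/bfill idea that runs without the question's data. The frame is hypothetical, with missing values already stored as NaN. As a second variant it uses transform('first'), which broadcasts each address's first non-null value back to every row of that address. This is a swapped-in alternative to the set_index/update step, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame; missing values are NaN
df = pd.DataFrame({
    "address": ["11 Star Street", "22 Milky Way", "88 Dark Drive",
                "33 Planet Place", "22 Milky Way", "22 Milky Way"],
    "val": [10.0, 20.0, np.nan, 20.0, np.nan, np.nan],
    "val2": [20.0, np.nan, np.nan, 40.0, 10.0, np.nan],
})
value_cols = ["val", "val2"]
by_address = df.groupby("address")[value_cols]

# Variant 1: fill within each address group, forward then backward;
# a row's own non-null values are never overwritten
filled = df.copy()
filled[value_cols] = by_address.transform(lambda g: g.ffill().bfill())

# Variant 2: broadcast each group's first non-null value to all its rows;
# this overwrites existing values, so it assumes rows never conflict
firsted = df.copy()
firsted[value_cols] = by_address.transform("first")

print(filled)
```

On this data both variants give the same result. They differ only when two rows with the same address hold different non-null values: ffill/bfill keeps each row's own value, while transform('first') replaces it with the group's first one.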


Answer