Loop over regular expressions using Pandas str.extract

Question

I want to extract numeric values from arbitrary strings in a column of my pandas DataFrame. Two regexes should be looped over the column “watt” using str.extract. The str.extract function should be applied to all NaN values. On the next iteration, non-NaN values (= matches) should be excluded fro…

Accepted Answer

I think you have two options within your framework. In both, you should apply the NaN mask to the column you’re searching in as well as to the column you’re writing to.

Since .str.extract() returns a DataFrame by default (expand=True), the write needs a bit of tuning (using .values):

    import numpy as np

    regexes = [r'([0-9.,]+)[\s-]?watt[s]? ', r'([0-9.,]+)[\s-]?w ']
    df['watt'] = np.nan
    for regex in regexes:
        # Only rows that have no match yet are searched and written
        mask = df['watt'].isna()
        df.loc[mask, 'watt'] = df.loc[mask, 'title'].str.extract(regex).values

Or you could use named groups in the regex, such that the group name matches the label of the column you’re writing to:

    regexes = [r'(?P<watt>[0-9.,]+)[\s-]?watt[s]? ', r'(?P<watt>[0-9.,]+)[\s-]?w ']
    df['watt'] = np.nan
    for regex in regexes:
        mask = df['watt'].isna()
        df.loc[mask, 'watt'] = df.loc[mask, 'title'].str.extract(regex)

Both produce the following result:

                                                  title watt
    0  This bulb operates at 222 watts and is fabulous.  222
    1     This bulb operates at 999 w and is fantastic.  999
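For reference, here is a self-contained sketch of the same masking loop that can be run as-is. It is a variation, not the exact code from the answer. It passes expand=False so that str.extract returns a Series that aligns on the index, and it initialises the target column with object dtype so that writing strings into it does not trigger dtype warnings on recent pandas versions. The third row (a title with no wattage) is an assumption added only to show what happens when no regex matches.

```python
import pandas as pd

# Sample data: the first two titles come from the result shown in the answer;
# the third is a hypothetical row with no match, added for illustration.
df = pd.DataFrame({
    "title": [
        "This bulb operates at 222 watts and is fabulous.",
        "This bulb operates at 999 w and is fantastic.",
        "No power rating here.",
    ]
})

# Regexes are tried in order; earlier ones take priority.
regexes = [r"([0-9.,]+)[\s-]?watt[s]? ", r"([0-9.,]+)[\s-]?w "]

# All-NaN target column with object dtype, so string matches can be stored.
df["watt"] = pd.Series(index=df.index, dtype="object")

for regex in regexes:
    # Rows that still have no match after the previous regexes.
    mask = df["watt"].isna()
    # expand=False returns a Series for a single capture group;
    # the assignment aligns it with the masked rows by index.
    df.loc[mask, "watt"] = df.loc[mask, "title"].str.extract(regex, expand=False)

print(df)
```

The extracted values are strings. If numbers are needed, something like pd.to_numeric could be applied afterwards, keeping in mind that the character class also admits "." and ",", so values with thousands or decimal separators may need normalising first.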


Answer