I want to extract numeric values from arbitrary strings in a column in my pandas dataframe.
Two regexes that shall be looped over the column “watt” using str.extract.
The str.extract function shall be applied to all NaN values.
On the next iteration, non NaN values (=matches) shall be excluded from the str.extract operation so that previous results are retained and not overwritten.
I must be totally misunderstanding something here, because my implementation is not working.
Although I am using .isnan() to filter out previous matches, it overwrites previous matches.
import pandas as pd df = pd.DataFrame([{'title':'This bulb operates at 222 watts and is fabulous.'}, {'title':'This bulb operates at 999 w and is fantastic.'}]) regexes = ['([0-9.,]{1,})[s-]{0,1}watt[s]{0,1} ', '([0-9.,]{1,})[s-]{0,1}w '] for regex in regexes: #create column with nan values on first iteration if 'watt' not in df.columns: df['watt'] = np.nan #select only rows from "watt" column with nan values -> run str.extract() -> update df "watt" column, repeat... print(df[df['watt'].isnull()]['title'].str.extract(regex)) #debug df['watt'] = df[df['watt'].isnull()]['title'].str.extract(regex) print(df)
Advertisement
Answer
I think you have 2 options within your framework: In both you should mask the NaN
s, in the column you’re searching in as well as the column you’re writing to.
Since .str.extract()
returns a series with expand=False
(default), the writing needs a bit of tuning (using .values
):
regexes = [r'([0-9.,]+)[s-]?watt[s]? ', r'([0-9.,]+)[s-]?w '] df['watt'] = np.nan for regex in regexes: mask = df['watt'].isna() df.loc[mask, 'watt'] = df.loc[mask, 'title'].str.extract(regex).values
Or you could use named groups in the regex such that the group name matches the label of the column you’re writing to:
regexes = [r'(?P<watt>[0-9.,]+)[s-]?watt[s]? ', r'(?P<watt>[0-9.,]+)[s-]?w '] df['watt'] = np.nan for regex in regexes: mask = df['watt'].isna() df.loc[mask, 'watt'] = df.loc[mask, 'title'].str.extract(regex)
Both produce the following result:
title watt 0 This bulb operates at 222 watts and is fabulous. 222 1 This bulb operates at 999 w and is fantastic. 999