
Loop over regular expressions using Pandas str.extract

I want to extract numeric values from arbitrary strings in a column in my pandas dataframe.

Two regexes should be applied in a loop to the column “watt” using str.extract.

On each iteration, str.extract should only be applied to rows whose value is still NaN.

Non-NaN values (i.e. earlier matches) should be excluded from the str.extract operation on the next iteration, so that previous results are retained and not overwritten.

I must be totally misunderstanding something here, because my implementation is not working.

Although I am using .isnan() to filter out previous matches, they still get overwritten.
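The question's code is not reproduced here, so the following is only a hypothetical reconstruction of how this symptom can arise: the NaN mask is applied to the rows being searched, but the result is assigned to the whole column. The column name `description`, the sample strings, and both patterns are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "watt" column name comes from the question.
df = pd.DataFrame({
    "description": ["LED bulb 9W", "lamp 40 watt", "no rating", "heater 1500 W"],
    "watt": np.nan,
})

# Assumed patterns: an upper-case "W" unit first, then the word "watt".
regexes = [r"(\d+)\s*W\b", r"(\d+)\s*watt"]

for regex in regexes:
    mask = df["watt"].isna()
    # Only the masked rows are searched, but the assignment targets the whole
    # column, so every row outside the mask is realigned to NaN.
    found = df.loc[mask, "description"].str.extract(regex, expand=False)
    df["watt"] = found.astype(float)
```

In this sketch the second pass only searches the rows that were still NaN, yet the matches from the first pattern are lost, because assigning a shorter Series to a whole column fills every missing index label with NaN.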


Answer

I think you have two options within your framework. In both, you should apply the NaN mask to the column you’re searching in as well as to the column you’re writing to.

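By default (expand=True), .str.extract() returns a DataFrame whose columns are named after the capture groups (0 for a single unnamed group), so the result does not align with the column you’re writing to. The assignment therefore needs a small adjustment (using .values) to bypass label alignment.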
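A minimal sketch of the first option, assuming a hypothetical source column `description` and two illustrative patterns (neither comes from the question). It takes the single capture-group column and converts it with `.to_numpy()`, the current spelling of `.values`, so that the write is positional rather than label-based.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "watt" column name comes from the question.
df = pd.DataFrame({
    "description": ["LED bulb 9W", "lamp 40 watt", "no rating", "heater 1500 W"],
    "watt": np.nan,
})

# Assumed patterns: an upper-case "W" unit first, then the word "watt".
regexes = [r"(\d+)\s*W\b", r"(\d+)\s*watt"]

for regex in regexes:
    # Recompute the mask each pass so earlier matches are left untouched.
    mask = df["watt"].isna()
    # extract() returns a DataFrame with one column (labelled 0) per group.
    extracted = df.loc[mask, "description"].str.extract(regex)
    # Mask both sides: read only the NaN rows and write only to the NaN rows.
    df.loc[mask, "watt"] = extracted[0].astype(float).to_numpy()
```

Because the mask is applied on the left-hand side as well, rows matched by the first pattern are neither searched nor written on the second pass.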

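Or you could use named groups in the regex such that the group name matches the label of the column you’re writing to. The extracted DataFrame then aligns with the target on both index and column label, and no conversion to a plain array is needed.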
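A sketch of the named-group variant under the same assumptions (hypothetical `description` column and illustrative patterns). Each pattern names its group `watt`, so the DataFrame returned by `.str.extract()` carries a `watt` column and can be assigned directly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the "watt" column name comes from the question.
df = pd.DataFrame({
    "description": ["LED bulb 9W", "lamp 40 watt", "no rating", "heater 1500 W"],
    "watt": np.nan,
})

# The group name matches the label of the column being written to.
regexes = [r"(?P<watt>\d+)\s*W\b", r"(?P<watt>\d+)\s*watt"]

for regex in regexes:
    # Recompute the mask each pass so earlier matches are left untouched.
    mask = df["watt"].isna()
    # The result is a DataFrame with a single column named "watt".
    extracted = df.loc[mask, "description"].str.extract(regex).astype(float)
    # Assignment aligns on the row index and on the "watt" column label.
    df.loc[mask, ["watt"]] = extracted
```

Selecting the target with a list of labels (`["watt"]`) keeps the left-hand side two-dimensional, which is what lets pandas align the extracted DataFrame by column name.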

Both options produce the same result.