
Python Pandas Extract word from column that contains String with Regex

I have this data frame (columns are strings):

JavaScript

I need to get the 'ORF' values for rows whose 'ORFDesc' contains a word that includes “hydro” and is exactly 13 characters long. To clarify, it is the word that must be 13 characters long, not the whole description.

I’m using

JavaScript

This matches the rows that contain “hydro”, but I also need to reject the rows where the matching word's length is not 13.

I would like to use a regex so I can make a new Column ‘word’ like:

JavaScript

And then be able to discard rows by using length in ‘word’ column.

What pattern should I use?

EDIT:

I have tried this, but it still doesn't work:

JavaScript


Answer

You can use

JavaScript

See the regex demo

Details

  • \b – a word boundary
  • (?=\w{13}\b) – a positive lookahead that requires exactly 13 word chars immediately to the right of the current location, followed by a word boundary
  • \w*hydro – zero or more word chars and then hydro.
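
As a quick check of the pieces listed above, here is a minimal sketch using only Python's re module. The pattern is assembled from the three components in the list, so treat it as a reconstruction rather than the exact expression from the answer; the function name and the sample strings are made up for illustration.

```python
import re

# Pattern assembled from the three components described above (an assumption):
# word boundary, lookahead for exactly 13 word chars, then any word chars + "hydro".
PATTERN = re.compile(r"\b(?=\w{13}\b)\w*hydro")


def has_13_char_hydro_word(text: str) -> bool:
    """Return True if text contains a 13-character word that includes 'hydro'."""
    return PATTERN.search(text) is not None


# Hypothetical descriptions, chosen so the word lengths differ.
samples = [
    "alcohol dehydrogenase",   # "dehydrogenase" is 13 characters long
    "hydrolase activity",      # "hydrolase" is 9 characters long
    "hydrophobicity index",    # "hydrophobicity" is 14 characters long
]

results = [has_13_char_hydro_word(s) for s in samples]
print(results)
```

Only the first sample matches: the lookahead fails for words shorter than 13 characters, and for longer words the trailing \b inside the lookahead cannot match in the middle of the word.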

Python code:

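
The original snippet is not available here, so the following is only a sketch of how the pattern could be applied in pandas. It assumes the 'ORF' and 'ORFDesc' columns from the question; the sample rows, the ORF identifiers, and the extra capturing group used to fill a 'word' column are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "ORF": ["b0001", "b0002", "b0003"],
    "ORFDesc": [
        "alcohol dehydrogenase",            # 13-character word with "hydro"
        "putative hydrolase",               # 9-character word with "hydro"
        "hydrophobicity-related protein",   # 14-character word with "hydro"
    ],
})

# Pattern reconstructed from the explanation above (an assumption).
pattern = r"\b(?=\w{13}\b)\w*hydro"

# Keep rows whose description contains a 13-character word including "hydro".
mask = df["ORFDesc"].str.contains(pattern, regex=True, na=False)
orfs = df.loc[mask, "ORF"].tolist()

# Optionally capture the whole matching word in a new 'word' column.
# The trailing \w* inside the group extends the capture to the end of the word.
df["word"] = df["ORFDesc"].str.extract(
    r"\b(?=\w{13}\b)(\w*hydro\w*)", expand=False
)

print(orfs)
print(df[["ORF", "word"]])
```

Because the length check is done inside the regex, no separate filtering step on the 'word' column is needed; rows without a matching word simply get NaN there.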