
regex for finding gene product from the text

What regex should I use to match text like this?

/product="hypothetical protein"

So far I have tried this pattern:

x = re.match(r"^s*\=product(.*)",line)


Answer

Use re.search with a capture group:

import re
test_str = ' /product="hypothetical protein"'
match = re.search(r'product="([^"]+)"', test_str)
if match:
    print(match.group(1))

See regex proof.
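
For comparison, here is a small sketch of why the pattern from the question returns nothing on the same line. The variable names (line, attempt, fixed) are only illustrative. re.match anchors at the start of the string, s* matches literal "s" characters rather than whitespace (that would be \s*), and \=product expects the "=" before "product" instead of after it.

```python
import re

line = ' /product="hypothetical protein"'

# Pattern from the question: anchored at the start, "s*" is a literal "s"
# repeated zero or more times, and "=" is expected before "product".
attempt = re.match(r"^s*\=product(.*)", line)
print(attempt)

# re.search scans the whole string, so the leading space and "/" do not matter.
fixed = re.search(r'product="([^"]+)"', line)
if fixed:
    print(fixed.group(1))
```

The first print shows None, and the second prints the text between the quotes.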

EXPLANATION

--------------------------------------------------------------------------------
  product="                'product="'
--------------------------------------------------------------------------------
  (                        group and capture to 1:
--------------------------------------------------------------------------------
    [^"]+                    any character except: '"' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of 1
--------------------------------------------------------------------------------
  "                        '"'
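
If the text contains several such lines, the same pattern can probably be reused with re.findall. The following is a minimal sketch, assuming the input looks like the line in the question; the sample text, the PRODUCT_RE name, and the find_products helper are hypothetical and only serve the illustration.

```python
import re

# Hypothetical sample text in the style of the line from the question.
sample = '''\
     CDS             1..300
                     /product="hypothetical protein"
     CDS             400..900
                     /product="DNA polymerase"
'''

# Same pattern as in the answer, with the leading "/" included and compiled once.
PRODUCT_RE = re.compile(r'/product="([^"]+)"')

def find_products(text):
    """Return every quoted value that follows /product= in the text."""
    return PRODUCT_RE.findall(text)

print(find_products(sample))
```

With one capture group, findall returns a list of the captured strings. Note that [^"]+ also matches newlines, so a value wrapped over several lines would be captured with its line breaks and indentation included and may need cleaning up afterwards.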
User contributions licensed under: CC BY-SA