Extract names from a sentence with regex

I’m very new to regex syntax, although I have already read a bit about the library. I’m trying to extract names from a simple sentence, but I ran into trouble. Below is an example of what I’ve done.

x = 'Fred used to play with his brother, Billy, both are 10 and their parents Jude and Edde have two more kids.'

import re

re.findall('^[A-Za-z ]+$',x)

Can anyone explain what is wrong and how to proceed?

Answer

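I think your regex has two problems.

  • You want to extract names from within the sentence, so you need to remove the ^ (start of line) and $ (end of line) anchors. With them, the pattern only matches if the entire string consists of letters and spaces, which fails here because of the commas, the digits and the final period.
  • A name starts with an uppercase letter and does not contain spaces, so you should remove the space from the character class and require an uppercase first letter.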
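To make the two points above concrete, here is a minimal sketch that runs the pattern from the question, and then the same character class without the anchors, on the example sentence. The variable names `anchored` and `unanchored` are only illustrative.

```python
import re

x = 'Fred used to play with his brother, Billy, both are 10 and their parents Jude and Edde have two more kids.'

# Pattern from the question: ^ and $ force the WHOLE string to consist of
# letters and spaces. The commas, digits and period make that impossible.
anchored = re.findall(r'^[A-Za-z ]+$', x)
print(anchored)  # []

# Same character class without the anchors: it now matches, but it returns
# long runs of letters and spaces rather than individual names, because the
# space is still allowed and no uppercase first letter is required.
unanchored = re.findall(r'[A-Za-z ]+', x)
print(unanchored)
```

Removing the anchors alone is therefore not enough; the space also has to go and the first letter has to be restricted to uppercase.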

You could use the following regex.

\b[A-Z][A-Za-z]+\b

I also tested it in Python.

x = 'Fred used to play with his brother, Billy, both are 10 and their parents Jude and Edde have two more kids.'

import re

# Raw string (r'...') so that \b reaches the regex engine as a word boundary
result = re.findall(r'\b[A-Z][A-Za-z]+\b', x)
print(result)

Result:

['Fred', 'Billy', 'Jude', 'Edde']
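
One detail worth checking when you reproduce this: in Python the pattern should be written as a raw string. In a normal string literal, '\b' is the backspace character rather than the regex word boundary. A small sketch of the difference, with the illustrative names `names` and `not_raw`:

```python
import re

x = 'Fred used to play with his brother, Billy, both are 10 and their parents Jude and Edde have two more kids.'

# Raw string: \b is passed to the regex engine as a word boundary.
names = re.findall(r'\b[A-Z][A-Za-z]+\b', x)
print(names)  # ['Fred', 'Billy', 'Jude', 'Edde']

# Normal string: '\b' becomes a backspace character (\x08), so the pattern
# looks for a literal backspace before and after the word and finds nothing.
not_raw = re.findall('\b[A-Z][A-Za-z]+\b', x)
print(not_raw)  # []
```

Keep in mind that this pattern matches any capitalized word of two or more letters, not only names. It works for this sentence because the only capitalized words are names, but a sentence starting with "The" would return 'The' as well.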