
Python regex: get words that are between given word and the closest character

I have a dataset that looks like this:

ID Details
1 he wants to invest, Project: Emaar, budget []
2 she is interested in renting, Project: W Residence, bedrooms=2
3 wants to sell, Project: Dubai View; callback

I need to extract the project name, which sits between the word ‘Project:’ and the closest following character (e.g. , | ;).

So that in the result it looks like this:

ID Details
1 Emaar
2 W Residence
3 Dubai View


Answer

If a comma or semicolon always follows the project name, and your project names contain only letters and spaces, then you could use this regex:

Project: ([A-Za-z ]+)[;,]
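
As a rough sketch of how this pattern could be applied in Python, the snippet below uses the standard `re` module on the three sample rows from the question. The names `PROJECT_RE`, `rows`, and `extract_project` are made up for illustration, and the data is held in a plain dict because the question does not say how the dataset is stored.

```python
import re

# Pattern from the answer: capture letters and spaces after "Project: ",
# stopping at the first comma or semicolon.
PROJECT_RE = re.compile(r"Project: ([A-Za-z ]+)[;,]")

# Sample rows copied from the question (hypothetical dict layout: ID -> Details).
rows = {
    1: "he wants to invest, Project: Emaar, budget []",
    2: "she is interested in renting, Project: W Residence, bedrooms=2",
    3: "wants to sell, Project: Dubai View; callback",
}


def extract_project(details):
    """Return the project name, or None when the pattern does not match."""
    match = PROJECT_RE.search(details)
    return match.group(1) if match else None


projects = {row_id: extract_project(text) for row_id, text in rows.items()}
print(projects)  # {1: 'Emaar', 2: 'W Residence', 3: 'Dubai View'}
```

If the data lives in a pandas DataFrame with a `Details` column (an assumption, since the question does not say), `df["Details"].str.extract(r"Project: ([A-Za-z ]+)[;,]")` should give the same captured group per row. Note that a project name containing digits or punctuation, or one that ends the string with no trailing comma or semicolon, would not match this pattern.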


User contributions licensed under: CC BY-SA