How to target multiple strings with a single regex pattern

I have multiple strings such as

POST /incentivize HTTP/1.1
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
DELETE /virtual/solutions/target/web+services HTTP/2.0
PATCH /interactive/architect/innovative/24%2f7 HTTP/1.1

I want to target all these strings with regex.

I tried the following pattern

pattern = r"([A-Z]* /([A-Za-z0-9])\D+ [A-Z]*/\d.\d)"

Here is the full code

import re

string = """
POST /incentivize HTTP/1.1
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
DELETE /virtual/solutions/target/web+services HTTP/2.0
PATCH /interactive/architect/innovative/24%2f7 HTTP/1.1
"""

pattern = r"(?P<url>[A-Z]* /([A-Za-z0-9])\D+ [A-Z]*/\d.\d)"

result = [item.groupdict() for item in re.finditer(pattern, string)]

result

This outputs the following

[{'url': 'POST /incentivize HTTP/1.1'},
 {'url': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'},
 {'url': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'}]

With this pattern, I am able to target the first three strings. But for the life of me, I am not able to figure out how to target the last string. This is just a sample of many more strings in the list. I need to make this dynamic so that the program is able to capture strings that are similar to this.

I am a rookie in Python and have just started learning regex.

Any help will be appreciated.

Answer

I would use re.findall here with the following regex pattern:

\b(?:POST|GET|PUT|PATCH|DELETE)\b /[^/\s]+(?:/[^/\s]+)* HTTP/\d+(?:\.\d+)?

Script:

import re

string = """
POST /incentivize HTTP/1.1
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
DELETE /virtual/solutions/target/web+services HTTP/2.0
PATCH /interactive/architect/innovative/24%2f7 HTTP/1.1
"""
matches = re.findall(r'\b(?:POST|GET|PUT|PATCH|DELETE)\b /[^/\s]+(?:/[^/\s]+)* HTTP/\d+(?:\.\d+)?', string)
print(matches)

This prints:

['POST /incentivize HTTP/1.1',
 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1',
 'DELETE /virtual/solutions/target/web+services HTTP/2.0',
 'PATCH /interactive/architect/innovative/24%2f7 HTTP/1.1']

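The pattern first matches one of several HTTP methods in an alternation, to which you can add more methods if necessary. It then matches a path made of one or more slash-separated segments, where each segment, [^/\s]+, is any run of characters other than a slash or whitespace, so digits and percent-escapes such as 24%2f7 are allowed. Finally, it matches HTTP/ followed by a version number with an optional minor part. Your original pattern misses the last string because \D+ matches only non-digit characters, and that path contains digits.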
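Since the question builds a list of dictionaries with groupdict(), here is a minimal sketch of the same pattern split into named groups and used with re.finditer. The group names (method, path, version) and the variable names are my own illustrative choices, not something from the question, so rename them as needed.

```python
import re

log = """
POST /incentivize HTTP/1.1
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
DELETE /virtual/solutions/target/web+services HTTP/2.0
PATCH /interactive/architect/innovative/24%2f7 HTTP/1.1
"""

# Same structure as the pattern above, split into three named groups.
# The group names are hypothetical and chosen only for illustration.
request_line = re.compile(
    r"\b(?P<method>POST|GET|PUT|PATCH|DELETE)\b "   # HTTP method
    r"(?P<path>/[^/\s]+(?:/[^/\s]+)*) "             # one or more path segments
    r"HTTP/(?P<version>\d+(?:\.\d+)?)"              # version, minor part optional
)

# One dictionary per matched request line.
result = [m.groupdict() for m in request_line.finditer(log)]

for row in result:
    print(row)
```

Each dictionary then holds the method, path, and version separately, which may be easier to work with than one combined string if you later need to filter or group the requests.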



Source: stackoverflow