I want to find out how I can extract only valid IP addresses from a very long string. The problem is that my code extracts an IP address even if one of its parts has more than 3 digits (which is incorrect).
I tried to learn more about Python regex, but I don't know exactly how to stop it at a maximum of 3 consecutive digits after a dot.
What I mean is that if an IP is 1.2.3.4, it finds it, which is correct, but if an IP is 1.2.3.4567, it also finds it, which is not correct. I don't know how to tell it that if a group has more than 3 digits, then that's not an IP address.
import re

secv = "akmfiawnmgisa gisamgisamgsagr[sao l321r1m r2p4 2342po4k2m4 22.33.4.aer 1.2.3.5344 99.99.99.100 asoifinagf sadgsangidsng sg"
b = re.findall(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.\d{1,3}", secv)
print(b)
It prints 1.2.3.534 (the start of 1.2.3.5344) and also 99.99.99.100, but 1.2.3.5344 is not an IP address because it has more than 3 consecutive digits.
Answer
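import re

secv = "90.123.1.100 akmfiawnmgisa gisamgisamgsagr[sao l321r1m r2p4 2342po4k2m4 22.33.4.aer 1.2.3.5344 99.99.99.100 asoifinagf sadgsangidsng sg 13.18.19.100 1.2.3.4"
# Capture a dotted quad that is preceded by whitespace or the start of the
# string and followed by whitespace or the end of the string.
b = re.findall(r"(?:\s|\A)(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?=\s|\Z)", secv)
# Keep only candidates whose four parts are all at most 255.
b = list(filter(lambda x: all([int(y) <= 255 for y in x.split('.')]), b))
print(b)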
To make it more interesting, I added IP addresses at the beginning and end of your string. I am assuming that an IP address must be separated by whitespace on both sides unless it sits at the beginning or end of the string. So I added a non-capturing group (?:\s|\A) at the beginning of the regex, which matches either a whitespace character or the beginning of the string. I also added a lookahead assertion (?=\s|\Z) at the end of the regex, which matches a single whitespace character or the end of the string without consuming any characters. Because the last group can no longer stop in the middle of a run of digits, 1.2.3.5344 is rejected. The filter step then drops any candidate with a part greater than 255. The above prints out:
['90.123.1.100', '99.99.99.100', '13.18.19.100', '1.2.3.4']
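If it helps, here is a minimal sketch of a variant of the same idea, not the code from the answer itself. It assumes the same whitespace-delimited rule, but uses a negative lookbehind (?<!\S) and a negative lookahead (?!\S) so that nothing outside the address is consumed and no capturing group is needed. The names IPV4_CANDIDATE and find_ipv4 and the sample string are made up for illustration.

```python
import re

# Hypothetical names for illustration. The lookarounds require that the
# candidate is not touching any non-whitespace character on either side,
# which also covers the start and end of the string.
IPV4_CANDIDATE = re.compile(r"(?<!\S)\d{1,3}(?:\.\d{1,3}){3}(?!\S)")


def find_ipv4(text):
    """Return whitespace-delimited dotted quads whose parts are all <= 255."""
    candidates = IPV4_CANDIDATE.findall(text)
    # The regex only limits each part to 1-3 digits, so values such as 300
    # still have to be filtered out numerically.
    return [c for c in candidates
            if all(int(part) <= 255 for part in c.split("."))]


sample = "90.123.1.100 22.33.4.aer 1.2.3.5344 99.99.99.100 300.1.1.1 1.2.3.4"
print(find_ipv4(sample))
```

With this sample, 1.2.3.5344 is rejected by the lookahead and 300.1.1.1 is rejected by the range check. Whether an address followed directly by punctuation (for example "1.2.3.4,") should count is a design choice; under the whitespace assumption used here it does not.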