Python regular expression help needed, multiple lines regex

Question

I was trying to scrape a link out of a .eml file, but somehow I always get "None" as the return value of my search. I don't even get the link with the confirm brackets around it; getting the valid link out once that string is pulled would be no problem. One problem that I see is that the string that is found by the

Accepted Answer

First thing, the .eml is encoded in MIME quoted-printable (the hint is the = signs at the end of the lines). You should decode it first instead of dealing with the encoded raw text.

Second, regex is overkill here. Some well-placed str.split() calls will work just as well. Regex is extremely useful in its proper use cases, but some simple Python can usually do the same without regex's flavor of magic, which can be confusing as [REDACTED].

Note that if you're building a regex, it's always advisable to use one of the gazillion regex editors, as they will help you build and test your pattern… My personal favorite is regex101.

EDIT: added the regex way to do it.

    import quopri
    import re

    # raw_email: the raw contents of the .eml file (loaded elsewhere)

    def get_url_by_regex(raw):
        # Undo the quoted-printable encoding (=3D, soft line breaks) first
        decoded = quopri.decodestring(raw).decode("utf-8")
        # Non-greedy match: group(2) is the first href value in the message
        match = re.search(r'(<a href=")(.*?)(")', decoded)
        return match.group(2) if match else None

    def get_url(raw):
        decoded = quopri.decodestring(raw).decode("utf-8")
        for line in decoded.split('\n'):
            # Pick the line that carries the token parameter
            if 'token=' in line:
                return line.split('<a href="')[1].split('"')[0]
        return None  # just in case this is needed

    print(get_url(raw_email))
    print(get_url_by_regex(raw_email))

The result is:

    https://app.rule.io/subscriber/optIn?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzd[REST_OF_TOKEN_REDACTED]
    https://app.rule.io/subscriber/optIn?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzd[REST_OF_TOKEN_REDACTED]
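
To see why searching the raw file keeps returning None, here is a minimal sketch using a made-up quoted-printable fragment (the example.com URL and the token value are placeholders, not taken from the real email). In the encoded text the = after href is written as =3D, so the literal <a href=" never appears, and a trailing = is a soft line break that splits one long HTML line in two. Decoding with quopri fixes both before the regex runs.

```python
import quopri
import re

# Hypothetical quoted-printable fragment: "=3D" encodes "=", and a trailing
# "=" is a soft line break that splits one long HTML line in two.
raw = (
    b'<p>Please confirm:</p>\n'
    b'<a href=3D"https://example.com/optIn?token=3Dabc=\n'
    b'def123">Confirm</a>\n'
)

pattern = re.compile(rb'<a href="(.*?)"')

# On the raw bytes nothing matches: the text reads href=3D", not href="
print(pattern.search(raw))  # prints: None

# Decoding restores "=" and joins the soft-broken line back together
decoded = quopri.decodestring(raw)
url = pattern.search(decoded).group(1).decode("utf-8")
print(url)  # prints: https://example.com/optIn?token=abcdef123
```

The same applies to the real .eml: run the search on the decoded text, not on the raw file contents.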

Answer