Generate a Python regex at runtime to match numbers from ‘n’ to infinity

Question

I am using Scrapy to crawl a website and extract data from it. Scrapy uses regex-based rules to decide whether a page should be parsed or a link should be followed. I am implementing a resume feature for my spider so that it can continue crawling from the last visited page. For this, I get the last followed link

Accepted Answer

Try this:

```python
import re

def digit_match_greater(n):
    digits = str(n)
    variations = []
    # Anything with more than len(digits) digits is a match:
    variations.append(r"\d{%d,}" % (len(digits) + 1))
    # Now match numbers with len(digits) digits.
    # (Generate, e.g., for 15: "1[6-9]", "[2-9]\d")
    # 9s can be skipped; e.g. for > 19 we only need [2-9]\d.
    for i, d in enumerate(digits):
        if d != "9":
            pattern = list(digits)
            pattern[i] = "[%d-9]" % (int(d) + 1)
            for j in range(i + 1, len(digits)):
                pattern[j] = r"\d"
            variations.append("".join(pattern))
    return "(?:%s)" % "|".join("(?:%s)" % v for v in variations)
```

It turned out easier to make it match numbers greater than the parameter, so if you give it 15, it'll return a string for matching numbers 16 and greater, specifically:

```
(?:(?:\d{3,})|(?:[2-9]\d)|(?:1[6-9]))
```

You can then substitute this into your expression instead of `\d+`, like so:

```python
exp = re.compile(r"page%s\.html" % digit_match_greater(last_page_visited))
```
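The function in the answer matches numbers strictly greater than its argument, whereas the title asks for "n to infinity". Below is a minimal sketch of an inclusive variant; the name `digit_match_at_least` is hypothetical (it is not part of Scrapy or of the answer above), and it assumes page numbers are written without leading zeros. It uses the same per-digit idea, adds `n` itself as a literal alternative, and starts the "more digits" branch with `[1-9]` so that zero-padded strings such as `007` are not accepted.

```python
import re


def digit_match_at_least(n: int) -> str:
    """Return an (unanchored) regex fragment matching decimal integers >= n.

    Sketch only: assumes non-negative n and numbers without leading zeros.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = str(n)
    # n itself is included, which turns "> n" into ">= n".
    variations = [digits]
    # Any number with more digits than n, with no leading zero.
    variations.append(r"[1-9]\d{%d,}" % len(digits))
    # Same length as n: keep the prefix, raise one digit, leave the rest free.
    # A 9 cannot be raised, so that position is skipped.
    for i, d in enumerate(digits):
        if d != "9":
            head = digits[:i]
            tail = r"\d" * (len(digits) - i - 1)
            variations.append("%s[%d-9]%s" % (head, int(d) + 1, tail))
    return "(?:%s)" % "|".join(variations)
```

For example, `digit_match_at_least(15)` returns `(?:15|[1-9]\d{2,}|[2-9]\d|1[6-9])`, and `re.compile(r"page%s\.html" % digit_match_at_least(15))` fully matches `page15.html` and `page120.html` but not `page14.html` or `page015.html`. The fragment itself is unanchored, so test bare numbers with `re.fullmatch`; inside a URL pattern, the surrounding `page` and `\.html` do the anchoring.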


Answer