Extract IBAN from text with Python

Question

I want to extract IBAN numbers from text with Python. The challenge is that an IBAN can be written in many ways, with spaces between the numbers, so I find it difficult to translate this into a useful regex pattern. I have written a demo version which tries to match all German and Austrian IBAN numbers from

Accepted Answer

In general, to match German and Austrian IBAN codes, you can use

    codes = re.findall(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])', text)

Details:

\b - word boundary
(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18}) - Group 1: DE followed by 20 digits, or AT followed by 18 digits, where each digit may be preceded by any amount of whitespace
\b(?!\s*[0-9]) - a word boundary that is NOT immediately followed by zero or more whitespace characters and an ASCII digit

For the data you showed in the question, which includes improper IBAN codes, you can use

    \b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b

Details:

\b - word boundary
(?:DE|AT) - DE or AT
(?:\s?[0-9a-zA-Z]){18} - eighteen occurrences of an optional whitespace character followed by an alphanumeric character
(?:(?:\s?[0-9a-zA-Z]){2})? - an optional occurrence of two sequences of an optional whitespace character followed by an alphanumeric character
\b - word boundary
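As a minimal sketch of how the two patterns above might be used together, the snippet below compiles both and strips the internal whitespace from each match. The names STRICT_IBAN, LENIENT_IBAN and extract_ibans are hypothetical helpers introduced here for illustration, and the sample strings are made up for the demonstration; none of them come from the original question.

```python
import re

# Strict pattern from the answer: DE + 20 digits or AT + 18 digits,
# allowing any amount of whitespace before each digit.
STRICT_IBAN = re.compile(
    r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])'
)

# Lenient pattern from the answer: DE or AT followed by 18 or 20
# alphanumeric characters, each optionally preceded by one whitespace.
LENIENT_IBAN = re.compile(
    r'\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b'
)


def extract_ibans(text, pattern=STRICT_IBAN):
    """Return every match of `pattern` in `text` with whitespace removed.

    Hypothetical helper: it only normalizes the matched text and does
    not verify the IBAN checksum.
    """
    return [re.sub(r'\s+', '', m.group(0)) for m in pattern.finditer(text)]


# Illustrative input: one spaced German IBAN and one compact Austrian IBAN.
sample = (
    "Please pay to DE89 3704 0044 0532 0130 00 or to "
    "AT611904300234573201 by Friday."
)
print(extract_ibans(sample))
# ['DE89370400440532013000', 'AT611904300234573201']

# Illustrative input containing letters, which only the lenient pattern accepts.
lenient_sample = "Ref: DE12 3456 78AB 9012 3456 78."
print(extract_ibans(lenient_sample, LENIENT_IBAN))
# ['DE12345678AB9012345678']
```

Because the lenient pattern accepts letters, it can absorb a short word that directly follows an 18-character body, so its results probably need a separate validation step before being trusted.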

Answer