
finditer with re.DOTALL starts analysis from span=(16,17). Why?

I’m trying to split a text file into sections with a findall-style operation. I need backreferences, so I opted for finditer. Since I’m processing a text file with multiple lines, I need re.DOTALL. It works fine as long as the match doesn’t start within the first 16 characters. An (over)simplified example of the problem:

import re
r=re.compile(r'[0-9]')
[print(i) for i in r.finditer('01234567890123456789',re.DOTALL)]

The output is:

<re.Match object; span=(16, 17), match='6'>
<re.Match object; span=(17, 18), match='7'>
<re.Match object; span=(18, 19), match='8'>
<re.Match object; span=(19, 20), match='9'>
[None, None, None, None]

I expect 20 matches, not 4. I guess I could achieve my objective with re.MULTILINE, but I’d like to preserve my faith in Python’s re functions and/or my understanding of them. Please advise.

KonradP


Answer

This happens to the best of us, so promise not to bang your head against the wall.

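re.DOTALL is the right flag, but you passed it to the wrong function. A compiled pattern’s finditer() does not take flags; its optional second argument is pos, the index at which the search starts. Put the flag in the compile call instead: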

r=re.compile(r'[0-9]', re.DOTALL)
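A minimal sketch of the corrected call, using the same test string as the question, plus the module-level re.finditer(), which (unlike the compiled pattern’s method) does accept a flags argument. The variable names are just for illustration:

```python
import re

text = '01234567890123456789'

# Flags belong to re.compile(); the compiled pattern's finditer() only
# takes (string, pos, endpos).
r = re.compile(r'[0-9]', re.DOTALL)
matches = list(r.finditer(text))
print(len(matches))  # 20

# Equivalent: the module-level re.finditer() accepts a flags keyword.
same = list(re.finditer(r'[0-9]', text, flags=re.DOTALL))
print([m.span() for m in same] == [m.span() for m in matches])  # True
```

Either form returns all 20 single-digit matches, starting at span (0, 1).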

Fun fact: re.DOTALL has the integer value 16. Wanna guess why the first 16 characters of the input were skipped?
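Spoiler, as a small sketch: because re.DOTALL is an integer flag, passing it as the second positional argument is silently accepted as pos=16, so the scan begins at index 16. This just reproduces the question’s output with an explicit pos for comparison:

```python
import re

text = '01234567890123456789'
r = re.compile(r'[0-9]')

print(int(re.DOTALL))  # 16

# Passing re.DOTALL positionally is the same as asking finditer() to start
# scanning at index 16.
buggy = [m.span() for m in r.finditer(text, re.DOTALL)]
explicit = [m.span() for m in r.finditer(text, pos=16)]
print(buggy == explicit)  # True
print(buggy)  # [(16, 17), (17, 18), (18, 19), (19, 20)]
```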

User contributions licensed under: CC BY-SA