In Python, how to check if a string only contains certain characters?

Question

In Python, how can I check whether a string contains only certain characters? I need to check that a string contains only a..z, 0..9, and . (period), and no other character. I could iterate over each character and check that it is a..z, 0..9, or ., but that would be slow. I am not clear on how to do it with a regular expression.
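For reference, the per-character check the question describes takes only a few lines of plain Python. This is a sketch, not the question's code: `ALLOWED`, `only_allowed_chars` and `only_allowed_chars_set` are illustrative names, it assumes "a..z" means ASCII lowercase only, and the set-subset variant is an extra technique that the question does not mention.

```python
import string

# Assumption: "a..z" means ASCII lowercase only (no uppercase, no Unicode letters).
ALLOWED = frozenset(string.ascii_lowercase + string.digits + ".")


def only_allowed_chars(s: str) -> bool:
    """Return True if every character of s is in ALLOWED (True for "")."""
    # all() stops at the first disallowed character it finds.
    return all(ch in ALLOWED for ch in s)


def only_allowed_chars_set(s: str) -> bool:
    """Same check phrased as a subset test on the distinct characters of s."""
    return set(s) <= ALLOWED


if __name__ == "__main__":
    for sample in ["", "az09.", "az09.\n", "az09.#", "az09.X"]:
        print(repr(sample), only_allowed_chars(sample), only_allowed_chars_set(sample))
```

Both functions accept the empty string. Whether either is really slower than a regex depends on string length and Python version, so it is worth measuring rather than assuming.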

Accepted Answer

Final(?) edit

Answer, wrapped up in a function, with an annotated interactive session:

```
>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
...
>>> special_match("")
True
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False
# The above test case is to catch out any attempt to use re.match()
# with a `$` instead of `\Z` -- see point (6) below.
>>> special_match("az09.#")
False
>>> special_match("az09.X")
False
>>>
```

Note: There is a comparison with using re.match() further down in this answer. Further timings show that match() would win with much longer strings; match() seems to have a much larger overhead than search() when the final answer is True. This is puzzling (perhaps it's the cost of returning a MatchObject instead of None) and may warrant further rummaging.

==== Earlier text ====

The [previously] accepted answer could use a few improvements:

(1) Its presentation gives the appearance of being the result of an interactive Python session:

```
reg=re.compile('^[a-z0-9\.]+$')
>>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
True
```

but match() doesn't return True; it returns a match object on success and None on failure.

(2) For use with match(), the ^ at the start of the pattern is redundant (match() is already anchored at the start of the string), and the pattern appears to be slightly slower than the same pattern without the ^.

(3) It should encourage the habit of using a raw string for every re pattern, automatically and without thinking.

(4) The backslash in front of the dot/period is redundant: inside a character class, . already matches only a literal period.

(5) It is slower than the OP's code!
```
prompt>rem OP's version -- NOTE: OP used raw string!

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.43 usec per loop

prompt>rem OP's version w/o backslash

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
1000000 loops, best of 3: 1.44 usec per loop

prompt>rem cleaned-up version of accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
100000 loops, best of 3: 2.07 usec per loop

prompt>rem accepted answer

prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
100000 loops, best of 3: 2.08 usec per loop
```

(6) It can produce the wrong answer!!

```
>>> import re
>>> bool(re.compile('^[a-z0-9.]+$').match('1234\n'))
True # uh-oh
>>> bool(re.compile('^[a-z0-9.]+\Z').match('1234\n'))
False
```

($ matches at the end of the string and also just before a newline at the end of the string; \Z matches only at the very end.)
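Points (1) and (6) can be checked directly in Python 3. The sketch below reuses the answer's search-based `special_match()` and adds two anchored match() patterns under illustrative names (`DOLLAR_ANCHORED`, `Z_ANCHORED`). `re.fullmatch()`, which requires the whole string to match, is a Python 3.4+ alternative that the answer does not mention.

```python
import re

# The answer's approach: search for any character OUTSIDE the allowed class.
_bad_char = re.compile(r"[^a-z0-9.]").search


def special_match(strg: str) -> bool:
    """True if strg contains only a-z, 0-9 and '.' (also True for "")."""
    return not _bad_char(strg)


# Anchored match() variants discussed in points (1), (2) and (6).
DOLLAR_ANCHORED = re.compile(r"[a-z0-9.]+$")  # '$' also matches before a final "\n"
Z_ANCHORED = re.compile(r"[a-z0-9.]+\Z")      # '\Z' matches only at the very end


if __name__ == "__main__":
    # match() returns a Match object (truthy) or None, never True itself.
    print(DOLLAR_ANCHORED.match("abc"))
    print(bool(DOLLAR_ANCHORED.match("1234\n")))  # wrong answer: the newline slips through
    print(bool(Z_ANCHORED.match("1234\n")))
    # Python 3.4+ alternative: fullmatch() must consume the entire string.
    print(bool(re.fullmatch(r"[a-z0-9.]*", "1234\n")))
    print(special_match("1234\n"))
```

A design note: the `+` in the anchored patterns makes them reject the empty string, whereas `special_match("")` and `re.fullmatch(r"[a-z0-9.]*", "")` both accept it, so the choice of `+` or `*` matters if empty input is valid.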

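The answer's timings come from Python 2.6, and its Note says search() and match() trade places as strings get longer. A minimal Python 3 harness for re-running that comparison might look like the sketch below; it prints whatever your machine measures and makes no claim about which approach wins. `compare`, `CHECKS` and the check labels are illustrative names, and the `fullmatch` and set-subset entries are extra techniques that the answer does not use.

```python
import re
import string
import timeit

# Sketch of a Python 3 benchmark harness (not the answer's original commands).
# Absolute numbers depend on the machine and Python version.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + ".")
BAD_CHAR = re.compile(r"[^a-z0-9.]")  # the answer's search() approach
WHOLE = re.compile(r"[a-z0-9.]+\Z")   # the cleaned-up match() approach
FULL = re.compile(r"[a-z0-9.]*")      # fullmatch(), Python 3.4+, not in the answer

# Each check returns a plain bool. The "+" in WHOLE makes it reject "",
# while the other three checks accept the empty string.
CHECKS = {
    "search for bad char": lambda s: not BAD_CHAR.search(s),
    r"match with \Z": lambda s: bool(WHOLE.match(s)),
    "fullmatch": lambda s: FULL.fullmatch(s) is not None,
    "set subset": lambda s: set(s) <= ALLOWED,
}


def compare(text: str, number: int = 10_000) -> dict:
    """Return {check name: total seconds for `number` calls on text}."""
    return {
        name: timeit.timeit(lambda: check(text), number=number)
        for name, check in CHECKS.items()
    }


if __name__ == "__main__":
    short = "jsdlfjdsf12324..3432jsdflsdf"  # test string from the answer's timings
    runs = 200
    for text in (short, short * 1000):
        for name, secs in compare(text, number=runs).items():
            print(f"{len(text):>6} chars  {name:<20} {secs / runs * 1e6:10.2f} usec/call")
```

Running it with both a short and a long string is the cheapest way to see whether the Note's observation about longer strings still holds on a current interpreter.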

Answer