My code is confusing an input file name for a regular expression

My regular expression does not explicitly include a dash in a character range, but my code fails when the input file name is like this:

Rage Against The Machine - 1996 - Bulls On Parade [Maxi-Single]

Here is my code:

def find_cue_files(path):
  found_files = []
  for root, dirs, files in os.walk(path):
    if files:
      fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]')) # this is line 81 in my source file (mentioned in the traceback)
      # do a few other things...
  return found_files

It seems obvious that this part of the filename is the issue: [Maxi-Single]

How do I handle filenames like that so that they are treated as fixed strings, not as part of the regular expression?

(Not my main question, but in case it is related: I am open to trying an alternative way of doing a case-insensitive search. I have looked at several Stack Overflow questions on that topic and so far have not found any solutions that seemed to fit this case.)

Here is my error:

Traceback (most recent call last):
  File "/usr/bin/xonsh", line 33, in <module>
    sys.exit(load_entry_point('xonsh==0.10.0', 'console_scripts', 'xonsh')())
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21336, in main
    _failback_to_other_shells(args, err)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21283, in _failback_to_other_shells
    raise err
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21334, in main
    sys.exit(main_xonsh(args))
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 21388, in main_xonsh
    run_script_with_cache(
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3285, in run_script_with_cache
    run_compiled_code(ccode, glb, loc, mode)
  File "/usr/lib/python3.9/site-packages/xonsh/__amalgam__.py", line 3190, in run_compiled_code
    func(code, glb, loc)
  File "process_audio_files.xsh", line 160, in <module>
    cue_files = find_cue_files(dest_path)
  File "process_audio_files.xsh", line 81, in find_cue_files
    fcue = glob(os.path.join(root, '*.[Cc][Uu][Ee]'))
  File "/usr/lib/python3.9/glob.py", line 22, in glob
    return list(iglob(pathname, recursive=recursive))
  File "/usr/lib/python3.9/glob.py", line 74, in _iglob
    for dirname in dirs:
  File "/usr/lib/python3.9/glob.py", line 75, in _iglob
    for name in glob_in_dir(dirname, basename, dironly):
  File "/usr/lib/python3.9/glob.py", line 86, in _glob1
    return fnmatch.filter(names, pattern)
  File "/usr/lib/python3.9/fnmatch.py", line 58, in filter
    match = _compile_pattern(pat)
  File "/usr/lib/python3.9/fnmatch.py", line 52, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.9/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.9/sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range i-S at position 70

EDIT: I tried using re.escape which is referenced here: https://docs.python.org/3/library/re.html

def find_cue_files(path):
  found_files = []
  for root, dirs, files in os.walk(path):
    if files:
      root2 = re.escape(root)
      fcue = glob(os.path.join(root2, '*.[Cc][Uu][Ee]')) 
      # do a few other things...
  return found_files

It processed the earlier filename, but now it fails on the input filename Aerosmith - Aerosmith (2014) [24-96 HD], producing the same error at the same point in my revised code.

Answer

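glob treats the whole path as a pattern, including root, so square brackets in a directory name are parsed as a character class. Rather than passing a root with such characters through glob, you are better off filtering just the file names and then prepending the root. One possible one-liner: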

fcue = [os.path.join(root, f) for f in files if f.lower().endswith('.cue')]
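If you would rather keep glob, the standard library also has glob.escape, which is meant for exactly this: it escapes the glob special characters in a literal path (re.escape is for regular expressions, not glob patterns). The following is only a sketch under some assumptions: it keeps the function name from the question, and since the question elides what happens with the matches ("do a few other things"), appending them to found_files is my guess, not the original behaviour.

```python
import glob
import os


def find_cue_files(path):
    """Collect .cue files (any letter case) below path."""
    found_files = []
    for root, dirs, files in os.walk(path):
        if files:
            # Escape only the directory part, so characters such as
            # '[' in root are matched literally; the wildcard part of
            # the pattern is left unescaped on purpose.
            pattern = os.path.join(glob.escape(root), '*.[Cc][Uu][Ee]')
            # Assumption: the matches are simply collected here.
            found_files.extend(glob.glob(pattern))
    return found_files
```

Only root is escaped. Escaping the whole pattern would also neutralise the *.[Cc][Uu][Ee] part, and nothing would match.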
User contributions licensed under: CC BY-SA