Regular expression matching of the contents of text files in a directory

I have a directory of text files. I need to set a status for each file based on whether its content matches one, both, or neither of two regex patterns. My plan is:

  1. Walk directory
  2. If the file’s content:
    • does not match either pattern, status = 1
    • matches pattern1 BUT NOT pattern2, status = 2
    • matches pattern2 BUT NOT pattern1, ignore
    • matches pattern1 AND pattern2, status = 3
  3. Print file name and status

My code:

pattern1 = re.compile(r'critical', re.IGNORECASE)
pattern2 = re.compile(r'gouting bile', re.IGNORECASE)

for file in os.listdir('/home/ea/medical'):
    if re.findall(pattern1, file) and re.findall(pattern2, file):
        status = 3
        print(file, "Status: ", status)
    elif re.findall(pattern1, file) and not re.findall(pattern2, file):
        status = 2
        print(file, "Status: ", status)
    else:
        status = 1
        print(file, "Status: ", status)

My issue is that this doesn’t return anything.


Answer

You need to read the files; as written, you’re only checking the patterns against the filenames.

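import os
import re

directory = '/home/ea/medical'
for file in os.listdir(directory):
    # Read the file's contents; pattern1 and pattern2 are the compiled patterns from the question
    with open(os.path.join(directory, file)) as f:
        contents = f.read()
    status = 1  # no match
    if pattern1.search(contents):
        status += 1
    if pattern2.search(contents):
        status += 1
    print(f"{file} Status: {status}")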
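
One thing to watch: the counter above also gives status 2 to a file that matches only pattern2, whereas the plan in the question says to ignore that case. Here is a minimal sketch that spells out the four cases explicitly. The helper names classify and scan are made up for illustration, and both the use of os.walk (to descend into subdirectories) and errors='replace' (to tolerate odd bytes) are assumptions about what is wanted, not something the question states.

```python
import os
import re

pattern1 = re.compile(r'critical', re.IGNORECASE)
pattern2 = re.compile(r'gouting bile', re.IGNORECASE)


def classify(contents):
    """Return 1, 2 or 3 following the plan in the question, or None for 'ignore'."""
    has1 = pattern1.search(contents) is not None
    has2 = pattern2.search(contents) is not None
    if has1 and has2:
        return 3
    if has1:
        return 2
    if has2:
        return None  # pattern2 only: the plan says to ignore the file
    return 1


def scan(directory):
    """Yield (path, status) for every non-ignored file under directory."""
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Assumption: the files are text; undecodable bytes are replaced, not fatal
            with open(path, encoding='utf-8', errors='replace') as f:
                status = classify(f.read())
            if status is not None:
                yield path, status


if __name__ == "__main__":
    for path, status in scan('/home/ea/medical'):
        print(path, "Status:", status)
```

Keeping the decision in classify, separate from the file handling, makes it possible to check the four cases on plain strings without touching the disk.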
User contributions licensed under: CC BY-SA