
Comparison of file list with files in folder

I have a list of file names, but in the directory the files are named a little differently. I want to print the names from the list that have no matching file in the directory. Example of a file name in the directory:

FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz

import os
missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv']
for fileName in missing:
    for fileNames in next(os.walk(r'C:\Users\foo\bar'))[2]:
        if fileName not in fileNames:
            print(fileName)

I can't figure out what I'm doing wrong.


Answer

The problem is that you iterate over every file in the directory (for fileNames in next(os.walk(...))[2]) and check whether fileName is in each of those file names. fileName is printed once for every file in the folder that does not contain it, so it is printed many times, and it is printed even when some other file in the folder does contain it.
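To see the effect without touching the file system, here is a minimal sketch that runs the question's nested loop over a hard-coded list standing in for the directory listing. The names files_in_dir and printed, and the two "other_file" entries, are made up for illustration; the loop collects what would be printed instead of printing it.

```python
# Hypothetical directory listing standing in for next(os.walk(...))[2]
files_in_dir = [
    "FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz",
    "other_file_1.tgz",
    "other_file_2.tgz",
]
missing = ["FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001", "dfiknvbdjfhnv"]

printed = []
for fileName in missing:
    for fileNames in files_in_dir:
        # Same condition as the question: true once per file that lacks the name
        if fileName not in fileNames:
            printed.append(fileName)

print(printed)
```

The FOO_BAR name does have a matching file, yet it is still collected twice (once for each of the other two files), and dfiknvbdjfhnv is collected three times, once per file.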

This can be fixed by doing a single check per name: print the name only if none of the files in the folder contain it.

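import os

missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv']
# Raw string so the backslashes in the Windows path are not treated as escapes
fileNames = next(os.walk(r'C:\Users\foo\bar'))[2]  # files directly inside the folder
for missingfileName in missing:
    # Print the name only if it appears in none of the file names
    if all(missingfileName not in fileName for fileName in fileNames):
        print(missingfileName)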
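The same check can be wrapped in a function that takes the file names as a parameter, so it can be tried without a real directory. This is a sketch; find_missing and the sample listing are hypothetical names chosen for illustration.

```python
from typing import Iterable, List


def find_missing(missing: Iterable[str], file_names: Iterable[str]) -> List[str]:
    """Return the names from `missing` that occur in none of `file_names`."""
    file_names = list(file_names)  # materialize so it can be scanned once per name
    return [
        name
        for name in missing
        # substring test, exactly as in the loop above
        if all(name not in file_name for file_name in file_names)
    ]


files_in_dir = [
    "FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz",
    "other_file_1.tgz",
]
missing = ["FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001", "dfiknvbdjfhnv"]
print(find_missing(missing, files_in_dir))  # ['dfiknvbdjfhnv']
```

On a real folder you would pass next(os.walk(r'C:\Users\foo\bar'))[2] as file_names.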

If you want this to be more efficient and you only care whether a missing name is a prefix of a file name (rather than appearing anywhere inside it), you can use a data structure called a trie. For example, if missing equals ['bcd'], there is a file called abcde, and those should not count as a match, then a trie is appropriate here.
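Here is a minimal sketch of that idea, assuming a plain dict-of-children trie built from the file names; TrieNode, PrefixTrie and find_missing_prefixes are made-up names for illustration. Building the trie costs time proportional to the total length of all file names, after which each lookup walks at most one node per character of the missing name instead of scanning every file.

```python
from typing import Dict, Iterable, List


class TrieNode:
    def __init__(self) -> None:
        # One child per next character
        self.children: Dict[str, "TrieNode"] = {}


class PrefixTrie:
    """Stores file names and answers: is this string a prefix of any stored name?"""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Reuse the existing child for this character, or create one
            node = node.children.setdefault(ch, TrieNode())

    def has_prefix(self, prefix: str) -> bool:
        node = self.root
        for ch in prefix:
            child = node.children.get(ch)
            if child is None:
                return False  # path breaks off: no stored name starts with prefix
            node = child
        return True  # every character matched along one path from the root


def find_missing_prefixes(missing: Iterable[str], file_names: Iterable[str]) -> List[str]:
    trie = PrefixTrie(file_names)
    return [name for name in missing if not trie.has_prefix(name)]


files_in_dir = [
    "FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz",
    "abcde",
]
missing = ["FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001", "bcd", "dfiknvbdjfhnv"]
print(find_missing_prefixes(missing, files_in_dir))  # ['bcd', 'dfiknvbdjfhnv']
```

Note that bcd is reported as missing even though it occurs inside abcde; the substring check used earlier would have treated that as a match. The trade-off is memory: each distinct prefix of every file name gets its own node.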

User contributions licensed under: CC BY-SA