I have a list of filenames, but in the directory the files are named slightly differently. I want to print the filenames that are not in the directory. Example of a file:
FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz
    import os

    missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv']

    for fileName in missing:
        for fileNames in next(os.walk(r'C:\Users\foo\bar'))[2]:
            if fileName not in fileNames:
                print(fileName)
I can't figure out what I'm doing wrong…
Answer
The problem is that you iterate over every file in the directory (for fileNames in next(os.walk(...))[2]) and check whether fileName is in each of those file names. For every file in the folder where fileName not in fileNames is true, fileName is printed, so it can be printed many times, including for a name that does match one of the files.
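To see the effect without touching the filesystem, here is a minimal sketch of the same nested loop. The in-memory list file_names and the helper naive_report are hypothetical stand-ins for the directory listing and the original loop (notes.txt and data.csv are made-up names); the results are collected in a list instead of being printed so they can be counted.

```python
# Hypothetical stand-in for next(os.walk(...))[2]; only the first name
# comes from the question, the other two are made up for illustration.
file_names = [
    'FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz',
    'notes.txt',
    'data.csv',
]
missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv']


def naive_report(targets, names):
    """Same logic as the question's loop, but collecting instead of printing."""
    reported = []
    for target in targets:
        for name in names:
            # This is true once per file that does NOT contain the target.
            if target not in name:
                reported.append(target)
    return reported


report = naive_report(missing, file_names)
for target in missing:
    print(target, 'reported', report.count(target), 'time(s)')
```

With these three made-up files, the name that is actually present is still reported twice (once for each non-matching file), and the name that is absent is reported three times.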
This can be fixed by doing a single check per target name: print it only if none of the file names in the folder contains it.
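    import os

    missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv']

    # Raw string so the backslashes in the Windows path are not read as escapes.
    fileNames = next(os.walk(r'C:\Users\foo\bar'))[2]

    for missingfileName in missing:
        # Print the name only if no file in the directory contains it.
        if all(missingfileName not in fileName for fileName in fileNames):
            print(missingfileName)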
If you want it to be more efficient and you only need to match target names that are prefixes of the file names, then you can use a data structure called a trie. For example, if missing equals ['bcd'] and there is a file called abcde, and these should not be considered a match (because bcd is not a prefix of abcde), then a trie is appropriate here.
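A minimal sketch of the trie idea follows, assuming prefix-only matching is what you want. The class PrefixTrie, the function find_missing, and the in-memory file_names list are hypothetical names introduced for illustration; in practice file_names would come from the directory listing. The trie is built as nested dictionaries, one level per character.

```python
class PrefixTrie:
    """Minimal trie built from nested dicts, one level per character."""

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            # Reuse the child for this character, or create it.
            node = node.setdefault(ch, {})

    def has_prefix(self, prefix):
        """Return True if some inserted word starts with prefix."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        return True


def find_missing(targets, file_names):
    """Return the targets that are not a prefix of any file name."""
    trie = PrefixTrie()
    for name in file_names:
        trie.insert(name)
    return [t for t in targets if not trie.has_prefix(t)]


# Hypothetical stand-in for the directory listing.
file_names = [
    'FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001_2021-08-30T124702.130.tgz',
    'abcde',
]
missing = ['FOO_BAR_524B_023D9B01_2021-157T05-34-31__00001', 'dfiknvbdjfhnv', 'bcd']

still_missing = find_missing(missing, file_names)
print(still_missing)  # ['dfiknvbdjfhnv', 'bcd']
```

Building the trie costs time proportional to the total length of the file names, and each lookup then costs only the length of the target, regardless of how many files there are. Note that bcd is reported as missing here because it appears inside abcde but not at its start.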