I am using os.walk to identify paths in a generic source directory (SRC) that contain any of the strings in my_list:
import os

SRC = '/User/dir_1/'
my_list = ["dog", "cat", "mouse", "bird"]

for dirpath, dirnames, filenames in os.walk(SRC):
    for folders in dirnames:
        for x in my_list:
            if x in folders:
                source_path = os.path.join(dirpath, folders)
And let's say that print(source_path) gives the following:
/User/dir_1/cat_test/
/User/dir_1/cat_test/bird_results/
/User/dir_1/dir_2/dog_test/
/User/dir_1/dir_2/dog_test/cat_results/
/User/dir_1/mouse_test/
/User/dir_1/mouse_test/mouse_results/
/User/dir_1/unknown_test/dog_results/
/User/dir_1/bird_files/
/User/dir_1/bird_files/bird_a_files/
/User/dir_1/bird_files/bird_b_files/
My goal is to shutil.move each source_path. However, moving /User/dir_1/bird_files/ and then trying to move /User/dir_1/bird_files/bird_a_files/ raises a FileNotFoundError, because the nested directory has already been moved along with its parent. I therefore want to keep only the paths that contain exactly 1 occurrence of any string in my_list, so that my source paths are:
/User/dir_1/cat_test/
/User/dir_1/dir_2/dog_test/
/User/dir_1/mouse_test/
/User/dir_1/unknown_test/dog_results/
/User/dir_1/bird_files/
I have tried source_path.count(x) == 1, but that checks each x in my_list separately instead of counting occurrences of any x in my_list, so my output is (for example):
/User/dir_1/dir_2/dog_test/cat_results/ count == 1 (for dog)
/User/dir_1/dir_2/dog_test/cat_results/ count == 1 (for cat)
/User/dir_1/dir_2/dog_test/cat_results/ count == 0 (for mouse)
/User/dir_1/dir_2/dog_test/cat_results/ count == 0 (for bird)
but I want to see (for example):
/User/dir_1/dir_2/dog_test/cat_results/ count == 2 (for any x in my_list)
This would allow me to filter out any source_path with count != 1.
Answer
Use a comprehension to filter by count, then sum the result (True is cast to 1) to get the "any" behavior.
paths = """/User/dir_1/cat_test/
/User/dir_1/cat_test/bird_results/
/User/dir_1/dir_2/dog_test/
/User/dir_1/dir_2/dog_test/cat_results/
/User/dir_1/mouse_test/
/User/dir_1/mouse_test/mouse_results/
/User/dir_1/unknown_test/dog_results/
/User/dir_1/bird_files/
/User/dir_1/bird_files/bird_a_files/
/User/dir_1/bird_files/bird_b_files/""".split()

my_list = ["dog", "cat", "mouse", "bird"]

out = []
for path in paths:
    # count the terms that occur exactly once; keep the path only if there is exactly one such term
    if sum(True for term in my_list if path.count(term) == 1) == 1:
        out.append(path)

print(*out, sep='\n')
Output
/User/dir_1/cat_test/
/User/dir_1/dir_2/dog_test/
/User/dir_1/mouse_test/
/User/dir_1/unknown_test/dog_results/
/User/dir_1/bird_files/
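The same filter can also be expressed by summing the raw occurrence counts over all terms, which yields the single "count for any x in my_list" number described in the question. This is only a minimal sketch and not part of the original answer; the helper name total_matches and the shortened paths list are assumptions made for illustration.

```python
my_list = ["dog", "cat", "mouse", "bird"]


def total_matches(path, terms):
    """Return how often any of the terms occurs in path (hypothetical helper)."""
    return sum(path.count(term) for term in terms)


# A shortened sample of the paths from the question.
paths = [
    "/User/dir_1/dir_2/dog_test/",
    "/User/dir_1/dir_2/dog_test/cat_results/",
    "/User/dir_1/mouse_test/mouse_results/",
]

# Keep only the paths with exactly one term occurrence in total.
filtered = [p for p in paths if total_matches(p, my_list) == 1]
print(*filtered, sep="\n")
```

Here /User/dir_1/dir_2/dog_test/cat_results/ gets a total of 2 (one for dog, one for cat), so it is dropped. Note that str.count matches substrings, so a term embedded in a longer word would also be counted.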
EDIT: Following the comment, here is an os.walk approach.
Idea: remove the matched directories from the dirnames list in place, so that os.walk does not descend into them.
Remark: as the filter condition (see the comment in the code) I used a plain substring test, which is quite weak. In this particular case d.startswith(c) could be more robust. For more flexibility, use a regex-based solution.
import os

constraints = 'dog', 'cat', 'mouse', 'bird'

wdir = './User'  # your reference directory

res = []
for path, dirs, _ in os.walk(wdir, topdown=True):
    # local to each directory's content
    counter = dict.fromkeys(constraints, False)
    dirs_to_skip = []

    # filter by constraint
    for c in constraints:
        for d in dirs:
            if c in d:  # <-- filter condition!
                if not counter[c]:
                    # 1st match
                    counter[c] = True
                    res.append(os.path.join(path, d))
                dirs_to_skip.append(d)

    # remove matched paths (in place, so os.walk skips them)
    for d in dirs_to_skip:
        if d in dirs:  # a name matching several terms is listed more than once
            dirs.remove(d)

print(*res, sep='\n')
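As a rough illustration of the regex-based variant mentioned in the remark, the sketch below prunes the walk with a compiled pattern. It assumes that a directory counts as a match when its name starts with one of the terms, and, unlike the code above, it keeps every matching sibling rather than only the first one per term. The function name find_top_matches and the temporary directory tree are hypothetical and exist only to make the example runnable.

```python
import os
import re
import tempfile

# Assumption: a directory matches when its name starts with one of the terms.
pattern = re.compile(r"^(dog|cat|mouse|bird)")


def find_top_matches(root):
    """Collect matching directories without descending into them."""
    matches = []
    for path, dirs, _ in os.walk(root, topdown=True):
        hits = [d for d in dirs if pattern.match(d)]
        matches.extend(os.path.join(path, d) for d in hits)
        # Editing dirs in place tells os.walk to skip the matched subtrees.
        dirs[:] = [d for d in dirs if d not in hits]
    return sorted(matches)


# Build a small throwaway tree to try the function on.
with tempfile.TemporaryDirectory() as root:
    for sub in ("cat_test/bird_results", "dir_2/dog_test/cat_results",
                "bird_files/bird_a_files", "unknown_test/dog_results"):
        os.makedirs(os.path.join(root, *sub.split("/")))
    found = [os.path.relpath(p, root) for p in find_top_matches(root)]
    print(*found, sep="\n")
```

Because nested matches are never visited, each returned path can be moved with shutil.move without hitting a directory that has already been moved with its parent.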