I have a directory structure that looks like the image.
This structure is organised in a form of objects
. File
objects are each in the form: {type:'file', name:'file a1'}
for example. Folder
objects are like File objects, but they have an extra key children
which is an array of sub folders/files.
Here’s a console log of the rootFolder
object from the image before:
The goal:
What I want is to write a search function that returns the path for a specific file in the form of an array. For example, if I were to search for file c3
, the result would be: ['folder a3', 'folder b4']
.
I have tried many things, but recursion isn’t my strongest suit.
The code could be in javascript or python; what I care about is the algorithm itself.
Here is the code used to generate rootFolder
:
// defining constants const TYPES = {FILE: 'file', FOLDER: 'folder'} const FILE_NAMES = { FILE_A1:'file a1', FILE_A2:'file a2', FILE_A4:'file a4', FILE_B2:'file b2', FILE_B3:'file b3', FILE_C1:'file c1', FILE_C2:'file c2', FILE_C3:'file c3', FILE_C4:'file c4', FILE_C5:'file c5', } const FOLDER_NAMES = { ROOT_FOLDER:'root folder', FOLDER_A3:'folder a3', FOLDER_B1:'folder b1', FOLDER_B4:'folder b4' } // defining classes class File { constructor(type, name) { this.name = name this.type = type } } class Folder { constructor (type, name, children) { this.name = name this.type = type this.children = children } } // defining file objects const fileA1 = new File(TYPES.FILE, FILE_NAMES.FILE_A1) const fileA2 = new File(TYPES.FILE, FILE_NAMES.FILE_A2) const fileA4 = new File(TYPES.FILE, FILE_NAMES.FILE_A4) const fileB2 = new File(TYPES.FILE, FILE_NAMES.FILE_B2) const fileB3 = new File(TYPES.FILE, FILE_NAMES.FILE_B3) const fileC1 = new File(TYPES.FILE, FILE_NAMES.FILE_C1) const fileC2 = new File(TYPES.FILE, FILE_NAMES.FILE_C2) const fileC3 = new File(TYPES.FILE, FILE_NAMES.FILE_C3) const fileC4 = new File(TYPES.FILE, FILE_NAMES.FILE_C4) const fileC5 = new File(TYPES.FILE, FILE_NAMES.FILE_C5) // defining folder objects const folderB1 = new Folder(TYPES.FOLDER, FOLDER_NAMES.FOLDER_B1, [fileC1, fileC2]) const folderB4 = new Folder(TYPES.FOLDER, FOLDER_NAMES.FOLDER_B4, [fileC3, fileC4, fileC5]) const folderA3 = new Folder(TYPES.FOLDER, FOLDER_NAMES.FOLDER_A3, [folderB1, fileB2, fileB3, folderB4]) const rootFolder = new Folder(TYPES.FOLDER, FOLDER_NAMES.ROOT_FOLDER, [fileA1, fileA2, folderA3, fileA4]) console.log(rootFolder);
Advertisement
Answer
remove indrection
Before we get started, first remove the myriad of indirection caused by over 20 variables. We can use just two (2) classes and a single root
–
class File { constructor(type, name) { this.name = name this.type = type } } class Folder { constructor(type, name, children) { this.name = name this.type = type this.children = children } } const root = new Folder("folder", "root folder", [ new File("file", "file a1"), new File("file", "file a2"), new Folder("folder", "folder a3", [ new Folder("folder", "folder b1", [ new File("file", "file c1"), new File("file", "file c2") ]), new File("file", "file b2"), new File("file", "file b3"), new Folder("folder", "folder b4", [ new File("file", "file c3"), new File("file", "file c4"), new File("file", "file c5") ]) ]), new File("file", "file a4") ])
remove redundancy
TYPES.FILE
and TYPES.FOLDER
are unneeded if you have separate File
and Folder
classes. Each instance will already have an associated type –
class File { constructor(name) { // ✅ type not necessary this.name = name } } class Folder { constructor (name, children) { // ✅ type not necessary this.name = name this.children = children } } const root = new Folder("root folder", [ new File("file a1"), new File("file a2"), new Folder("folder a3", [ new Folder("folder b1", [ new File("file c1"), new File("file c2") ]), new File("file b2"), new File("file b3"), new Folder("folder b4", [ new File("file c3"), new File("file c4"), new File("file c5") ]) ]), new File("file a4") ])
multiple return values
Because file systems support folders or files with the same name (at different hierarchies), it’s important that our search
algorithm can scan for all possible matches of the query. For a complete example, let’s include file c3
in folder b1
and folder b4
. When searching for this file, we expect to see both paths returned –
const root = new Folder("root folder", [ new File("file a1"), new File("file a2"), new Folder("folder a3", [ new Folder("folder b1", [ new File("file c1"), new File("file c2"), new File("file c3") // ✅ could have the same name ]), new File("file b2"), new File("file b3"), new Folder("folder b4", [ new File("file c3"), // ✅ could have the same name new File("file c4"), new File("file c5") ]) ]), new File("file a4") ])
the algorithm
Now that we’ve fixed the other issues with the code, let’s write search(t, query)
. We can do this using inductive reasoning –
- If
t
is a File and itsname
matches thequery
, yield the singleton result[t]
- (inductive)
t
is not a File. Ift
is a Folder and itsname
matches thequery
input the singleton result[t]
. And for allchild
oft.children
, for allpath
of the recursive sub-problemsearch(child, query)
, prependt
to thepath
and yield. - (inductive)
t
is neither a File nor a Folder. Throw a type error to notify the caller.
function* search(t, query) { switch(t?.constructor) { case File: if (t.name == query) yield [t] break case Folder: if (t.name == query) yield [t] for (const child of t.children) for (const path of search(child, query)) yield [t, ...path] break default: throw Error(`cannot search unsupported type ${t}`) } }
Here’s some search examples –
for (const path of search(root, "root folder")) console.log([".", ...path.map(f => f.name)].join("/"))
./root folder
for (const path of search(root, "file a1")) console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/file a1
Multiple paths are returned when a file or folder is found multiple times –
for (const path of search(root, "file c3")) console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/folder a3/folder b1/file c3 ./root folder/folder a3/folder b4/file c3
When a file or folder cannot be found, no output is given –
for (const path of search(root, "file Q")) console.log([".", ...path.map(f => f.name)].join("/"))
⚠️ no output
caller decides effect
By yielding a sequence of Folder
and File
objects, our search
algorithm remains generic and reusable. Strings are not assembled by default and nothing is logged to the console. If the caller wishes to create a /
-separated path, that is demonstrated above. Options for a different effect are left for the caller to decide.
only one result
Using a generator is perfect for search
because the caller can choose to consume all matches or any number of them. In this example, we use a generic first
to return only the first match. Note the generator is canceled immediately after a match is found and no additional work is done by the cpu –
function first(iter) { for (const v of iter) return v } const match = first(search(root, "file c3")) if (match) console.log("found:", match.map(f => f.name).join("/")) else console.error("no match found")
found: root folderfolder a3/folder b1/file c3
Make sure you do a proper null-check if using first
in this way as it is possible that search
yields nothing when no match is found –
const match = first(search(root, "ZZZ")) if (match) console.log("found:", match.map(f => f.name).join("/")) else console.error("no match found")
no match found
higher-order search
In the original algorithm matches are determined using ==
. The algorithm could be more useful if we allow the caller to specify what constitutes a match –
function* search(t, query) { switch(t?.constructor) { case File: if (Boolean(query(t.name))) // ✅ query is a func yield [t] break case Folder: if (Boolean(query(t.name))) // ✅ query is a func yield [t] for (const child of t.children) for (const path of search(child, query)) yield [t, ...path] break default: throw Error(`cannot search unsupported type ${t}`) } }
Let’s find all files that start with file c
–
for (const path of search(root, filename => filename.startsWith("file c"))) console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/folder a3/folder b1/file c1 ./root folder/folder a3/folder b1/file c2 ./root folder/folder a3/folder b1/file c3 ./root folder/folder a3/folder b4/file c3 ./root folder/folder a3/folder b4/file c4 ./root folder/folder a3/folder b4/file c5
specialization
With a higher-order search
function, you can easily implement things like searchByString
. This is known as a specialization –
function searchByString(t, query) { return search(t, filename => filename == query) }
for (const path of searchByString(root, "file c3")) console.log([".", ...path.map(f => f.name)].join("/"))
./root folder/folder a3/folder b1/file c3 ./root folder/folder a3/folder b4/file c3
code demo
The code above has been condensed so you can see it all in one place. Run the snippet and verify the output –
class File { constructor(name) { this.name = name } } class Folder { constructor(name, children) { this.name = name; this.children = children } } function* search(t, query) { switch(t?.constructor) { case File: if (Boolean(query(t.name))) yield [t]; break case Folder: if (Boolean(query(t.name))) yield [t]; for (const child of t.children) for (const path of search(child, query)) yield [t, ...path]; break default: throw Error(`cannot search unsupported type ${t}`) } } function searchByString(t, query) { return search(t, filename => filename == query) } const relative = (path) => [".", ...path.map(f => f.name)].join("/") const root = new Folder("root folder", [new File("file a1"), new File("file a2"), new Folder("folder a3", [new Folder("folder b1", [new File("file c1"), new File("file c2"), new File("file c3")]), new File("file b2"), new File("file b3"), new Folder("folder b4", [new File("file c3"), new File("file c4"), new File("file c5")])]), new File("file a4")]) console.log("== exact match ==") for (const path of search(root, filename => filename == "root folder")) console.log(relative(path)) console.log("== starts with 'file c' ==") for (const path of search(root, filename => filename.startsWith("file c"))) console.log(relative(path)) console.log("== searchByString ==") for (const path of searchByString(root, "file c3")) console.log(relative(path))
.as-console-wrapper { min-height: 100%; top: 0; }
== exact match == ./root folder == starts with 'file c' == ./root folder/folder a3/folder b1/file c1 ./root folder/folder a3/folder b1/file c2 ./root folder/folder a3/folder b1/file c3 ./root folder/folder a3/folder b4/file c3 ./root folder/folder a3/folder b4/file c4 ./root folder/folder a3/folder b4/file c5 == searchByString == ./root folder/folder a3/folder b1/file c3 ./root folder/folder a3/folder b4/file c3
python
search
is written exactly the same in Python –
class file: def __init__(self, name): self.name = name class folder: def __init__(self, name, children): self.name = name self.children = children def search(t, query): if isinstance(t, file): if t.name == query: yield [t] elif isinstance(t, folder): if t.name == query: yield [t] for child in t.children: for path in search(child, query): yield [t, *path] else: raise TypeError("unsupported", t)
If you are using Python 3.10, there are some new features available to you. Adjust search
to take advantage of the new structural match..case
syntax –
from dataclasses import dataclass from typing import Generator, Union Ftree = Union["file", "folder"] @dataclass class file: name: str @dataclass class folder: name: str children: list[Ftree] def search(t: Ftree, query: str) -> Generator[list[Ftree], None, None]: match t: case file(name): if name == query: yield [t] case folder(name, children): if name == query: yield [t] for child in t.children: for path in search(child, query): yield [t, *path] case _: raise TypeError("unsupported", t)
Given a tree –
root = folder("root folder", [ file("file a1"), file("file a2"), folder("folder a3", [ folder("folder b1", [ file("file c1"), file("file c2"), file("file c3") ]), file("file b2"), file("file b3"), folder("folder b4", [ file("file c3"), file("file c4"), file("file c5") ]) ]), file("file a4") ])
Both implementations behave identically –
for path in search(root, "root folder"): print("/".join([".", *map(lambda f: f.name, path)]))
./root folder
for path in search(root, "file c3"): print("/".join([".", *map(lambda f: f.name, path)]))
./root folder/folder a3/folder b1/file c3 ./root folder/folder a3/folder b4/file c3