Skip to content
Advertisement

Matching modified filenames

I have two distinct programs that processed the same input files. During the processing, both programs generated files with names based on their original counterpart. The possible changes, without considering the modified paths and file extensions, would be, in regex terms, as follow:

  • Program1 can append (_[[:alnum:]]+)* to the original filenames.
  • Program2 can append (_[[:alnum:]]+)* and prepend ([[:alnum:]]+_)* to the original filenames.

Now, given a list of several millions of file paths from Program2, I need to match each one of them to an unique file path from Program1.

What would be a sensible way to do it?

Here’s where I’m stuck:

JavaScript
JavaScript

Advertisement

Answer

Here’s a solution that uses a prefix tree to quickly match against the program 1 output files:

JavaScript
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement