I am working on a Git repo and need to share its folder hierarchy and file names with an external vendor so they can perform some code analysis. I have the whole hierarchy available in a CSV file.
The problem is that I cannot provide the actual folder paths or file names, as they contain protected information. For the code analysis, the external vendor only needs folder paths and file names; they can use that information and provide the output of the analysis. Internally, we need to keep a mapping of actual vs. obfuscated file paths and names.
An example of this mapping would be:
conf1/conf2/conf3.txt -> dsdasd/dsadsd/dadssd.txt
conf1/conf2/conf4.py -> dsdasd/dsadsd/dasdsd.py
Manual mapping is not feasible, as the repo contains over 200k files in a folder hierarchy 20 levels deep. There are 2 requirements for this conversion:
- Extension should be retained
- Same folder path should have same obfuscated remapping
Answer
I’ll describe how I’d go about this in pseudocode.
NEXT := 1
MAP := empty
for each full path P in your repos
    split P using '/' as the delimiter
    for each element E of the split path
        if it is the last element, remove the extension
        if E is in the MAP
            CODE := MAP[E]
        else
            CODE := NEXT
            increase NEXT
            MAP[E] := CODE
        replace E with CODE
        if it is the last element, put back the extension
    join the transformed elements using '/' as the delimiter
    print the result
This will convert:
conf1/conf2/conf3.txt -> 1/2/3.txt
conf1/conf2/conf4.py -> 1/2/4.py
and meets your requirements. If you need to truly obfuscate the paths, use a unique random word instead of NEXT in the pseudocode above.
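For reference, here is one way the pseudocode could look in Python. Treat it as a minimal sketch, not a definitive implementation: the function names (obfuscate_paths, make_counter_tokens, make_random_tokens) are made up for illustration, paths are assumed to use '/' as the separator, and the "extension" is assumed to be whatever posixpath.splitext considers one (so archive.tar.gz keeps only .gz, and a dotfile such as .gitignore is treated as having no extension).

```python
import itertools
import posixpath
import secrets


def make_counter_tokens():
    """Yield "1", "2", "3", ... (plays the role of NEXT in the pseudocode)."""
    return (str(n) for n in itertools.count(1))


def make_random_tokens(nbytes=4):
    """Yield unique random hex words (hypothetical replacement for NEXT)."""
    seen = set()
    while True:
        token = secrets.token_hex(nbytes)
        if token not in seen:  # retry on the rare collision
            seen.add(token)
            yield token


def obfuscate_paths(paths, tokens=None):
    """Return a dict mapping each original path to its obfuscated form."""
    if tokens is None:
        tokens = make_counter_tokens()
    name_map = {}  # original path element -> code (MAP in the pseudocode)
    result = {}
    for path in paths:
        parts = path.split("/")
        out = []
        for i, part in enumerate(parts):
            ext = ""
            if i == len(parts) - 1:
                # Last element: set the extension aside so it is retained.
                part, ext = posixpath.splitext(part)
            if not part:
                # Empty element (leading or trailing '/'): leave untouched.
                out.append(part + ext)
                continue
            if part not in name_map:
                name_map[part] = next(tokens)
            out.append(name_map[part] + ext)
        result[path] = "/".join(out)
    return result


if __name__ == "__main__":
    sample = ["conf1/conf2/conf3.txt", "conf1/conf2/conf4.py"]
    for original, hidden in obfuscate_paths(sample).items():
        print(original, "->", hidden)
```

The returned dict is the internal actual-vs-obfuscated mapping, so it can be written to a CSV and kept in-house while only the obfuscated values go to the vendor. Passing tokens=make_random_tokens() swaps the counter for random words. Note that, like the pseudocode, this maps each name rather than each full prefix, so the same folder name always gets the same code wherever it appears.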