
Python finding the common parts of a string throughout a list and removing it from every item

I have a list of file paths that looks similar to this:

path/new/stuff/files/morefiles/A/file2.txt
path/new/stuff/files/morefiles/B/file7.txt
path/new/stuff/files/morefiles/A/file1.txt
path/new/stuff/files/morefiles/C/file5.txt

I am trying to find the leading part of the paths that is the same in every item of the list, and then remove that part from each path.

The list can be any length. For the example above, I want to change the list into:

A/file2.txt
B/file7.txt
A/file1.txt
C/file5.txt

Methods like re.sub(r'.*I', 'I', filepath) and filepath.split('_', 1)[-1] can be used for the replacing, but I'm not sure how to find the common part shared by all the file paths in the list.

Note:

I am using Windows and Python 3.


Answer

The first part of the answer is here: Python: Determine prefix from a set of (similar) strings

Use os.path.commonprefix() to find the longest common prefix (the shared first part) of the strings.
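A minimal sketch of that call on the paths from the question. Note that os.path.commonprefix compares the strings character by character rather than directory by directory, so on other inputs the result can end partway through a directory name.

```python
import os.path

paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuff/files/morefiles/C/file5.txt",
]

# Character-by-character comparison, so the separator style does not matter.
prefix = os.path.commonprefix(paths)
print(prefix)  # path/new/stuff/files/morefiles/
```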

The code from that answer for finding the part that all items in the list share is:

# Return the longest prefix of all list elements.
def commonprefix(m):
    "Given a list of pathnames, returns the longest common leading component"
    if not m: return ''
    s1 = min(m)
    s2 = max(m)
    for i, c in enumerate(s1):
        if c != s2[i]:
            return s1[:i]
    return s1
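The function only compares min(m) and max(m). This works because, in lexicographic order, every other string sorts between the smallest and the largest one, so any leading characters those two share must be shared by all of them. A short check of that reasoning on the question's paths (a sketch, not part of the original answer):

```python
paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuff/files/morefiles/C/file5.txt",
]

# Only the lexicographically smallest and largest strings are compared.
s1, s2 = min(paths), max(paths)

# Advance until the first differing character (or the end of the shorter string).
i = 0
while i < min(len(s1), len(s2)) and s1[i] == s2[i]:
    i += 1
prefix = s1[:i]

# Every string sorts between s1 and s2, so it must start with the same prefix.
assert all(p.startswith(prefix) for p in paths)
print(s1)      # path/new/stuff/files/morefiles/A/file1.txt
print(s2)      # path/new/stuff/files/morefiles/C/file5.txt
print(prefix)  # path/new/stuff/files/morefiles/
```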

Now all you have to do is use slicing to remove that prefix from the start of each item in the list.
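A minimal sketch of the slicing step on its own, here paired with os.path.commonprefix. Every item starts with the prefix, so dropping its first len(prefix) characters removes exactly that part:

```python
import os.path

paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuff/files/morefiles/C/file5.txt",
]

prefix = os.path.commonprefix(paths)
# Slice off the shared leading characters; works even when prefix == "".
trimmed = [p[len(prefix):] for p in paths]
print(trimmed)  # ['A/file2.txt', 'B/file7.txt', 'A/file1.txt', 'C/file5.txt']
```

Slicing is safer than p.split(prefix, 1)[-1] here, because str.split raises ValueError when the separator is an empty string, which is what you get when the items have nothing in common. On Python 3.9 and later, p.removeprefix(prefix) does the same job.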

Combining the two steps gives:

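# Remove the longest common prefix from every list element (in place).
def remove_commonprefix(m):
    "Given a list of pathnames, strips their longest common leading part"
    if not m: return m
    s1 = min(m)
    s2 = max(m)
    ans = s1  # if no mismatch is found, all of s1 is the common prefix
    for i, c in enumerate(s1):
        if c != s2[i]:
            ans = s1[:i]
            break
    # Slice the prefix off each item; split(ans, 1) would raise
    # ValueError whenever ans is the empty string.
    for each in range(len(m)):
        m[each] = m[each][len(ans):]
    return m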
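Because the prefix above is found character by character, a list such as data/run1/a.txt and data/run2/b.txt would be trimmed at "data/run", leaving "1/a.txt" and "2/b.txt". If that matters, a component-wise alternative (not from the original answer) is os.path.commonpath, available since Python 3.5. The sketch below assumes forward-slash paths like the ones in the question and therefore uses posixpath explicitly; for native Windows paths with backslashes you would use os.path (which is ntpath on Windows) instead, and its results come back with backslashes. The helper name strip_common_dir is hypothetical.

```python
import posixpath


def strip_common_dir(paths):
    """Remove the longest shared leading directory from every path.

    Assumes forward-slash separated paths, as in the question.
    Returns a new list; the input list is not modified. Like
    commonpath, raises ValueError if absolute and relative paths are mixed.
    """
    if not paths:
        return []
    # Normalize first so that component boundaries line up with string
    # positions (normpath drops things like "./" and doubled slashes).
    norm = [posixpath.normpath(p) for p in paths]
    # Compare directories only, so a file name is never stripped
    # (for example when the list holds a single path).
    common = posixpath.commonpath([posixpath.dirname(p) for p in norm])
    # commonpath works on whole components, so the prefix never ends
    # partway through a directory name; drop it and the following "/".
    return [p[len(common):].lstrip("/") for p in norm]


paths = [
    "path/new/stuff/files/morefiles/A/file2.txt",
    "path/new/stuff/files/morefiles/B/file7.txt",
    "path/new/stuff/files/morefiles/A/file1.txt",
    "path/new/stuff/files/morefiles/C/file5.txt",
]
print(strip_common_dir(paths))
# ['A/file2.txt', 'B/file7.txt', 'A/file1.txt', 'C/file5.txt']

# A character-wise prefix would stop at "data/run"; this stops at "data".
print(strip_common_dir(["data/run1/a.txt", "data/run2/b.txt"]))
# ['run1/a.txt', 'run2/b.txt']
```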
User contributions licensed under: CC BY-SA