How to extract base path from DataFrame column of path strings

There are several questions about string manipulation, but I can’t find an answer which allows me to do the following—I thought it should have been simple…

I have a DataFrame which includes a column containing the full path to a file (folders plus filename).

The following produces a representative example DataFrame:

import pandas as pd

# Raw string, so that "\f" is not interpreted as a form-feed escape
df = pd.DataFrame({
    'root': {'1': r'C:\folder1\folder2\folder3\folder4\filename.csv'}
})

                                              root
1  C:\folder1\folder2\folder3\folder4\filename.csv

I want to end up with just the ‘filename’ part of the string. There are many rows and the path differs from row to row, so I can’t use str.replace with a fixed prefix.

I can strip out the rightmost ‘.csv’ part like this:

df['root'] = df['root'].str.rstrip('.csv')

                                          root
1  C:\folder1\folder2\folder3\folder4\filename

But none of the methods I have read about work for removing the path part on the left side of the string.

How can I return just the ‘filename’ part of this path (string), given that the preceding elements of the path can change from record to record?

Answer

You can use the utilities in os.path to make this easier, namely splitext and basename:

>>> import os
>>> df["root"].apply(lambda x: os.path.splitext(os.path.basename(x))[0])
1    filename
Name: root, dtype: object
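
One caveat worth flagging: os.path follows the conventions of the operating system the code runs on, so on Linux or macOS os.path.basename does not treat backslashes as separators. A minimal sketch of an OS-independent variant, assuming the column always holds Windows-style paths, uses pathlib.PureWindowsPath from the standard library. The helper name file_stem and the second sample path are made up for illustration:

```python
from pathlib import PureWindowsPath


def file_stem(path: str) -> str:
    """Return the last path component without its final extension.

    Hypothetical helper: parses the string with Windows rules on any OS.
    """
    return PureWindowsPath(path).stem


# Sample paths in the style of the question (raw strings keep the backslashes)
paths = [
    r"C:\folder1\folder2\folder3\folder4\filename.csv",
    r"D:\other\place\report.final.csv",  # invented second example
]
stems = [file_stem(p) for p in paths]
print(stems)  # ['filename', 'report.final']
```

On the DataFrame from the question, this would presumably be applied as df["root"].apply(file_stem). Note that only the last extension is dropped, which matches what os.path.splitext does.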

PS: rstrip doesn’t work the way you think it does: it strips any trailing characters that appear in its argument, not that substring as a whole. For example:

>>> "a11_vsc.csv".rstrip(".csv")
'a11_'
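
To make the difference concrete, here is a small sketch contrasting rstrip with two ways of removing the extension as a whole. The helper drop_suffix is hypothetical and written out by hand so that it runs on any Python 3 version (str.removesuffix does the same job from Python 3.9 onward):

```python
import os


def drop_suffix(text: str, suffix: str) -> str:
    """Remove suffix from the end of text only if text really ends with it.

    Hypothetical helper, equivalent in spirit to str.removesuffix.
    """
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


name = "a11_vsc.csv"

# rstrip treats ".csv" as the character set {'.', 'c', 's', 'v'}
print(name.rstrip(".csv"))         # a11_

# Removing the substring as a whole keeps the "vsc" part intact
print(drop_suffix(name, ".csv"))   # a11_vsc

# splitext splits off the last extension, whatever it is
print(os.path.splitext(name)[0])   # a11_vsc
```

In pandas, one way to express the same idea is a regex anchored at the end of the string, for example df['root'].str.replace(r'\.csv$', '', regex=True).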
