Replace a character in file names with regex in Python

My script should find the files in a directory whose names contain the “|” character (via regex) and replace that character with an “l”.

The code runs but filenames are not replaced. What is wrong?

#!/usr/bin/python

import os
from posixpath import dirname
import re
import glob
import fnmatch

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

#Path
file_src = dirname(os.path.abspath(__file__))

#Current directory name
print(bcolors.OKBLUE + bcolors.BOLD + 'Directory:', file_src)
'\n'

#List all files in directory
list_file = os.listdir(file_src)
print(bcolors.BOLD + 'In this directory:', '\n', list_file)
'\n'

#Finding all the "|" characters in a string
file_pattern = re.compile('[\":<>;|*?]*')


#Replace "|" with "l"
list = str(list_file)
re.sub(file_pattern, 'l', list, re.I)

Answer

Joshua’s answer and the many comments, especially the suggestions from ekhumoro, have already pointed out the issues and pointed toward the solution.

Fixed and improved

Here is my copy-paste-ready code, with inline comments highlighting the changes:

#!/usr/bin/python

import os
from posixpath import dirname
import re
import glob
import fnmatch

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKCYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    RESET = '\u001b[0m' # added to get regular style

def print_list(files):
    '''Print a list, one element per line.'''
    for f in files:
        print(bcolors.OKBLUE + f + bcolors.RESET)

#Path
directory = dirname(os.path.abspath(__file__))

#Current directory name
print(bcolors.BOLD + 'Directory:' + bcolors.OKBLUE, directory)
print(bcolors.RESET)

#List all files in directory
files = os.listdir(directory)
print(bcolors.BOLD + 'In this directory:' + bcolors.OKBLUE, len(files), bcolors.RESET + 'files')
print_list(files)

#Finding all the "|" characters in a string
pipe_pattern = re.compile(r'\|')  # the pipe must be escaped: unescaped, | means alternation (logical OR) in a regex


#Replace "|" with "l"
renamed_files = []
for f in files:
    # sub() returns a new string and leaves f unchanged. No flags are needed here;
    # note that the 4th positional argument of re.sub is count, not flags.
    f_renamed = pipe_pattern.sub('l', f)
    if f_renamed != f:
        renamed_files.append(f_renamed)

# print the list of filenames, each on a separate line
print(bcolors.BOLD, "Renamed:" + bcolors.OKGREEN, len(renamed_files), bcolors.RESET + "files")
print_list(renamed_files)

Explanation

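  • A simple regex to match a literal pipe character is \|
  • Note: a prepended backslash is required to escape regex metacharacters such as | (alternation, i.e. logical OR), \ (the escape character itself), and ( ) (grouping).
  • re.sub returns a new string and never modifies its input, so the result has to be stored. The original script discarded it, and it also applied the regex to str(list_file), the printed form of the whole list, rather than to each file name.
  • It is often useful to extract blocks of code into functions (e.g. print_list above). Functions can easily be tested in isolation.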
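
To illustrate the escaping point, here is a minimal sketch comparing an unescaped and an escaped pipe pattern. The file name is a made-up example, not one from the question.

```python
import re

name = 'my|file|v2.txt'  # hypothetical file name, for illustration only

# Unescaped: '|' is an alternation between two empty patterns, so it matches
# the empty string at every position and never consumes the literal pipe.
unescaped = re.sub(r'|', 'l', name)

# Escaped: matches the literal pipe character.
escaped = re.sub(r'\|', 'l', name)

# re.escape builds the escaped pattern for you.
also_escaped = re.sub(re.escape('|'), 'l', name)

# If flags are needed, pass them by keyword; the 4th positional
# parameter of re.sub is count.
with_flags = re.sub(r'\|', 'l', name, flags=re.I)

print(unescaped)  # still contains the pipes, plus extra 'l' characters
print(escaped)    # mylfilelv2.txt
```

For a fixed single character like this, the plain str.replace shown in the next section does the same job without any regex.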

Test your replacement

A small function makes the replacement easy to test. You can then try it on a made-up example.

def replace_pipe(file):
    return file.replace('|', 'l') # here the first argument is no regex, thus not escaped!

### Test it with an example first
print( replace_pipe('my|file.txt') )

If everything works as expected, you can then add further (critical) steps.
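
One way to make "works as expected" concrete is a handful of assertions. This is only a sketch; the sample names are invented, and replace_pipe is repeated so the snippet runs on its own.

```python
def replace_pipe(file):
    # Plain string replacement: the first argument is not a regex.
    return file.replace('|', 'l')

# Made-up sample names covering one pipe, several pipes, and none.
assert replace_pipe('my|file.txt') == 'mylfile.txt'
assert replace_pipe('a|b|c') == 'alblc'
assert replace_pipe('no_pipe.txt') == 'no_pipe.txt'
print('all checks passed')
```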

Avoid integrating the I/O layer too early

To elaborate on the important advice from ekhumoro: os.rename is a file-system operation at the I/O layer. It takes effect on your system immediately and cannot easily be undone.

So it should be regarded as critical. Imagine your renaming does not work as expected: at worst, all the files end up renamed to a cryptic mess (as harmful as ransomware).
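
One possible way to keep the I/O step separate is to compute the renames first and apply them only on request. The sketch below is an assumption about how that could look, not code from the original answers; plan_renames, apply_renames and the dry_run parameter are hypothetical names introduced for illustration.

```python
import os

def plan_renames(directory):
    """Return (old_name, new_name) pairs for names containing '|'.

    Only reads the directory listing; nothing on disk is changed.
    """
    plan = []
    for name in sorted(os.listdir(directory)):
        new_name = name.replace('|', 'l')
        if new_name != name:
            plan.append((name, new_name))
    return plan

def apply_renames(directory, plan, dry_run=True):
    """Print each planned rename; call os.rename only if dry_run is False.

    Returns the list of renames that were actually performed.
    """
    done = []
    for old, new in plan:
        src = os.path.join(directory, old)
        dst = os.path.join(directory, new)
        if os.path.exists(dst):
            # Skip instead of overwriting an existing file.
            print('skip (target exists):', old, '->', new)
            continue
        print(('would rename: ' if dry_run else 'renaming: ') + old + ' -> ' + new)
        if not dry_run:
            os.rename(src, dst)
            done.append((old, new))
    return done
```

Calling apply_renames(directory, plan_renames(directory)) with the default dry_run=True only prints what would happen. Once the printed plan looks right, the same call with dry_run=False performs the renames.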

User contributions licensed under: CC BY-SA