How to convert CRLF to LF on a Windows machine in Python

So I got these templates. They all end in LF, and I can fill in some terms with format and still get LF files by opening them with "wb".

These templates are used in a deployment script on a Windows machine to deploy to a Unix server.

The problem is that a lot of people are going to mess with these templates, and I'm 100% sure that some of them will put some CRLF inside.

How could I, using Python, convert all the CRLF to LF?

Answer

Convert line endings in-place (with Python 3)

Line endings:

  • Windows: \r\n, called CRLF
  • Linux/Unix/MacOS: \n, called LF

Windows to Linux/Unix/MacOS (CRLF ➡ LF)

Here is a short Python script for directly converting Windows line endings to Linux/Unix/MacOS line endings. The script works in-place, i.e., without creating an extra output file.

# replacement strings
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'

# relative or absolute file path, e.g.:
file_path = r"c:\Users\Username\Desktop\file.txt"

with open(file_path, 'rb') as open_file:
    content = open_file.read()
    
# Windows ➡ Unix
content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)

# Unix ➡ Windows
# content = content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)

with open(file_path, 'wb') as open_file:
    open_file.write(content)
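
Since the question is about many templates, the same replacement could be applied to every file in a folder. The following is only a sketch: the function name normalize_tree and the "*.txt" pattern are made up for illustration, so adjust them to however your templates are actually named.

```python
from pathlib import Path
import tempfile


def normalize_tree(root, pattern="*.txt"):
    """Rewrite every matching file under root with LF endings; return changed paths."""
    changed = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        original = path.read_bytes()              # bytes, so no newline translation
        converted = original.replace(b"\r\n", b"\n")
        if converted != original:                 # only touch files that need it
            path.write_bytes(converted)
            changed.append(path)
    return changed


# Small demo in a throwaway directory (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.txt").write_bytes(b"x\r\ny\r\n")
    (Path(tmp) / "b.txt").write_bytes(b"already\nfine\n")
    changed_names = sorted(p.name for p in normalize_tree(tmp))
    result_a = (Path(tmp) / "a.txt").read_bytes()
    result_b = (Path(tmp) / "b.txt").read_bytes()
```

Files that already use LF are left untouched, so running this before each deployment should be harmless.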

Linux/Unix/MacOS to Windows (LF ➡ CRLF)

To convert from Linux/Unix/MacOS to Windows instead, simply uncomment the replacement for Unix ➡ Windows (remove the # in front of the line).

DO NOT comment out the command for the Windows ➡ Unix replacement, as it ensures a correct conversion. When converting from LF to CRLF, it is important that there are no CRLF line endings already present in the file. Otherwise, those lines would be converted to CRCRLF. Converting lines from CRLF to LF first and then doing the desired conversion from LF to CRLF avoids this issue (thanks @neuralmer for pointing that out).
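
To illustrate why the order matters, here is a minimal sketch that wraps both steps in one function and compares it with a naive LF to CRLF replacement on mixed input. The function name convert_line_endings and the to_windows flag are made up for this example.

```python
def convert_line_endings(data: bytes, to_windows: bool = False) -> bytes:
    """Return data with uniform line endings (LF by default, CRLF if to_windows)."""
    # Step 1: always collapse CRLF to LF so existing CRLF pairs are not doubled.
    data = data.replace(b"\r\n", b"\n")
    # Step 2: optionally expand every LF to CRLF.
    if to_windows:
        data = data.replace(b"\n", b"\r\n")
    return data


mixed = b"first\r\nsecond\nthird\r\n"

unix = convert_line_endings(mixed)                      # b"first\nsecond\nthird\n"
windows = convert_line_endings(mixed, to_windows=True)  # b"first\r\nsecond\r\nthird\r\n"

# Skipping step 1 turns the existing CRLF endings into CRCRLF.
naive = mixed.replace(b"\n", b"\r\n")                   # b"first\r\r\nsecond\r\nthird\r\r\n"
```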


Code Explanation

Binary Mode

Important: We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.

When a file is opened in text mode (mode='r' or mode='w' without b), Python translates line endings by default: on reading, \r\n and \r are converted to Python's Unix-style line ending \n, and on writing, \n is converted to the platform's native line ending (\r\n on Windows). So the call to content.replace() couldn't find any \r\n line endings to replace, and on Windows the file would be written with CRLF again.

In binary mode, no such conversion is done. Therefore the call to bytes.replace() can do its work.
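
A small experiment makes the difference visible. This sketch writes a temporary file with CRLF endings and reads it back in both modes; the temporary file and the variable names exist only for the demonstration.

```python
import os
import tempfile

# Write a small file with Windows line endings, byte for byte.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"a\r\nb\r\n")

# Text mode with default newline handling: \r\n is translated to \n on reading.
with open(path, "r", encoding="ascii") as f:
    as_text = f.read()

# Text mode with newline="": line endings are returned untranslated.
with open(path, "r", encoding="ascii", newline="") as f:
    as_text_untranslated = f.read()

# Binary mode: the bytes are returned exactly as stored.
with open(path, "rb") as f:
    as_bytes = f.read()

os.remove(path)

print(repr(as_text))               # 'a\nb\n'
print(repr(as_text_untranslated))  # 'a\r\nb\r\n'
print(repr(as_bytes))              # b'a\r\nb\r\n'
```

Passing newline="" to open() would be another way to keep text mode from touching the line endings, but binary mode also sidesteps any encoding questions.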

Binary Strings

In Python 3, string literals are Unicode strings (type str) unless declared otherwise. But we open our files in binary mode, so their content is read as bytes. Therefore we need to add b in front of our replacement strings to make them bytes literals, too.
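
As a quick illustration (the sample content is made up), mixing the two types fails, while bytes arguments work as expected:

```python
content = b"line one\r\nline two\r\n"   # what open(..., 'rb').read() returns: bytes

# bytes arguments on a bytes object: works
converted = content.replace(b"\r\n", b"\n")

# str arguments on a bytes object: raises TypeError
try:
    content.replace("\r\n", "\n")
    mixing_failed = False
except TypeError:
    mixing_failed = True
```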

Raw Strings

On Windows the path separator is a backslash, which we would need to escape in a normal Python string as \\. By adding r in front of the string we create a so-called "raw string", in which backslashes are not treated as escape characters. So you can directly copy/paste the path from Windows Explorer into your script.

(Hint: Inside Windows Explorer press CTRL+L to automatically select the path from the address bar.)
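
For comparison, here is the example path written both ways. Note that writing it as a normal string without doubling the backslashes would not even compile, because \U starts a Unicode escape.

```python
raw_path = r"c:\Users\Username\Desktop\file.txt"         # raw string, as pasted
escaped_path = "c:\\Users\\Username\\Desktop\\file.txt"  # normal string, escaped

same = raw_path == escaped_path   # both spell the same path
```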

Alternative solution

We open the file twice to avoid the need to reposition the file pointer. We could also have opened the file once with mode='rb+', but then we would have needed to move the pointer back to the start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).

Simply opening the file again in write mode does that automatically for us.
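
For completeness, the single-open variant could look roughly like this. It is a sketch of the alternative described above, not a recommendation; the function name crlf_to_lf_single_open and the temporary demo file are made up.

```python
import os
import tempfile


def crlf_to_lf_single_open(file_path):
    """Convert CRLF to LF using one read/write handle instead of two opens."""
    with open(file_path, "rb+") as open_file:
        content = open_file.read()
        open_file.seek(0)        # move the pointer back to the start
        open_file.truncate(0)    # drop the old content
        open_file.write(content.replace(b"\r\n", b"\n"))


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

crlf_to_lf_single_open(demo_path)

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

Without the truncate call, leftover bytes from the longer original content would remain at the end of the file, since the converted content is shorter.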

Cheers and happy programming,
winklerrr

User contributions licensed under: CC BY-SA