
Python – how to split by blank space if string element itself contains space?

I have a file with lines:

/home/Plugins/file1 e:222 k:dir (327/1)
/home/Plugins/file2 e:100 k:dir (326/1)

I want to extract the path and the element id from each line. That's easy:

import logging

with open('output_file.txt', 'r') as output_file:
    for line in output_file:
        fields = line.split()  # split on whitespace once and reuse the result
        file_path = fields[0]
        eId = fields[1].split(":")[1]  # "e:222" -> "222"
        logging.info("file path:" + file_path)
        logging.info("eId:" + eId)

The problem is that the file path (the first element) can itself contain spaces, because folders and files on disk are often created with spaces in their names. So I also have lines like these:

/home/tools/AMS Provider/file3.txt e:224 k:dir (127/1)
/home/account validator e:227 k:dir (247/1)

So the path is always the first element, but it sometimes contains spaces, and my script above fails on such lines. In the given examples:

AMS Provider (subfolder name)

account validator (file name at the end of the path)
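To make the failure concrete, here is a minimal sketch using one of the sample lines above. The variable names are illustrative only; it simply shows what plain whitespace splitting returns for a path that contains a space.

```python
# One of the sample lines whose path contains a space.
line = "/home/tools/AMS Provider/file3.txt e:224 k:dir (127/1)"

parts = line.split()   # splits on every run of whitespace
file_path = parts[0]   # only the part of the path before the first space
second = parts[1]      # the rest of the path, not the "e:..." field

print(file_path)
print(second)
# second contains no ":", so second.split(":")[1] would raise IndexError
print(":" in second)
```

The same thing happens with `/home/account validator`: the second element becomes `validator`, which has no `:` either.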

Since the paths here contain spaces (in a subfolder name, and also in the file name at the end of the path), how can I still retrieve the full file path? How should I split the line?

Note: unfortunately I am limited to Python 2.7 on the server. Thanks!


Answer

I’d use regex:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(.*) (e:\d*) (k:.*) \((\d{3}/\d)\)$"

test_str = ("/home/Plugins/file1 e:222 k:dir (327/1)\n"
            "/home/Plugins/file2 e:100 k:dir (326/1)\n"
            "/home/tools/AMS Provider/file3.txt e:224 k:dir (127/1)\n"
            "/home/account validator e:227 k:dir (247/1)")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    
    print ("Match {matchNum} was found at {start}-{end}: {match}".
           format(matchNum = matchNum, start = match.start(),
                  end = match.end(), match = match.group()))
    
    for groupNum in range(1, len(match.groups()) + 1):
        
        print ("Group {groupNum} found at {start}-{end}: {group}".
               format(groupNum = groupNum, start = match.start(groupNum),
                      end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex
#       and u"" to prefix the test string and substitution.

Output:

Match 1 was found at 0-39: /home/Plugins/file1 e:222 k:dir (327/1)
Group 1 found at 0-19: /home/Plugins/file1
Group 2 found at 20-25: e:222
Group 3 found at 26-31: k:dir
Group 4 found at 33-38: 327/1
Match 2 was found at 40-79: /home/Plugins/file2 e:100 k:dir (326/1)
Group 1 found at 40-59: /home/Plugins/file2
Group 2 found at 60-65: e:100
Group 3 found at 66-71: k:dir
Group 4 found at 73-78: 326/1
Match 3 was found at 80-134: /home/tools/AMS Provider/file3.txt e:224 k:dir (127/1)
Group 1 found at 80-114: /home/tools/AMS Provider/file3.txt
Group 2 found at 115-120: e:224
Group 3 found at 121-126: k:dir
Group 4 found at 128-133: 127/1
Match 4 was found at 135-178: /home/account validator e:227 k:dir (247/1)
Group 1 found at 135-158: /home/account validator
Group 2 found at 159-164: e:227
Group 3 found at 165-170: k:dir
Group 4 found at 172-177: 247/1
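To plug this into the loop from the question, something like the following sketch might work. It uses a slight variation of the pattern (an assumption, not the exact regex above): the digits after `e:` are captured directly, and the trailing counter is loosened to `\d+/\d+`. The names `LINE_RE` and `parse_line` are hypothetical helpers, and the code avoids anything that Python 2.7 lacks.

```python
import re

# Variation on the pattern above (an assumption, not the answer's exact regex):
# group 1 is the path, group 2 is only the digits after "e:".
# The greedy (.*) absorbs any spaces inside the path.
LINE_RE = re.compile(r"^(.*) e:(\d+) k:\S+ \(\d+/\d+\)$")


def parse_line(line):
    """Return (file_path, e_id), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# Sample lines from the question, with the newline a file iterator would yield.
lines = [
    "/home/Plugins/file1 e:222 k:dir (327/1)\n",
    "/home/tools/AMS Provider/file3.txt e:224 k:dir (127/1)\n",
    "/home/account validator e:227 k:dir (247/1)\n",
]
for line in lines:
    print(parse_line(line))
```

Returning `None` for lines that do not match lets the caller decide whether to skip or log them instead of failing with an `IndexError`.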

Playground.
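As an alternative to the regex (a different technique, not what the answer above uses), splitting from the right may be enough. This sketch assumes that every line ends with exactly three space-free fields (`e:...`, `k:...` and the parenthesised counter), which holds for the samples in the question but is not guaranteed in general. The function name `split_from_right` is hypothetical.

```python
def split_from_right(line):
    """Return (file_path, e_id) by peeling the last three fields off the line.

    Assumes the three trailing fields never contain whitespace.
    """
    # rsplit with maxsplit=3 splits at most three times, starting from the
    # right, so everything left over (spaces included) is the path.
    file_path, e_field, k_field, counter = line.strip().rsplit(None, 3)
    e_id = e_field.split(":", 1)[1]  # "e:224" -> "224"
    return file_path, e_id


print(split_from_right("/home/tools/AMS Provider/file3.txt e:224 k:dir (127/1)"))
print(split_from_right("/home/account validator e:227 k:dir (247/1)"))
```

The positional form `rsplit(None, 3)` also works on Python 2.7. A line with fewer than four fields raises `ValueError` on unpacking, so malformed lines would need their own handling.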
