
Python: open a CSV file, search for a pattern and strip everything else

I have a CSV file, ‘svclist.csv’, that contains a single-column list like this:

pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1
pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs

I need to strip everything from each line except the PL5 directory and the two digits in the last path component, so the result looks like this:

PL5,00
PL5,01

I started the code as follows:

clean_data = []
with open('svclist.csv', 'rt') as f:
    for line in f:
        # keep only the lines that point to a profile file
        if 'profile' in line:
            print(line, end='')

and I’m stuck here.

Thanks in advance for the help.


Answer

You can use the regular expression (PL5)[^/].{0,}([0-9]{2,2})

For an explanation of how it works, paste the regex into https://regexr.com, which breaks the pattern down piece by piece so you can adapt it as needed.
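Another way to see what each part does is to rewrite the same pattern with re.VERBOSE, which lets every piece carry its own comment. This is a sketch for illustration (the name PATTERN is arbitrary), and it also shows one caveat: .{0,} is greedy, so the second group captures the last pair of adjacent digits on the line, which is wrong for a hypothetical host name that ends in two digits.

```python
import re

# Same pattern as "(PL5)[^/].{0,}([0-9]{2,2})", written with re.VERBOSE
# so that whitespace is ignored and each piece can have a comment.
PATTERN = re.compile(
    r"""
    (PL5)        # group 1: the literal text PL5
    [^/]         # one non-slash character, so the PL5 in /usr/sap/PL5/ is skipped
    .*           # any characters, greedy (same as .{0,})
    ([0-9]{2})   # group 2: two digits (same as [0-9]{2,2})
    """,
    re.VERBOSE,
)

samples = [
    "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1",
    "pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs",
]
for s in samples:
    print(PATTERN.findall(s))  # [('PL5', '00')], then [('PL5', '01')]

# Greedy .* means group 2 is the LAST digit pair on the line.
# Hypothetical host name ending in two digits: the wrong pair is captured.
print(PATTERN.findall("pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd12"))  # [('PL5', '12')]
```

If host names like that can occur, anchoring the pattern on the underscore after the instance number (for example ending it with ([0-9]{2})_ ) avoids the problem.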

import re

test_string_list = ['pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1',
                    'pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs']

# group 1 captures "PL5", group 2 captures the two-digit number
regex = re.compile(r"(PL5)[^/].{0,}([0-9]{2,2})")
result = []
for test_string in test_string_list:
    match_array = regex.findall(test_string)
    if match_array:  # skip lines that do not match
        result.append(match_array[0])  # e.g. ('PL5', '00')

with open('outfile.csv', 'w') as f:
    for row in result:
        # join the tuple's items with a comma, e.g. PL5,00
        f.write(','.join(row) + '\n')

In the code above, findall returns a list of tuples, one per match, such as ('PL5', '00'). The first tuple for each line is appended to the result list, and lines that do not match are skipped. When writing, ','.join(row) turns each tuple into a line like PL5,00, and the lines are saved to ‘outfile.csv’.
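The code above works on a hard-coded list. A sketch that reads svclist.csv directly, line by line as in the question's loop, and writes the result with the csv module could look like this. It swaps in a stricter pattern anchored on the profile file name, assuming every name has the form <SID>_<letters><two digits>_<host> (as PL5_D00_s4prd1 and PL5_ASCS01_s4prdascs do); the helper names extract_instances and convert are hypothetical.

```python
import csv
import re
import tempfile
from pathlib import Path

# ASSUMPTION: profile names look like <SID>_<letters><2 digits>_<host>,
# e.g. PL5_D00_s4prd1 or PL5_ASCS01_s4prdascs.
PROFILE_RE = re.compile(r"/profile/([A-Z][A-Z0-9]{2})_[A-Z]+(\d{2})_")


def extract_instances(lines):
    """Return (SID, two-digit number) tuples for every matching line."""
    rows = []
    for line in lines:
        match = PROFILE_RE.search(line)
        if match:  # silently skip lines without a matching profile path
            rows.append(match.groups())
    return rows


def convert(in_path, out_path):
    """Read in_path line by line and write SID,number rows to out_path."""
    with open(in_path) as src:
        rows = extract_instances(src)
    # newline="" as recommended by the csv module docs
    with open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return rows


if __name__ == "__main__":
    # Demo on the question's two sample lines, inside a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "svclist.csv"
        dst = Path(tmp) / "outfile.csv"
        src.write_text(
            "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1\n"
            "pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs\n"
        )
        convert(src, dst)
        print(dst.read_text(), end="")  # PL5,00 and PL5,01 on separate lines
```

To run it on the real file, call convert('svclist.csv', 'outfile.csv'). Note that csv.writer ends each row with \r\n by default; pass lineterminator='\n' to csv.writer if you want plain Unix line endings.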

User contributions licensed under: CC BY-SA