I have a CSV file ‘svclist.csv’ that contains a single column, as follows:
pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1
pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs
I need to strip everything from each line except the PL5 directory and the two digits in the last path component, so that the result looks like this:
PL5,00
PL5,01
I started the code as follows:
clean_data = []
with open('svclist.csv', 'rt') as f:
    for line in f:
        if 'profile' in line:
            print(line, end='')
and I’m stuck here.
Thanks in advance for the help.
Answer
You can use this regular expression: (PL5)[^/].{0,}([0-9]{2,2})
For an explanation, copy the regex and paste it into https://regexr.com, which shows how each part of the pattern works so you can make any changes you need.
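As a rough offline alternative to pasting the pattern into a regex tester, the sketch below spells out the same expression piece by piece using re.VERBOSE. The helper name extract is made up for this illustration; the pattern itself is unchanged.

```python
import re

# Same pattern as in the answer, split into commented pieces with re.VERBOSE.
pattern = re.compile(r"""
    (PL5)          # group 1: the literal system ID
    [^/]           # next character must not be '/', which skips the /PL5/ directory
    .{0,}          # any run of characters, greedy (equivalent to .*)
    ([0-9]{2,2})   # group 2: exactly two digits (equivalent to [0-9]{2})
""", re.VERBOSE)


def extract(path):
    """Hypothetical helper: return (sid, number), or None if nothing matches."""
    match = pattern.search(path)
    return match.groups() if match else None


print(extract("pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1"))
print(extract("pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs"))
```

Because .{0,} is greedy, group 2 ends up being the last pair of consecutive digits on the line. That is fine for the two sample paths, but a host name that itself ends in two digits would be captured instead of the instance number.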
import re

test_string_list = [
    'pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1',
    'pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs',
]
regex = re.compile(r"(PL5)[^/].{0,}([0-9]{2,2})")

result = []
for test_string in test_string_list:
    matchArray = regex.findall(test_string)  # list of (group 1, group 2) tuples
    result.append(matchArray[0])             # e.g. ('PL5', '00')

with open('outfile.txt', 'w') as f:
    for row in result:
        f.write(','.join(row) + '\n')        # one "PL5,00"-style row per line

In the above code, I create an empty list to hold the tuples returned by findall, then write them to the file. Each tuple is joined with a comma rather than converted with str(row), because str(row) would leave the parentheses and quotes in the output. A newline is appended so that each result lands on its own line in ‘outfile.txt’.
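To connect this back to the question, here is a minimal sketch that reads the paths from a file instead of a hard-coded list. It swaps in a stricter pattern than the one in the answer, on the assumption that the last path component always has the form SID_NAMEnn_host (as in PL5_D00_s4prd1). The names PROFILE_RE, extract_rows and convert_file are invented for this example.

```python
import re

# Assumption: the profile file name looks like <SID>_<NAME><nn>_<host>,
# e.g. PL5_D00_s4prd1. This is a stricter variant, not the answer's pattern.
PROFILE_RE = re.compile(r"/profile/([A-Z0-9]+)_[A-Z]+([0-9]{2})_")


def extract_rows(lines):
    """Return 'SID,nn' strings for every line that matches PROFILE_RE."""
    rows = []
    for line in lines:
        match = PROFILE_RE.search(line)
        if match:  # blank or unrelated lines are skipped
            rows.append(",".join(match.groups()))
    return rows


def convert_file(in_path, out_path):
    """Read profile paths from in_path and write one 'SID,nn' row per line."""
    with open(in_path, "rt") as src:
        rows = extract_rows(src)
    with open(out_path, "w") as dst:
        dst.writelines(row + "\n" for row in rows)
    return rows
```

With the file from the question, this would be called as convert_file('svclist.csv', 'outfile.txt'). Anchoring on /profile/ and the underscores means the two digits are taken from the instance name rather than from whatever digits happen to appear last on the line.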