I have input data of the form:
[2] IN: 2.12 INOUT: 3.52 (Input)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)

[2] OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input)

[2] OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)
I need to parse this data and extract the IN: / OUT: / INOUT: values of each block, depending on which of three given regexes matches it:
regex1 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
regex2 = r"\[2\]\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
regex3 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s.*?.\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
My output should be:
IN_r1 2.12 INOUT_r1 3.52
IN_r3 2.12 INOUT1_r3 3.52 OUT_r3 2.42 INOUT2_r3 2.62
OUT_r2 2.42 INOUT_r2 2.62
IN_r3 2.12 INOUT1_r3 3.52 OUT_r3 2.42 INOUT2_r3 2.62
IN_r1 2.12 INOUT_r1 3.52
OUT_r2 2.42 INOUT_r2 2.62
IN_r3 2.12 INOUT1_r3 3.52 OUT_r3 2.42 INOUT2_r3 2.62
The code I tried:
import re

regex1 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
regex2 = r"\[2\]\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
regex3 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s.*?.\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"

data = """
[2] IN: 2.12 INOUT: 3.52 (Input)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)

[2] OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input)

[2] OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)
"""

lines = re.split(r"\[2\]", data)
for line in lines:
    if re.search(regex1, data):
        tracks = re.findall(regex1, data, re.DOTALL)
        for track in tracks:
            input, inout = (float(z) for z in track)
            with open("checked_ant.txt", 'a') as a:
                print("IN_r1", input, "INOUT_r1", inout, file=a)
    elif re.search(regex2, data):
        tracks = re.findall(regex2, data, re.DOTALL)
        for track in tracks:
            output, inout = (float(z) for z in track)
            with open("checked_ant.txt", 'a') as a:
                print("OUT_r2", output, "INOUT_r2", inout, file=a)
    elif re.search(regex3, data):
        tracks = re.findall(regex3, data, re.DOTALL)
        for track in tracks:
            input, inout1, output, inout2 = (float(z) for z in track)
            with open("checked_ant.txt", 'a') as a:
                print("IN_r3", input, "INOUT1_r3", inout1, "OUT_r3", output, "INOUT2_r3", inout2, file=a)
The problem is that it does not parse correctly: the regexes are not matched separately against each block of data beginning with [2].
Answer
Though I find the requirement quite strange (the regexes are provided and cannot be changed), I got the expected result. Can you try this?
import re

s = '''[2] IN: 2.12 INOUT: 3.52 (Input)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)

[2] OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input)

[2] OUT: 2.42 INOUT: 2.62 (Output)

[2] IN: 2.12 INOUT: 3.52 (Input) OUT: 2.42 INOUT: 2.62 (Output)'''

r1 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
r2 = r"\[2\]\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
r3 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s.*?.\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"


def g(reg, s, n):
    # Return capture group n of the first match of reg in s, as a float.
    return float(re.search(reg, s).group(n))


# Blocks are separated by blank lines; test each block on its own.
paras = s.split('\n\n')
for p in paras:
    # Test r3 first: r1 also matches the start of a combined IN/OUT block.
    if re.search(r3, p):
        print(
            f'IN_r3 {g(r3, p, 1)} INOUT1_r3 {g(r3, p, 2)} OUT_r3 {g(r3, p, 3)} INOUT2_r3 {g(r3, p, 4)}')
    elif re.search(r1, p):
        print(f'IN_r1 {g(r1, p, 1)} INOUT_r1 {g(r1, p, 2)}')
    elif re.search(r2, p):
        print(f'OUT_r2 {g(r2, p, 1)} INOUT_r2 {g(r2, p, 2)}')
Update
For better performance, you can run each search only once and read the groups from the resulting match object. Taking r1 as an example:
gs = re.search(r1, p)
if gs:
    print(f'IN_r1 {gs.group(1)} INOUT_r1 {gs.group(2)}')
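The same match-once idea can be applied to all three regexes. This is only a sketch, not a definitive solution. The helper `classify` and the `RULES` table are hypothetical names introduced here, and the regexes are assumed to carry the usual backslashes (`\[`, `\s`, `\S`). Instead of splitting on blank lines, it takes each block from one `[2]` marker up to the next, and it tries the combined pattern first because the IN-only pattern also matches the start of a combined block.

```python
import re

# The three given regexes, unchanged apart from the backslashes being written out.
R1 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
R2 = r"\[2\]\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"
R3 = r"\[2\]\s*IN:\s*(\S+?)\s*INOUT:\s*(\S+?)\s.*?.\s*OUT:\s*(\S+?)\s*INOUT:\s*(\S+?)\s"

# (compiled pattern, output labels), most specific pattern first.
RULES = [
    (re.compile(R3), ("IN_r3", "INOUT1_r3", "OUT_r3", "INOUT2_r3")),
    (re.compile(R1), ("IN_r1", "INOUT_r1")),
    (re.compile(R2), ("OUT_r2", "INOUT_r2")),
]

# One block = a "[2]" marker plus everything up to the next marker or end of text.
BLOCK = re.compile(r"\[2\].*?(?=\[2\]|\Z)", re.DOTALL)


def classify(text):
    """Return one output line per block (hypothetical helper)."""
    lines = []
    for block in BLOCK.findall(text):
        for pattern, labels in RULES:
            m = pattern.search(block)  # each pattern is searched at most once per block
            if m:
                pairs = zip(labels, m.groups())
                lines.append(" ".join(f"{label} {value}" for label, value in pairs))
                break  # the first matching rule wins
    return lines
```

Calling `print("\n".join(classify(s)))` on the data above prints one line per block. The values are printed as the captured strings; wrap them in `float()` if numbers are needed.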