I have a text file where I have several lines and between them, some data which I need to convert to the dataframe(useful data).
I iterated the text file line by line and captured the useful data with the help of a regex.
Something like this,
pattern = r'^(s)(d+)(s+)(d)(s+)(w+)(s+)(w+)(.*)'
capture_data = []
with open(file,'r') as file_obj:
lineList = file_obj.readlines()
for line in lineList:
info_list = re.search(pattern, line)
if info_list is not None:
capture_data.append(line)
The data captured look like this
' 100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0000 V -427.9508 mV -300.0000 mV -100.0000 uA 0 n'
' 100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0000 V -430.3089 mV -300.0000 mV -100.0000 uA 0 n'
I thought to iterate each captured row and split on the basis of whitespace, but the issue is, the units have white space in between them and the value, for example….
-300.0000 mV, -100.0000 uA etc
Also one more issue is the trailing newline character, it is also been treated as a new element in the .split(” “).
Can someone please help to find some smarter way to do this?
All I want is to have the values as a separate column value.
For example in the first string,
100 becomes 1st col, 0 – 2nd, PASS – 3rd, Continuity_PPMU_mV – 4th, etc…
Thanks.
Edit:
The raw data somewhat look like this –
Site Number:
0, 1, 2, 3
Device#: 1-4
*********************************************************************
FT45434HAP PQF64 Test @ RHC
*********************************************************************
---------------------------Continuity Test---------------------------
Number Site Result Test Name Pin Channel Low Measured High Force Loc
100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0000 V -427.9508 mV -300.0000 mV -100.0000 uA 0
100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0000 V -430.3089 mV -300.0000 mV -100.0000 uA 0
100 2 PASS Continuity_PPMU_mV XSCI 76 -1.0000 V -430.7492 mV -300.0000 mV -100.0000 uA 0
100 3 PASS Continuity_PPMU_mV XSCI 204 -1.0000 V -431.0482 mV -300.0000 mV -100.0000 uA 0
101 0 PASS Continuity_PPMU_mV XSCO 139 -1.0000 V -456.0359 mV -300.0000 mV -100.0000 uA 0
101 1 PASS Continuity_PPMU_mV XSCO 11 -1.0000 V -458.0605 mV -300.0000 mV -100.0000 uA 0
101 2 PASS Continuity_PPMU_mV XSCO 75 -1.0000 V -457.8564 mV -300.0000 mV -100.0000 uA 0
EDIT:
The top rows are not fixed, they are dynamically generated. Also, some other text data can appear in between the relevant data, like between two useful rows. So, I don’t think skipping rows will work here.
Advertisement
Answer
- Read the file and look for the row the starts with
'Number'
, and then append those rows after that todata
. - In the data rows, only the units are separated by a space.
- It’s better to have the unit separate from the numeric value, so we can split the rows on spaces.
- Create a new header, with new columns for the units.
- This will allow the numeric values to be interpreted as floats.
import pandas as pd
import seaborn as sns
# read the file in
data = list()
with open('test.txt', 'r') as f:
rows = f.readlines()
flag = False # flag to True once the header row with Number is found
for row in rows:
row = row.strip()
if row.startswith('Number'):
flag = True
continue # after the header row is found, skip it
if flag:
data.append(row.split()) # append rows after the header to data
# create a custom header where the unites have been added as column headers
header = ['Number', 'Site', 'Result', 'Test_Name', 'Pin', 'Channel', 'Low', 'U1', 'Measured', 'U2', 'High', 'U3', 'Force', 'U4', 'Loc']
# create the dataframe
df = pd.DataFrame(data, columns=header)
# save to csv
df.to_csv('file.csv', index=False)
# convert columns to numeric dtypes
df = df.apply(pd.to_numeric, errors='ignore')
# scale the columns as per their units
df.Measured = df.Measured.div(1000)
df.High = df.High.div(1000)
df.Force = df.Force.div(100000)
# display(df)
Number Site Result Test_Name Pin Channel Low U1 Measured U2 High U3 Force U4 Loc
0 100 0 PASS Continuity_PPMU_mV XSCI 140 -1.0 V -0.427951 mV -0.3 mV -0.001 uA 0
1 100 1 PASS Continuity_PPMU_mV XSCI 12 -1.0 V -0.430309 mV -0.3 mV -0.001 uA 0
2 100 2 PASS Continuity_PPMU_mV XSCI 76 -1.0 V -0.430749 mV -0.3 mV -0.001 uA 0
3 100 3 PASS Continuity_PPMU_mV XSCI 204 -1.0 V -0.431048 mV -0.3 mV -0.001 uA 0
4 101 0 PASS Continuity_PPMU_mV XSCO 139 -1.0 V -0.456036 mV -0.3 mV -0.001 uA 0
5 101 1 PASS Continuity_PPMU_mV XSCO 11 -1.0 V -0.458060 mV -0.3 mV -0.001 uA 0
6 101 2 PASS Continuity_PPMU_mV XSCO 75 -1.0 V -0.457856 mV -0.3 mV -0.001 uA 0
# plot
ax = sns.lineplot(data=df.iloc[:, 6:-2])
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')