I’m currently working on a script in python. I want to convert an xls file into a txt file but I also want to clean and manage the data. In the xls files, there’s 4 columns which does interest me. Here is a sample of the txt I get from the conversion :
JavaScript
x
5
1
OPEN 0 a_inst0 signal_a
2
OPEN 0 b_inst0 signal_b
3
a_inst0 signal_c OPEN 0
4
c_inst0 signal_d OPEN 0
5
To get this result I used this script :
JavaScript
1
31
31
1
import re
2
3
# Function to convert
4
def listToStringOpen(s):
5
6
str1 = ""
7
8
for ele in s:
9
str1 += ele
10
str1 += "n"
11
12
return str1
13
14
import pandas as pd
15
16
df = pd.read_excel('my_xls.xlsm', sheet_name='Sheet1', usecols="A:D")
17
with open('my_txt.txt', 'w', encoding ='utf-8', errors='ignore') as outfile:
18
df.to_string(outfile)
19
20
with open('my_txt.txt', 'r', encoding ='utf-8', errors='ignore') as f:
21
data = f.readlines()
22
23
DONOTIGNORE2 = 'OPEN'
24
cleaned_lines = []
25
for line in data:
26
if (DONOTIGNORE2 not in line) :
27
continue
28
cleaned_lines.append(line.rstrip())
29
with open('result_open.txt', 'w', encoding ='utf-8', errors='ignore') as f:
30
f.write(listToStringOpen(cleaned_lines))
31
That’s a good beginning but now I’d like to remove the “OPEN 0” of each file but I don’t know how to do it because it could be placed eather in the A:B columns or in the C:D columns. Also the result in the txt is not vertically aligned :( . Do you have any ideas ?
Thanks a lot
Advertisement
Answer
Please find below the updated code. This is the input I took, added a couple of rows to ensure it works fine..
Few changes:
- read_excel has
header=None
as your data did not have header - While writing to my_txt, I added
index=False, header=False
so that there is no index added to the txt file and no headers. This way your output file will only have data - I used the dataframe to clean the OPEN and ‘0’ when they are in columns A:B or C:D ONLY. Otherwise, leave them alone.
JavaScript
1
37
37
1
import re
2
3
# Function to convert
4
def listToStringOpen(s):
5
6
str1 = ""
7
8
for ele in s:
9
str1 += ele
10
str1 += "n"
11
12
return str1
13
14
import pandas as pd
15
16
df = pd.read_excel('my_xls.xlsm', sheet_name='Sheet1', usecols="A:D", header=None)
17
with open('my_txt.txt', 'w', encoding ='utf-8', errors='ignore') as outfile:
18
df.to_string(outfile, index=False, header=False)
19
20
with open('my_txt.txt', 'r', encoding ='utf-8', errors='ignore') as f:
21
data = f.readlines()
22
23
df = df.astype(str)
24
cleaned_lines = []
25
row = 0
26
for row in range(len(df)):
27
print(df.iloc[row,:][0], df.iloc[row,:][1], df.iloc[row,:][2], df.iloc[row,:][3])
28
if ((df.iloc[row,:][0].strip() == 'OPEN') and (df.iloc[row,:][1].strip() == "0")) :
29
df.iloc[row,:][0] = ""
30
df.iloc[row,:][1] = ""
31
elif ((df.iloc[row,:][2].strip() == 'OPEN') and (df.iloc[row,:][3].strip() == "0")):
32
df.iloc[row,:][2] = ""
33
df.iloc[row,:][3] = ""
34
35
with open('result_open.txt', 'w', encoding ='utf-8', errors='ignore') as f:
36
df.to_string(f, index=False, header=False)
37