Skip to content
Advertisement

Remove specific string char at the beginning of each lines of a txt file using python

I’m currently working on a script in python. I want to convert an xls file into a txt file but I also want to clean and manage the data. In the xls files, there’s 4 columns which does interest me. Here is a sample of the txt I get from the conversion :

OPEN                              0      a_inst0               signal_a
OPEN                              0      b_inst0               signal_b
a_inst0                           signal_c OPEN                0
c_inst0                           signal_d OPEN                0

To get this result I used this script :

import re

# Function to convert  
def listToStringOpen(s): 
    
    str1 = "" 
    
    for ele in s: 
        str1 += ele
        str1 += "n"
            
    return str1

import pandas as pd

df = pd.read_excel('my_xls.xlsm', sheet_name='Sheet1', usecols="A:D")
with open('my_txt.txt', 'w', encoding ='utf-8', errors='ignore') as outfile:
    df.to_string(outfile)

with open('my_txt.txt', 'r', encoding ='utf-8', errors='ignore') as f:
    data = f.readlines()

DONOTIGNORE2 = 'OPEN'
cleaned_lines = []
for line in data:
    if (DONOTIGNORE2 not in line) :
        continue
    cleaned_lines.append(line.rstrip())    
with open('result_open.txt', 'w', encoding ='utf-8', errors='ignore') as f:
    f.write(listToStringOpen(cleaned_lines))

That’s a good beginning but now I’d like to remove the “OPEN 0” of each file but I don’t know how to do it because it could be placed eather in the A:B columns or in the C:D columns. Also the result in the txt is not vertically aligned :( . Do you have any ideas ?

Thanks a lot

Advertisement

Answer

Please find below the updated code. This is the input I took, added a couple of rows to ensure it works fine..

input xlsx

Few changes:

  1. read_excel has header=None as your data did not have header
  2. While writing to my_txt, I added index=False, header=False so that there is no index added to the txt file and no headers. This way your output file will only have data
  3. I used the dataframe to clean the OPEN and ‘0’ when they are in columns A:B or C:D ONLY. Otherwise, leave them alone.
import re

# Function to convert  
def listToStringOpen(s): 
    
    str1 = "" 
    
    for ele in s: 
        str1 += ele
        str1 += "n"
            
    return str1

import pandas as pd

df = pd.read_excel('my_xls.xlsm', sheet_name='Sheet1', usecols="A:D", header=None)
with open('my_txt.txt', 'w', encoding ='utf-8', errors='ignore') as outfile:
    df.to_string(outfile, index=False, header=False)

with open('my_txt.txt', 'r', encoding ='utf-8', errors='ignore') as f:
    data = f.readlines()

df = df.astype(str)
cleaned_lines = []
row = 0
for row in range(len(df)):
    print(df.iloc[row,:][0], df.iloc[row,:][1], df.iloc[row,:][2], df.iloc[row,:][3])
    if ((df.iloc[row,:][0].strip() == 'OPEN') and (df.iloc[row,:][1].strip() == "0")) :
        df.iloc[row,:][0] = ""
        df.iloc[row,:][1] = ""
    elif ((df.iloc[row,:][2].strip() == 'OPEN') and (df.iloc[row,:][3].strip() == "0")):
        df.iloc[row,:][2] = ""
        df.iloc[row,:][3] = ""

with open('result_open.txt', 'w', encoding ='utf-8', errors='ignore') as f:
    df.to_string(f, index=False, header=False)
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement