I have an text file as below format. I want to read through all the records in the file and output in a dataframe.
NEW ACCOUNT ABC COMPANY 00123 CCY/BALANCE USD 3,600 ACCOUNT APPROVAL ABC COMPANY 00123 NEW ACCOUNT BBC COMPANY 00124 CCY/BALANCE USD 5,600
Expected output:
TRAN DESCRIPTION CUSTOMER NAME A/C NO. CCY BALANCE NEW ACCOUNT ABC COMPANY 00123 USD 3,600.00 ACCOUNT APPROVAL ABC COMPANY 00123 NEW ACCOUNT BBC COMPANY 00124 USD 5,600.00
There will be two types of trans description. Code I am trying as below, but it only works for one line of the text file. How can I modify to read through all the records in the files? Thanks!
text = ‘NEW ACCOUNT ABC COMPANY 00123’ sep = ' ' lst = text.split(sep) while(' ' in lst) : lst.remove(' ') lst = np.array(lst).reshape(1,3) df = pd.DataFrame(lst,columns =['TRAN DESCRIPTION', 'CUSTOMER NAME', 'A/C NO.'])
Advertisement
Answer
Try this :
import pandas as pd import numpy as np from io import StringIO t = """NEW ACCOUNT ABC COMPANY 00123 CCY/BALANCE USD 3,600 ACCOUNT APPROVAL ABC COMPANY 00123 NEW ACCOUNT BBC COMPANY 00124 CCY/BALANCE USD 5,600""" names=['TRAN DESCRIPTION', 'CUSTOMER NAME', 'A/C NO.'] df = pd.read_fwf(StringIO(t), header=None, names=names) # or df = pd.read_fwf(r'path_to_your_textfile.txt', header=None, names=names) df['CCY/BALANCE'] = np.where(df['CUSTOMER NAME'] == 'CCY/BALANCE', df['A/C NO.'], np.nan) df['CCY/BALANCE'] = df['CCY/BALANCE'].shift(-1) out = df[df['TRAN DESCRIPTION'].notna()].reset_index(drop=True)