Advertisement

Read through a text file and output to a dataframe by Python

dataframe pandas python

chilleung6

asked 13 Sep, 2022

I have an text file as below format. I want to read through all the records in the file and output in a dataframe.

NEW ACCOUNT       ABC COMPANY  00123

                  CCY/BALANCE  USD 3,600

ACCOUNT APPROVAL  ABC COMPANY  00123

NEW ACCOUNT       BBC COMPANY  00124

                  CCY/BALANCE  USD 5,600

JavaScript
​x
 
NEW ACCOUNT       ABC COMPANY  00123
​
                  CCY/BALANCE  USD 3,600
​
ACCOUNT APPROVAL  ABC COMPANY  00123
​
NEW ACCOUNT       BBC COMPANY  00124
​
                  CCY/BALANCE  USD 5,600
​

Expected output:

TRAN DESCRIPTION CUSTOMER NAME  A/C NO. CCY BALANCE
NEW ACCOUNT      ABC COMPANY    00123   USD 3,600.00
ACCOUNT APPROVAL ABC COMPANY    00123        
NEW ACCOUNT      BBC COMPANY    00124   USD 5,600.00

JavaScript
 
TRAN DESCRIPTION CUSTOMER NAME  A/C NO. CCY BALANCE
NEW ACCOUNT      ABC COMPANY    00123   USD 3,600.00
ACCOUNT APPROVAL ABC COMPANY    00123        
NEW ACCOUNT      BBC COMPANY    00124   USD 5,600.00
​

There will be two types of trans description. Code I am trying as below, but it only works for one line of the text file. How can I modify to read through all the records in the files? Thanks!

text = ‘NEW ACCOUNT  ABC COMPANY  00123’
sep = '  '
lst = text.split(sep)
while(' ' in lst) :
   lst.remove(' ')
lst = np.array(lst).reshape(1,3)

df = pd.DataFrame(lst,columns =['TRAN DESCRIPTION', 'CUSTOMER NAME', 'A/C NO.'])

JavaScript
 
text = ‘NEW ACCOUNT  ABC COMPANY  00123’
sep = '  '
lst = text.split(sep)
while(' ' in lst) :
   lst.remove(' ')
lst = np.array(lst).reshape(1,3)
​
df = pd.DataFrame(lst,columns =['TRAN DESCRIPTION', 'CUSTOMER NAME', 'A/C NO.'])
​

Advertisement

Answer

Try this :

import pandas as pd
import numpy as np
from io import StringIO

t = """NEW ACCOUNT       ABC COMPANY  00123

                  CCY/BALANCE  USD 3,600

ACCOUNT APPROVAL  ABC COMPANY  00123

NEW ACCOUNT       BBC COMPANY  00124

                  CCY/BALANCE  USD 5,600"""

names=['TRAN DESCRIPTION', 'CUSTOMER NAME', 'A/C NO.']
df = pd.read_fwf(StringIO(t), header=None, names=names)
# or df = pd.read_fwf(r'path_to_your_textfile.txt', header=None, names=names)

df['CCY/BALANCE'] = np.where(df['CUSTOMER NAME'] == 'CCY/BALANCE', df['A/C NO.'], np.nan)
df['CCY/BALANCE'] = df['CCY/BALANCE'].shift(-1)

out = df[df['TRAN DESCRIPTION'].notna()].reset_index(drop=True)

JavaScript
 
import pandas as pd
import numpy as np
from io import StringIO
​
t = """NEW ACCOUNT       ABC COMPANY  00123
​
                  CCY/BALANCE  USD 3,600
​
ACCOUNT APPROVAL  ABC COMPANY  00123
​
NEW ACCOUNT       BBC COMPANY  00124
​
                  CCY/BALANCE  USD 5,600"""
​
names=['TRAN DESCRIPTION', 'CUSTOMER NAME', 'A/C NO.']
df = pd.read_fwf(StringIO(t), header=None, names=names)
# or df = pd.read_fwf(r'path_to_your_textfile.txt', header=None, names=names)
​
df['CCY/BALANCE'] = np.where(df['CUSTOMER NAME'] == 'CCY/BALANCE', df['A/C NO.'], np.nan)
df['CCY/BALANCE'] = df['CCY/BALANCE'].shift(-1)
​
out = df[df['TRAN DESCRIPTION'].notna()].reset_index(drop=True)
​

`>>> display(out)`

Advertisement