I have a huge CSV file with sample data that looks like so:
"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags" "Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";"" "";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";"" "";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";"" "";"";"Capital One Quicksilver";"";"Gas";""USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""
The original CVS file had some unnecessary characters that I removed to obtain the data as shown above using the following code:
import matplotlib.pyplot as plt import pandas as pd import numpy as np import scipy as sp text = open("report.csv", "r") text = ''.join([i for i in text]) .replace('old', 'new') x = open("report_mod.csv","w") x.writelines(text) x.close()
Where I’m stuck now is, how do I replace the double quotes (“”) with single quotes (“) for all the entries of the field column Payee
?
In the above example, the 3 entries for the Payee
is ""Carfax"
, ""Carfax"
, and ""USA Petroleum"
. I would like to replace the double quotes at the beginning with single quotes, i.e. "Carfax"
, "Carfax"
, and "USA Petroleum"
The new CSV file should look like so:
"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags" "Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";"" "";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";"" "";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";"" "";"";"Capital One Quicksilver";"";"Gas";"USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""
Sample data file: report.csv
Advertisement
Answer
You can use regex maybe
import re text = re.sub('^""$', '"' ,text)
so full code must be like this:
import matplotlib.pyplot as plt import pandas as pd import numpy as np import scipy as sp import re # reading CSV file data = pd.read_csv("report.csv",delimiter=';') for val in data['Payee']: val = str (val) newVal = re.sub(r'"', '' ,val) newVal = '"'+newVal+'"' print(newVal)
the output is this on my terminal:
"nan" "Carfax" "Carfax" "USA Petroleum"
Edit: Add full code to create the file
import matplotlib.pyplot as plt import pandas as pd import numpy as np import scipy as sp import re # reading CSV file data = pd.read_csv("report.csv",delimiter=';') names = data['Name'].tolist(); balances = data['Current balance'].tolist(); accounts = data['Account'].tolist(); transfers = data['Transfers'].tolist(); descriptions = data['Description'].tolist(); categories = data['Category'].tolist(); dates = data['Date'].tolist(); memos = data['Memo'].tolist(); amount = data['Amount'].tolist(); currency = data['Currency'].tolist(); check = data['Check #'].tolist(); tags = data['Tags'].tolist(); counter = 0 f = open("report_modified.csv", "w+") f.write('"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"n'); for val in data['Payee']: val = str (val) newVal = re.sub(r'"', '' ,val) newVal = '"'+newVal+'"' print(newVal) f.write(str(names[counter])+';') f.write(str(balances[counter])+';') f.write(str(accounts[counter])+';') f.write(str(transfers[counter])+';') f.write(str(descriptions[counter])+';') f.write(str(newVal)+';') f.write(str(categories[counter])+';') f.write(str(dates[counter])+';') f.write(str(memos[counter])+';') f.write(str(amount[counter])+';') f.write(str(currency[counter])+';') f.write(str(check[counter])+';') f.write(str(tags[counter])+'n') f.close()