I have a huge CSV file with sample data that looks like so:
"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"
"Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";""
"";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"Gas";""USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""
The original CSV file had some unnecessary characters, which I removed to obtain the data shown above using the following code:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp

# read the whole file, replace the unwanted text, and write the result out
text = open("report.csv", "r")
text = ''.join([i for i in text]).replace('old', 'new')
x = open("report_mod.csv", "w")
x.writelines(text)
x.close()
Where I'm stuck now: how do I replace the doubled quote ("") with a single quote character (") for all entries of the Payee column?
In the example above, the three Payee entries are ""Carfax", ""Carfax", and ""USA Petroleum". I would like to replace the doubled quote at the beginning of each with a single one, i.e. "Carfax", "Carfax", and "USA Petroleum".
The new CSV file should look like so:
"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"
"Capital One Quicksilver";"-119.99";"USD";"";"";"";"";";"";"";";"";""
"";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
"";"";"Capital One Quicksilver";"";"Gas";"USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""
Sample data file: report.csv
Answer
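You could use a regex, for example:
import re
# Collapse a doubled quote that directly precedes a letter or digit;
# empty fields (""), which are followed by ; or a line end, are left alone.
text = re.sub(r'""(?=\w)', '"', text)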
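As a minimal sketch of a whole-text substitution, the snippet below collapses a doubled quote only when it is immediately followed by a letter or digit. The function name `fix_doubled_quotes` is hypothetical, and the approach assumes the file never contains legitimately escaped quotes inside a field (for example `"He said ""hi"""`), which this pattern would also alter.

```python
import re


def fix_doubled_quotes(text: str) -> str:
    """Collapse a doubled quote that directly precedes a word character."""
    # `""` followed by a letter, digit, or underscore is treated as a stray
    # opening quote. Empty fields (`""` followed by `;` or a line end) do not
    # match the lookahead and are left untouched.
    return re.sub(r'""(?=\w)', '"', text)


# One data row copied from the question's sample.
sample = '"";"";"Capital One Quicksilver";"";"DMV";""Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""\n'
print(fix_doubled_quotes(sample), end="")
# prints: "";"";"Capital One Quicksilver";"";"DMV";"Carfax";"";"08/19/2004";"";"-24.99";"USD";"";""
```

The same function could be applied to the full text read from report.csv before it is written back out.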
The full code could look like this:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import re

# reading CSV file
data = pd.read_csv("report.csv", delimiter=';')

for val in data['Payee']:
    val = str(val)
    # strip every quote character, then wrap the value in one pair of quotes
    newVal = re.sub(r'"', '', val)
    newVal = '"' + newVal + '"'
    print(newVal)
The output on my terminal is:
"nan"
"Carfax"
"Carfax"
"USA Petroleum"
Edit: added the full code to create the file:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy as sp
import re

# reading CSV file
data = pd.read_csv("report.csv", delimiter=';')
names = data['Name'].tolist()
balances = data['Current balance'].tolist()
accounts = data['Account'].tolist()
transfers = data['Transfers'].tolist()
descriptions = data['Description'].tolist()
categories = data['Category'].tolist()
dates = data['Date'].tolist()
memos = data['Memo'].tolist()
amount = data['Amount'].tolist()
currency = data['Currency'].tolist()
check = data['Check #'].tolist()
tags = data['Tags'].tolist()

counter = 0
f = open("report_modified.csv", "w+")
# write the header row
f.write('"Name";"Current balance";"Account";"Transfers";"Description";"Payee";"Category";"Date";"Memo";"Amount";"Currency";"Check #";"Tags"\n')
# note: str() writes empty cells (NaN) as nan, and only Payee is re-quoted
for val in data['Payee']:
    val = str(val)
    # strip every quote character, then wrap the value in one pair of quotes
    newVal = re.sub(r'"', '', val)
    newVal = '"' + newVal + '"'
    print(newVal)
    f.write(str(names[counter]) + ';')
    f.write(str(balances[counter]) + ';')
    f.write(str(accounts[counter]) + ';')
    f.write(str(transfers[counter]) + ';')
    f.write(str(descriptions[counter]) + ';')
    f.write(str(newVal) + ';')
    f.write(str(categories[counter]) + ';')
    f.write(str(dates[counter]) + ';')
    f.write(str(memos[counter]) + ';')
    f.write(str(amount[counter]) + ';')
    f.write(str(currency[counter]) + ';')
    f.write(str(check[counter]) + ';')
    f.write(str(tags[counter]) + '\n')
    counter += 1  # advance to the next row
f.close()
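If only the Payee column should be touched and every other field should stay byte-for-byte as it was, a line-based alternative that avoids pandas might look like the sketch below. The names `PAYEE_INDEX`, `fix_payee_field`, and `fix_file` are hypothetical, and the approach assumes no field value contains a `;`, so a plain split on the delimiter is safe.

```python
PAYEE_INDEX = 5  # "Payee" is the sixth column in the header from the question


def fix_payee_field(line: str, index: int = PAYEE_INDEX) -> str:
    """Drop the extra leading quote from one field of a ';'-separated line."""
    fields = line.rstrip("\n").split(";")
    # Only touch the target field, and only when it starts with `""` and has
    # more text after it (an empty field is exactly `""` and is kept).
    if len(fields) > index:
        value = fields[index]
        if value.startswith('""') and len(value) > 2:
            fields[index] = value[1:]
    return ";".join(fields)


def fix_file(src: str, dst: str) -> None:
    """Copy src to dst line by line, repairing the Payee field on each line."""
    with open(src, "r") as fin, open(dst, "w") as fout:
        for line in fin:
            # Every output line gets a trailing newline, including the last.
            fout.write(fix_payee_field(line) + "\n")


# One data row copied from the question's sample.
row = '"";"";"Capital One Quicksilver";"";"Gas";""USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""'
print(fix_payee_field(row))
# prints: "";"";"Capital One Quicksilver";"";"Gas";"USA Petroleum";"";"09/13/2004";"";"-20.43";"USD";"";""
```

Calling `fix_file("report.csv", "report_modified.csv")` would then write the repaired copy. Because the other fields are never parsed, their quoting and empty values are preserved as they appear in the input.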