I’m writing a script that combine csv into one and then do a find an replace for specific formatting and ask the user to correct the ones that not match.
import glob import re import csv read_files = glob.glob("*.txt") final_file = 'TABS3.ASC' strformat1 = re.compile(r'd{3}.d{5}') #combine all files into one with open(final_file, "wb") as outfile: for f in read_files: with open(f, "rb") as infile: outfile.write(infile.read()) #read new file and check for matching string with open(final_file, newline='') as fin: check = list(csv.reader(fin)) for row in check: if strformat1.match(row[0]): pass else: show = print (row[0]) find = row[0] replace = input ('How do you want to change: ') check = ''.join(str(i) for i in check) .replace(str(find), str(replace)) with open(final_file, "w") as final: final.write(check)
The output is not on the right csv format
['710.00501', '79', '1', 'L140', 'A110', '10/10/2022', '0.60', '', '', ""]['774.00006', '79', '1', 'L160', 'A103', '10/10/2022', '0.50', '', '', '']['714.00150', '79', '1', 'L140', 'A110', '10/10/2022', '4.00', '', '', '']
It should be
"710.00501", "79", "1", "L140", "A110", "10/10/2022", "0.60", "", "", "" "774.00006", "79", "1", "L160", "A103", "10/10/2022", "0.50", "", "", "" "714.00150", "79", "1", "L140", "A110", "10/10/2022", "4.00", "", "", ""
Advertisement
Answer
This problems has a number of different parts, and you’ve made a good attempt :)
I recommend breaking down the problem into three distinct steps, and chunks of code:
- Read all the txt files, as CSVs, and add each set of rows to a list of all rows.
- Iterate all the rows and modify it to fit your needs.
- Finally, write all the rows out as CSV.
It gives me the benefit of being able to print out the data and visualize what just happened; it also keeps the code less indented and easier to read.
I mocked up three very small input TXT files based on the data in your post:
input1.txt ========== 710.00501,79,1,L140,A110,10/10/2022,0.60,,, 999.44,22,2,Z333,Y111,10/10/2022,0.66,,, input2.txt ========== 774.00006,79,1,L160,A103,10/10/2022,0.50,,, input3.txt ========== 714.00150,79,1,L140,A110,10/10/2022,4.00,,,
I also added a line that won’t match your regex, to exercise the code in step 2.
Get all the rows
I say “as CSV” because I see you opening the files as text to collect all the rows; and also as text to write a row out. This might work, but if the files don’t have a proper, final newline, then you’ll end up merging the last line of the current file and the first line of the next file, like:
999.44,22,2,Z333,Y111,10/10/2022,0.66,,,774.00006,79,1,L160,A103,10/10/2022,0.50,,,
It’s always safer to use the csv readers/writers on CSV-like files: they will see and respect distinct rows of data, as oppossed to just lines of text.
all_rows = [] for fname in glob.glob("input*.txt"): with open(fname, newline="") as f: reader = csv.reader(f) all_rows.extend(list(reader)) pprint.pprint(all_rows)
The csv docs recommend using newline=”” in all instances of reading and writing.
Run that and we can see all the rows:
[['710.00501', '79', '1', 'L140', 'A110', '10/10/2022', '0.60', '', '', ''], [ '999.44', '22', '2', 'Z333', 'Y111', '10/10/2022', '0.66', '', '', ''], ['774.00006', '79', '1', 'L160', 'A103', '10/10/2022', '0.50', '', '', ''], ['714.00150', '79', '1', 'L140', 'A110', '10/10/2022', '4.00', '', '', '']]
This printout also helps us visualize that the csv reader thinks of a row as a list of strings; all_rows is a list of a list of strings.
Next, dealing with data like ‘999.44’.
Modify rows selectively
I’m not exactly sure what you’re trying to do with that join() statement and the replace variable. The following code shows how to change just the first column of a row that doesn’t match your regex.
I like to check if a row matches some condition which means I should ignore it, then use the continue keyword to jump to the next row; it removes a level of indentation and avoids an else statement, which I usually find preferable:
num_re = re.compile(r"d{3}.d{5}") for row in all_rows: some_num = row[0] if num_re.match(some_num): continue change = input(f"n'{some_num}' doesn't match, change it to: ") row[0] = change.strip() print() pprint.pprint(all_rows)
Running that, I can change ‘999.44’ to ‘999.44009’ (again, not sure if that’s what you actually need):
'999.44' doesn't match, change it to: 999.44009 [['710.00501', '79', '1', 'L140', 'A110', '10/10/2022', '0.60', '', '', ''], ['999.44009', '22', '2', 'Z333', 'Y111', '10/10/2022', '0.66', '', '', ''], ['774.00006', '79', '1', 'L160', 'A103', '10/10/2022', '0.50', '', '', ''], ['714.00150', '79', '1', 'L140', 'A110', '10/10/2022', '4.00', '', '', '']]
If you want to replace some piece of data in every column for the entire row, I’d use a traditional for-loop over every column:
for i, col in enumerate(row): row[i] = col.replace("foo", "bar")
That’s very easy to read and make correct.
Write all the rows
We’ve been able to visualize the data at every step and ensure its correctness. Writing it out is just a few lines, now:
with open("output.csv", "w", newline="") as f: writer = csv.writer(f) writer.writerows(all_rows)
My output.csv looks like:
710.00501,79,1,L140,A110,10/10/2022,0.60,,, 999.44009,22,2,Z333,Y111,10/10/2022,0.66,,, 774.00006,79,1,L160,A103,10/10/2022,0.50,,, 714.00150,79,1,L140,A110,10/10/2022,4.00,,,
Putting it all together, final thoughts
Without all the debug-print statements, solving the problem looks rather compact:
all_rows = [] for fname in glob.glob("input*.txt"): with open(fname, newline="") as f: reader = csv.reader(f) all_rows.extend(list(reader)) num_re = re.compile(r"d{3}.d{5}") for row in all_rows: some_num = row[0] if num_re.match(some_num): continue change = input(f"n'{some_num}' doesn't match, change it to: ") row[0] = change.strip() with open("output.csv", "w", newline="") as f: writer = csv.writer(f) writer.writerows(all_rows)
I also saw you do this in your attempt:
... check = list(csv.reader(fin)) for row in check: ...
We can use the list() function on a reader to get all the rows for later processing, like we did for all_rows. If we want to just step over each row, one at time (and not save every row in memory), we can use a for-loop directly on the reader:
reader = csv.reader(fin) for row in reader: ...