Skip to content
Advertisement

insert line break in python file after a fix number of elements to delimit columns in a csv file

I´ve been struggling a bit to find a way in python to force this file to create a jump to a new line after some number of elements (equal to the number of columns I will need to add which is 12) the CSV currently looks like this. enter image description here

the text of the first row looks like this .

D276″,31386,10610,12122021 00:00:47840 85,0.00+842646,M000395708109323,ACTIVE CARD CHECK,844-6593879,NY,59655,840 6511011091718056,D276,31386,10610,12122021 00:00:59840Y00,5.36-842647,M527021000201360,Etsy.com – TheCraftyCa Brooklyn,NY,56995,840 6511011091718056,D276,86495,29807,12122021 00:08:22840N51,11.99-842648,M248747000103177,GOOGLE *YouTubePremium g.co/helppay# CA,78295,840 6511016547548056,D276,29969,10038,12122021 00:27:19840 57,11.30-842649,M000445474354997,SPOTIFY,NEW YORK,NY,48995,840 6511010952148056,D276,62521,21152,12122021 00:28:54840N51,5.40-842650,M527021000211443,Google Play,Mountain View CA,58175,840 6511014173278056,D276,802,701,12122021 00:30:38840Y00,49.67-842651,M235251000762203,AMZN Mktp US,Amzn.com/bill WA,59425,840 6511010003058056,D276,114710,41280,12122021 00:31:22840Z00,21.92-842652,M000445488848992,DD *DOORDASH MCDONALDS SAN FRANCISCO CA,58125,840 6511019296778056,D276,125175,45529,12122021 00:31:50840Y05,0.00+842653,M145376000144509,PLAYSTATION NETWORK,800-345-7669 CA,58165,840 6511020299078056,D276,125175,45529,12122021 00:32:07840Y57,21.44-842654,M145376000144509,PLAYSTATION NETWORK,800-345-7669 CA,58165,840 6511020299078056,D276,125175,45529,12122021 00:32:08840Y57,21.44-842299,M527021000222747,PlaystationNetwork,San Mateo,CA,58185,840 6511020299078056,D276,125175,45529,12122021 00:32:09840Y57,21.44-842300,M527021000222747,PlaystationNetwork,San Mateo,CA,58185,840 6511020299078056,D276,125175,45529,12122021 00:32:09840 57,0.00+842655,MCARD ACCPT IDC,Sony – Playstation N.. St. Louis,USA,59695,840 6511020299078056,D276,125175,45529,12122021 00:32:27840Y57,21.44-842301,M145376000144509,PLAYSTATION NETWORK,800-345-7669 CA,58165,840 6511020299078056,D276,125175,45529,12122021 00:32:28840Y57,21.44-842657,M527021000222747,PlaystationNetwork,San Mateo,CA,58185,840 6511020299078056,D276,125175,45529,12122021 00:32:28840Y57,21.44-842656,M527021000222747,PlaystationNetwork,San Mateo,CA,58185,840 6511020299078056,D276,125175,45529,12122021 00:32:29840 57,0.00+842658,MCARD ACCPT IDC,Sony – Playstation N.. St. Louis,USA,59695,840 6511020299078056,D276,112802,40216,12122021 00:32:30840Y00,6.49-842659,M784959000762203,Amazon.com,Amzn.com/bill WA,59425,840 6511019112388056,D276,120407,44199,12122021 00:35:24840 05,3.12-67433,P536385810103481,MILLS FOOD CENTER,OAKLAND,CA,54115,840 6511019841028056,D276,120407,44199,12122021 00:35:48840 05,2.29-67434,P536385810103481,MILLS FOOD CENTER,OAKLAND,CA,54115,840 6511019841028056,D276,129143,47047,12122021

and I would like it to look something like this

enter image description here

Continuing until it completed all the registers in the original file.

Advertisement

Answer

The first thing I would try is simply split the line on commas, and write records using csv.writer, calling .writerow() with twelve elements at a time. I notice you have a double quote at the beginning, but not later, so this approach might be good enough, you would just have to remove that double quote. Of course, if any field in your file has commas within its text, my suggestion will fall appart, but it’s a place to start, since you seem to be trying to fix one specific file, rather than solving a general problem.

Here’s my implementation of that suggestion:

import csv

out_f = open("fixed-csv.txt", mode="w")
writer = csv.writer(out_f)
with open("bad-csv.txt") as in_f:
    for line in in_f:
        fields = line.strip("nr").split(",")
        for position in range(0, len(fields), 12):
            writer.writerow(fields[position:position+12])

Now, I noticed running that code that you don’t actually have exactly 12 columns per row, it’s more like 10 or 11, and it’s not constant.

Here’s a variant that looks for D276 and makes it the first column of each row:

import csv

out_f = open("fixed-csv-2.txt", mode="w")
writer = csv.writer(out_f)
with open("bad-csv.txt") as in_f:
    for line in in_f:
        fields = line.strip("nr").split(",")
        d276_positions = [
            i
            for i, value in enumerate(fields)
            if i == 0 or value == "D276"
        ]
        d276_positions.append(len(fields))
        for start, end in zip(d276_positions, d276_positions[1:]):
            writer.writerow(fields[start:end])

I don’t imagine all your data will have D276 as the first value in the row, so you might have to change if i == 0 or value == "D276" to something that more generally locates the field that flags a new row, but this code should set you up with that you need to solve your problem, assuming, as I said at the beginning, that you don’t have commas inside any fields in your whole data file.

If you do have commas in some of your fields, I would manually edit the output file with a text editor and patch the problems by hand. If there aren’t too many of them, it shouldn’t be a lot of work.

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement