Python: Dynamically growing CSV

I am building a CSV chunk by chunk using the csv module from the standard library.

This means that I am adding rows one by one in a loop. Each row that I add contains information for each column of my dataframe.

So, I have this CSV:

A     B      C     D

And I am adding rows one by one:

    A       B      C      D
  aaaaa   bbb    ccccc   ddddd
  a1a1a   b1b1   c1c1c1  d1d1d1
  a2a2a   b2b2   c2c2c2  d2d2d2

And so on.

My problem is that sometimes, the row that I am adding contains MORE information (that is, information that does not have a column). For example:

    A       B      C      D
  aaaaa   bbb    ccccc   ddddd
  a1a1a   b1b1   c1c1c1  d1d1d1
  a2a2a   b2b2   c2c2c2  d2d2d2
  a3a3a   b3b3   c3c3c3  d3d3d3   e3e3e3  #this row has extra information

My question is: Is there any way to make the CSV grow (during runtime) when that happens? (by ‘grow’ I mean adding the “extra” columns)

So basically I want this to happen:

    A       B      C       D        E    # this column was added because 
  aaaaa   bbb    ccccc   ddddd           # of the extra column found
  a1a1a   b1b1   c1c1c1  d1d1d1          # in the new row
  a2a2a   b2b2   c2c2c2  d2d2d2
  a3a3a   b3b3   c3c3c3  d3d3d3   e3e3e3

I am adding the rows using the csv module from the standard library, the with statement and a dictionary:

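import csv

# New row; 'E' is a key that has no column in the existing file
addThis = {'A': 'a3a3a', 'B': 'b3b3', 'C': 'c3c3c3', 'D': 'd3d3d3', 'E': 'e3e3e3'}

with open('csvFile', 'a', newline='') as f:
    # DictWriter needs the current column names up front
    writer = csv.DictWriter(f, fieldnames=['A', 'B', 'C', 'D'])
    writer.writerow(addThis)  # raises ValueError because of 'E'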

As you can see, in the dictionary that I’m adding, I specify the name of the new column. What happens when I try that is that I get this exception:

ValueError: dict contains fields not in fieldnames: 'E'

I have tried adding the “extra” fieldname to the csv before adding the row like this:

fields = writer.fieldnames
writer.fieldnames = fields + ['E']

Note: It may seem from this example that I already know that E will be added, but that is not the case. I showed it like this just for the example. I don’t know what the “extra” data will be until I get the “extra” rows (which I get over a period of time from a web scrape).

That manages to evade the exception, but does not add the extra column, so I end up with something like this:

    A       B      C       D
  aaaaa   bbb    ccccc   ddddd
  a1a1a   b1b1   c1c1c1  d1d1d1
  a2a2a   b2b2   c2c2c2  d2d2d2
  a3a3a   b3b3   c3c3c3  d3d3d3   e3e3e3   # value is added but the column
                                           # name is not there

I am not using Pandas because I understand that Pandas is designed to load fully populated DataFrames, but I am open to using something besides the csv module if you suggest it. Any ideas regarding that?

Thanks for your help and sorry for the long question, I tried to be as clear as possible.

Answer

I think you would need to rewrite the entire file when that happens. Currently you are opening the file in 'a' (append) mode, so you can only add data at the end of the file, not in the middle. The header is the first line, and I don’t think there is an easy way to insert something in the middle of a file.

The easiest solution would then be to read the entire file into memory, add the new column to the header row and then rewrite the complete file.
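
As a rough illustration of that idea, here is a minimal sketch using csv.DictReader and csv.DictWriter. The helper names read_header and append_row are hypothetical (they are not part of the csv module), and the sketch assumes the file is well-formed, fits in memory, and ends with a newline. It appends normally when the row has no new keys, and only rewrites the whole file when a new column shows up.

```python
import csv
import os


def read_header(path):
    """Return the header row of the CSV at `path`, or [] if there is none."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as f:
        return next(csv.reader(f), [])


def append_row(path, row, restval=""):
    """Append `row` (a dict) to the CSV, widening the header when needed.

    Hypothetical helper: returns the list of column names after the write.
    """
    fieldnames = read_header(path)
    new_fields = [key for key in row if key not in fieldnames]

    if fieldnames and not new_fields:
        # No new columns: a plain append is enough.
        with open(path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=fieldnames, restval=restval).writerow(row)
        return fieldnames

    # New columns (or an empty/missing file): read everything into memory...
    rows = []
    if fieldnames:
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))

    # ...extend the header, and rewrite the complete file.
    fieldnames = fieldnames + new_fields
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval=restval)
        writer.writeheader()
        writer.writerows(rows)  # older rows get `restval` in the new columns
        writer.writerow(row)
    return fieldnames
```

With this, calling append_row('csvFile', addThis) for the row containing 'E' would rewrite the file with an A, B, C, D, E header and leave E empty for the earlier rows. The rewrite costs time proportional to the file size, so it is only reasonable if new columns appear rarely.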

See this question for an example of how you could do that.

User contributions licensed under: CC BY-SA