Python: Dynamically growing CSV

I am building a CSV chunk by chunk using the csv module from the standard library.

This means that I am adding rows one by one in a loop. Each row that I add contains information for each column of my dataframe.

So, I have this CSV:

JavaScript

And I am adding rows one by one:

JavaScript

And so on.

My problem is that sometimes, the row that I am adding contains MORE information (that is, information that does not have a column). For example:

JavaScript

My question is: Is there any way to make the CSV grow (during runtime) when that happens? (with ‘grow’ I mean to add the “extra” columns)

So basically I want this to happen:

JavaScript

I am adding the rows using the csv module from the standard library, the with statement and a dictionary:

JavaScript

As you can see, in the dictionary that I’m adding, I specify the name of the new column. What happens when I try that is that I get this exception:

JavaScript

I have tried adding the “extra” fieldname to the csv before adding the row like this:

JavaScript

Note: It may seem from this example that I already know that E will be added, but that is not the case. I showed it like this just for the example. I don’t know what the “extra” data will be until I get the “extra” rows (which I get over a period of time from a web scrape).

That manages to evade the exception, but does not add the extra column, so I end up with something like this:

JavaScript

I am not using Pandas because I understand that Pandas is designed to load fully populated DataFrames, but I am open to using something besides the csv module if you suggest it. Any ideas regarding that?

Thanks for your help and sorry for the long question, I tried to be as clear as possible.


Answer

I think you would need to rewrite the entire file when that happens. Currently you are opening the file in append mode ("a"), so you can only add data at the end of the file, not insert something in the middle. A new column changes the header row and every existing row, and I don’t think there is an easy way to insert data in the middle of a file.

The easiest solution would then be to read the entire file into memory, add the new column to the header row and then rewrite the complete file.
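
A rough sketch of that approach, using only the csv module, might look like the following. The function names `read_header` and `append_row_growing` are made up for this example, and it assumes the file is small enough to fit in memory and that every existing row matches the current header. When the incoming row has no new keys it simply appends; otherwise it reads everything, extends the header, and rewrites the file, filling the new columns of older rows with an empty string.

```python
import csv
import os


def read_header(path):
    """Return the header row of the CSV at `path`, or [] if it has none yet."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as f:
        return next(csv.reader(f), [])


def append_row_growing(path, row, restval=""):
    """Append the dict `row` to the CSV, adding columns for unseen keys.

    Hypothetical helper: assumes the whole file fits in memory.
    """
    fieldnames = read_header(path)
    new_columns = [key for key in row if key not in fieldnames]

    if fieldnames and not new_columns:
        # Nothing new: a plain append at the end of the file is enough.
        with open(path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=fieldnames, restval=restval).writerow(row)
        return fieldnames

    # New columns (or a brand-new file): load the existing rows first.
    rows = []
    if fieldnames:
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))

    # Extend the header, then rewrite the complete file.
    fieldnames = fieldnames + new_columns
    rows.append(row)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval=restval)
        writer.writeheader()
        writer.writerows(rows)  # restval fills the columns older rows lack
    return fieldnames
```

You would call `append_row_growing("data.csv", {"A": 3, "B": 4, "E": 5})` from your scraping loop in place of the `writerow` call. The full rewrite only happens when a new column shows up, so if that is rare the cost should stay small.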

See this question for an example of how you could do that.

User contributions licensed under: CC BY-SA