
How to iterate over dataframe such that all rows which have a specific column value in common are saved to their respective files?

This question was a little hard for me to phrase, so please help edit it if that would make it clearer.

Problem Statement: I want all the rows that share a specific column value to be saved to the same file.

Example: I want to do something like this. Say I have a dataframe with a column col3 whose values are a, b, or c.

I want to create csv files such that:

  • all rows where col3 is a are saved in a.csv
  • all rows where col3 is b are saved in b.csv
  • all rows where col3 is c are saved in c.csv

Hypothesized Solution: The only way I can think of to create the CSV files is to iterate through the dataframe row by row and check whether a CSV already exists for the row's column value (e.g. the col3 value). If not, create the file and add the row; otherwise, append the row to the existing CSV file.

Issue: The example above is just a representation; my real dataframe is very large. If it helps, I have the unique values of the column in question (col3 in the example) available as a list. However, one of the most popular answers on iterating over a dataframe (the second answer to "How to iterate over rows in a DataFrame in Pandas") says: DON'T. I may have to iterate as a last resort if there is no other way, but if there is one, can someone help me find a better solution to this problem?


Answer

If your file (here all.csv) is large and you want to process the CSV in chunks, you can try this strategy: open an output file the first time a value occurs and save its handle in a dict. The next time you meet the same value, look up the handle and use it to write the data, and so on.
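The strategy above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the function name split_csv_by_column, the out_dir argument and the default chunksize are made up for illustration, and the values of the column are assumed to be usable as file names.

```python
import os
import tempfile

import pandas as pd


def split_csv_by_column(src_path, column, out_dir, chunksize=100_000):
    """Append each row of src_path to <out_dir>/<value>.csv, keyed by `column`."""
    handles = {}  # column value -> open file handle
    try:
        # Read the big CSV piece by piece instead of loading it all at once.
        for chunk in pd.read_csv(src_path, chunksize=chunksize):
            # Note: groupby skips rows whose key is missing (NaN) by default.
            for value, group in chunk.groupby(column, sort=False):
                first_time = value not in handles
                if first_time:
                    # First occurrence: open the file and remember the handle.
                    path = os.path.join(out_dir, f"{value}.csv")
                    handles[value] = open(path, "w", newline="")
                # Write the header only once per output file.
                group.to_csv(handles[value], index=False, header=first_time)
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "all.csv")
        pd.DataFrame(
            {"col1": [1, 2, 3, 4], "col3": ["a", "b", "a", "c"]}
        ).to_csv(src, index=False)
        print(split_csv_by_column(src, "col3", tmp, chunksize=2))
```

Grouping inside each chunk means one write per value per chunk rather than one write per row, so there is no explicit row-by-row loop.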

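If the dataframe already fits in memory, a simpler variant is probably enough: let groupby split the rows and write each group in one go. Again just a sketch, where save_groups and out_dir are hypothetical names and the column values are assumed to be valid file names.

```python
import os

import pandas as pd


def save_groups(df, column, out_dir):
    """Write one CSV per distinct value of `column`; return {value: path}."""
    paths = {}
    # groupby yields (value, sub-dataframe) pairs, so no per-row loop is needed.
    for value, group in df.groupby(column, sort=False):
        path = os.path.join(out_dir, f"{value}.csv")
        group.to_csv(path, index=False)
        paths[value] = path
    return paths
```

For example, save_groups(df, "col3", ".") would write a.csv, b.csv and c.csv to the current directory for the dataframe described in the question.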
User contributions licensed under: CC BY-SA