
How to iterate over dataframe such that all rows which have a specific column value in common are saved to their respective files?

This question was a little hard for me to phrase, so please feel free to edit it if that makes it clearer.

Problem Statement: I want all rows that share a specific column value to be saved to the same file.

Example Code: I want to do something like this. Say I have a dataframe:

import pandas as pd

d = {'col1': [1, 2, 6, 3, 4], 'col2': [3, 4, 2, 5, 6], 'col3': ['a', 'b', 'c', 'a', 'b'], 'col4': ['2', '3', '2', '2', '2']}
df = pd.DataFrame(data=d)

   col1  col2 col3 col4
0     1     3    a    2
1     2     4    b    3
2     6     2    c    2
3     3     5    a    2
4     4     6    b    2

I want to create csv files such that:

  • all rows where col3 is a get saved in a.csv
  • all rows where col3 is b get saved in b.csv
  • all rows where col3 is c get saved in c.csv

Hypothesized Solution: The only way I can think of to create the CSV files is to iterate through the dataframe row by row and check whether a CSV already exists for the row's column value (e.g. its col3 value); if not, create it and add the row, or else append to the existing CSV file.
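
In rough code, the idea I have in mind looks something like this (a sketch only, reusing df from above; os.path.exists decides whether the header still needs to be written):

import os

for _, row in df.iterrows():
    path = f"{row['col3']}.csv"
    # write the header only when the file is created for the first time
    row.to_frame().T.to_csv(path, mode="a", index=False, header=not os.path.exists(path))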

Issue: The example dataframe above is just a representation; my actual dataframe is very large. If it helps, I already have the unique values of the column in question (col3 in the example) as a list. However, one of the most popular answers on how to iterate over a dataframe (How to iterate over rows in a DataFrame in Pandas) says, in the second answer there, DON'T. I might fall back to it as a last resort if there is no other way, but if there is one, can someone help me find a better solution to this problem?


Answer
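
If the whole dataframe fits in memory, you don't need to iterate over rows at all: groupby already splits the frame by column value, and each group can be written out directly. A minimal sketch using the df from the question:

import pandas as pd

d = {'col1': [1, 2, 6, 3, 4], 'col2': [3, 4, 2, 5, 6],
     'col3': ['a', 'b', 'c', 'a', 'b'], 'col4': ['2', '3', '2', '2', '2']}
df = pd.DataFrame(data=d)

# one file per unique value of col3 (a.csv, b.csv, c.csv)
for grp, dfg in df.groupby("col3"):
    dfg.to_csv(f"{grp}.csv", index=False)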

If your file (here all.csv) is large and you want to process the CSV in chunks, you can try this strategy: open a file the first time a group key occurs and save the handle in a dict. The next time you meet the same key, look up the handle and use it to write the data, and so on.

import pandas as pd
import pathlib

DIRPATH = "/tmp/csv_folder"

# create folder if it doesn't exist
dirpath = pathlib.Path(DIRPATH)
dirpath.mkdir(parents=True, exist_ok=True)

# chunksize=2 for demo purposes only...
reader = pd.read_csv("all.csv", chunksize=2)
streams = {}

for df in reader:
    for grp, dfg in df.groupby("col3"):
        try:
            buffer = streams[grp]
            dfg.to_csv(buffer, index=False, header=False)
        except KeyError:
            # grp is met for the first time
            buffer = open(dirpath / f"{grp}.csv", "w")
            streams[grp] = buffer
            dfg.to_csv(buffer, index=False)

for fp in streams.values():
    fp.close()

$ cat /tmp/csv_folder/a.csv
col1,col2,col3,col4
1,3,a,2
3,5,a,2

$ cat /tmp/csv_folder/b.csv
col1,col2,col3,col4
2,4,b,3
4,6,b,2

$ cat /tmp/csv_folder/c.csv
col1,col2,col3,col4
6,2,c,2
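
If you would rather not keep one handle open per key (say col3 has thousands of distinct values), an append-mode variant of the same idea works too. A sketch, assuming the same all.csv and that the output files don't exist before the run:

import pandas as pd
import pathlib

dirpath = pathlib.Path("/tmp/csv_folder")
dirpath.mkdir(parents=True, exist_ok=True)

for df in pd.read_csv("all.csv", chunksize=2):
    for grp, dfg in df.groupby("col3"):
        out = dirpath / f"{grp}.csv"
        # append; write the header only when the file is first created
        dfg.to_csv(out, mode="a", index=False, header=not out.exists())

The trade-off is one open/close per key and per chunk instead of a long-lived handle: a bit slower, but it cannot run into the OS limit on open files.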
