How to split a dataframe by unique groups and save to a csv

Question

I have a pandas dataframe I would like to iterate over. A simplified example of my dataframe:

I would like to iterate over each unique gene and create a new file named:

For the above example I should get three iterations with 3 outfiles and 3 dataframes:

The resulting data frame contents split up by chunks will be sent to

Accepted Answer

You can obtain the unique values by calling unique, iterate over them, build the filename, and write each subset out to CSV:

    genes = df['Gene'].unique()
    for gene in genes:
        outfilename = gene + '.pdf'
        print(outfilename)
        df[df['Gene'] == gene].to_csv(outfilename)

    HAPPY.pdf
    SAD.pdf
    LEG.pdf

A more pandas-thonic method is to groupby on 'Gene' and then iterate over the groups:

    gp = df.groupby('Gene')
    # gp.groups is a dict-like mapping of each 'Gene' value to its row index labels
    for g in gp.groups.items():
        print(df.loc[g[1]])

        chr  start  end   Gene  Value  MoreData
    0  chr1    123  123  HAPPY   41.1       3.4
    1  chr1    125  129  HAPPY   45.9       4.5
    2  chr1    140  145  HAPPY   39.3       4.1
        chr  start  end Gene  Value  MoreData
    3  chr1    342  355  SAD   34.2       9.0
    4  chr1    360  361  SAD   44.3       8.1
    5  chr1    390  399  SAD   29.0       7.2
    6  chr1    400  411  SAD   35.6       6.5
        chr  start  end Gene  Value  MoreData
    7  chr1    462  470  LEG     20       2.7
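The groupby approach can also write the files directly, since iterating over a groupby object yields each key together with its sub-dataframe. The following is a minimal sketch rather than the answerer's own code: the helper name `write_groups`, its `suffix` parameter, and the temporary output directory are assumptions made for illustration, and the example dataframe is rebuilt from the rows printed above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Example data rebuilt from the rows printed in the answer.
df = pd.DataFrame({
    "chr": ["chr1"] * 8,
    "start": [123, 125, 140, 342, 360, 390, 400, 462],
    "end": [123, 129, 145, 355, 361, 399, 411, 470],
    "Gene": ["HAPPY"] * 3 + ["SAD"] * 4 + ["LEG"],
    "Value": [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
    "MoreData": [3.4, 4.5, 4.1, 9.0, 8.1, 7.2, 6.5, 2.7],
})


def write_groups(frame, column, out_dir, suffix=".csv"):
    """Hypothetical helper: write one CSV file per unique value of `column`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Iterating a groupby yields (key, sub-dataframe) pairs;
    # sort=False keeps the groups in order of first appearance.
    for key, group in frame.groupby(column, sort=False):
        path = out_dir / f"{key}{suffix}"
        group.to_csv(path, index=False)  # index=False omits the row labels
        written.append(path)
    return written


# A temporary directory is used here only to keep the example self-contained.
out_dir = tempfile.mkdtemp()
paths = write_groups(df, "Gene", out_dir)
print([p.name for p in paths])
```

Passing `suffix=".pdf"` would reproduce the filenames used in the answer, although the file contents are still CSV text either way.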

Answer