I have a pandas dataframe I would like to iterate over. A simplified example of my dataframe:
chr   start  end  Gene   Value  MoreData
chr1  123    123  HAPPY  41.1   3.4
chr1  125    129  HAPPY  45.9   4.5
chr1  140    145  HAPPY  39.3   4.1
chr1  342    355  SAD    34.2   9.0
chr1  360    361  SAD    44.3   8.1
chr1  390    399  SAD    29.0   7.2
chr1  400    411  SAD    35.6   6.5
chr1  462    470  LEG    20.0   2.7
I would like to iterate over each unique gene and create a new file named:
for Gene in df:  ## this is where I need the most help
    OutFileName = Gene+".pdf"
For the above example I should get three iterations with 3 outfiles and 3 dataframes:
# HAPPY.pdf
chr1  123  123  HAPPY  41.1  3.4
chr1  125  129  HAPPY  45.9  4.5
chr1  140  145  HAPPY  39.3  4.1

# SAD.pdf
chr1  342  355  SAD  34.2  9.0
chr1  360  361  SAD  44.3  8.1
chr1  390  399  SAD  29.0  7.2
chr1  400  411  SAD  35.6  6.5

# LEG.pdf
chr1  462  470  LEG  20.0  2.7
The resulting data frame contents split up by chunks will be sent to another function that will perform the analysis and return the contents to be written to file.
Answer
You can obtain the unique values by calling unique, iterate over them, build the filename, and write each subset out to CSV (to_csv writes CSV text regardless of the .pdf extension):
genes = df['Gene'].unique()
for gene in genes:
    outfilename = gene + '.pdf'
    print(outfilename)
    df[df['Gene'] == gene].to_csv(outfilename)

HAPPY.pdf
SAD.pdf
LEG.pdf
A more pandas-thonic method is to groupby on 'Gene' and then iterate over the groups:
gp = df.groupby('Gene')
# gp.groups is a dict mapping each 'Gene' value to the index labels of its rows
for gene, idx in gp.groups.items():
    print(df.loc[idx])

    chr  start  end   Gene  Value  MoreData
0  chr1    123  123  HAPPY   41.1       3.4
1  chr1    125  129  HAPPY   45.9       4.5
2  chr1    140  145  HAPPY   39.3       4.1
    chr  start  end Gene  Value  MoreData
3  chr1    342  355  SAD   34.2       9.0
4  chr1    360  361  SAD   44.3       8.1
5  chr1    390  399  SAD   29.0       7.2
6  chr1    400  411  SAD   35.6       6.5
    chr  start  end Gene  Value  MoreData
7  chr1    462  470  LEG     20       2.7
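Iterating over the groupby object directly also works, since it yields (key, sub-dataframe) pairs, which saves the lookup through gp.groups. Below is a minimal sketch of that variant, tying it back to the question's plan of passing each chunk to another function. The analyze function and the outputs dict are hypothetical placeholders, not part of the original code, and the sample dataframe is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "chr": ["chr1"] * 8,
    "start": [123, 125, 140, 342, 360, 390, 400, 462],
    "end": [123, 129, 145, 355, 361, 399, 411, 470],
    "Gene": ["HAPPY"] * 3 + ["SAD"] * 4 + ["LEG"],
    "Value": [41.1, 45.9, 39.3, 34.2, 44.3, 29.0, 35.6, 20.0],
    "MoreData": [3.4, 4.5, 4.1, 9.0, 8.1, 7.2, 6.5, 2.7],
})


def analyze(chunk):
    """Hypothetical stand-in for the real analysis step."""
    # Assumption: the analysis returns a DataFrame; here it is a plain copy.
    return chunk.copy()


outputs = {}  # hypothetical container: filename -> analysis result
# sort=False keeps the genes in order of first appearance
# instead of sorting the group keys alphabetically.
for gene, chunk in df.groupby("Gene", sort=False):
    outfilename = f"{gene}.pdf"
    outputs[outfilename] = analyze(chunk)
    # Uncomment to write each result to disk (as CSV text):
    # outputs[outfilename].to_csv(outfilename)
```

Each chunk here is an ordinary dataframe holding only the rows for one gene, so whatever function does the real analysis can treat it like any other dataframe.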