I have a dictionary like this:
no_empty_keys = {
    '783': [
        ['4gsx', 'ADTQGS', 0.3333333333333333, {'A': ['A224', 'T226'], 'B': ['A224', 'T226']}, 504, 509],
        ['4gt0', 'ADTQGS', 0.3333333333333333, {'A': ['A224', 'T226'], 'B': ['A224', 'T226']}, 504, 509],
    ],
    '1062': [
        ['4gsx', 'AELTGY', 0.5, {'A': ['L175', 'T176', 'Y178'], 'B': ['L175', 'T176', 'Y178']}, 453, 458],
        ['4gt0', 'AELTGY', 0.5, {'A': ['L175', 'T176', 'Y178'], 'B': ['L175', 'T176', 'Y178']}, 453, 458],
    ],
}
My code to transform that into a CSV is this:

import pandas as pd

epitope_df = pd.DataFrame(columns=['Epitope ID', 'PDB', 'Percent Identity', 'Epitope Mapped', 'Epitope Sequence', 'Starting Position', 'Ending Position'])
for x in no_empty_keys:
    for y in no_empty_keys[x]:
        epitope_df = epitope_df.append({'Epitope ID': x, 'PDB': y[0], 'Percent Identity': y[2], 'Epitope Mapped': y[3], 'Epitope Sequence': y[1], 'Starting Position': y[4], 'Ending Position': y[5]}, ignore_index=True)
epitope_df.to_csv('test.csv', index=False)
My output is a CSV file like this:

Epitope ID,PDB,Percent Identity,Epitope Mapped,Epitope Sequence,Starting Position,Ending Position
783,4gsx,0.3333333333333333,"{'A': ['A224', 'T226'], 'B': ['A224', 'T226']}",ADTQGS,504,509
783,4gt0,0.3333333333333333,"{'A': ['A224', 'T226'], 'B': ['A224', 'T226']}",ADTQGS,504,509
1062,4gsx,0.5,"{'A': ['L175', 'T176', 'Y178'], 'B': ['L175', 'T176', 'Y178']}",AELTGY,453,458
1062,4gt0,0.5,"{'A': ['L175', 'T176', 'Y178'], 'B': ['L175', 'T176', 'Y178']}",AELTGY,453,458
It works, but it isn't well optimized: the process becomes very slow once the dictionary has more than 10,000 entries. Any ideas on how to speed this up? Thank you for your time.
Answer
I’d start by getting rid of DataFrame.append. Appending rows one at a time is inefficient, because every call copies the entire DataFrame (the method was deprecated in pandas 1.4 and removed in 2.0). Instead, collect the rows in a plain Python list and create the DataFrame in one go:
result = []
for x, rows in no_empty_keys.items():
    for y in rows:
        result.append(
            {
                'Epitope ID': x,
                'PDB': y[0],
                'Percent Identity': y[2],
                'Epitope Mapped': y[3],
                'Epitope Sequence': y[1],
                'Starting Position': y[4],
                'Ending Position': y[5]
            }
        )

# Build the DataFrame from all records in a single call
epitope_df = pd.DataFrame.from_records(result)
epitope_df.to_csv('new.csv', index=False)
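
If you only ever need the CSV file and not the DataFrame itself, you could skip pandas entirely and stream the rows with the standard library's csv module. This is a minimal sketch assuming the same data layout as above, i.e. each inner list is [pdb, sequence, identity, mapped, start, end]:

import csv

columns = ['Epitope ID', 'PDB', 'Percent Identity', 'Epitope Mapped',
           'Epitope Sequence', 'Starting Position', 'Ending Position']

with open('new.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(columns)  # header row
    for epitope_id, entries in no_empty_keys.items():
        # unpack each entry in the order it appears in the source lists
        for pdb, seq, identity, mapped, start, end in entries:
            # reorder the fields to match the pandas column order above
            writer.writerow([epitope_id, pdb, identity, mapped, seq, start, end])

This writes each row as it is produced, so memory use stays flat even for dictionaries with far more than 10,000 entries.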