I have a dictionary like this:
no_empty_keys = {
    '783': [
        ['4gsx', 'ADTQGS', 0.3333333333333333, {'A': ['A224', 'T226'], 'B': ['A224', 'T226']}, 504, 509],
        ['4gt0', 'ADTQGS', 0.3333333333333333, {'A': ['A224', 'T226'], 'B': ['A224', 'T226']}, 504, 509]
    ],
    '1062': [
        ['4gsx', 'AELTGY', 0.5, {'A': ['L175', 'T176', 'Y178'], 'B': ['L175', 'T176', 'Y178']}, 453, 458],
        ['4gt0', 'AELTGY', 0.5, {'A': ['L175', 'T176', 'Y178'], 'B': ['L175', 'T176', 'Y178']}, 453, 458]
    ]
}
My function to transform that into a CSV is this one:
import pandas as pd

epitope_df = pd.DataFrame(columns=['Epitope ID', 'PDB', 'Percent Identity', 'Epitope Mapped', 'Epitope Sequence', 'Starting Position', 'Ending Position'])

for x in no_empty_keys:
    for y in no_empty_keys[x]:
        epitope_df = epitope_df.append(
            {'Epitope ID': x, 'PDB': y[0], 'Percent Identity': y[2], 'Epitope Mapped': y[3],
             'Epitope Sequence': y[1], 'Starting Position': y[4], 'Ending Position': y[5]},
            ignore_index=True)

epitope_df.to_csv('test.csv', index=False)

My output is a CSV file with the columns above, one row per entry. It works, but it isn't well optimized: the process becomes very slow when I run it on a dictionary with more than 10,000 entries. Any ideas on how to speed this up? Thank you for your time.
Answer
I'd start by getting rid of pandas.DataFrame.append. Appending rows one at a time is inefficient because each call copies the whole DataFrame and returns a new one. Instead, collect the rows in a plain Python list and create the DataFrame in one go:
result = []
for x in no_empty_keys:
    for y in no_empty_keys[x]:
        result.append(
            {
                'Epitope ID': x,
                'PDB': y[0],
                'Percent Identity': y[2],
                'Epitope Mapped': y[3],
                'Epitope Sequence': y[1],
                'Starting Position': y[4],
                'Ending Position': y[5]
            }
        )

epitope_df = pd.DataFrame.from_records(result)
epitope_df.to_csv('new.csv', index=False)
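
If you prefer, the same record list can be built with a comprehension over no_empty_keys.items(); this is just a sketch of the same idea, assuming the nested-list layout shown in your dictionary:

import pandas as pd

# One record per inner list; the DataFrame is created in a single call afterwards.
result = [
    {
        'Epitope ID': epitope_id,
        'PDB': entry[0],
        'Percent Identity': entry[2],
        'Epitope Mapped': entry[3],
        'Epitope Sequence': entry[1],
        'Starting Position': entry[4],
        'Ending Position': entry[5],
    }
    for epitope_id, entries in no_empty_keys.items()
    for entry in entries
]

epitope_df = pd.DataFrame.from_records(result)
epitope_df.to_csv('new.csv', index=False)

Either way, the key point is that the DataFrame is constructed once from plain Python objects instead of being copied on every append, so the runtime grows roughly linearly with the number of entries.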