There are around 60,000 dictionaries stored in a list. There is also a dataframe with the same number of rows, and I want to take one of its columns and insert the values into the dictionaries as key-value pairs. I created a for loop that is supposed to update the dictionary values, but it seems to take forever. Given the number of rows, I am looking for a more efficient approach.
```python
new_dicties = []
for i in list_of_dicts:
    for x in resultsDf0['created_at']:
        i['created_at'] = x
        new_dicties.append(i)
```
Answer
Due to the nested loop, and based on the numbers you've given, you are doing 60,000 × 60,000 = 3,600,000,000 dictionary updates, most of them in vain because each update is overwritten 59,999 times.
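To see the effect, here is the same nested loop run on three dicts, with a plain list standing in for the dataframe column (a toy reproduction, not your actual data):

```python
# Tiny reproduction of the nested-loop problem (3 dicts instead of 60,000).
list_of_dicts = [{"key": i} for i in range(1, 4)]
created_at = ["2023-01-01", "2023-01-02", "2023-01-03"]

new_dicties = []
for i in list_of_dicts:
    for x in created_at:
        i['created_at'] = x      # overwritten on every inner iteration
        new_dicties.append(i)    # the same reference is appended again and again

print(len(new_dicties))                # 9 entries instead of 3
print(list_of_dicts[0]['created_at'])  # '2023-01-03' -- only the last value survives
```

Every dict ends up with the last value of the column, and the result list is n² entries long.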
So I suspect you have the following situation: a dataframe df and a list of dictionaries list_of_dicts of the same length (number of rows of df), for instance:
```python
import pandas as pd

df = pd.DataFrame({"created_at": ["2023-01-01", "2023-01-02", "2023-01-03"]})
list_of_dicts = [{"key": i} for i in range(1, 4)]
```
Most likely you’re trying to do:
```python
new_dicties = []
for d, v in zip(list_of_dicts, df["created_at"]):
    d["created_at"] = v
    new_dicties.append(d)
```
Now this gives you the following new_dicties:

```python
[{'key': 1, 'created_at': '2023-01-01'},
 {'key': 2, 'created_at': '2023-01-02'},
 {'key': 3, 'created_at': '2023-01-03'}]
```

but list_of_dicts looks the same, because the variables are references (pointers, if you will).
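You can check the shared references directly (again with a plain list in place of the dataframe column, to keep the sketch self-contained):

```python
# The zip-based loop mutates the original dicts in place; new_dicties
# holds references to the very same objects, not copies.
list_of_dicts = [{"key": i} for i in range(1, 4)]
values = ["2023-01-01", "2023-01-02", "2023-01-03"]

new_dicties = []
for d, v in zip(list_of_dicts, values):
    d["created_at"] = v
    new_dicties.append(d)

print(new_dicties[0] is list_of_dicts[0])  # True: same object
print(list_of_dicts[0])                    # {'key': 1, 'created_at': '2023-01-01'}
```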
If that's fine, then you could also just stick with the original list_of_dicts and do:

```python
for d, v in zip(list_of_dicts, df["created_at"]):
    d["created_at"] = v
new_dicties = list_of_dicts  # maybe not needed
```
If that’s not what you want, then you could do either
```python
new_dicties = [d | {"created_at": v} for d, v in zip(list_of_dicts, df["created_at"])]
```
in case you have Python 3.9 or higher, or:
```python
new_dicties = []
for d, v in zip(list_of_dicts, df["created_at"]):
    d = dict(d)  # shallow copy, so the original dict stays untouched
    d["created_at"] = v
    new_dicties.append(d)
```
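With the copy-based versions the originals keep only their old keys, which you can verify on the same three-row toy data (a plain list stands in for the dataframe column here):

```python
list_of_dicts = [{"key": i} for i in range(1, 4)]
values = ["2023-01-01", "2023-01-02", "2023-01-03"]

new_dicties = []
for d, v in zip(list_of_dicts, values):
    d = dict(d)              # rebinding d to a shallow copy
    d["created_at"] = v
    new_dicties.append(d)

print(list_of_dicts[0])  # {'key': 1} -- original unchanged
print(new_dicties[0])    # {'key': 1, 'created_at': '2023-01-01'}
```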