There are around 60,000 dictionaries stored in a list. There is also a dataframe with the same number of rows, and I want to take one of its columns and insert the values into the dictionaries as key-value pairs. I created a for loop that is supposed to update the dictionary values, but it seems to take forever. Given the number of rows, I am looking for a more efficient approach.
```python
new_dicties = []
for i in list_of_dicts:
    for x in resultsDf0['created_at']:
        i['created_at'] = x
        new_dicties.append(i)
```
Answer
Due to the nested loop, and based on the numbers you've given, you are doing 60,000 × 60,000 = 3,600,000,000 dictionary updates, most of them in vain because each update is overwritten 59,999 times.
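To see the effect, here is the same nested loop run on three dicts, with a plain list standing in for the dataframe column (a toy reproduction, not your actual data):

```python
# Tiny reproduction of the nested-loop problem (3 dicts instead of 60,000).
list_of_dicts = [{"key": i} for i in range(1, 4)]
created_at = ["2023-01-01", "2023-01-02", "2023-01-03"]

new_dicties = []
for i in list_of_dicts:
    for x in created_at:
        i['created_at'] = x      # overwritten on every inner iteration
        new_dicties.append(i)    # the same reference is appended again and again

print(len(new_dicties))                # 9 entries instead of 3
print(list_of_dicts[0]['created_at'])  # '2023-01-03' -- only the last value survives
```

Every dict ends up with the last value of the column, and the result list is n² entries long.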
So I suspect you have the following situation: a dataframe df and a list of dictionaries list_of_dicts of the same length (number of rows of df), for instance:
```python
import pandas as pd

df = pd.DataFrame({"created_at": ["2023-01-01", "2023-01-02", "2023-01-03"]})
list_of_dicts = [{"key": i} for i in range(1, 4)]
```
Most likely you’re trying to do:
```python
new_dicties = []
for d, v in zip(list_of_dicts, df["created_at"]):
    d["created_at"] = v
    new_dicties.append(d)
```
Now this gives you the following new_dicties:

```python
[{'key': 1, 'created_at': '2023-01-01'},
 {'key': 2, 'created_at': '2023-01-02'},
 {'key': 3, 'created_at': '2023-01-03'}]
```

but list_of_dicts looks the same, because the variables are references (pointers, if you will).
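You can check the shared references directly (again with a plain list in place of the dataframe column, to keep the sketch self-contained):

```python
# The zip-based loop mutates the original dicts in place; new_dicties
# holds references to the very same objects, not copies.
list_of_dicts = [{"key": i} for i in range(1, 4)]
values = ["2023-01-01", "2023-01-02", "2023-01-03"]

new_dicties = []
for d, v in zip(list_of_dicts, values):
    d["created_at"] = v
    new_dicties.append(d)

print(new_dicties[0] is list_of_dicts[0])  # True: same object
print(list_of_dicts[0])                    # {'key': 1, 'created_at': '2023-01-01'}
```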
If that's fine, then you could also just stick with the original list_of_dicts and do:

```python
for d, v in zip(list_of_dicts, df["created_at"]):
    d["created_at"] = v
new_dicties = list_of_dicts  # maybe not needed
```
If that’s not what you want, then you could do either
```python
new_dicties = [d | {"created_at": v} for d, v in zip(list_of_dicts, df["created_at"])]
```
in case you have Python 3.9 or higher, or:
```python
new_dicties = []
for d, v in zip(list_of_dicts, df["created_at"]):
    d = dict(d)  # shallow copy, so the original dict stays untouched
    d["created_at"] = v
    new_dicties.append(d)
```
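With the copy-based versions the originals keep only their old keys, which you can verify on the same three-row toy data (a plain list stands in for the dataframe column here):

```python
list_of_dicts = [{"key": i} for i in range(1, 4)]
values = ["2023-01-01", "2023-01-02", "2023-01-03"]

new_dicties = []
for d, v in zip(list_of_dicts, values):
    d = dict(d)              # rebinding d to a shallow copy
    d["created_at"] = v
    new_dicties.append(d)

print(list_of_dicts[0])  # {'key': 1} -- original unchanged
print(new_dicties[0])    # {'key': 1, 'created_at': '2023-01-01'}
```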