
Duplicated rows in pandas append inside a for loop

I am having trouble with a for loop inside a function. I am calculating cosine distances for a list of word vectors. For each vector, I calculate the cosine distance and then append it as a new column to a pandas dataframe. The problem is that there are several models, so I am comparing a word vector from model 1 with the vector for that word in every other model.

Some words are not present in all models. In that case, I catch the KeyError so the loop can move on without throwing an error, and I also add a 0 value to the pandas dataframe. This is causing duplicated indexes, and I am stuck on how to move forward from here. The code is as follows:

JavaScript

The function works; however, for each KeyError, instead of adding a 0 to the existing row, it creates a new, duplicated row with the value 0. With two words this doubled the size of the dataframe, and the ultimate aim is to use a list of many words. The resulting dataframe is shown below:

JavaScript

As you can see, for every word that isn’t present, instead of putting a 0 in the existing model row (where there is a NaN), it adds a new row containing the 0. It should read: model1, 0, 0.76, etc., without the duplicated rows. Any help is much appreciated, thank you!


Answer

I can’t quite test it without your model objects, but I think this would address your issue:

JavaScript

It collects the values for all of the words for each model in a dictionary in the inner loop, and only adds them to the full dataframe once per model, in the outer loop.
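For readers without the original snippet, here is a minimal, self-contained sketch of that pattern. It is not the answerer's exact code. The models are assumed to behave like dictionaries that map a word to its vector and raise KeyError for a missing word, so plain dicts stand in for them here. The names `cosine_score`, `compare_words`, `reference` and `models` are hypothetical, and cosine similarity is used as a stand-in for whatever measure the question computes.

```python
import numpy as np
import pandas as pd


def cosine_score(a, b):
    """Cosine similarity between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def compare_words(reference, models, words):
    """Return a dataframe with one row per model and one column per word.

    `reference` and the values of `models` are stand-ins for the real
    model objects: plain dicts mapping word -> vector.
    """
    names = []
    rows = []
    for name, model in models.items():   # outer loop: one row per model
        row = {}
        for word in words:                # inner loop: one cell per word
            try:
                row[word] = cosine_score(reference[word], model[word])
            except KeyError:
                row[word] = 0.0           # word missing: fill the cell, no new row
        names.append(name)
        rows.append(row)                  # the finished row is added exactly once
    return pd.DataFrame(rows, index=names)


# Toy data, made up purely for illustration.
reference = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
models = {
    "model1": {"cat": [1.0, 0.0]},                     # "dog" is missing here
    "model2": {"cat": [0.0, 1.0], "dog": [0.0, 2.0]},
}
result = compare_words(reference, models, ["cat", "dog"])
print(result)
```

Because a missing word only fills a cell in the current row dictionary, each model ends up with exactly one index entry. Building the dataframe once from the collected rows also avoids `DataFrame.append`, which was removed in pandas 2.0.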

User contributions licensed under: CC BY-SA