Calculating Hamming distance within a given year

I have the following dataframe:

I would like to calculate the pairwise Hamming distance for every pair of rows within a given year and save the results into a new dataframe. Example (note: I made up the Hamming distance values, and I don't actually need the Pair column):
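
To make the goal concrete, here is a minimal pure-Python sketch of the desired computation, not the asker's data or code. It assumes each row is a hypothetical (year, name, features) tuple whose features are equal-length 0/1 lists, and it counts the differing positions (the unnormalized Hamming distance) for every unordered pair within the same year. The names hamming and pairwise_hamming_by_year are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming_by_year(rows):
    """Hypothetical helper.

    rows: iterable of (year, name, features) tuples.
    Returns a list of (year, name1, name2, distance) tuples.
    """
    # Bucket the rows by year so only same-year rows are compared
    by_year = defaultdict(list)
    for year, name, features in rows:
        by_year[year].append((name, features))

    result = []
    for year in sorted(by_year):
        # combinations() yields each unordered pair exactly once, in input order
        for (n1, f1), (n2, f2) in combinations(by_year[year], 2):
            result.append((year, n1, n2, hamming(f1, f2)))
    return result


# Hypothetical example rows
rows = [
    (2000, "A", [1, 0, 1, 1]),
    (2000, "B", [1, 1, 0, 1]),
    (2000, "C", [0, 0, 1, 1]),
    (2001, "A", [1, 1, 1, 0]),
    (2001, "B", [1, 1, 1, 1]),
]
for r in pairwise_hamming_by_year(rows):
    print(r)
```

This loop compares pairs one at a time; it is simple but slower than the matrix-based approach for years with many rows.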

I tried something like:

Answer

The function pairwise_distances (from sklearn.metrics) accepts a whole matrix, so it may be easier to pass all the features for one year as a matrix, get back the full matrix of pairwise distances, and keep only the comparisons we need. For example, with a dataset like yours:
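
A minimal sketch of that idea for a single year, assuming a hypothetical 0/1 feature matrix X with one row per item and labels in names. Note that metric="hamming" returns the fraction of positions that differ (as scipy.spatial.distance.hamming does), so multiplying by the number of columns turns it into a count. np.triu_indices with k=1 picks each unordered pair once and skips the zero diagonal.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical features for one year: one row per item, one 0/1 column per feature
X = np.array([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
])
names = ["A", "B", "C"]

# Full symmetric 3 x 3 distance matrix; "hamming" gives the fraction of differing positions
D = pairwise_distances(X, metric="hamming")

# Keep only the comparisons we need: each unordered pair once (above the diagonal)
i, j = np.triu_indices(len(names), k=1)
for a, b in zip(i, j):
    # Multiply by the number of features to turn the fraction into a count
    print(names[a], names[b], D[a, b], D[a, b] * X.shape[1])
```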

Define a pairwise function that takes the feature column and the row names:
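
One possible shape for such a function, sketched under assumptions rather than reproducing the original answer: the helper name pairwise_hamming and the output columns name1, name2 and hamming are hypothetical, the feature column is assumed to hold equal-length 0/1 lists, and sklearn's normalized Hamming fraction is converted back to integer counts.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances


def pairwise_hamming(features, names):
    """Hypothetical helper: Hamming distance (as a count) for each unordered pair.

    features: equal-length 0/1 lists, one per row (e.g. a Series of lists)
    names:    row labels in the same order as features
    """
    X = np.array(list(features))  # shape (n_rows, n_features)
    # pairwise_distances returns the fraction of differing positions;
    # multiply by the number of features to get a count
    D = pairwise_distances(X, metric="hamming") * X.shape[1]
    # Indices strictly above the diagonal: each pair once, no self-pairs
    i, j = np.triu_indices(len(X), k=1)
    names = np.asarray(names)
    return pd.DataFrame({
        "name1": names[i],
        "name2": names[j],
        "hamming": D[i, j].round().astype(int),
    })


# Usage on one year's rows (hypothetical data)
df_year = pd.DataFrame({
    "name": ["A", "B", "C"],
    "features": [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]],
})
print(pairwise_hamming(df_year["features"], df_year["name"]))
```

Returning a long-format frame (one row per pair) makes it straightforward to stack the per-year results afterwards.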

Then group by year and apply the function to each group:
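
A self-contained sketch of the groupby step, assuming a hypothetical dataframe with year, name and features columns; the hypothetical pairwise_hamming helper is repeated so the block runs on its own. Selecting only name and features before apply keeps the grouping column out of each group, and droplevel plus reset_index turn the year group key back into a regular column.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances


def pairwise_hamming(features, names):
    """Hypothetical helper: Hamming counts for each unordered pair of rows."""
    X = np.array(list(features))
    # sklearn's "hamming" is a fraction; scale it to a count of differing positions
    D = pairwise_distances(X, metric="hamming") * X.shape[1]
    i, j = np.triu_indices(len(X), k=1)
    names = np.asarray(names)
    return pd.DataFrame({"name1": names[i], "name2": names[j],
                         "hamming": D[i, j].round().astype(int)})


# Hypothetical input: one row per (year, name) with a 0/1 feature list
df = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001],
    "name": ["A", "B", "C", "A", "B"],
    "features": [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1],
                 [1, 1, 1, 0], [1, 1, 1, 1]],
})

result = (
    df.groupby("year", group_keys=True)[["name", "features"]]
      # one small pairs table per year; the year becomes the outer index level
      .apply(lambda g: pairwise_hamming(g["features"], g["name"]))
      .droplevel(1)   # drop the per-group 0..k-1 row numbers
      .reset_index()  # move "year" from the index into a column
)
print(result)
```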

This gives us something like the following:

User contributions licensed under: CC BY-SA