
Looking for the difference between occurrences in a dataframe

I have a dataframe like this (the real one has 7 million records and 345 features). The following image shows only a small fraction of it: whether a client made an operation in a given month. What I want to do is create a column at the end with the mean difference between operations. For example, for the first record the mean difference would (probably) be 3.

By mean difference I mean the following: between op1 and op4 there is a distance of 3, between op4 and op11 a distance of 7, between op11 and op15 a distance of 4, and so on. Summing these values gives 14, and dividing by the total number of operations (op1, op4, op11 and op15, i.e. 4 operations) gives 3.5. That is what I refer to as the mean difference.

[Image: sample of the dataframe]
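
A minimal sketch of the question's definition in Python, assuming the number in each column name (op1, op4, op11, op15) is the month position. It takes the gaps between consecutive operation months and divides their sum by the number of operations, not by the number of gaps.

```python
import numpy as np

# Hypothetical example row: the months in which the client made an operation.
# Assumes the column names op1..op15 encode the month number.
operation_months = np.array([1, 4, 11, 15])

# Gaps between consecutive operations: 4-1, 11-4, 15-11
gaps = np.diff(operation_months)

# The question's definition: sum of the gaps divided by the number of operations
mean_difference = gaps.sum() / len(operation_months)
print(gaps.tolist(), mean_difference)  # [3, 7, 4] 3.5
```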

Answer

  • numpy.flatnonzero: Return the indices of the non-zero values
  • numpy.diff: Compute the differences between adjacent values. When passed the result of flatnonzero, it gives the distances between the positions of consecutive non-zero values
  • numpy.mean: Compute the average of those distances
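
Chaining the three functions on a single row makes each step visible. This sketch builds a hypothetical 15-month row with operations in op1, op4, op11 and op15 (positions 0, 3, 10 and 14). Note that `mean()` divides the summed gaps by the number of gaps (3 here), whereas the question divides by the number of operations (4); use `gaps.sum() / len(positions)` if the latter is what is wanted.

```python
import numpy as np

# Hypothetical row for one client: 1 = operation in that month, 0 = none.
# Positions 0, 3, 10 and 14 correspond to op1, op4, op11 and op15.
row = np.zeros(15, dtype=int)
row[[0, 3, 10, 14]] = 1

positions = np.flatnonzero(row)  # indices of non-zero entries: [0, 3, 10, 14]
gaps = np.diff(positions)        # distances between neighbours: [3, 7, 4]
md = gaps.mean()                 # average gap: 14 / 3, about 4.67

print(positions.tolist(), gaps.tolist(), md)
```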

Produce a new column 'MD' holding, for each row, the average positional distance between non-zero values:

import numpy as np
df = df.assign(MD=[np.diff(np.flatnonzero(a)).mean() for a in df.to_numpy()])
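
The one-liner scans every column of `df`, so with 345 features the non-operation columns would be included too, and a row with fewer than two operations yields NaN together with a NumPy RuntimeWarning. The sketch below is one way to handle both, under stated assumptions: the operation columns are named `op<N>` and appear in chronological order, `client_id` is a hypothetical extra feature, and `mean_gap` is a hypothetical helper. It computes both the gap average (`MD`) and the question's per-operation variant (`MD_per_op`).

```python
import numpy as np
import pandas as pd


def mean_gap(row: np.ndarray, per_operation: bool = False) -> float:
    """Average distance between non-zero entries of one row (hypothetical helper).

    per_operation=False divides by the number of gaps, like np.diff(...).mean().
    per_operation=True divides by the number of operations (the question's definition).
    Rows with fewer than two operations return NaN without triggering a warning.
    """
    positions = np.flatnonzero(row)
    if len(positions) < 2:
        return np.nan
    gaps = np.diff(positions)
    return gaps.sum() / (len(positions) if per_operation else len(gaps))


# Small hypothetical frame: 15 monthly operation columns plus one other feature.
months = [f"op{i}" for i in range(1, 16)]
data = np.zeros((3, 15), dtype=int)
data[0, [0, 3, 10, 14]] = 1  # op1, op4, op11, op15
data[1, [1, 2]] = 1          # op2, op3
data[2, [5]] = 1             # only op6: fewer than two operations
df = pd.DataFrame(data, columns=months)
df.insert(0, "client_id", [101, 102, 103])

# Use only the operation columns (assumed to be in chronological order).
ops = df.filter(regex=r"^op\d+$").to_numpy()
df = df.assign(
    MD=[mean_gap(r) for r in ops],
    MD_per_op=[mean_gap(r, per_operation=True) for r in ops],
)
print(df[["client_id", "MD", "MD_per_op"]])
```

Like the original answer, this still loops over rows in Python, so on 7 million rows it may take a while.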
User contributions licensed under: CC BY-SA