I have a dataframe like this (the real one has 7 million records and 345 features). The following image shows only a small fraction of it, indicating whether a client made an operation in a given month. What I want to do is create a column at the end with the mean difference between operations. For example, in the first record the mean difference would (probably) be 3.
By mean difference I mean the following: between op1 and op4 there is a distance of 3, between op4 and op11 a distance of 7, between op11 and op15 a distance of 3, and so on. If we sum all the values we get 13, and dividing by the total number of operations (op1, op4, op11, op15, so 4 operations) gives 3.25. That is what I refer to as the mean difference.
Answer
numpy.flatnonzero: Identify where the non-zero values are.
numpy.diff: Find the difference between adjacent values. When passed the results from flatnonzero, it finds the differences between positions.
numpy.mean: Find the average of the values.
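As a rough illustration of how these three calls chain together, here is a minimal sketch on a single made-up row (the values are an assumption for illustration, not taken from the question's data). Note that taking the mean of the `numpy.diff` result divides by the number of gaps, not by the number of operations.

```python
import numpy as np

# Hypothetical row of 10 monthly flags: 1 = an operation happened that month
row = np.zeros(10, dtype=int)
row[[0, 2, 5, 9]] = 1

positions = np.flatnonzero(row)  # zero-based indices of the non-zero entries
gaps = np.diff(positions)        # distances between consecutive positions
mean_gap = gaps.mean()           # average of those distances

print(positions, gaps, mean_gap)
```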
Produce a new column 'MD' with the average positional distance between non-zero values:
df.assign(MD=[np.diff(np.flatnonzero(a)).mean() for a in df.to_numpy()])
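For a self-contained way to try this out, the sketch below builds a small hypothetical dataframe (the column names `op1` to `op6` and the values are assumptions) and applies the same idea. The helper `mean_gap` is a name introduced here for illustration; it returns NaN for rows with fewer than two operations, since there is no gap to average in that case.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1 = the client made an operation in that month
df = pd.DataFrame(
    [[1, 0, 0, 1, 0, 1],
     [0, 1, 1, 0, 0, 0],
     [0, 0, 1, 0, 0, 0]],
    columns=[f"op{i}" for i in range(1, 7)],
)

def mean_gap(row):
    """Average distance between consecutive non-zero positions in a row."""
    positions = np.flatnonzero(row)
    if positions.size < 2:
        return np.nan  # fewer than two operations: no gap to average
    return np.diff(positions).mean()

# Same pattern as above, one value per row of the underlying array
result = df.assign(MD=[mean_gap(a) for a in df.to_numpy()])
print(result)
```

If the real dataframe has columns other than the operation flags, it is probably safer to select only the operation columns before calling `to_numpy()`, so that unrelated features are not counted as operations.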