I have a list of outputs obtained from a cow behavior detection model. Even in a video when a cow is laying, often time it identifies as standing and vice versa. In each video frame, a classification result is given by the model and we are appending it into a list. Let’s assume after 20 frames, we have a series of output as follows –
behavious_cow_1 = ["stand","stand","stand","stand","lying", "stand","stand", "eating", "stand","stand","stand","stand","lying""stand","stand","stand","stand","stand","stand","lying"]
Out of 20 classification results, we have 4 misclassification; 3 lyings, and 1 eating. However, the whole time the cow was sitting at a place. If the list only contained numerical values like – 1,2,3…, I would have opted for moving average to change the misclassification. Is there any Scipy, Pandas, Numpy function that can smooth the categorical output? I am thinking about taking previous 3 and next 3 values to determine the current category.
Advertisement
Answer
I used the following solution –
import scipy.stats window_length = 7 behave = ["stand","stand","stand","stand","lying","lying", "eating"] most_freq_val = lambda x: scipy.stats.mode(x)[0][0] smoothed = [most_freq_val(behave[i:i+window_length]) for i in range(0,len(behave)-window_length+1)]
I tried the solution posted by Hugolmn but it broke at a point. In the rolling mode, the window width is provided by the user (7 here). In a certain width, if more than one values are present in the same number of times, the code does not work. It’s more like – you tried to find the statistical mode (most common item) of a list but it got more than one item with the same highest frequency.