Smoothing Categorical Output

I have a list of outputs obtained from a cow behavior detection model. Even when a cow in the video is lying down, the model often classifies it as standing, and vice versa. The model produces a classification result for each video frame, and we append it to a list. Suppose that after 20 frames we have the following series of outputs:

behavious_cow_1 = ["stand","stand","stand","stand","lying", "stand","stand", "eating", "stand","stand","stand","stand","lying","stand","stand","stand","stand","stand","stand","lying"]

Out of 20 classification results, 4 are misclassifications: 3 "lying" and 1 "eating". However, the cow stayed in one place the whole time. If the list contained only numerical values such as 1, 2, 3, ..., I would have used a moving average to smooth out the misclassifications. Is there a SciPy, pandas, or NumPy function that can smooth categorical output? I am thinking of using the previous 3 and next 3 values to determine the current category.


Answer

I used the following solution:

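import scipy.stats

window_length = 7  # number of consecutive frames in each window
behave = ["stand", "stand", "stand", "stand", "lying", "lying", "eating"]
# Most frequent value in a window. The [0][0] indexing assumes an older SciPy
# whose mode() returns arrays and accepts strings; newer releases may need adapting.
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
# Slide the window one frame at a time; yields len(behave) - window_length + 1 values.
smoothed = [most_freq_val(behave[i:i + window_length]) for i in range(len(behave) - window_length + 1)]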
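
The list comprehension above returns one value per full window, so the result is shorter than the input. For the "previous 3 and next 3" idea from the question, a centred window that shrinks at the edges keeps one output per frame. The following is only a minimal sketch: it swaps scipy.stats.mode for collections.Counter from the standard library, and the names smooth_labels and half_window are hypothetical, not part of the original answer.

```python
from collections import Counter


def smooth_labels(labels, half_window=3):
    """Replace each label with the most common label in a centred window.

    half_window is a hypothetical parameter: the number of frames taken on
    each side of the current one. The window is truncated at the list edges.
    """
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half_window)
        hi = min(len(labels), i + half_window + 1)
        window = labels[lo:hi]
        # most_common(1) returns [(label, count)]; on a tie, Counter keeps
        # the label that appeared first in the window.
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


frames = ["stand", "stand", "stand", "stand", "lying", "stand", "stand",
          "eating", "stand", "stand", "stand", "stand", "lying", "stand",
          "stand", "stand", "stand", "stand", "stand", "lying"]
smoothed_frames = smooth_labels(frames, half_window=3)
```

With the 20-frame list from the question, every window has "stand" as its majority label, so the isolated "lying" and "eating" frames are overwritten.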

I tried the solution posted by Hugolmn, but it broke at one point. In the rolling mode, the window width is provided by the user (7 here). If, within a given window, two or more values occur the same number of times, the code does not work. In other words, it tries to find the statistical mode (the most common item) of a list but finds more than one item with the same highest frequency.
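
One way to make the tie case explicit is to collect every label that reaches the highest count and then apply a stated tie-breaking rule. The sketch below assumes a rule of "keep the previous smoothed label if it is among the tied ones, otherwise take the first tied label seen in the window". That rule and the names window_mode and rolling_mode are illustrative assumptions, not something taken from Hugolmn's solution or from the code above.

```python
from collections import Counter


def window_mode(window, fallback):
    """Return the most common label in window, with explicit tie handling."""
    counts = Counter(window)
    top = max(counts.values())
    # Every label that reaches the highest count, in first-seen order.
    tied = [label for label, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: prefer the fallback label if it is one of the tied candidates.
    return fallback if fallback in tied else tied[0]


def rolling_mode(labels, window_length=7):
    """Slide a fixed-width window over labels and take the mode of each."""
    smoothed = []
    for i in range(len(labels) - window_length + 1):
        window = labels[i:i + window_length]
        # Fallback is the previous smoothed label, or the first label of the
        # very first window when nothing has been smoothed yet.
        previous = smoothed[-1] if smoothed else window[0]
        smoothed.append(window_mode(window, previous))
    return smoothed
```

Preferring the previous smoothed label favours continuity, which suits behaviour that rarely changes between neighbouring frames; another rule, such as keeping the original label of the centre frame, would fit the same structure.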

User contributions licensed under: CC BY-SA