
How to mark data as anomalies based on specific condition in each interval

I have searched for this problem in many places and couldn't find the right tools.

I have a simple time series.


For any sequence of consecutive values equal to 1 that spans a given number of time instances (for example 1000), I want to mark those values as anomalies (True). All other values should be ignored (False).

How do I achieve this with pandas or numpy?

I also want to plot those anomalies in a different colour, for example red. How do I mark the values equal to 1 that span around 1000 time instances in red?


Answer

It is not entirely clear which output you expect, so let's consider the following dataset, similar to yours:


Looking like:

input data

filtering based on consecutive values

First we calculate the length of the stretches of 1s


This works by first identifying the first element of each stretch with (s-s.shift().fillna(0)).eq(1): the difference between an element and the previous one is 1 only for a 1 preceded by a 0 (see graph #2 below). It then builds increasing group numbers (graph #3), each covering one stretch of 1s and the stretch of 0s that follows it. Multiplying by s keeps only the 1s in each group (graph #4). We can now group the data per stretch and calculate the length of each one (graph #5). All the 0s end up in a single group, so finally we remove their length by multiplying by s again (graph #6).
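The steps described above can be sketched as follows. This is a minimal illustration, not necessarily the exact code of the answer: the toy series is hypothetical, and the use of cumsum to build the groups and of transform("size") to get the lengths are assumptions consistent with the description.

```python
import pandas as pd

# Hypothetical toy series of 0s and 1s (not the original sample data).
s = pd.Series([0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1])

# Graph #2: True at the first element of each stretch of 1s
# (a 1 preceded by a 0, or a 1 at the very start).
starts = (s - s.shift().fillna(0)).eq(1)

# Graph #3: increasing group number (assumed to come from a cumulative sum).
# Graph #4: multiplying by s sends every 0 to group 0.
groups = starts.cumsum().mul(s)

# Graph #5: size of each group, broadcast back to every row.
# Graph #6: multiplying by s again zeroes the length attached to the 0s.
lengths = s.groupby(groups).transform("size").mul(s)

print(lengths.tolist())  # [0, 2, 2, 0, 0, 4, 4, 4, 4, 0, 1]
```

Each 1 now carries the length of the stretch it belongs to, and each 0 carries 0.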

Here is the visual representation of the successive steps where (…) denotes the previous step in each graph:

breakdown of stretches length calculation


line+dots


line+dots

Another example, with 7 as the threshold:

line+dots ; 7 as threshold
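Marking the anomalies from the stretch lengths might look like the sketch below. The toy series and the variable names are hypothetical, and using "greater than or equal to" for the comparison is an assumption (use gt for a strict threshold). For the data in the question, the threshold would be around 1000 instead of 7.

```python
import pandas as pd

# Hypothetical data: a stretch of eight 1s and a stretch of three 1s.
s = pd.Series([0] * 3 + [1] * 8 + [0] * 2 + [1] * 3 + [0] * 2)
threshold = 7  # assumed minimal stretch length to count as an anomaly

# Length of the stretch of 1s each point belongs to (0 for the 0s).
groups = (s - s.shift().fillna(0)).eq(1).cumsum().mul(s)
lengths = s.groupby(groups).transform("size").mul(s)

# True only for 1s inside a stretch of at least `threshold` points.
anomaly = lengths.ge(threshold)

print(anomaly.sum())  # 8
```

Only the stretch of eight 1s is flagged; the stretch of three 1s stays False.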

original answer


You can simply convert the data to bool to get the anomalies.
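A minimal sketch of that conversion, assuming the series lives in a DataFrame column (the names df, "data" and "anomaly" are hypothetical). Note that this flags every 1, regardless of how long the stretch is.

```python
import pandas as pd

# Hypothetical frame; the column names are assumptions.
df = pd.DataFrame({"data": [0, 1, 1, 0, 1]})

# 1 becomes True, 0 becomes False.
df["anomaly"] = df["data"].astype(bool)

print(df)
```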


Regarding the plot, depending on what you expect you can do:


output:

data as dots anomalies in red


data as lines anomalies as red dots
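Both plot variants can be sketched with matplotlib as below. This is an illustration under assumptions: the data, the threshold of 7, the colours and the output file name are hypothetical, and the anomaly mask here comes from a stretch-length threshold rather than from a plain bool conversion.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data and anomaly mask (stretches of at least 7 ones).
s = pd.Series([0] * 3 + [1] * 8 + [0] * 2 + [1] * 3 + [0] * 2)
groups = (s - s.shift().fillna(0)).eq(1).cumsum().mul(s)
anomaly = s.groupby(groups).transform("size").mul(s).ge(7)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

# Variant 1: all data as dots, anomalies in red.
colors = anomaly.map({True: "red", False: "tab:blue"}).tolist()
ax1.scatter(s.index, s, c=colors, s=15)
ax1.set_title("data as dots, anomalies in red")

# Variant 2: data as a line, anomalies overlaid as red dots.
red = s[anomaly]
ax2.plot(s.index, s, color="tab:blue")
ax2.plot(red.index, red, "o", color="red")
ax2.set_title("data as a line, anomalies as red dots")

fig.tight_layout()
fig.savefig("anomalies.png")  # or plt.show() in an interactive session
```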

User contributions licensed under: CC BY-SA