I’m using pandas in Python and I have a DataFrame where one column is a timestamp and the other columns contain data.
The blue line stays constant for a while, then suddenly increases to zero, then at some point descends again to about -98 and stays there until it suddenly goes up to zero again. What I need is a new column with the status of the blue line: constant, sudden increase, constant, decrease, constant, sudden increase, constant. Alternatively, some object that describes the data:
blue line: {
    '08.02.2022 08:30:00.000': 'sudden increase',
    '08.02.2022 10:39:30.000': 'start decrease',
    '08.02.2022 10:59:40.000': 'end decrease',
    '08.02.2022 13:50:30.000': 'sudden increase'
}
Is there a package for something like this? I hope it isn’t too far-fetched.
Kind Regards, Alexander
Answer
If what you need is the sequence "constant, sudden increase, constant, decrease, constant, sudden increase, constant", then I suggest looking at numpy.gradient. Consider the following simple example:
import numpy as np

arr = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 5, 3, 1])
arrg = np.gradient(arr)
for a, g in zip(arr, arrg):
    print(a, g, sep=",")
Output:
0,0.0
0,0.0
0,0.5
1,1.0
2,1.0
3,1.0
4,1.0
5,1.0
6,1.0
7,0.5
7,0.0
7,0.0
7,0.0
7,-1.0
5,-2.0
3,-2.0
1,-2.0
Observe that constant stretches give zeros (possibly except for the points adjacent to an increase or decrease), increases give positive values and decreases give negative values; the steeper the change, the further the value is from zero.
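To get a per-row status like the one asked for in the question, one option is to map the sign of the gradient to a label. This is a minimal sketch on the same toy array; the tolerance `tol` and the labels "increase", "decrease" and "constant" are assumptions you would adapt to your data.

```python
import numpy as np

# Same toy signal as above
arr = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 5, 3, 1])
arrg = np.gradient(arr)

# Hypothetical tolerance: gradients smaller than this in magnitude count as "constant"
tol = 0.01

# np.select picks, per element, the choice of the first condition that is True
status = np.select(
    [arrg > tol, arrg < -tol],
    ["increase", "decrease"],
    default="constant",
)

for a, s in zip(arr, status):
    print(a, s, sep=",")
```

Note that the last 0 before the ramp is already labelled "increase", because np.gradient uses central differences; this is the "adjacent points" effect mentioned above.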
You might need to convert the pandas.Series (a column of a pandas.DataFrame) into a numpy.array; pandas.Series provides the .to_numpy method for this.
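Applied to a DataFrame, this could look roughly like the sketch below. The DataFrame, its column names "timestamp" and "blue", and its values are hypothetical stand-ins for the question's data; since np.gradient returns an array of the same length as its input, the result can be stored directly as a new column.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: one timestamp column and one value column ("blue")
df = pd.DataFrame({
    "timestamp": pd.date_range("2022-02-08 08:29:50", periods=6, freq="10s"),
    "blue": [-98.0, -98.0, 0.0, 0.0, 0.0, -50.0],
})

# Series -> numpy array, then store the gradient as a new column of the same length
df["blue_gradient"] = np.gradient(df["blue"].to_numpy())
print(df)
```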
Disclaimer: this solution assumes that your data are evenly spaced, e.g. from a sensor providing a value every second.
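If the samples are not evenly spaced, np.gradient also accepts the sample coordinates as a second argument. A minimal sketch, assuming the timestamps are converted to seconds elapsed since the first sample (the timestamps and values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical, unevenly spaced timestamps (gaps of 1 s, 1 s, 3 s, 1 s)
ts = pd.to_datetime([
    "2022-02-08 08:30:00",
    "2022-02-08 08:30:01",
    "2022-02-08 08:30:02",
    "2022-02-08 08:30:05",
    "2022-02-08 08:30:06",
])
values = np.array([0.0, 1.0, 2.0, 5.0, 6.0])  # grows by exactly 1 per second

# Seconds elapsed since the first sample, as floats
seconds = (ts - ts[0]).total_seconds().to_numpy()

# Passing the coordinates makes np.gradient account for the uneven spacing
grad = np.gradient(values, seconds)
print(grad)
```

Here the slope is 1 per second everywhere, and the coordinate-aware gradient reflects that. Calling np.gradient(values) without the coordinates would instead report a steeper change around the 3-second gap.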
Edit: to detect the borders of an increase or decrease, you might leverage numpy.diff in the following way, using arr and arrg from the example above:
arrb = abs(np.diff(arrg)) > 0.01
for a, b in zip(arr, arrb):
    print(a, b, sep=",")
Output:
0,False
0,True
0,True
1,False
2,False
3,False
4,False
5,False
6,True
7,True
7,False
7,False
7,True
7,True
5,False
3,False
Beware that, because of how numpy.diff works, the result is one element shorter than the input (zip then silently drops the last value of arr).
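If you want the border flags as a DataFrame column, the length mismatch can be avoided with the append parameter of np.diff (available since NumPy 1.16). A minimal sketch, assuming that repeating the last gradient value (so the final difference is 0) is an acceptable padding for your data:

```python
import numpy as np

arr = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 5, 3, 1])
arrg = np.gradient(arr)

# append=arrg[-1] adds one copy of the last gradient value before differencing,
# so the last difference is 0 and the result has the same length as arr
arrb = np.abs(np.diff(arrg, append=arrg[-1])) > 0.01
print(len(arr), len(arrb))
```

With append, a True at position i still means "the gradient changes between i and i+1"; using prepend instead would shift that alignment by one.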
This computes the difference between adjacent gradient values and then checks whether its absolute value is greater than 0.01 (you might need to adjust this threshold for your data). Wherever there is a True (possibly a few adjacent Trues), there is a kink.
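To get closer to the timestamp-keyed object from the question, one could keep only the first True of each run of adjacent Trues and look up the corresponding timestamps. A sketch under assumptions: the one-per-second timestamps and the labels "gradient rises" / "gradient falls" are hypothetical, and the label only says whether the gradient grows or shrinks at the kink, not whether it is the start or end of a trend.

```python
import numpy as np
import pandas as pd

arr = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 5, 3, 1])
# Hypothetical timestamps, one per second (evenly spaced, as assumed above)
ts = pd.date_range("2022-02-08 08:30:00", periods=len(arr), freq="1s")

arrg = np.gradient(arr)
d = np.diff(arrg)        # change of the gradient between neighbours
arrb = np.abs(d) > 0.01  # True where there is a kink (threshold is data-dependent)

# Keep only the first True of each run of adjacent Trues
starts = np.flatnonzero(arrb & ~np.concatenate(([False], arrb[:-1])))

# Map each kink's timestamp (formatted like in the question) to the direction of change
events = {
    ts[i].strftime("%d.%m.%Y %H:%M:%S.%f")[:-3]: ("gradient rises" if d[i] > 0 else "gradient falls")
    for i in starts
}
print(events)
```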