How to “describe” a column in pandas for python

I’m using pandas in Python and I have a DataFrame where one column is a timestamp and the others contain data. Example:

The blue line stays constant for a while, then suddenly increases to zero, then at some point descends again to about -98 and stays there until it suddenly goes up to zero. What I need is a new column with the status of the blue line: constant, sudden increase, constant, decrease, constant, sudden increase, constant, or some object that describes the data:

blue line{ 
'08.02.2022 08:30:00.000' : 'sudden increase',
'08.02.2022 10:39:30.000' : 'start decrease',
'08.02.2022 10:59:40.000' : 'end decrease',
'08.02.2022 13:50:30.000' : 'sudden increase' 
}

Is there a package for something like this? I hope it isn’t too far-fetched.

Kind Regards, Alexander

Answer

constant,sudden increase, constant, decrease, constant, sudden increase, constant

Then I suggest taking a look at numpy.gradient. Consider the following simple example:

import numpy as np
arr = np.array([0,0,0,1,2,3,4,5,6,7,7,7,7,7,5,3,1])
arrg = np.gradient(arr)
for a,g in zip(arr,arrg):
    print(a,g,sep=",")

output

0,0.0
0,0.0
0,0.5
1,1.0
2,1.0
3,1.0
4,1.0
5,1.0
6,1.0
7,0.5
7,0.0
7,0.0
7,0.0
7,-1.0
5,-2.0
3,-2.0
1,-2.0

Observe that constant parts result in zeros (possibly excluding the points adjacent to an increase/decrease), increases result in positive values, and decreases result in negative values; the steeper the increase/decrease, the further the value is from zero.

You might need to convert the pandas.Series (a column of a pandas.DataFrame) into a numpy.array, for which pandas.Series provides the .to_numpy method.
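A minimal sketch of that conversion, assuming a hypothetical DataFrame whose columns are named "timestamp" and "blue" (both names, and the sample values, are made up for illustration and reuse the array from the example above):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column names "timestamp" and "blue" are assumptions.
df = pd.DataFrame({
    "timestamp": pd.date_range(
        "2022-02-08 08:00:00", periods=17, freq=pd.Timedelta(seconds=1)
    ),
    "blue": [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 5, 3, 1],
})

# pandas.Series -> numpy.ndarray
values = df["blue"].to_numpy()

# Store the gradient next to the original data as a new column
df["blue_gradient"] = np.gradient(values)

print(df)
```

Keeping the gradient as a column means it stays aligned with the timestamps, which should make later filtering or labelling easier.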

Disclaimer: this solution assumes that your data are evenly spaced, e.g. from a sensor providing a value each second.
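If the samples are not evenly spaced, numpy.gradient also accepts the sample coordinates as a second argument. The sketch below is only an illustration under that assumption: the timestamps and values are invented, and the timestamps are converted to seconds elapsed since the first sample.

```python
import numpy as np
import pandas as pd

# Invented, unevenly spaced timestamps (gaps of 1 s, 2 s and 4 s)
ts = pd.Series(pd.to_datetime([
    "2022-02-08 08:00:00",
    "2022-02-08 08:00:01",
    "2022-02-08 08:00:03",
    "2022-02-08 08:00:07",
]))
# Invented values that rise by 1 unit per second
vals = np.array([0.0, 1.0, 3.0, 7.0])

# Seconds elapsed since the first sample, as a float array
seconds = (ts - ts.iloc[0]).dt.total_seconds().to_numpy()

# Passing the coordinates makes the result a slope per second
slope = np.gradient(vals, seconds)
print(slope)
```

Without the second argument, the same call would treat every step as one unit wide and report larger slopes wherever the gap between samples is longer.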

Edit: in order to detect the borders of an increase/decrease, you might leverage numpy.diff in the following way, using arr and arrg from the example above

arrb = abs(np.diff(arrg)) > 0.01
for a,b in zip(arr,arrb):
    print(a,b,sep=",")

output

0,False
0,True
0,True
1,False
2,False
3,False
4,False
5,False
6,True
7,True
7,False
7,False
7,True
7,True
5,False
3,False

Beware that, due to how numpy.diff works, the result is shorter by 1 observation.

This computes the difference between adjacent elements of the gradient, then checks whether its absolute value is greater than 0.01 (you might need to adjust this value depending on your data). Where there is a True (possibly a few adjacent Trues), there is a kink.
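To get closer to the timestamp-to-status object from the question, one possible sketch is shown below. Note that it uses a slightly different technique than the kink detection above: it takes the sign of the gradient and records the first timestamp of each run of equal sign. The timestamps, the label names and the date format are assumptions for illustration, and noisy data would likely need small gradients rounded to zero first.

```python
import numpy as np
import pandas as pd

arr = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 5, 3, 1])
# Invented timestamps, one per second
stamps = pd.date_range(
    "2022-02-08 08:00:00", periods=len(arr), freq=pd.Timedelta(seconds=1)
)

arrg = np.gradient(arr)
sign = np.sign(arrg)  # -1.0 = falling, 0.0 = flat, 1.0 = rising

# Positions where the sign differs from the previous sample, plus position 0
starts = np.concatenate(([0], np.flatnonzero(np.diff(sign) != 0) + 1))

# Hypothetical label names
labels = {-1: "decrease", 0: "constant", 1: "increase"}
events = {
    stamps[i].strftime("%d.%m.%Y %H:%M:%S"): labels[int(sign[i])]
    for i in starts
}

for when, what in events.items():
    print(when, what)
```

Because the central difference in numpy.gradient looks at both neighbours, a run can start one sample before the value itself visibly changes.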

User contributions licensed under: CC BY-SA