How to “describe” a column in pandas for Python

Question

I'm using pandas in Python and I have a dataframe where one column is a timestamp and the others contain data. The blue line stays constant for a while, then suddenly increases to zero, then at some point descends again to about -98 and stays there until it suddenly goes up to zero. What I need is a new column with: constant, sudden increase, constant, decrease, constant, sudden increase, constant.

Accepted Answer

I suggest taking a look at numpy.gradient. Consider the following simple example:

```python
import numpy as np

arr = np.array([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, 5, 3, 1])
arrg = np.gradient(arr)  # central differences inside, one-sided at the ends
for a, g in zip(arr, arrg):
    print(a, g, sep=",")
```

Output:

```
0,0.0
0,0.0
0,0.5
1,1.0
2,1.0
3,1.0
4,1.0
5,1.0
6,1.0
7,0.5
7,0.0
7,0.0
7,0.0
7,-1.0
5,-2.0
3,-2.0
1,-2.0
```

Observe that constant parts result in zeros (potentially excluding the points adjacent to an increase or decrease), increases result in positive values, and decreases result in negative values; the steeper the increase or decrease, the further the value is from zero.

You might need to convert the pandas.Series (a column of a pandas.DataFrame) into a numpy.array, for which pandas.Series provides the .to_numpy method.

Disclaimer: this solution assumes that your data are evenly spaced, e.g. from a sensor providing a value each second.

Edit: in order to detect the borders of an increase or decrease you might leverage numpy.diff in the following way, using arr and arrg from the example above:

```python
# True where the gradient changes between neighbouring samples
arrb = abs(np.diff(arrg)) > 0.01
for a, b in zip(arr, arrb):
    print(a, b, sep=",")
```

Output:

```
0,False
0,True
0,True
1,False
2,False
3,False
4,False
5,False
6,True
7,True
7,False
7,False
7,True
7,True
5,False
3,False
```

Beware that, due to how numpy.diff works, the result is shorter by one observation. This computes the difference between adjacent elements of the gradient and then checks whether its absolute value is greater than 0.01 (you might need to adjust this value depending on your data); where there is a True (possibly a few adjacent Trues), there is a kink.
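To connect the gradient idea back to the new column the question asks for, here is a minimal sketch that labels each row of a DataFrame from the sign of numpy.gradient. The column names (`timestamp`, `value`, `trend`), the sample values, and the `tolerance` threshold are all assumptions made for illustration; they do not come from the original data. As in the answer, it assumes evenly spaced samples.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one reading per second, shaped like the signal in the
# question (flat at -98, jump to 0, fall back to -98, jump to 0 again).
values = [-98, -98, -98, 0, 0, 0, -50, -98, -98, -98, 0, 0]
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-01-01", periods=len(values),
                               freq=pd.Timedelta(seconds=1)),
    "value": values,
})

# Gradient of the column, computed on a plain NumPy array.
grad = np.gradient(df["value"].to_numpy(dtype=float))

# Assumed threshold: gradients within +/- tolerance count as "constant".
tolerance = 0.01
df["trend"] = np.select(
    [grad > tolerance, grad < -tolerance],
    ["increase", "decrease"],
    default="constant",
)

# Collapse runs of identical labels to get the sequence of phases.
changed = df["trend"] != df["trend"].shift()
segments = df.loc[changed, "trend"].tolist()
print(segments)
```

Because numpy.gradient uses central differences, the samples on both sides of a jump are labelled as part of the increase or decrease, which is the edge effect the answer mentions. For noisy data, a larger `tolerance` would likely be needed.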

Answer