I have this data frame
x = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
Update: I want a function that checks each group. If the slope is negative and the length of the group is more than 2, it should return True together with the index of the start and the index of the end of the group. For this case it should return: result=True, index=5, index=8
1- I want to split the data frame based on the slope. This example should have 6 groups.
2- how can I check the length of groups?
I tried to get the groups with the code below, but I don't know how to split the data frame or how to check the length of each part.
New update: Thanks to Matt W. for his code. I finally found the solution:
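import numpy as np
import pandas as pd

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().fillna(0)
# Collapse every negative difference to -1 so falling steps share one label
df.loc[df['diff'] < 0, 'diff'] = -1

# Start a new group number whenever 'diff' differs from the previous row
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)

def get_slope(df):
    # Least-squares slope of the first column against the row index
    x = np.array(df.iloc[:,0].index)
    y = np.array(df.iloc[:,0])
    X = x - x.mean()
    Y = y - y.mean()
    slope = (X.dot(Y)) / (X.dot(X))
    return slope

df['g'] = init[1:]
df.groupby('g').apply(get_slope)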
Result
0    NaN
1    NaN
2    NaN
3    0.0
4    NaN
5   -1.5
6    NaN
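The function asked for in the update could be sketched roughly as follows. This is not the code from the answer: the name `find_falling_run` and the parameters `col` and `min_len` are made up for illustration, and it assumes a unique index and that the group "starts" at the row just before the first drop, which is what the expected result (index=5, index=8) suggests.

```python
import pandas as pd


def find_falling_run(df, col="entity", min_len=3):
    """Return (True, start, end) for the first falling run, else (False, None, None)."""
    falling = df[col].diff() < 0  # True where the value dropped from the previous row
    # A new run begins wherever `falling` flips; cumsum turns the flips into run ids.
    run_id = (falling != falling.shift(fill_value=False)).cumsum()
    for _, run in df[falling].groupby(run_id[falling]):
        if len(run) >= min_len:  # "length of the group is more than 2"
            # Assumption: the run starts at the row just before the first drop.
            start = df.index[df.index.get_loc(run.index[0]) - 1]
            return True, start, run.index[-1]
    return False, None, None


x = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})
result, start, end = find_falling_run(x)
print(result, start, end)
```

For the example frame this gives True with start 5 and end 8. The single drop at index 2 is skipped because its run has only one element.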
Answer
Take the difference and bfill() the start so that the 0th element has the same value as the next one. Then set all negative differences to the same value (-1) so that they count as the same "slope". Then shift the column to check whether each value equals the previous one, and iterate through the result to build a list that increments whenever the value changes. That list is assigned to g.
df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1

init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)

df['g'] = init[1:]
df

   entity  diff  g
0       5   2.0  1
1       7   2.0  1
2       5  -1.0  2
3       5   0.0  3
4       5   0.0  3
5       6   1.0  4
6       3  -1.0  5
7       2  -1.0  5
8       0  -1.0  5
9       5   5.0  6
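As a possible alternative to the loop, the same labels can be built with a comparison against the shifted column followed by `cumsum()`, and the length of each group can then be read with `groupby(...).size()`. This is only a sketch of that idea, not part of the original answer; the variable name `sizes` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})

# Same 'diff' column as in the answer: negative steps are collapsed to -1.
df['diff'] = df['entity'].diff().bfill()
df.loc[df['diff'] < 0, 'diff'] = -1

# A new group starts wherever 'diff' differs from the previous row.
# The first row is compared against NaN, so it opens group 1.
df['g'] = (df['diff'] != df['diff'].shift()).cumsum()

# Number of rows in each group, indexed by group label.
sizes = df.groupby('g').size()
print(sizes)
```

Here `sizes[5]` is 3, so group 5 (rows 6 to 8) is the only one whose length is more than 2.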