Pandas: splitting data frame based on the slope of data

I have this data frame

x = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})


Update: I want a function that, if the slope is negative and the length of the group is more than 2, returns True together with the start and end index of the group. For this case it should return: result=True, index=5, index=8

1- I want to split the data frame based on the slope. This example should have 6 groups.

2- How can I check the length of the groups?


I tried to get the groups with the code below, but I don't know how to split the data frame or how to check the length of each part.

New update: Thanks to Matt W. for his code. I finally found the solution.

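import numpy as np
import pandas as pd

def get_slope(df):
    # Least-squares slope of the first column against the row index.
    x = np.array(df.iloc[:, 0].index)
    y = np.array(df.iloc[:, 0])
    X = x - x.mean()
    Y = y - y.mean()
    # A single-row group gives 0/0 here, which NumPy evaluates to NaN.
    slope = (X.dot(Y)) / (X.dot(X))
    return slope

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
df['diff'] = df.entity.diff().fillna(0)
# Collapse every negative step to -1 so consecutive drops compare equal.
df.loc[df['diff'] < 0, 'diff'] = -1

# Start a new group id whenever 'diff' differs from the previous row.
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)
df['g'] = init[1:]

df.groupby('g').apply(get_slope)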

Result

0    NaN
1    NaN
2    NaN
3    0.0
4    NaN
5   -1.5
6    NaN
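
As a side note on reading this result: get_slope appears to compute the ordinary least-squares slope of the values against the row index. Written out (the symbols x_i for the row index and y_i for the entity value are notation introduced here, not from the post):

```latex
\text{slope} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
% Falling run, rows 6 to 8 with values 3, 2, 0:
%   x - \bar{x} = (-1, 0, 1), \quad y - \bar{y} = (4/3, 1/3, -5/3)
%   slope = (-4/3 - 5/3) / 2 = -1.5
% Single-row group: numerator and denominator are both 0, so the result is NaN.
```

That would explain both the -1.5 for the three-row falling run and the NaN entries, which come from groups that contain only one row.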


Answer

Take the difference and bfill() the first row so that the 0th element gets the same value as the 1st. Then set every negative difference to -1 so that all falling steps count as the same “slope”. Then I shift it to check whether each value is the same as the previous one, and iterate through the result: the counter goes up every time the value changes, which gives us a list of group ids that is assigned to g.

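import pandas as pd

df = pd.DataFrame({'entity':[5,7,5,5,5,6,3,2,0,5]})
# bfill() copies the first real difference into row 0 (which diff() leaves as NaN).
df['diff'] = df.entity.diff().bfill()
# Treat every negative step as the same "slope".
df.loc[df['diff'] < 0, 'diff'] = -1

# Keep the group id while 'diff' matches the previous row, otherwise increment it.
init = [0]
for x in df['diff'] == df['diff'].shift(1):
    if x:
        init.append(init[-1])
    else:
        init.append(init[-1]+1)
df['g'] = init[1:]
df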
   entity  diff  g
0       5   2.0  1
1       7   2.0  1
2       5  -1.0  2
3       5   0.0  3
4       5   0.0  3
5       6   1.0  4
6       3  -1.0  5
7       2  -1.0  5
8       0  -1.0  5
9       5   5.0  6
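
The question's second point (group lengths) and its update (a True/start/end result for a falling run) can be sketched on top of the same labelling. This is only a sketch under assumptions: find_falling_run and min_drops are hypothetical names that do not appear in the answer, and reporting the row before the first drop as the start is a guess made to match the index=5, index=8 expected in the question.

```python
import pandas as pd

df = pd.DataFrame({'entity': [5, 7, 5, 5, 5, 6, 3, 2, 0, 5]})

# Same labelling as the loop above, written without an explicit loop:
# negative differences collapse to -1, all other differences are kept.
diff = df['entity'].diff().bfill()
label = diff.where(diff >= 0, -1)
# A new group starts wherever the label differs from the previous row.
df['g'] = label.ne(label.shift()).cumsum()

# Length of each group, indexed by g.
sizes = df.groupby('g').size()
print(sizes)


def find_falling_run(entity, min_drops=3):
    """Return (True, start, end) for the first run of at least `min_drops`
    consecutive drops, else (False, None, None).

    Assumption: `start` is the row just before the first drop, so the
    returned span covers the whole descent.
    """
    # NaN < 0 is False, so the first row is never marked as a drop.
    falling = entity.diff() < 0
    # Number the runs of consecutive equal True/False values.
    run_id = falling.ne(falling.shift(fill_value=False)).cumsum()
    for _, run in falling.groupby(run_id):
        if run.iloc[0] and len(run) >= min_drops:
            first = entity.index.get_loc(run.index[0])
            return True, entity.index[first - 1], run.index[-1]
    return False, None, None


print(find_falling_run(df['entity']))
```

On the example data, sizes shows a length of 3 for group 5, and find_falling_run returns (True, 5, 8). The single drop at row 2 is skipped because it is shorter than min_drops.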
User contributions licensed under: CC BY-SA