Skip to content
Advertisement

How do you identify which IDs have an increasing value over time in another column in a Python dataframe?

Lets say I have a data frame with 3 columns:

| id | value |    date   |
+====+=======+===========+
|  1 |   50  |  1-Feb-19 |
+----+-------+-----------+
|  1 |  100  |  5-Feb-19 |
+----+-------+-----------+
|  1 |  200  |  6-Jun-19 |
+----+-------+-----------+
|  1 |  500  |  1-Dec-19 |
+----+-------+-----------+
|  2 |   10  |  6-Jul-19 |
+----+-------+-----------+
|  3 |  500  |  1-Mar-19 |
+----+-------+-----------+
|  3 |  200  |  5-Apr-19 |
+----+-------+-----------+
|  3 |  100  | 30-Jun-19 |
+----+-------+-----------+
|  3 |   10  | 25-Dec-19 |
+----+-------+-----------+

ID column contains the ID of a particular person. Value column contains the value of their transaction. Date column contains the date of their transaction.

Is there a way in Python to identify ID 1 as the ID with the increasing value of transactions over time?

I’m looking for some way I can extract ID 1 as my desired ID with increasing value of transactions, filter out ID 2 because it doesn’t have enough transactions to analyze a trend and also filter out ID 3 as it’s trend of transactions is declining over time.

Advertisement

Answer

df['new'] = df.groupby(['id'])['value'].transform(lambda x : 
                      np.where(x.diff()>0,'incresase',
                      np.where(x.diff()<0,'decrease','--')))

df = df.groupby('id').new.agg(['last'])
df

Output:

      last
id  
1   increase
2   --
3   decrease

Only increasing ID:

increasingList = df[(df['last']=='increase')].index.values
print(increasingList)

Result:

[1]

Assuming this won’t happen

1  50
1  100
1  50

If so, then:

df['new'] = df.groupby(['id'])['value'].transform(lambda x : 
                      np.where(x.diff()>0,'increase',
                      np.where(x.diff()<0,'decrease','--')))
df

Output:

    value   new
id      
1   50  --
1   100 increase
1   200 increase
2   10  --
3   500 --
3   300 decrease
3   100 decrease

Concat strings:

df = df.groupby(['id'])['new'].apply(lambda x: ','.join(x)).reset_index()
df

Intermediate Result:

    id  new
0   1   --,increase,increase
1   2   --
2   3   --,decrease,decrease

Check if decrease exist in a row / only “–” exists. Drop them

df = df.drop(df[df['new'].str.contains("dec")].index.values)
df = df.drop(df[(df['new']=='--')].index.values)
df

Result:

    id  new
0   1   --,increase,increase
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement