Lets say I have a data frame with 3 columns:
JavaScript
x
21
21
1
| id | value | date |
2
+====+=======+===========+
3
| 1 | 50 | 1-Feb-19 |
4
+----+-------+-----------+
5
| 1 | 100 | 5-Feb-19 |
6
+----+-------+-----------+
7
| 1 | 200 | 6-Jun-19 |
8
+----+-------+-----------+
9
| 1 | 500 | 1-Dec-19 |
10
+----+-------+-----------+
11
| 2 | 10 | 6-Jul-19 |
12
+----+-------+-----------+
13
| 3 | 500 | 1-Mar-19 |
14
+----+-------+-----------+
15
| 3 | 200 | 5-Apr-19 |
16
+----+-------+-----------+
17
| 3 | 100 | 30-Jun-19 |
18
+----+-------+-----------+
19
| 3 | 10 | 25-Dec-19 |
20
+----+-------+-----------+
21
ID column contains the ID of a particular person. Value column contains the value of their transaction. Date column contains the date of their transaction.
Is there a way in Python to identify ID 1 as the ID with the increasing value of transactions over time?
I’m looking for some way I can extract ID 1 as my desired ID with increasing value of transactions, filter out ID 2 because it doesn’t have enough transactions to analyze a trend and also filter out ID 3 as it’s trend of transactions is declining over time.
Advertisement
Answer
JavaScript
1
7
1
df['new'] = df.groupby(['id'])['value'].transform(lambda x :
2
np.where(x.diff()>0,'incresase',
3
np.where(x.diff()<0,'decrease','--')))
4
5
df = df.groupby('id').new.agg(['last'])
6
df
7
Output:
JavaScript
1
6
1
last
2
id
3
1 increase
4
2 --
5
3 decrease
6
Only increasing ID:
JavaScript
1
3
1
increasingList = df[(df['last']=='increase')].index.values
2
print(increasingList)
3
Result:
JavaScript
1
2
1
[1]
2
Assuming this won’t happen
JavaScript
1
4
1
1 50
2
1 100
3
1 50
4
If so, then:
JavaScript
1
5
1
df['new'] = df.groupby(['id'])['value'].transform(lambda x :
2
np.where(x.diff()>0,'increase',
3
np.where(x.diff()<0,'decrease','--')))
4
df
5
Output:
JavaScript
1
10
10
1
value new
2
id
3
1 50 --
4
1 100 increase
5
1 200 increase
6
2 10 --
7
3 500 --
8
3 300 decrease
9
3 100 decrease
10
Concat strings:
JavaScript
1
3
1
df = df.groupby(['id'])['new'].apply(lambda x: ','.join(x)).reset_index()
2
df
3
Intermediate Result:
JavaScript
1
5
1
id new
2
0 1 --,increase,increase
3
1 2 --
4
2 3 --,decrease,decrease
5
Check if decrease exist in a row / only “–” exists. Drop them
JavaScript
1
4
1
df = df.drop(df[df['new'].str.contains("dec")].index.values)
2
df = df.drop(df[(df['new']=='--')].index.values)
3
df
4
Result:
JavaScript
1
3
1
id new
2
0 1 --,increase,increase
3