Lets say I have a data frame with 3 columns:
| id | value | date | +====+=======+===========+ | 1 | 50 | 1-Feb-19 | +----+-------+-----------+ | 1 | 100 | 5-Feb-19 | +----+-------+-----------+ | 1 | 200 | 6-Jun-19 | +----+-------+-----------+ | 1 | 500 | 1-Dec-19 | +----+-------+-----------+ | 2 | 10 | 6-Jul-19 | +----+-------+-----------+ | 3 | 500 | 1-Mar-19 | +----+-------+-----------+ | 3 | 200 | 5-Apr-19 | +----+-------+-----------+ | 3 | 100 | 30-Jun-19 | +----+-------+-----------+ | 3 | 10 | 25-Dec-19 | +----+-------+-----------+
ID column contains the ID of a particular person. Value column contains the value of their transaction. Date column contains the date of their transaction.
Is there a way in Python to identify ID 1 as the ID with the increasing value of transactions over time?
I’m looking for some way I can extract ID 1 as my desired ID with increasing value of transactions, filter out ID 2 because it doesn’t have enough transactions to analyze a trend and also filter out ID 3 as it’s trend of transactions is declining over time.
Advertisement
Answer
df['new'] = df.groupby(['id'])['value'].transform(lambda x : np.where(x.diff()>0,'incresase', np.where(x.diff()<0,'decrease','--'))) df = df.groupby('id').new.agg(['last']) df
Output:
last id 1 increase 2 -- 3 decrease
Only increasing ID:
increasingList = df[(df['last']=='increase')].index.values print(increasingList)
Result:
[1]
Assuming this won’t happen
1 50 1 100 1 50
If so, then:
df['new'] = df.groupby(['id'])['value'].transform(lambda x : np.where(x.diff()>0,'increase', np.where(x.diff()<0,'decrease','--'))) df
Output:
value new id 1 50 -- 1 100 increase 1 200 increase 2 10 -- 3 500 -- 3 300 decrease 3 100 decrease
Concat strings:
df = df.groupby(['id'])['new'].apply(lambda x: ','.join(x)).reset_index() df
Intermediate Result:
id new 0 1 --,increase,increase 1 2 -- 2 3 --,decrease,decrease
Check if decrease exist in a row / only “–” exists. Drop them
df = df.drop(df[df['new'].str.contains("dec")].index.values) df = df.drop(df[(df['new']=='--')].index.values) df
Result:
id new 0 1 --,increase,increase