Pandasql Exception with OVER

I tried to use this line of code :

import pandasql as ps
df2["transaction per 30 minutes"] = ps.sqldf("SELECT COUNT(*) FROM df OVER (PARTITION BY 'Card Number' ORDER BY 'Date' RANGE BETWEEN '30 minutes' PRECEDING AND '0 minutes' FOLLOWING)", locals())
df2["avg_transaction_30"] = ps.sqldf("SELECT AVG(Amount) FROM df OVER (PARTITION BY 'Card Number' ORDER BY Date RANGE BETWEEN '30 minutes' PRECEDING AND  '0 minutes' FOLLOWING)")

JavaScript
​x
 
import pandasql as ps
df2["transaction per 30 minutes"] = ps.sqldf("SELECT COUNT(*) FROM df OVER (PARTITION BY 'Card Number' ORDER BY 'Date' RANGE BETWEEN '30 minutes' PRECEDING AND '0 minutes' FOLLOWING)", locals())
df2["avg_transaction_30"] = ps.sqldf("SELECT AVG(Amount) FROM df OVER (PARTITION BY 'Card Number' ORDER BY Date RANGE BETWEEN '30 minutes' PRECEDING AND  '0 minutes' FOLLOWING)")
​

NB : delete ‘locals()’ or replace by ‘globals()’ don’t solve the problem.

But I have this error :

PandaSQLException: (sqlite3.OperationalError) near "(": syntax error
[SQL: SELECT COUNT(*) FROM df2 OVER (PARTITION BY 'Card Number' ORDER BY date RANGE BETWEEN '30 minutes' PRECEDING AND '0 minutes' FOLLOWING)]
(Background on this error at: )

JavaScript
 
PandaSQLException: (sqlite3.OperationalError) near "(": syntax error
[SQL: SELECT COUNT(*) FROM df2 OVER (PARTITION BY 'Card Number' ORDER BY date RANGE BETWEEN '30 minutes' PRECEDING AND '0 minutes' FOLLOWING)]
(Background on this error at: )
​

http://sqlalche.me/e/14/e3q8 (Same error with Average)

I load my dataframe like this :

df2 = pd.read_csv('df.csv')

JavaScript
 
df2 = pd.read_csv('df.csv')
​

My file is in the same directory as my Jupiter notebook file. I follow the link, but I still don’t understand what’s wrong.

And this line work :

ps.sqldf("SELECT * FROM df2")

JavaScript
 
ps.sqldf("SELECT * FROM df2")
​

So I guess is not a problem with my pandasql

My initial goal with this code is :

I want to know how many transactions a card number do for every 30 min interval, and also calculate in average how many transactions a client do per 30 minutes.

For example, I want to have something like (order by Card Number) :

Card Number	interval	#transactions_30min	AVG_transactions_30min
536518******2108	2015-05-11 00:00:01 – 00:30:00	2	5
536518******2108	2015-05-11 00:30:01 – 01:00:00	8	5
536518******2197	2015-05-11 00:00:01 – 00:30:00	6	8
536518******2197	2015-05-11 00:30:01 – 01:00:00	10	8

My database look like this :

id	Card Number	Date	Amount	Fraud
0	536518******2108	2015-05-01 00:01:54	36.54	No
1	536518******2108	2015-05-01 00:03:46	36.54	No
2	453211******1239	2015-05-01 00:08:50	69.00	No
3	548827******1705	2015-05-01 00:27:00	193.43	No
4	531681******9778	2015-05-01 01:32:46	132.00	No
5	515117******4107	2015-05-01 02:10:26	161.00	No
6	432032******9111	2015-05-01 08:09:15	110.00	No
7	544540******7141	2015-05-01 08:30:16	159.50	No
8	554906******0358	2015-05-01 09:13:51	126.50	Yes
9	554906******0358	2015-05-01 09:15:28	126.50	Yes

So is it possible to use pandasql in my case ? If yes, how can I correct my request to make it work ?

Answer

If you don’t mind using pandas for all calculations, here is one approach:

df['Date'] = pd.to_datetime(df['Date'])

g = df.groupby(["Card Number", pd.Grouper(key='Date', freq='30min')], sort=False)
df_out = g['Amount'].agg(['count', 'mean']).add_prefix('transactions30min_').reset_index()

JavaScript
 
df['Date'] = pd.to_datetime(df['Date'])
​
g = df.groupby(["Card Number", pd.Grouper(key='Date', freq='30min')], sort=False)
df_out = g['Amount'].agg(['count', 'mean']).add_prefix('transactions30min_').reset_index()
​

Result:

print(df_out)

        Card Number                Date  transactions30min_count  transactions30min_mean
0  536518******2108 2015-05-01 00:00:00                        2                   36.54
1  453211******1239 2015-05-01 00:00:00                        1                   69.00
2  548827******1705 2015-05-01 00:00:00                        1                  193.43
3  531681******9778 2015-05-01 01:30:00                        1                  132.00
4  515117******4107 2015-05-01 02:00:00                        1                  161.00
5  432032******9111 2015-05-01 08:00:00                        1                  110.00
6  544540******7141 2015-05-01 08:30:00                        1                  159.50
7  554906******0358 2015-05-01 09:00:00                        2                  126.50

JavaScript
 
print(df_out)
​
        Card Number                Date  transactions30min_count  transactions30min_mean
0  536518******2108 2015-05-01 00:00:00                        2                   36.54
1  453211******1239 2015-05-01 00:00:00                        1                   69.00
2  548827******1705 2015-05-01 00:00:00                        1                  193.43
3  531681******9778 2015-05-01 01:30:00                        1                  132.00
4  515117******4107 2015-05-01 02:00:00                        1                  161.00
5  432032******9111 2015-05-01 08:00:00                        1                  110.00
6  544540******7141 2015-05-01 08:30:00                        1                  159.50
7  554906******0358 2015-05-01 09:00:00                        2                  126.50
​

Advertisement

Answer