I working with a forex dataset, trying to fill in my dataframe with open, high, low, close updated every tick.
Here is my code:
JavaScript
x
35
35
1
import pandas as pd
2
3
# pandas settings
4
pd.set_option('display.max_columns', 320)
5
pd.set_option('display.max_rows', 320)
6
pd.set_option('display.width', 320)
7
8
# creating dataframe
9
df = pd.read_csv('https://www.dropbox.com/s/tcek3kmleklgxm5/eur_usd_lastweek.csv?dl=1', names=['timestamp', 'ask', 'bid', 'avol', 'bvol'], parse_dates=[0], header=0)
10
df['spread'] = df.ask - df.bid
11
df['symbol'] = 'EURUSD'
12
times = pd.DatetimeIndex(df.timestamp)
13
14
# parameters for df.groupby()
15
df['date'] = times.date
16
df['hour'] = times.hour
17
18
# 1h candles updated every tick
19
df['candle_number'] = '...'
20
df['1h_open'] = '...'
21
df['1h_high'] = '...'
22
df['1h_low'] = '...'
23
df['1h_close'] = '...'
24
25
# print(df)
26
27
grouped = df.groupby(['date', 'hour'])
28
29
for idx, x in enumerate(grouped):
30
print(idx)
31
print(x)
32
33
34
35
So as you can see, with for loop I’m getting groups.
Now I want to fill the following columns in my dataframe:
- idx be my df[‘candle_number’]
- df[‘1h_open’] must be equal to the very first df.bid in the group
- df[‘1h_high’] = the highest number in df.bid up until current row (so for instance if there are 350 rows in the group, for 20th value we count the highest number from 0-20 span, on 215th value we the highest value from 0-215 span which can be completely different.
- df[‘1h_low’] = lowest value up until the current iteration (same approach as for the above)
I hope it’s not too confusing =) Cheers
Advertisement
Answer
It’s convinient to reindex on date and hour:
df_new = df.set_index(['date', 'hour'])
Then apply groupby functions aggregating by index:
JavaScript
1
5
1
df_new['candle_number'] = df_new.groupby(level=[0,1]).ngroup()
2
df_new['1h_open'] = df_new.groupby(level=[0,1])['bid'].first()
3
df_new['1h_high'] = df_new.groupby(level=[0,1])['bid'].cummax()
4
df_new['1h_low'] = df_new.groupby(level=[0,1])['bid'].cummin()
5
you can reset_index()
back to a flat dataframe.