I’m working with a forex dataset and trying to fill in my dataframe with open, high, low and close values updated on every tick.
Here is my code:
import pandas as pd

# pandas settings
pd.set_option('display.max_columns', 320)
pd.set_option('display.max_rows', 320)
pd.set_option('display.width', 320)

# creating dataframe
df = pd.read_csv('https://www.dropbox.com/s/tcek3kmleklgxm5/eur_usd_lastweek.csv?dl=1',
                 names=['timestamp', 'ask', 'bid', 'avol', 'bvol'],
                 parse_dates=[0], header=0)
df['spread'] = df.ask - df.bid
df['symbol'] = 'EURUSD'
times = pd.DatetimeIndex(df.timestamp)

# parameters for df.groupby()
df['date'] = times.date
df['hour'] = times.hour

# 1h candles updated every tick
df['candle_number'] = '...'
df['1h_open'] = '...'
df['1h_high'] = '...'
df['1h_low'] = '...'
df['1h_close'] = '...'

# print(df)
grouped = df.groupby(['date', 'hour'])
for idx, x in enumerate(grouped):
    print(idx)
    print(x)
As you can see, the for loop gives me the groups one by one.
Now I want to fill the following columns in my dataframe:
- idx should become df['candle_number']
- df['1h_open'] must be equal to the very first df.bid in the group
- df['1h_high'] = the highest df.bid up to and including the current row (for instance, if there are 350 rows in the group, for the 20th row we take the highest value over the 0-20 span, and for the 215th row the highest value over the 0-215 span, which can be completely different)
- df['1h_low'] = the lowest df.bid up to and including the current row (same approach as above)
I hope it’s not too confusing =) Cheers
Answer
It’s convenient to set date and hour as the index:
df_new = df.set_index(['date', 'hour'])
Then apply groupby functions aggregating by index:
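# running number of the (date, hour) group each row belongs to
df_new['candle_number'] = df_new.groupby(level=[0,1]).ngroup()
# one value per group; assignment aligns it on the (date, hour) index
df_new['1h_open'] = df_new.groupby(level=[0,1])['bid'].first()
# running maximum / minimum of bid within each group
df_new['1h_high'] = df_new.groupby(level=[0,1])['bid'].cummax()
df_new['1h_low'] = df_new.groupby(level=[0,1])['bid'].cummin()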
Afterwards, you can call reset_index() to get back to a flat dataframe.
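For illustration, here is a minimal sketch of the same idea on a small made-up dataset, so the result can be checked by eye. The six ticks and their bid values are invented for this example and are not taken from the CSV in the question. This variant groups on the date and hour columns directly and uses transform('first') instead of setting the index, which is an alternative to the approach above rather than a correction of it. Filling 1h_close with the current bid is an assumption about what "updated every tick" should mean for the close.

```python
import pandas as pd

# Hypothetical tick data: two hourly groups with three ticks each.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-06 10:00:01", "2020-01-06 10:15:00", "2020-01-06 10:59:59",
        "2020-01-06 11:00:00", "2020-01-06 11:30:00", "2020-01-06 11:45:00",
    ]),
    "bid": [1.1010, 1.1030, 1.1005, 1.1020, 1.1015, 1.1040],
})

# Same grouping keys as in the question.
df["date"] = df["timestamp"].dt.date
df["hour"] = df["timestamp"].dt.hour

keys = ["date", "hour"]
bids = df.groupby(keys)["bid"]

# Number of the (date, hour) group each row belongs to.
df["candle_number"] = df.groupby(keys).ngroup()
# First bid of the group, broadcast to every row of that group.
df["1h_open"] = bids.transform("first")
# Running maximum and minimum of bid within each group.
df["1h_high"] = bids.cummax()
df["1h_low"] = bids.cummin()
# Assumption: the close of a candle that is still forming is the latest bid.
df["1h_close"] = df["bid"]

print(df[["timestamp", "bid", "candle_number",
          "1h_open", "1h_high", "1h_low", "1h_close"]])
```

Because transform, cummax and cummin all return a result with the same index as the original rows, the columns can be assigned directly and no reset_index() is needed afterwards.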