I have a dataframe like this:
                        close
formated
2017-01-03 09:30:00   29.9713
2017-01-03 09:31:00   29.0622
2017-01-03 09:32:00   29.0750
2017-01-03 09:33:00   29.0276
2017-01-03 09:34:00   29.0375
...                       ...
2022-08-19 09:30:00  173.5500
2022-08-19 09:31:00  173.4494
2022-08-19 09:32:00  173.3400
2022-08-19 09:33:00  173.3900
2022-08-19 09:34:00  173.2600
The df contains a price value for every minute of the day, from 9:30 through 16:00. I am looking for the maximum possible profit at each minute of the day.
I am currently doing something like this:
df['Profit'] = (df.groupby(pd.Grouper(freq='D'))['close'].transform('max') - df['close']).div(df['close'])
For each row, this gives the percentage profit relative to the highest value of that day.
However, this approach is flawed: for rows that come after the day's high, it still measures profit against a maximum that has already passed.
Instead of the maximum over the whole day, I want, for each row, the maximum over only the rows of the same day that come after it.
Profit is defined as the difference between the value at that minute and the maximum value across all following minutes of the same day (0 if no later minute is higher).
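To make the flaw concrete, here is a small runnable sketch of the current approach on a hypothetical three-minute day (the prices are made up for illustration). The day's high of 3.0 is reached at 09:31, yet the 09:32 row still reports a 50% profit against it:

```python
import pandas as pd

# Hypothetical one-day series: the day's high (3.0) is reached at 09:31.
idx = pd.date_range("2017-01-03 09:30", periods=3, freq="min")
df = pd.DataFrame({"close": [1.0, 3.0, 2.0]}, index=idx)

# The question's approach: compare every row against the whole-day maximum.
df["Profit"] = (
    df.groupby(pd.Grouper(freq="D"))["close"].transform("max") - df["close"]
).div(df["close"])

# The last row gets (3.0 - 2.0) / 2.0 = 0.5, although no higher price follows it.
print(df["Profit"].tolist())
```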
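Written as a formula (a sketch; $p_t$ denotes the close at minute $t$ of day $d$). Including $t' = t$ in the maximum is what makes the profit 0 when nothing higher follows, as in the desired output below:

```latex
\text{profit\_abs}(t) = \max_{t' \ge t,\; t' \in d} p_{t'} \;-\; p_t,
\qquad
\text{profit\_perc}(t) = \frac{\text{profit\_abs}(t)}{p_t}
```

Applied to the five rows shown, 09:31 gives 29.0750 - 29.0622 = 0.0128 and 0.0128 / 29.0622 ≈ 0.0004404.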
Desired Output:
                       close  Profit abs.  Profit perc.
formated
2017-01-03 09:30:00  29.9713  0.0          0.0
2017-01-03 09:31:00  29.0622  0.0128       0.0004404
2017-01-03 09:32:00  29.0750  0.0          0.0
2017-01-03 09:33:00  29.0276  0.0099       0.0003410
2017-01-03 09:34:00  29.0375  0.0          0.0
...                      ...  ...          ...
Please, no solutions that iterate over the dataframe in a for loop; that is extremely slow.
Answer
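You can use the cummax function from pandas to compute the cumulative maximum within each day. However, you need to apply it in reverse, running from the last minute of the day back to the first, so that each row sees the maximum of itself and all later minutes.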
For the sake of explanation, here is the idea on hourly data:
>>> example_df
                  price
date
2020-01-01 00:00      1
2020-01-01 01:00      2
2020-01-01 03:00      1
2020-01-01 04:00      7
2020-01-01 05:00      5
2020-01-01 06:00      2
2020-01-01 07:00      4
>>> example_df.reverse_cummax()
                  price  reverse_cummax
date
2020-01-01 00:00      1               7
2020-01-01 01:00      2               7
2020-01-01 03:00      1               7
2020-01-01 04:00      7               7
2020-01-01 05:00      5               5
2020-01-01 06:00      2               4
2020-01-01 07:00      4               4
This is what the output of a reverse_cummax function would look like (pandas has no such DataFrame method; the call above is only illustrative).
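On a single Series the reverse cummax is just "reverse, cummax, reverse back". This sketch reproduces the reverse_cummax column of the hourly example above, using plain pandas slicing and Series.cummax:

```python
import pandas as pd

# The hourly example prices from above (note: there is no 02:00 row).
idx = pd.to_datetime([
    "2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 03:00",
    "2020-01-01 04:00", "2020-01-01 05:00", "2020-01-01 06:00",
    "2020-01-01 07:00",
])
price = pd.Series([1, 2, 1, 7, 5, 2, 4], index=idx, name="price")

# Reverse, take the running maximum, then reverse back:
# each entry becomes the max of itself and everything after it.
reverse_cummax = price[::-1].cummax()[::-1]

print(reverse_cummax.tolist())  # [7, 7, 7, 7, 5, 4, 4]
```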
We can't use pandas' cummax directly, because it only runs forward, and there is no simple way to run it backwards other than reversing the data. So we create a helper function that reverses the close values within each daily group, applies cummax, and then reverses the result back to the original order:
def reverse_cummax(df_day):
    # Running max from the last row of the day back to the first
    return df_day.loc[::-1, "close"].cummax()[::-1]

# The groupby/apply produces a MultiIndex, but we want our initial
# index back, so we need to drop the outer (day) level
df["reverse_cummax"] = df.groupby(pd.Grouper(freq='D')).apply(reverse_cummax).droplevel(0)
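Whether groupby(...).apply returns that extra day level can depend on the pandas version and the group_keys setting, so droplevel(0) may not work everywhere. Below is a sketch of two variants that keep the original index, on hypothetical two-day data. It assumes grouping by df.index.normalize() (the calendar day) instead of pd.Grouper. Variant 1 uses transform with the same reverse/cummax/reverse lambda. Variant 2 reverses the whole frame once and uses the built-in grouped cummax. Because it avoids a Python function call per day, it is likely faster on many days:

```python
import pandas as pd

# Hypothetical two-day minute data (illustrative values only).
idx = pd.DatetimeIndex([
    "2017-01-03 09:30", "2017-01-03 09:31", "2017-01-03 09:32",
    "2017-01-04 09:30", "2017-01-04 09:31", "2017-01-04 09:32",
])
df = pd.DataFrame({"close": [3.0, 1.0, 2.0, 5.0, 6.0, 4.0]}, index=idx)

# Group key: the calendar day of each timestamp (midnight-normalized).
day = df.index.normalize()

# Variant 1: transform returns a result aligned to the original index,
# so no droplevel is needed.
df["rc_transform"] = df.groupby(day)["close"].transform(
    lambda s: s[::-1].cummax()[::-1]
)

# Variant 2: reverse the whole frame, run the grouped cummax (which follows
# row order within each day, i.e. last minute first), then reverse back.
rev = df.iloc[::-1]
df["rc_reversed"] = (
    rev.groupby(rev.index.normalize())["close"].cummax().iloc[::-1]
)

print(df)
```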
Then you can get the profit by subtracting each close value from the maximum value still to come that day (reverse_cummax):
df["profit abs"] = df["reverse_cummax"] - df["close"]
df["profit percentage"] = df["profit abs"] / df["close"]
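As a check, this sketch runs the two profit lines on the five sample rows from the question. It computes reverse_cummax with the transform form (an assumption here, chosen because it keeps the original index on any pandas version). Restricted to these rows, it reproduces the desired output up to floating-point rounding: 0.0128 / 0.0004404 at 09:31, 0.0099 / 0.0003410 at 09:33, and 0 elsewhere.

```python
import pandas as pd

# The five sample rows shown in the question (one day, 09:30-09:34).
idx = pd.date_range("2017-01-03 09:30", periods=5, freq="min", name="formated")
df = pd.DataFrame(
    {"close": [29.9713, 29.0622, 29.0750, 29.0276, 29.0375]}, index=idx
)

# Daily "max still to come" (the transform form of the reverse cummax).
df["reverse_cummax"] = df.groupby(df.index.normalize())["close"].transform(
    lambda s: s[::-1].cummax()[::-1]
)

# Profit as in the answer: future daily max minus the current close.
df["profit abs"] = df["reverse_cummax"] - df["close"]
df["profit percentage"] = df["profit abs"] / df["close"]

print(df.round(7))
```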
Edited to add the improved solution as suggested by dermen