
Python: Calculate max profit by day and after current timestamp

I have a dataframe like this:

                      close
formated                                                               
2017-01-03 09:30:00   29.9713
2017-01-03 09:31:00   29.0622
2017-01-03 09:32:00   29.0750
2017-01-03 09:33:00   29.0276
2017-01-03 09:34:00   29.0375
...                       ...
2022-08-19 09:30:00  173.5500  
2022-08-19 09:31:00  173.4494
2022-08-19 09:32:00  173.3400
2022-08-19 09:33:00  173.3900
2022-08-19 09:34:00  173.2600

The df contains a price value for every minute of the day starting from 9:30 through 16:00. I am looking to get the maximum possible profit for each minute of the day.

I am currently doing something like this:

df['Profit'] = (df.groupby(pd.Grouper(freq='D'))['close'].transform('max') - df['close']).div(df['close'])

This gives me, for each row, the relative profit with respect to the highest value of the day.
However, this approach is flawed, because it also calculates a profit for timestamps that come after the highest value of the day has already been reached.
I don't want the max value for the whole day, but rather the max value per day taken only over the timestamps that come after the row we are currently looking at.

Profit is defined as the difference between the value at that minute, and the maximum value across all following minutes in the same day.

Desired Output:

                      close     Profit abs.  Profit perc.
formated                                                               
2017-01-03 09:30:00   29.9713   0.0          0.0
2017-01-03 09:31:00   29.0622   0.0128       0.0004404
2017-01-03 09:32:00   29.0750   0.0          0.0
2017-01-03 09:33:00   29.0276   0.0099       0.0003410
2017-01-03 09:34:00   29.0375   0.0          0.0   
...                       ...      ...             ...

Please no solutions that iterate over the dataframe in a for loop as this is extremely slow.


Answer

You can use pandas' cummax function to compute the cumulative max over each day. However, you will need to apply it in reverse, from the last row of the day back to the first.

For the sake of explanation, suppose we do this on an hourly basis:

>>> example_df
date              price
2020-01-01 00:00  1
2020-01-01 01:00  2
2020-01-01 03:00  1
2020-01-01 04:00  7
2020-01-01 05:00  5
2020-01-01 06:00  2
2020-01-01 07:00  4

>>> example_df.reverse_cummax()
date              price  reverse_cummax
2020-01-01 00:00  1      7
2020-01-01 01:00  2      7
2020-01-01 03:00  1      7
2020-01-01 04:00  7      7
2020-01-01 05:00  5      5
2020-01-01 06:00  2      4
2020-01-01 07:00  4      4

This is what the output of a reverse_cummax function would look like. Note that pandas has no built-in method of that name; the call above is only illustrative.
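To make the idea concrete, here is a minimal sketch that rebuilds the hourly example and computes the reversed cumulative max on a single series. The frame name example_df and the column names follow the example above; building the frame by hand is an assumption made for illustration.

```python
import pandas as pd

# Hourly toy prices, copied from the example above
example_df = pd.DataFrame(
    {"price": [1, 2, 1, 7, 5, 2, 4]},
    index=pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 03:00",
        "2020-01-01 04:00", "2020-01-01 05:00", "2020-01-01 06:00",
        "2020-01-01 07:00",
    ]),
)

# Reverse the rows, take the running maximum, then flip back to
# chronological order. Assignment aligns on the timestamp index.
example_df["reverse_cummax"] = example_df["price"].iloc[::-1].cummax().iloc[::-1]
print(example_df)
```

Each row ends up holding the maximum of its own price and every later price, which is why the last row simply repeats its own value.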

We can't use pandas' cummax directly, and there is no easy way to reverse it other than reversing the whole dataframe. Instead, we create a helper function that reverses the "close" column within each daily group, takes the cumulative max, and then reverses the result again:

import pandas as pd

def reverse_cummax(df_day):
    # Reverse the day's rows, take the running max, then restore the original order
    return df_day.loc[::-1, "close"].cummax()[::-1]

# groupby(...).apply(...) returns a MultiIndex (day, timestamp), but we want
# our initial index back, so we need to drop the day level
df["reverse_cummax"] = df.groupby(pd.Grouper(freq='D')).apply(reverse_cummax).droplevel(0)
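As a quick sanity check, the sketch below runs this helper on a small two-day frame so that the reset at the day boundary is visible. The frame name toy and its prices are made up for illustration; only reverse_cummax and the groupby call come from the answer.

```python
import pandas as pd

def reverse_cummax(df_day):
    # Running max taken from the end of the day back to its start
    return df_day.loc[::-1, "close"].cummax()[::-1]

# Made-up prices: three minutes on each of two consecutive days
idx = pd.to_datetime([
    "2020-01-01 09:30", "2020-01-01 09:31", "2020-01-01 09:32",
    "2020-01-02 09:30", "2020-01-02 09:31", "2020-01-02 09:32",
])
toy = pd.DataFrame({"close": [3.0, 1.0, 2.0, 5.0, 9.0, 4.0]}, index=idx)

# One result per day, stacked under a (day, timestamp) MultiIndex
per_day = toy.groupby(pd.Grouper(freq="D")).apply(reverse_cummax)

# Drop the day level so the values align with the original index
toy["reverse_cummax"] = per_day.droplevel(0)
print(toy)
```

The first day's values never see the 9.0 from the second day, which is the per-day behaviour the question asks for.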

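Then you can get your profit by subtracting each close from the highest value still to come that day (reverse_cummax). Because reverse_cummax includes the current row, the profit is 0 whenever no later price is higher: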

df["profit abs"] = df["reverse_cummax"] - df["close"]
df["profit percentage"] = df["profit abs"]/df["close"]
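For reference, here is a self-contained sketch of the whole computation on the first five rows from the question. It swaps in a different grouping technique from the code above: the series is reversed once and grouped by calendar day, so no apply is needed. This is only one possible variant, and it assumes the timestamps in the index are unique. The two rows on 2017-01-04 are made up to show that the running max resets each day.

```python
import pandas as pd

# First five rows from the question, plus two made-up rows on a second day
idx = pd.to_datetime([
    "2017-01-03 09:30", "2017-01-03 09:31", "2017-01-03 09:32",
    "2017-01-03 09:33", "2017-01-03 09:34",
    "2017-01-04 09:30", "2017-01-04 09:31",
])
df = pd.DataFrame(
    {"close": [29.9713, 29.0622, 29.0750, 29.0276, 29.0375, 30.0, 31.0]},
    index=idx,
)

# Walk backwards in time and take the running max within each calendar day.
# The result keeps the timestamps, so assignment aligns it with df.
rev = df["close"].iloc[::-1]
df["reverse_cummax"] = rev.groupby(rev.index.normalize()).cummax()

df["profit abs"] = df["reverse_cummax"] - df["close"]
df["profit percentage"] = df["profit abs"] / df["close"]
print(df.round(6))
```

On the five original rows this reproduces the absolute profits from the desired output (0.0, 0.0128, 0.0, 0.0099, 0.0).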

Edited to add the improved solution as suggested by dermen

User contributions licensed under: CC BY-SA