
How to calculate average returns over separate consecutive ranges determined by another column in Python?

I currently have a pandas DataFrame that contains a time series of asset prices and a column containing a “state”. There are three states (-1, 0 and 1) that occur at various points in the data.

I am trying to find the average return on the asset in each of these states, ideally using a vectorised method.

Here is an example of the DataFrame:

JavaScript

I am trying to calculate the average return for each state, so for example for state 1:

JavaScript

Is there a neat, vectorised way to do this?


Answer

Create an ID for each sequence and use groupby:

JavaScript

The Seq column is computed so that all the rows belonging to a sequence have the same ID:

JavaScript

The main idea is to find the rows of the state column where the state changes, using diff: if the state differs from that of the previous row, the difference is non-zero. Comparing the difference with zero marks every change, and cumsum (the cumulative sum) of those marks gives an ID that increases by one at each change. This also works for the first row, because the first element returned by diff is NaN, which is different from zero.
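As a minimal sketch of the diff and cumsum idea, the snippet below builds the Seq column on a small made-up state series. The sample values are hypothetical, and the column name `state` is an assumption taken from the wording of the answer, not from the original data.

```python
import pandas as pd

# Hypothetical sample data; the column name "state" is assumed.
df = pd.DataFrame({"state": [1, 1, 0, 0, -1, -1, 1, 1, 1]})

# diff() is NaN on the first row and non-zero wherever the state changes.
# ne(0) maps both cases to True, and cumsum() counts the changes so far,
# so every row of one consecutive run gets the same ID.
df["Seq"] = df["state"].diff().ne(0).cumsum()

print(df["Seq"].tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 4]
```

Note that the two separate runs of state 1 get different IDs (1 and 4), which is what lets them be treated as distinct sequences later.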

Once you have the Seq column, the rest is straightforward: first group by state and Seq to get the return for each sequence, then average those returns for each state.
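The two grouping steps could look roughly like the sketch below. It is not the original answer's code: the prices and the column name `price` are hypothetical, and the return of a sequence is assumed here to be last price divided by first price, minus one, which may differ from the definition used in the question.

```python
import pandas as pd

# Hypothetical prices and states; the column names are assumptions.
df = pd.DataFrame({
    "price": [100.0, 110.0, 110.0, 99.0, 99.0, 108.9, 100.0, 120.0, 150.0],
    "state": [1, 1, 0, 0, -1, -1, 1, 1, 1],
})
df["Seq"] = df["state"].diff().ne(0).cumsum()

# Step 1: one row per consecutive run, holding its first and last price.
runs = df.groupby(["state", "Seq"])["price"].agg(["first", "last"])

# Assumed return definition: last / first - 1 within each run.
runs["ret"] = runs["last"] / runs["first"] - 1

# Step 2: average the per-run returns within each state.
avg_return = runs.groupby(level="state")["ret"].mean()
print(avg_return)
```

Seq is already unique per run, so grouping by Seq alone would give the same runs. Keeping state in the group key simply leaves it in the index for the second groupby.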

Here is the complete code and the result:

JavaScript
JavaScript
User contributions licensed under: CC BY-SA