Without iterating row by row through a dataframe, which takes ages, how can I check that a number of rows all meet a condition?

Question

I want to do the following, but I realise that this kind of iterative method is very slow with large DataFrames. What other solutions are there to this problem? What I would expect the code above to do is: substitute n from 0 to 1,000 into line 3, with an i of 0, and then if the condition

Accepted Answer

First, let me state how I understand your rule. As near as I can tell, you are trying to get a value of “Buy” in the “Strategy 1” column of the df only if there are 1000 consecutive cases where MA was greater than the Close preceding that time. I think you can get that done simply by using a rolling sum on the comparison:

    import pandas as pd
    import numpy as np

    # build some repeatable sample data
    np.random.seed(1)
    df = pd.DataFrame({'close': np.cumsum(np.random.randn(10000))})
    df['MA'] = df['close'].rolling(1000).mean()

    # Apply strategy
    npoints = 1000
    df['Strategy 1'] = float('nan')
    buypoints = (df['MA'] > df['close']).rolling(npoints).sum() == npoints
    df.loc[buypoints, "Strategy 1"] = "Buy"

    # just for visualisation show where the Buys would be
    df['Buypoints'] = buypoints*10
    df.plot()

This comes out like this (with the same seed it should look the same on your machine too)
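To see why the rolling sum is equivalent to checking every row in the window, here is a minimal sketch on made-up data, with a window of 3 instead of 1000. The toy values and the names `above`, `run` and `loop_result` are assumptions for illustration only, and the explicit `astype(int)` is just there to make the counting step visible.

```python
import pandas as pd

# Hypothetical toy data: 8 rows and a window of 3 rather than 1000
df = pd.DataFrame({
    "close": [10, 9, 8, 7, 12, 6, 5, 4],
    "MA": [11, 11, 11, 11, 11, 11, 11, 11],
})
npoints = 3

# One boolean per row: is MA above close on that row?
above = df["MA"] > df["close"]

# Count the True values in each window of npoints rows (current row included).
# The first npoints - 1 rows have incomplete windows and come out as NaN.
run = above.astype(int).rolling(npoints).sum()

# A row qualifies only when every row in its window met the condition.
# NaN == npoints evaluates to False, so incomplete windows never qualify.
buypoints = run == npoints

# Slow row-by-row version of the same check, for comparison only
loop_result = [
    i >= npoints - 1 and all(above.iloc[i - npoints + 1 : i + 1])
    for i in range(len(df))
]

print(buypoints.tolist())
print(loop_result == buypoints.tolist())
```

Note that the rolling window ends on the current row. If the rule should only look at the rows strictly before the current one, shifting the result down with `buypoints.shift(1, fill_value=False)` would be one way to express that.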

Answer