I want to do the following, but obviously I realise that this kind of iterative method is very slow with large DataFrames, what other solutions are there to this problem?:
for i in range(len(df)): for n in range(1001): if df["Close"][(i+n)] > df["MA"][i+n]: df["Strategy 1"][i] = "Buy"
What I would expect the code above to do is:
Sub in n from 0 to 1,000 into line 3, with an i of 0, and then if the condition in line 3 held for each n in the range of 0 to 1,000 then it would go on and carry out the operation in line 4.
After this it would take i of 1 and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.
After this it would take i of 2 and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.
After this it would take i of 3 and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.
… …
After this it would take i of len(df) and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.
Regardless of if the code presented above does what i’d expect or not, is there a much faster way to compute this for very large multi Gigabyte DataFrames?
Advertisement
Answer
First, let me state how I understand your rule. As near as I can tell you are trying to get a value of “Buy” in the “Strategy 1” column of the df only if there are 1000 consecutive cases where MA
was greater than the Close
preceding that time. I think you can get that done simply by using a rolling sum on the comparison:
import pandas as pd import numpy as np # build some repeatable sample data np.random.seed(1) df = pd.DataFrame({'close': np.cumsum(np.random.randn(10000))}) df['MA'] = df['close'].rolling(1000).mean() # Apply strategy npoints = 1000 df['Strategy 1'] = float('nan') buypoints = (df['MA'] > df['close']).rolling(npoints).sum() == npoints df.loc[buypoints, "Strategy 1"] = "Buy" # just for visualisation show where the Buys would be df['Buypoints'] = buypoints*10 df.plot()
This comes out like this (with the same seed it should look the same on your machine too)