Skip to content
Advertisement

Without iterating row by row through a dataframe, which takes ages, how can I check that a number of rows all meet a condition?

I want to do the following, but obviously I realise that this kind of iterative method is very slow with large DataFrames, what other solutions are there to this problem?:

for i in range(len(df)):
    for n in range(1001):
        if df["Close"][(i+n)] > df["MA"][i+n]:
            df["Strategy 1"][i] = "Buy"

What I would expect the code above to do is:

Sub in n from 0 to 1,000 into line 3, with an i of 0, and then if the condition in line 3 held for each n in the range of 0 to 1,000 then it would go on and carry out the operation in line 4.

After this it would take i of 1 and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.

After this it would take i of 2 and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.

After this it would take i of 3 and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.

After this it would take i of len(df) and then sub in n from 0 to 1,000 into line 3, and if the condition held for all n in that range then it would carry out line 4.

Regardless of if the code presented above does what i’d expect or not, is there a much faster way to compute this for very large multi Gigabyte DataFrames?

Advertisement

Answer

First, let me state how I understand your rule. As near as I can tell you are trying to get a value of “Buy” in the “Strategy 1” column of the df only if there are 1000 consecutive cases where MA was greater than the Close preceding that time. I think you can get that done simply by using a rolling sum on the comparison:

import pandas as pd
import numpy as np

# build some repeatable sample data
np.random.seed(1)
df = pd.DataFrame({'close': np.cumsum(np.random.randn(10000))})
df['MA'] = df['close'].rolling(1000).mean()

# Apply strategy
npoints = 1000

df['Strategy 1'] = float('nan')
buypoints = (df['MA'] > df['close']).rolling(npoints).sum() == npoints
df.loc[buypoints, "Strategy 1"] = "Buy"

# just for visualisation show where the Buys would be
df['Buypoints'] = buypoints*10
df.plot()

This comes out like this (with the same seed it should look the same on your machine too)

Sample curves showing buy points

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement