
In a Pandas dataframe, how do I append a new True / False column based on each row’s values?

I’m trying to create a dataframe of stock prices, and append a True/False column for each row based on certain conditions.

ind = [0,1,2,3,4,5,6,7,8]
close = [10,20,30,40,30,20,30,40,50]
open = [11,21,31,41,31,21,31,41,51]
upper = [11,21,31,41,31,21,31,41,51]
mid = [11,21,31,41,31,21,31,41,51]
cond1 = [True,True,True,False,False,True,False,True,True]
cond2 = [True,True,False,False,False,False,False,False,False]
cond3 = [True,True,False,False,False,False,False,False,False]
cond4 = [True,True,False,False,False,False,False,False,False]
cond5 = [True,True,False,False,False,False,False,False,False]

def check_conds(df, latest_price):
    '''1st set of INT for early breakout of bollinger upper'''
    df.loc[:, ('cond1')] = df.close.shift(1) > df.upper.shift(1)
    df.loc[:, ('cond2')] = df.open.shift(1) < df.mid.shift(1).rolling(6).min()
    df.loc[:, ('cond3')] = df.close.shift(1).rolling(7).min() <= 21
    df.loc[:, ('cond4')] = df.upper.shift(1) < df.upper.shift(2)
    df.loc[:, ('cond5')] = df.mid.tail(3).max() < 30
    df.loc[:, ('Overall')] = all([df.cond1,df.cond2,df.cond3,df.cond4,df.cond5])    
    return df

The original dataframe (9 rows by 4 columns) contains only the close / open / upper / mid columns.

The check_conds function returns the df with the new cond1-5 columns appended, each holding True / False for every row, giving a dataframe of 9 rows by 9 columns.

However, when I added another line to compute an ‘Overall’ True / False from cond1-5 for each row, I got “ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().”

df.loc[:, ('Overall')] = all([df.cond1,df.cond2,df.cond3,df.cond4,df.cond5])

I pulled out each of cond1-5 and confirmed that they are indeed series of True / False. How do I make that last line in the function check each row’s cond1-5 and return True only if all of cond1-5 are True for that row?

I just can’t wrap my head around why the cond1-5 lines in the function work fine, comparing the values within each row, while this last line (written in a similar style) fails because it is handed entire series.

Please advise!


Answer

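The error message points to the fix: use .all(). Python's built-in all() calls bool() on each Series in the list, and the truth value of a whole Series is ambiguous, hence the ValueError. The cond1-5 lines work because pandas applies comparison operators such as > and < element-wise. To check per row that all conditions are True, use pd.DataFrame.all with the argument axis=1: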

df.loc[:, df.columns.str.startswith('cond')].all(axis=1)

Note that df.columns.str.startswith('cond') is just a lazy way of selecting all columns that start with 'cond'. Of course you can achieve the same with df[['cond1', 'cond2', 'cond3', 'cond4', 'cond5']].
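
As a minimal sketch of the above, the snippet below builds a dataframe directly from the cond1-5 sample lists in the question (rather than running check_conds) and assigns the row-wise result to an 'Overall' column. The variable names cond_cols and error_message are only illustrative. It also reproduces the original error with the built-in all() for comparison.

```python
import pandas as pd

# Sample flags copied from the cond1..cond5 lists in the question
df = pd.DataFrame({
    "cond1": [True, True, True, False, False, True, False, True, True],
    "cond2": [True, True, False, False, False, False, False, False, False],
    "cond3": [True, True, False, False, False, False, False, False, False],
    "cond4": [True, True, False, False, False, False, False, False, False],
    "cond5": [True, True, False, False, False, False, False, False, False],
})

# Select every column whose name starts with 'cond'
cond_cols = df.columns[df.columns.str.startswith("cond")]

# Row-wise check: True only where all selected columns are True in that row
df["Overall"] = df[cond_cols].all(axis=1)

# The built-in all() calls bool() on a whole Series, which raises ValueError
error_message = None
try:
    all([df.cond1, df.cond2])
except ValueError as exc:
    error_message = str(exc)

print(df["Overall"].tolist())
# [True, True, False, False, False, False, False, False, False]
```

Only the first two rows have every condition set, so only those rows get True in 'Overall'.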

User contributions licensed under: CC BY-SA