Skip to content
Advertisement

Python panda’s dataframe boolean Series/Column based on conditional next columns

I’m having trouble describing exactly what I want to achieve. I’ve tried looking here on stack to find others with the same problem, but are unable to find any. So I will try to describe exactly what I want and give you a sample setup code.

I would like to have a function that gives me a new column/pd.Series. This new column has boolean TRUE values (or int’s) that are based on a certain condition.

The condition being as follows. There are N number of columns (example is 8), each with the same name but ending with one new number. IE, column_1, column_2 etc. The function I need is twofold:

  1. If N is given, look for/through each column row and see if it and the next N columns row are also TRUE/1 ..
  2. If N is NOT given, look for each column row and if all next columns rows are also TRUE/1, with the numbers as ID’s to look at the column.
def get_df_series(df: pd.DataFrame, columns_ids: list, n: int=8) -> pd.Dataframe:

    for i in columns_ids:
        # missing code here .. i dont know if this would be the way to go
        pass

    return df


def create_dataframe(numbers: list) -> pd.DataFrame:
    df = pd.DataFrame()  # empty df

    # create a column for each number with the number as ID and with random boolean values as int's
    for i in numbers:
        df[f'column_{i}'] = np.random.randint(2, size=20)

    return df


if __name__=="__main__":

    numbers = [1, 2, 3, 4, 5, 6, 7, 8]

    df = create_dataframe(numbers=numbers)

    df = get_df_series(df=df, numbers=numbers, n=3)

I have some experience with Pandas dataframes and know how to create IF/ELSE things with np.select for example.

(function) select(condlist: Sequence[ArrayLike], choicelist: Sequence[ArrayLike], default: ArrayLike = ...) -> NDArray


The problem I’m running into is that I don’t know how to make a conditional statement if I don’t know how many columns are ahead. For example, if I want to know for column_5 if the next 3 are also true, I can hardcode this, but I have columns up to id 20 and would love to not have to hardcode everything from column_1 to column_20 if I want to know if all conditions in all those columns are true.

Now the problem is that I don’t know if this is even possible. So any feedback would be appreaciated. Even just giving me a hint on where to look for a way to do this.

EDIT: What I forgot to mention was that there will be random columns in between that obviously cannot be taking into the equation. For example, there will be main_column_1, main_column_2, main_column_3, side_column_1, side_column_2, right_column_1, main_column_3, main_column_4 etc…

The answer Corralien gave is correct, but I should’ve made my question more clearer.

I need to be able to, say, look at main_column and for that one look ahead N amount of columns of the same type: main_column.

Advertisement

Answer

Try:

n = 3
out = (df.rolling(n, min_periods=1, axis=1).sum()
         .shift(-n+1, fill_value=0, axis=1).eq(n).astype(int)
         .rename(columns=lambda x: 'result_' + x.split('_')[1]))

Output:

>>> out
    result_1  result_2  result_3  result_4  result_5  result_6  result_7  result_8
0          1         1         1         1         1         1         0         0
1          0         0         0         0         0         0         0         0
2          0         0         0         0         0         0         0         0
3          0         0         0         0         0         0         0         0
4          0         0         0         1         0         0         0         0
5          0         0         0         0         0         0         0         0
6          0         0         0         0         0         0         0         0
7          0         0         0         0         0         0         0         0
8          0         1         1         1         0         0         0         0
9          0         0         0         0         0         1         0         0
10         0         0         0         0         0         0         0         0
11         0         0         0         0         1         0         0         0
12         0         0         0         0         0         0         0         0
13         0         0         0         1         1         0         0         0
14         0         0         0         0         0         1         0         0
15         0         0         0         0         0         0         0         0
16         0         0         0         0         0         0         0         0
17         0         0         1         0         0         0         0         0
18         0         0         1         0         0         0         0         0
19         0         0         0         0         0         0         0         0

Input:

>>> df
    column_1  column_2  column_3  column_4  column_5  column_6  column_7  column_8
0          1         1         1         1         1         1         1         1
1          0         1         0         0         0         1         1         0
2          1         1         0         1         0         1         1         0
3          1         0         1         0         0         0         0         0
4          1         0         0         1         1         1         0         1
5          1         1         0         1         0         1         1         0
6          1         0         1         0         0         0         0         1
7          0         0         1         0         0         0         0         0
8          0         1         1         1         1         1         0         0
9          1         0         1         1         0         1         1         1
10         0         0         1         1         0         0         1         1
11         1         0         1         0         1         1         1         0
12         0         1         1         0         1         0         1         0
13         0         0         0         1         1         1         1         0
14         0         0         1         1         0         1         1         1
15         1         0         0         1         0         1         0         0
16         1         0         0         0         0         0         0         1
17         0         0         1         1         1         0         0         1
18         0         0         1         1         1         0         0         1
19         0         0         1         0         0         0         1         0
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement