I’m having trouble describing exactly what I want to achieve. I’ve tried looking here on stack to find others with the same problem, but are unable to find any. So I will try to describe exactly what I want and give you a sample setup code.
I would like to have a function that gives me a new column/pd.Series. This new column has boolean TRUE values (or int’s) that are based on a certain condition.
The condition being as follows. There are N number of columns (example is 8), each with the same name but ending with one new number. IE, column_1, column_2 etc. The function I need is twofold:
- If N is given, look for/through each column row and see if it and the next N columns row are also TRUE/1 ..
- If N is NOT given, look for each column row and if all next columns rows are also TRUE/1, with the numbers as ID’s to look at the column.
def get_df_series(df: pd.DataFrame, columns_ids: list, n: int=8) -> pd.Dataframe: for i in columns_ids: # missing code here .. i dont know if this would be the way to go pass return df def create_dataframe(numbers: list) -> pd.DataFrame: df = pd.DataFrame() # empty df # create a column for each number with the number as ID and with random boolean values as int's for i in numbers: df[f'column_{i}'] = np.random.randint(2, size=20) return df if __name__=="__main__": numbers = [1, 2, 3, 4, 5, 6, 7, 8] df = create_dataframe(numbers=numbers) df = get_df_series(df=df, numbers=numbers, n=3)
I have some experience with Pandas dataframes and know how to create IF/ELSE things with np.select for example.
(function) select(condlist: Sequence[ArrayLike], choicelist: Sequence[ArrayLike], default: ArrayLike = ...) -> NDArray
The problem I’m running into is that I don’t know how to make a conditional statement if I don’t know how many columns are ahead. For example, if I want to know for column_5 if the next 3 are also true, I can hardcode this, but I have columns up to id 20 and would love to not have to hardcode everything from column_1 to column_20 if I want to know if all conditions in all those columns are true.
Now the problem is that I don’t know if this is even possible. So any feedback would be appreaciated. Even just giving me a hint on where to look for a way to do this.
EDIT: What I forgot to mention was that there will be random columns in between that obviously cannot be taking into the equation. For example, there will be main_column_1, main_column_2, main_column_3, side_column_1, side_column_2, right_column_1, main_column_3, main_column_4 etc…
The answer Corralien gave is correct, but I should’ve made my question more clearer.
I need to be able to, say, look at main_column and for that one look ahead N amount of columns of the same type: main_column.
Advertisement
Answer
Try:
n = 3 out = (df.rolling(n, min_periods=1, axis=1).sum() .shift(-n+1, fill_value=0, axis=1).eq(n).astype(int) .rename(columns=lambda x: 'result_' + x.split('_')[1]))
Output:
>>> out result_1 result_2 result_3 result_4 result_5 result_6 result_7 result_8 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 4 0 0 0 1 0 0 0 0 5 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 8 0 1 1 1 0 0 0 0 9 0 0 0 0 0 1 0 0 10 0 0 0 0 0 0 0 0 11 0 0 0 0 1 0 0 0 12 0 0 0 0 0 0 0 0 13 0 0 0 1 1 0 0 0 14 0 0 0 0 0 1 0 0 15 0 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0 17 0 0 1 0 0 0 0 0 18 0 0 1 0 0 0 0 0 19 0 0 0 0 0 0 0 0
Input:
>>> df column_1 column_2 column_3 column_4 column_5 column_6 column_7 column_8 0 1 1 1 1 1 1 1 1 1 0 1 0 0 0 1 1 0 2 1 1 0 1 0 1 1 0 3 1 0 1 0 0 0 0 0 4 1 0 0 1 1 1 0 1 5 1 1 0 1 0 1 1 0 6 1 0 1 0 0 0 0 1 7 0 0 1 0 0 0 0 0 8 0 1 1 1 1 1 0 0 9 1 0 1 1 0 1 1 1 10 0 0 1 1 0 0 1 1 11 1 0 1 0 1 1 1 0 12 0 1 1 0 1 0 1 0 13 0 0 0 1 1 1 1 0 14 0 0 1 1 0 1 1 1 15 1 0 0 1 0 1 0 0 16 1 0 0 0 0 0 0 1 17 0 0 1 1 1 0 0 1 18 0 0 1 1 1 0 0 1 19 0 0 1 0 0 0 1 0