I have a table that looks like this.
| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | s | 
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 5 | 0 | 4 | 10 | 4 | 
| 4 | 1 | 8 | 0 | 15 | 0 | 4 | 10 | 10 | 
I need to know at which position or column the row-level cumulative sum for the first six columns (m1 to m6) either equals or exceeds the value at s.
For this table, in row 1 the cumulative sum is 0 + 1 + 0 + 0 + 5 = 6, which exceeds the value at s (4); this occurs at m5, the fifth column from the left, so the expected output is 5.
Similarly, in the second row it is 4 + 1 + 8 = 13, which exceeds the value at s (10); this occurs at m3, the third column from the left, so the expected output is 3.
The output also needs to be added as a column to the table, so that the final version looks like this:
| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | s | output | 
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 5 | 0 | 4 | 10 | 4 | 5 | 
| 4 | 1 | 8 | 0 | 15 | 0 | 4 | 10 | 10 | 3 | 
The table has around 7000 rows. I couldn’t figure out a way to use cumsum() conditionally, either with pandas or numpy.
Is there a way to accomplish this without a loop? (Loops are OK too.)
Answer
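You can take cumsum on axis=1, compare it with s, use idxmax to get the label of the first column where the comparison is True, and convert that label to a position with get_indexer on df.columns: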
df['output'] = df.columns.get_indexer(df.drop(columns="s").cumsum(axis=1)
                       .ge(df['s'],axis=0).idxmax(axis=1))+1
print(df)
   m1  m2  m3  m4  m5  m6  m7  m8   s  output
0   0   1   0   0   5   0   4  10   4       5
1   4   1   8   0  15   0   4  10  10       3
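To make the chain easier to follow, here is a minimal sketch that splits it into named steps and runs it on the two rows from the question. The intermediate names (`values`, `running`, `reached`, `first_col`) are only illustrative. Restricting the sum to m1 through m6 with `df.loc` is an assumption based on the question's wording; the one-liner above sums every column except s instead, which gives the same result on this sample.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "m1": [0, 4], "m2": [1, 1], "m3": [0, 8], "m4": [0, 0],
        "m5": [5, 15], "m6": [0, 0], "m7": [4, 4], "m8": [10, 10],
        "s": [4, 10],
    }
)

# Assumption: only m1..m6 take part in the running total.
values = df.loc[:, "m1":"m6"]

running = values.cumsum(axis=1)        # row-wise running totals
reached = running.ge(df["s"], axis=0)  # True once the total is >= s
first_col = reached.idxmax(axis=1)     # label of the first True in each row

# Convert the column label into a 1-based position within m1..m6.
df["output"] = values.columns.get_indexer(first_col) + 1
print(df)
```

Because m1 is also the first column of the full table, the position within `values` matches the position counted from the left of the table.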
EDIT:
There can be rows where no column satisfies the condition. In that case, mask the idxmax result with a check on whether any column matched. Expect a 0 for such rows: get_indexer returns -1 for a label it cannot find, and the +1 shifts that to 0.
c = df.drop(columns="s").cumsum(axis=1).ge(df['s'],axis=0)
df['output'] = df.columns.get_indexer(c.idxmax(axis=1).where(c.any(axis=1)))+1
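The no-match case can be checked with a small sketch. The first two rows below are from the question; the third row is made up for illustration so that its running total never reaches s. Since the question also mentions numpy, the block cross-checks the result with a NumPy variant based on `argmax`, which is one possible alternative rather than part of the original answer. In both variants a row without a match comes out as 0, because `get_indexer` returns -1 for a missing label before 1 is added.

```python
import numpy as np
import pandas as pd

# Rows 0 and 1 come from the question; row 2 is a made-up example whose
# running total (at most 3) never reaches s (50).
df = pd.DataFrame(
    [
        [0, 1, 0, 0, 5, 0, 4, 10, 4],
        [4, 1, 8, 0, 15, 0, 4, 10, 10],
        [1, 0, 0, 1, 0, 0, 0, 1, 50],
    ],
    columns=["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "s"],
)

# pandas: mask rows with no True before converting labels to positions.
c = df.drop(columns="s").cumsum(axis=1).ge(df["s"], axis=0)
pandas_out = df.columns.get_indexer(c.idxmax(axis=1).where(c.any(axis=1))) + 1

# NumPy: argmax finds the first True per row; rows without any True get 0.
totals = df.drop(columns="s").to_numpy().cumsum(axis=1)
mask = totals >= df["s"].to_numpy()[:, None]
numpy_out = np.where(mask.any(axis=1), mask.argmax(axis=1) + 1, 0)

df["output"] = pandas_out
print(df)
```

If a different marker is preferred for unmatched rows, the 0 can be replaced afterwards, for example with `df["output"].replace(0, -1)`.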
