I have a table that looks like this.
m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | s |
---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 5 | 0 | 4 | 10 | 4 |
4 | 1 | 8 | 0 | 15 | 0 | 4 | 10 | 10 |
I need to know at which position or column the row-level cumulative sum for the first six columns (m1 to m6) either equals or exceeds the value at s.
For this table, at row 1, it is 0 + 1 + 0 + 0 + 5 = 6, which is greater than the value at s (4); this occurs at m5, or the fifth column from the left, so the expected output is 5.
Similarly, for the second row, it is 4 + 1 + 8 = 13, which is greater than the value at s (10); this occurs at m3, or the third column from the left, so the expected output is 3.
The output also needs to be added as a column to the table, so that the final version looks like this:
m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | s | output |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 5 | 0 | 4 | 10 | 4 | 5 |
4 | 1 | 8 | 0 | 15 | 0 | 4 | 10 | 10 | 3 |
The table has around 7000 rows. I couldn't figure out a way to use cumsum() conditionally, either with pandas or numpy.
Is there a way to accomplish this without a loop? (Though loops are OK too.)
Answer
You can use cumsum on axis=1 with get_indexer on df.columns:
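    # drop(columns="s") replaces the positional form drop("s", 1), which newer pandas versions reject
    df['output'] = df.columns.get_indexer(
        df.drop(columns="s").cumsum(axis=1).ge(df['s'], axis=0).idxmax(axis=1)
    ) + 1
    print(df)

       m1  m2  m3  m4  m5  m6  m7  m8   s  output
    0   0   1   0   0   5   0   4  10   4       5
    1   4   1   8   0  15   0   4  10  10       3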
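To see why the one-liner works, it can help to split it into its intermediate steps. The sketch below rebuilds the question's two-row table by hand; the variable names (running, reached, first_label, positions) are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "m1": [0, 4], "m2": [1, 1], "m3": [0, 8], "m4": [0, 0],
    "m5": [5, 15], "m6": [0, 0], "m7": [4, 4], "m8": [10, 10],
    "s": [4, 10],
})

# Running total across each row, over every column except s
running = df.drop(columns="s").cumsum(axis=1)

# True wherever the running total has reached that row's s
reached = running.ge(df["s"], axis=0)

# Label of the first True in each row (idxmax returns the first maximum)
first_label = reached.idxmax(axis=1)

# Turn the column labels into 1-based positions from the left
positions = df.columns.get_indexer(first_label) + 1

print(first_label.tolist())
print(positions.tolist())
```

Note that idxmax also returns the first column label for a row that is all False, which is why a row whose running total never reaches s needs separate handling.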
EDIT:
There can be situations where none of the columns in a row satisfies this condition. In that case, you can mask those rows before looking up the position: get_indexer returns -1 for a masked row, so after adding 1 the output for that row is 0.

    c = df.drop(columns="s").cumsum(axis=1).ge(df['s'], axis=0)
    df['output'] = df.columns.get_indexer(c.idxmax(axis=1).where(c.any(axis=1))) + 1
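The question asks about the first six columns only (m1 to m6), whereas the snippets above accumulate over every column except s. A possible NumPy-based variant that restricts the sum to a chosen set of columns is sketched below. The helper name first_reach_position, its parameters, and the third sample row are assumptions made for illustration, and returning 0 for rows that never reach s is a choice of this sketch rather than something the question specifies.

```python
import numpy as np
import pandas as pd


def first_reach_position(df, cols, target="s"):
    """Return the 1-based position within cols where the row-wise running
    total first equals or exceeds df[target]; 0 if it never does."""
    running = df[cols].to_numpy().cumsum(axis=1)
    # Compare each row's running totals against that row's target value
    reached = running >= df[target].to_numpy()[:, None]
    # argmax gives the index of the first True in each row
    pos = reached.argmax(axis=1) + 1
    # Rows with no True at all get 0 instead of a misleading 1
    return np.where(reached.any(axis=1), pos, 0)


# First two rows are from the question; the third is a made-up row
# whose running total never reaches s
df = pd.DataFrame({
    "m1": [0, 4, 1], "m2": [1, 1, 1], "m3": [0, 8, 1], "m4": [0, 0, 1],
    "m5": [5, 15, 1], "m6": [0, 0, 1], "m7": [4, 4, 1], "m8": [10, 10, 1],
    "s": [4, 10, 100],
})

cols = ["m1", "m2", "m3", "m4", "m5", "m6"]
df["output"] = first_reach_position(df, cols)
print(df)
```

The returned position counts within cols, so it matches the position from the left of the table only because m1 to m6 are the leftmost columns here.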