I have a dataframe as follows:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 19780116 930 1 A0X 62 63 5 6.0 63.0 80 -26 0 33CATA 11 NaN 1 1 19780116 930 3 C21 57 58 105 107.0 61.0 75 18 1 33CATA 11 NaN 2 2 19780116 930 4 B01 -17 -16 21 12.0 28.0 1 -28 0 33CATA 11 NaN 3 3 19780117 750 3 C21 67 69 95 137.0 70.0 72 18 1 44CATA 11 NaN 4 4 19780117 750 6 A01 - 4 - 2.0 53.0 28 23 1 -27 0 44CATA
In the last row (and several others in the full dataframe) the negative signs are misplaced.
My question is if a negative sign is there at i-th
position then it should merge i+1 th
element and shift all other row elements to the left such that it the final output should be
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 19780116 930 1 A0X 62 63.0 5.0 6.0 63.0 80 -26 0 33CATA 11.0 NaN 1 1 19780116 930 3 C21 57 58.0 105.0 107.0 61.0 75 18 1 33CATA 11.0 NaN 2 2 19780116 930 4 B01 -17 -16.0 21.0 12.0 28.0 1 -28 0 33CATA 11.0 NaN 3 3 19780117 750 3 C21 67 69.0 95.0 137.0 70.0 72 18 1 44CATA 11.0 NaN 4 4 19780117 750 6 A01 -4 -2.0 53.0 28.0 23.0 1 -27 0 44CATA NaN NaN
Advertisement
Answer
Setup
>>> df 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 19780116 930 1 A0X 62 63 5 6.0 63.0 80 -26 0 33CATA 11 NaN 1 1 19780116 930 3 C21 57 58 105 107.0 61.0 75 18 1 33CATA 11 NaN 2 2 19780116 930 4 B01 -17 -16 21 12.0 28.0 1 -28 0 33CATA 11 NaN 3 3 19780117 750 3 C21 67 69 95 137.0 70.0 72 18 1 44CATA 11 NaN 4 4 19780117 750 6 A01 - 4 - 2.0 53.0 28 23 1 -27 0 44CATA
Solution
c = df.copy() m1 = c.eq('-') m2 = m1.shift(axis=1, fill_value=False) c.update(c.astype(str)[m2].radd('-').astype(float)) c = c.mask(m1) i = np.argsort(c.isna(), axis=1) c[:] = c.values[np.arange(len(c))[:, None], i]
Explanations
Create a copy of the dataframe named as c
and compare with the character -
to create a boolean mask m1
>>> m1 = c.eq('-') >>> m1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 False False False False False False False False False False False False False False False False 1 False False False False False False False False False False False False False False False False 2 False False False False False False False False False False False False False False False False 3 False False False False False False False False False False False False False False False False 4 False False False False False True False True False False False False False False False False
Shift the boolean mask m1
along axis=1
to create a boolean mask m2
>>> m2 = m1.shift(axis=1, fill_value=False) >>> m2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 False False False False False False False False False False False False False False False False 1 False False False False False False False False False False False False False False False False 2 False False False False False False False False False False False False False False False False 3 False False False False False False False False False False False False False False False False 4 False False False False False False True False True False False False False False False False
Update the values in the dataframe c
by adding the character -
to values indexed by boolean mask m2
>>> c.update(c.astype(str)[m2].radd('-').astype(float)) >>> c 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 19780116 930 1 A0X 62 63.0 5 6.0 63.0 80 -26 0 33CATA 11 NaN 1 1 19780116 930 3 C21 57 58.0 105 107.0 61.0 75 18 1 33CATA 11 NaN 2 2 19780116 930 4 B01 -17 -16.0 21 12.0 28.0 1 -28 0 33CATA 11 NaN 3 3 19780117 750 3 C21 67 69.0 95 137.0 70.0 72 18 1 44CATA 11 NaN 4 4 19780117 750 6 A01 - -4.0 - -2.0 53.0 28 23 1 -27 0 44CATA
Mask the values in dataframe c
corresponding to mask m1
>>> c = c.mask(m1) >>> c 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 19780116 930 1 A0X 62 63.0 5 6.0 63.0 80 -26 0 33CATA 11 NaN 1 1 19780116 930 3 C21 57 58.0 105 107.0 61.0 75 18 1 33CATA 11 NaN 2 2 19780116 930 4 B01 -17 -16.0 21 12.0 28.0 1 -28 0 33CATA 11 NaN 3 3 19780117 750 3 C21 67 69.0 95 137.0 70.0 72 18 1 44CATA 11 NaN 4 4 19780117 750 6 A01 NaN -4.0 NaN -2.0 53.0 28 23 1 -27 0 44CATA
Create a boolean mask with isna
and apply argsort
along axis=1
to get the indices that would sort the dataframe c
while also maintaining the relative order
>>> i = np.argsort(c.isna(), axis=1) >>> i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4 0 1 2 3 4 6 8 9 10 11 12 13 14 15 5 7
Sort
the values in the dataframe c
along axis=1
using the above indices obtained using argsort
and assign the sorted values back to the dataframe c
>>> c[:] = c.values[np.arange(len(c))[:, None], i] >>> c 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 0 19780116 930 1 A0X 62 63.0 5 6.0 63.0 80 -26 0 33CATA 11.0 NaN 1 1 19780116 930 3 C21 57 58.0 105 107.0 61.0 75 18 1 33CATA 11.0 NaN 2 2 19780116 930 4 B01 -17 -16.0 21 12.0 28.0 1 -28 0 33CATA 11.0 NaN 3 3 19780117 750 3 C21 67 69.0 95 137.0 70.0 72 18 1 44CATA 11.0 NaN 4 4 19780117 750 6 A01 -4.0 -2.0 53.0 28.0 23.0 1 -27 0 44CATA NaN NaN