
Find ‘next’ non-null value in column

I am trying to create a new column that holds, for each row with a non-null value, the row index of the next non-null value. I am using the following df as a starting point:

from datetime import datetime

import numpy as np
import pandas as pd

ts = [
        datetime.strptime('2016-06-19 22:01:22.229', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-19 23:32:08.109', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 02:50:22.181', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 06:12:44.249', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-20 19:27:22.129', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-21 11:39:08.119', '%Y-%m-%d %H:%M:%S.%f'),
        datetime.strptime('2016-06-21 16:59:22.610', '%Y-%m-%d %H:%M:%S.%f')
        ]

score = [ np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145 ]


df = pd.DataFrame(data={'date': ts, 'jack': score})

The output would look like this:

                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692             4
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709             6
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN

I have found a link that touches on the issue, but I could not adapt it to suit: How can I get the index of next non-NaN number with series in pandas?

Thank you in advance for any help you can provide.


Answer

We can find the notnull values in jack, take their index labels and shift them up by one position, then use loc to assign the results back, with values to break index alignment:

m = df['jack'].notnull()
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values

df:

                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692           4.0
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709           6.0
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN
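
The new column comes out as float (4.0, 6.0) because NaN forces a float dtype. If the integer display from the question is wanted, one option is pandas' nullable Int64 dtype. A minimal self-contained sketch, assuming a reasonably recent pandas; the date column is left out here purely for brevity:

```python
import numpy as np
import pandas as pd

# Same scores as in the question; the date column is omitted for brevity.
df = pd.DataFrame({'jack': [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})

m = df['jack'].notnull()
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values

# Optional step (not part of the answer above): convert float to the
# nullable integer dtype so 4.0 displays as 4 and NaN as <NA>.
df['nxt_nn_value'] = df['nxt_nn_value'].astype('Int64')

print(df)
```

The cast only works because every non-missing value is a whole number; it would raise if the column held fractional values.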

Explanation:

First find where the values in jack are notnull:

m = df['jack'].notnull()

0    False
1    False
2     True
3    False
4     True
5    False
6     True
Name: jack, dtype: bool

Filter to get the index (there are many equivalent ways to do this):

m[m].index

Int64Index([2, 4, 6], dtype='int64')
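
As a rough illustration of a few of those equivalent ways, the sketch below builds the same index three times. It rebuilds a cut-down frame with only the jack column so that it runs on its own; the variable names idx_a, idx_b, idx_c and pos are just placeholders for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'jack': [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})
m = df['jack'].notnull()

idx_a = m[m].index                 # filter the mask by itself (as in the answer)
idx_b = df.index[m.to_numpy()]     # boolean-index the frame's own index
idx_c = df['jack'].dropna().index  # drop the nulls and keep the remaining labels

# Integer positions rather than labels; these coincide with the labels
# only because the frame uses the default 0..n-1 index.
pos = np.flatnonzero(m.to_numpy())

print(list(idx_a), list(idx_b), list(idx_c), pos.tolist())
```

The printed repr of the index depends on the pandas version: Int64Index was removed in pandas 2.0, where the same result is shown as a plain Index with dtype int64.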

Then convert to a Series so it can be shifted; this gives the “next” index:

pd.Series(m[m].index).shift(-1)

0    4.0
1    6.0
2    NaN
dtype: float64

Then assign back to the DataFrame only where the values are notnull. values is needed to break index alignment between the Series and the DataFrame:

df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values
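
To make the alignment point concrete, here is a small sketch of what is expected to happen when values is left off, followed by one possible alternative that relies on alignment instead of bypassing it. The names bad and idx are illustrative only, and the frame is reduced to the jack column so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'jack': [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})
m = df['jack'].notnull()
idx = m[m].index  # labels 2, 4, 6

# Without .values: the shifted Series is labelled 0, 1, 2. pandas aligns on
# labels, so rows 2, 4, 6 look up labels 2, 4, 6 in that Series. Label 2
# holds NaN and labels 4 and 6 do not exist, so nothing useful is assigned.
bad = df.copy()
bad.loc[m, 'nxt_nn_value'] = pd.Series(idx).shift(-1)

# Alternative: label the Series with the same labels it contains, so that
# ordinary alignment puts each "next" label on the right row.
df['nxt_nn_value'] = pd.Series(idx, index=idx).shift(-1)

print(bad)
print(df)
```

The second form should also carry over to a non-default index, as long as the index labels are unique, since it never depends on positions.
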
User contributions licensed under: CC BY-SA