I am trying to create a new column that, for each row with a non-null value, holds the row index of the next non-null value. I am using the following df as a starting point:
from datetime import datetime

import numpy as np
import pandas as pd

ts = [
    datetime.strptime('2016-06-19 22:01:22.229', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-19 23:32:08.109', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-20 02:50:22.181', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-20 06:12:44.249', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-20 19:27:22.129', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-21 11:39:08.119', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-21 16:59:22.610', '%Y-%m-%d %H:%M:%S.%f'),
]
score = [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]
df = pd.DataFrame(data={'date': ts, 'jack': score})
The output would look like this:
                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692             4
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709             6
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN
I have found a link that touches on the issue, but I could not adapt it to suit my case: How can I get the index of next non-NaN number with series in pandas?
Thank you in advance for any help you can provide.
Answer
We can find the notnull values in jack, then shift their index labels up by one position. Finally, use loc to assign the results, with values to break index alignment:
m = df['jack'].notnull()
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values
df:
                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692           4.0
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709           6.0
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN
Explanation:
First find where the values in jack are notnull:
m = df['jack'].notnull()

0    False
1    False
2     True
3    False
4     True
5    False
6     True
Name: jack, dtype: bool
Filter to get the index labels of those rows (there are many equivalent ways to do this):
m[m].index

Int64Index([2, 4, 6], dtype='int64')
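As a rough illustration of the "equivalent ways" mentioned above, the sketch below builds the same jack values as a standalone Series (the date column is left out for brevity, which is an assumption of this example and not part of the original answer) and retrieves the non-null index labels in three ways. The variable names via_mask, via_index and via_dropna are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Same scores as in the question, held in a standalone Series (illustrative only).
jack = pd.Series([np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145], name="jack")
m = jack.notnull()

via_mask = m[m].index              # filter the boolean mask by itself, as in the answer
via_index = jack.index[m]          # boolean-index the index directly
via_dropna = jack.dropna().index   # drop the NaN rows and keep the remaining labels

print(list(via_mask), list(via_index), list(via_dropna))
```

All three should yield the labels 2, 4 and 6 for this data; which one reads best is largely a matter of taste.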
Then convert to a Series so it can be shifted; this gives the “next” index:
pd.Series(m[m].index).shift(-1)

0    4.0
1    6.0
2    NaN
dtype: float64
Then assign back to the DataFrame only where the values are notnull. values is needed to break index alignment between the Series and the DataFrame:
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values
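For convenience, the steps above could be wrapped in a small helper. This is only a sketch: the function name add_next_non_null_index and its parameters are hypothetical, and the example frame omits the date column for brevity. The final cast to the nullable "Int64" dtype is an optional extra that is not part of the original answer; it assumes a pandas version that supports nullable integers and is shown only because the question's expected output has 4 and 6 where the answer produces 4.0 and 6.0.

```python
import numpy as np
import pandas as pd


def add_next_non_null_index(df, col, out_col="nxt_nn_value"):
    """Return a copy of df where out_col holds, for each non-null row of col,
    the index label of the next non-null row (NaN elsewhere). Hypothetical helper."""
    out = df.copy()
    m = out[col].notnull()
    # Labels of the non-null rows, shifted up by one so each row sees the "next" label.
    next_labels = pd.Series(m[m].index).shift(-1)
    # .values drops the Series' own 0..n-1 index so assignment is positional, not aligned.
    out.loc[m, out_col] = next_labels.values
    return out


df = pd.DataFrame({"jack": [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})
result = add_next_non_null_index(df, "jack")

# Optional (assumption: nullable integer dtype is available): show 4 and 6 instead of 4.0 and 6.0.
as_int = result["nxt_nn_value"].astype("Int64")
print(result)
print(as_int)
```

The helper works on a copy so the caller's DataFrame is left untouched; drop the copy if in-place modification is preferred.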