I am trying to create a new column that holds, for each non-null value, the row index of the next non-null value, using the following df as a starting point:
from datetime import datetime

import numpy as np
import pandas as pd

ts = [
    datetime.strptime('2016-06-19 22:01:22.229', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-19 23:32:08.109', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-20 02:50:22.181', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-20 06:12:44.249', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-20 19:27:22.129', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-21 11:39:08.119', '%Y-%m-%d %H:%M:%S.%f'),
    datetime.strptime('2016-06-21 16:59:22.610', '%Y-%m-%d %H:%M:%S.%f'),
]
score = [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]
df = pd.DataFrame(data={'date': ts, 'jack': score})
The output would look like this:
                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692             4
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709             6
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN
I found a related question that touches on the issue, but I could not adapt it to my case: How can I get the index of next non-NaN number with series in pandas?
Thank you in advance for any help you can provide.
Answer
We can find the notnull values in jack, then shift their index labels up by one. Then use loc to assign the results, and .values to break index alignment:
m = df['jack'].notnull()
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values
df:
                     date    jack  nxt_nn_value
0 2016-06-19 22:01:22.229     NaN           NaN
1 2016-06-19 23:32:08.109     NaN           NaN
2 2016-06-20 02:50:22.181  77.692           4.0
3 2016-06-20 06:12:44.249     NaN           NaN
4 2016-06-20 19:27:22.129  77.709           6.0
5 2016-06-21 11:39:08.119     NaN           NaN
6 2016-06-21 16:59:22.610  76.145           NaN
Explanation:
First, find where the values in jack are notnull:
m = df['jack'].notnull()

0    False
1    False
2     True
3    False
4     True
5    False
6     True
Name: jack, dtype: bool
Filter the mask by itself to get the index labels where it is True (there are many equivalent ways to do this):
m[m].index

Int64Index([2, 4, 6], dtype='int64')
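As a rough illustration of the "many equivalent ways" mentioned above, the sketch below builds the same set of labels three different ways. The variable names (jack, via_mask, via_index, via_dropna) are made up for this example and are not part of the original answer. Note also that recent pandas versions (2.0 and later) display the result as Index([2, 4, 6], dtype='int64') rather than Int64Index.

```python
import numpy as np
import pandas as pd

# Same 'jack' values as in the question, on a default 0..6 index.
jack = pd.Series([np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145], name='jack')
m = jack.notnull()

via_mask = m[m].index                 # filter the mask by itself, as in the answer
via_index = jack.index[m.to_numpy()]  # boolean-index the index directly
via_dropna = jack.dropna().index      # drop the nulls and keep the remaining labels

print(list(via_mask), list(via_index), list(via_dropna))
```

All three should yield the labels of the non-null rows, so which one to use is mostly a matter of taste.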
Then convert to a Series so it can be shifted. Shifting by -1 pairs each label with the “next” one:
pd.Series(m[m].index).shift(-1)

0    4.0
1    6.0
2    NaN
dtype: float64
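To make the shift step more concrete, here is a small NumPy-only sketch of what shift(-1) does to the labels: drop the first element and pad the end with NaN. It is an illustration of the idea, not what pandas does internally, and the names idx and nxt are chosen for this example only. It also shows why the result is float: NaN has no integer representation, so the array is upcast to float64.

```python
import numpy as np

idx = np.array([2, 4, 6])          # labels of the non-null rows

# "Shift up by one": each label is paired with its successor,
# and the last label has no successor, so it gets NaN.
nxt = np.append(idx[1:], np.nan)

print(nxt, nxt.dtype)
```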
Then assign back to the DataFrame only where the values are notnull. Here .values is needed to break index alignment between the Series (indexed 0, 1, 2) and the DataFrame (rows 2, 4, 6):
df.loc[m, 'nxt_nn_value'] = pd.Series(m[m].index).shift(-1).values
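A possible variant, sketched here under the assumption that a nullable integer column is acceptable: instead of breaking alignment with .values, give the shifted Series the same labels as the non-null rows and let pandas align it on assignment. Casting to the nullable 'Int64' dtype keeps the results as 4 and 6 rather than 4.0 and 6.0, which matches the output requested in the question. The names idx and nxt are introduced for this example only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'jack': [np.nan, np.nan, 77.692, np.nan, 77.709, np.nan, 76.145]})

# Labels of the non-null rows.
idx = df.index[df['jack'].notnull().to_numpy()]

# Index the Series by those same labels, so each row carries its own "next" label.
# 'Int64' (capital I) is the nullable integer dtype; missing entries become <NA>.
nxt = pd.Series(idx, index=idx).shift(-1).astype('Int64')

# Plain column assignment aligns on the index; rows absent from nxt are left missing.
df['nxt_nn_value'] = nxt
print(df)
```

The trade-off is that missing entries are shown as <NA> instead of NaN, which may or may not suit downstream code.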