Python – How to clean time series data

I have a df which looks like this:

df = pd.DataFrame({'Date':['2019-09-23','2019-09-24','2019-09-25','2019-09-26','2019-09-27','2019-09-28','2019-09-29'],
                  'Sep':[1,10,5,'NaN','NaN','NaN','NaN'],
                  'Dec':[2,8,4,7,9,1,5]})

I’m trying to create a new column called ‘First_Contract’:

  1. ‘First_Contract’ needs to take the third-last value of the ‘Sep’ column, counting back from the last value before ‘Sep’ becomes NaN.
  2. All subsequent rows need to be filled with the values from the ‘Dec’ column.

Desired output:

df2= pd.DataFrame({'Date':['2019-09-23','2019-09-24','2019-09-25','2019-09-26','2019-09-27','2019-09-28','2019-09-29'],
                  'Sep':[1,10,5,'NaN','NaN','NaN','NaN'],
                  'Dec':[2,8,4,7,9,1,5],
                  'First_Contract':[1,8,4,7,9,1,5]})

How do I achieve this?

Answer

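Let us do it step by step: convert the 'NaN' strings to real missing values, copy 'Dec' into a new column 'FC', then overwrite the row two positions above the last valid 'Sep' value with the 'Sep' value from that row. Subtracting 2 from the index label only works with the default integer index.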

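import pandas as pd

# Convert the 'NaN' strings to real missing values; assign back instead of using inplace on a column
df['Sep'] = pd.to_numeric(df['Sep'], errors='coerce')
df['FC'] = df['Dec']  # start from the 'Dec' values
# Row two positions above the last valid 'Sep' value (assumes the default integer index)
ids = df['Sep'].last_valid_index() - 2
df.loc[ids, 'FC'] = df.loc[ids, 'Sep']
df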
Out[126]: 
         Date   Sep  Dec  First_Contract   FC
0  2019-09-23   1.0    2               1  1.0
1  2019-09-24  10.0    8               8  8.0
2  2019-09-25   5.0    4               4  4.0
3  2019-09-26   NaN    7               7  7.0
4  2019-09-27   NaN    9               9  9.0
5  2019-09-28   NaN    1               1  1.0
6  2019-09-29   NaN    5               5  5.0
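
The same steps can also be wrapped in a function that works by position, so it does not depend on the default integer index. This is only a minimal sketch: the helper name add_first_contract and its parameters (near, far, offset, out) are made up for illustration and do not come from the question or from pandas. Like the steps above, it overwrites a single row, so any rows before that one (there are none in the sample data) would keep their 'Dec' values.

```python
import numpy as np
import pandas as pd


def add_first_contract(df, near="Sep", far="Dec", offset=2, out="First_Contract"):
    """Hypothetical helper: copy `far`, then overwrite the row `offset`
    positions above the last valid `near` value with the `near` value."""
    result = df.copy()
    # Coerce placeholder strings such as 'NaN' to real missing values.
    near_values = pd.to_numeric(result[near], errors="coerce")
    # Integer positions (not index labels) of the valid `near` rows.
    valid_positions = np.flatnonzero(near_values.notna().to_numpy())
    if len(valid_positions) == 0 or valid_positions[-1] < offset:
        raise ValueError("not enough valid values in the near column")
    pos = valid_positions[-1] - offset
    # Start from the `far` values, then patch the single target row.
    values = result[far].astype("float64").to_numpy(copy=True)
    values[pos] = near_values.iloc[pos]
    result[out] = values
    return result


# Sample data from the question, indexed by date to show that
# the default integer index is not required.
sample = pd.DataFrame({
    "Date": ["2019-09-23", "2019-09-24", "2019-09-25", "2019-09-26",
             "2019-09-27", "2019-09-28", "2019-09-29"],
    "Sep": [1, 10, 5, "NaN", "NaN", "NaN", "NaN"],
    "Dec": [2, 8, 4, 7, 9, 1, 5],
}).set_index("Date")

result = add_first_contract(sample)
print(result["First_Contract"].tolist())
# [1.0, 8.0, 4.0, 7.0, 9.0, 1.0, 5.0]
```

If 'Sep' should instead be kept for every row up to and including the target row, the single-row assignment would have to become a slice assignment. The question's sample cannot distinguish the two rules, so check that against your real data.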
User contributions licensed under: CC BY-SA