Skip to content
Advertisement

Pandas: using column of date to calculate number of days

I am using an AirBnb dataset. I have a column, ‘host_since’. The column contains date objects in the format of ‘DD/MM/YYYY’: for example, 24/09/2008. The columns’ data shows the date that an individual became a host.

I want to create a new column in my dataframe that contains the the number of days since the host first joined. I am aware of the to_datetime method but cant quite understand the documentation.

note: df[‘host_days’] does not exist in the df. I want to create a column then assign the appropriate steps to get my outcome

def add_host_days(df):    
    df['host_days'] = df['host_since'].to_datetime()
  return df

Any ideas on how to do so? Thank you for your input.

Advertisement

Answer

You can try this:

def add_host_days(df):    
    df['host_days'] = (pd.Timestamp.now() - pd.to_datetime(df['host_since'], dayfirst=True)).dt.days

    # If you original date fields have invalid dates and would like this number of days to be in integer:
    df['host_days'] = df['host_days'].astype('Int64')   

    return df

Demo

Suppose you have a dataframe like this:

  guest_id  host_since
0    A0001  24/09/2008
1    A0002  25/09/2008
2    A0003  29/09/2008
3    A8788  20/05/2021

Then you run the code:

new_df = add_host_days(df)

Result:

print(new_df)


  guest_id  host_since  host_days
0    A0001  24/09/2008       4629
1    A0002  25/09/2008       4628
2    A0003  29/09/2008       4624
3    A8788  20/05/2021          8
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement