I am using an AirBnb dataset. I have a column, ‘host_since’. The column contains date objects in the format of ‘DD/MM/YYYY’: for example, 24/09/2008. The columns’ data shows the date that an individual became a host.
I want to create a new column in my dataframe that contains the the number of days since the host first joined. I am aware of the to_datetime
method but cant quite understand the documentation.
note: df[‘host_days’] does not exist in the df. I want to create a column then assign the appropriate steps to get my outcome
def add_host_days(df): df['host_days'] = df['host_since'].to_datetime() return df
Any ideas on how to do so? Thank you for your input.
Advertisement
Answer
You can try this:
def add_host_days(df): df['host_days'] = (pd.Timestamp.now() - pd.to_datetime(df['host_since'], dayfirst=True)).dt.days # If you original date fields have invalid dates and would like this number of days to be in integer: df['host_days'] = df['host_days'].astype('Int64') return df
Demo
Suppose you have a dataframe like this:
guest_id host_since 0 A0001 24/09/2008 1 A0002 25/09/2008 2 A0003 29/09/2008 3 A8788 20/05/2021
Then you run the code:
new_df = add_host_days(df)
Result:
print(new_df) guest_id host_since host_days 0 A0001 24/09/2008 4629 1 A0002 25/09/2008 4628 2 A0003 29/09/2008 4624 3 A8788 20/05/2021 8