I was trying to find difference of a series of dates and a date. for example, the series is from may1 to june1 which is
date = pd.DataFrame() In [0]: date['test'] = pd.date_range("2021-05-01", "2021-06-01", freq = "D") Out[0]: date test 0 2021-05-01 00:00:00 1 2021-05-02 00:00:00 2 2021-05-03 00:00:00 3 2021-05-04 00:00:00 4 2021-05-05 00:00:00 5 2021-05-06 00:00:00 6 2021-05-07 00:00:00 7 2021-05-08 00:00:00 8 2021-05-09 00:00:00 9 2021-05-10 00:00:00 In[1] date['test'] = date['test'].dt.date Out[1]: test 0 2021-05-01 1 2021-05-02 2 2021-05-03 3 2021-05-04 4 2021-05-05 5 2021-05-06 6 2021-05-07 7 2021-05-08 8 2021-05-09 9 2021-05-10 In[2]:date['base'] = dt.strptime("2021-05-01",'%Y-%m-%d') Out[2]: 0 2021-05-01 00:00:00 1 2021-05-01 00:00:00 2 2021-05-01 00:00:00 3 2021-05-01 00:00:00 4 2021-05-01 00:00:00 5 2021-05-01 00:00:00 6 2021-05-01 00:00:00 7 2021-05-01 00:00:00 8 2021-05-01 00:00:00 9 2021-05-01 00:00:00 In[3]:date['base'] = date['base'].dt.date Out[3]: base 0 2021-05-01 1 2021-05-01 2 2021-05-01 3 2021-05-01 4 2021-05-01 5 2021-05-01 6 2021-05-01 7 2021-05-01 8 2021-05-01 9 2021-05-01 In[4]:date['test']-date['base'] Out[4]: diff 0 0 days 00:00:00.000000000 1 1 days 00:00:00.000000000 2 2 days 00:00:00.000000000 3 3 days 00:00:00.000000000 4 4 days 00:00:00.000000000 5 5 days 00:00:00.000000000 6 6 days 00:00:00.000000000 7 7 days 00:00:00.000000000 8 8 days 00:00:00.000000000 9 9 days 00:00:00.000000000 10 10 days 00:00:00.000000000
the only thing i could get is this. I don’t want anything other than the number 1-10 cuz i need them for further numerical calculation but i can’t get rid of those. Also how could i construct a time series which just outputs the date not the hms after it? i don’t want to manually .dt.date for all of those and it sometimes mess things up
Advertisement
Answer
You don’t need to create a column base
for this, simply do:
>>> (date['test'] - pd.to_datetime("2021-05-01", format='%Y-%m-%d')).dt.days 0 0 1 1 2 2 3 3 4 4 ... 27 27 28 28 29 29 30 30 31 31 Name: test, dtype: int64