I was trying to find difference of a series of dates and a date. for example, the series is from may1 to june1 which is
date = pd.DataFrame()
In [0]: date['test'] = pd.date_range("2021-05-01", "2021-06-01", freq = "D")
Out[0]: date
test
0 2021-05-01 00:00:00
1 2021-05-02 00:00:00
2 2021-05-03 00:00:00
3 2021-05-04 00:00:00
4 2021-05-05 00:00:00
5 2021-05-06 00:00:00
6 2021-05-07 00:00:00
7 2021-05-08 00:00:00
8 2021-05-09 00:00:00
9 2021-05-10 00:00:00
In[1]
date['test'] = date['test'].dt.date
Out[1]:
test
0 2021-05-01
1 2021-05-02
2 2021-05-03
3 2021-05-04
4 2021-05-05
5 2021-05-06
6 2021-05-07
7 2021-05-08
8 2021-05-09
9 2021-05-10
In[2]:date['base'] = dt.strptime("2021-05-01",'%Y-%m-%d')
Out[2]:
0 2021-05-01 00:00:00
1 2021-05-01 00:00:00
2 2021-05-01 00:00:00
3 2021-05-01 00:00:00
4 2021-05-01 00:00:00
5 2021-05-01 00:00:00
6 2021-05-01 00:00:00
7 2021-05-01 00:00:00
8 2021-05-01 00:00:00
9 2021-05-01 00:00:00
In[3]:date['base'] = date['base'].dt.date
Out[3]:
base
0 2021-05-01
1 2021-05-01
2 2021-05-01
3 2021-05-01
4 2021-05-01
5 2021-05-01
6 2021-05-01
7 2021-05-01
8 2021-05-01
9 2021-05-01
In[4]:date['test']-date['base']
Out[4]:
diff
0 0 days 00:00:00.000000000
1 1 days 00:00:00.000000000
2 2 days 00:00:00.000000000
3 3 days 00:00:00.000000000
4 4 days 00:00:00.000000000
5 5 days 00:00:00.000000000
6 6 days 00:00:00.000000000
7 7 days 00:00:00.000000000
8 8 days 00:00:00.000000000
9 9 days 00:00:00.000000000
10 10 days 00:00:00.000000000
the only thing i could get is this. I don’t want anything other than the number 1-10 cuz i need them for further numerical calculation but i can’t get rid of those. Also how could i construct a time series which just outputs the date not the hms after it? i don’t want to manually .dt.date for all of those and it sometimes mess things up
Advertisement
Answer
You don’t need to create a column base for this, simply do:
>>> (date['test'] - pd.to_datetime("2021-05-01", format='%Y-%m-%d')).dt.days
0 0
1 1
2 2
3 3
4 4
...
27 27
28 28
29 29
30 30
31 31
Name: test, dtype: int64