Below is my DataFrame:
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({
    'Year': [Timestamp('2023-03-14 00:00:00'), Timestamp('2063-03-15 00:00:00'),
             Timestamp('2043-03-21 00:00:00'), Timestamp('2053-10-09 00:00:00')],
    'offset': [1, 9, 8, 1],
})
When I convert my 'Year' column with to_list(), the values are kept as Timestamp objects:
>>> df['Year'].to_list()
[Timestamp('2023-03-14 00:00:00'), Timestamp('2063-03-15 00:00:00'), Timestamp('2043-03-21 00:00:00'), Timestamp('2053-10-09 00:00:00')]
However, when I use .values, they come back as datetime64:
>>> df['Year'].values
array(['2023-03-14T00:00:00.000000000', '2063-03-15T00:00:00.000000000',
       '2043-03-21T00:00:00.000000000', '2053-10-09T00:00:00.000000000'],
      dtype='datetime64[ns]')
How do I get my array as Timestamp objects (instead of datetime64)?
Answer
It's converted to datetime64 because NumPy arrays can only hold certain data types, and Timestamp objects are not one of them. This has to do with how a NumPy array is stored as one contiguous block of memory and handled by NumPy's C backend.
Starting with NumPy 1.7, the core data types datetime64 and timedelta64 were added to support date and time functionality, but they still store their data in memory as integers [citation needed].
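To see the integer storage for yourself, here is a minimal sketch. It builds a small two-element Series (illustrative data, not the full DataFrame from the question) and assumes the values are held at nanosecond resolution, which the explicit astype call enforces. Reinterpreting the same bytes as int64 exposes the underlying counts of nanoseconds since the Unix epoch.

```python
import numpy as np
import pandas as pd

# Illustrative data; the cast pins the resolution to nanoseconds.
years = pd.Series(pd.to_datetime(["2023-03-14", "2063-03-15"])).astype("datetime64[ns]")

raw = years.values            # NumPy array with dtype datetime64[ns]
as_ints = raw.view("int64")   # same bytes, reinterpreted as 64-bit integers

print(raw.dtype, raw.itemsize)  # each element occupies 8 bytes
print(as_ints)                  # nanoseconds since 1970-01-01
```

No data is copied by view; it only changes how the existing 8-byte elements are interpreted.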
You can create a NumPy array of Timestamp objects with np.array(df.Year.to_list()), but that will result in an array with dtype=object:
array([Timestamp('2023-03-14 00:00:00'), Timestamp('2063-03-15 00:00:00'), Timestamp('2043-03-21 00:00:00'), Timestamp('2053-10-09 00:00:00')], dtype=object)
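As an alternative to going through a list (not part of the original answer), Series.to_numpy accepts a dtype argument, and passing dtype=object should give the same kind of array. The sketch below uses a shortened version of the question's DataFrame and compares both routes.

```python
import numpy as np
import pandas as pd

# Shortened version of the DataFrame from the question.
df = pd.DataFrame({
    "Year": pd.to_datetime(["2023-03-14", "2063-03-15"]),
    "offset": [1, 9],
})

from_list = np.array(df["Year"].to_list())   # route used in the answer
direct = df["Year"].to_numpy(dtype=object)   # alternative without the intermediate list

print(from_list.dtype, direct.dtype)
print(type(direct[0]))
```

Either way the result is an object array, so the trade-offs of dtype=object apply equally to both.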
For more information on what this entails, see this answer
Creating an array with dtype=object is different. The memory taken by the array is now filled with pointers to Python objects that are stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
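A small sketch of what "pointers to Python objects" means in practice. The variable names are illustrative, and the comparison of the element size with the platform pointer size assumes a mainstream platform where np.intp is pointer-sized.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2023-03-14")

# Object array: both slots refer to the very same Python object.
obj_arr = np.array([ts, ts], dtype=object)
same_object = obj_arr[0] is ts and obj_arr[1] is ts

# Each slot is only as large as a pointer, not as large as a Timestamp.
pointer_size = np.dtype(np.intp).itemsize

# Native array: elements are stored inline and come back as NumPy scalars.
dt_arr = np.array(["2023-03-14"], dtype="datetime64[ns]")

print(same_object, obj_arr.itemsize, pointer_size)
print(type(obj_arr[0]), type(dt_arr[0]))
```

Because the object array only holds references, NumPy has to call back into Python for each element, which is why operations on it are generally slower than on a datetime64 array.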