I’ve read an SQL query into Pandas and the values are coming in as dtype ‘object’, although they are strings, dates and integers. I am able to convert the date ‘object’ to a Pandas datetime dtype, but I’m getting an error when trying to convert the string and integers.
Here is an example:
>>> import pandas as pd >>> df = pd.read_sql_query('select * from my_table', conn) >>> df id date purchase 1 abc1 2016-05-22 1 2 abc2 2016-05-29 0 3 abc3 2016-05-22 2 4 abc4 2016-05-22 0 >>> df.dtypes id object date object purchase object dtype: object
Converting the df['date']
to a datetime works:
>>> pd.to_datetime(df['date']) 1 2016-05-22 2 2016-05-29 3 2016-05-22 4 2016-05-22 Name: date, dtype: datetime64[ns]
But I get an error when trying to convert the df['purchase']
to an integer:
>>> df['purchase'].astype(int) .... pandas/lib.pyx in pandas.lib.astype_intsafe (pandas/lib.c:16667)() pandas/src/util.pxd in util.set_value_at (pandas/lib.c:67540)() TypeError: long() argument must be a string or a number, not 'java.lang.Long'
NOTE: I get a similar error when I tried .astype('float')
And when trying to convert to a string, nothing seems to happen.
>>> df['id'].apply(str) 1 abc1 2 abc2 3 abc3 4 abc4 Name: id, dtype: object
Advertisement
Answer
Documenting the answer that worked for me based on the comment by @piRSquared.
I needed to convert to a string first, then an integer.
>>> df['purchase'].astype(str).astype(int)