Trying to keep the same type after saving a dataframe in a csv file

Question

When I read my dataframe back from the CSV file, the data types have changed. Is there a way I can avoid this?

Accepted Answer

CSV files do not have a datatype definition header or anything similar. So when you read a CSV, pandas tries to guess the types, and this can change the datatypes.

You have two possible solutions:

- Provide the datatypes when you call read_csv, using the dtype and parse_dates keywords (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
- Use a different file format that stores data with a schema (e.g. Parquet)

For example:

```python
import pandas as pd

date = pd.to_datetime('01-01-2020')
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': ['a', 'b', 'b', 'd'],
                   'col3': [date, date, date, date]})
print('original \n', df.dtypes)

# Round trip through CSV: the datetime column comes back as object
df.to_csv('testtype.csv', index=False)
df_csv = pd.read_csv('testtype.csv')
print('simple csv read \n', df_csv.dtypes)

# Another plain read with no extra keywords, so the result is unchanged
df_csv = pd.read_csv('testtype.csv')
print('csv datatypes \n', df_csv.dtypes)

# parse_dates=[2] parses the third column as datetimes
df_csv = pd.read_csv('testtype.csv', parse_dates=[2])
print('csv with parse dates \n', df_csv.dtypes)

# Parquet stores the schema (needs pyarrow or fastparquet installed)
df.to_parquet('testtype.pqt')
df_pqt = pd.read_parquet('testtype.pqt')
print('parquet  \n', df_pqt.dtypes)
```

which outputs:

```
original
 col1             int64
col2            object
col3    datetime64[ns]
dtype: object
simple csv read
 col1     int64
col2    object
col3    object
dtype: object
csv datatypes
 col1     int64
col2    object
col3    object
dtype: object
csv with parse dates
 col1             int64
col2            object
col3    datetime64[ns]
dtype: object
parquet
 col1             int64
col2            object
col3    datetime64[ns]
dtype: object
```
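The first option, passing both dtype and parse_dates to read_csv, can be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper name `read_csv_like` is hypothetical, and it assumes you still have a dataframe with the wanted types available to use as a template. An in-memory buffer stands in for the CSV file.

```python
import io

import pandas as pd


def read_csv_like(path_or_buf, template):
    """Hypothetical helper: read a CSV using the column types of `template`."""
    # Datetime columns must go through parse_dates, not dtype.
    date_cols = [
        col for col in template.columns
        if pd.api.types.is_datetime64_any_dtype(template[col])
    ]
    # Every other column keeps the dtype it has in the template.
    dtypes = {
        col: template[col].dtype
        for col in template.columns
        if col not in date_cols
    }
    return pd.read_csv(path_or_buf, dtype=dtypes, parse_dates=date_cols)


# Same toy dataframe as in the answer.
date = pd.to_datetime("2020-01-01")
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": ["a", "b", "b", "d"],
    "col3": [date, date, date, date],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

restored = read_csv_like(buffer, df)
print(restored.dtypes)
```

If the original dataframe is not available when reading, the same idea works with a hand-written dtype dictionary and a list of date column names.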
