I have a pandas DataFrame named df. With df.dtypes
I can print on screen:
arrival_time object
departure_time object
drop_off_type int64
extra object
pickup_type int64
stop_headsign object
stop_id object
stop_sequence int64
trip_id object
dtype: object
I want to save this information so that I can compare it with other data, type-cast things elsewhere, etc. I want to save it to a local file and recover it in another program, where the data itself can't go. But I can't figure out how. Here are the results of the various conversions I tried:
df.dtypes.to_dict()
{'arrival_time': dtype('O'),
'departure_time': dtype('O'),
'drop_off_type': dtype('int64'),
'extra': dtype('O'),
'pickup_type': dtype('int64'),
'stop_headsign': dtype('O'),
'stop_id': dtype('O'),
'stop_sequence': dtype('int64'),
'trip_id': dtype('O')}
----
df.dtypes.to_json()
'{"arrival_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"departure_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"drop_off_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"extra":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"pickup_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"stop_headsign":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_sequence":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"trip_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"}}'
----
json.dumps( df.dtypes.to_dict() )
TypeError: dtype('O') is not JSON serializable
----
list(df.dtypes)
[dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('int64'),
dtype('O'),
dtype('O'),
dtype('int64'),
dtype('O')]
How to save and export/archive dtype information of a pandas DataFrame?
Answer
pd.DataFrame.dtypes returns a pd.Series object. This means you can manipulate it as you would any regular series in Pandas:
import pandas as pd
df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})
res = df.dtypes.to_frame('dtypes').reset_index()
print(res)
index dtypes
0 A object
1 B float64
2 C int64
3 D bool
Output to csv / excel / pickle
You can then use any method you normally would to store a dataframe, such as to_csv, to_excel, to_pickle, etc. Note that pickle is not recommended for distribution, as it is version dependent.
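As a minimal sketch of the CSV route, the dtypes can be converted to their string names, written out, and read back into a plain mapping. The column labels 'column' and 'dtype' are my own choice, and io.StringIO stands in for a real file path such as 'types.csv' (an assumed name) so the example stays self-contained.

```python
import io
import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

# Convert dtype objects to their string names so they survive a text format.
# 'column' and 'dtype' are illustrative labels, not anything pandas requires.
res = df.dtypes.astype(str).rename_axis('column').reset_index(name='dtype')

# Write to an in-memory buffer; passing a file path to to_csv works the same way.
buf = io.StringIO()
res.to_csv(buf, index=False)

# Read the table back and rebuild a column -> dtype-name mapping.
buf.seek(0)
loaded = pd.read_csv(buf)
dtype_map = dict(zip(loaded['column'], loaded['dtype']))
print(dtype_map)
```

Only the dtype names are stored, so the file can be shared without any of the underlying data.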
Output to json
If you wish to easily store and load as a dictionary, a popular format is json. As you found, dtype objects are not JSON serializable, so you need to convert them to str type first:
import json
# first create a dictionary mapping column name -> dtype name
d = res.set_index('index')['dtypes'].astype(str).to_dict()
# write the mapping to a local file
with open('types.json', 'w') as f:
    json.dump(d, f)
# read it back, e.g. in another program
with open('types.json', 'r') as f:
    data_types = json.load(f)
print(data_types)
{'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}
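Once the mapping is loaded, it can be used for the two things the question mentions: comparing against other data and type-casting. The following is only a sketch. The frame named other is hypothetical, and data_types is written out by hand here as a stand-in for a subset of what json.load would return from types.json.

```python
import pandas as pd

# Stand-in for (part of) the mapping loaded from types.json.
data_types = {'B': 'float64', 'C': 'int64', 'D': 'bool'}

# Hypothetical second DataFrame whose column types have drifted.
other = pd.DataFrame({'B': [1], 'C': [2.0], 'D': [True]})

# Compare: collect columns whose current dtype name differs from the saved one,
# as {column: (current, expected)}.
mismatched = {col: (str(other[col].dtype), want)
              for col, want in data_types.items()
              if str(other[col].dtype) != want}
print(mismatched)

# Cast: DataFrame.astype accepts a column -> dtype mapping, and dtype names
# given as strings are understood.
converted = other.astype(data_types)
print(converted.dtypes)
```

Comparing on the string names keeps the check independent of how the dtypes were serialized.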