I have a pandas DataFrame named df. With df.dtypes I can print on screen:

    arrival_time      object
    departure_time    object
    drop_off_type      int64
    extra             object
    pickup_type        int64
    stop_headsign     object
    stop_id           object
    stop_sequence      int64
    trip_id           object
    dtype: object
I want to save this information so that I can compare it with other data, type-cast things elsewhere, etc. I want to save it to a local file and recover it in another program, where the data itself can't go. But I'm not able to figure out how. Here are the results of the various conversions I tried:
    df.dtypes.to_dict()
    {'arrival_time': dtype('O'),
     'departure_time': dtype('O'),
     'drop_off_type': dtype('int64'),
     'extra': dtype('O'),
     'pickup_type': dtype('int64'),
     'stop_headsign': dtype('O'),
     'stop_id': dtype('O'),
     'stop_sequence': dtype('int64'),
     'trip_id': dtype('O')}

    ----

    df.dtypes.to_json()
    '{"arrival_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"departure_time":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"drop_off_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"extra":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"pickup_type":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"stop_headsign":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"},"stop_sequence":{"alignment":4,"byteorder":"=","descr":[["","<i8"]],"flags":0,"isalignedstruct":false,"isnative":true,"kind":"i","name":"int64","ndim":0,"num":9,"str":"<i8"},"trip_id":{"alignment":4,"byteorder":"|","descr":[["","|O"]],"flags":63,"isalignedstruct":false,"isnative":true,"kind":"O","name":"object","ndim":0,"num":17,"str":"|O"}}'

    ----

    json.dumps( df.dtypes.to_dict() )
    ...
    TypeError: dtype('O') is not JSON serializable

    ----

    list(xdf.dtypes)
    [dtype('O'), dtype('O'), dtype('int64'), dtype('O'), dtype('int64'), dtype('O'), dtype('O'), dtype('int64'), dtype('O')]
How to save and export/archive dtype information of a pandas DataFrame?
Answer
pd.DataFrame.dtypes returns a pd.Series object. This means you can manipulate it as you would any regular series in Pandas:
    import pandas as pd

    df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})

    # turn the dtypes Series into a two-column DataFrame (column name, dtype)
    res = df.dtypes.to_frame('dtypes').reset_index()

    print(res)

      index   dtypes
    0     A   object
    1     B  float64
    2     C    int64
    3     D     bool
Output to csv / excel / pickle
You can then use any method you normally would to store a DataFrame, such as to_csv, to_excel, or to_pickle. Note that pickle is not recommended for distribution, as the format is version dependent.
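As a minimal sketch of the CSV route, the snippet below writes the dtype table to a file and reads it back as a plain dictionary of strings. The file name dtypes.csv and its location in the system temp directory are assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'A': [''], 'B': [1.0], 'C': [1], 'D': [True]})
res = df.dtypes.to_frame('dtypes').reset_index()

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.gettempdir(), 'dtypes.csv')

# to_csv writes each dtype as its string name (e.g. "float64").
res.to_csv(path, index=False)

# In the other program: read the table back and rebuild a name -> dtype mapping.
loaded = pd.read_csv(path)
dtype_map = loaded.set_index('index')['dtypes'].to_dict()
print(dtype_map)
```

The values in dtype_map are ordinary strings, so they can be compared directly against str(dtype) from another DataFrame.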
Output to json
If you wish to easily store and load the dtypes as a dictionary, a popular format is JSON. As you found, you need to convert the dtypes to str first:
    import json

    # first create dictionary
    d = res.set_index('index')['dtypes'].astype(str).to_dict()

    with open('types.json', 'w') as f:
        json.dump(d, f)

    with open('types.json', 'r') as f:
        data_types = json.load(f)

    print(data_types)

    {'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}
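Since the question mentions comparing and type-casting elsewhere, here is one possible way to use the recovered dictionary in the receiving program. It is only a sketch: the DataFrame named other and its contents are hypothetical, and data_types is written out literally so the snippet runs on its own.

```python
import pandas as pd

# The mapping as it would come back from json.load in the receiving program.
data_types = {'A': 'object', 'B': 'float64', 'C': 'int64', 'D': 'bool'}

# Hypothetical data in the other program, with numbers still stored as text.
other = pd.DataFrame({'A': ['x'], 'B': ['2.5'], 'C': ['3'], 'D': [False]})

# Compare: collect columns whose current dtype differs from the saved one.
mismatches = {
    col: (str(dtype), data_types[col])
    for col, dtype in other.dtypes.items()
    if col in data_types and str(dtype) != data_types[col]
}
print(mismatches)

# Type-cast: astype accepts a column -> dtype-name dictionary.
typed = other.astype(data_types)
print(typed.dtypes)
```

One caveat with this approach: casting text to bool with astype treats any non-empty string as True, so a column holding the text 'False' would need explicit parsing instead.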