Is there a method I can use to output the inferred schema on a large CSV using pandas? In addition, any way to have it tell me with that type if it is nullable/blank based off the CSV? File is about 500k rows with 250 columns.
With my new job, I’m constantly being handed CSV files with zero format documentation.
Advertisement
Answer
Is it necessary to load the whole csv file? At least you could use the read_csv function if you know the separator or doing a cat of the file to know the separator. Then use the .info():
df = pd.read_csv(path_to_file,...) df.info()