I have the following code:
import pandas as pd
import numpy as np
import csv
filename = (r"C:UsersZAppDataRoamingMicrosoftWindowsStart MenuProgramsAnaconda3 (64- bit)diabetes.csv")
raw_data = open(filename, 'rb')
reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
x = list(reader)
data = (np.array(x)).astype('float')
print(data.shape)
But it errors:
----> 7 x = list(reader)
Error: iterator should return strings, not bytes (did you open the file in text mode?)
When I change 'rb' to 'rt':
raw_data = open(filename, 'rt')
It says:
----> 8 data = (np.array(x)).astype('float')
ValueError: could not convert string to float: 'Pregnancies'
And when I delete .astype('float'), the result is (769, 9) but the expected result is (768, 9).
It counts the header as data. Can you tell me why?
Answer
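Instead of:
reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
x = list(reader)
try:
reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
next(reader)
x = list(reader)
csv.reader treats every line of the file as a row, so the header is returned like any other row. Calling next(reader) once consumes the header, so list(reader) contains only the data rows.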
This is described at https://docs.python.org/3/library/csv.html#csv.csvreader.__next__
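As a rough, self-contained illustration of skipping the header, here is a minimal sketch that uses only the standard library. The in-memory sample (three columns, two data rows) and the helper name load_rows are assumptions made up for this example; they are not the real diabetes.csv or part of the csv module.

```python
import csv
import io

# Hypothetical stand-in for diabetes.csv: a header row followed by two data rows.
sample = io.StringIO(
    "Pregnancies,Glucose,Outcome\n"
    "6,148,1\n"
    "1,85,0\n"
)

def load_rows(text_file):
    """Return (header, rows), with every data value converted to float."""
    reader = csv.reader(text_file, delimiter=',')
    header = next(reader)  # consume the header row so it is not treated as data
    rows = [[float(value) for value in row] for row in reader]
    return header, rows

header, rows = load_rows(sample)
shape = (len(rows), len(rows[0]))  # (number of data rows, number of columns)
print(header)
print(shape)
```

With a real file, the same helper should work on a file object opened in text mode, for example open(filename, 'rt', newline=''), and the resulting rows can then be passed to np.array as in the question.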