I have the following code:
    import pandas as pd
    import numpy as np
    import csv
    filename = (r"C:\Users\Z\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Anaconda3 (64-bit)\diabetes.csv")
    raw_data = open(filename, 'rb')
    reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
    x = list(reader)
    data = (np.array(x)).astype('float')
    print(data.shape)
But it raises an error:
    ----> 7 x = list(reader)
    Error: iterator should return strings, not bytes (did you open the file in text mode?)
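For context on this first error: in Python 3, csv.reader expects an iterable of str, so the file must be opened in text mode (opening in 'rb' is a leftover from Python 2-era examples). The Python 3 csv docs also recommend passing newline='' when opening the file. The sketch below uses a small in-memory byte string as a hypothetical stand-in for diabetes.csv, so it runs without the real file. It shows the binary stream failing and the same bytes working once they are decoded with io.TextIOWrapper.

```python
import csv
import io

# Hypothetical stand-in for the CSV file's raw bytes.
raw_bytes = b"Pregnancies,Glucose\n6,148\n1,85\n"

# Binary stream: csv.reader receives bytes and raises csv.Error when iterated.
try:
    list(csv.reader(io.BytesIO(raw_bytes), delimiter=','))
    binary_failed = False
except csv.Error as exc:
    binary_failed = True
    print("binary mode:", exc)

# Text stream: decoding first gives csv.reader the str lines it expects.
text_stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8", newline="")
rows = list(csv.reader(text_stream, delimiter=','))
print("text mode:", rows)
```

With a real file, open(filename, newline='') (text mode is the default) plays the role of the TextIOWrapper here.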
When I change 'rb' to 'rt':

    raw_data = open(filename, 'rt')
It says:
    ----> 8 data = (np.array(x)).astype('float')
    ValueError: could not convert string to float: 'Pregnancies'
And when I remove .astype('float'), the resulting shape is (769, 9), but the expected shape is (768, 9).
It counts the header as data. Can you tell me why?
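Both symptoms have the same cause. The sketch below reproduces them on a tiny hypothetical sample (a header row plus three data rows, standing in for diabetes.csv): csv.reader returns the header as an ordinary row, so the array gets one extra row, and converting it to float fails on the first header cell.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for diabetes.csv: a header row plus three data rows.
sample = io.StringIO("Pregnancies,Glucose,Outcome\n6,148,1\n1,85,0\n8,183,1\n")

rows = list(csv.reader(sample, delimiter=','))
print(rows[0])        # the header is returned as an ordinary row
arr = np.array(rows)  # every cell is a str, header cells included
print(arr.shape)      # (4, 3): three data rows plus the header row

# Converting the whole array to float fails on the header cell 'Pregnancies'.
try:
    arr.astype('float')
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print(exc)
```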
Answer
Instead of doing this:

    reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
    x = list(reader)

try:

    reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
    next(reader)  # read and discard the header row
    x = list(reader)
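csv.reader has no concept of a header: it returns every line of the file, including the first one, as a list of strings. That is why you get 769 rows and why astype('float') fails on 'Pregnancies'. Calling next(reader) once before list(reader) consumes that first row, so x contains only the 768 data rows.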
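Putting the answer's fix together, a minimal sketch might look like the following. The helper name load_numeric_csv is hypothetical, and an in-memory sample stands in for the real file; the comment shows how the real file would be opened in text mode with a with block so it is closed afterwards. Keeping the value returned by next(reader) also preserves the column names.

```python
import csv
import io

import numpy as np


def load_numeric_csv(stream):
    """Hypothetical helper: return (header, data), data being a float array without the header."""
    reader = csv.reader(stream, delimiter=',', quoting=csv.QUOTE_NONE)
    header = next(reader)  # consume the first row (column names)
    data = np.array(list(reader)).astype('float')
    return header, data


# Hypothetical in-memory sample; with the real file you would use:
#     with open(filename, newline='') as f:
#         header, data = load_numeric_csv(f)
sample = io.StringIO("Pregnancies,Glucose,Outcome\n6,148,1\n1,85,0\n8,183,1\n")
header, data = load_numeric_csv(sample)
print(header)      # ['Pregnancies', 'Glucose', 'Outcome']
print(data.shape)  # (3, 3)
```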
Calling next() on a csv reader object is described at https://docs.python.org/3/library/csv.html#csv.csvreader.__next__
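Since the question already imports numpy and pandas, either library can also handle the header directly. This swaps in a different technique from the csv.reader approach above. The sketch uses the same kind of hypothetical in-memory sample. With the real file, np.loadtxt(filename, delimiter=',', skiprows=1) or pd.read_csv(filename) should give (768, 9), assuming every row after the header is purely numeric.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for diabetes.csv.
csv_text = "Pregnancies,Glucose,Outcome\n6,148,1\n1,85,0\n8,183,1\n"

# NumPy: skip the header line explicitly; remaining cells are parsed as floats.
arr = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)
print(arr.shape)  # (3, 3)

# pandas: read_csv treats the first line as column names by default.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Pregnancies', 'Glucose', 'Outcome']

# Plain float array, comparable to the csv.reader version.
values = df.to_numpy(dtype=float)
```

pd.read_csv is usually the more convenient choice here because it keeps the column names alongside the data.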