Prevent pandas read_csv treating first row as header of column names

Question

I'm reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, but it keeps getting converted to column names. I tried header=False, but this just deleted the row entirely. (Note on my input data: I have a string (st = '\n'.join(lst)) that I convert to a file-like object (io.StringIO(st)), then build the csv from that.)

Accepted Answer

You want header=None. The False gets type-promoted to int, becoming 0. See the docs:

  header : int or list of ints, default ‘infer’
  Row number(s) to use as the column names, and the start of the data. Default behavior is as if set to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

You can see the difference in behaviour, first with header=0:

    In [95]:
    import io
    import pandas as pd
    t="""a,b,c
    0,1,2
    3,4,5"""
    pd.read_csv(io.StringIO(t), header=0)

    Out[95]:
       a  b  c
    0  0  1  2
    1  3  4  5

Now with None:

    In [96]:
    pd.read_csv(io.StringIO(t), header=None)

    Out[96]:
       0  1  2
    0  a  b  c
    1  0  1  2
    2  3  4  5

Note that in pandas 0.19.1 (the latest version at the time of writing), passing header=False raises a TypeError:

    In [98]:
    pd.read_csv(io.StringIO(t), header=False)

    TypeError: Passing a bool to header is invalid. Use header=None for no header or header=int or list-like of ints to specify the row(s) making up the column names
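For the setup described in the question (a list of lines joined into a string and wrapped in io.StringIO), a minimal sketch might look like the following. The contents of lst and the column names x, y, z are made up for illustration; only header=None and the names parameter of pd.read_csv come from pandas itself.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the asker's list `lst`
lst = ["a,b,c", "0,1,2", "3,4,5"]
st = "\n".join(lst)

# header=None: no row is consumed as column names, so the first row
# stays in the data and the columns get integer labels 0, 1, 2
df = pd.read_csv(io.StringIO(st), header=None)

# Optionally supply your own column names (illustrative names here);
# with header=None the first row is still kept as data
df_named = pd.read_csv(io.StringIO(st), header=None, names=["x", "y", "z"])

print(df)
print(df_named)
```

Because the first row holds text, each column mixes strings and digits, so the values will likely be read as strings rather than numbers; convert them afterwards if the remaining rows need to be numeric.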

Answer