I’m reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, but it keeps getting converted to column names.
I tried header=False, but this just deleted the row entirely.
(Note on my input data: I have a string (st = '\n'.join(lst)) that I convert to a file-like object (io.StringIO(st)), then build the DataFrame from that file object.)
Answer
You want header=None. The False gets type-promoted to an int, i.e. 0, so the first row is still used as the header. See the docs (emphasis mine):
header : int or list of ints, default ‘infer’ Row number(s) to use as the column names, and the start of the data. Default behavior is as if set to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
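The promotion of False to 0 mentioned above comes from Python itself, where bool is a subclass of int. A minimal illustration that does not involve pandas at all (the variable names are only for this sketch):

```python
# bool is a subclass of int in Python, so False behaves like 0
# wherever an integer is expected.
is_int = isinstance(False, int)   # bool instances are also int instances
as_int = int(False)               # explicit conversion yields 0
equal_zero = (False == 0)         # and the two compare equal

print(is_int, as_int, equal_zero)
```

This is why an older read_csv that accepted header=False could end up treating it like header=0.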
You can see the difference in behaviour, first with header=0:
In [95]:
import io
import pandas as pd
t="""a,b,c
0,1,2
3,4,5"""
pd.read_csv(io.StringIO(t), header=0)

Out[95]:
   a  b  c
0  0  1  2
1  3  4  5
Now with None:
In [96]:
pd.read_csv(io.StringIO(t), header=None)

Out[96]:
   0  1  2
0  a  b  c
1  0  1  2
2  3  4  5
Note that in the latest version at the time of writing, 0.19.1, this now raises a TypeError:
In [98]: pd.read_csv(io.StringIO(t), header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no header or header=int or list-like of ints to specify the row(s) making up the column names
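Applied to the setup described in the question (a list of lines joined into a string and wrapped in io.StringIO), a minimal sketch might look like the following. The contents of lst and the column names passed to names= are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical input: a list of CSV lines with no header row.
lst = ["0,1,2", "3,4,5", "6,7,8"]
st = "\n".join(lst)

# header=None keeps the first line as data; the columns get
# default integer labels 0, 1, 2.
df = pd.read_csv(io.StringIO(st), header=None)

# Optionally supply your own column names via names=
# (the names below are placeholders).
named = pd.read_csv(io.StringIO(st), header=None, names=["x", "y", "z"])

print(df.shape, list(named.columns))
```

Each call gets a fresh io.StringIO because reading consumes the buffer.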