I get ValueError: cannot convert float NaN to integer for following:
JavaScript
x
3
1
df = pandas.read_csv('zoom11.csv')
2
df[['x']] = df[['x']].astype(int)
3
- The “x” is a column in the csv file, I cannot spot any float NaN in the file, and I don’t understand the error or why I am getting it.
- When I read the column as String, then it has values like -1,0,1,…2000, all look very nice int numbers to me.
- When I read the column as float, then this can be loaded. Then it shows values as -1.0,0.0 etc, still there are no any NaN-s
- I tried with error_bad_lines = False and dtype parameter in read_csv to no avail. It just cancels loading with same exception.
- The file is not small (10+ M rows), so cannot inspect it manually, when I extract a small header part, then there is no error, but it happens with full file. So it is something in the file, but cannot detect what.
- Logically the csv should not have missing values, but even if there is some garbage then I would be ok to skip the rows. Or at least identify them, but I do not see way to scan through file and report conversion errors.
Update: Using the hints in comments/answers I got my data clean with this:
JavaScript
1
10
10
1
# x contained NaN
2
df = df[~df['x'].isnull()]
3
4
# Y contained some other garbage, so null check was not enough
5
df = df[df['y'].str.isnumeric()]
6
7
# final conversion now worked
8
df[['x']] = df[['x']].astype(int)
9
df[['y']] = df[['y']].astype(int)
10
Advertisement
Answer
For identifying NaN
values use boolean indexing
:
JavaScript
1
2
1
print(df[df['x'].isnull()])
2
Then for removing all non-numeric values use to_numeric
with parameter errors='coerce'
– to replace non-numeric values to NaN
s:
JavaScript
1
2
1
df['x'] = pd.to_numeric(df['x'], errors='coerce')
2
And for remove all rows with NaN
s in column x
use dropna
:
JavaScript
1
2
1
df = df.dropna(subset=['x'])
2
Last convert values to int
s:
JavaScript
1
2
1
df['x'] = df['x'].astype(int)
2