I am getting ValueError: invalid literal for int() with base 10 with np.where function

I want to change ‘not available’ value in a df column into 0, and for the rest of the values to change them into integers.

Unique values in the column are:

['30', 'not available', '45', '60', '40', '90', '21', '5','75','29', '8', '10']

I run the following code to change values to integers:

df[col] = np.where(df[col] == 'not available',0,df[col].astype(int))

I expect that the above would turn all values into integers, yet I get the value error

ValueError: invalid literal for int() with base 10: 'not available'

Any suggestion why the code does not work?

Answer

Before doing

df[col] = np.where(df[col] == 'not available',0,df[col].astype(int))

Python must first evaluate all three arguments:

df[col] == 'not available'
0
df[col].astype(int)

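The last one converts every value in the column to int, and that fails because 'not available' cannot be parsed as an integer. The error is raised before np.where ever gets to apply the condition. You can avoid the problem by using pandas.Series.apply with a lambda that holds a conditional (ternary) expression, as follows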

import pandas as pd
df = pd.DataFrame({"col1":['30', 'not available', '45', '60', '40', '90', '21', '5','75','29', '8', '10']})
col = "col1"
df[col] = df[col].apply(lambda x:0 if x=='not available' else int(x))
print(df)

output

    col1
0     30
1      0
2     45
3     60
4     40
5     90
6     21
7      5
8     75
9     29
10     8
11    10

This way int is applied only to values that are not equal to 'not available'.
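
If you prefer a vectorized approach over apply, a possible alternative is pandas.to_numeric with errors="coerce". This is only a sketch using the same sample data; it first confirms that the astype(int) call fails on its own (which is why np.where never runs), then does the conversion. The names astype_failed and converted are just illustrative.

```python
import pandas as pd

values = ['30', 'not available', '45', '60', '40', '90', '21', '5', '75', '29', '8', '10']
df = pd.DataFrame({"col1": values})
col = "col1"

# The third np.where argument is evaluated on its own, before np.where runs,
# so this conversion raises no matter what the condition says.
try:
    df[col].astype(int)
    astype_failed = False
except ValueError:
    astype_failed = True

# errors="coerce" turns unparseable strings into NaN instead of raising;
# fillna(0) replaces those NaN values, after which astype(int) is safe.
converted = pd.to_numeric(df[col], errors="coerce").fillna(0).astype(int)
print(converted.tolist())
```

Note that errors="coerce" maps every unparseable string to 0 here, not only 'not available', so the apply version is the safer choice if other unexpected strings should raise an error instead.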

User contributions licensed under: CC BY-SA