
Trying to ignore NaN in a CSV file throws a TypeError

I’m loading a local CSV file and trying to find the smallest float in a column that is a mix of NaN values and numbers.
I have tried using the NumPy function np.nanmin, but it throws:

"TypeError: '<=' not supported between instances of 'str' and 'float'"
database = pd.read_csv('database.csv', quotechar='"', skipinitialspace=True, delimiter=',')

coun_weight = database[['Country of Operator/Owner', 'Launch Mass (Kilograms)']]
print(coun_weight)

lightest = np.nanmin(coun_weight['Launch Mass (Kilograms)'])

Any suggestions as to why nanmin might not be working?

A link to the entire CSV file: http://www.sharecsv.com/s/5aea6381d1debf75723a45aacd40abf8/database.csv
Here is a sample of my coun_weight:

                 Country of Operator/Owner Launch Mass (Kilograms)
1390                     China                     NaN
1391                     China                    1040
1392                     China                    1040
1393                     China                    2700
1394                     China                    2700
1395                     China                    1800
1396                     China                    2700
1397                     China                     NaN
1398                     China                     NaN
1399                     China                     NaN
1400                     China                     NaN
1401                     India                      92
1402                    Russia                      45
1403              South Africa                       1
1404                     China                     NaN
1405                     China                       4
1406                     China                       4
1407                     China                      12


Answer

I tested it, and the problematic values are:

coun_weight = pd.read_csv('database.csv')

print(coun_weight.loc[pd.to_numeric(coun_weight['Launch Mass (Kilograms)'],
                                    errors='coerce').isnull(),
                      'Launch Mass (Kilograms)'].dropna())
1091    5,000+
1092    5,000+
1093    5,000+
1094    5,000+
1096    5,000+
Name: Launch Mass (Kilograms), dtype: object
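The coercion trick above can be reproduced on a small frame (hypothetical data, not the original CSV), which makes it easier to see why the filter works:

```python
import pandas as pd

# Toy column mimicking the problem: numeric strings, a real NaN,
# and the unparseable label '5,000+'
df = pd.DataFrame({'Launch Mass (Kilograms)': ['1040', '2700', None, '5,000+']})

# Coerce to numbers; anything that cannot be parsed becomes NaN
numeric = pd.to_numeric(df['Launch Mass (Kilograms)'], errors='coerce')

# Rows that failed to parse, excluding values that were NaN to begin with
bad = df.loc[numeric.isnull(), 'Launch Mass (Kilograms)'].dropna()
print(bad)  # only '5,000+' remains
```

The `.dropna()` at the end is what separates genuinely missing values from strings that merely fail to parse.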

And the solution is to replace that value and cast the column to float:

coun_weight['Launch Mass (Kilograms)'] = \
    coun_weight['Launch Mass (Kilograms)'].replace('5,000+', 5000).astype(float)

print(coun_weight['Launch Mass (Kilograms)'].iloc[1091:1098])
1091    5000.0
1092    5000.0
1093    5000.0
1094    5000.0
1095       NaN
1096    5000.0
1097    6500.0
Name: Launch Mass (Kilograms), dtype: float64

Then, if you need the minimum of a column containing NaNs, use Series.min, which skips NaN:

print(coun_weight['Launch Mass (Kilograms)'].min())
0.0
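Once the column is a float dtype, both Series.min and the np.nanmin call from the question skip NaN and agree with each other. A minimal sketch (with made-up values, not the original data):

```python
import numpy as np
import pandas as pd

# Float column with NaNs, as produced after the conversion above
s = pd.Series([np.nan, 1040.0, 4.0, np.nan, 2700.0])

# Both skip NaN once the dtype is numeric
print(s.min())       # 4.0
print(np.nanmin(s))  # 4.0
```

The original TypeError only occurred because the column's dtype was object (strings mixed with floats), not because of the NaNs themselves.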

Testing whether any 0 values are in the column:

a = coun_weight['Launch Mass (Kilograms)']
print(a[a == 0])
912    0.0
Name: Launch Mass (Kilograms), dtype: float64

Another possible solution is to convert the problematic values to NaN:

coun_weight['Launch Mass (Kilograms)'] = \
    pd.to_numeric(coun_weight['Launch Mass (Kilograms)'], errors='coerce')

print(coun_weight['Launch Mass (Kilograms)'].iloc[1091:1098])
1091       NaN
1092       NaN
1093       NaN
1094       NaN
1095       NaN
1096       NaN
1097    6500.0
Name: Launch Mass (Kilograms), dtype: float64
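Either way, the np.nanmin call from the question works once the column is numeric. A self-contained sketch of the coercion route (hypothetical values, not the original CSV):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the loaded CSV
coun_weight = pd.DataFrame({'Launch Mass (Kilograms)': ['5,000+', '12', None, '45']})

# Coerce everything unparseable (including '5,000+') to NaN
coun_weight['Launch Mass (Kilograms)'] = pd.to_numeric(
    coun_weight['Launch Mass (Kilograms)'], errors='coerce')

# nanmin now works because the column is float, not object
lightest = np.nanmin(coun_weight['Launch Mass (Kilograms)'])
print(lightest)  # 12.0
```

Note the trade-off between the two solutions: replace keeps the '5,000+' rows (as 5000.0), while coercing discards them as NaN, which changes the minimum if one of them happened to be the lightest entry.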
User contributions licensed under: CC BY-SA