
How can I convert columns of string in dataset to int?

Some of the data in the dataset are in string format, and I need to map all of them to numeric form. I want to convert the string data in some columns to int so that it can be used in the kNN method. I wrote this code, but it raises an error. How can I fix it? Thank you for your consideration.

Here is the dataset: http://gitlab.rahnemacollege.com/rahnemacollege/tuning-registration-JusticeInWork/raw/master/dataset.csv

The error occurs in this part of the code:

JavaScript

The error is:

JavaScript

The complete code is here:

JavaScript


Answer

The NaN values come from empty strings in the original csv file. To leave those as empty strings instead, you could read the csv with df = pd.read_csv(url, keep_default_na=False), although having them as NaN can make it easier to deal with them.
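To illustrate the difference, here is a minimal sketch that reads a tiny inline CSV instead of the real dataset. The column names and values (userId, city) are made up for the example; only pd.read_csv and its keep_default_na parameter come from the answer.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; the second row has an empty userId field.
csv_text = "userId,city\nabc,Tehran\n,Shiraz\n"

# Default behaviour: the empty field is parsed as NaN.
df_nan = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: the empty field stays an empty string.
df_str = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(df_nan["userId"].isna().tolist())  # [False, True]
print(df_str["userId"].tolist())         # ['abc', '']
```

Keeping NaN is often more convenient, because pandas helpers such as isna(), fillna() and dropna() then work on the missing entries directly.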

As noted in the comments, however, I am skeptical that the encoding standard (if any) used in that data is being interpreted correctly.

But if it is as described in the question, then you can use your function string_to_int unchanged: apply it to all '...Id' columns and skip the NaN values (optionally converting those to another value):

JavaScript
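As a rough sketch of that approach, assuming a stand-in for string_to_int: the function body below is hypothetical (it just reads the UTF-8 bytes of the string as one big-endian integer) and is not necessarily what the question's function does. The small DataFrame and its column names are also invented for illustration.

```python
import pandas as pd

def string_to_int(s):
    # Hypothetical stand-in for the question's function:
    # interpret the UTF-8 bytes of the string as a big-endian integer.
    return int.from_bytes(s.encode("utf-8"), "big")

# Made-up data with two '...Id' columns that contain missing values.
df = pd.DataFrame({
    "userId": ["abc", None, "a-much-longer-identifier"],
    "cityId": ["x1", "y2", None],
    "age": [30, 41, 25],
})

id_cols = [c for c in df.columns if c.endswith("Id")]

df2 = df.copy()
for col in id_cols:
    # Convert only real strings; missing values are passed through unchanged.
    # dtype=object keeps the results as Python ints, even very large ones.
    converted = [string_to_int(v) if isinstance(v, str) else v for v in df[col]]
    df2[col] = pd.Series(converted, index=df.index, dtype=object)

print(df2)
```

To replace the missing values with another value afterwards, something like df2[id_cols] = df2[id_cols].fillna(0) could be used, if a placeholder such as 0 is acceptable for the model.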

Outcome

JavaScript

(Note: the dtype is still object because the int values overflow int64, so they are stored as Python's arbitrary-precision int objects; df2.applymap(type).value_counts() shows that all '...Id' columns are <class 'int'>.)
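The overflow effect can be seen in isolation with a short sketch. The values 2**40 and 2**70 are arbitrary examples chosen here, one that fits in int64 and one that does not.

```python
import pandas as pd

small = pd.Series([1, 2**40])   # every value fits in int64
big = pd.Series([1, 2**70])     # 2**70 is too large for int64 (and uint64)

print(small.dtype)  # int64
print(big.dtype)    # object

# Each element of the object column is still a plain Python int.
print(big.map(type).value_counts())
```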

Original suggestion

Initially I had this other suggestion for string_to_int(). It handles non-str values explicitly with a default value. It also uses struct.unpack() as a basis for more performant decoding, although in this specific case, I doubt it makes much difference.

JavaScript
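One way such a function could look is sketched below. This is an assumption about its shape, not the answer's actual code: the padding scheme, the big-endian 64-bit word size, and the default parameter name are all choices made for this example.

```python
import struct

def string_to_int(s, default=0):
    # Non-str values (NaN, None, numbers) map to an explicit default.
    if not isinstance(s, str):
        return default
    data = s.encode("utf-8")
    # Left-pad with zero bytes so the length is a multiple of 8.
    data = b"\x00" * (-len(data) % 8) + data
    # Decode the bytes as a sequence of big-endian unsigned 64-bit words.
    words = struct.unpack(f">{len(data) // 8}Q", data)
    # Combine the words into one arbitrary-precision Python int.
    value = 0
    for word in words:
        value = (value << 64) | word
    return value

print(string_to_int("abc"))          # 6382179
print(string_to_int(float("nan")))   # 0
print(string_to_int(None, default=-1))  # -1
```

Under these assumptions the result equals int.from_bytes(s.encode("utf-8"), "big"), so the struct-based version mainly matters if decoding speed does.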
User contributions licensed under: CC BY-SA