I started working with a dataset that is a collection of murder reports. It has a column “Perpetrator Age” that contains simple integers. But when I checked its type, it turned out to be dtype('O').
In order to work with this column further, I want to change its type to dtype('int64'). I tried to do it like this:
data['Perpetrator Age'] = data['Perpetrator Age'].astype(int)
and got this error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-64-50a3c796ab1e> in <module>()
----> 1 data['Perpetrator Age'] = data['Perpetrator Age'].astype(int)

4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    972         # work around NumPy brokenness, #1987
    973         if np.issubdtype(dtype.type, np.integer):
--> 974             return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
    975
    976         # if we have a datetime/timedelta array of objects

pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: ' '
I saw advice saying that an “object” column must first be converted to a string and then to “int”. I tried that, but it didn’t work either; the same error appeared. How can I fix this?
Answer
As mentioned in the comments, the first row of your df apparently contains a single space (' '), which int() cannot parse. You can either remove it, replace it with something else, or skip it:
df['column_1'].iloc[1:].astype('int')
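If blank values might also appear in rows other than the first, skipping one row will not be enough. A minimal sketch of the "replace it" option, assuming the blanks should be treated as missing ages: pd.to_numeric with errors="coerce" turns unparseable strings into NaN instead of raising. The small DataFrame below is made-up sample data standing in for the real dataset, and clean_ages is just an illustrative name.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not reproduced here.
data = pd.DataFrame({"Perpetrator Age": [" ", "15", "42", "0", "27"]})

# Values that cannot be parsed as numbers (such as ' ') become NaN
# instead of raising ValueError.
ages = pd.to_numeric(data["Perpetrator Age"], errors="coerce")

# Option 1: keep every row and use the nullable integer dtype "Int64",
# which can hold missing values alongside integers.
data["Perpetrator Age"] = ages.astype("Int64")

# Option 2: drop the missing entries and convert to a plain int64 Series.
clean_ages = ages.dropna().astype("int64")

print(data["Perpetrator Age"].dtype)
print(clean_ages.dtype)
```

Note that "Int64" (capital I) is pandas' nullable integer type, not NumPy's int64. If the column must be a plain int64, the missing rows have to be dropped or filled with a placeholder value first.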