I started working with a dataset that is a collection of murder reports. It has a column “Perpetrator Age” that appears to contain simple integers. But when I checked its type, it turned out to be dtype('O').
In order to work with this column further, I want to change its type to dtype('int64'). I tried to do it like this:
data['Perpetrator Age'] = data['Perpetrator Age'].astype(int)
and got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-64-50a3c796ab1e> in <module>()
----> 1 data['Perpetrator Age'] = data['Perpetrator Age'].astype(int)
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
972 # work around NumPy brokenness, #1987
973 if np.issubdtype(dtype.type, np.integer):
--> 974 return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
975
976 # if we have a datetime/timedelta array of objects
pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()
ValueError: invalid literal for int() with base 10: ' '
I saw advice saying that an “object” column must first be converted to a string and then to “int”. I tried that, but it didn’t work either; the same error appeared. How can I fix this?
Answer
As mentioned in the comments, the first row of your column apparently contains a single space (' '), which int() cannot parse. That is what the error message is telling you. You can either remove that row, replace the value with something else, or skip it:
df['column_1'].iloc[1:].astype('int')
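If blank values might also appear in other rows, skipping only the first row will not be enough. A more general approach is sketched below, assuming every non-numeric entry should be treated as missing. The small DataFrame is made-up sample data standing in for your real dataset; pd.to_numeric with errors="coerce" turns anything it cannot parse into NaN, and the nullable "Int64" dtype lets the column hold integers alongside missing values.

```python
import pandas as pd

# Hypothetical sample standing in for the real dataset: ages stored as
# strings, with one blank entry (' ') like the one in the error message.
data = pd.DataFrame({"Perpetrator Age": ["15", " ", "42", "30"]})

# Parse the column; entries that are not numbers become NaN instead of raising.
ages = pd.to_numeric(data["Perpetrator Age"], errors="coerce")

# Inspect the original values that failed to parse before overwriting them.
bad_rows = data.loc[ages.isna(), "Perpetrator Age"]

# Option 1: keep every row and store missing ages as <NA> in a
# nullable integer column (note the capital "I" in "Int64").
data["Perpetrator Age"] = ages.astype("Int64")

# Option 2: drop the rows with missing ages, then convert to plain int64.
cleaned = data.dropna(subset=["Perpetrator Age"]).astype({"Perpetrator Age": "int64"})
```

Option 1 keeps the row count unchanged, which matters if other columns in those rows are still useful. Option 2 gives you the dtype('int64') you asked for, at the cost of losing the rows with no age.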