results_table is a pd.DataFrame
When I run
print(type(results_table.loc[0,'Mean recall score']))
it returns
<class 'numpy.float64'>
Every item is a float.
But when I run
print(results_table['Mean recall score'].dtype)
it returns
object
Why does this happen?
Answer
First, note that df.loc[0, x] only considers the value at row label 0 and column label x, not your entire dataframe. Now let's consider an example:
import pandas as pd

df = pd.DataFrame({'A': [1.5, 'hello', 'test', 2]}, dtype=object)

print(type(df.loc[0, 'A']))  # type of single element in series
# <class 'float'>

print(df['A'].dtype)  # type of series
# object
As you can see, an object dtype series can hold arbitrary Python objects. You can even, if you wish, extract the type of each element of your series:
print(df['A'].map(type))
# 0    <class 'float'>
# 1      <class 'str'>
# 2      <class 'str'>
# 3      <class 'int'>
# Name: A, dtype: object
An object dtype series is simply a collection of pointers to various objects that are not held in a contiguous memory block, as may be the case with numeric series. This is comparable to a Python list and explains why performance is poor when you work with object instead of numeric series.
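If every element really is a float, as in the question, the column can probably be converted to a numeric dtype. The following is only a sketch: the values in results_table are made up to stand in for the asker's data, and it assumes the column contains nothing that cannot be parsed as a number.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's results_table:
# float values that ended up stored in an object-dtype column.
results_table = pd.DataFrame(
    {"Mean recall score": [np.float64(0.91), np.float64(0.87), np.float64(0.93)]},
    dtype=object,
)

col = results_table["Mean recall score"]
print(col.dtype)               # dtype of the whole series: object
print(col.map(type).unique())  # distinct Python types of the elements

# pd.to_numeric raises by default if a value cannot be parsed as a number,
# so a successful call also confirms that the column is purely numeric.
converted = pd.to_numeric(col)
print(converted.dtype)         # float64
```

If the conversion succeeds, assigning the result back to the column gives a regular float64 series instead of a collection of pointers.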
See also this answer for a visual representation of the above.