I can't seem to get a simple dtype check working with pandas' improved Categoricals in v0.15+. Basically, I just want something like `is_categorical(column) -> True/False`.
```python
import pandas as pd
import numpy as np
import random

df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': random.sample('abcdef', 6)
})
# Convert the string column to a pandas Categorical
df['cat_column'] = pd.Categorical(df['cat_column'])
```
We can see that the dtype for the categorical column is 'category':

```python
df.cat_column.dtype
Out[20]: category
```
And normally we can do a dtype check by just comparing to the name of the dtype:

```python
df.x.dtype == 'float64'
Out[21]: True
```
3
But this doesn't seem to work when trying to check whether the x column is categorical:

```python
df.x.dtype == 'category'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-94d2608815c4> in <module>()
----> 1 df.x.dtype == 'category'

TypeError: data type "category" not understood
```
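A plausible reading of that error: `df.x.dtype` is a plain NumPy dtype, and comparing it with a string makes NumPy try to interpret the string as one of its own dtypes. `'float64'` is a valid NumPy type name, but `'category'` exists only in pandas, so the conversion fails. Whether `==` itself raises or quietly returns `False` may depend on the NumPy version, so the sketch below reproduces only the conversion step. `can_be_numpy_dtype` is a hypothetical helper written for illustration, not part of NumPy or pandas.

```python
import numpy as np

def can_be_numpy_dtype(type_name):
    """Hypothetical helper: True if NumPy can interpret type_name as a NumPy dtype."""
    try:
        np.dtype(type_name)  # raises TypeError for names NumPy does not know
    except TypeError:
        return False
    return True

print(can_be_numpy_dtype('float64'))   # True
print(can_be_numpy_dtype('category'))  # False: 'category' is a pandas-only dtype name
```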
Is there any way to do these types of checks in pandas v0.15+?
Answer
Use the `name` property to do the comparison instead. It should always work because it's just a string:
```python
>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4])
>>> arr.dtype.name
'int64'

>>> import pandas as pd
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat.dtype.name
'category'
```
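Because a DataFrame column (a Series) also exposes `.dtype.name`, the same string comparison can be applied column by column. A minimal sketch, assuming a frame shaped like the one in the question but with fixed letters instead of `random.sample`, so the output is reproducible. The last line uses `DataFrame.select_dtypes`, which accepts `'category'` in recent pandas versions; I have not checked how far back that goes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': pd.Categorical(list('fbdaec')),
})

# Map each column to its dtype name; the categorical column reports 'category'
dtype_names = {col: df[col].dtype.name for col in df.columns}
print(dtype_names)  # {'x': 'float64', 'y': 'float64', 'cat_column': 'category'}

# Keep only the columns whose dtype name is 'category'
cat_cols = [col for col in df.columns if df[col].dtype.name == 'category']
print(cat_cols)  # ['cat_column']

# Built-in alternative in recent pandas: filter columns by dtype
print(df.select_dtypes(include=['category']).columns.tolist())  # ['cat_column']
```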
So, to sum up, you can end up with a simple, straightforward function:
```python
def is_categorical(array_like):
    # Compare the dtype's name, which is always a plain string
    return array_like.dtype.name == 'category'
```
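A quick check of that function on a float Series and a categorical Series, plus a hedged alternative for newer pandas. Later pandas releases (not v0.15) expose the dtype class as `pd.CategoricalDtype`, so an `isinstance` test can replace the string comparison there. `is_categorical_isinstance` is a hypothetical name chosen for this sketch.

```python
import numpy as np
import pandas as pd

def is_categorical(array_like):
    # String comparison on the dtype name, as in the answer
    return array_like.dtype.name == 'category'

def is_categorical_isinstance(array_like):
    # Hypothetical alternative: test the dtype object's type directly.
    # Assumes a pandas version that provides pd.CategoricalDtype.
    return isinstance(array_like.dtype, pd.CategoricalDtype)

x = pd.Series(np.linspace(0, 50, 6))
cat = pd.Series(pd.Categorical(['a', 'b', 'c']))

print(is_categorical(x), is_categorical(cat))                        # False True
print(is_categorical_isinstance(x), is_categorical_isinstance(cat))  # False True
```

Both versions give the same answer for these inputs; the string version also works on pandas releases that predate `pd.CategoricalDtype`.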