I can’t seem to get a simple dtype check working with pandas’ improved Categoricals in v0.15+. Basically, I just want something like is_categorical(column) -> True/False.
    import pandas as pd
    import numpy as np
    import random

    df = pd.DataFrame({
        'x': np.linspace(0, 50, 6),
        'y': np.linspace(0, 20, 6),
        'cat_column': random.sample('abcdef', 6)
    })
    # Convert the string column to a Categorical
    df['cat_column'] = pd.Categorical(df['cat_column'])
We can see that the dtype for the categorical column is ‘category’:

    df.cat_column.dtype
    Out[20]: category
And normally we can do a dtype check by just comparing to the name of the dtype:
    df.x.dtype == 'float64'
    Out[21]: True
But this doesn’t seem to work when trying to check if the x column is categorical:

    df.x.dtype == 'category'
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-22-94d2608815c4> in <module>()
    ----> 1 df.x.dtype == 'category'

    TypeError: data type "category" not understood
Is there any way to do these types of checks in pandas v0.15+?
Answer
Use the name property to do the comparison instead. It should always work because it’s just a string:

    >>> import numpy as np
    >>> arr = np.array([1, 2, 3, 4])
    >>> arr.dtype.name
    'int64'
    >>> import pandas as pd
    >>> cat = pd.Categorical(['a', 'b', 'c'])
    >>> cat.dtype.name
    'category'
So, to sum up, you can end up with a simple, straightforward function:
    def is_categorical(array_like):
        return array_like.dtype.name == 'category'
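As a quick check, here is a minimal sketch that applies the name-based comparison to every column of a frame shaped like the one in the question. The fixed letters in cat_column (instead of random.sample) and the `result` dictionary are assumptions added for illustration so the output is reproducible; they are not part of the original post.

```python
import numpy as np
import pandas as pd


def is_categorical(array_like):
    # dtype.name is a plain string, so this comparison never asks
    # NumPy to interpret 'category' as a dtype.
    return array_like.dtype.name == 'category'


# Same layout as the question's frame, with fixed letters for reproducibility.
df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': list('abcdef'),
})
df['cat_column'] = pd.Categorical(df['cat_column'])

# Map each column name to whether that column is categorical.
result = {name: is_categorical(df[name]) for name in df.columns}
print(result)  # {'x': False, 'y': False, 'cat_column': True}
```

Because the function only reads the dtype attribute, it should work on a Series, a pd.Categorical, or a plain NumPy array alike.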