
Check if all values in dataframe column are the same

I want a quick and easy way to check whether all values in the counts column of a dataframe are the same.

I want just a simple condition: if all counts have the same value, then print('True').

Is there a fast way to do this?


Answer

An efficient way to do this is by comparing the first value with the rest, and using all:

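As a minimal sketch of this idea (not necessarily the exact code from the answer), assuming the helper name is_unique and the sample data below, both chosen here for illustration:

```python
import pandas as pd


def is_unique(s: pd.Series) -> bool:
    """Return True if every value in `s` equals the first one.

    Assumes `s` is non-empty. NaN never equals NaN, so a column
    containing NaN is reported as not constant.
    """
    a = s.to_numpy()  # use s.values on pandas < 0.24
    return bool((a[0] == a).all())


# Hypothetical data; the 'counts' column name follows the question.
df = pd.DataFrame({"counts": [8, 8, 8, 8]})
if is_unique(df["counts"]):
    print("True")
```

The comparison a[0] == a is a single vectorized pass over the array, and all reduces the resulting boolean mask to one value.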

Although the most intuitive idea might be to count the number of unique values and check whether there is only one, this does more work than the task requires. NumPy’s np.unique sorts the underlying array, which has an average complexity of O(n·log(n)) with the default quicksort. pandas’ nunique relies on hash-based uniques rather than np.unique, so it avoids the sort, but it still has to hash every value and collect the distinct ones. The above approach is a single O(n) vectorized comparison.
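To make the two alternatives concrete, here is a small sketch contrasting the unique-counting check with the first-value comparison. The function names and sample values are assumptions made for illustration. It also shows one behavioural difference worth knowing: nunique drops NaN by default, while an elementwise comparison treats NaN as unequal to everything.

```python
import numpy as np
import pandas as pd


def nunique_check(s: pd.Series) -> bool:
    # Counts distinct values (NaN excluded by default) and compares with 1.
    return s.nunique() == 1


def first_value_check(s: pd.Series) -> bool:
    # Compares every element with the first one in a single pass.
    a = s.to_numpy()
    return bool((a[0] == a).all())


constant = pd.Series([5, 5, 5, 5])
mixed = pd.Series([5, 5, 6, 5])
with_nan = pd.Series([1.0, 1.0, np.nan])

print(nunique_check(constant), first_value_check(constant))
print(nunique_check(mixed), first_value_check(mixed))
# The two checks disagree when NaN is present.
print(nunique_check(with_nan), first_value_check(with_nan))
```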

The difference in performance becomes more obvious when we’re applying this to an entire dataframe (see below).


For an entire dataframe

To perform the same check on every column of a dataframe, we can extend the above by setting axis=0 in all:

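A minimal sketch of the column-wise version, assuming the helper name unique_cols and the sample frame below (both illustrative, not taken from the original post):

```python
import pandas as pd


def unique_cols(df: pd.DataFrame):
    """Return a boolean array with one entry per column.

    An entry is True when every value in that column equals the
    column's first value. Assumes `df` has at least one row.
    """
    a = df.to_numpy()  # use df.values on pandas < 0.24
    # a[0] is the first row; broadcasting compares it with every row,
    # and all(axis=0) reduces down each column.
    return (a[0] == a).all(axis=0)


# Hypothetical data for illustration.
df = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 1]})
result = unique_cols(df)
print(dict(zip(df.columns, result)))
```

Note that to_numpy on a frame with mixed column types produces an object array, which makes the comparison slower than on purely numerical data.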

For the shared example, we’d get:


Here’s a benchmark of the above methods compared with some other approaches, such as using nunique (for a pd.Series):

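The original benchmark code is not reproduced here. As a rough stand-in, the sketch below times two of the approaches with the standard library's timeit; the function names, the input sizes and the repeat counts are all assumptions, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def first_value_check(s: pd.Series) -> bool:
    a = s.to_numpy()
    return bool((a[0] == a).all())


def nunique_check(s: pd.Series) -> bool:
    return s.nunique() == 1


def time_checks(n: int, number: int = 10, repeat: int = 5) -> dict:
    """Return the best per-call time in seconds for each check on n values."""
    # Worst case for both checks: a constant column, so nothing differs.
    s = pd.Series(np.zeros(n, dtype=np.int64))
    timings = {}
    for func in (first_value_check, nunique_check):
        runs = timeit.repeat(lambda: func(s), number=number, repeat=repeat)
        timings[func.__name__] = min(runs) / number
    return timings


for n in (1_000, 100_000):
    print(n, time_checks(n))
```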



And below are the timings for a pd.DataFrame. Let’s also compare with a numba approach, which is especially useful here since we can short-circuit as soon as we see a value that differs from the first one in a given column (note: the numba approach will only work with numerical data):

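A sketch of what such a short-circuiting loop could look like, assuming the function name unique_cols_nb and a 2D numerical array with at least one row. The try/except fallback is an addition made here so the example still runs (slowly, as plain Python) when numba is not installed; it is not part of the approach itself.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python without numba
    def njit(func):
        return func


@njit
def unique_cols_nb(a):
    """For each column of a 2D numerical array, report whether all
    values equal the first one, stopping at the first mismatch."""
    n_rows, n_cols = a.shape
    out = np.ones(n_cols, dtype=np.bool_)
    for j in range(n_cols):
        first = a[0, j]
        for i in range(1, n_rows):
            if a[i, j] != first:
                out[j] = False
                break  # short-circuit: no need to scan the rest
    return out


# Hypothetical data for illustration.
arr = np.array([[1, 2], [1, 3], [1, 2]])
print(unique_cols_nb(arr))
```

With a dataframe, the array would be obtained through df.to_numpy() before calling the function.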

If we compare the three methods:



User contributions licensed under: CC BY-SA