Skip to content
Advertisement

get all pairs of columns where only one value in third column

I am try to get all pairs of columns where a third column has only one value, such that (given pair a,b and third column c):

a b c
1 1 1
1 1 2
1 2 2
2 1 1

only returns 1,2 and 2,1 (the results from the last two rows). The first two rows are excluded since they describe the same pair but with different values in the third column.

To be extra clear: df.groupby(['a', 'b'])['c'].nunique() above would yield:

a b 
1 1 2
1 2 1
2 1 1

and df.groupby(['a', 'b'])['c'].nunique() == 1 yields:

a b 
1 1 False
1 2 True
2 1 True

I want the a,b values for each pair that is true in that condition. I was thinking:

df[df.groupby(['a', 'b'])['c'].nunique() == 1]['a', 'b']

but this is clearly not right as a condition on the non-grouped df.

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3013         # Do we have a (boolean) 1d indexer?
   3014         if com.is_bool_indexer(key):
-> 3015             return self._getitem_bool_array(key)
   3016 
   3017         # We are left with two options: a single key, and a collection of keys,

/usr/local/lib/python3.9/site-packages/pandas/core/frame.py in _getitem_bool_array(self, key)
   3066         # check_bool_indexer will throw exception if Series key cannot
   3067         # be reindexed to match DataFrame rows
-> 3068         key = check_bool_indexer(self.index, key)
   3069         indexer = key.nonzero()[0]
   3070         return self._take_with_is_copy(indexer, axis=0)

...

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

It also doesn’t work like this:

group = df.groupby(['a', 'b'])
group[group['c'].nunique() == 1]['a', 'b']

Advertisement

Answer

Once you get the GroupBy result, you can use a boolean condition to filter the index to get a list of tuples:

uniq = df.groupby(['a', 'b'])['c'].nunique()

# as a pandas MultiIndex
uniq.index[uniq == 1]
# MultiIndex([(1, 2),
#             (2, 1)],
#            names=['a', 'b'])

# as a python list of tuples
uniq.index[uniq == 1].tolist()
# [(1, 2), (2, 1)]

This works because the GroupBy result uses the grouper columns (“a” and “b” in this case) as its index (since there are two columns, you get a MultiIndex with two levels in the result)

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement