Skip to content
Advertisement

Filter based on pairs within a group – if value represent at BOTH ends

 Group Code
  1     2
  1     2
  1     4
  1     1
  2     4 
  2     1
  2     2 
  2     3
  2     1
  2     1
  2     3

Within each group there are pairs. In Group 1 for example; the pairs are (2,2),(2,4),(4,1)

I want to filter these pairs based on code numbers 2 AND 4 being present at BOTH ends(not either)

In group 1 for example, only (2,4) will be kept while (2,2) and (4,1) will be filtered out. Excepted Output:

 Group Code
  1     2
  1     4
 

Advertisement

Answer

You can approach by making 2 boolean masks for current row and next row code in 2 or 4. Then, form the required combination condition of present at BOTH ends(not either), as follows:

If you require both 2 AND 4 be present in the pair, then we can make another boolean mask for asserting that these 2 consecutive codes are not equal:

m_curr = df['Code'].isin([2,4])                                # current row code is 2 or 4
m_next = df.groupby("Group")['Code'].shift(-1).isin([2,4])     # next row code in same group is 2 or 4
m_diff = df['Code'].ne(df.groupby("Group")['Code'].shift(-1))  # different row codes in current and next row in the same group

# current row AND next row code in 2 or 4 AND (2 and 4 both present, i.e. the 2 values in pair are diffrent)
mask = m_curr & m_next & m_diff

df[mask | mask.shift()]

Result:

   Group  Code
1      1     2
2      1     4

Another way to do it, may be a little bit simpler for this special case:

m1 = df['Code'].eq(2) & df.groupby("Group")['Code'].shift(-1).eq(4)   # current row is 2 and next row in same group is 4
m2 = df['Code'].eq(4) & df.groupby("Group")['Code'].shift(-1).eq(2)   # current row is 4 and next row in same group is 2

mask = m1 | m2     # either pair of (2, 4) or (4, 2)

df[mask | mask.shift()]

Result:

Same result:

   Group  Code
1      1     2
2      1     4
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement