
Compare the unique values of one column against the corresponding values of another column using a list [closed]

Hi all, I have a data frame:

import pandas as pd

# (Unique_ID, Annotation) pairs
my_employees = [
    ('apple', 'credit'),
    ('apple', 'slates'),
    ('apple', 'intro_credit'),
    ('apple', 'end_credit'),
    ('apple', 'logo'),
    ('apple', 'SMPTE'),
    ('apple', 'visible_logo'),
    ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
    ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo'),
]

df1 = pd.DataFrame(my_employees, columns=['Unique_ID', 'Annotation'])
print(df1)

For every unique value in the Unique_ID column, I need to check the Annotation column and confirm that the ID has all of the values in this list:

my_annotations = ['credit', 'intro_credit', 'end_credit', 'SMPTE', 'logo', 'visible_logo', 'slates']

Can anybody shed some light on this, please?

Answer

You could do:

# collect the set of distinct Annotations for each Unique_ID
annotations = df1.groupby('Unique_ID')['Annotation'].apply(set).reset_index()

# build a mask: True if the ID's set contains every annotation in my_annotations
mask = annotations['Annotation'].apply(frozenset(my_annotations).issubset)

# keep the IDs that are missing at least one annotation
# (the positional axis argument, drop('Annotation', 1), no longer works in pandas 2.0+)
res = annotations[~mask].drop(columns='Annotation')
print(res)

Output

  Unique_ID
1     mango
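
If you also want to know which annotations each ID is missing, and not only which IDs fail the check, a set difference per group gives that directly. The following is a minimal sketch, not part of the original answer. The names `required`, `missing` and `incomplete` are made up for illustration, and the required list is assumed to use 'slates', matching the spelling in the data.

```python
import pandas as pd

rows = [
    ('apple', 'credit'), ('apple', 'slates'), ('apple', 'intro_credit'),
    ('apple', 'end_credit'), ('apple', 'logo'), ('apple', 'SMPTE'),
    ('apple', 'visible_logo'),
    ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
    ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo'),
]
df1 = pd.DataFrame(rows, columns=['Unique_ID', 'Annotation'])

# Assumption: 'slates' is spelled as it appears in the data
required = frozenset(['credit', 'intro_credit', 'end_credit', 'SMPTE',
                      'logo', 'visible_logo', 'slates'])

# Map each Unique_ID to the required annotations it never has
missing = {
    uid: sorted(required - set(group))
    for uid, group in df1.groupby('Unique_ID')['Annotation']
}

# IDs with at least one missing annotation
incomplete = [uid for uid, lacking in missing.items() if lacking]

print(missing)
print(incomplete)
```

With the sample data, `missing` should show an empty list for apple and 'visible_logo' for mango.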

A simpler alternative is to do:

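# True for each Unique_ID whose annotations include everything in my_annotations
res = df1.groupby('Unique_ID')['Annotation'].apply(frozenset(my_annotations).issubset).reset_index()

# keep only the IDs that are missing at least one annotation
output = res[~res['Annotation']].drop(columns='Annotation')
print(output)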

Output

  Unique_ID
1     mango
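
Another way to run the same check, without building a Python set per group, is to count how many distinct required annotations each ID has and compare that count with the size of the required list. This is only a sketch of an alternative and not the approach from the answer above. The names `required`, `present` and `incomplete` are illustrative, and 'slates' is assumed to be the intended spelling.

```python
import pandas as pd

rows = [
    ('apple', 'credit'), ('apple', 'slates'), ('apple', 'intro_credit'),
    ('apple', 'end_credit'), ('apple', 'logo'), ('apple', 'SMPTE'),
    ('apple', 'visible_logo'),
    ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
    ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo'),
]
df1 = pd.DataFrame(rows, columns=['Unique_ID', 'Annotation'])

# Assumption: 'slates' is spelled as it appears in the data
required = ['credit', 'intro_credit', 'end_credit', 'SMPTE',
            'logo', 'visible_logo', 'slates']

# Count the distinct required annotations seen for each Unique_ID
present = (
    df1[df1['Annotation'].isin(required)]
    .groupby('Unique_ID')['Annotation']
    .nunique()
)

# IDs with no required annotation at all drop out of the groupby, so restore them with 0
present = present.reindex(df1['Unique_ID'].unique(), fill_value=0)

# An ID is incomplete when it has fewer distinct matches than required entries
incomplete = present[present < len(set(required))].index.tolist()
print(incomplete)
```

The `reindex` step matters when an ID has none of the required annotations: without it that ID would be dropped silently instead of being reported as incomplete.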

