Hi all, I have a DataFrame:
import pandas as pd

my_employees = [('apple', 'credit'), ('apple', 'slates'), ('apple', 'intro_credit'),
                ('apple', 'end_credit'), ('apple', 'logo'), ('apple', 'SMPTE'),
                ('apple', 'visible_logo'),
                ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
                ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo')]
df1 = pd.DataFrame(my_employees, columns=['Unique_ID', 'Annotation'])
print(df1)
For each unique value in the Unique_ID column, I need to check whether its rows in the Annotation column contain every value in this list:
my_annotations = ['credit', 'intro_credit', 'end_credit', 'SMPTE', 'logo', 'visible_logo', 'slates']
Can anybody shed some light on this, please?
Answer
You could do:
# Collect the unique Annotations for each Unique_ID
annotations = df1.groupby('Unique_ID')['Annotation'].apply(set).reset_index()

# Build a mask: True if the ID's set contains every value in my_annotations
mask = annotations['Annotation'].apply(frozenset(my_annotations).issubset)

# Keep only the IDs that are missing at least one annotation
# (drop(columns=...) replaces the positional axis argument, which pandas 2.x rejects)
res = annotations[~mask].drop(columns='Annotation')
print(res)
Output
  Unique_ID
1     mango
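To see why mango is the one flagged, a small sketch (not part of the original answer) can list, for each Unique_ID, the required annotations it never has, using a plain set difference. It rebuilds the sample data so it runs on its own and assumes the target list spells the value as 'slates', matching the data; the names required and missing are illustrative.

```python
import pandas as pd

my_employees = [('apple', 'credit'), ('apple', 'slates'), ('apple', 'intro_credit'),
                ('apple', 'end_credit'), ('apple', 'logo'), ('apple', 'SMPTE'),
                ('apple', 'visible_logo'),
                ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
                ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo')]
df1 = pd.DataFrame(my_employees, columns=['Unique_ID', 'Annotation'])

# Assumption: 'slates' (as spelled in the data) is the intended required value
my_annotations = ['credit', 'intro_credit', 'end_credit', 'SMPTE',
                  'logo', 'visible_logo', 'slates']
required = set(my_annotations)

# For each Unique_ID, the required annotations that never appear in its rows
missing = {uid: sorted(required - set(group))
           for uid, group in df1.groupby('Unique_ID')['Annotation']}
print(missing)  # {'apple': [], 'mango': ['visible_logo']}
```

An empty list means the ID has every required annotation, which is exactly the condition the issubset mask tests.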
A simpler alternative is to do:
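# True for each Unique_ID whose Annotation values include every item in my_annotations
res = df1.groupby('Unique_ID')['Annotation'].apply(frozenset(my_annotations).issubset).reset_index()

# Keep only the IDs where the check failed, then drop the boolean column
output = res[~res['Annotation']].drop(columns='Annotation')
print(output)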
Output
  Unique_ID
1     mango
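If you would rather avoid building Python sets per group, a vectorized variant (a different technique from the answer above, offered only as a sketch) filters the rows with isin and compares nunique per ID against the number of distinct target values. It again assumes 'slates' is the intended spelling; relevant, counts and incomplete are illustrative names.

```python
import pandas as pd

my_employees = [('apple', 'credit'), ('apple', 'slates'), ('apple', 'intro_credit'),
                ('apple', 'end_credit'), ('apple', 'logo'), ('apple', 'SMPTE'),
                ('apple', 'visible_logo'),
                ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
                ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo')]
df1 = pd.DataFrame(my_employees, columns=['Unique_ID', 'Annotation'])

# Assumption: 'slates' (as spelled in the data) is the intended required value
my_annotations = ['credit', 'intro_credit', 'end_credit', 'SMPTE',
                  'logo', 'visible_logo', 'slates']

# Keep only rows whose Annotation is one of the required values
relevant = df1[df1['Annotation'].isin(my_annotations)]

# Count distinct required annotations per ID
counts = relevant.groupby('Unique_ID')['Annotation'].nunique()

# Keep IDs that had no matching rows at all, with a count of 0
counts = counts.reindex(df1['Unique_ID'].unique(), fill_value=0)

# An ID is incomplete when it has fewer distinct values than required
incomplete = counts[counts < len(set(my_annotations))].index.tolist()
print(incomplete)  # ['mango']
```

Because nunique counts distinct values, duplicated annotation rows for an ID do not inflate its count, and the reindex step keeps IDs that match none of the targets.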