Hi all, I have a DataFrame:
import pandas as pd

my_employees = [
    ('apple', 'credit'),
    ('apple', 'slates'),
    ('apple', 'intro_credit'),
    ('apple', 'end_credit'),
    ('apple', 'logo'),
    ('apple', 'SMPTE'),
    ('apple', 'visible_logo'),
    ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
    ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo'),
]

df1 = pd.DataFrame(my_employees, columns=['Unique_ID', 'Annotation'])
print(df1)
For each unique value in the Unique_ID column, I need to check the Annotation column and find out whether that Unique_ID has every one of the values in this list:
my_annotations = ['credit', 'intro_credit', 'end_credit', 'SMPTE', 'logo', 'visible_logo', 'slates']
Can anybody shed some light on this, please?
Answer
You could do:
# collect the set of distinct Annotation values for each Unique_ID
annotations = df1.groupby('Unique_ID')['Annotation'].apply(set).reset_index()

# build a mask: True if the set contains every annotation in my_annotations
mask = annotations['Annotation'].apply(frozenset(my_annotations).issubset)

# keep only the Unique_IDs that are missing at least one annotation
res = annotations[~mask].drop(columns='Annotation')
print(res)
Output
  Unique_ID
1     mango
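To see why only mango is reported, it can help to look at the intermediate values. The following is a minimal, self-contained sketch of the same idea. The names `required`, `per_id`, `has_all` and `incomplete_ids` are illustrative and not part of the original answer, and it assumes that 'slates' (as spelled in the DataFrame) is the intended annotation name.

```python
import pandas as pd

# Same data as in the question.
my_employees = [
    ('apple', 'credit'), ('apple', 'slates'), ('apple', 'intro_credit'),
    ('apple', 'end_credit'), ('apple', 'logo'), ('apple', 'SMPTE'),
    ('apple', 'visible_logo'),
    ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
    ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo'),
]
df1 = pd.DataFrame(my_employees, columns=['Unique_ID', 'Annotation'])

# Assumption: 'slates' matches the spelling used in the DataFrame.
my_annotations = ['credit', 'intro_credit', 'end_credit', 'SMPTE',
                  'logo', 'visible_logo', 'slates']
required = frozenset(my_annotations)

# One set of annotations per Unique_ID (index is Unique_ID).
per_id = df1.groupby('Unique_ID')['Annotation'].apply(set)

# True where the ID's set contains every required annotation.
has_all = per_id.apply(required.issubset)

# IDs that are missing at least one required annotation.
incomplete_ids = has_all[~has_all].index.tolist()
print(incomplete_ids)
```

Here `required.issubset(s)` is True only when every required annotation appears in `s`, so negating the mask leaves the IDs with something missing. In this data, mango has no 'visible_logo' row.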
A simpler alternative is to do:
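# True for each Unique_ID whose annotations include everything in my_annotations
res = df1.groupby('Unique_ID')['Annotation'].apply(frozenset(my_annotations).issubset).reset_index()

# keep only the Unique_IDs that are missing at least one annotation
output = res[~res['Annotation']].drop(columns='Annotation')
print(output)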
Output
  Unique_ID
1     mango
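If it is also useful to know which annotations each Unique_ID lacks, a set difference gives that directly. This is only a sketch that goes slightly beyond the original answer. The names `required`, `per_id` and `missing` are illustrative, and it again assumes 'slates' is the intended spelling.

```python
import pandas as pd

# Same data as in the question.
my_employees = [
    ('apple', 'credit'), ('apple', 'slates'), ('apple', 'intro_credit'),
    ('apple', 'end_credit'), ('apple', 'logo'), ('apple', 'SMPTE'),
    ('apple', 'visible_logo'),
    ('mango', 'credit'), ('mango', 'intro_credit'), ('mango', 'end_credit'),
    ('mango', 'slates'), ('mango', 'SMPTE'), ('mango', 'logo'),
]
df1 = pd.DataFrame(my_employees, columns=['Unique_ID', 'Annotation'])

# Assumption: 'slates' matches the spelling used in the DataFrame.
required = frozenset(['credit', 'intro_credit', 'end_credit', 'SMPTE',
                      'logo', 'visible_logo', 'slates'])

# One set of annotations per Unique_ID.
per_id = df1.groupby('Unique_ID')['Annotation'].apply(set)

# Map each incomplete Unique_ID to the sorted list of annotations it lacks.
missing = {
    uid: sorted(required - found)
    for uid, found in per_id.items()
    if required - found
}
print(missing)  # {'mango': ['visible_logo']}
```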