I would like to create a function to go through each unique value in a column and check if another column contains a value and then create a column that shows the result.
For example: for each unique ID in df, check if the stats is A then create a result column:
df:
ID | Status |
---|---|
1 | A |
1 | B |
2 | B |
2 | C |
new_df:
ID | Status | Result A? |
---|---|---|
1 | A | YES |
2 | B | NO |
I started off with
JavaScript
x
6
1
def function(df):
2
id = list(df['ID'].unique())
3
status = ['A']
4
for i in id:
5
6
If anyone can share some thoughts I would much appreciate.
Advertisement
Answer
I am not sure if I understand the rules correctly. Should I always take the first occurrence of the ID? Then the second row in your expected output is wrong.
You can use numpy.where
JavaScript
1
4
1
df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C'}})
2
new_df = df.drop_duplicates(subset=["ID"]).copy()
3
new_df["Result A?"] = np.where(new_df.Status == "A", "YES", "NO")
4
to get this:
JavaScript
1
4
1
ID Status Result A?
2
0 1 A YES
3
2 2 B NO
4
Edit: Your desired output is ambiguous, two things you can do:
JavaScript
1
2
1
df.groupby("ID")["Status"].apply({"A"}.issubset).replace({True: 'Yes', False: 'No'}).rename("Result A?")
2
gives you:
JavaScript
1
5
1
ID
2
1 Yes
3
2 No
4
Name: Result A?, dtype: object
5
Or:
JavaScript
1
2
1
df["Result A?"] = np.where(df.groupby("ID")["Status"].apply({"A"}.issubset).loc[df.ID], "YES", "NO")
2
which gives you:
JavaScript
1
6
1
ID Status Result A?
2
0 1 A YES
3
1 1 B YES
4
2 2 B NO
5
3 2 C NO
6