Skip to content
Advertisement

Loop through unique values in a column to check another column and create another column – pandas

I would like to create a function to go through each unique value in a column and check if another column contains a value and then create a column that shows the result.

For example: for each unique ID in df, check if the stats is A then create a result column:

df:

ID Status
1 A
1 B
2 B
2 C

new_df:

ID Status Result A?
1 A YES
2 B NO

I started off with

def function(df):
  id = list(df['ID'].unique())
  status = ['A']
  for i in id:

If anyone can share some thoughts I would much appreciate.

Advertisement

Answer

I am not sure if I understand the rules correctly. Should I always take the first occurrence of the ID? Then the second row in your expected output is wrong.

You can use numpy.where

df = pd.DataFrame({'ID': {0: 1, 1: 1, 2: 2, 3: 2}, 'Status': {0: 'A', 1: 'B', 2: 'B', 3: 'C'}})
new_df = df.drop_duplicates(subset=["ID"]).copy()
new_df["Result A?"] = np.where(new_df.Status == "A", "YES", "NO")

to get this:

   ID Status Result A?
0   1      A       YES
2   2      B        NO

Edit: Your desired output is ambiguous, two things you can do:

df.groupby("ID")["Status"].apply({"A"}.issubset).replace({True: 'Yes', False: 'No'}).rename("Result A?")

gives you:

ID
1    Yes
2     No
Name: Result A?, dtype: object

Or:

df["Result A?"] = np.where(df.groupby("ID")["Status"].apply({"A"}.issubset).loc[df.ID], "YES", "NO")

which gives you:

   ID Status Result A?
0   1      A       YES
1   1      B       YES
2   2      B        NO
3   2      C        NO
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement