I have two dataframes in python, one called DayList, with these columns: OrderNr Powder Variant Quantity DueDate, and another one called Planning, with these columns: Order Start End Day Powder Variant Task. Both dataframes will have multiple lines with specific combinations, the column entries for Powder and Variant will be an integer, I want to filter the dataframe DayList into three categories.
The first category is the subset of rows where the exact combination of Powder and Variant also exists somewhere in the dataframe Planning. The second category is the subset of rows where the exact combination of Powder and Variant does not exist in the dataframe Planning, but the entry for Powder is somewhere in the column Powder in the dataframe Planning. The third category is the subset of rows where the entry for Powder is not in the column Powder in the dataframe Planning.
These three categories should not overlap but together consist of all rows of the dataframe DayList. I am quite new to dataframes so I am not sure how to define this. I have been trying with isin() but I seem to get stuck with the right statements for the first category.
Advertisement
Answer
You could write a function to determine the category for a given row of the dataframe, and then use df.apply()
. To avoid having to pick out the right columns within the function, you could apply it only to the reduced dataframe, consisting of just the Powder
and Variant
columns:
import pandas as pd # example dataframes with just the relevant columns, but # the code below also works for dataframes containing additional columns DayList = pd.DataFrame({'Powder': [1, 2, 3, 4, 5, 6], 'Variant': [1, 2, 1, 2, 1, 2]}) Planning = pd.DataFrame({'Powder': [3, 4, 5, 6], 'Variant': [1, 2, 2, 1]}) def determine_category(row): powder, variant = row.values if [powder, variant] in Planning[['Powder', 'Variant']].values.tolist(): return 1 if powder in Planning['Powder'].values: return 2 return 3 DayList['Category'] = DayList[['Powder', 'Variant']].apply( determine_category, axis=1) DayList
Powder Variant Category 0 1 1 3 1 2 2 3 2 3 1 1 3 4 2 1 4 5 1 2 5 6 2 2