Skip to content
Advertisement

What is the best way to filter rows of one dataframe based on column entries of another dataframe

I have two dataframes in python, one called DayList, with these columns: OrderNr Powder Variant Quantity DueDate, and another one called Planning, with these columns: Order Start End Day Powder Variant Task. Both dataframes will have multiple lines with specific combinations, the column entries for Powder and Variant will be an integer, I want to filter the dataframe DayList into three categories.

The first category is the subset of rows where the exact combination of Powder and Variant also exists somewhere in the dataframe Planning. The second category is the subset of rows where the exact combination of Powder and Variant does not exist in the dataframe Planning, but the entry for Powder is somewhere in the column Powder in the dataframe Planning. The third category is the subset of rows where the entry for Powder is not in the column Powder in the dataframe Planning.

These three categories should not overlap but together consist of all rows of the dataframe DayList. I am quite new to dataframes so I am not sure how to define this. I have been trying with isin() but I seem to get stuck with the right statements for the first category.

Advertisement

Answer

You could write a function to determine the category for a given row of the dataframe, and then use df.apply(). To avoid having to pick out the right columns within the function, you could apply it only to the reduced dataframe, consisting of just the Powder and Variant columns:

import pandas as pd

# example dataframes with just the relevant columns, but 
# the code below also works for dataframes containing additional columns
DayList = pd.DataFrame({'Powder': [1, 2, 3, 4, 5, 6],
                        'Variant': [1, 2, 1, 2, 1, 2]})
Planning = pd.DataFrame({'Powder': [3, 4, 5, 6],
                         'Variant': [1, 2, 2, 1]})

def determine_category(row):
    powder, variant = row.values
    if [powder, variant] in Planning[['Powder', 'Variant']].values.tolist():
        return 1
    if powder in Planning['Powder'].values:
        return 2
    return 3

DayList['Category'] = DayList[['Powder', 'Variant']].apply(
                      determine_category, axis=1)
DayList
    Powder  Variant  Category
0   1       1        3
1   2       2        3
2   3       1        1
3   4       2        1
4   5       1        2
5   6       2        2
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement