So I have two dataframes
main_df, about 800 rows
description | category |
---|---|
ABCD | ONE |
XYZ | THREE |
ABC | |
QWE |
keyword_df, it is about 50 rows
keyword | category |
---|---|
AB | FIVE |
What I’m trying to achieve =
main_df
description | category |
---|---|
ABCD | ONE |
XYZ | THREE |
ABC | FIVE |
QWE | 0 |
conditions = [(main_df[‘Description’].str.startswith(‘AB’)) & (main_df[‘category’).isnull()]
values = keyword_df[‘category’].tolist()
main_df[‘category’] = np.select(conditions, values)
I was able to create a list of strings
[“(main_df[‘Description’].str.startswith(‘AB’)) & (main_df[‘category’).isnull()”,
“(main_df[‘Description’].str.startswith(‘CD’)) & (main_df[‘category’).isnull()”, …]
But it needs to be a list of conditions and not a list of strings for it to work. Appreciate any help!
Advertisement
Answer
Since you only have 50 rows in the keyword frame, you could just iterate over those and update the main frame accordingly:
import numpy as np import pandas as pd main_df = pd.DataFrame({'description': ['ABCD', 'XYZ', 'ABC', 'QWE'], 'category': ['ONE', 'THREE', np.nan, np.nan]}) keyword_df = pd.DataFrame({'keyword': ['AB'], 'category': ['FIVE']}) for key in keyword_df.itertuples(index=False): mask = (main_df['description'].str.startswith(key[0]) & main_df['category'].isnull()) main_df.loc[mask, 'category'] = key[1] main_df
description category 0 ABCD ONE 1 XYZ THREE 2 ABC FIVE 3 QWE NaN