I have 2 pandas dataframes in Python which are set up as follows:
Dataframe 1:

    ID  Paragraph
    1   'A B C D E'
    2   'A F G H L'
    3   'B J P Q W'
    4   'G F D S A'

Where Paragraph is a string of multiple words.
Dataframe 2:

    ID  Name    Words
    1   First   ['A', 'F']
    2   Second  ['B', 'Z']
    3   Third   ['P', 'Q']
    4   Fourth  ['H', 'J']

Name is just a string identifying the Words. And Words is a list of strings.
So what I want to do is have an expression that will identify which Paragraphs in Dataframe 1 contain Words from Dataframe 2. And I want to store the Name of the Words in a new column in Dataframe 1. The new column will contain a list of all the Names where a Word from Words occurred in the paragraph. The order does not matter and there must be no duplicates in the list.
For example:
New Dataframe 1:

    ID  Paragraph    Names
    1   'A B C D E'  [First, Second]
    2   'A F G H L'  [First, Fourth]
    3   'B J P Q W'  [Second, Third, Fourth]
    4   'G F D S A'  [First]

I can only come up with a solution that has deeply nested for loops and takes a very long time to execute. Is there a solution with a shorter computation time? My thinking is that maybe loc and/or lambda functions could help.
Any help would be greatly appreciated!
Let me know if there is anything I need to clarify.
English is not my first language, so I can try to explain more if I need to.
Thank you
Here is the code for the dummy dataframes:
    import pandas as pd

    data_1 = {'Paragraph': ['A B C D E', 'A F G H L', 'B J P Q W', 'G F D S A']}
    df_1 = pd.DataFrame(data_1)

    data_2 = {'Name': ['First', 'Second', 'Third', 'Fourth'],
              'Words': [['A', 'F'], ['B', 'Z'], ['P', 'Q'], ['H', 'J']]}
    df_2 = pd.DataFrame(data_2)

Answer
You can split and explode Paragraph. Then map the names for each word of the exploded df_2. Finally, aggregate as set to have unique values:

    s = df_2.explode('Words').set_index('Words')['Name']
    df_1['Names'] = (df_1['Paragraph'].str.split()
                     .explode().map(s).dropna()
                     .groupby(level=0).agg(set)
                     )

output:

       Paragraph                    Names
    0  A B C D E          {Second, First}
    1  A F G H L          {Fourth, First}
    2  B J P Q W  {Third, Second, Fourth}
    3  G F D S A                  {First}
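
The question asks for lists rather than sets, and the snippet above leaves NaN in Names for any paragraph with no matching word. As far as I can tell, the map lookup also expects each word to belong to only one Name. A possible variant, offered only as a sketch, swaps map for a merge so that a word may appear under several names, then converts each set to a sorted list and fills unmatched rows with an empty list. The extra 'X Y' paragraph and the names lookup, words, matches and names are my own additions for illustration and are not part of the original data.

```python
import pandas as pd

# Dummy data from the question, plus one hypothetical paragraph ('X Y')
# that matches nothing, to show the empty-list case.
df_1 = pd.DataFrame({'Paragraph': ['A B C D E', 'A F G H L', 'B J P Q W',
                                   'G F D S A', 'X Y']})
df_2 = pd.DataFrame({'Name': ['First', 'Second', 'Third', 'Fourth'],
                     'Words': [['A', 'F'], ['B', 'Z'], ['P', 'Q'], ['H', 'J']]})

# One row per (word, name) pair.
lookup = df_2.explode('Words').rename(columns={'Words': 'Word'})[['Word', 'Name']]

# One row per (paragraph row label, word) pair.
words = (df_1['Paragraph'].str.split()
         .explode()
         .rename('Word')
         .rename_axis('row')
         .reset_index())

# The inner merge keeps only words listed in df_2; a word listed under
# several names simply produces several rows.
matches = words.merge(lookup, on='Word')

# Unique names per paragraph, as a sorted list instead of a set.
names = matches.groupby('row')['Name'].agg(set).map(sorted)

# Paragraphs with no matching word get an empty list instead of NaN.
df_1['Names'] = [names.get(i, []) for i in df_1.index]

print(df_1)
```

Sorting is only there to make the result deterministic, since the question says order does not matter; drop the .map(sorted) step and use list(...) instead if you do not need it.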