I concatenate two columns when the strings in column ‘words’ are not already present in column ‘sentence’. My code is:
    import string

    import numpy as np

    # Convert once, up front, so every row passed to check() holds strings
    df['sentence'] = df['sentence'].astype(str)
    df['words'] = df['words'].astype(str)

    def check(row):
        # Return the words of 'sentence' that do not appear in 'words'
        left = row['sentence'].split()
        right = [v.lower() for v in row['words'].split()]
        unmatched = []
        for y in left:
            # Strip punctuation before comparing
            word = "".join([x for x in y.lower() if x not in string.punctuation])
            if word not in right:
                unmatched.append(y)
        return " ".join(unmatched)

    mask = df['type'] == 'Is there a match with the Words?'
    df.loc[mask, 'new'] = df.loc[mask, :].apply(check, axis=1)
    # c is a boolean condition that is not defined in this snippet
    df['new'] = np.where(c, df['new'] + ' ' + df['words'], df['new'])
    df['new'] = df['new'].str.replace('nan', '')
    df['new'] = df['new'].fillna("")
Additionally, I want to skip the concatenation for a row if the value in column ‘words’ contains any of the strings in this list:
    restricted = ['not present', 'for sale', 'unknown']
Here is an example of what the result should look like:
          words             sentence                    output
    0   unknown  This is a new paint       This is a new paint
    1     brown   This is a new item  This is a new item brown
    2  for sale   The product is new        The product is new
Output given by the code above is:
    output
    This is a new paint unknown
    This is a new item brown
    The product is new for sale
Answer
Given:
          words             sentence
    0   unknown  This is a new paint
    1     brown   This is a new item
    2  for sale   The product is new
Doing:
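    restricted = ['not present', 'for sale', 'unknown']
    # True for rows whose 'words' value contains any restricted string
    mask = df.words.str.contains('|'.join(restricted))
    # Keep the sentence as-is where mask is True; otherwise append the words
    df['output'] = df.sentence.where(mask, df.sentence + ' ' + df.words)
    print(df)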
Output:
          words             sentence                    output
    0   unknown  This is a new paint       This is a new paint
    1     brown   This is a new item  This is a new item brown
    2  for sale   The product is new        The product is new
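For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the three-row frame from the question and adds two things that are my own assumptions, not part of the answer above: `re.escape` around each restricted entry (so characters like `.` or `(` would be matched literally rather than as regex syntax), and an `already_present` check that covers the question's other condition, that the words are only appended when the sentence does not already contain them. The names `pattern`, `is_restricted`, `already_present` and `skip` are illustrative, and the substring check assumes both columns hold strings.

```python
import re

import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "words": ["unknown", "brown", "for sale"],
    "sentence": ["This is a new paint", "This is a new item", "The product is new"],
})

restricted = ["not present", "for sale", "unknown"]

# Assumption: escape each entry so it is matched as literal text, not as a regex
pattern = "|".join(re.escape(r) for r in restricted)
is_restricted = df["words"].str.contains(pattern, case=False, na=False)

# Assumption: a case-insensitive substring test is enough to decide whether
# the sentence already contains the words (both columns must hold strings)
already_present = pd.Series(
    [w.lower() in s.lower() for w, s in zip(df["words"], df["sentence"])],
    index=df.index,
)

# Keep the sentence unchanged where either condition holds; otherwise append the words
skip = is_restricted | already_present
df["output"] = df["sentence"].where(skip, df["sentence"] + " " + df["words"])

print(df)
```

If the restricted values should only match the whole cell rather than any substring, `df["words"].isin(restricted)` could be used in place of `str.contains`.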