I concatenate two columns for rows where strings from column ‘words’ are not present in column ‘sentence’. My code is:
    import string

    import numpy as np
    import pandas as pd

    # Convert both columns to strings once, before the row-wise check runs
    df['sentence'] = df['sentence'].astype(str)
    df['words'] = df['words'].astype(str)

    def check(row):
        left = row['sentence'].split()
        right = row['words'].split()
        unmatched = []
        for y in left:
            # Lowercase and strip punctuation before comparing
            word = "".join([x for x in y.lower() if x not in string.punctuation])
            if word not in [v.lower() for v in right]:
                unmatched.append(y)
        return " ".join(unmatched)

    mask = df['type'] == 'Is there a match with the Words?'
    df.loc[mask, 'new'] = df.loc[mask, :].apply(check, axis=1)
    # `c` is a boolean condition defined elsewhere (not shown)
    df['new'] = np.where(c, df['new'] + ' ' + df['words'], df['new'])
    df['new'] = df['new'].str.replace('nan', '')
    df['new'] = df['new'].fillna("")
Additionally, I want to skip the concatenation for any row whose ‘words’ value contains one of the strings in this list:
    restricted = ['not present', 'for sale', 'unknown']
Here is an example of what the result should look like:
          words             sentence                    output
    0   unknown  This is a new paint       This is a new paint
    1     brown   This is a new item  This is a new item brown
    2  for sale   The product is new        The product is new
Output given by the code above is:
    output
    This is a new paint unknown
    This is a new item brown
    The product is new for sale
Answer
Given:
          words             sentence
    0   unknown  This is a new paint
    1     brown   This is a new item
    2  for sale   The product is new
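To reproduce this, the input frame can be rebuilt with a short sketch (the values are copied from the table above; nothing else is assumed):

```python
import pandas as pd

# Sample input copied from the table above
df = pd.DataFrame({
    "words": ["unknown", "brown", "for sale"],
    "sentence": ["This is a new paint", "This is a new item", "The product is new"],
})
print(df)
```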
Doing:
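    import pandas as pd

    restricted = ['not present', 'for sale', 'unknown']
    # True where 'words' contains any restricted phrase ('|' is regex alternation)
    mask = df.words.str.contains('|'.join(restricted))
    # Keep the sentence unchanged where mask is True; otherwise append the words
    df['output'] = df.sentence.where(mask, df.sentence + ' ' + df.words)
    print(df)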
Output:
          words             sentence                    output
    0   unknown  This is a new paint       This is a new paint
    1     brown   This is a new item  This is a new item brown
    2  for sale   The product is new        The product is new
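A somewhat more defensive variant of the same `where` approach, offered as a sketch rather than a drop-in replacement. `re.escape` stops regex metacharacters in the restricted phrases from being interpreted, `na=False` treats missing `words` as unrestricted, and an extra check skips rows whose words already appear in the sentence. That extra check is an assumption based on the question's original goal. The helper name `append_words` is hypothetical.

```python
import re

import pandas as pd


def append_words(df, restricted):
    """Hypothetical helper: append 'words' to 'sentence' unless 'words'
    contains a restricted phrase or every word already occurs in the sentence."""
    words = df["words"].fillna("").astype(str)
    sentence = df["sentence"].fillna("").astype(str)

    # Substring match against any restricted phrase, regex characters escaped
    pattern = "|".join(re.escape(p) for p in restricted)
    is_restricted = words.str.contains(pattern, case=False, na=False)

    # Assumed extra condition: all of the row's words already appear as
    # whitespace-separated tokens of the sentence (punctuation is not stripped)
    already_present = pd.Series(
        [set(w.lower().split()) <= set(s.lower().split())
         for w, s in zip(words, sentence)],
        index=df.index,
    )

    keep_as_is = is_restricted | already_present
    return sentence.where(keep_as_is, sentence + " " + words)


df = pd.DataFrame({
    "words": ["unknown", "brown", "for sale", None],
    "sentence": ["This is a new paint", "This is a new item",
                 "The product is new", "An old item"],
})
restricted = ["not present", "for sale", "unknown"]
df["output"] = append_words(df, restricted)
print(df["output"].tolist())
```

An empty `words` value yields an empty token set, which is a subset of any sentence, so such rows are left unchanged. If a row should only count as restricted when `words` equals a phrase exactly, `words.isin(restricted)` could replace the substring match.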