I have two python dataframes: one of them have a column has a row ‘AC-2’ another data frame includes columns which has a string row ‘AC-20,AC8,AC-15’
str_match = "({})".format("|".join(df1['column_first_dataframe'])) df2.merge(df2,how='left',left_on=df1['column_first_dataframe'].str.extract(str_match)[0], right_on="column_second_dataframe")
First dataset:
Column1 Column2 AC-2 2 AC-20 1 AC-15 3 AC-1 2 AC-5 5
second dataset:
Column1 AC-2,AC-5,AC-30 AC-20,AC-30,AC11
I found:
Column1 Column2 AC-2 AC-20,AC-30,AC11 AC-2 AC-2,AC-5,AC-30 AC-20 AC-20,AC-30,AC11 AC-15 null AC-1 null AC-5 AC-2,AC-5,AC-30
above there is a matching between AC-2 for dataset1 and AC-20 in string for dataset 2 but they are different thing in my dataset. my desired output:
Column1 Column2 AC-2 AC-2,AC-5,AC-30 AC-20 AC-20,AC-30,AC11 AC-15 null AC-1 null AC-5 AC-2,AC-5,AC-30
I tried to merge data frames with searching and matching BUT my code merged AC-2 to AC-20. I don’t want to do that. Are there any searching and matching way to merge my data frames exactly what I want. I don’t want to merge AC-20 and AC-2. It should merge AC-2 and AC-2 with exactly same thing.
Thank you for your contributions!
Advertisement
Answer
A simple way is to split
and explode
the Column2 of df2 to get one row per full word and perform a simple left merge
.
(df1 .merge(df2.assign(group=df2['Column1'].str.split(',')) .rename(columns={'Column1': 'Column3'}) .explode('group'), left_on='Column1', right_on='group', how='left' ) .drop(columns='group') )
output:
Column1 Column2 Column3 0 AC-2 2 AC-2,AC-5,AC-30 1 AC-20 1 AC-20,AC-30,AC11 2 AC-15 3 NaN 3 AC-1 2 NaN 4 AC-5 5 AC-2,AC-5,AC-30
alternative
NB. I am producing above a slightly different output to keep all columns. If really you want your output, simply drop “Column2” in df1 and rename “Column1” into “Column2” in df2
(df1 .drop(columns='Column2') .merge(df2.rename(columns={'Column1': 'Column2'}) .assign(Column1=df2['Column1'].str.split(',')) .explode('Column1'), on='Column1', how='left' ) )
output:
Column1 Column2 0 AC-2 AC-2,AC-5,AC-30 1 AC-20 AC-20,AC-30,AC11 2 AC-15 NaN 3 AC-1 NaN 4 AC-5 AC-2,AC-5,AC-30