Skip to content
Advertisement

Combine two tables based on certain criteria using python

I have two tables (table1, table2) of the following:

table1:

ID Filename
12345 12345.txt
12346 12346.txt
12347 12347.txt
12348 12348.txt
12349 12349.txt
12350 12350.txt

table2: contains the path where table 1 files are present

Path
/table/text3/12349.txt
/table/text1/12345.txt
/table/text2/12346.txt
/table/text1/12350.txt
/table/text3/12347.txt
/table/text1/12348.txt

How do I combine these two files, such that, the path and filenames are matched. What I tried so far?

pd.concat([table1, table2])

I also tried pd.merge but it does not match with the filename. How do I solve this?

The desired output:

ID Filename Path
12345 12345.txt /table/text1/12345.txt
12346 12346.txt /table/text2/12346.txt
12347 12347.txt /table/text3/12347.txt
12348 12348.txt /table/text1/12348.txt
12349 12349.txt /table/text3/12349.txt
12350 12350.txt /table/text1/12350.txt

Advertisement

Answer

You can make a temporary column Filename on df2 by .assign() and use the resulting copy of df2 with newly added column to merge with df1 using .merge() on the common column Filename, as follows:

df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename')

Result:

print(df_merge)



      ID   Filename                    Path
0  12345  12345.txt  /table/text1/12345.txt
1  12346  12346.txt  /table/text2/12346.txt
2  12347  12347.txt  /table/text3/12347.txt
3  12348  12348.txt  /table/text1/12348.txt
4  12349  12349.txt  /table/text3/12349.txt
5  12350  12350.txt  /table/text1/12350.txt

Edit

If you still want to show the entries of df1 even when there is no matching entries in df2, you can use left merge with parameter how='left', as follows:

df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename', how='left')

Result:

If the path /table/text3/12347.txt is missing:

      ID   Filename                    Path
0  12345  12345.txt  /table/text1/12345.txt
1  12346  12346.txt  /table/text2/12346.txt
2  12347  12347.txt                     NaN
3  12348  12348.txt  /table/text1/12348.txt
4  12349  12349.txt  /table/text3/12349.txt
5  12350  12350.txt  /table/text1/12350.txt
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement