I have two tables (table1, table2) of the following:
table1:
| ID | Filename | 
|---|---|
| 12345 | 12345.txt | 
| 12346 | 12346.txt | 
| 12347 | 12347.txt | 
| 12348 | 12348.txt | 
| 12349 | 12349.txt | 
| 12350 | 12350.txt | 
table2: contains the path where table 1 files are present
| Path | 
|---|
| /table/text3/12349.txt | 
| /table/text1/12345.txt | 
| /table/text2/12346.txt | 
| /table/text1/12350.txt | 
| /table/text3/12347.txt | 
| /table/text1/12348.txt | 
How do I combine these two files, such that, the path and filenames are matched. What I tried so far?
pd.concat([table1, table2])
I also tried  pd.merge but it does not match with the filename. How do I solve this?
The desired output:
| ID | Filename | Path | 
|---|---|---|
| 12345 | 12345.txt | /table/text1/12345.txt | 
| 12346 | 12346.txt | /table/text2/12346.txt | 
| 12347 | 12347.txt | /table/text3/12347.txt | 
| 12348 | 12348.txt | /table/text1/12348.txt | 
| 12349 | 12349.txt | /table/text3/12349.txt | 
| 12350 | 12350.txt | /table/text1/12350.txt | 
Advertisement
Answer
You can make a temporary column Filename on df2 by .assign() and use the resulting copy of df2 with newly added column to merge with df1 using .merge() on the common column Filename, as follows:
df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename')
Result:
print(df_merge)
      ID   Filename                    Path
0  12345  12345.txt  /table/text1/12345.txt
1  12346  12346.txt  /table/text2/12346.txt
2  12347  12347.txt  /table/text3/12347.txt
3  12348  12348.txt  /table/text1/12348.txt
4  12349  12349.txt  /table/text3/12349.txt
5  12350  12350.txt  /table/text1/12350.txt
Edit
If you still want to show the entries of df1 even when there is no matching entries in df2, you can use left merge with parameter how='left', as follows:
df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename', how='left')
Result:
If the path /table/text3/12347.txt is missing:
ID Filename Path 0 12345 12345.txt /table/text1/12345.txt 1 12346 12346.txt /table/text2/12346.txt 2 12347 12347.txt NaN 3 12348 12348.txt /table/text1/12348.txt 4 12349 12349.txt /table/text3/12349.txt 5 12350 12350.txt /table/text1/12350.txt