I have two tables (table1, table2) of the following:
table1:
ID | Filename |
---|---|
12345 | 12345.txt |
12346 | 12346.txt |
12347 | 12347.txt |
12348 | 12348.txt |
12349 | 12349.txt |
12350 | 12350.txt |
table2: contains the path where table 1 files are present
Path |
---|
/table/text3/12349.txt |
/table/text1/12345.txt |
/table/text2/12346.txt |
/table/text1/12350.txt |
/table/text3/12347.txt |
/table/text1/12348.txt |
How do I combine these two files, such that, the path and filenames are matched. What I tried so far?
pd.concat([table1, table2])
I also tried pd.merge
but it does not match with the filename. How do I solve this?
The desired output:
ID | Filename | Path |
---|---|---|
12345 | 12345.txt | /table/text1/12345.txt |
12346 | 12346.txt | /table/text2/12346.txt |
12347 | 12347.txt | /table/text3/12347.txt |
12348 | 12348.txt | /table/text1/12348.txt |
12349 | 12349.txt | /table/text3/12349.txt |
12350 | 12350.txt | /table/text1/12350.txt |
Advertisement
Answer
You can make a temporary column Filename
on df2
by .assign()
and use the resulting copy of df2
with newly added column to merge with df1
using .merge()
on the common column Filename
, as follows:
df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename')
Result:
print(df_merge) ID Filename Path 0 12345 12345.txt /table/text1/12345.txt 1 12346 12346.txt /table/text2/12346.txt 2 12347 12347.txt /table/text3/12347.txt 3 12348 12348.txt /table/text1/12348.txt 4 12349 12349.txt /table/text3/12349.txt 5 12350 12350.txt /table/text1/12350.txt
Edit
If you still want to show the entries of df1
even when there is no matching entries in df2
, you can use left merge with parameter how='left'
, as follows:
df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename', how='left')
Result:
If the path /table/text3/12347.txt is missing:
ID Filename Path 0 12345 12345.txt /table/text1/12345.txt 1 12346 12346.txt /table/text2/12346.txt 2 12347 12347.txt NaN 3 12348 12348.txt /table/text1/12348.txt 4 12349 12349.txt /table/text3/12349.txt 5 12350 12350.txt /table/text1/12350.txt