I’m quite new to pandas dataframes, and I’m experiencing some troubles joining two tables.
The first df has just 3 columns:
DF1
:
JavaScript
x
9
1
item_id position document_id
2
336 1 10
3
337 2 10
4
338 3 10
5
1001 1 11
6
1002 2 11
7
1003 3 11
8
38 10 146
9
And the second has exactly same two columns (and plenty of others):
DF2
:
JavaScript
1
5
1
item_id document_id col1 col2 col3
2
337 10
3
1002 11
4
1003 11
5
What I need is to perform an operation which, in SQL, would look as follows:
JavaScript
1
5
1
DF1 join DF2 on
2
DF1.document_id = DF2.document_id
3
and
4
DF1.item_id = DF2.item_id
5
And, as a result, I want to see DF2, complemented with column ‘position’:
JavaScript
1
2
1
item_id document_id position col1 col2 col3
2
What is a good way to do this using pandas?
Advertisement
Answer
I think you need merge
with default inner
join, but is necessary no duplicated combinations of values in both columns:
JavaScript
1
13
13
1
print (df2)
2
item_id document_id col1 col2 col3
3
0 337 10 s 4 7
4
1 1002 11 d 5 8
5
2 1003 11 f 7 0
6
7
df = pd.merge(df1, df2, on=['document_id','item_id'])
8
print (df)
9
item_id position document_id col1 col2 col3
10
0 337 2 10 s 4 7
11
1 1002 2 11 d 5 8
12
2 1003 3 11 f 7 0
13
But if necessary position
column in position 3
:
JavaScript
1
9
1
df = pd.merge(df2, df1, on=['document_id','item_id'])
2
cols = df.columns.tolist()
3
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
4
print (df)
5
item_id document_id position col1 col2 col3
6
0 337 10 2 s 4 7
7
1 1002 11 2 d 5 8
8
2 1003 11 3 f 7 0
9