I have 2 dataframes and I wish to grab IDs matching with DF2 into df1 merged as separate columns. There are multiple columns to be added due to df2 having many different country names for a specific ID.
df1 looks like below:
ID URL A example.com/1 B example.com/2 C example.com/3 D example.com/4
df2 is like this:
ID country URL A usa example.com/usa-A B uk example.com/uk-B C canada example.com/canada-C A uk example.com/uk-A C usa example.com/usa-C
What I am expecting df1 to look like:
ID URL uk USA Canada A example.com/1 example.com/uk-A example.com/usa-A NaN B example.com/2 example.com/uk-B NaN NaN C example.com/3 NaN example.com/usa-C example.com/canada-C D example.com/4 NaN NaN NaN
I wish to bring if DF1 ID A is found in DF2 ID against a country then bring the country URL up next to df1 ID in a specific country column.
The way I am trying to achieve this is using a for loop with a map call below:
final = pd.DataFrame() for a in countries_list: b = df2.loc[(df2["country"] == a)] df1["country"] = df1['id'].map(df2.set_index('id')['url']) final = pd.concat([final, df1])
It runs for a certain amount of countries and then start throwing InvalidIndexError: Reindexing only valid with uniquely valued Index objects
which I tried to overcome using a reset_index() function on both df1 and df2 but still after a certain amount of iterations, it throws me the same error.
Can someone suggest a more efficient way to do this or any way i could run it over all possible iterations?
Thanks,
Advertisement
Answer
Try as follows:
res = df.merge(df2.pivot(index='ID',columns='country',values='URL'), left_on='ID', right_index=True, how='left') print(res) ID URL canada uk usa 0 A example.com/1 NaN example.com/uk-A example.com/usa-A 1 B example.com/2 NaN example.com/uk-B NaN 2 C example.com/3 example.com/canada-C NaN example.com/usa-C 3 D example.com/4 NaN NaN NaN
Explanation
- First, use
df.pivot
ondf2
. We get:
print(df2.pivot(index='ID',columns='country',values='URL')) country canada uk usa ID A NaN example.com/uk-A example.com/usa-A B NaN example.com/uk-B NaN C example.com/canada-C NaN example.com/usa-C
- Next, use
df.merge
to mergedf
and pivoteddf2
, joining the two onID
, and passingleft
to thehow
parameter to “use only keys from left frame”. (Leavehow='left'
out, if you are not interested in the row forID
with onlyNaN
values.)
If you’re set on a particular column order, use e.g.:
res = res.loc[:,['ID','URL','uk','usa','canada']]