I have a dataframe like as below
data_df = pd.DataFrame({'p_id': ['abc@gmail.com','abc@gmail.com','abc@gmail.com','ace@gmail.com','ace@gmail.com','pqr@gmail.com','pqr@gmail.com'], 'company': ['a','b','c','d','e','f','g'], 'dept_access':['a1','a1','a1','a1','a2','a2','a2']}) key_df = pd.DataFrame({'p_id': ['abc@gmail.com','xyz@gmail.com','pqr@gmail.com'], 'company': ['a','c','b'], 'location':['UK','USA','KOREA']})
I would like to do the below
a) Attach the location
column from key_df
to data_df
based on two fields – p_id
and company
So, I tried the below
loc = key_df.drop_duplicates(['p_id','company']).set_index(['p_id','company'])['location'] data_df['location'] = data_df[['p_id','company']].map(loc)
But this resulted in error like below
KeyError: “None of [Index([‘p_id’,’company’], dtype=’object’)] are in the [columns]”
How can I map based on multiple index columns? I don’t wish to use merge
Advertisement
Answer
Merge can be used for a lot, so let’s first try to use it:
data_df.merge(key_df, on=['p_id', 'company'], how="left")
p_id company dept_access location 0 abc@gmail.com a a1 UK 1 abc@gmail.com b a1 NaN 2 abc@gmail.com c a1 NaN 3 ace@gmail.com d a1 NaN 4 ace@gmail.com e a2 NaN 5 pqr@gmail.com f a2 NaN 6 pqr@gmail.com g a2 NaN
You can also do this by mapping the index like this:
idx = ['p_id', 'company'] data_df.assign(location=data_df.set_index(idx).index.map(key_df.set_index(idx)['location']))
p_id company dept_access location 0 abc@gmail.com a a1 UK 1 abc@gmail.com b a1 NaN 2 abc@gmail.com c a1 NaN 3 ace@gmail.com d a1 NaN 4 ace@gmail.com e a2 NaN 5 pqr@gmail.com f a2 NaN 6 pqr@gmail.com g a2 NaN