Pandas create column of dictionaries based on condition from another column

Say I have a pandas DataFrame called df_1 like this:

id    date_created  rank_1            rank_2          rank_3          rank_dict
2223  3/3/21 3:26   www.google.com    www.yahoo.com   www.ford.com    {www.google.com:3, www.yahoo.com:2, www.ford.com:1}
1112  2/25/21 1:35  www.autoblog.com  www.motor1.com  www.webull.com  {www.autoblog.com:3, www.motor1.com:2, www.webull.com:1}

and another df called df_2 that looks like this:

id    date_created  rank_1         rank_2            rank_3
2223  4/9/21 5:15   www.yahoo.com  www.whatever.com  www.google.com
1112  8/20/21 2:30  www.gm.com     www.motor1.com    www.webull.com

I want to create a new column called new_rank_dict in df_2, using the URLs in rank_1, rank_2 and rank_3 of df_2 as Keys, with Values assigned by the following criteria:

  • Look up the row in df_1 with the matching id. If the rank_1 URL is a Key in that row's rank_dict, assign it the same Value it has in that dictionary. Otherwise, assign it a Value of 0.
  • Do the same for rank_2 and rank_3, so that each row in df_2 ends up with one dictionary.

For example, row 1 in df_1 and row 1 in df_2 share the same id (2223). rank_1 in df_2 (www.yahoo.com) is a Key in rank_dict in df_1 with a Value of 2, so www.yahoo.com gets the Value 2. rank_2 (www.whatever.com) is not a Key in rank_dict in df_1, so it gets a Value of 0. rank_3 (www.google.com) is a Key in rank_dict in df_1 with a Value of 3, so it gets the Value 3 in the new dictionary. In the end, row 1 in df_2 will have the new_rank_dict: {www.yahoo.com:2, www.whatever.com:0, www.google.com:3}

So the ideal result df_2 should look like this:

id    date_created  rank_1         rank_2            rank_3          rank_dict
2223  4/9/21 5:15   www.yahoo.com  www.whatever.com  www.google.com  {www.yahoo.com:2, www.whatever.com:0, www.google.com:3}
1112  8/20/21 2:30  www.gm.com     www.motor1.com    www.webull.com  {www.gm.com:0, www.motor1.com:2, www.webull.com:1}

I have been struggling to find an efficient, Pythonic way to do this. I have searched the web, but most tutorials show how to create a single dictionary from a pandas column, rather than a column of dictionaries, which is what I need here.

Any suggestions would be greatly appreciated!

Answer

Code

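# df1 and df2 are the question's df_1 and df_2.
# For each row of df2, fetch the rank_dict of the df1 row with the same id.
dcts = df2['id'].map(df1.set_index('id')['rank_dict'])
# Select the URL columns: rank_1, rank_2, rank_3.
cols = df2.filter(like='rank')

# Build one dictionary per row; URLs missing from the df1 dictionary get 0.
df2['rank_dict'] = [{u: dct.get(u, 0) for u in urls}
                    for urls, dct in zip(cols.values, dcts)]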

Explanations

First, map the rank_dict column from df1 onto df2 using the shared id, so that each row of df2 is paired with the dictionary from the matching row of df1:

>>> dcts

0         {'www.google.com': 3, 'www.yahoo.com': 2, 'www.ford.com': 1}
1    {'www.autoblog.com': 3, 'www.motor1.com': 2, 'www.webull.com': 1}
Name: id, dtype: object
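
One caveat on this step: if df2 contains an id that does not appear in df1, map returns NaN for that row, and the later call to .get() fails because NaN is a float, not a dict. A minimal sketch of one way to guard against that, assuming such rows should simply get 0 for every URL (the id 9999 and the trimmed-down frames are made up for illustration):

```python
import pandas as pd

# Trimmed-down frames; id 9999 is a hypothetical id that exists only in df2.
df1 = pd.DataFrame({
    "id": [2223],
    "rank_dict": [{"www.google.com": 3, "www.yahoo.com": 2}],
})
df2 = pd.DataFrame({
    "id": [2223, 9999],
    "rank_1": ["www.yahoo.com", "www.gm.com"],
    "rank_2": ["www.whatever.com", "www.google.com"],
})

dcts = df2["id"].map(df1.set_index("id")["rank_dict"])

# Rows without a match come back as NaN; replace them with an empty dict
# so that .get(u, 0) falls through to the default for every URL.
dcts = [d if isinstance(d, dict) else {} for d in dcts]

cols = df2.filter(like="rank")
df2["rank_dict"] = [{u: d.get(u, 0) for u in urls}
                    for urls, d in zip(cols.values, dcts)]
```

Whether an unmatched id should produce all zeros or be flagged as an error depends on your data, so treat the empty-dict fallback as an assumption rather than a requirement.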

Next, select the rank-like columns (rank_1, rank_2, rank_3) from df2:

>>> cols
          rank_1            rank_2          rank_3
0  www.yahoo.com  www.whatever.com  www.google.com
1     www.gm.com    www.motor1.com  www.webull.com
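
Note that filter(like='rank') keeps every column whose name contains "rank". That is fine on a fresh df2, but if df2 already has a rank_dict column (for example after running the code a second time), that column is selected too. A small sketch of a stricter selection using the regex parameter of filter, assuming the URL columns are always named rank_ followed by digits (the pre-existing rank_dict column here is hypothetical):

```python
import pandas as pd

df2 = pd.DataFrame({
    "id": [2223],
    "rank_1": ["www.yahoo.com"],
    "rank_2": ["www.whatever.com"],
    "rank_3": ["www.google.com"],
    "rank_dict": [{"www.yahoo.com": 2}],  # hypothetical pre-existing column
})

# Substring match: also picks up rank_dict.
loose = df2.filter(like="rank")

# Anchored pattern: only rank_<digits> columns.
strict = df2.filter(regex=r"^rank_\d+$")

print(list(loose.columns))
print(list(strict.columns))
```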

Finally, zip cols.values with dcts and iterate over the pairs in a list comprehension. For each row, a dict comprehension maps every URL to its value in the matching df1 dictionary, or to 0 when the URL is not a key there. The result:

>>> df2

     id  date_created         rank_1            rank_2          rank_3                                                         rank_dict
0  2223   4/9/21 5:15  www.yahoo.com  www.whatever.com  www.google.com  {'www.yahoo.com': 2, 'www.whatever.com': 0, 'www.google.com': 3}
1  1112  8/20/21 2:30     www.gm.com    www.motor1.com  www.webull.com       {'www.gm.com': 0, 'www.motor1.com': 2, 'www.webull.com': 1}
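
If you want to try this end to end, here is a self-contained sketch that rebuilds both frames from the sample data in the question and applies the same three steps. The frame construction is my own reconstruction of the tables in the question, so adjust the dtypes (for example, parsing date_created) to match your real data:

```python
import pandas as pd

# Sample data copied from the question (df1/df2 are the question's df_1/df_2).
df1 = pd.DataFrame({
    "id": [2223, 1112],
    "date_created": ["3/3/21 3:26", "2/25/21 1:35"],
    "rank_1": ["www.google.com", "www.autoblog.com"],
    "rank_2": ["www.yahoo.com", "www.motor1.com"],
    "rank_3": ["www.ford.com", "www.webull.com"],
    "rank_dict": [
        {"www.google.com": 3, "www.yahoo.com": 2, "www.ford.com": 1},
        {"www.autoblog.com": 3, "www.motor1.com": 2, "www.webull.com": 1},
    ],
})
df2 = pd.DataFrame({
    "id": [2223, 1112],
    "date_created": ["4/9/21 5:15", "8/20/21 2:30"],
    "rank_1": ["www.yahoo.com", "www.gm.com"],
    "rank_2": ["www.whatever.com", "www.motor1.com"],
    "rank_3": ["www.google.com", "www.webull.com"],
})

# Step 1: pair each df2 row with the df1 dictionary that has the same id.
dcts = df2["id"].map(df1.set_index("id")["rank_dict"])
# Step 2: select the URL columns.
cols = df2.filter(like="rank")
# Step 3: one dictionary per row, defaulting to 0 for unseen URLs.
df2["rank_dict"] = [{u: dct.get(u, 0) for u in urls}
                    for urls, dct in zip(cols.values, dcts)]

print(df2["rank_dict"].tolist())
```

The list comprehension still loops in Python, which is hard to avoid when the cell values are dictionaries, but it touches each row only once.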
User contributions licensed under: CC BY-SA