Skip to content
Advertisement

How to replace one column value with many in pandas dataframe

Input dataframe

Date            Geo         Shipment
2020-01-01      USA         1000
2020-01-01      BRA         5865
2020-01-01      CHN         4789
2020-01-02      EU1         6541
2020-01-02      EU2         3258
..

dict =  {"EU1":["ALA", "BEL", "AND", "AUT"] , "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , "EU3": ["EST", "HRV", "FRA", "DEU"]}

How to replace value in column Geo so that there is a one to many mapping and Shipment values are duplicated?

Output df

Date            Geo         Shipment
2020-01-01      ALA         1000
2020-01-01      BEL         1000
2020-01-01      AND         1000
2020-01-01      AUT         1000
..
2020-01-01      AUT         5865
2020-01-01      BEL         5865
2020-01-01      HRV         5865
2020-01-01      BGR         5865
2020-01-01      CZE         5865
..
2020-01-01      EST         4789
2020-01-01      HRV         4789
2020-01-01      FRA         4789
2020-01-01      DEU         4789
..

Advertisement

Answer

Use DataFrame by cosntructor, then outer join by DataFrame.merge and reassign column Geo by DataFrame.pop:

d =  {"EU1":["ALA", "BEL", "AND", "AUT"] , 
      "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , 
      "EU3": ["EST", "HRV", "FRA", "DEU"]}

df1 = pd.DataFrame(((k, x) for k, v in d.items() for x in v), columns=['Geo','New'])


df = df.merge(df1, on='Geo', how='outer')
df['Geo'] = df.pop('New')
df = df.sort_values('Date', ignore_index=True)

print (df)
          Date  Geo  Shipment
0   2020-01-01  ALA      1000
1   2020-01-01  HRV      4789
2   2020-01-01  EST      4789
3   2020-01-01  CZE      5865
4   2020-01-01  HRV      5865
5   2020-01-01  FRA      4789
6   2020-01-01  BEL      5865
7   2020-01-01  AUT      5865
8   2020-01-01  BGR      5865
9   2020-01-01  AUT      1000
10  2020-01-01  AND      1000
11  2020-01-01  BEL      1000
12  2020-01-01  DEU      4789
13  2020-01-02  AND      6541
14  2020-01-02  BEL      6541
15  2020-01-02  ALA      6541
16  2020-01-02  AUT      3258
17  2020-01-02  BEL      3258
18  2020-01-02  BGR      3258
19  2020-01-02  HRV      3258
20  2020-01-02  CZE      3258
21  2020-01-02  AUT      6541

Solution with new data:

d =  {"EU1":["ALA", "BEL", "AND", "AUT"] , 
      "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , 
      "EU3": ["EST", "HRV", "FRA", "DEU"]}

df1 = pd.DataFrame(((k, x) for k, v in d.items() for x in v), columns=['Geo','New'])

df = df.merge(df1, on='Geo', how='left')
df['Geo'] = df.pop('New').fillna(df['Geo'])
print (df)
          Date  Geo  Shipment
0   2020-01-01  USA      1000
1   2020-01-01  BRA      5865
2   2020-01-01  CHN      4789
3   2020-01-02  ALA      6541
4   2020-01-02  BEL      6541
5   2020-01-02  AND      6541
6   2020-01-02  AUT      6541
7   2020-01-02  AUT      3258
8   2020-01-02  BEL      3258
9   2020-01-02  BGR      3258
10  2020-01-02  HRV      3258
11  2020-01-02  CZE      3258
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement