I have two time-series data frames that track the same certain countries throughout the same amount of time, but the variables they track for each observation represent vastly different things. For example, the first data frame is like so:
Tracking variable ‘A’:
Country | 01/01/2020 | 01/02/2020 | 01/03/2020 | … | 04/25/2021 |
---|---|---|---|---|---|
AFG | 0 | 0 | 1 | … | 5000 |
CHN | 0 | 20 | 50 | … | 0 |
USA | 1 | 2 | 4 | … | 10000 |
The second data frame tracks a different variable ‘B’ and looks like the following:
Country | 01/01/2020 | 01/02/2020 | 01/03/2020 | … | 04/25/2021 |
---|---|---|---|---|---|
AFG | 10 | 10 | 0 | … | 0 |
CHN | 1 | 1 | 1 | … | 20 |
USA | 0 | 7 | 10 | … | 543 |
I want to merge these two data frames so that the following is produced ({} to represent internal dictionaries):
Country | 01/01/2020 | 01/02/2020 | 01/03/2020 | … | 04/25/2021 |
---|---|---|---|---|---|
AFG | (A: 0, B: 10) | (A: 0, B: 10) | (A: 1, B: 0) | … | (A: 5000, B: 0) |
CHN | (A: 0, B: 1) | (A: 20, B: 1) | (A: 50, B: 1) | … | (A: 0, B: 20) |
USA | (A: 1, B: 0) | (A: 2, B: 7) | (A: 4, B: 10) | … | (A: 10000, B: 543) |
Is there a command I am missing in Pandas or is there some type of way to do this I am missing? Thank you for whatever help you can give me.
Advertisement
Answer
Code
pd.concat([df1, df2]).groupby('Country').agg(list).applymap( lambda x: dict(zip(['A','B'], [*x]))).reset_index()
Output
Country 01/01/2020 01/02/2020 01/03/2020 04/25/2021 0 AFG {'A': 0, 'B': 10} {'A': 0, 'B': 10} {'A': 1, 'B': 0} {'A': 5000, 'B': 0} 1 CHN {'A': 0, 'B': 1} {'A': 20, 'B': 1} {'A': 50, 'B': 1} {'A': 0, 'B': 20} 2 USA {'A': 1, 'B': 0} {'A': 2, 'B': 7} {'A': 4, 'B': 10} {'A': 10000, 'B': 543}
Explanation
We are first concatenating both dfs using pd.concat.
Then we are grouping the resultant df by country using groupby.
On the grouped df is aggregated using agg to create a list of items based on Country
.
Then we are mapping the list items to create a dict
with A and B
as keys using applymap