I have two data frames where first dataframe has index starting from zero. The second dataframe has repeated indexes starting from zero. I want to to join the two dataframes based on their indexes. First dataframe is like this
Start_Year End_Year 0 1500 1500 1 1500 1501 2 1500 1700 3 1500 1800 4 1500 1800 ... ... ... 3409 2018 2018 3410 2018 2018 3411 2019 2019 3412 2019 2022 3413 2020 2020 3414 rows × 2 columns
The second dataframe is
0 [KingdomofPoland, Georgia] 0 [GrandDuchyofLithuania, Georgia] 1 [NorthernYuanDynasty, Georgia] 2 [SpanishEmpire, ChechenRepublic] 2 [CaptaincyGeneralofChile, ChechenRepublic] ... 3411 [SyrianOpposition, SpanishEmpire] 3412 [UnitedStates, SpanishEmpire] 3412 [UnitedKingdom, SpanishEmpire] 3412 [SaudiArabia, SpanishEmpire] 3413 [Turkey, Russia] Length: 31170, dtype: object
I want to join these two dataframes based on index i.e the new dataframe should look like
Start_Year End_Year new_col 0 1500 1500 [KingdomofPoland, Georgia] 0 1500 1500 [GrandDuchyofLithuania, Georgia] 1 1500 1501 [NorthernYuanDynasty, Georgia] 2 1500 1700 [SpanishEmpire, ChechenRepublic] 2 1500 1700 [CaptaincyGeneralofChile, ChechenRepublic] ...... 3411 2019 2019 [SyrianOpposition, SpanishEmpire] 3412 2019 2022 [UnitedStates, SpanishEmpire] 3412 2019 2022 [UnitedKingdom, SpanishEmpire] 3412 2019 2022 [SaudiArabia, SpanishEmpire] .......
What this essentially is I need to replicate the rows of dataframe 1 based on how many times the same index is repeated in the second dataframe. As we can see, in second dataframe, zero index appears twice, so we replicate the rows of zero index of dataframe 1 twice and then join the dataframes and so on. In the end we can reset the index(that I know about).
I am attaching the links of both the dataframes for the reference. Link for first dataframe https://drive.google.com/file/d/1DqxhnMM8R21Olm9zeRJeDgua_ozoRp8P/view?usp=sharing
Link for second dataframe https://drive.google.com/file/d/1sX5xcTeovVqXtZgSZ5cTC5JRdUvaw7gd/view?usp=sharing
I cant figure out how to proceed with such tasks. Please help me out.
Advertisement
Answer
Pandas join would do just what you request.
Consider an example
df1 = pd.DataFrame({'a': [0, 1], 'b': [1500, 1501]}) df2 = pd.DataFrame({'a': [0, 0, 1, 1], 'c': ['a', 'b', 'c', 'd']}) df1.set_index('a', inplace=True) df2.set_index('a', inplace=True)
We have two dataframes:
a b 0 0 1500 1 1 1501
and
a c 0 0 a 1 0 b 2 1 c 3 1 d
Now
df = df2.join(df1)[['b', 'c']]
Would be the dataframe
b c a 0 1500 a 0 1500 b 1 1501 c 1 1501 d