Concat DataFrame Reindexing only valid with uniquely valued Index objects

Question

I am trying to concat the following dataframes: df1 and: df2 With But I get the follwing output: I have removed additional columns and removed duplicates and NA where there could be a conflict - but I simply do not know what's wrong. Answer pd.concat requires that the indices be unique. To remove rows with duplicate indices, use Note there

Accepted Answer

pd.concat requires that the indices be unique. To remove rows with duplicate indices, usedf = df.loc[~df.index.duplicated(keep='first')]import pandas as pdfrom pandas import Timestampdf1 = pd.DataFrame(    {'price': [0.7286, 0.7286, 0.7286, 0.7286],     'side': [2, 2, 2, 2],     'timestamp': [1451865675631331, 1451865675631400,                  1451865675631861, 1451865675631866]},    index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))df2 = pd.DataFrame(    {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],     'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],     'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],     'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},    index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))df1 = df1.loc[~df1.index.duplicated(keep='first')]#              price  side         timestamp# 2000-01-01  0.7286     2  1451865675631331# 2001-01-01  0.7286     2  1451865675631861# 2002-01-01  0.7286     2  1451865675631866df2 = df2.loc[~df2.index.duplicated(keep='first')]#                bid  bid_size   offer  offer_size# 2000-01-01  0.7284   4000000  0.7285     1000000# 2001-01-01  0.7284   4000000  0.7290     4000000# 2002-01-01  0.7284   5000000  0.7286     4000000# 2003-01-01  0.7285   1000000  0.7286     4000000# 2004-01-01  0.7285   4000000  0.7290     4000000result = pd.concat([df1, df2], axis=0)print(result)               bid  bid_size   offer  offer_size   price  side     timestamp2000-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+152001-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+152002-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+152000-01-01  0.7284   4000000  0.7285     1000000     NaN   NaN           NaN2001-01-01  0.7284   4000000  0.7290     4000000     NaN   NaN           NaN2002-01-01  0.7284   5000000  0.7286     4000000     NaN   NaN           NaN2003-01-01  0.7285   1000000  0.7286     4000000     NaN   NaN           NaN2004-01-01  0.7285   4000000  0.7290     4000000     NaN   NaN           NaNNote there is also pd.join, which can join DataFrames based on their indices,and handle non-unique indices based on the how parameter. Rows with duplicateindex are not removed.In [94]: df1.join(df2)Out[94]:              price  side         timestamp     bid  bid_size   offer  2000-01-01  0.7286     2  1451865675631331  0.7284   4000000  0.7285   2000-01-01  0.7286     2  1451865675631400  0.7284   4000000  0.7285   2001-01-01  0.7286     2  1451865675631861  0.7284   4000000  0.7290   2002-01-01  0.7286     2  1451865675631866  0.7284   5000000  0.7286               offer_size  2000-01-01     1000000  2000-01-01     1000000  2001-01-01     4000000  2002-01-01     4000000  In [95]: df1.join(df2, how='outer')Out[95]:              price  side     timestamp     bid  bid_size   offer  offer_size2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     10000002000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     10000002001-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7290     40000002002-01-01  0.7286     2  1.451866e+15  0.7284   5000000  0.7286     40000002003-01-01     NaN   NaN           NaN  0.7285   1000000  0.7286     40000002004-01-01     NaN   NaN           NaN  0.7285   4000000  0.7290     4000000

Advertisement

Answer