Skip to content
Advertisement

Python Pandas compare two dataframe and keep only data that index appears in both dataframe

I have two dataframe, and would like to keep only row that both matches exactly on index (in this case datetime), and would like to return as two separate dataframe accordingly.

df1:
DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z
2021-10-05 17:04:36,-0.0078125,-0.008544921875,0.9970703125
2021-10-05 17:04:37,0.009521484375,-0.011474609375,0.97900390625
2021-10-05 17:04:38,-0.013427734375,-0.0111083984375,0.9793701171875
2021-10-05 17:04:39,-0.007080078125,0.0028076171875,0.98583984375
2021-10-05 17:04:40,0.012939453125,-0.005615234375,0.996826171875
2021-10-05 17:04:41,-0.0062255859375,0.008056640625,1.0028076171875
2021-10-05 17:04:42,0.0177001953125,-0.001953125,1.0042724609375
2021-10-05 17:04:43,-0.0057373046875,-0.0030517578125,0.99072265625
2021-10-05 17:04:44,0.006103515625,0.0101318359375,1.0035400390625
2021-10-05 17:04:45,0.002197265625,-0.0029296875,0.9827880859375
2021-10-05 17:04:46,-0.001708984375,-0.0045166015625,0.9974365234375
2021-10-05 17:04:47,0.0057373046875,-0.0048828125,0.9930419921875


df2:
DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z
2021-10-05 17:04:37,-0.03488372093023256,-0.018604651162790697,0.046511627906976744
2021-10-05 17:04:38,-0.006730769230769231,0.0,0.0
2021-10-05 17:04:39,0.02,0.0,0.0
2021-10-05 17:04:40,0.0,0.0,0.0
2021-10-05 17:04:41,0.13653846153846153,0.020192307692307693,-0.10480769230769231
2021-10-05 17:04:42,0.08571428571428572,0.0,0.10571428571428573
2021-10-05 17:04:43,-0.12115384615384615,0.0,-0.06346153846153847

Desired output:

df1:
DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z
2021-10-05 17:04:37,0.009521484375,-0.011474609375,0.97900390625
2021-10-05 17:04:38,-0.013427734375,-0.0111083984375,0.9793701171875
2021-10-05 17:04:39,-0.007080078125,0.0028076171875,0.98583984375
2021-10-05 17:04:40,0.012939453125,-0.005615234375,0.996826171875
2021-10-05 17:04:41,-0.0062255859375,0.008056640625,1.0028076171875
2021-10-05 17:04:42,0.0177001953125,-0.001953125,1.0042724609375

df2:
DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z
2021-10-05 17:04:37,-0.03488372093023256,-0.018604651162790697,0.046511627906976744
2021-10-05 17:04:38,-0.006730769230769231,0.0,0.0
2021-10-05 17:04:39,0.02,0.0,0.0
2021-10-05 17:04:40,0.0,0.0,0.0
2021-10-05 17:04:41,0.13653846153846153,0.020192307692307693,-0.10480769230769231
2021-10-05 17:04:42,0.08571428571428572,0.0,0.10571428571428573

Advertisement

Answer

Use align with inner join:

new_df1, new_df2 = df1.align(df2, join='inner')

*Note this will align both index and columns (which works for the provided sample), however, we can also specify the axis to only align the index if needed:

new_df1, new_df2 = df1.align(df2, join='inner', axis=0)

new_df1:

                      ACCEL_X   ACCEL_Y   ACCEL_Z
DATETIME                                         
2021-10-05 17:04:37  0.009521 -0.011475  0.979004
2021-10-05 17:04:38 -0.013428 -0.011108  0.979370
2021-10-05 17:04:39 -0.007080  0.002808  0.985840
2021-10-05 17:04:40  0.012939 -0.005615  0.996826
2021-10-05 17:04:41 -0.006226  0.008057  1.002808
2021-10-05 17:04:42  0.017700 -0.001953  1.004272
2021-10-05 17:04:43 -0.005737 -0.003052  0.990723

new_df2:

                      ACCEL_X   ACCEL_Y   ACCEL_Z
DATETIME                                         
2021-10-05 17:04:37 -0.034884 -0.018605  0.046512
2021-10-05 17:04:38 -0.006731  0.000000  0.000000
2021-10-05 17:04:39  0.020000  0.000000  0.000000
2021-10-05 17:04:40  0.000000  0.000000  0.000000
2021-10-05 17:04:41  0.136538  0.020192 -0.104808
2021-10-05 17:04:42  0.085714  0.000000  0.105714
2021-10-05 17:04:43 -0.121154  0.000000 -0.063462
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement