I have two dataframe, and would like to keep only row that both matches exactly on index (in this case datetime), and would like to return as two separate dataframe accordingly.
df1: DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z 2021-10-05 17:04:36,-0.0078125,-0.008544921875,0.9970703125 2021-10-05 17:04:37,0.009521484375,-0.011474609375,0.97900390625 2021-10-05 17:04:38,-0.013427734375,-0.0111083984375,0.9793701171875 2021-10-05 17:04:39,-0.007080078125,0.0028076171875,0.98583984375 2021-10-05 17:04:40,0.012939453125,-0.005615234375,0.996826171875 2021-10-05 17:04:41,-0.0062255859375,0.008056640625,1.0028076171875 2021-10-05 17:04:42,0.0177001953125,-0.001953125,1.0042724609375 2021-10-05 17:04:43,-0.0057373046875,-0.0030517578125,0.99072265625 2021-10-05 17:04:44,0.006103515625,0.0101318359375,1.0035400390625 2021-10-05 17:04:45,0.002197265625,-0.0029296875,0.9827880859375 2021-10-05 17:04:46,-0.001708984375,-0.0045166015625,0.9974365234375 2021-10-05 17:04:47,0.0057373046875,-0.0048828125,0.9930419921875 df2: DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z 2021-10-05 17:04:37,-0.03488372093023256,-0.018604651162790697,0.046511627906976744 2021-10-05 17:04:38,-0.006730769230769231,0.0,0.0 2021-10-05 17:04:39,0.02,0.0,0.0 2021-10-05 17:04:40,0.0,0.0,0.0 2021-10-05 17:04:41,0.13653846153846153,0.020192307692307693,-0.10480769230769231 2021-10-05 17:04:42,0.08571428571428572,0.0,0.10571428571428573 2021-10-05 17:04:43,-0.12115384615384615,0.0,-0.06346153846153847
Desired output:
df1: DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z 2021-10-05 17:04:37,0.009521484375,-0.011474609375,0.97900390625 2021-10-05 17:04:38,-0.013427734375,-0.0111083984375,0.9793701171875 2021-10-05 17:04:39,-0.007080078125,0.0028076171875,0.98583984375 2021-10-05 17:04:40,0.012939453125,-0.005615234375,0.996826171875 2021-10-05 17:04:41,-0.0062255859375,0.008056640625,1.0028076171875 2021-10-05 17:04:42,0.0177001953125,-0.001953125,1.0042724609375 df2: DATETIME,ACCEL_X,ACCEL_Y,ACCEL_Z 2021-10-05 17:04:37,-0.03488372093023256,-0.018604651162790697,0.046511627906976744 2021-10-05 17:04:38,-0.006730769230769231,0.0,0.0 2021-10-05 17:04:39,0.02,0.0,0.0 2021-10-05 17:04:40,0.0,0.0,0.0 2021-10-05 17:04:41,0.13653846153846153,0.020192307692307693,-0.10480769230769231 2021-10-05 17:04:42,0.08571428571428572,0.0,0.10571428571428573
Advertisement
Answer
Use align
with inner
join:
new_df1, new_df2 = df1.align(df2, join='inner')
*Note this will align both index and columns (which works for the provided sample), however, we can also specify the axis to only align the index if needed:
new_df1, new_df2 = df1.align(df2, join='inner', axis=0)
new_df1
:
ACCEL_X ACCEL_Y ACCEL_Z DATETIME 2021-10-05 17:04:37 0.009521 -0.011475 0.979004 2021-10-05 17:04:38 -0.013428 -0.011108 0.979370 2021-10-05 17:04:39 -0.007080 0.002808 0.985840 2021-10-05 17:04:40 0.012939 -0.005615 0.996826 2021-10-05 17:04:41 -0.006226 0.008057 1.002808 2021-10-05 17:04:42 0.017700 -0.001953 1.004272 2021-10-05 17:04:43 -0.005737 -0.003052 0.990723
new_df2
:
ACCEL_X ACCEL_Y ACCEL_Z DATETIME 2021-10-05 17:04:37 -0.034884 -0.018605 0.046512 2021-10-05 17:04:38 -0.006731 0.000000 0.000000 2021-10-05 17:04:39 0.020000 0.000000 0.000000 2021-10-05 17:04:40 0.000000 0.000000 0.000000 2021-10-05 17:04:41 0.136538 0.020192 -0.104808 2021-10-05 17:04:42 0.085714 0.000000 0.105714 2021-10-05 17:04:43 -0.121154 0.000000 -0.063462