I have two pandas dataframes with the same indexes and lengths. However, they might contain nans at different places. See, example input below:
timestamp,y 2019-08-01 00:00:00, 772.00 2019-08-01 00:15:00, 648.00 2019-08-01 00:30:00, nan
timestamp,y 2019-08-01 00:00:00, 612.00 2019-08-01 00:15:00, nan 2019-08-01 00:30:00, 500.00
I want to do some calculations where I want to remove indexes in both the dataframes even if one of them has a NaN values. So, in those case only the first row is valid.
Is there a way to do this easily in pandas?
Advertisement
Answer
You can merge
to get the common indices, then subset:
keep = df1.merge(df2, on='timestamp').dropna(how='any')['timestamp'] new_df1 = df1[df1['timestamp'].isin(keep)] new_df2 = df2[df2['timestamp'].isin(keep)]
Alternative, you can use python sets to get the intersection:
keep = set(df1.dropna()['timestamp']).intersection(df2.dropna()['timestamp'])
output:
# new_df1 timestamp y 0 2019-08-01 00:00:00 772.0 # new_df2 timestamp y 0 2019-08-01 00:00:00 612.0