Skip to content
Advertisement

Pandas apply same filtering on two dataframes

I have two pandas dataframes with the same indexes and lengths. However, they might contain nans at different places. See, example input below:

timestamp,y
2019-08-01 00:00:00,  772.00
2019-08-01 00:15:00,  648.00
2019-08-01 00:30:00,  nan
timestamp,y
2019-08-01 00:00:00,  612.00
2019-08-01 00:15:00,  nan
2019-08-01 00:30:00,  500.00

I want to do some calculations where I want to remove indexes in both the dataframes even if one of them has a NaN values. So, in those case only the first row is valid.

Is there a way to do this easily in pandas?

Advertisement

Answer

You can merge to get the common indices, then subset:

keep = df1.merge(df2, on='timestamp').dropna(how='any')['timestamp']

new_df1 = df1[df1['timestamp'].isin(keep)]

new_df2 = df2[df2['timestamp'].isin(keep)]

Alternative, you can use python sets to get the intersection:

keep = set(df1.dropna()['timestamp']).intersection(df2.dropna()['timestamp'])

output:

# new_df1
             timestamp      y
0  2019-08-01 00:00:00  772.0

# new_df2
             timestamp      y
0  2019-08-01 00:00:00  612.0
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement