I’m trying to take a query of recent customer transactions and match potential primary phone, cellphone and work phone matches against a particular list of customers I have.
Essentially, I am taking one dataframe column (the list of customers I am trying to see if they had transactions recently) against the overall universe of all recent transactions (dataframe being transaction_data) and remove any row that does not have a match in either the primary phone, cellphone or workphone column,
Here is what I am currently trying to do but it only returns Falses across each column header and does not filter the dataframe by rows as I had hoped,
transaction_data[(transaction_data['phone'].isin(df['phone'])) | (transaction_data['cell'].isin(df['phone'])) | (transaction_data['workphone'].isin(df['phone']))].any()
I’m trying to return a dataframe containing rows of transactional records where there is a match on either primary phone, cellphone or workphone.
Is there a better way to do this perhaps? Or do I need a minor tweak on my code?
Advertisement
Answer
The thing here is that applying the .isin()
method of a Series to another Series will return a boolean Series.
In your example transaction_data['phone']
is a Series, and also df['phone']
. The return of this method will be a boolean Series containing the value True
in a row which the value in transaction_data['phone']
appears in df['phone']
and False
otherwise. This is similar for all applications of isin()
method in your example.
And, good news! This boolean Series is exactly what is needed for slicing the dataframe. Therefor your code just need a small tweak. Just delete the .any()
at the end of the line.
transaction_data[(transaction_data['phone'].isin(df['phone'])) | (transaction_data['cell'].isin(df['phone'])) | (transaction_data['workphone'].isin(df['phone']))]