How to query/filter cells against single values when cells have multiple values?

I have a CSV file in the following format:

Columns one    Column two
Key1           Value1,Value2,value3
Key2           value5

I can easily use a list and .isin to filter the DataFrame as follows:
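
A minimal sketch of that kind of filter, assuming the sample table above has been loaded into a frame named `df` and that `list_keep` (a name taken from the answer below) holds the single value `value5`. Both the frame construction and the list contents are assumptions made for illustration:

```python
import pandas as pd

# Assumed reconstruction of the sample table from the question.
df = pd.DataFrame({
    "Columns one": ["Key1", "Key2"],
    "Column two": ["Value1,Value2,value3", "value5"],
})

# Assumed list of values to keep.
list_keep = ["value5"]

# isin compares each whole cell against the list; any(axis=1) keeps a row
# if at least one of its cells matched.
filtered = df[df.isin(list_keep).any(axis=1)]
print(filtered)  # only the Key2 row remains
```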

This gives me the second row. But if a cell holds multiple values (like the first row in the example table above, with Value1,Value2,value3), then the isin filter no longer works for a single value such as Value1. This makes sense, since the quotes ("") turn the values into a single string, which I missed because spreadsheets hide the quotes.

For example, when I do this:
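
A minimal sketch that reproduces the problem, again assuming the sample table is loaded as `df` and that `list_keep` holds only `Value1` (both are illustrative assumptions):

```python
import pandas as pd

# Assumed reconstruction of the sample table from the question.
df = pd.DataFrame({
    "Columns one": ["Key1", "Key2"],
    "Column two": ["Value1,Value2,value3", "value5"],
})

# Assumed single value to look for.
list_keep = ["Value1"]

# The first row's cell is the whole string "Value1,Value2,value3",
# which is not equal to "Value1", so no cell matches.
filtered = df[df.isin(list_keep).any(axis=1)]
print(filtered.empty)  # True
```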

Then nothing is returned, because the first row holds Value1,Value2,value3 as one single string (that is, the first row is not produced as output, although it is the desired outcome).

IMPORTANT NOTE: I want to query all columns not just one.

So, how can I set this code up so that I can query multiple elements within cells?

Is there a way to do this in pandas?

Answer

You can stack the DataFrame to reshape it, then split and explode the strings and use isin to test whether each string occurs in list_keep, then group by level=0 and reduce with any to create a boolean mask:
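
The steps described above can be sketched as follows. This assumes every cell is a string with values separated by a plain comma, and it reuses the assumed `df` reconstruction and `list_keep` contents from the question; it is one possible reading of the description, not necessarily the exact code originally posted:

```python
import pandas as pd

# Assumed reconstruction of the sample table from the question.
df = pd.DataFrame({
    "Columns one": ["Key1", "Key2"],
    "Column two": ["Value1,Value2,value3", "value5"],
})

# Assumed single value to look for.
list_keep = ["Value1"]

mask = (
    df.stack()             # Series indexed by (row label, column name)
      .str.split(",")      # each cell becomes a list of values
      .explode()           # one entry per value, index labels repeated
      .isin(list_keep)     # True where a single value is wanted
      .groupby(level=0)    # regroup by the original row label
      .any()               # a row matches if any of its values matched
)

result = df[mask]
print(result)  # only the Key1 row remains
```

Because the frame is stacked first, every column is searched, which covers the requirement to query all columns rather than just one.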

Alternative approach with applymap and set operations:
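
A sketch of how this alternative might look, under the same assumptions about `df` and `list_keep`. The helper name `cell_matches` is hypothetical. `DataFrame.applymap` has been deprecated since pandas 2.1 in favor of `DataFrame.map`, so the sketch uses whichever one is available:

```python
import pandas as pd

# Assumed reconstruction of the sample table from the question.
df = pd.DataFrame({
    "Columns one": ["Key1", "Key2"],
    "Column two": ["Value1,Value2,value3", "value5"],
})

# Assumed single value to look for.
list_keep = ["Value2"]
wanted = set(list_keep)


def cell_matches(cell):
    """Hypothetical helper: True if any comma-separated value is wanted."""
    return not wanted.isdisjoint(str(cell).split(","))


# Use DataFrame.map where it exists (pandas 2.1+), otherwise applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Apply the test to every cell, then keep rows with at least one match.
mask = elementwise(cell_matches).any(axis=1)
result = df[mask]
print(result)  # only the Key1 row remains
```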

User contributions licensed under: CC BY-SA