Filter DataFrame based on partial matching string from list




I have a DataFrame with many categories. Here is a list of some of them:

Bank 

(0827) ОСП                                  
(0283) Банк ВТБ (ПАО)                       
(0822) ОСИП_ПЕНСЫ                           
(0260) АО Тинькофф Банк                     
(0755) ПАО Совкомбанк

I want to filter the DataFrame by partial string matching. I don't want to pass the entire value; I want to pass something like ['Совкомбанк', 'Тинькофф']. The expected result is:

(0260) АО Тинькофф Банк                     
(0755) ПАО Совкомбанк

I tried df = df[df[column_name].isin(values)], but it didn't work.

Answer

.isin checks for an exact match against the whole cell value. What you are looking for is .str.contains, which matches substrings:

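import re
match_strs = ['Совкомбанк', 'Тинькофф']
pattern = "|".join(map(re.escape, match_strs))  # no capturing group, so pandas emits no match-group warning
df = df[df[column_name].str.contains(pattern, na=False)]  # na=False treats missing values as non-matches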

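str.contains interprets the pattern as a regular expression by default, so you can pass any custom regex to search for whatever you want. If the search terms should match literally, escape regex metacharacters in them first (for example with re.escape).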
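
For reference, here is a minimal self-contained sketch of the difference between the two approaches, using the sample values from the question. The column name "Bank" and the variable names exact and partial are assumptions made for illustration only.

```python
import re

import pandas as pd

# Sample data copied from the question; the column name "Bank" is assumed.
df = pd.DataFrame({
    "Bank": [
        "(0827) ОСП",
        "(0283) Банк ВТБ (ПАО)",
        "(0822) ОСИП_ПЕНСЫ",
        "(0260) АО Тинькофф Банк",
        "(0755) ПАО Совкомбанк",
    ]
})

match_strs = ["Совкомбанк", "Тинькофф"]

# isin compares each whole cell with the list items, so nothing matches here.
exact = df[df["Bank"].isin(match_strs)]

# Build an alternation pattern; re.escape keeps each term literal.
pattern = "|".join(re.escape(s) for s in match_strs)

# str.contains does a regex search inside each cell; na=False skips missing values.
partial = df[df["Bank"].str.contains(pattern, na=False)]

print(len(exact))
print(partial["Bank"].tolist())
```

With this data, exact should be empty, while partial should keep only the Тинькофф and Совкомбанк rows.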



Source: stackoverflow