Filter DataFrame based on partial matching string from list

I have a dataframe with many categories. Here is a list of some of them:

Bank 

(0827) ОСП                                  
(0283) Банк ВТБ (ПАО)                       
(0822) ОСИП_ПЕНСЫ                           
(0260) АО Тинькофф Банк                     
(0755) ПАО Совкомбанк

I want to filter the dataframe based on partial string matching. I don't want to pass the entire cell value; I want to pass something like ['Совкомбанк', 'Тинькофф']. The expected result is:

(0260) АО Тинькофф Банк                     
(0755) ПАО Совкомбанк

I tried df = df[df[column_name].isin(values)], but it didn't work.

Answer

.isin checks for exact matches against whole cell values. What you are looking for is .str.contains:

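import re
match_strs = ['Совкомбанк', 'Тинькофф']
# Escape each substring and join with "|"; no capture group (avoids the pandas match-group warning), NaN counts as no match
df = df[df[column_name].str.contains("|".join(map(re.escape, match_strs)), na=False)]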

By default, str.contains(...) interprets its pattern as a regular expression, so you can pass a custom pattern to search for whatever you want.
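To make the difference between the two methods concrete, here is a minimal, self-contained sketch using the sample rows from the question. The column name "Bank" is taken from the listing in the question; the variable names `exact`, `pattern`, and `partial` are only illustrative, and `re.escape` and `na=False` are defensive additions that the question does not strictly need.

```python
import re
import pandas as pd

# Sample rows copied from the question; "Bank" is the column shown there.
df = pd.DataFrame({
    "Bank": [
        "(0827) ОСП",
        "(0283) Банк ВТБ (ПАО)",
        "(0822) ОСИП_ПЕНСЫ",
        "(0260) АО Тинькофф Банк",
        "(0755) ПАО Совкомбанк",
    ]
})

match_strs = ["Совкомбанк", "Тинькофф"]

# isin compares whole cell values, so partial strings match nothing here.
exact = df[df["Bank"].isin(match_strs)]

# Escape each substring so regex metacharacters are treated literally,
# then join with "|" so that any one of them counts as a match.
pattern = "|".join(re.escape(s) for s in match_strs)

# na=False treats missing values as non-matches instead of propagating NaN.
partial = df[df["Bank"].str.contains(pattern, na=False)]

print(partial)
```

If the match should ignore letter case, str.contains also accepts case=False.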

User contributions licensed under: CC BY-SA