Rows That Are Included/Contained In a String

Question

I have a pandas DataFrame. There are plenty of Q&As that explain how to select rows that contain a given substring, but I'm curious how to select rows that are substrings of a given string. My data is huge, but suppose we have a column whose entries are single words. For a given sentence

Accepted Answer

apply can be used to apply a function to every row (or column) of a DataFrame. Use it with caution: as soon as you apply a Python function you lose vectorization and performance drops. Still, it is an appropriate tool here.

df['Words'] in s should be written df['Words'].apply(lambda x: x in s), and you end up with:

    print(df[df['Words'].apply(lambda x: x in s)])

        Words
    1    have
    2       a
    3  Pandas
    5   Frame

Here 'a' is kept because it is indeed a substring of s. If you want to keep whole words only, split s and compare full words:

    s = 'You have one Pandas array Frame'.split()

It now gives the expected result:

        Words
    1    have
    3  Pandas
    5   Frame
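For reference, here is a minimal self-contained sketch of both variants. The original frame is not shown in the question, so the sample words below are an assumption chosen to be consistent with the printed output. The whole-word variant uses Series.isin rather than apply; that is a swapped-in alternative, not what the answer itself uses, and it may be worth trying on large data because it avoids calling a Python function per row.

```python
import pandas as pd

# Assumed sample data: the question's actual frame is not shown.
df = pd.DataFrame({"Words": "I have a Pandas Data Frame".split()})
s = "You have one Pandas array Frame"

# Substring test: keeps every entry that occurs anywhere inside s,
# so 'a' is kept because it appears inside 'have', 'Pandas' and 'array'.
substring_rows = df[df["Words"].apply(lambda word: word in s)]

# Whole-word test: keep only entries equal to one of the words in s.
# Series.isin is used here instead of apply (an alternative to the answer's code).
whole_word_rows = df[df["Words"].isin(s.split())]

print(substring_rows)
print(whole_word_rows)
```

Both expressions build a boolean mask and index the frame with it, so the original index labels are preserved in the result.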

Answer