Skip to content
Advertisement

Find a substring in cells across multiple columns in a Pandas dataframe

I have a large DataFrame with 50+ columns which I’m simplifying here below:

students = [('Samurai', 34, '777.0', 'usa--->jp', 'usd--->yen') ,
            ('Jack', 31, '555.5','usa','usd') ,
            ('Mojo', 16,'488.1','n/a','n/a') ,
            ('Jojo', 32,'119.11','uk--->usa','pound--->usd')]

# Create a DataFrame object
df = pd.DataFrame(students, columns=['Name', 'Age', 'Balance', 'Country','Currency'])

enter image description here

I’m trying to find

a) whether there are any instances of ‘—>’ in any of the cells across the DataFrame?

b) if so where? (optional)

So far I’ve tried 2 approaches

boolDf = df.isin(['--->']).any().any()

this only works for strings not substrings

columns = list(df)
for col in columns:    
    df[col].str.find('--->', 0).any()

I get:

AttributeError: Can only use .str accessor with string values!

(I believe this may only work for columns with string types)

Would appreciate any help. Open to other approaches as well.

Advertisement

Answer

You can use .applymap() to test each individual value in a dataframe.

>>> df
      Name  Age Balance    Country      Currency
0  Samurai   34   777.0  usa--->jp    usd--->yen
1     Jack   31   555.5        usa           usd
2     Mojo   16   488.1        n/a           n/a
3     Jojo   32  119.11  uk--->usa  pound--->usd

>>> df.applymap(lambda x: isinstance(x, str) and '--->' in x)
    Name    Age  Balance  Country  Currency
0  False  False    False     True      True
1  False  False    False    False     False
2  False  False    False    False     False
3  False  False    False     True      True

To use the .str accessor you can:

>>> df.select_dtypes(object).apply(lambda col: col.str.contains('--->'))
    Name  Balance  Country  Currency
0  False    False     True      True
1  False    False    False     False
2  False    False    False     False
3  False    False     True      True

The output differs a little – note the Age column is not there.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement