I have a large DataFrame with 50+ columns which I’m simplifying here below:
students = [('Samurai', 34, '777.0', 'usa--->jp', 'usd--->yen') , ('Jack', 31, '555.5','usa','usd') , ('Mojo', 16,'488.1','n/a','n/a') , ('Jojo', 32,'119.11','uk--->usa','pound--->usd')] # Create a DataFrame object df = pd.DataFrame(students, columns=['Name', 'Age', 'Balance', 'Country','Currency'])
I’m trying to find
a) whether there are any instances of ‘—>’ in any of the cells across the DataFrame?
b) if so where? (optional)
So far I’ve tried 2 approaches
boolDf = df.isin(['--->']).any().any()
this only works for strings not substrings
columns = list(df) for col in columns: df[col].str.find('--->', 0).any()
I get:
AttributeError: Can only use .str accessor with string values!
(I believe this may only work for columns with string types)
Would appreciate any help. Open to other approaches as well.
Advertisement
Answer
You can use .applymap()
to test each individual value in a dataframe.
>>> df Name Age Balance Country Currency 0 Samurai 34 777.0 usa--->jp usd--->yen 1 Jack 31 555.5 usa usd 2 Mojo 16 488.1 n/a n/a 3 Jojo 32 119.11 uk--->usa pound--->usd >>> df.applymap(lambda x: isinstance(x, str) and '--->' in x) Name Age Balance Country Currency 0 False False False True True 1 False False False False False 2 False False False False False 3 False False False True True
To use the .str
accessor you can:
>>> df.select_dtypes(object).apply(lambda col: col.str.contains('--->')) Name Balance Country Currency 0 False False True True 1 False False False False 2 False False False False 3 False False True True
The output differs a little – note the Age
column is not there.