I have a large DataFrame with 50+ columns, simplified below:

    import pandas as pd

    students = [('Samurai', 34, '777.0', 'usa--->jp', 'usd--->yen'),
                ('Jack', 31, '555.5', 'usa', 'usd'),
                ('Mojo', 16, '488.1', 'n/a', 'n/a'),
                ('Jojo', 32, '119.11', 'uk--->usa', 'pound--->usd')]

    # Create a DataFrame object
    df = pd.DataFrame(students, columns=['Name', 'Age', 'Balance', 'Country', 'Currency'])

I’m trying to find:
a) whether '--->' occurs in any cell of the DataFrame, and
b) if so, where (optional).
So far I’ve tried two approaches.

    boolDf = df.isin(['--->']).any().any()

This only matches cells whose entire value is '--->', not cells that contain it as a substring.

    columns = list(df)
    for col in columns:
        df[col].str.find('--->', 0).any()

I get:

    AttributeError: Can only use .str accessor with string values!

(I believe this may only work for columns with string types)
Would appreciate any help. Open to other approaches as well.
Answer
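You can use .applymap() to test each individual value in a DataFrame. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which takes the same function argument.)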

    >>> df
          Name  Age Balance    Country      Currency
    0  Samurai   34   777.0  usa--->jp    usd--->yen
    1     Jack   31   555.5        usa           usd
    2     Mojo   16   488.1        n/a           n/a
    3     Jojo   32  119.11  uk--->usa  pound--->usd

    >>> df.applymap(lambda x: isinstance(x, str) and '--->' in x)
        Name    Age  Balance  Country  Currency
    0  False  False    False     True      True
    1  False  False    False    False     False
    2  False  False    False    False     False
    3  False  False    False     True      True

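To answer (a) and (b) directly, the boolean mask above can be reduced to a single flag and then to a list of positions. What follows is a minimal sketch rather than part of the original answer: the names `has_arrow`, `elementwise`, `mask`, `found` and `hits` are illustrative, and the `hasattr` check assumes you may be on pandas 2.1 or later, where `DataFrame.map` replaces `applymap`.

```python
import pandas as pd

students = [('Samurai', 34, '777.0', 'usa--->jp', 'usd--->yen'),
            ('Jack', 31, '555.5', 'usa', 'usd'),
            ('Mojo', 16, '488.1', 'n/a', 'n/a'),
            ('Jojo', 32, '119.11', 'uk--->usa', 'pound--->usd')]
df = pd.DataFrame(students, columns=['Name', 'Age', 'Balance', 'Country', 'Currency'])


def has_arrow(value):
    """True only for string cells that contain the substring '--->'."""
    return isinstance(value, str) and '--->' in value


# Illustrative helper: prefer DataFrame.map (pandas >= 2.1), else applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
mask = elementwise(has_arrow)

# (a) Is there at least one match anywhere in the DataFrame?
found = bool(mask.any().any())

# (b) Where? Stack the mask into a (row, column)-indexed Series and
# keep only the entries that are True.
stacked = mask.stack()
hits = stacked[stacked].index.tolist()

print(found)
print(hits)
```

Each entry of `hits` is a `(row label, column name)` pair, so it can be passed straight to `df.loc[row, col]` to inspect the matching cell.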
To use the .str accessor you can:

    >>> df.select_dtypes(object).apply(lambda col: col.str.contains('--->'))
        Name  Balance  Country  Currency
    0  False    False     True      True
    1  False    False    False     False
    2  False    False    False     False
    3  False    False     True      True

The output differs slightly: the Age column is missing, because select_dtypes(object) keeps only the object-dtype (string) columns.
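The same .str-based idea can also report which columns and rows contain the substring. This is a hedged sketch, not the answerer's code: it swaps `select_dtypes(object)` for `pd.api.types.is_string_dtype` to pick the string columns, passes `regex=False` so '--->' is treated as a literal, and passes `na=False` so missing values count as non-matches. The names `str_cols`, `col_mask`, `cols_with_arrow` and `rows_with_arrow` are illustrative.

```python
import pandas as pd

students = [('Samurai', 34, '777.0', 'usa--->jp', 'usd--->yen'),
            ('Jack', 31, '555.5', 'usa', 'usd'),
            ('Mojo', 16, '488.1', 'n/a', 'n/a'),
            ('Jojo', 32, '119.11', 'uk--->usa', 'pound--->usd')]
df = pd.DataFrame(students, columns=['Name', 'Age', 'Balance', 'Country', 'Currency'])

# Pick the columns that hold strings; the integer 'Age' column is skipped,
# which avoids the ".str accessor" AttributeError.
str_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

# Literal (non-regex) substring test per column; NaN is treated as "no match".
col_mask = df[str_cols].apply(
    lambda col: col.str.contains('--->', regex=False, na=False)
)

# Columns and row labels with at least one matching cell.
cols_with_arrow = col_mask.columns[col_mask.any()].tolist()
rows_with_arrow = col_mask.index[col_mask.any(axis=1)].tolist()

print(cols_with_arrow)
print(rows_with_arrow)
```

On a wide DataFrame, checking `cols_with_arrow` first narrows the search to a few columns before looking at individual rows.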