I have a dataframe and I would like to find the n highest numbers in each column. There are a variety of methods to do this, but all seem to fail as a result of strings also being in the dataframe. I have tried a multitude of ways to get around this but I am always stumped by the presence of strings.
As some cells contain %, a blanket omission of all string-type columns wouldn't work. However, ignoring cells containing A-Z would work.
Example dataframe:
import pandas as pd

test_data = {
    'Animal': ['Otter', 'Turtle', 'Chicken'],
    'Squeak Appeal': [12.8, 1.92, 11.4],
    'Richochet Chance': ['8%', '30%', '16%'],
}
test_df = pd.DataFrame(
    test_data,
    columns=['Animal', 'Squeak Appeal', 'Richochet Chance']
)
i). Attempt using apply:

test_df.apply(
    lambda x: pd.Series(x.str.strip('%').astype(float).nlargest(2).index)
)

AttributeError: ('Can only use .str accessor with string values!', 'occurred at index Squeak Appeal')
ii). a). Attempt using a for-loop:

headers = list(test_df.columns.values)
for header in headers:
    if not ['a-z'] in test_df[header]:
        max_value = (
            test_df[header]
            .str.strip('%')      # remove the ending %
            .astype(float)       # convert to float
            .nlargest(10).index  # nlargest and index
        )

TypeError: unhashable type: 'list'
ii). b). I also tried excluding 'e' as an experiment to get past the if-statement:

#...
    if not 'e' in test_df[header]:
#...

AttributeError: Can only use .str accessor with string values!
iii). I attempted using numpy, as I had seen it utilised elsewhere, but I don't really grasp the idea:

import numpy as np

N = 3
a = np.argsort(-test_df.values, axis=0)[-1:-1-N:-1]
b = pd.DataFrame(test_df.index[a], columns=test_df.columns)
print(b)

TypeError: bad operand type for unary -: 'str'
I could go on but I feel like it would be a waste of text space. Could anyone point me in the right direction?
Example Outcome:
print(richochet_chance_max)
    Animal  Squeak Appeal Richochet Chance
1   Turtle           1.92              30%
2  Chicken          11.40              16%

print(squeak_appeal_max)
    Animal  Squeak Appeal Richochet Chance
1    Otter           12.8               8%
2  Chicken           11.4              16%
Answer
You can convert the string column to float, then convert it back to str after obtaining the n largest values:
# Convert the string column to float
test_df['Richochet Chance'] = test_df['Richochet Chance'].str.strip('%').astype(float)

# Get nlargest as you want
test_df = test_df.nlargest(2, columns=['Squeak Appeal', 'Richochet Chance'])

# Convert the string column back to string
test_df['Richochet Chance'] = test_df['Richochet Chance'].map(lambda x: f'{x:.0f}%')
Output for nlargest = 2:

    Animal  Squeak Appeal Richochet Chance
0    Otter           12.8               8%
2  Chicken           11.4              16%
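
Note that passing a list of columns to nlargest ranks the rows by the first column and only uses the later ones to break ties, so the snippet above yields a single result ordered by 'Squeak Appeal'. If you want a separate top-n per column, as in the example outcome in the question, something along these lines may work. This is only a sketch: to_number and top_n_per_column are hypothetical helper names, and it assumes the only non-numeric decoration in a cell is a trailing '%'.

```python
import pandas as pd

test_df = pd.DataFrame({
    'Animal': ['Otter', 'Turtle', 'Chicken'],
    'Squeak Appeal': [12.8, 1.92, 11.4],
    'Richochet Chance': ['8%', '30%', '16%'],
})


def to_number(column):
    """Strip a trailing '%' from string cells, then convert to numbers.

    Cells that are still not numeric (e.g. 'Otter') become NaN.
    """
    cleaned = column.map(lambda v: v.rstrip('%') if isinstance(v, str) else v)
    return pd.to_numeric(cleaned, errors='coerce')


def top_n_per_column(df, n):
    """Hypothetical helper: map each numeric-like column to its top-n rows."""
    result = {}
    for name in df.columns:
        numeric = to_number(df[name])
        if numeric.isna().all():
            continue  # nothing numeric here, e.g. the 'Animal' column
        # Use the index of the n largest values to pull the original rows
        result[name] = df.loc[numeric.nlargest(n).index]
    return result


tops = top_n_per_column(test_df, 2)
print(tops['Richochet Chance'])
print(tops['Squeak Appeal'])
```

Because the rows are looked up in the original dataframe, the '%' strings are left untouched and there is no need to convert anything back afterwards. Columns with no numeric cells at all are simply skipped.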