I’m new to pandas and working with tabular data in a programming environment. I have sorted a dataframe by a specific column but the answer that panda spits out is not exactly correct.
Here is the code I have used:
league_dataframe.sort_values('overall_league_position')
The result that the sort method yields values in column ‘overall league position’ are not sorted in ascending or order which is the default for the method.
What am I doing wrong? Thanks for your patience!
Advertisement
Answer
For whatever reason, you seem to be working with a column of strings, and sort_values
is returning you a lexsorted result.
Here’s an example.
df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']}) df Col 0 1 1 2 2 3 3 10 4 20 5 19 df.sort_values('Col') Col 0 1 3 10 5 19 1 2 4 20 2 3
The remedy is to convert it to numeric, either using .astype
or pd.to_numeric
.
df.Col = df.Col.astype(float)
Or,
df.Col = pd.to_numeric(df.Col, errors='coerce')
df.sort_values('Col') Col 0 1 1 2 2 3 3 10 5 19 4 20
The only difference b/w astype
and pd.to_numeric
is that the latter is more robust at handling non-numeric strings (they’re coerced to NaN
), and will attempt to preserve integers if a coercion to float is not necessary (as is seen in this case).