Using: Python in Google Colab
Thanks in Advance:
I have run this code on other data I scraped from FBref, so I am unsure why the error is happening now. The only difference is the way I scraped the data.
The first time I scraped it:
url_link = 'https://fbref.com/en/comps/Big5/gca/players/Big-5-European-Leagues-Stats'
The second time I scraped it:
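import pandas as pd
import requests

url = 'https://fbref.com/en/comps/22/stats/Major-League-Soccer-Stats'
# Strip the HTML comment markers so that tables inside comments are parsed too
html_content = requests.get(url).text.replace('<!--', '').replace('-->', '')
df = pd.read_html(html_content)  # returns a list of DataFrames, one per table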
After pulling the data into my dataframe, I convert the columns from object to float so I can do a calculation:
dfstandard['90s'] = dfstandard['90s'].astype(float)
dfstandard['Gls'] = dfstandard['Gls'].astype(float)
I look and it shows they are both floats:
10 90s 743 non-null float64
11 Gls 743 non-null float64
But when I run the code that has worked previously:
dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s']
I get the error message "TypeError: '<' not supported between instances of 'str' and 'int'"
I am fairly new to scraping, I’m stuck and don’t know what to do next.
The full error message is below:
<ipython-input-152-e0ab76715b7d> in <module>()
      1 #turn data into p 90
----> 2 dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s']
      3 dfstandard['Ast'] = dfstandard['Ast'] / dfstandard['90s']
      4 dfstandard['G-PK'] = dfstandard['G-PK'] / dfstandard['90s']
      5 dfstandard['PK'] = dfstandard['PK'] / dfstandard['90s']

8 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in _outer_indexer(self, left, right)
    261
    262     def _outer_indexer(self, left, right):
--> 263         return libjoin.outer_join_indexer(left, right)
    264
    265     _typ = "index"

pandas/_libs/join.pyx in pandas._libs.join.outer_join_indexer()

TypeError: '<' not supported between instances of 'str' and 'int'
Answer
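There are two Gls columns in your dataframe. With a duplicated label, dfstandard['Gls'] selects both columns and returns a DataFrame rather than a single Series. So when you do dfstandard['Gls'] = dfstandard['Gls'] / dfstandard['90s'], pandas most likely tries to align the column labels of that DataFrame (strings) with the index of the '90s' Series (integers), and comparing a str with an int fails. Thus the error.
Once each column name is unique, try stripping whitespace from the column names too before converting:
df = df.rename(columns=lambda x: x.strip())
df['90s'] = pd.to_numeric(df['90s'], errors='coerce')
df['Gls'] = pd.to_numeric(df['Gls'], errors='coerce')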
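The duplicate-column diagnosis can be checked without scraping anything. The following is a minimal sketch, not the asker's actual data: the toy dataframe, its values, and the helper name make_unique are all assumptions made for illustration. It shows how to list repeated column labels, why selecting one of them gives a DataFrame, and one possible way to rename the repeats so the division works on a single column.

```python
import pandas as pd

# Toy stand-in for the scraped table (values are made up for illustration).
# Two columns share the label "Gls", as described in the answer.
dfstandard = pd.DataFrame(
    [["10.0", "5", "0.50"], ["20.0", "4", "0.20"]],
    columns=["90s", "Gls", "Gls"],
)

# 1. Diagnose: list every column label that occurs more than once.
dupes = dfstandard.columns[dfstandard.columns.duplicated()].tolist()
print(dupes)  # ['Gls']

# With a duplicated label, selecting it returns a DataFrame, not a Series.
print(type(dfstandard["Gls"]).__name__)  # DataFrame


def make_unique(names):
    """Hypothetical helper: append _1, _2, ... to repeated labels."""
    seen = {}
    unique = []
    for name in names:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return unique


# 2. Fix: strip whitespace, then make every column name unique.
dfstandard.columns = make_unique([c.strip() for c in dfstandard.columns])

# 3. Convert to numbers and divide; each selection is now a single Series.
for col in ["90s", "Gls"]:
    dfstandard[col] = pd.to_numeric(dfstandard[col], errors="coerce")
gls_per90 = dfstandard["Gls"] / dfstandard["90s"]
print(gls_per90.tolist())
```

Which of the repeated columns should keep the plain name depends on the table. Here the first occurrence keeps it, which is an arbitrary choice for the sketch.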