
Pandas’ read_html not reading html tables

I am trying to see if I can use, and only use, Pandas’ read_html function to scrape HTML tables from the following website: https://www.baseball-reference.com/teams/ATL/2021.shtml

I can meet my needs with Selenium/BeautifulSoup, but I want to see whether I can scrape this site’s tables with pd.read_html alone.

Currently, pd.read_html returns the first two tables, but is not able to access tables past the second table.

Here is an example of a table ‘id’ that I am trying to access: ‘the40man’

My code targeting that table returns ‘ValueError: No tables found’.

A second attempt returns the first two tables, {‘id’: [‘team_batting’, ‘team_pitching’]}, but nothing more.

I am asking this question out of curiosity in case I’m missing something on my end. If not, this issue is likely due to pd.read_html’s limitations.

Thank you in advance for any input/pd.read_html tips!


Answer

The *-reference.com sites, such as baseball-reference.com, place some of those tables inside HTML comments, so an HTML parser never sees them as table elements. To pull those tables out, first extract the comments, then iterate through them to find the table you want.

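As a rough illustration of the comment-extraction step, here is a minimal sketch that uses only Python's standard-library html.parser rather than any particular scraping library. The names CommentCollector, commented_tables and SAMPLE_PAGE are made up for this example, the sample markup is a stand-in for the real page, and the id match is a plain substring check that assumes double-quoted attributes.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the raw text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is the text between <!-- and -->, delimiters excluded.
        self.comments.append(data)


def commented_tables(html, table_id=None):
    """Return comment bodies that contain a <table>, optionally with a given id."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    fragments = [c for c in collector.comments if "<table" in c]
    if table_id is not None:
        # Assumption: the attribute is written as id="..." with double quotes.
        needle = f'id="{table_id}"'
        fragments = [c for c in fragments if needle in c]
    return fragments


# Hypothetical stand-in for the downloaded page:
# one visible table and one table hidden inside a comment.
SAMPLE_PAGE = """
<html><body>
<table id="team_batting"><tr><th>Name</th></tr><tr><td>A</td></tr></table>
<!--
<table id="the40man"><tr><th>Name</th></tr><tr><td>B</td></tr></table>
-->
<!-- an ordinary comment with no table -->
</body></html>
"""

fragments = commented_tables(SAMPLE_PAGE, table_id="the40man")
print(len(fragments))
```

Each returned fragment is ordinary table markup, so it can be wrapped in io.StringIO and handed to pd.read_html, assuming pandas and one of its HTML parser backends are installed. Wrapping the string is a precaution, because recent pandas versions deprecate passing literal HTML strings directly.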

User contributions licensed under: CC BY-SA