Web scraping with Python (Beautiful Soup): multiple pages and subpages

I create my soup with:

JavaScript

I’m trying to create a dataframe by scraping the site https://myanimelist.net. As a first step, I would like to get each anime's title, number of episodes (eps), and type.

Secondly, from the detail page of each anime (for example https://myanimelist.net/anime/2928/hack__GU_Returner), I would like to gather the scores that users assigned, which are contained in, for example:

JavaScript

and

JavaScript

Can you help me gather all of this information?

If my request is not clear, please tell me.

Answer

This can be done directly with pandas using the read_html() function:

JavaScript

This returns a list of ALL tables found at a given URL. In your case, you only need the second table. This would give you a dataframe starting:

JavaScript
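
As a rough sketch of the read_html() idea described above: the helper name second_table and the sample HTML are made up for illustration, and the sample stands in for the real page so the snippet runs without network access. It assumes pandas has an HTML parser available (lxml, or bs4 with html5lib).

```python
from io import StringIO

import pandas as pd


def second_table(source):
    """Return the second table that pandas finds in `source`.

    `source` may be a URL, a file path, or a file-like object holding HTML.
    (Hypothetical helper, not part of pandas.)
    """
    tables = pd.read_html(source)  # list of DataFrames, one per <table>
    return tables[1]               # index 1 is the second table


# Tiny stand-in page (invented for illustration, not real site data).
SAMPLE_HTML = """
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table>
  <tr><th>Title</th><th>Eps</th><th>Type</th></tr>
  <tr><td>Example A</td><td>12</td><td>TV</td></tr>
  <tr><td>Example B</td><td>1</td><td>Movie</td></tr>
</table>
"""

df = second_table(StringIO(SAMPLE_HTML))
print(df)
```

For a live page you would pass the URL string instead of the StringIO object. Some sites may reject requests that do not look like a browser; in that case one option is to download the HTML yourself and pass the text wrapped in StringIO.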

To do this using BeautifulSoup, you could use the following approach:

JavaScript

For each film, a list of all the scores found is created and appended to all_scores, although it is not clear how you would want this added to your main dataframe.

For example, scores could look like:

JavaScript
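
A minimal sketch of the BeautifulSoup approach described above, under stated assumptions: the selectors ("table td a[href]" and "span.score"), the helper names, and the sample HTML are all hypothetical and would need to be adapted to the real markup of the site. The fetch argument is any function that takes a URL and returns HTML; here a dictionary lookup stands in for real HTTP requests.

```python
from bs4 import BeautifulSoup


def parse_listing(html):
    """Return (title, detail_url) pairs from a listing page.

    Assumes each entry is a link inside a table cell (hypothetical markup).
    """
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.select("table td a[href]")]


def parse_scores(html, selector="span.score"):
    """Return every numeric score matched by `selector` (an assumed selector)."""
    soup = BeautifulSoup(html, "html.parser")
    scores = []
    for tag in soup.select(selector):
        try:
            scores.append(float(tag.get_text(strip=True)))
        except ValueError:
            continue  # skip non-numeric entries such as "N/A"
    return scores


def collect_scores(listing_html, fetch):
    """Visit each detail page and append its list of scores to all_scores."""
    all_scores = []
    for _title, url in parse_listing(listing_html):
        all_scores.append(parse_scores(fetch(url)))
    return all_scores


# Invented sample pages so the sketch runs offline.
LISTING_HTML = """
<table>
  <tr><td><a href="/anime/1">Example A</a></td><td>12</td><td>TV</td></tr>
  <tr><td><a href="/anime/2">Example B</a></td><td>1</td><td>Movie</td></tr>
</table>
"""
PAGES = {
    "/anime/1": '<span class="score">7.5</span><span class="score">8.0</span>',
    "/anime/2": '<span class="score">6.25</span><span class="score">N/A</span>',
}

all_scores = collect_scores(LISTING_HTML, PAGES.__getitem__)
print(all_scores)
```

Against the live site, fetch could be something like lambda url: requests.get(url).text, ideally with a short pause between requests so the server is not hammered.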
User contributions licensed under: CC BY-SA