I wrote this code to extract multiple pages of data from this site (base URL: https://www.goodreads.com/shelf/show/fiction).
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

page = 1
book_title = []

while page != 5:
    url = 'https://www.goodreads.com/shelf/show/fiction?page={page}'
    response = requests.get(url)
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')
    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.text)
    page = page + 1
```
But it's only showing the first 50 books' data. How can I extract the names of all fiction books across all pages using BeautifulSoup?
Answer
You can paginate through the fiction category via the site's search instead of the shelf URL: enter the "fiction" keyword in the search box and click the search button, which takes you to a URL like https://www.goodreads.com/search?q=fiction&qid=ydDLZMCwDJ. From there you can collect the data and build the URLs for the next pages.
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

book_title = []
url = 'https://www.goodreads.com/search?page={page}&q=fiction&qid=ydDLZMCwDJ&tab=books'

for page in range(1, 11):
    response = requests.get(url.format(page=page))
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')
    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.get_text(strip=True))

df = pd.DataFrame(book_title, columns=['Title'])
print(df)
```
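Note why your original loop only ever returned the first 50 books: the `'{page}'` placeholder in a plain string literal is never substituted, so every iteration requested the same URL. A minimal sketch of the fix, keeping your shelf URL (no request is made here, just URL construction):

```python
# In a plain string, '{page}' is literal text; the original loop
# therefore fetched the identical URL on every pass.
template = 'https://www.goodreads.com/shelf/show/fiction?page={page}'

# str.format (or an f-string) substitutes the page number,
# producing a distinct URL per iteration.
urls = [template.format(page=p) for p in range(1, 5)]
print(urls[0])   # ends with page=1
print(urls[-1])  # ends with page=4
```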
Output:
```
                                                 Title
0     Trigger Warning: Short Fictions and Disturbances
1    You Are Not So Smart: Why You Have Too Many Fr...
2       Smoke and Mirrors: Short Fiction and Illusions
3           Fragile Things: Short Fictions and Wonders
4                                   Collected Fictions
..                                                 ...
195  The Science Fiction Hall of Fame, Volume One, ...
196  The Art of Fiction: Notes on Craft for Young W...
197  Invisible Planets: Contemporary Chinese Scienc...
198                                  How Fiction Works
199  Monster, She Wrote: The Women Who Pioneered Ho...

[200 rows x 1 columns]
```
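Since the same title can appear on more than one results page, you may also want to deduplicate before saving. A small sketch using hypothetical sample titles in place of the scraped list:

```python
import pandas as pd

# Hypothetical scraped titles; note the duplicate entry.
book_title = ['Collected Fictions', 'How Fiction Works', 'Collected Fictions']

df = pd.DataFrame(book_title, columns=['Title'])
df = df.drop_duplicates().reset_index(drop=True)  # keep first occurrence only
df.to_csv('fiction_books.csv', index=False)       # persist the cleaned results
print(df)
```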