I wrote this code to extract multiple pages of data from this site (base URL: https://www.goodreads.com/shelf/show/fiction).
import requests
from bs4 import BeautifulSoup
import pandas as pd

page = 1
book_title = []

while page != 5:
    url = 'https://www.goodreads.com/shelf/show/fiction?page={page}'
    response = requests.get(url)
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')

    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.text)

    page = page + 1
But it only returns the data for the first 50 books. How can I extract the titles of all fiction books, across all pages, using BeautifulSoup?
Answer
You can paginate through the fiction books by starting from the search page instead of the shelf page. Enter the keyword "fiction" in the search box and click the search button, which gives you this URL: https://www.goodreads.com/search?q=fiction&qid=ydDLZMCwDJ. Collect the data from that page, then add a page parameter to the same URL to request the following pages.
import requests
from bs4 import BeautifulSoup
import pandas as pd

book_title = []

# URL template; {page} is filled in with str.format() inside the loop
url = 'https://www.goodreads.com/search?page={page}&q=fiction&qid=ydDLZMCwDJ&tab=books'
for page in range(1, 11):  # pages 1 through 10
    response = requests.get(url.format(page=page))
    page_content = response.text
    doc = BeautifulSoup(page_content, 'html.parser')

    # Each book title is an <a> tag with the class "bookTitle"
    a_tags = doc.find_all('a', {'class': 'bookTitle'})
    for tag in a_tags:
        book_title.append(tag.get_text(strip=True))

df = pd.DataFrame(book_title, columns=['Title'])
print(df)
Output:
     Title
0    Trigger Warning: Short Fictions and Disturbances
1    You Are Not So Smart: Why You Have Too Many Fr
2    Smoke and Mirrors: Short Fiction and Illusions
3    Fragile Things: Short Fictions and Wonders
4    Collected Fictions
..
195  The Science Fiction Hall of Fame, Volume One,
196  The Art of Fiction: Notes on Craft for Young W
197  Invisible Planets: Contemporary Chinese Scienc
198  How Fiction Works
199  Monster, She Wrote: The Women Who Pioneered Ho

[200 rows x 1 columns]
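One detail in the question's loop is worth pointing out: the URL there is a plain string, not an f-string, so the text {page} is sent literally on every request instead of the page number. That would be consistent with getting the same 50 books each time, although whether the shelf URL serves later pages to anonymous requests is not confirmed here. The answer's loop also stops at a hard-coded 10 pages. The following is a minimal sketch, not a definitive implementation, of filling in the page number and stopping when a page comes back empty. The names SHELF_URL, build_page_url, collect_titles, and fetch_titles are hypothetical, and treating an empty page as the end of the results is an assumption.

```python
from typing import Callable, List

# Template with a named placeholder. It must be filled in with .format()
# (or written as an f-string); otherwise "{page}" is sent literally.
SHELF_URL = "https://www.goodreads.com/shelf/show/fiction?page={page}"


def build_page_url(template: str, page: int) -> str:
    """Return the URL for one page by substituting the page number."""
    return template.format(page=page)


def collect_titles(fetch_titles: Callable[[str], List[str]],
                   template: str = SHELF_URL,
                   max_pages: int = 100) -> List[str]:
    """Walk pages 1..max_pages and stop at the first page with no titles.

    fetch_titles is a hypothetical callable: given a URL, it returns the
    list of titles found on that page (for example with requests and
    BeautifulSoup, as in the answer's loop body).
    """
    titles: List[str] = []
    for page in range(1, max_pages + 1):
        found = fetch_titles(build_page_url(template, page))
        if not found:  # assumption: an empty page means no more results
            break
        titles.extend(found)
    return titles
```

To use it with the real site, fetch_titles would wrap the requests.get call and the find_all('a', {'class': 'bookTitle'}) lookup from the answer and return the stripped tag texts. The max_pages cap is there so the loop still ends if the site never returns an empty page.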