I want to scrape weather data from a certain website. The default page layout gives a maximum of 40 results, but switching the layout to a simple list gives 100. A fresh session always starts on the default layout, which is difficult to change with Selenium. Is there any way to take the cookies saved in Chrome and use them with Beautiful Soup?
import requests
from bs4 import BeautifulSoup
import browser_cookie3

cj = browser_cookie3.load()
s = requests.Session()
url = "https://something.org/titles/2"
i = 1
print(cj)
for c in cj:
    if 'mangadex' in str(c):
        s.cookies.set_cookie(c)
r = s.get(url)
soup = BeautifulSoup(r.content, 'lxml')
for anime in soup.find_all('div', {'class': 'manga-entry col-lg-6 border-bottom pl-0 my-1'}):
    det = anime.find('a', {"class": "ml-1 manga_title text-truncate"})
    anime_name = det.text
    anime_link = det['href']
    stars = anime.select("span")[3].text
    print(anime_name, anime_link, stars, i)
    i = i + 1
Answer
Try:
import browser_cookie3
import requests

cj = browser_cookie3.load()
s = requests.Session()
for c in cj:
    if 'sitename' in str(c):
        s.cookies.set_cookie(c)
r = s.get(the_site)
This code uses the browser's cookies in the requests module via a Session. Simply change 'sitename' to the site you want cookies from.
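If you only need the cookies for one site, browser_cookie3 can also be asked for a single browser and domain up front, and each cookie object exposes a domain attribute you can match on instead of converting the whole cookie to a string. A minimal sketch, assuming Chrome is the browser and 'mangadex.org' is the target domain (both are just example placeholders):

import browser_cookie3
import requests

# Pull only the cookies Chrome has stored for the target domain
cj = browser_cookie3.chrome(domain_name='mangadex.org')

s = requests.Session()
for c in cj:
    # Match on the cookie's domain attribute rather than str(c)
    if 'mangadex' in c.domain:
        s.cookies.set_cookie(c)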
Your new code:
import requests
from bs4 import BeautifulSoup
import browser_cookie3

cj = browser_cookie3.load()
s = requests.Session()
url = "https://something.org/titles/2"
i = 1
print(cj)
for c in cj:
    # copy only the site's cookies into the requests session
    if 'mangadex' in str(c):
        s.cookies.set_cookie(c)
r = s.get(url)
soup = BeautifulSoup(r.content, 'lxml')
for anime in soup.find_all('div', {'class': 'manga-entry row m-0 border-bottom'}):
    det = anime.find('a', {"class": "ml-1 manga_title text-truncate"})
    anime_name = det.text
    anime_link = det['href']
    stars = anime.select("span")[3].text
    print(anime_name, anime_link, stars, i)
    i = i + 1
prints:
-Hitogatana- /title/540/hitogatana 4 1
-PIQUANT- /title/44134/piquant 5 2
-Rain- /title/37103/rain 4 3
-SINS- /title/1098/sins 4
:radical /title/46819/radical 1 5
:REverSAL /title/3877/reversal 3 6
/title/52206/ 7
Curtain. ~Sensei to Kiyoraka ni Dousei~ /title/7829/curtain-sensei-to-kiyoraka-ni-dousei 8
Junai no Seinen /title/28947/junai-no-seinen 9
no Onna /title/10162/no-onna 2 10
Seishunchuu! /title/19186/seishunchuu 11
Virgin Love /title/28945/virgin-love 12
.flow - Untitled (Doujinshi) /title/27292/flow-untitled-doujinshi 2 13
.gohan /title/50410/gohan 14
.hack//4koma + Gag Senshuken /title/7750/hack-4koma-gag-senshuken 24 15
.hack//Alcor - Hagun no Jokyoku /title/24375/hack-alcor-hagun-no-jokyoku 16
.hack//G.U.+ /title/7757/hack-g-u 1 17
.hack//GnU /title/7758/hack-gnu 18
.hack//Link - Tasogare no Kishidan /title/24374/hack-link-tasogare-no-kishidan 1 19
.hack//Tasogare no Udewa Densetsu /title/5817/hack-tasogare-no-udewa-densetsu 20
.hack//XXXX /title/7759/hack-xxxx 21
.traeH /title/9789/traeh 22
(G) Edition /title/886/g-edition 1 23
(Not) a Househusband /title/22832/not-a-househusband 6 24
(R)estauraNTR /title/37551/r-estaurantr 14 25
[ rain ] 1st Story /title/25587/rain-1st-story 3 26
[another] Xak /title/24881/another-xak 27
[es] ~Eternal Sisters~ /title/4879/es-eternal-sisters 1 28
and so on to 100…
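As a side note, requests also accepts a CookieJar directly through its cookies parameter, so if you do not need to filter anything you can skip the loop entirely. A rough sketch along those lines, reusing the same placeholder URL:

import requests
import browser_cookie3

cj = browser_cookie3.load()
# A CookieJar can be passed straight to requests
r = requests.get("https://something.org/titles/2", cookies=cj)
print(r.status_code)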