Is there any way to get the cookies and cache of a visited website from Chrome into BeautifulSoup in Python?

I want to scrape data from a certain website. The default page layout gives a maximum of 40 results, but when the layout is switched to a simple list it gives 100 results; the layout keeps resetting to the default, which is difficult to handle with Selenium. Is there any way to reuse the cookies saved in Chrome with Beautiful Soup?

import requests
from bs4 import BeautifulSoup
import browser_cookie3
cj = browser_cookie3.load()
s = requests.Session()
url = "https:/something.org/titles/2"
i=1
print(cj)
for c in cj:
    if 'mangadex' in str(c):
        s.cookies.set_cookie(c)
r = s.get(url)
soup = BeautifulSoup(r.content, 'lxml')
for anime in soup.find_all('div', {'class': 'manga-entry col-lg-6 border-bottom pl-0 my-1'}):
    det = anime.find('a', {"class": "ml-1 manga_title text-truncate"})
    anime_name = det.text
    anime_link = det['href']
    stars = anime.select("span")[3].text
    print(anime_name, anime_link, stars,i)
    i=i+1


Answer

Try:

import browser_cookie3
import requests
cj = browser_cookie3.load()
s = requests.Session()
for c in cj:
    if 'sitename' in str(c):
        s.cookies.set_cookie(c)
r = s.get(the_site)

This code loads the browser's cookies into a requests Session. Simply change 'sitename' to the site you want cookies from.
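If you only need the cookies for a single site, browser_cookie3 can also filter at load time instead of looping over every cookie. A minimal sketch, assuming Chrome is the browser and example.org stands in for the real site:

import browser_cookie3
import requests

# Load only the cookies Chrome has stored for the target domain;
# filtering by domain_name avoids copying unrelated cookies into the session.
cj = browser_cookie3.chrome(domain_name='example.org')

s = requests.Session()
s.cookies.update(cj)   # merge the CookieJar into the session's cookies
r = s.get('https://example.org/titles/2')
print(r.status_code)

browser_cookie3.load() accepts the same domain_name argument if you would rather not hard-code which browser to read from.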

Your new code:

import requests
from bs4 import BeautifulSoup
import browser_cookie3

cj = browser_cookie3.load()          # load cookies from the installed browsers
s = requests.Session()
url = "https://something.org/titles/2"
i = 1
print(cj)
for c in cj:
    if 'mangadex' in str(c):         # keep only cookies belonging to the target site
        s.cookies.set_cookie(c)
r = s.get(url)
soup = BeautifulSoup(r.content, 'lxml')
for anime in soup.find_all('div', {'class': 'manga-entry row m-0 border-bottom'}):
    det = anime.find('a', {"class": "ml-1 manga_title text-truncate"})
    anime_name = det.text
    anime_link = det['href']
    stars = anime.select("span")[3].text
    print(anime_name, anime_link, stars, i)
    i = i + 1

prints:

-Hitogatana- /title/540/hitogatana  4 1
-PIQUANT- /title/44134/piquant  5 2
-Rain- /title/37103/rain  4 3
-SINS- /title/1098/sins  4
:radical /title/46819/radical  1 5
:REverSAL /title/3877/reversal  3 6
... /title/52206/  7
...Curtain. ~Sensei to Kiyoraka ni Dousei~ /title/7829/curtain-sensei-to-kiyoraka-ni-dousei  8
...Junai no Seinen /title/28947/junai-no-seinen  9
...no Onna /title/10162/no-onna  2 10
...Seishunchuu! /title/19186/seishunchuu  11
...Virgin Love /title/28945/virgin-love  12
.flow - Untitled (Doujinshi) /title/27292/flow-untitled-doujinshi  2 13
.gohan /title/50410/gohan  14
.hack//4koma + Gag Senshuken /title/7750/hack-4koma-gag-senshuken  24 15
.hack//Alcor - Hagun no Jokyoku /title/24375/hack-alcor-hagun-no-jokyoku  16
.hack//G.U.+ /title/7757/hack-g-u  1 17
.hack//GnU /title/7758/hack-gnu  18
.hack//Link - Tasogare no Kishidan /title/24374/hack-link-tasogare-no-kishidan  1 19
.hack//Tasogare no Udewa Densetsu /title/5817/hack-tasogare-no-udewa-densetsu  20
.hack//XXXX /title/7759/hack-xxxx  21
.traeH /title/9789/traeh  22
(G) Edition /title/886/g-edition  1 23
(Not) a Househusband /title/22832/not-a-househusband  6 24
(R)estauraNTR /title/37551/r-estaurantr  14 25
[ rain ] 1st Story /title/25587/rain-1st-story  3 26
[another] Xak /title/24881/another-xak  27
[es] ~Eternal Sisters~ /title/4879/es-eternal-sisters  1 28

and so on to 100…
