
How to use proxies within browser_cookie3 or any similar library that helps grab cookies?

I'm trying to read the cookies for a domain using the browser_cookie3 library. That part works fine. The one real problem is that I can't find any way to supply a proxy to the library, so that the cookies I get reflect the proxy's location instead of my own.

For example, when I run the script below against www.nordstrom.com:

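import browser_cookie3

# Load the cookies Chrome has stored for this domain.
cj = browser_cookie3.chrome(domain_name='www.nordstrom.com')

cookie = None  # stays None if the cookie is not found
for item in cj:
    if 'internationalshippref' not in item.name:
        continue
    cookie = f'{item.name}={item.value}'
    break

print(cookie)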

I always get the following result, because my current location is Bangladesh:

internationalshippref=preferredcountry=BD&preferredcurrency=BDT&preferredcountryname=Bangladesh

How can I get cookies from the above site through a proxy, using browser_cookie3 or any other library?

Answer

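browser_cookie3 only reads cookies that a locally installed browser has already stored, so it has no way to route anything through a proxy. To get cookies as seen from the proxy's location, the site has to be visited through that proxy. The website has some basic protection against scraping, but with Playwright I was able to load it and collect the cookies without much hassle. The short sample below starts a browser with a proxy enabled and returns the cookies: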

from playwright.sync_api import sync_playwright


def get_proxy(server, user=None, password=None):
    if user and password:
        return {'server': server, 'username': user, 'password': password}
    else:
        return {'server': server}


def get_cookies(user_agent, proxy=None):
    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True, proxy=proxy)
        context = browser.new_context(no_viewport=True, user_agent=user_agent)
        page = context.new_page()
        page.goto("https://www.nordstrom.com")
        with page.expect_navigation(url="https://www.nordstrom.com/", wait_until='load'):
            pass
        cookies = context.cookies()
        browser.close()
    return cookies


proxy = get_proxy(server='http://my.server.com:8282', user='optional', password='optional')
print(get_cookies('my useragent', proxy))

Output

[{'name': 'rfx-forex-rate', 'value': 'currencyCode=USD&exchangeRate=1&quoteId=0', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1656083650, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'internationalshippref', 'value': 'preferredcountry=US&preferredcurrency=USD&preferredcountryname=United%20States', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'no-track', 'value': 'ccpa=false', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'nordstrom', 'value': 'bagcount=0&firstname=&ispinned=False&isSocial=False&shopperattr=||0|False|-1&shopperid=c38c25da4c2542fd873e7a88d0ba163f&USERNAME=', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'nui', 'value': 'firstVisit=2022-06-24T14%3A14%3A10.457Z&geoLocation=&isModified=false&lme=false', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971440050, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'session', 'value': 'FILTERSTATE=&RESULTBACK=&RETURNURL=http%3A%2F%2Fshop.nordstrom.com&SEARCHRETURNURL=http%3A%2F%2Fshop.nordstrom.com&FLSEmployeeNumber=&FLSRegisterNumber=&FLSStoreNumber=&FLSPOSType=&gctoken=&CookieDomain=&IsStoreModeActive=0', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': -1, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'shoppertoken', 'value': 
'shopperToken=eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJjMzhjMjVkYTRjMjU0MmZkODczZTdhODhkMGJhMTYzZiIsImF1ZCI6Imd1ZXN0IiwiaXNzIjoibm9yZHN0cm9tLWd1ZXN0LWF1dGgiLCJleHAiOjE5NzE2OTkyNTAsInJlZnJlc2giOjE2NTYwOTQ0NTAsImp0aSI6IjA2YmE5M2I3LTA3ZjMtNDRkMi1iYjc4LTQwYzBjYWFiZWI4MSIsImlhdCI6MTY1NjA4MDA1MH0.jyAHIejISNgu_cGmZh7k9R7iiB7HEwwDLc9g5ek79fz71yQn34kuwERAG4lZf3laZPUXJgakl3L-DScPLJ4FJ9j_kNUxjuw2Eg4rk7hIPvZ35kIqwtwbkrO8XjyhjgxTeXyAV5HCZa8QFO263REuI0gA1y9-MFA2fyGME3uWQruwB_q_6hfeR-Nyq8epBOuBRRqttLY6sV0sXACzRyPciqR3ykochm90DwG3H2PU4cYts6OO0wFqrnM_LhcMzD2AmiK7XegdwwKBlwzJcRqoiXu_OZFoMHPI2_eW3FFfED8A93jPyGYKmFm_Hm4RpItibGG27TJJRY0HmaO_BvqxKA', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1971397793, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'usersession', 'value': 'CookieDomain=nordstrom.com&SessionId=1029b2c9-bbc5-45db-8454-202c6271ad8f', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': -1, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'experiments', 'value': 'ExperimentId=789ff94f-d13c-4ebb-9303-433a542f3ae8', 'domain': '.nordstrom.com', 'path': '/', 'expires': 1971699250, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'Ad34bsY56', 'value': 'AxkjEJaBAQAAfjgIHurZtpHYD2QEPO5pusibS79jQ7brx8HiJfld2cp5Ie3MAUjwyyWcuJMswH8AAEB3AAAAAA|1|1|8421240d3766a87fc796cc577ffbc7cd05a87826', 'domain': '.nordstrom.com', 'path': '/', 'expires': 3233927649, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}, {'name': 'Bd34bsY56', 'value': 'A6AoEJaBAQAANCRRvsL-3aoOFIk1xtv3Y6fYMRV0SY7IjL4nIEPc1ebkqh6SAUjwyyWcuJMswH8AAEB3AAAAAA==', 'domain': 'www.nordstrom.com', 'path': '/', 'expires': 1687637001, 'httpOnly': False, 'secure': True, 'sameSite': 'None'}]
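
To get back the single `name=value` string the question asks for, the list returned by `get_cookies` can be filtered by cookie name. The following is a minimal sketch; `cookie_pair` is a hypothetical helper name, and `sample` is a trimmed-down stand-in for the real output above. It matches the name exactly, whereas the question's loop used a substring check.

```python
def cookie_pair(cookies, name):
    """Return 'name=value' for the first cookie called `name`, or None if absent."""
    for cookie in cookies:
        if cookie['name'] == name:
            return f"{cookie['name']}={cookie['value']}"
    return None


# Trimmed-down stand-in for the list returned by get_cookies().
sample = [
    {'name': 'rfx-forex-rate', 'value': 'currencyCode=USD&exchangeRate=1&quoteId=0'},
    {'name': 'internationalshippref',
     'value': 'preferredcountry=US&preferredcurrency=USD&preferredcountryname=United%20States'},
]

print(cookie_pair(sample, 'internationalshippref'))
```

With the real output, the call would be `cookie_pair(get_cookies('my useragent', proxy), 'internationalshippref')`.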

Keep in mind that calling get_cookies repeatedly is very inefficient, because every call launches a resource-heavy browser. If you need cookies repeatedly, I would suggest something like multiprocessing: spawn one separate process that keeps the browser alive, and have it serve cookie requests through queues.
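
One way that worker could look is sketched below. This is an illustration under assumptions, not tested against the site: `STOP`, `playwright_session`, and `cookie_worker` are hypothetical names, the request format `(url, user_agent)` is my own choice, and the redirect wait from the code above is left out for brevity. The worker only needs objects with `get` and `put`, so it works with `multiprocessing.Queue` as well as `queue.Queue`.

```python
from contextlib import contextmanager

STOP = None  # sentinel: putting this on the request queue shuts the worker down


@contextmanager
def playwright_session(proxy=None):
    """Launch one Firefox instance and yield a function that fetches cookies with it."""
    # Imported lazily so the worker loop can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.firefox.launch(headless=True, proxy=proxy)

        def fetch(url, user_agent):
            # A fresh context per request keeps each request's cookie jar separate.
            context = browser.new_context(user_agent=user_agent)
            try:
                page = context.new_page()
                page.goto(url)
                return context.cookies()
            finally:
                context.close()

        try:
            yield fetch
        finally:
            browser.close()


def cookie_worker(requests, responses, proxy=None, open_session=playwright_session):
    """Serve (url, user_agent) requests until STOP arrives, reusing one browser."""
    with open_session(proxy) as fetch:
        while True:
            job = requests.get()
            if job is STOP:
                break
            url, user_agent = job
            try:
                responses.put(("ok", fetch(url, user_agent)))
            except Exception as exc:  # report the failure instead of killing the worker
                responses.put(("error", repr(exc)))
```

To use it, create two `multiprocessing.Queue` objects, start `multiprocessing.Process(target=cookie_worker, args=(requests, responses, proxy))`, put `(url, user_agent)` tuples on the request queue, and read `("ok", cookies)` or `("error", message)` tuples from the response queue. Put `STOP` on the request queue to close the browser and end the process.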

Note:

About these lines:

with page.expect_navigation(url="https://www.nordstrom.com/", wait_until='load'):
    pass

They are there because the website redirects automatically through JavaScript when you visit it without the appropriate headers and cookies. So after the first page load, we wait for that redirect to finish. Once it has, the cookies we want are set.

Update: As per the comments, I updated the code above to accept an additional user_agent parameter.
