I am using Python with requests or aiohttp to fetch a page, and BeautifulSoup4 to parse it. The server's HTML page uses a Jinja-style template, so when I fetch the page with requests or aiohttp, I get something like this:
<a href="/{{username}}" class='pr'>
But if I open the same page in a browser, the code looks like this:
<a href="/gavrilka" class='pr'>
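A quick way to tell whether the HTML you received is still an unrendered template is to look for leftover `{{ ... }}` placeholders. If they are still there, the values are most likely filled in later by JavaScript in the browser, so no plain HTTP client will see them. A minimal sketch using only the standard library; the function name `find_placeholders` is hypothetical:

```python
import re

# Matches Jinja/Mustache-style placeholders such as {{username}} or {{ user.name }}
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def find_placeholders(html):
    """Return the names inside any {{ ... }} placeholders left in the HTML.

    A non-empty list suggests the page is rendered client-side, so
    requests/aiohttp only receive the raw template.
    """
    return PLACEHOLDER.findall(html)


raw = "<a href=\"/{{username}}\" class='pr'>"
rendered = "<a href=\"/gavrilka\" class='pr'>"
print(find_placeholders(raw))       # ['username']
print(find_placeholders(rendered))  # []
```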
requests code:
import requests

url = 'MY URL'
headers = {}  # MY HEADERS, as a dict of header name -> value
payload = {}

response = requests.request("GET", url, headers=headers, data=payload)
print(response.text.encode('utf8'))
aiohttp code:
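import asyncio

import aiohttp

url = 'MY URL'
headers = {}  # MY HEADERS, as a dict of header name -> value


async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            data = await resp.text()
            print(data)
    # leaving the "async with" block closes the session; no explicit close() needed


asyncio.run(main())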
What should I do to get the page text with the placeholders filled in, as the browser shows it?
Answer
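I used Selenium with PhantomJS, and now it works. The {{username}} placeholders are filled in by JavaScript running in the browser, so requests and aiohttp only ever receive the raw template; a browser driven by Selenium executes the page's scripts, and driver.page_source returns the rendered HTML. Note that PhantomJS is no longer maintained and recent Selenium releases no longer include webdriver.PhantomJS; a headless Chrome or Firefox driver can be used the same way.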
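from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://yourlink"

driver = webdriver.PhantomJS()
driver.set_window_size(1024, 768)  # optional
driver.get(url)  # the browser loads the page and runs its JavaScript
page_source = driver.page_source  # HTML after the placeholders are filled in
driver.quit()  # shut down the browser process

soup = BeautifulSoup(page_source, 'lxml')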
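Once you have the rendered page_source, the filled-in links can be read with BeautifulSoup as usual. A minimal sketch, assuming the links you want carry the `pr` class from the question's snippet; the helper name `profile_links` is hypothetical, and it uses the built-in "html.parser" backend so lxml is not required:

```python
from bs4 import BeautifulSoup


def profile_links(page_source):
    """Return the href of every <a class="pr"> link in the rendered HTML.

    page_source is assumed to be the string from driver.page_source,
    i.e. the DOM after the browser has run the page's JavaScript.
    """
    soup = BeautifulSoup(page_source, "html.parser")
    # CSS selector: <a> elements whose class list contains "pr"
    return [a.get("href") for a in soup.select("a.pr")]


rendered = "<a href=\"/gavrilka\" class='pr'>gavrilka</a>"
print(profile_links(rendered))  # ['/gavrilka']
```

Run on the raw HTML from requests, the same helper would return "/{{username}}", which is another sign that the page still needs a browser to render it.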