I am using Python with requests or aiohttp to fetch a page, and BeautifulSoup4 to parse it. The server's HTML page uses a Jinja template, so when I fetch the page with requests or aiohttp, I get something like this:
<a href="/{{username}}" class='pr'>
But if I open the page in a browser, the same markup looks like this:
<a href="/gavrilka" class='pr'>
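One quick way to confirm that a fetched page still contains unfilled placeholders is to scan it for double-brace markers. The following is a minimal sketch using only the standard library; the helper name `find_placeholders` and the regular expression are illustrative assumptions, and the two sample strings are the snippets shown above.

```python
import re

# Matches double-brace placeholders such as {{username}} or {{ user.name }}.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def find_placeholders(html: str) -> list[str]:
    """Return the names of any unrendered {{ ... }} placeholders in html."""
    return PLACEHOLDER.findall(html)


# The two variants of the link from the question.
raw = '<a href="/{{username}}" class=\'pr\'>'
rendered = '<a href="/gavrilka" class=\'pr\'>'

print(find_placeholders(raw))       # ['username']
print(find_placeholders(rendered))  # []
```

A non-empty result on the text returned by requests or aiohttp suggests the page is completed somewhere other than in the HTTP response itself.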
requests code:
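import requests

url = 'MY URL'
# Placeholder; headers must be a dict of header names to values.
headers = {"MY HEADER": "MY VALUE"}
payload = {}
# The variable is named headers so that it matches the keyword argument below.
response = requests.request("GET", url, headers=headers, data=payload)
print(response.text.encode('utf8'))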
aiohttp code:
import asyncio
import aiohttp

url = 'MY URL'
# Placeholder; headers must be a dict of header names to values.
headers = {"MY HEADER": "MY VALUE"}

async def main():
    # The context manager closes the session on exit, so no explicit close() is needed.
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            data = await resp.text()
            print(data)

asyncio.run(main())
What should I do to get the correct page text?
Answer
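I used Selenium with PhantomJS, and now it works. The placeholders are apparently filled in by JavaScript in the browser, which requests and aiohttp do not execute, so the page has to be loaded by something that runs scripts.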
from selenium import webdriver
from bs4 import BeautifulSoup

url = "https://yourlink"

driver = webdriver.PhantomJS()
driver.set_window_size(1024, 768) # optional
driver.get(url)
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
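PhantomJS is no longer maintained and Selenium 4 removed `webdriver.PhantomJS`, so on a current install the same idea may need a different browser. Below is a minimal sketch with headless Chrome, assuming Selenium 4 and a local Chrome are available. The function name `fetch_rendered_html`, the `timeout` parameter, and the choice to wait until no `{{` remains in the page are illustrative assumptions, not part of the original answer.

```python
def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Load url in headless Chrome and return the HTML after scripts have run."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    # Assumes a recent Chrome that understands the "new" headless mode.
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1024,768")  # optional, as in the answer

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the page source no longer contains "{{"; raises
        # TimeoutException if placeholders are still present after timeout.
        WebDriverWait(driver, timeout).until(lambda d: "{{" not in d.page_source)
        return driver.page_source
    finally:
        # Always shut the browser down, even if loading or waiting fails.
        driver.quit()
```

The returned string can be passed to `BeautifulSoup(html, 'lxml')` in the same way as `page_source` in the PhantomJS version. If the page legitimately contains `{{` in its final form, the wait condition would need to target a specific element instead.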