I need to scrape a website that requires a login token. The token's value is set later by JavaScript:
document.getElementById('token').value='aa5fedc5decbba3318deab92ffdfbd55d9a2c09ec81a464351ea449dc726ddd5';
This code appears in the page source after the closing </html> tag, like so:
</body>
</html>
<script>
document.getElementById('token').value='aa5fedc5decbba3318deab92ffdfbd55d9a2c09ec81a464351ea449dc726ddd5';
</script>
I need to copy this value and send it to a URL in an HTTP POST request for the request to be accepted.
However, I cannot retrieve this value. The code after the </html> tag does not show up when I fetch the page with the Python requests library.
Here is my Python code:
import requests
from bs4 import BeautifulSoup

session_requests = requests.session()
html = session_requests.get("http://lms.uaf.edu.pk/login/index.php")
html = html.text
soup = BeautifulSoup(html, "lxml")
print(soup)
How can I get the login token with Python?
Answer
The following code successfully gets the login token from your website:
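import requests

session_requests = requests.session()
html = session_requests.get("http://lms.uaf.edu.pk/login/index.php")
html = html.text

# Take everything after the opening quote of the assigned value...
a = html.split("document.getElementById('token').value='")[1]
# ...then keep only the part before the closing quote: the token itself.
b = a.split("'")[0]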
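One caveat with the split-based approach: if the script line is missing from the response, indexing with [1] raises an IndexError. A slightly more defensive variant, sketched here with a regular expression, returns None instead. The names extract_token, TOKEN_PATTERN and sample are hypothetical and not part of the original answer, and the sample string is simply the snippet quoted in the question, so the sketch runs without network access.

```python
import re

# Matches the assignment quoted in the question and captures the token value.
# Assumes the page keeps using single quotes, as in the quoted snippet.
TOKEN_PATTERN = re.compile(
    r"document\.getElementById\('token'\)\.value\s*=\s*'([^']+)'"
)


def extract_token(html):
    """Return the login token found in the raw HTML text, or None if absent."""
    match = TOKEN_PATTERN.search(html)
    return match.group(1) if match else None


# The snippet from the question, used here in place of a live response.
sample = (
    "</body> </html> <script> "
    "document.getElementById('token').value="
    "'aa5fedc5decbba3318deab92ffdfbd55d9a2c09ec81a464351ea449dc726ddd5'; "
    "</script>"
)

token = extract_token(sample)
print(token)
```

With the session from the answer, the same function could be applied to the raw response text, for example extract_token(session_requests.get(url).text), and the result then included in the form data of the login POST. The form field names depend on the site and are not shown in the question.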