Unable to fetch tabular content from a site using requests

I’m trying to fetch tabular content from a webpage using the requests module. When I open that page in a browser, type 0466425389 into the Company number field and hit the search button, the table is produced as expected. However, when I mimic the same request using requests, I get the following response:

<?xml version='1.0' encoding='UTF-8'?>
<partial-response><redirect url="/bc9/web/catalog"></redirect></partial-response>

I’ve tried with:

import requests

link = 'https://cri.nbb.be/bc9/web/catalog?execution=e1s1'

payload = {
    'javax.faces.partial.ajax': 'true',
    'javax.faces.source': 'page_searchForm:actions:0:button',
    'javax.faces.partial.execute': 'page_searchForm',
    'javax.faces.partial.render': 'page_searchForm page_listForm pageMessagesId',
    'page_searchForm:actions:0:button': 'page_searchForm:actions:0:button',
    'page_searchForm': 'page_searchForm',
    'page_searchForm:j_id3:generated_number_2_component': '0466425389',
    'page_searchForm:j_id3:generated_name_4_component': '',
    'page_searchForm:j_id3:generated_address_zipCode_6_component': '',
    'page_searchForm:j_id3_activeIndex': '0',
    'page_searchForm:j_id2_stateholder': 'panel_param_visible;',
    'page_searchForm:j_idt133_stateholder': 'panel_param_visible;',
    'javax.faces.ViewState': 'e1s1'
}
headers = {
    'Faces-Request': 'partial/ajax',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'https://cri.nbb.be',
    'Accept': 'application/xml, text/xml, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Host': 'cri.nbb.be',
    'Referer': 'https://cri.nbb.be/bc9/web/catalog?execution=e1s1'
}
with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
    s.get(link)
    s.headers.update(headers)
    res = s.post(link, data=payload)
    print(res.text)

How can I fetch tabular content from that site using requests?

Answer

Judging by the “action” attribute of the search form, the page appears to generate a new JSESSIONID every time the form is opened, and the server seems to require it when the form is submitted. I had some success by posting to a URL that includes it.
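
To illustrate where that session id sits, here is a minimal sketch that pulls the “action” attribute out of a form using only the standard library. The `sample_html` markup and the `ABC123` session id are made up for the example, and `extract_form_action` is a hypothetical helper, not something the site or requests provides. The regex assumes double-quoted attributes, so treat it as a rough illustration rather than a robust HTML parser.

```python
import re
from urllib.parse import urljoin

link = "https://cri.nbb.be/bc9/web/catalog?execution=e1s1"

# Hypothetical stand-in for the HTML returned by the first GET.
# The real markup and session id will differ.
sample_html = '''
<form id="page_searchForm" name="page_searchForm" method="post"
      action="/bc9/web/catalog;jsessionid=ABC123?execution=e1s1">
</form>
'''


def extract_form_action(html, form_id="page_searchForm"):
    """Return the action attribute of the form with the given id, or None."""
    # Find the opening <form ...> tag that carries the requested id.
    tag = re.search(r'<form\b[^>]*\bid="%s"[^>]*>' % re.escape(form_id), html)
    if tag is None:
        return None
    # Read the action attribute from inside that opening tag only.
    action = re.search(r'\baction="([^"]*)"', tag.group(0))
    return action.group(1) if action else None


# Resolve the relative action against the page URL to get the POST target.
form_action = urljoin(link, extract_form_action(sample_html))
print(form_action)
```

The resolved URL keeps the `;jsessionid=...` segment from the action, which is the part that a plain POST to the original link leaves out.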

You don’t need to explicitly set the headers other than the User-Agent.

I added some code to (a) pull out the “action” attribute of the form using BeautifulSoup (you could do this with a regex if you prefer), and (b) get the URL from the redirection XML that you showed at the top of your question.

import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

...

with requests.Session() as s:
    s.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"

    # GET to get search form
    req1 = s.get(link)
    # Get the form action
    soup = BeautifulSoup(req1.text, "lxml")
    form = soup.select_one("#page_searchForm")
    form_action = urljoin(link, form["action"])

    # POST the form
    req2 = s.post(form_action, data=payload)
    # Extract the target from the redirection xml response
    target = re.search('url="(.*?)"', req2.text).group(1)

    # Final GET to get the search result
    req3 = s.get(urljoin(link, target))

    # Parse and print (some of) the result
    soup = BeautifulSoup(req3.text, "lxml").body
    for detail in soup.select(".company-details tr"):
        columns = detail.select("td")
        if columns:
            print(f"{columns[0].text.strip()}: {columns[1].text.strip()}")

Result:

Company number: 0466.425.389
Name: A en B PARTNERS
Address: Quai de Willebroeck 37
: BE 1000 Bruxelles
Municipality code NIS: 21004 Bruxelles
Legal form: Cooperative company with limited liability
Legal situation: Normal situation
Activity code (NACE-BEL)
The activity code of the company is the statistical activity code in use on the date of consultation, given by the CBSO based on the main activity codes available at the Crossroads Bank for Enterprises and supplementary informations collected  from the companies: 69201 - Accountants and fiscal advisors
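
The redirect step in the code above uses a regex on the XML text. As an alternative sketch, the same URL can be read with the standard library's XML parser, which also makes it easier to notice when the response contains no redirect at all. The `partial_response` string is the one quoted in the question; `redirect_target` is a hypothetical helper name introduced here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

link = "https://cri.nbb.be/bc9/web/catalog?execution=e1s1"

# The partial response quoted in the question.
partial_response = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<partial-response><redirect url="/bc9/web/catalog"></redirect></partial-response>'
)


def redirect_target(xml_text):
    """Return the url attribute of the first <redirect> element, or None."""
    # Parse from bytes so the declared UTF-8 encoding is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    node = root.find(".//redirect")
    return node.get("url") if node is not None else None


target = redirect_target(partial_response)
if target is None:
    print("No redirect in the response")
else:
    # Resolve the relative redirect against the page URL for the final GET.
    print(urljoin(link, target))
```

In the session code, `req2.text` would take the place of `partial_response`.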
User contributions licensed under: CC BY-SA