Unable to fetch tabular content from a site using requests

Question

I'm trying to fetch tabular content from a webpage using the requests module. After navigating to that webpage, when I manually type 0466425389 right next to Company number and hit the search button, the table is produced accordingly. However, when I mimic the same steps using requests, I get a redirection XML response instead of the table. How can I fetch the tabular content using requests?

Accepted Answer

From looking at the "action" attribute on the search form, the form appears to generate a new JSESSIONID every time it is opened, and this seems to be a required attribute. I had some success by including this in the URL.

You don't need to explicitly set the headers other than the User-Agent.

I added some code: (a) to pull out the "action" attribute of the form using BeautifulSoup (you could do this with regex if you prefer), and (b) to get the URL from that redirection XML that you showed at the top of your question.

    import re
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    ...

    with requests.Session() as s:
        s.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36"

        # GET to get search form
        req1 = s.get(link)

        # Get the form action
        soup = BeautifulSoup(req1.text, "lxml")
        form = soup.select_one("#page_searchForm")
        form_action = urljoin(link, form["action"])

        # POST the form
        req2 = s.post(form_action, data=payload)

        # Extract the target from the redirection xml response
        target = re.search('url="(.*?)"', req2.text).group(1)

        # Final GET to get the search result
        req3 = s.get(urljoin(link, target))

        # Parse and print (some of) the result
        soup = BeautifulSoup(req3.text, "lxml").body
        for detail in soup.select(".company-details tr"):
            columns = detail.select("td")
            if columns:
                print(f"{columns[0].text.strip()}: {columns[1].text.strip()}")

Result:

    Company number: 0466.425.389
    Name: A en B PARTNERS
    Address: Quai de Willebroeck 37
    : BE 1000 Bruxelles
    Municipality code NIS: 21004 Bruxelles
    Legal form: Cooperative company with limited liability
    Legal situation: Normal situation
    Activity code (NACE-BEL)
    The activity code of the company is the statistical activity code in use on the date of consultation, given by the CBSO based on the main activity codes available at the Crossroads Bank for
    Enterprises and supplementary informations collected  from the companies: 69201 - Accountants and fiscal advisors
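If you would rather skip BeautifulSoup for the form step, here is a rough sketch of the regex route mentioned in (a), together with the redirect extraction from (b), written as two small helpers that return None instead of raising when nothing matches. The sample HTML and XML strings, the example.com URLs, and the helper names are all made up for illustration; I have not checked them against the real page, so treat the patterns as a starting point only. The html.unescape call is a defensive addition on my part, in case the URL inside the XML contains escaped characters such as &amp;.

```python
import html
import re
from typing import Optional
from urllib.parse import urljoin

# Hypothetical sample markup, invented for this sketch; the real page may differ.
SAMPLE_FORM_HTML = (
    '<form id="page_searchForm" method="post" '
    'action="/app/search.xhtml;jsessionid=ABC123">'
    '<input name="number"></form>'
)
SAMPLE_REDIRECT_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<partial-response><redirect url="/app/result.xhtml"></redirect></partial-response>'
)


def extract_form_action(page: str, base_url: str,
                        form_id: str = "page_searchForm") -> Optional[str]:
    """Return the absolute action URL of the form with the given id, or None."""
    # First isolate the opening <form ...> tag that carries the wanted id.
    tag = re.search(rf'<form\b[^>]*\bid="{re.escape(form_id)}"[^>]*>',
                    page, re.IGNORECASE)
    if tag is None:
        return None
    # Then read the action attribute from that tag only.
    action = re.search(r'\baction="([^"]*)"', tag.group(0), re.IGNORECASE)
    if action is None:
        return None
    return urljoin(base_url, html.unescape(action.group(1)))


def extract_redirect_target(xml: str, base_url: str) -> Optional[str]:
    """Return the absolute URL from the first url="..." attribute, or None."""
    match = re.search(r'url="(.*?)"', xml)
    if match is None:
        return None
    return urljoin(base_url, html.unescape(match.group(1)))


if __name__ == "__main__":
    base = "https://example.com/app/index.xhtml"  # placeholder, not the real site
    print(extract_form_action(SAMPLE_FORM_HTML, base))
    print(extract_redirect_target(SAMPLE_REDIRECT_XML, base))
```

In the session code you would call these with req1.text and req2.text, and check for None before making the next request, so that a changed page layout shows up as a clear failure rather than an AttributeError on .group(1).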


Answer