
How to extract the following iframe src from a page using Python (BeautifulSoup)

I’m trying to extract the ‘src’ from this <iframe>, but I’m not succeeding. The page is dynamic: the iframe only appears after I run a search.

Site: http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001

view-source:http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001

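import requests
from bs4 import BeautifulSoup

r = requests.get("http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001")
arquivo = BeautifulSoup(r.content, "html.parser")
# finds no <iframe>: it is only rendered after a search is submitted
for link in arquivo.find_all("iframe"):
    print(link)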
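A plain GET like the one above only receives the search form, so there is no iframe to find yet. The offline sketch below uses a made-up HTML snippet (SAMPLE_FORM_PAGE, a hypothetical stand-in; the real page's markup is an assumption) to show that find_all("iframe") comes back empty, and to list the named form fields that a search submission would have to send back. The helper name form_fields is also hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page a plain GET returns: the search form is
# there, but no <iframe> has been rendered because no search was submitted.
SAMPLE_FORM_PAGE = """
<form method="post" action="Default.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
  <input type="text" name="ctl00$MainContent$txtNumero">
  <input type="image" name="ctl00$MainContent$imgBuscar" src="buscar.png">
</form>
"""


def form_fields(html):
    """Map every named <input> to its value ('' when the input has no value)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        inp["name"]: inp.get("value", "")
        for inp in soup.find_all("input", attrs={"name": True})
    }


page = BeautifulSoup(SAMPLE_FORM_PAGE, "html.parser")
print(page.find_all("iframe"))  # [] (nothing to extract before the search)
print(form_fields(SAMPLE_FORM_PAGE))
```

Posting a dictionary like this back to the same URL, with the search box filled in, is what makes the server render the iframe.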


Answer

To simulate the POST request that this site sends when you run a search, you can use this example:

import requests
from bs4 import BeautifulSoup

url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

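data = {}
# copy every pre-filled input that has a name (this includes the form's hidden state fields)
for inp in soup.select("input[name][value]"):
    data[inp["name"]] = inp["value"]

data["ctl00$MainContent$txtNumero"] = "3001"  # <-- this is your number
# leave the remaining search filters empty
data["ctl00$MainContent$ddlEspecie"] = ""
data["ctl00$MainContent$ddlAno"] = ""
data["ctl00$MainContent$txtConteudo"] = ""
data["ctl00$MainContent$txtEmenta"] = ""
# an image button is submitted as the x/y coordinates of the click
data["ctl00$MainContent$imgBuscar.x"] = "1"
data["ctl00$MainContent$imgBuscar.y"] = "9"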

soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
print(soup.iframe["src"])

Prints:

../procuradoriacg/Leis1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf
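The printed src is relative to the search page. A browser resolves it against the page URL, and urllib.parse.urljoin does the same in Python. This minimal sketch assumes the page sets no <base> tag that would change the base URL, and that the PDF is actually served at the resolved address.

```python
from urllib.parse import urljoin

# Base URL of the search page (from the answer's code) and the relative src it printed.
page_url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"
src = "../procuradoriacg/Leis1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf"

# urljoin resolves "../" against the directory of page_url (/ConsultaLei/),
# which climbs back up to the server root.
pdf_url = urljoin(page_url, src)
print(pdf_url)
# http://191.253.16.180:8080/procuradoriacg/Leis1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf
```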

EDIT: To query several law numbers (3000 to 3009 here), reuse the same form data and change only the number on each request:

import requests
from bs4 import BeautifulSoup

url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

data = {}
# copy every pre-filled input that has a name (this includes the form's hidden state fields)
for inp in soup.select("input[name][value]"):
    data[inp["name"]] = inp["value"]

data["ctl00$MainContent$ddlEspecie"] = ""
data["ctl00$MainContent$ddlAno"] = ""
data["ctl00$MainContent$txtConteudo"] = ""
data["ctl00$MainContent$txtEmenta"] = ""
data["ctl00$MainContent$imgBuscar.x"] = "1"
data["ctl00$MainContent$imgBuscar.y"] = "9"

for i in range(3000, 3010):
    data["ctl00$MainContent$txtNumero"] = str(i)  # form values are sent as text

    s = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    iframe = s.find("iframe")  # None when the search returned no document
    if iframe:
        print(i, iframe["src"])
    else:
        print(i, "Not Found")

Prints:

3000 Not Found
3001 ../procuradoriacg/Leis1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf
3002 Not Found
3003 ../procuradoriacg/Leis1994/8279_LEI30031994pag0001_strDocumentoOficial.pdf
3004 Not Found
3005 Not Found
3006 ../procuradoriacg/Leis1994/8282_LEI30061994pag0001_strDocumentoOficial.pdf
3007 Not Found
3008 Not Found
3009 Not Found
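The found/not-found check in the loop can be pulled into a small function, which also makes it testable without hitting the server. In this sketch the name pdf_url_from_response and the HTML fragments hit and miss are hypothetical; the fragments only mimic the presence or absence of an iframe, not the site's real markup. It also resolves the relative src into an absolute URL with urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"


def pdf_url_from_response(html, base_url=BASE_URL):
    """Return the absolute URL of the first <iframe> with a src, or None if there is none."""
    iframe = BeautifulSoup(html, "html.parser").find("iframe", src=True)
    if iframe is None:
        return None  # no iframe: the loop above reports this as "Not Found"
    return urljoin(base_url, iframe["src"])


# Hypothetical response fragments standing in for a hit and a miss.
hit = ('<div><iframe src="../procuradoriacg/Leis1994/'
       '8279_LEI30031994pag0001_strDocumentoOficial.pdf"></iframe></div>')
miss = "<div>no iframe in this response</div>"

for number, html in [(3003, hit), (3004, miss)]:
    print(number, pdf_url_from_response(html) or "Not Found")
```

Inside the real loop, the call would look like pdf_url_from_response(requests.post(url, data=data).text), leaving the request code unchanged.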
User contributions licensed under: CC BY-SA