
Cannot Scrape All Links in Webpage Python BeautifulSoup

I am trying to use beautifulsoup to get the links off of this webpage: https://nfdc.faa.gov/nfdcApps/services/ajv5/fixes.jsp

I need the links to all of the fixes in Arizona (AZ), so I search for AZ. When I then click ‘A’ under ‘View fixes in alphabetical order:’, I cannot scrape the links that appear when hovering over each fix (e.g. ‘AALAN’) using BeautifulSoup in Python. How can I do this? Here is my code:

import requests
from bs4 import BeautifulSoup as bs

page = requests.get("https://nfdc.faa.gov/nfdcApps/services/ajv5/fix_search.jsp?selectType=state&selectName=AZ&keyword=")
soup = bs(page.content, "html.parser")

links = []
for link in soup.find_all('a'):
    links.append(link.get('href'))

print(links)

And this is what it outputs:

['http://www.faa.gov', 'http://www.faa.gov', 'http://www.faa.gov/privacy/', 'http://www.faa.gov/web_policies/', 'http://www.faa.gov/contact/', 'http://faa.custhelp.com/', 'http://www.faa.gov/viewer_redirect.cfm?viewer=pdf&server_name=employees.faa.gov', 'http://www.faa.gov/viewer_redirect.cfm?viewer=doc&server_name=employees.faa.gov', 'http://www.faa.gov/viewer_redirect.cfm?viewer=ppt&server_name=employees.faa.gov', 'http://www.faa.gov/viewer_redirect.cfm?viewer=xls&server_name=employees.faa.gov', 'http://www.faa.gov/viewer_redirect.cfm?viewer=zip&server_name=employees.faa.gov']

The links to the fixes are not there (e.g. https://nfdc.faa.gov/nfdcApps/services/ajv5/fix_detail.jsp?fix=1948394&list=yes is not in the list).

I am looking to compile a list of all the fix links for Arizona so I can acquire the data. Thanks!


Answer

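The fix lists appear to be returned in response to a POST request that includes the selected letter, so a plain GET with query parameters does not contain them. Try sending the form data with requests.post, once per letter: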

import requests
from bs4 import BeautifulSoup

url = "https://nfdc.faa.gov/nfdcApps/services/ajv5/fix_search.jsp"

data = {
    "alphabet": "A",
    "selectType": "STATE",
    "selectName": "AZ",
    "keyword": "",
}

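alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Request the result page for each letter in turn.
for letter in alphabet:
    data["alphabet"] = letter
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")

    # Keep only anchors whose href points at a fix detail page.
    for a in soup.select('[href*="fix_detail.jsp"]'):
        print("{:<10} {}".format(a.text.strip(), a["href"]))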

Prints:

...

ITEMM      fix_detail.jsp?fix=17822&list=yes
ITUCO      fix_detail.jsp?fix=56147&list=yes
IVLEC      fix_detail.jsp?fix=11787&list=yes
IVVRY      fix_detail.jsp?fix=20962&list=yes
IWANS      fix_detail.jsp?fix=1948424&list=yes
IWEDU      fix_detail.jsp?fix=13301&list=yes
IXAKE      fix_detail.jsp?fix=585636&list=yes


...
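
The hrefs printed above are relative to the search page, while the question asks for full links such as https://nfdc.faa.gov/nfdcApps/services/ajv5/fix_detail.jsp?fix=1948394&list=yes. A minimal sketch, assuming the links resolve against the same URL the request was posted to, is to join each href with that URL using urljoin from the standard library. The helper name to_absolute_links is made up for illustration, and the sample hrefs are copied from the output above.

```python
from urllib.parse import urljoin

# Same URL the answer posts to; relative hrefs are resolved against it.
BASE_URL = "https://nfdc.faa.gov/nfdcApps/services/ajv5/fix_search.jsp"


def to_absolute_links(hrefs, base_url=BASE_URL):
    """Hypothetical helper: turn relative hrefs into absolute URLs."""
    return [urljoin(base_url, href) for href in hrefs]


# Sample hrefs taken from the printed output above.
sample = [
    "fix_detail.jsp?fix=17822&list=yes",
    "fix_detail.jsp?fix=56147&list=yes",
]

absolute = to_absolute_links(sample)
for link in absolute:
    print(link)
```

In the loop from the answer, you could append a["href"] to a list instead of printing it, then pass that list to this helper once all letters have been fetched.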
User contributions licensed under: CC BY-SA