
BeautifulSoup: Unable to extract href with several conditions

I’m trying to extract every link from the SEC website, such as this one, with BeautifulSoup, using the code from this GitHub repository. The thing is, I do not want to extract every 8-K, only the ones matching the item “2.02” in the “Description” column. So I edited the “Download.py” file and identified the following:

JavaScript

I’ve tried adding another loop to match my regex, but it doesn’t work:

JavaScript

Any help would be really appreciated, thanks!


Answer

First find the tr that encapsulates both the a tag and the td tag that contains the “items 2.02” text. Then extract the URL from that tr only if the td actually contains the “items 2.02” text:
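A minimal sketch of this row-by-row approach, assuming each filing sits in its own tr with one link and a description cell. The sample HTML, the regex `ITEM_RE`, and the function name `extract_item_links` are illustrative assumptions, not taken from the SEC page or from “Download.py”, so the selectors may need adjusting to the real markup.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample mimicking a filing index table; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Filings</th><th>Format</th><th>Description</th></tr>
  <tr>
    <td>8-K</td>
    <td><a href="/Archives/edgar/data/1/a-index.htm">Documents</a></td>
    <td>Current report, items 2.02 and 9.01</td>
  </tr>
  <tr>
    <td>8-K</td>
    <td><a href="/Archives/edgar/data/1/b-index.htm">Documents</a></td>
    <td>Current report, items 5.02 and 9.01</td>
  </tr>
</table>
"""

# Assumed pattern: the word "item(s)" followed somewhere later by the number 2.02.
ITEM_RE = re.compile(r"items?\b.*\b2\.02\b", re.IGNORECASE)


def extract_item_links(html, pattern=ITEM_RE):
    """Return the href of the first link in every row whose cells match `pattern`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for row in soup.find_all("tr"):
        # Check the text of every cell in this row for the item number.
        has_item = any(
            pattern.search(td.get_text(" ", strip=True))
            for td in row.find_all("td")
        )
        # Take the link from the same row, so URL and description stay paired.
        link = row.find("a", href=True)
        if has_item and link is not None:
            links.append(link["href"])
    return links


if __name__ == "__main__":
    print(extract_item_links(SAMPLE_HTML))
```

Looping over rows rather than over all a tags keeps each link tied to its own description cell, which is what a second, independent loop over the regex matches loses.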

User contributions licensed under: CC BY-SA