
Python – trying to get BeautifulSoup to find words from a list, but it can't find them

I'm working on my first project that isn't straight out of a book, but I'm having trouble getting a function to work.

The function receives a list of strings and a BeautifulSoup object and tries to find each word in soup.text. However, it never finds any of the words, even ones I am certain are on the page. I have confirmed that the function receives the list correctly and that the URL works: print(urlSoup) returns what I expect.

The relevant code:

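
A minimal sketch of a function like the one described, using hypothetical names (`find_words`, `words_list`, `url_soup`); it is not the original snippet. It applies `.casefold()` to both the page text and each word, and strips surrounding whitespace from the list entries on the assumption that they may carry stray newlines (for example if they were read from a file), since `'python\n'` is never a substring of the page text.

```python
from bs4 import BeautifulSoup


def find_words(words_list, url_soup):
    """Return the entries of words_list that appear in the page text.

    Hypothetical reconstruction: the match is case-insensitive because
    both the page text and each word are casefolded.
    """
    # get_text() returns the same string as url_soup.text
    page_text = url_soup.get_text().casefold()
    found = []
    for word in words_list:
        # Drop surrounding whitespace/newlines, then casefold like page_text
        needle = word.strip().casefold()
        if needle and needle in page_text:
            found.append(word)
    return found


html = "<html><body><h1>Python Jobs</h1><p>We use BeautifulSoup daily.</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
print(find_words(["python", "beautifulsoup\n", "java"], soup))
# → ['python', 'beautifulsoup\n']
```

Casefolding only one side (e.g. the text but not the word) is enough to make every comparison fail when the list contains capitalised words, so both sides are normalised the same way here.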

The if statement never fires, presumably because none of the words/strings from the list are found in soup.text. To fix this, I have tried removing the .casefold() call, changing soup.text to soup.content, and changing the if statement to something like


I also switched the BeautifulSoup parser to lxml, but that didn't help either. At this point I'm stuck, and despite searching Stack Overflow and the bs4 documentation I haven't cracked it yet. I'm sure the solution is painfully obvious, but as a beginner I need a bit of help here.

I hope I have provided enough information; please feel free to ask if you need me to explain further.

Edit (info requested by chitown88): here's an example of a words_list:


I used this list with an appropriate website, but the urlSoup is a bit large to post here, so here's a Google Drive link if that's okay. Please let me know if this is not alright and you'd rather I do something else. https://drive.google.com/file/d/1bhLjNLxHOrNvA3BBfm2Qh8qDrk5fLYp7/view?usp=sharing


Answer

Did you use a try/except block? The problem may be the file encoding, because I got an error when reading soup.txt.

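
A minimal sketch of the try/except idea, assuming the page was saved to a text file (like soup.txt) and is expected to be UTF-8. `load_soup` and the demo file names are hypothetical; the point is that decoding with the wrong encoding raises `UnicodeDecodeError`, which is caught and reported instead of crashing.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def load_soup(path, encoding="utf-8"):
    """Read a saved HTML file and parse it, reporting decoding problems.

    Assumption: the file is UTF-8 unless another encoding is passed in.
    """
    try:
        html = Path(path).read_text(encoding=encoding)
    except UnicodeDecodeError as err:
        # The bytes on disk are not valid in the chosen encoding
        print(f"Could not decode {path} as {encoding}: {err}")
        return None
    return BeautifulSoup(html, "html.parser")


# Demo: the same markup saved once as UTF-8 and once as Latin-1
tmp_dir = Path(tempfile.mkdtemp())
good = tmp_dir / "soup_utf8.txt"
good.write_text("<p>Café jobs</p>", encoding="utf-8")
bad = tmp_dir / "soup_latin1.txt"
bad.write_bytes("<p>Café jobs</p>".encode("latin-1"))

good_soup = load_soup(good)  # parses normally
bad_soup = load_soup(bad)    # prints the decode error and returns None
print(good_soup.get_text())  # → Café jobs
```

If the error appears, reading the file again with the encoding it was actually saved in (here `load_soup(bad, encoding="latin-1")`) fixes it.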

Also, words_count will always be 0 or 1; you need to use .count() or a regex to count how many times the substring occurs in the text.

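
A minimal sketch of both counting approaches mentioned above, assuming a hypothetical `count_words` helper that takes the word list and the page text (e.g. `soup.get_text()`). `str.count()` counts every substring hit, while the regex version with `\b` word boundaries counts whole words only.

```python
import re


def count_words(words_list, page_text):
    """Map each word to (substring hits, whole-word hits), case-insensitively."""
    text = page_text.casefold()
    counts = {}
    for word in words_list:
        needle = word.strip().casefold()
        if not needle:
            # "".count() and an empty pattern would match everywhere; skip blanks
            continue
        # str.count() also counts "java" inside "javascript"
        substring_hits = text.count(needle)
        # \b boundaries restrict matches to whole words; re.escape makes
        # characters such as "." or "+" in the word match literally
        whole_word_hits = len(re.findall(rf"\b{re.escape(needle)}\b", text))
        counts[word] = (substring_hits, whole_word_hits)
    return counts


text = "Java and JavaScript roles. Java experience preferred."
print(count_words(["java", "python"], text))
# → {'java': (3, 2), 'python': (0, 0)}
```

Which number you want depends on whether a word embedded in a longer word should count; the whole-word regex is usually closer to what a word count means.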
User contributions licensed under: CC BY-SA