Delete unwanted elements from Python web-scraping loop results

I’m currently trying to extract text and labels (Topics) from a webpage with the following code:

JavaScript

The code runs without errors, but here is an extract of what it returns:

JavaScript

Since I’m looking for a “clean” text result, I tried adding the following line inside my loops to obtain only the text:

JavaScript

but I got:

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

I’ve also noticed that the Topic result includes an unwanted URL. I would like to obtain only Forest and results (without a comma between them).

Any idea what I can add to my code to obtain clean text and topics?

Answer

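This happens because p is a ResultSet object: find_all() returns a list-like collection of tags, and the collection itself has no text attribute (only the individual tags inside it do). You can see this by running the following: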

JavaScript

Output:

JavaScript
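
As a minimal, self-contained illustration of the same point, here is a sketch that runs on a made-up HTML snippet (not the page from the question; the tag names are assumptions). It compares what find_all() and find() return:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the scraped page.
html = "<div><p>First paragraph.</p><p>Second paragraph.</p></div>"
soup = BeautifulSoup(html, "html.parser")

p = soup.find_all("p")   # find_all() returns a ResultSet (a list of Tag objects)
first = soup.find("p")   # find() returns a single Tag, or None if nothing matches

result_type = type(p).__name__
first_text = first.text  # .text works on a single Tag, not on the ResultSet

print(result_type)
print(first_text)
```

Calling p.text here would raise the same AttributeError as in the question, because the attribute lives on each Tag rather than on the collection.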

To get the actual text, you can address each item in each ResultSet directly:

JavaScript

Output:

JavaScript

Or even use a list comprehension:

JavaScript
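
For the topics, one possible approach is to read only the link text and ignore the href. The sketch below assumes a hypothetical layout in which the topics are links inside a div with class "topics"; the real page's tags and class names are not shown in the question, so adjust the lookups accordingly:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the structure and class name are assumptions.
html = """
<article>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <div class="topics">
    <a href="https://example.com/topics/forest">Forest</a>,
    <a href="https://example.com/topics/results">results</a>
  </div>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

# One clean string per <p>: get_text(strip=True) drops tags and outer whitespace.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

# Taking the text of each <a> skips both the URL (stored in href)
# and the comma, which sits between the links rather than inside them.
topic_box = soup.find("div", class_="topics")
topics = [a.get_text(strip=True) for a in topic_box.find_all("a")]

# Join with a space to get the labels on one line without commas.
topic_line = " ".join(topics)

print(paragraphs)
print(topic_line)
```

Because find() returns None when nothing matches, it is worth checking topic_box before calling find_all() on it when running this against a real page.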
User contributions licensed under: CC BY-SA