
How can I sift through various ‘a’ tags when scraping a website?

I’m trying to scrape athletic.net, a site that stores track and field times. For a given athlete, I want a list of every season, each event they ran in that season, and every time they recorded in each event.

So far I have printed the season title and the name of each event. Now I’m trying to sift through a sea of <a> tags to find the times. I’ve tried using find_next('a') and find_next_sibling('a'), but I’m struggling to isolate the times.

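Below is a minimal sketch of how the two lookup styles differ. It uses hypothetical, simplified markup, not athletic.net’s real HTML. find_next('a') returns only the first <a> after the element. find_all_next('a') returns every later <a> in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: NOT athletic.net's real HTML.
html = """
<h5>1600 Meters</h5>
<table>
  <tr><td><a href="/meet/1">Meet A</a></td><td><a href="/result/11">5:01.20</a></td></tr>
  <tr><td><a href="/meet/2">Meet B</a></td><td><a href="/result/12">4:58.75</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
event = soup.find("h5")

# find_next('a') stops at the first <a> after the heading (a meet link here).
first_link = event.find_next("a")

# find_all_next('a') returns every <a> that comes after the heading.
all_links = [a.get_text(strip=True) for a in event.find_all_next("a")]

print(first_link.get_text(strip=True))  # Meet A
print(all_links)  # ['Meet A', '5:01.20', 'Meet B', '4:58.75']
```

Meet names and times are both links, so the full list still mixes the two. That is why a further filter is needed.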

So far, all I can do is print all of the siblings, which include every time. For example:


This output contains all of the times for one event from this athlete’s most recent season.

How can I isolate only the times when many of the <a> tags don’t contain times?
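One way to filter by content is sketched below. It passes a compiled regular expression as BeautifulSoup’s string argument, so only <a> tags whose text looks like a time are returned. Both the markup and the time formats ("ss.xx" or "m:ss.xx") are assumptions. Field-event marks would need a different pattern.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: meet names, places and times are all links.
html = """
<h5>1600 Meters</h5>
<a href="/meet/1">Meet A</a> <a href="/result/11">5:01.20</a> <a href="/place/11">2nd</a>
<a href="/meet/2">Meet B</a> <a href="/result/12">4:58.75</a> <a href="/place/12">1st</a>
"""

# Assumed time formats: "ss.xx" or "m:ss.xx" (anchored so "2nd" etc. never match).
TIME_RE = re.compile(r"^(\d{1,2}:)?\d{1,2}\.\d{1,2}$")

soup = BeautifulSoup(html, "html.parser")
event = soup.find("h5")

# Keep only the <a> tags after the heading whose text matches the time pattern.
times = [a.get_text() for a in event.find_all_next("a", string=TIME_RE)]
print(times)  # ['5:01.20', '4:58.75']
```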

If I use find_next_sibling('a'), it only prints None.
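A likely reason for the None is that find_next_sibling only looks at elements with the same parent. If the time links are nested inside another element, such as a table, they are not siblings of the event heading. The sketch below shows this on hypothetical markup. The real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the time links sit inside a <table>, not next to the heading.
html = """
<div>
  <h5>1600 Meters</h5>
  <table><tr><td><a href="/result/11">5:01.20</a></td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
event = soup.find("h5")

# The heading's only tag sibling is the <table>, so there is no sibling <a>.
sibling_link = event.find_next_sibling("a")
print(sibling_link)  # None

# Step to the sibling <table> and search its descendants instead.
table = event.find_next_sibling("table")
times = [a.get_text() for a in table.find_all("a")]
print(times)  # ['5:01.20']
```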


Answer

The question needs some improvement and focus, and it should include the expected output. As written, it is not quite clear.

How can I sift through to isolate only the times when there are various ‘a’ tags that don’t contain times?

You could use CSS selectors to get all of the <a> tags that contain a time:

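A minimal sketch of the selector idea uses BeautifulSoup’s select() with a CSS attribute-substring selector. It assumes that time links can be told apart by their href. The "/result/" pattern below is hypothetical, not athletic.net’s actual URL scheme, so inspect the real page to find a distinguishing attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the "/result/" href pattern is an assumption.
html = """
<a href="/meet/1">Meet A</a> <a href="/result/11">5:01.20</a>
<a href="/meet/2">Meet B</a> <a href="/result/12">4:58.75</a>
"""

soup = BeautifulSoup(html, "html.parser")

# [href*="/result/"] matches <a> tags whose href contains that substring.
times = [a.get_text() for a in soup.select('a[href*="/result/"]')]
print(times)  # ['5:01.20', '4:58.75']
```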

Or, more specifically:

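A more specific selector can be scoped to one event’s results so that links from other events are excluded. The class names (event, time) and the data-event attribute below are hypothetical stand-ins for whatever the real markup uses.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names and data-event attribute are assumptions.
html = """
<table class="event" data-event="1600m">
  <tr><td class="meet"><a href="/meet/1">Meet A</a></td><td class="time"><a href="/result/11">5:01.20</a></td></tr>
  <tr><td class="meet"><a href="/meet/2">Meet B</a></td><td class="time"><a href="/result/12">4:58.75</a></td></tr>
</table>
<table class="event" data-event="800m">
  <tr><td class="meet"><a href="/meet/3">Meet C</a></td><td class="time"><a href="/result/13">2:21.40</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Only <a> elements that are direct children of a time cell in the 1600m table.
selector = 'table[data-event="1600m"] td.time > a'
times_1600 = [a.get_text() for a in soup.select(selector)]
print(times_1600)  # ['5:01.20', '4:58.75']
```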
Example
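The self-contained sketch below combines the CSS-selector approach with a loop over seasons and events. It produces the season → event → times structure the question asks for. The page structure is assumed: one div.season per season, and each event heading (<h5>) followed by a <table> whose time links sit in td.time cells. Adapt the selectors to the real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical page structure (not athletic.net's real HTML).
html = """
<div class="season">
  <h4>2019 Outdoor</h4>
  <h5>1600 Meters</h5>
  <table>
    <tr><td><a href="/meet/1">Meet A</a></td><td class="time"><a href="/result/11">5:01.20</a></td></tr>
    <tr><td><a href="/meet/2">Meet B</a></td><td class="time"><a href="/result/12">4:58.75</a></td></tr>
  </table>
  <h5>800 Meters</h5>
  <table>
    <tr><td><a href="/meet/2">Meet B</a></td><td class="time"><a href="/result/13">2:21.40</a></td></tr>
  </table>
</div>
<div class="season">
  <h4>2018 Outdoor</h4>
  <h5>1600 Meters</h5>
  <table>
    <tr><td><a href="/meet/4">Meet D</a></td><td class="time"><a href="/result/14">5:12.04</a></td></tr>
  </table>
</div>
"""


def scrape_times(html):
    """Return {season: {event: [times]}} for the hypothetical markup above."""
    soup = BeautifulSoup(html, "html.parser")
    results = {}
    for season in soup.select("div.season"):
        season_name = season.find("h4").get_text(strip=True)
        events = results.setdefault(season_name, {})
        for heading in season.find_all("h5"):
            # Each event's results table is the next <table> sibling of its heading.
            table = heading.find_next_sibling("table")
            if table is None:
                continue
            # Keep only the links inside time cells, skipping meet-name links.
            events[heading.get_text(strip=True)] = [
                a.get_text(strip=True) for a in table.select("td.time a")
            ]
    return results


results = scrape_times(html)
for season, events in results.items():
    print(season)
    for event, times in events.items():
        print(f"  {event}: {', '.join(times)}")
```

With the sample markup, this prints "2019 Outdoor" followed by "1600 Meters: 5:01.20, 4:58.75" and "800 Meters: 2:21.40". It then prints "2018 Outdoor" followed by "1600 Meters: 5:12.04".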
User contributions licensed under: CC BY-SA