Python: extract URLs from an XML sitemap that contain a certain word

I’m trying to extract all URLs from a sitemap that contain the word foo in the URL. I’ve managed to extract all the URLs, but I can’t figure out how to keep only the ones I want. In the example below, I only want the URLs for apples and pears returned.


Answer

I modified the XML to make it valid (wrapping it in <urls> and </urls>) and saved it as src.xml:
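As a rough sketch of that step, assuming the sitemap entries are plain `<url><loc>…</loc></url>` elements: the example.com URLs, the `SITEMAP_XML` constant and the `write_sample` helper below are all made up for illustration and are not the asker's actual data.

```python
from pathlib import Path

# Hypothetical sitemap fragment wrapped in <urls>...</urls> so that it has a
# single root element. Element names and URLs are assumptions.
SITEMAP_XML = """\
<urls>
  <url><loc>https://www.example.com/foo/apples</loc></url>
  <url><loc>https://www.example.com/foo/pears</loc></url>
  <url><loc>https://www.example.com/bar/oranges</loc></url>
</urls>
"""


def write_sample(path="src.xml"):
    """Write the sample sitemap to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(SITEMAP_XML, encoding="utf-8")
    return target
```

Calling `write_sample()` creates src.xml in the current directory. The wrapper element matters because an XML document must have exactly one root element, otherwise the parser rejects it.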


Then use xml.etree.ElementTree to parse the XML and keep only the URLs that contain the word:
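A minimal sketch of the parsing and filtering, assuming each URL sits in a `<loc>` element. The function name `urls_containing` and the sample data are illustrative assumptions; the `{*}` wildcard needs Python 3.8 or later.

```python
import io
import xml.etree.ElementTree as ET


def urls_containing(source, word):
    """Return the text of every <loc> element that contains `word`.

    `source` is a file name (e.g. "src.xml") or a file-like object.
    """
    root = ET.parse(source).getroot()
    # "{*}loc" matches <loc> in any namespace or in none, so this also works
    # on sitemaps that declare the sitemaps.org namespace.
    return [
        loc.text.strip()
        for loc in root.iterfind(".//{*}loc")
        if loc.text and word in loc.text
    ]


# Demo on an in-memory sample (hypothetical URLs) instead of a file on disk.
sample = io.BytesIO(
    b"<urls>"
    b"<url><loc>https://www.example.com/foo/apples</loc></url>"
    b"<url><loc>https://www.example.com/foo/pears</loc></url>"
    b"<url><loc>https://www.example.com/bar/oranges</loc></url>"
    b"</urls>"
)
matches = urls_containing(sample, "foo")
print(matches)
```

With the file saved earlier, the call would be `urls_containing("src.xml", "foo")`. The test is a plain substring check, so it also matches a word that appears inside a longer path segment; a stricter match would need to split the URL path first.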

User contributions licensed under: CC BY-SA