
Extract address from unstructured table row with BeautifulSoup

I have an HTML document from which I want to extract the address, but I am unable to. Here is the HTML document. The address is not enclosed in a tag of its own, so a beginner like me cannot select it directly (e.g. with find() or similar).


I would like to extract the address Rose Avenue 33, 4302843 A City.

Here is my attempt, but I cannot narrow it down enough.


Answer

The following code will approximate your attempt. It’s based on bs4 (BeautifulSoup), pandas and requests:


This will save a CSV file with the dr details, looking like this:


There are better, more elegant solutions out there. Have a look at the bs4 documentation at https://www.crummy.com/software/BeautifulSoup/bs4/doc/
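
As one possible illustration of the general idea, here is a minimal sketch of pulling text that has no tag of its own out of a table cell. The markup, the name "Dr. Example" and the helper extract_address are all hypothetical, made up for this example; the real page will differ, so the selector for the cell would need adjusting. The approach is to walk the cell's direct children and keep only the bare strings (NavigableString objects), skipping anything wrapped in a tag.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup (an assumption, not the page from the question):
# the address is bare text inside a <td>, separated by <br> tags.
HTML = """
<table>
  <tr>
    <td><b>Dr. Example</b><br>Rose Avenue 33<br>4302843 A City</td>
    <td><a href="tel:000">Phone</a></td>
  </tr>
</table>
"""


def extract_address(html: str) -> str:
    """Join the untagged text pieces of the first <td> with ', '."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.find("td")
    if cell is None:
        return ""
    # Direct children that are plain strings are exactly the pieces of
    # text that are not enclosed in any tag of their own.
    parts = [
        str(node).strip()
        for node in cell.children
        if isinstance(node, NavigableString) and str(node).strip()
    ]
    return ", ".join(parts)


print(extract_address(HTML))  # Rose Avenue 33, 4302843 A City
```

On a real page you would first narrow down to the right row or cell (for example with find() on a nearby labelled element) and then apply the same filter to that cell only.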
