
Python beautifulsoup – get all text separated by break tag

I have the following tables:

JavaScript

I can navigate to this part of the HTML using the code below:

JavaScript

I am able to get the text using the one below:

JavaScript

The output I want is all of the text joined by semicolons (;), like so: ANGARA, EDGARDO J.;ENRILE, JUAN PONCE;MAGSAYSAY JR., RAMON B.;ROXAS, MAR;GORDON, RICHARD “DICK” J.;FLAVIER, JUAN M.;MADRIGAL, M. A.;ARROYO, JOKER P.;RECTO, RALPH G.

But the code above gives me a result like below:

JavaScript

I’ve tried the replace function, but to no avail:

JavaScript

and

JavaScript

I even tried escaping \r\n and \n like so:

JavaScript

How do I address my use case?


Answer

I think it is much simpler to get that result by passing a separator (delimiter) argument to get_text():

JavaScript

Based on your example you will get:

JavaScript
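As a minimal sketch of that approach: the markup below is an assumed stand-in for the table cell in the question (names separated by `<br/>` tags), and the variable names `html`, `cell` and `result` are illustrative only. Passing the separator as the first argument to get_text() could look like this:

```python
from bs4 import BeautifulSoup

# Assumed markup: one table cell whose entries are separated by <br/> tags.
html = (
    "<td>ANGARA, EDGARDO J.<br/>"
    "ENRILE, JUAN PONCE<br/>"
    "MAGSAYSAY JR., RAMON B.</td>"
)

soup = BeautifulSoup(html, "html.parser")
cell = soup.find("td")

# The first argument of get_text() is the separator placed between
# the individual text nodes of the element.
result = cell.get_text(";")
print(result)
# ANGARA, EDGARDO J.;ENRILE, JUAN PONCE;MAGSAYSAY JR., RAMON B.
```

This works cleanly only when every text node is an actual entry; whitespace-only text nodes between tags would also be joined and produce extra separators.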

EDIT

Based on the behaviour mentioned in your comment (extra semicolons), I suspect that the structure of the element differs from the one in the question and contains extra breaks.

In that case, I would change strategy and recommend that you either:

  • add the strip parameter to get_text():

    JavaScript
  • or use join() on stripped_strings, which does almost the same thing:

    JavaScript

Example HTML

I added extra <br> tags, spaces and line breaks to the HTML.

JavaScript

Output

JavaScript
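Both options can be sketched as follows. The HTML here is a made-up stand-in with an extra `<br/>`, stray spaces and line breaks (it is not the original example), and the variable names are illustrative assumptions:

```python
from bs4 import BeautifulSoup

# Assumed messy markup: an empty line between two <br/> tags,
# a trailing space, and indentation inside the cell.
html = """<td>
  ANGARA, EDGARDO J.<br/>
  <br/>
  ENRILE, JUAN PONCE <br/>
  ROXAS, MAR<br/>
</td>"""

cell = BeautifulSoup(html, "html.parser").find("td")

# Without strip, whitespace-only text nodes are joined too,
# which is where the extra semicolons come from.
raw = cell.get_text(";")

# Option 1: strip=True trims each text node and drops empty ones.
with_strip = cell.get_text(";", strip=True)

# Option 2: stripped_strings yields the same trimmed, non-empty strings.
with_join = ";".join(cell.stripped_strings)

print(with_strip)
# ANGARA, EDGARDO J.;ENRILE, JUAN PONCE;ROXAS, MAR
```

In this sketch both options give the same string; the second is handy when you also want the individual names as a list, for example `list(cell.stripped_strings)`.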
User contributions licensed under: CC BY-SA