Python beautifulsoup – get all text separated by break tag

Question

I have the following tables. I can traverse to this part of the HTML using the code below, and I am able to get the text using the one below. The output I want is all of the text, separated by semicolons (;), like so: ANGARA, EDGARDO J.;ENRILE, JUAN PONCE;MAGSAYSAY JR., RAMON …

Accepted Answer

I think it is much simpler to get that result by passing a separator to get_text():

soup.find('td').get_text(';')

Based on your example you will get:

ANGARA, EDGARDO J.;ENRILE, JUAN PONCE;MAGSAYSAY JR., RAMON B.;ROXAS, MAR;GORDON, RICHARD "DICK" J.;FLAVIER, JUAN M.;MADRIGAL, M. A.;ARROYO, JOKER P.;RECTO, RALPH G.

EDIT

Based on the behaviour mentioned in your comment (extra semicolons), I suspect that the structure of the element is different from the one in the question and has extra breaks.

In that case, I would change the strategy and recommend one of the following.

Add the additional strip parameter to get_text():

soup.find('td').get_text(';', strip=True)

Or use join() on stripped_strings, which does almost the same thing:

';'.join(soup.find('td').stripped_strings)

Example HTML

Added additional <br>, spaces and line breaks to the HTML.

html = '''

ANGARA, EDGARDO J.
ENRILE, JUAN PONCE
MAGSAYSAY JR., RAMON B.
ROXAS, MAR
GORDON, RICHARD "DICK" J.
FLAVIER, JUAN M.
MADRIGAL, M. A.
ARROYO, JOKER P.
RECTO, RALPH G.

'''

Output

ANGARA, EDGARDO J.;ENRILE, JUAN PONCE;MAGSAYSAY JR., RAMON B.;ROXAS, MAR;GORDON, RICHARD "DICK" J.;FLAVIER, JUAN M.;MADRIGAL, M. A.;ARROYO, JOKER P.;RECTO, RALPH G.
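For reference, here is a minimal, self-contained sketch of the two stripped variants. The markup is an assumption: the tags of the example HTML did not survive on this page, so the `<table>`/`<td>`/`<br>` structure below (including the doubled `<br>` and stray spaces) is only a guess at what such a cell might look like, and the name list is shortened.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a single table cell whose entries are separated by
# <br> tags, with an extra <br>, stray spaces and line breaks thrown in.
html = '''
<table><tr><td>
ANGARA, EDGARDO J.<br>
ENRILE, JUAN PONCE<br>
<br>
   MAGSAYSAY JR., RAMON B.   <br>
ROXAS, MAR
</td></tr></table>
'''

soup = BeautifulSoup(html, 'html.parser')
td = soup.find('td')

# Separator only: whitespace-only text nodes are kept, so the result can
# contain newlines and empty fields.
plain = td.get_text(';')

# Separator plus strip=True: every string is stripped and empty ones are dropped.
stripped = td.get_text(';', strip=True)

# Equivalent approach using the stripped_strings generator.
joined = ';'.join(td.stripped_strings)

print(repr(plain))
print(stripped)
print(joined)
```

Printing `repr(plain)` next to the other two makes it easy to see where stray whitespace or empty fields come from in your real page.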
