Web scraping: get the first and the second value from a div tag with multiple values separated by commas

Question

I have a div tag that holds several comma-separated values. I would like to get only the name and surname ("Pierre M") and the date ("08/18/2018"). The code I tried takes every value from that tag instead of just those two.

Accepted Answer

You could go with find_all(text=True, recursive=False) to get only the first section of text, i.e. the text that sits directly inside the first child div, in your specific case:

    for e in soup.select('div.small'):
        # Join only the direct text nodes of the first child div
        text = ''.join(e.div.find_all(text=True, recursive=False))
        data.append({
            'reviewer-name': text.split(',')[0].strip(),
            'reviewer-date': text.split(',')[-1].strip(),
        })

An alternative would be to check for the child div that contains "updated", save its text if needed and decompose() it from the DOM (the walrus operator needs Python 3.8 or later; otherwise use a standard if statement):

    for e in soup.select('div.small'):
        if (u := e.select_one('div.rounded')):
            updated = u.text.split('updated')[-1].strip()
            u.decompose()
        else:
            updated = None
        data.append({
            'reviewer-name': e.div.text.split(',')[0].strip(),
            'reviewer-date': e.div.text.split(',')[-1].strip(),
            'reviewer-updated': updated,
        })

Example

The tags of the sample markup were not preserved; the structure below is an assumption inferred from the selectors div.small and div.rounded used in the code.

    from bs4 import BeautifulSoup

    html = '''
    <div class="small">
      <div>Pierre M , 08/18/2018
        <div class="rounded">updated 03/11/2021</div>
      </div>
      <div>Long Range 4dr Sedan (electric DD)</div>
    </div>
    '''
    soup = BeautifulSoup(html, 'html.parser')
    data = []

    for e in soup.select('div.small'):
        if (u := e.select_one('div.rounded')):
            updated = u.text.split('updated')[-1].strip()
            u.decompose()
        else:
            updated = None
        data.append({
            'reviewer-name': e.div.text.split(',')[0].strip(),
            'reviewer-date': e.div.text.split(',')[-1].strip(),
            'reviewer-updated': updated,
        })

    data

Output

    [{'reviewer-name': 'Pierre M', 'reviewer-date': '08/18/2018', 'reviewer-updated': '03/11/2021'}]
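
For Python versions before 3.8, the answer suggests replacing the walrus operator with a standard if statement. The following is only a minimal sketch of that variant, not the answerer's own code: the function name parse_reviews, the constant SAMPLE_HTML and the tag structure of the sample markup are all hypothetical, with the structure inferred from the selectors div.small and div.rounded.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the tag structure is an assumption inferred
# from the selectors div.small and div.rounded used in the answer.
SAMPLE_HTML = '''
<div class="small">
  <div>Pierre M , 08/18/2018
    <div class="rounded">updated 03/11/2021</div>
  </div>
  <div>Long Range 4dr Sedan (electric DD)</div>
</div>
'''


def parse_reviews(html):
    """Return one dict per div.small with name, date and optional update date."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for e in soup.select('div.small'):
        # Plain assignment plus if statement instead of the walrus operator
        u = e.select_one('div.rounded')
        if u is not None:
            updated = u.get_text().split('updated')[-1].strip()
            # Remove the nested div so its text no longer appears in e.div
            u.decompose()
        else:
            updated = None
        # What is left in the first child div is the "name , date" text
        parts = e.div.get_text().split(',')
        rows.append({
            'reviewer-name': parts[0].strip(),
            'reviewer-date': parts[-1].strip(),
            'reviewer-updated': updated,
        })
    return rows


reviews = parse_reviews(SAMPLE_HTML)
print(reviews)
```

Wrapping the loop in a function keeps the result list local, and a review without a div.rounded child simply gets None for the update date.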
