Hello everyone, I'm scraping a table and separating its headers and body into separate lists, but the body data contains a lot of '\n' characters. I'm trying to remove them, but I can't seem to get them out.
Code:
from bs4 import BeautifulSoup

# driver is an already-initialised Selenium WebDriver
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find("table")
rows = table.find_all("tr")
table_contents = []
for tr in rows:
    if rows.index(tr) == 0:
        # Header row: keep non-empty <th> cells
        row_cells = [th.getText().strip() for th in tr.find_all('th') if th.getText().strip() != '']
    else:
        # Body row: optional leading <th>, then non-empty <td> cells
        row_cells = ([tr.find('th').getText()] if tr.find('th') else []) + [td.getText().strip() for td in tr.find_all('td') if td.getText().strip() != '']
    if len(row_cells) > 1:
        table_contents += [row_cells]

table_head = table_contents[0]
table_body = table_contents[1]
print(table_head)
print(table_body)
Results:
table head= ['Student Number', 'Student Name', 'Placement Date']
table body= ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']
As you can see in the table body results, '\n' is in the way and I can't figure out how to get rid of it. I have hundreds of samples to pull with the same issue.
Answer
Using str.replace() and a list comprehension:
[i.replace('\n', '') for i in table_body]
Output:
['20808456', 'Sandy(f) Gurlow', '01/13/2023']
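If you would rather keep a space where each line break was (so the name reads 'Sandy (f) Gurlow' instead of 'Sandy(f) Gurlow'), one possible alternative is to split each string on whitespace and re-join it. This is only a sketch: clean_cells is a hypothetical helper name, not something from the question or from BeautifulSoup, and the sample list is copied from the results above.

```python
def clean_cells(cells):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops leading/trailing whitespace,
    # so re-joining with a single space normalises every cell.
    return [" ".join(cell.split()) for cell in cells]


# Sample row taken from the question's output
table_body = ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']
print(clean_cells(table_body))
# prints ['20808456', 'Sandy (f) Gurlow', '01/13/2023']
```

Since there are hundreds of rows, the same helper could be applied to every body row, for example [clean_cells(row) for row in table_contents[1:]], assuming table_contents is built as in the question.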