How to remove \n from list results?

Hello everyone, I'm scraping a table and splitting its headers and body into separate lists, but the body data contains a lot of '\n' characters. I'm trying to remove them but can't seem to get them out.

code:

from bs4 import BeautifulSoup

# driver is assumed to be a Selenium WebDriver that has already loaded the page
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find("table")
rows = table.find_all("tr")
table_contents = []
for i, tr in enumerate(rows):
    if i == 0:
        # header row: keep the non-empty <th> cells
        row_cells = [th.getText().strip() for th in tr.find_all('th') if th.getText().strip() != '']
    else:
        # body row: optional leading <th>, then the non-empty <td> cells
        row_cells = ([tr.find('th').getText()] if tr.find('th') else []) + [td.getText().strip() for td in tr.find_all('td') if td.getText().strip() != '']
    if len(row_cells) > 1:
        table_contents.append(row_cells)
table_head = table_contents[0]
table_body = table_contents[1]
print(table_head)
print(table_body)

Results:

table head= ['Student Number', 'Student Name', 'Placement Date']
table body= ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']

As you can see in the table body results, '\n' is in the way and I can't figure out how to get rid of it. I have hundreds of samples to pull with the same issue.

Answer

Use str.replace() in a list comprehension to strip the newline character '\n' from each string:

[i.replace('\n', '') for i in table_body]

Output:

['20808456', 'Sandy(f) Gurlow', '01/13/2023']
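
If the cells may also contain tabs or runs of spaces, a slightly more general variant is to split each string on whitespace and rejoin it with single spaces. This is only a sketch: the helper name clean_cells is hypothetical, and the sample data assumes the newlines sit where the question's output suggests. Unlike replace('\n', ''), this leaves one space where each newline was.

```python
def clean_cells(cells):
    """Collapse newlines, tabs, and repeated spaces in each cell to one space."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces with " " normalizes the cell text.
    return [" ".join(cell.split()) for cell in cells]


# Sample row; the position of the newlines is an assumption for illustration.
table_body = ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']
cleaned = clean_cells(table_body)
print(cleaned)  # ['20808456', 'Sandy (f) Gurlow', '01/13/2023']
```

To clean every scraped row rather than a single one, the same helper can be applied in a loop, for example [clean_cells(row) for row in table_contents].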
User contributions licensed under: CC BY-SA