Hello everyone, I'm scraping a table and separating its headers and body into separate lists, but the body data has a lot of '\n' in it. I'm trying to remove them but I can't seem to get them out.
code:
    from bs4 import BeautifulSoup

    # driver is assumed to be an existing Selenium WebDriver with the page loaded
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    table = soup.find("table")
    rows = table.find_all("tr")
    table_contents = []
    for tr in rows:
        if rows.index(tr) == 0:
            # first row: header cells
            row_cells = [th.getText().strip() for th in tr.find_all('th') if th.getText().strip() != '']
        else:
            # other rows: optional leading th, then the non-empty td cells
            row_cells = ([tr.find('th').getText()] if tr.find('th') else []) + [td.getText().strip() for td in tr.find_all('td') if td.getText().strip() != '']
        if len(row_cells) > 1:
            table_contents += [row_cells]
    table_head = table_contents[0]
    table_body = table_contents[1]
    print(table_head)
    print(table_body)
Results:
    table head= ['Student Number', 'Student Name', 'Placement Date']
    table body= ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']
As you can see in the table body results, '\n' is in the way and I can't figure out how to get rid of it. I have hundreds of samples to pull with the same issue.
Answer
Using str.replace() and a list comprehension:
    [i.replace('\n', '') for i in table_body]
Output:
    ['20808456', 'Sandy(f) Gurlow', '01/13/2023']
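For anyone who wants to try this without the scraper, here is a minimal self-contained sketch that uses the sample row from the question as hard-coded data. The names `removed` and `normalized` are only illustrative. The second variant is an assumption about what may read better, not part of the original answer: it collapses every run of whitespace (newlines, tabs, repeated spaces) into a single space, so the name parts stay separated.

```python
# Sample row from the question; the '\n' sequences here are real newline characters.
table_body = ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']

# Fix from the answer: delete every newline character from each cell.
removed = [cell.replace('\n', '') for cell in table_body]

# Alternative (assumption): split on any whitespace and rejoin with single spaces.
normalized = [' '.join(cell.split()) for cell in table_body]

print(removed)     # ['20808456', 'Sandy(f) Gurlow', '01/13/2023']
print(normalized)  # ['20808456', 'Sandy (f) Gurlow', '01/13/2023']
```

Note that a list comprehension builds a new list, so the result has to be assigned (for example back to `table_body`) if the cleaned values are needed later.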