Scraping from beautifulsoup in Python with a for loop just returns the last result

I’m trying to scrape data from a webpage using beautifulsoup and (ultimately) output it into a csv. As a first step in this, I’ve tried to get the text of the relevant table. I managed to do this, but the code no longer gives me the same output when I rerun it: instead of returning all 12372 records when I run the for loop, it just saves the last one.

An abbreviated version of my code is:

from bs4 import BeautifulSoup
BirthsSoup = BeautifulSoup(browser.page_source, features="html.parser")
print(BirthsSoup.prettify()) 
# this confirms that the soup has captured the page as I want it to

birthsTable = BirthsSoup.select('#t2 td')
# selects all the elements in the table I want

birthsLen = len(birthsTable)
# birthsLen: 12372

for i in range(birthsLen):
    print(birthsTable[i].prettify())
# this confirms that the beautifulsoup tag object correctly captured all of the table

for i in range(birthsLen):
    birthsText = birthsTable[i].getText()
# this was supposed to compile the text for every element in the table

But the second for loop only keeps the text of the last (i.e. 12372nd) element in the table. Do I need to do something else so that it saves every element as it loops through? I think my previous (desired) output had the text of each element on a new line.

This is my first time using python, so apologies if I’ve made an obvious mistake.

Answer

You’re overwriting the birthsText variable on every iteration, so when the loop finishes it holds only the last element’s text. To keep them all, create a list and append each element’s text to it:

birthsLen = len(birthsTable)
birthsText = []

for i in range(birthsLen):
    birthsText.append(birthsTable[i].getText())

Or, more concisely:

birthsText = [line.getText() for line in birthsTable]
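
To see the difference without needing the original page, here is a minimal sketch that swaps the BeautifulSoup tags for a plain list of strings. The name cell_texts and its three values are made up for illustration; in the real script each string would come from birthsTable[i].getText(). It also shows one way to get back the "one element per line" output described in the question, assuming a newline-joined string is what was wanted.

```python
# Stand-in for the text of each table cell. These values are hypothetical;
# in the real script they would come from birthsTable[i].getText().
cell_texts = ["Alice", "1901", "London"]

# Rebinding a plain variable inside the loop keeps only the final value.
last_only = None
for text in cell_texts:
    last_only = text

# Appending to a list keeps every value, in the original order.
all_texts = []
for text in cell_texts:
    all_texts.append(text)

# Join the collected values so each one sits on its own line.
joined = "\n".join(all_texts)

print(last_only)   # only the last cell
print(all_texts)   # every cell
print(joined)      # every cell, one per line
```

The same "\n".join(...) call should work on the birthsText list built above, since getText() returns a string for each element.
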
User contributions licensed under: CC BY-SA