
Having some difficulty finding out how to detect in Python

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.capitol.tn.gov/house/members/').text
soup = BeautifulSoup(page, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')
header = rows[0].find_all('th')
header_text = []

for item in header:
  header_text.append(item.get_text(strip=True))
  
# check header results
print(header_text)

# get rows
for row in rows:
  row_text = []
  a = row.find_all('a')
  td = row.find_all('td')
  for item in td:
    if item:
      row_text.append(item.get_text(strip=True))
    
  # check row results
  if len(row_text) > 0:
    print(row_text)

Sorry if this is a basic question, but I can't work out how to get the 'a' tags' hrefs (i.e. the emails) to appear as the first item in each row. I've tried the insert() method, but it never gives me anything.


Answer

This does the job:

# get rows
for row in rows:
  row_text = []
  td = row.find_all('td')
  for item in td:
    # look for an email link (<a class="email">) inside this cell
    email = item.find("a", {"class": "email"})

    # find() returns None when the cell has no such link
    if email is not None:
      row_text.append(email.get("href"))

    # then add the cell's visible text
    row_text.append(item.get_text(strip=True))

  # check row results (the header row has no td cells, so it is skipped)
  if len(row_text) > 0:
    print(row_text)

The code looks inside each td tag for an a tag with the class email. If it finds one, it reads the tag's href, stores it in the variable email, and appends it to the row_text list just before that cell's text.
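If the email has to be the very first item regardless of which column holds the link, one option is to collect the cell text first and then call insert(0, ...) on the list. Note that list.insert() changes the list in place and returns None, so assigning its result to a variable would give nothing back, which may be what happened in the question. The sketch below is only an illustration: the sample HTML and the function name rows_with_email_first are made up for this example and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real members table (an assumption).
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>District</th><th>Email</th></tr>
  <tr>
    <td>Jane Doe</td><td>12</td>
    <td><a class="email" href="mailto:jane.doe@example.com">Email</a></td>
  </tr>
  <tr><td>John Roe</td><td>34</td><td></td></tr>
</table>
"""

def rows_with_email_first(html):
  """Return one list per data row, with the email href (if any) at index 0."""
  soup = BeautifulSoup(html, "html.parser")
  result = []
  for row in soup.find_all("tr"):
    cells = row.find_all("td")
    if not cells:
      continue  # header row: only th cells, nothing to collect
    row_text = [cell.get_text(strip=True) for cell in cells]
    link = row.find("a", class_="email")
    if link is not None and link.get("href"):
      # insert() mutates row_text in place and returns None
      row_text.insert(0, link["href"])
    result.append(row_text)
  return result

for row in rows_with_email_first(SAMPLE_HTML):
  print(row)
```

Rows without an email link are returned unchanged, so the caller can tell them apart by checking whether the first item starts with "mailto:".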

User contributions licensed under: CC BY-SA