Hello, I would like to add each link as a separate row in the database. When I print out "new_lst" it displays every link, so I think my code is putting the whole result into one row instead of separate rows. My code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="webscraper"
)

req = Request("https://google.com")
html_page = urlopen(req)

main_link = "https://google.com"

soup = BeautifulSoup(html_page, "html.parser")

links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))

new_lst = ('"'.join(links))

mycursor = mydb.cursor()

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
val = (main_link, new_lst)
mycursor.execute(sql, val)

mydb.commit()
Answer
Yes, it is putting the whole result into one row because you join all the links into a single string with new_lst = ('"'.join(links)). You can avoid this by inserting one item at a time inside the for loop you are already using. Note that this approach doesn't do any checking or validation before writing to the database, so I would add some extra checks before executing the SQL command if needed (a sketch with such checks follows the code below).
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="webscraper"
)

req = Request("https://google.com")
html_page = urlopen(req)

main_link = "https://google.com"

soup = BeautifulSoup(html_page, "html.parser")

mycursor = mydb.cursor()

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
for link in soup.findAll('a'):
    # insert the href string, one row per link
    val = (main_link, link.get('href'))
    mycursor.execute(sql, val)

mydb.commit()
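If you want the extra validation mentioned above, here is a minimal sketch of what it could look like. It assumes the same "links" table and the soup, main_link, mycursor and mydb objects from the snippet above, and it makes its own assumptions about what counts as a valid link (non-empty href, resolved to an absolute http(s) URL); adjust the checks to your needs.

from urllib.parse import urljoin

# Collect only usable links before touching the database.
rows = []
for link in soup.findAll('a'):
    href = link.get('href')
    if not href:
        continue                      # skip <a> tags without an href
    href = urljoin(main_link, href)   # resolve relative URLs against the page
    if href.startswith(("http://", "https://")):
        rows.append((main_link, href))

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
mycursor.executemany(sql, rows)       # one row per link, sent as a batch
mydb.commit()

executemany sends all rows in one call instead of executing the INSERT once per link, which is usually a bit faster and keeps the loop free of database calls.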