
How to put each link separately in the database with BeautifulSoup (Python)

Hello, I would like to add each link to the database separately. When I print out new_lst it displays every link, so I think it is putting the whole outcome into one row instead of separate rows. My code:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector


mydb = mysql.connector.connect(
  host="localhost",
  user="root",
  password="",
  database="webscraper"
)

req = Request("https://google.com")
html_page = urlopen(req)
main_link = "https://google.com"

soup = BeautifulSoup(html_page, "html.parser")

links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))
    new_lst = ('"'.join(links))
    mycursor = mydb.cursor()

    sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
    val = (main_link, new_lst)
    mycursor.execute(sql, val)
    mydb.commit()


Answer

You are already iterating over the links with a for loop. Yes, it is putting the whole outcome into one row, because you are combining the links with new_lst = ('"'.join(links)). This can be avoided by inserting one item at a time inside the loop you already have. Note that this approach doesn't do any checking or validation before writing to the database, so I would add some extra checks if needed before executing the SQL command.

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector


mydb = mysql.connector.connect(
  host="localhost",
  user="root",
  password="",
  database="webscraper"
)

req = Request("https://google.com")
html_page = urlopen(req)
main_link = "https://google.com"
soup = BeautifulSoup(html_page, "html.parser")
mycursor = mydb.cursor()

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"

for link in soup.findAll('a'):
    # insert one row per link; store the href attribute, not the whole <a> tag
    val = (main_link, link.get('href'))
    mycursor.execute(sql, val)

mydb.commit()
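
As a minimal sketch of the extra checks mentioned above (reusing the same soup, main_link, mycursor and mydb from the code, with skip rules that are only illustrative), you could ignore anchors without an href or with in-page fragments, resolve relative URLs with urllib.parse.urljoin, and deduplicate before inserting:

from urllib.parse import urljoin

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
seen = set()

for link in soup.findAll('a'):
    href = link.get('href')
    # skip <a> tags without an href and in-page fragment links
    if not href or href.startswith('#'):
        continue
    # turn relative URLs into absolute ones based on the scraped page
    absolute = urljoin(main_link, href)
    # only insert each URL once
    if absolute in seen:
        continue
    seen.add(absolute)
    mycursor.execute(sql, (main_link, absolute))

mydb.commit()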
