Hello, I would like to add each link separately to the database. When I print out “new_lst” it displays every link, so I think it is putting the whole outcome in one row and not in separate rows. My code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector


mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="webscraper"
)

req = Request("https://google.com")
html_page = urlopen(req)
main_link = "https://google.com"

soup = BeautifulSoup(html_page, "html.parser")

links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))
new_lst = ('"'.join(links))
mycursor = mydb.cursor()

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
val = (main_link, new_lst)
mycursor.execute(sql, val)
mydb.commit()
Answer
You are already iterating over the links with a for loop.
Yes, the whole outcome ends up in one row, because you combine all the links into a single string with new_lst = ('"'.join(links)).
You can avoid this by inserting one item at a time inside the loop you already have. Note that this approach does no checking or validation before putting the data into the database; I would add some extra checks, if need be, before running the SQL command.
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import mysql.connector


mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="webscraper"
)

req = Request("https://google.com")
html_page = urlopen(req)
main_link = "https://google.com"
soup = BeautifulSoup(html_page, "html.parser")
mycursor = mydb.cursor()

sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"

# Insert one row per <a> tag instead of joining all links into one string
for link in soup.find_all('a'):
    # link is a Tag object; store its href attribute, not the tag itself
    val = (main_link, link.get('href'))
    mycursor.execute(sql, val)

# Commit once after all rows have been inserted
mydb.commit()
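The extra checks mentioned above could look something like the sketch below. It is only one possible approach, not part of the original answer: the helper names clean_links and insert_links are hypothetical, and the rules (skip empty hrefs, resolve relative links against main_link, keep only http/https, drop duplicates) are assumptions about what "validation" should mean here. It also batches the inserts with the cursor's executemany instead of calling execute once per link.

```python
from urllib.parse import urljoin, urlparse


def clean_links(main_link, hrefs):
    """Return (main_link, absolute_url) rows, skipping unusable hrefs.

    Hypothetical helper: the filtering rules are illustrative assumptions.
    """
    rows = []
    seen = set()
    for href in hrefs:
        # <a> tags without an href attribute yield None from link.get('href')
        href = (href or "").strip()
        if not href:
            continue
        # Resolve relative links such as "/about" against the scraped page
        url = urljoin(main_link, href)
        # Drop mailto:, javascript:, and other non-web schemes
        if urlparse(url).scheme not in ("http", "https"):
            continue
        # Skip links that were already collected
        if url in seen:
            continue
        seen.add(url)
        rows.append((main_link, url))
    return rows


def insert_links(cursor, rows):
    """Insert one database row per link using a DB-API cursor.

    Hypothetical helper: table and column names are taken from the question.
    """
    sql = "INSERT INTO links (main_link, link_scraped) VALUES (%s, %s)"
    cursor.executemany(sql, rows)
```

With the names from the code above, this would be used as rows = clean_links(main_link, [a.get('href') for a in soup.find_all('a')]), followed by insert_links(mycursor, rows) and mydb.commit().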