How can I web-scrape information from an HTML element and save it to an Excel row using BeautifulSoup and an Excel writer (pandas)? [closed]

Question

I'm new to Python and I'm doing this for my project. Can someone help me save it to an Excel file? This is needed for multiple

Accepted Answer

I would suggest using openpyxl directly rather than via Pandas, as this gives you much greater control over how your Excel file is formatted. Here is how you could build up multiple rows in an Excel file:

    import requests
    from bs4 import BeautifulSoup
    import openpyxl
    from openpyxl.styles import Alignment
    from openpyxl.styles.borders import Border, Side
    from openpyxl.utils import get_column_letter

    website_url = "https://www.example.com/"
    # Note: verify=False disables TLS certificate verification
    res = requests.get(website_url, verify=False)
    soup = BeautifulSoup(res.text, 'lxml')

    # Collect the href attribute of every job title link on the listing page
    links = soup.find_all("a", {"class": "jobTitleLink"})
    urls = [tag.get('href') for tag in links]

    wb = openpyxl.Workbook()
    ws = wb.active

    # Header row: (column title, column width)
    columns = [
        ("SL No", 10),
        ("Job Title", 25),
        ("Company Name", 20),
        ("Posted on", 13),
        ("Closing on", 13),
        ("Location", 20),
        ("Description", 40),
        ("Skills", 70),
        ("Link Email", 30),
    ]

    thin_border = Border(
        left=Side(style='thin'), right=Side(style='thin'),
        top=Side(style='thin'), bottom=Side(style='thin'))

    # Write the header row and set each column's width
    for col_number, (value, width) in enumerate(columns, start=1):
        ws.cell(column=col_number, row=1, value=value).border = thin_border
        ws.column_dimensions[get_column_letter(col_number)].width = width

    row_number = 2

    # Follow the second to fifth job links (urls[1:5]) and scrape each page
    for x in urls[1:5]:
        res = requests.get(f'https://www.example.com/{x}', verify=False)
        soup = BeautifulSoup(res.text, 'lxml')

        # Each unstyled div.block becomes a list of its stripped text lines
        data = []
        for div_block in soup.find_all('div', class_='block', style=None):
            data.append([line.strip() for line in div_block.stripped_strings])

        li_fr = soup.find('li', class_="fr")
        company_name = li_fr.a.text
        location = list(li_fr.find_next_sibling('li').stripped_strings)[1]

        # Write a data row
        row = [
            '',  # SL No
            data[0][0],  # Job title
            company_name,  # Company name
            data[1][1],  # Posted on
            data[2][1],  # Closing on
            location,  # Location
            data[4][1],  # Description
            '\n'.join(data[5][1:]),  # Skills, one per line
            data[3][1],  # Link Email
        ]

        for col_number, value in enumerate(row, start=1):
            cell = ws.cell(column=col_number, row=row_number, value=value)
            cell.border = thin_border
            cell.alignment = Alignment(wrapText=True)

        row_number += 1

    wb.save('output.xlsx')
    print('Saved all the data')

This would give you an Excel sheet with a bordered header row followed by one bordered, text-wrapped row per job page. With extra work you can apply any styling you prefer.
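
The row-building step in the answer indexes straight into `data` (for example `data[4][1]`), so it raises `IndexError` as soon as one job page has fewer blocks than expected. Below is a minimal sketch of a more defensive version, assuming the same block order as in the answer. The helper names `pick` and `build_row` and the sample values are hypothetical and not part of the original code.

```python
def pick(data, block, item, default=""):
    """Return data[block][item], or default if either index is missing."""
    try:
        return data[block][item]
    except IndexError:
        return default


def build_row(serial, data, company_name, location):
    """Assemble one spreadsheet row in the answer's column order."""
    # Everything after the first entry of block 5 is treated as a skill
    skills = data[5][1:] if len(data) > 5 else []
    return [
        serial,              # SL No
        pick(data, 0, 0),    # Job title
        company_name,        # Company name
        pick(data, 1, 1),    # Posted on
        pick(data, 2, 1),    # Closing on
        location,            # Location
        pick(data, 4, 1),    # Description
        "\n".join(skills),   # Skills, one per line
        pick(data, 3, 1),    # Link Email
    ]


# Made-up sample in the shape produced by the scraping loop (hypothetical values)
sample = [
    ["Data Analyst"],
    ["Posted on", "01 Jan"],
    ["Closing on", "31 Jan"],
    ["Apply", "jobs@example.com"],
    ["Description", "Analyse sales data"],
    ["Skills", "Python", "Excel"],
]
row = build_row(1, sample, "Example Ltd", "Remote")
print(row)
```

Inside the scraping loop, `build_row(row_number - 1, data, company_name, location)` could stand in for the literal `row` list. This would also fill the "SL No" column, which the answer's code leaves blank.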

Answer