Export results to excel file title and link requests python [closed]

Question

I am training on how to scrape some data in Python and here's my try: The code gets the links

Accepted Answer

You can actually do it with a single list comprehension. Basically, what you have is the right approach; you just need to create a list of lists using your list comprehension. For each match returned by soup.select, you can extract both the text and the href together.

Then, using the csv module, you can pass this list of lists to the writerows method of a csv.writer to create the CSV file for viewing in Excel or other tools, data processing, etc.

You can also optionally prepend a header to the list of lists, if you want, e.g. ['Title', 'URL'].

Here is a full working example:

    from bs4 import BeautifulSoup
    import csv
    import requests

    url = 'https://learndataanalysis.org/python-tutorial/page/10'
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')

    # one [title, link] pair per matching anchor
    data = [[i.text, i['href']] for i in soup.select('h2.entry-title a')]

    # optional, if you want to add a header line
    data.insert(0, ['Title', 'URL'])

    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open('output_data.csv', 'w', newline='') as output_file:
        writer = csv.writer(output_file, delimiter=',', quoting=csv.QUOTE_ALL)
        writer.writerows(data)

Note that csv.QUOTE_ALL isn't strictly necessary, but it's often a good idea to force quoting on all fields.

If you instead want to export to XLSX format, it's best to use the pandas module:

    import pandas as pd

    # data here should not include the optional header row added above,
    # since the column names are passed separately
    df = pd.DataFrame(data, columns=['Title', 'URL'])
    df.to_excel('output_data.xlsx')

This will by default also export the row numbers. If you prefer to omit them, you can use the pandas.ExcelWriter class, as in this post (passing index=False to to_excel also omits them).

Edit:

If you also want to extract the dates, then you can do so with a separate list comprehension (since the date information is in a different HTML element altogether). Then, you can use zip to combine the information together.

    data = [[i.text, i['href']] for i in soup.select('h2.entry-title a')]
    dates = [i.text for i in soup.select('span.published')]
    data = [i + [j] for i, j in zip(data, dates)]
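One caveat with the zip step: zip stops at the shorter input, so if a post on the page has no date element, rows are dropped silently and dates may end up on the wrong titles. Below is a minimal sketch of a stricter combine step, using only the standard library. The sample titles, URLs, and dates are made-up placeholders standing in for the scraped values, and the helper names combine_rows and rows_to_csv are hypothetical, not part of the original answer. It writes to an in-memory buffer so it can be run without touching the network or the disk.

```python
import csv
import io

# Hypothetical sample values standing in for the scraped results;
# in the answer they would come from the soup.select(...) comprehensions.
data = [
    ["First post", "https://example.com/first"],
    ["Second post", "https://example.com/second"],
]
dates = ["January 1, 2020", "February 2, 2020"]


def combine_rows(rows, extra):
    """Append one extra value to each row, refusing mismatched lengths."""
    if len(rows) != len(extra):
        raise ValueError(f"{len(rows)} rows but {len(extra)} extra values")
    return [row + [value] for row, value in zip(rows, extra)]


def rows_to_csv(rows, header):
    """Return the header and rows as CSV text with every field quoted."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


combined = combine_rows(data, dates)
csv_text = rows_to_csv(combined, ["Title", "URL", "Date"])
print(csv_text)
```

To write a real file instead, the same csv.writer calls can be pointed at a file opened with newline='', as in the example in the answer. If the lengths do differ on a real page, one option is to loop over each post's container element and pull the title, link, and date from inside it, so the three values always stay together.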

Answer