I would like to know how to save data from my Python dictionary to a CSV file while it is being created (i.e. as soon as a dictionary row is created, it should be written directly to the CSV file).
I'm using the following code:
data = []
with open('urls.txt', 'r') as inf:
    for row in inf:
        url = row.strip()
        response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
        if response.ok:
            try:
                soup = BeautifulSoup(response.text, "html.parser")
                text = soup.select_one('div.para_content_text').get_text(strip=True)
                topic = soup.select_one('div.article_tags_topics').get_text(strip=True)
                tags = soup.select_one('div.article_tags_tags').get_text(strip=True)
            except AttributeError:
                print(" ")
            data.append(
                {
                    'text': text,
                    'topic': topic,
                    'tags': tags
                }
            )
pd.DataFrame(data).to_csv('text.csv', index=False, header=True)
time.sleep(3)
I would like to obtain one column each for text, topic and tags. Do you have an idea how to change my two-step code (build the list of dictionaries, then convert it to CSV) into a dynamic one?
Answer
I reshuffled your code a bit:
1. I moved data.append into the try block, so a row is only appended when all three fields were extracted.
2. I moved the to_csv call into the try block as well, so the CSV is re-saved every time new data is appended to the list.
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

data = []
with open('urls.txt', 'r') as inf:
    for row in inf:
        url = row.strip()
        response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
        if response.ok:
            try:
                soup = BeautifulSoup(response.text, "html.parser")
                # select_one returns None when nothing matches, so .get_text
                # raises AttributeError and the page is skipped
                text = soup.select_one('div.para_content_text').get_text(strip=True)
                topic = soup.select_one('div.article_tags_topics').get_text(strip=True)
                tags = soup.select_one('div.article_tags_tags').get_text(strip=True)
                data.append(
                    {
                        'text': text,
                        'topic': topic,
                        'tags': tags
                    }
                )
                # rewrite the whole CSV after every new row
                pd.DataFrame(data).to_csv('text.csv', index=False, header=True)
            except AttributeError:
                print(" ")
        # pause between requests
        time.sleep(3)
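Note that this rewrites the whole file on every iteration. If that becomes slow with many URLs, one alternative is to append each row to the file as soon as it is built. Below is a minimal sketch using the standard library's csv.DictWriter. The helper name append_row and the FIELDNAMES constant are my own and not part of your code, and I am assuming the three column names from your dictionary.

```python
import csv
import os

# Column order for the CSV; assumed from the keys used in the question.
FIELDNAMES = ["text", "topic", "tags"]


def append_row(path, row, fieldnames=FIELDNAMES):
    """Append one dict to the CSV at `path`, writing the header only once."""
    # The header is needed only if the file is missing or still empty.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode "a" appends; newline="" lets the csv module control line endings.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
```

Inside the try block you would then call append_row('text.csv', {'text': text, 'topic': topic, 'tags': tags}) in place of data.append and to_csv. Because the file is opened in append mode, rows from an earlier run stay in the file, so delete text.csv first if you want a fresh start.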