I would like to know how to save data from my Python dictionary to a CSV file while the dictionary is still being built, i.e. as soon as a new entry is created it should be written directly to the CSV file.
I'm using the following code:
```python
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

data = []

with open('urls.txt', 'r') as inf:
    for row in inf:
        url = row.strip()
        response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})

        if response.ok:
            try:
                soup = BeautifulSoup(response.text, "html.parser")
                text = soup.select_one('div.para_content_text').get_text(strip=True)
                topic = soup.select_one('div.article_tags_topics').get_text(strip=True)
                tags = soup.select_one('div.article_tags_tags').get_text(strip=True)

            except AttributeError:
                print(" ")

            data.append(
                {
                    'text': text,
                    'topic': topic,
                    'tags': tags
                }
            )

        # Pause between requests
        time.sleep(3)

# Second step: convert all collected dictionaries to CSV at once
pd.DataFrame(data).to_csv('text.csv', index=False, header=True)
```
I would like the CSV to have one column each for text, topic and tags. Do you have an idea how to change my two-step code (build the list of dictionaries, then convert it to CSV) into a dynamic one?
Answer
I reshuffled your code a bit:
1. I moved `data.append` into the `try` block, so a record is appended only when all three fields were extracted successfully.
2. I moved `pd.DataFrame(data).to_csv` into the `try` block as well, so the CSV is re-saved every time new data is appended to the list.
```python
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

data = []

with open('urls.txt', 'r') as inf:
    for row in inf:
        url = row.strip()
        response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})

        if response.ok:
            try:
                soup = BeautifulSoup(response.text, "html.parser")
                # select_one() returns None when nothing matches, so .get_text()
                # then raises AttributeError and this page is skipped
                text = soup.select_one('div.para_content_text').get_text(strip=True)
                topic = soup.select_one('div.article_tags_topics').get_text(strip=True)
                tags = soup.select_one('div.article_tags_tags').get_text(strip=True)

                # Only reached when all three fields were found
                data.append(
                    {
                        'text': text,
                        'topic': topic,
                        'tags': tags
                    }
                )

                # Rewrite the CSV with everything collected so far
                pd.DataFrame(data).to_csv('text.csv', index=False, header=True)

            except AttributeError:
                print(" ")

        # Pause between requests
        time.sleep(3)
```
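If rewriting the whole file on every iteration becomes slow as `data` grows, a possible alternative is to open the file once, write the header row, and write each dictionary the moment it exists. The sketch below swaps pandas for the standard-library `csv` module (`csv.DictWriter`); the function name `write_rows_incrementally` and the `FIELDNAMES` list are illustrative choices, not part of the original code, and `records` is assumed to be any iterable of dicts with the keys `text`, `topic` and `tags`.

```python
import csv
import os
import tempfile
from typing import Iterable, Mapping

# Column order of the CSV; the first line of the file will be "text,topic,tags"
FIELDNAMES = ["text", "topic", "tags"]


def write_rows_incrementally(records: Iterable[Mapping[str, str]], path: str) -> int:
    """Write each record to `path` as soon as it is produced; return the row count.

    Hypothetical helper: `records` can be a generator, so rows reach the file
    one at a time instead of being collected in a list first.
    """
    count = 0
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for record in records:
            writer.writerow(record)
            f.flush()  # hand the row to the OS now rather than at close time
            count += 1
    return count


if __name__ == "__main__":
    def fake_records():
        # Stand-in for the scraping loop: yields one dict per page
        yield {"text": "first article", "topic": "news", "tags": "a,b"}
        yield {"text": "second article", "topic": "sport", "tags": "c"}

    out = os.path.join(tempfile.mkdtemp(), "text.csv")
    print(write_rows_incrementally(fake_records(), out))
```

To use it with the scraping loop, one option is to turn the loop body into a generator that `yield`s each dictionary instead of calling `data.append`, and pass that generator as `records`. Because the file is opened in `"w"` mode, each run starts a fresh file; opening it in `"a"` mode (and calling `writeheader()` only when the file is empty) would keep appending across runs instead.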