How to fill a CSV file incrementally with data from a Python dictionary that is still being built

I would like to know how to save data from my Python dictionary to a CSV file while the dictionary is still being built, i.e. as soon as a row is created, it should be written directly to the CSV file.

I'm using the following code:

data = []

with open('urls.txt', 'r') as inf:
    for row in inf:
        url = row.strip()
        response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})
            
        if response.ok:
            try:
                soup = BeautifulSoup(response.text,"html.parser")
                text = soup.select_one('div.para_content_text').get_text(strip=True)
                topic = soup.select_one('div.article_tags_topics').get_text(strip=True)
                tags = soup.select_one('div.article_tags_tags').get_text(strip=True)

            except AttributeError:
                print (" ")

                data.append(
                    {
                    'text':text,
                    'topic': topic,
                    'tags':tags
                    }
                )

    pd.DataFrame(data).to_csv('text.csv', index = False, header=True)
    time.sleep(3)

I would like to obtain one column each for text, topic and tags. Do you have an idea how to change my two-step code (build the dictionary, then convert it to CSV) into a dynamic one?

Answer

I reshuffled your code a bit:

1. I moved data.append into the try block. In your version it sat inside the except block, so a row was only appended when an AttributeError was raised, i.e. when parsing had failed.
2. I moved pd.DataFrame(data).to_csv into the try block as well, so the CSV is re-saved every time a new row is appended to the list.

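import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

data = []

with open('urls.txt', 'r') as inf:
    for row in inf:
        url = row.strip()
        response = requests.get(url, headers={'User-agent': 'Mozilla/5.0'})

        if response.ok:
            try:
                soup = BeautifulSoup(response.text, "html.parser")
                text = soup.select_one('div.para_content_text').get_text(strip=True)
                topic = soup.select_one('div.article_tags_topics').get_text(strip=True)
                tags = soup.select_one('div.article_tags_tags').get_text(strip=True)

                # Only reached when all three elements were found.
                data.append(
                    {
                        'text': text,
                        'topic': topic,
                        'tags': tags
                    }
                )

                # Re-save the whole CSV each time a new row is appended.
                pd.DataFrame(data).to_csv('text.csv', index=False, header=True)

            except AttributeError:
                # select_one() returned None because an element is missing on this page.
                print(" ")

        # Pause between requests; inside the loop so it runs once per URL.
        time.sleep(3)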
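
Re-saving the whole DataFrame rewrites every earlier row on each iteration. If that becomes slow for a long URL list, one possible alternative is to append each row to the file as soon as it is created, using csv.DictWriter from the standard library. The following is only a sketch under assumptions: the helper name append_row and the FIELDNAMES constant are made up for illustration and are not part of the code above.

```python
import csv
import os

# Column order for the CSV; assumed to match the dictionary keys used above.
FIELDNAMES = ["text", "topic", "tags"]


def append_row(csv_path, row, fieldnames=FIELDNAMES):
    """Append one dictionary as a CSV row (hypothetical helper).

    The header is written only when the file does not exist yet or is empty,
    so calling this repeatedly yields one header followed by the data rows.
    """
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Inside the try block, the data.append(...) and to_csv(...) calls could then be replaced by a single call such as append_row('text.csv', {'text': text, 'topic': topic, 'tags': tags}). Because the file is opened in append mode, running the script a second time adds rows to the existing file, so delete text.csv first if you want a fresh start.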
User contributions licensed under: CC BY-SA