
Get contents of all Azure blobs via Python

I want to list all the blobs in a container and ultimately load each blob's contents (each blob stores a CSV file) into a data frame. The blob service client appears to be the easiest way to list all the blobs, and this is what I have:

#!/usr/bin/env python3

import os
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
from pathlib import Path
from io import StringIO
import pandas as pd

def main():
    connect_str = os.environ['AZURE_CONNECT_STR']
    container   = os.environ['CONTAINER']

    print(connect_str + "\n")
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    container_client = blob_service_client.get_container_client(container)
    blob_list = container_client.list_blobs()
    for blob in blob_list:
        print("\t" + blob.name)

if __name__ == "__main__":
    main()

However, in the latest version of the blob storage client there appears to be no method that lets me get the actual contents of a blob. What code should I be using? There are other clients in the Python SDK for Azure, but getting a full list of the blobs in a container with them seems cumbersome.


Answer

What you need to do is create an instance of BlobClient using the container_client and the blob’s name. You can then call the download_blob method to download the blob.

Something like:

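for blob in blob_list:
  print("\t" + blob.name)
  blob_client = container_client.get_blob_client(blob.name)
  # download_blob() returns a stream downloader; readall() returns the blob's bytes
  content = blob_client.download_blob().readall()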
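
To get from there to the data frame mentioned in the question, one possible approach is to parse each downloaded blob with pandas and concatenate the results. The following is only a sketch under a few assumptions: every blob is a CSV file with a header row, the files share the same columns, and everything fits in memory. The function names `blobs_to_dataframe` and `load_container` and the `source_blob` column are made up for illustration and are not part of the Azure SDK.

```python
from io import BytesIO

import pandas as pd


def blobs_to_dataframe(container_client):
    """Download every blob in the container and combine them into one DataFrame.

    Assumes each blob holds CSV text with a header row (an assumption taken
    from the question, not something the SDK checks).
    """
    frames = []
    for blob in container_client.list_blobs():
        blob_client = container_client.get_blob_client(blob.name)
        # download_blob() returns a stream downloader; readall() returns bytes.
        data = blob_client.download_blob().readall()
        frame = pd.read_csv(BytesIO(data))
        # Hypothetical extra column so each row can be traced back to its blob.
        frame["source_blob"] = blob.name
        frames.append(frame)
    # pd.concat raises on an empty list, so return an empty frame in that case.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


def load_container(connect_str, container):
    """Build a container client from a connection string and load its blobs."""
    # Imported here so blobs_to_dataframe can be used without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connect_str)
    return blobs_to_dataframe(service.get_container_client(container))
```

With the environment variables from the question, this could be called as `df = load_container(os.environ['AZURE_CONNECT_STR'], os.environ['CONTAINER'])`. If the blobs are large or numerous, reading them all into memory at once may not be practical, and processing one frame at a time may be a better fit.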
User contributions licensed under: CC BY-SA