
Access data within the blob storage without downloading

Our customer uses Azure Blob Storage to store large files so that we can work with them using an Azure online service.

We want to read and process these files directly with an Azure compute resource, without first downloading them into another Azure service such as Azure Machine Learning Studio.

So far, we have been unable to access the data in the blob storage without first downloading it into Azure Machine Learning Studio.

Moreover, none of the files we want to read is of one of these types: [image: file types]

However, they can be read with the help of a Python package.

How can we access the data in the blob storage without downloading it beforehand?
Is it possible to mount the blob storage in Azure Machine Learning Studio somehow?
For your information: we do not have to use Azure Machine Learning Studio; any Azure online service with a compute resource would do.

This issue is related to these StackOverflow questions:
Azure Blob – Read using Python
read file from azure blob storage in python


Answer

This is the solution that works for me:

First, register the blob storage container as a datastore in Azure Machine Learning Studio.
Then, in an Azure Notebook:

from adlfs import AzureBlobFileSystem  # pip install adlfs
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()

# Datastore that wraps the blob storage container
ds = ws.get_default_datastore()
container_name = ds.container_name

# Credentials of the storage account behind the datastore
storage_options = {"account_name": ds.account_name, "account_key": ds.account_key}

# File-system-like interface to the storage account
fs = AzureBlobFileSystem(**storage_options)

Then you can use fs.ls("blob-storage-container-name") and fs.glob("blob-storage-container-name/**/*.png") to browse and search the blob storage container.

fs.isdir('blob-storage-container-name/path/to/folder') and fs.isfile('blob-storage-container-name/path/to/file') also work as expected.
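
The two calls can be combined to collect only real files that match a pattern. This is a minimal sketch, not part of adlfs: find_files is a hypothetical helper name, and it assumes fs is any object that offers glob() and isfile(), such as the AzureBlobFileSystem created above.

```python
def find_files(fs, container, pattern="**/*"):
    """Return the sorted paths under `container` that match `pattern` and are files.

    `fs` is assumed to provide glob() and isfile(), as fsspec-style
    file systems such as AzureBlobFileSystem do.
    """
    matches = fs.glob(f"{container}/{pattern}")
    # glob() may also return folder-like entries, so keep files only
    return sorted(path for path in matches if fs.isfile(path))
```

For example, find_files(fs, "blob-storage-container-name", "**/*.png") would list every PNG file below the container's top level.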

You can also use os.path to separate a file's location from its name.

import os
my_path = 'blob-storage-container-name/path/to/file'
print(os.path.split(my_path))
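
If you also need the container name on its own, a small helper can take the path apart further. This is only a sketch: split_blob_path is a hypothetical name, and it uses posixpath instead of os.path on the assumption that blob paths always use forward slashes, whatever operating system the notebook runs on.

```python
import posixpath


def split_blob_path(path):
    """Split 'container/folder/.../file' into (container, folder, file name)."""
    # The first path component is the container; the rest is the blob name
    container, _, rest = path.strip("/").partition("/")
    # posixpath always treats "/" as the separator, independent of the OS
    folder, name = posixpath.split(rest)
    return container, folder, name


print(split_blob_path("blob-storage-container-name/path/to/file.png"))
# ('blob-storage-container-name', 'path/to', 'file.png')
```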

Please note that you cannot create folders the usual way with fs.mkdir()! In a standard blob container, folders exist only as prefixes of blob names.
Instead, when you create a file, specify the location within the blob storage container where it should be saved.

with fs.open('blob-storage-container-name/path/to/file/Folder1/Folder2/readme.txt', 'w') as f:
    f.write('working')

After running the command, you will see that Folder1 and Folder2 have been created.
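
Reading works the same way as writing, which is what lets you work with blob data without downloading whole files first. The sketch below is an illustration under assumptions: read_head is a hypothetical helper name, and it relies on fs.open() returning a file-like object, as it does in the write example above. As far as I know, fsspec-style file objects fetch data in blocks on demand, so only the requested part should be transferred.

```python
def read_head(fs, path, size=1024):
    """Read up to `size` bytes from the start of the remote file at `path`."""
    # Binary mode returns the raw bytes as stored in the blob
    with fs.open(path, "rb") as f:
        return f.read(size)
```

For example, read_head(fs, 'blob-storage-container-name/path/to/file/Folder1/Folder2/readme.txt') should return the bytes written above. The same file object can usually be passed directly to a Python package that accepts file-like objects.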

User contributions licensed under: CC BY-SA