Our customer uses Azure Blob Storage to store large files so that we can work with them through an Azure online service.
We want to read and process these files directly on an Azure compute resource, without first downloading them into another Azure service such as Azure Machine Learning Studio.
So far, we have not been able to access the data in the blob storage without downloading it into Azure Machine Learning Studio first.
Moreover, none of the files we want to read is one of these types:
However, they can be read with the help of a Python extension.
How can we access the data in the blob storage without downloading it beforehand?
Is it possible in Azure to mount the blob storage into Machine Learning Studio somehow?
For your information: we are not tied to Azure Machine Learning Studio; any Azure online service with a compute resource would do.
This issue is related to these StackOverflow questions:
Azure Blob – Read using Python
read file from azure blob storage in python
Answer
This is the solution that works for me:
First, register the blob storage container as a datastore in Azure Machine Learning Studio.
Then, within an Azure notebook:
```python
from adlfs import AzureBlobFileSystem  # pip install adlfs
from azureml.core import Workspace, Datastore, Dataset
from azureml.data.datapath import DataPath

# Load the workspace from the saved config file
ws = Workspace.from_config()
ds = ws.get_default_datastore()

container_name = ds.container_name
storage_options = {"account_name": "Storage account name",
                   "account_key": ds.account_key}
fs = AzureBlobFileSystem(**storage_options)
```
Then you can use fs.ls("blob-storage-container-name") and fs.glob("blob-storage-container-name/**/*.png") to search through the blob storage container.
fs.isdir('blob-storage-container-name/path/to/folder') and fs.isfile('blob-storage-container-name/path/to/file') also work as expected.
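Since adlfs implements the generic fsspec filesystem interface, you can try out the same calls locally against fsspec's in-memory filesystem before touching Azure. A minimal sketch, with placeholder container and file names, no Azure account needed:

```python
import fsspec

# In-memory stand-in for AzureBlobFileSystem; both implement the
# same fsspec interface (ls, glob, isdir, isfile, open, ...).
fs = fsspec.filesystem("memory")

# Create a few fake "blobs" to query.
with fs.open("container/data/a.png", "wb") as f:
    f.write(b"fake png bytes")
with fs.open("container/data/b.txt", "wb") as f:
    f.write(b"notes")

# List names only (detail=False), match by extension, and test paths.
print(fs.ls("container/data", detail=False))
print(fs.glob("container/**/*.png"))
print(fs.isdir("container/data"), fs.isfile("container/data/b.txt"))
```

Swapping the memory filesystem for the AzureBlobFileSystem instance from above leaves the rest of the code unchanged.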
You can also use os to get information about the file location and its name.
```python
import os

my_path = 'blob-storage-container-name/path/to/file'
print(os.path.split(my_path))
```
Please note that you cannot create folders the way you usually would with the fs.mkdir() command!
Instead, when you create a file, you can specify the location within the blob storage container where the file should be saved.
```python
with fs.open('blob-storage-container-name/path/to/file/Folder1/Folder2/readme.txt', 'w') as f:
    f.write('working')
```
After you execute the command, you will see that Folder1 and Folder2 have been created.
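You can check this behavior offline as well: fsspec's in-memory filesystem follows the same interface as adlfs, and writing to a nested key implicitly creates the intermediate folders. A sketch with placeholder paths:

```python
import fsspec

fs = fsspec.filesystem("memory")

# No fs.mkdir() call: writing the file is enough to make
# Folder1 and Folder2 appear as directories.
with fs.open("container/Folder1/Folder2/readme.txt", "w") as f:
    f.write("working")

print(fs.isdir("container/Folder1/Folder2"))           # the folders now exist
print(fs.cat("container/Folder1/Folder2/readme.txt"))  # file contents as bytes
```

This mirrors how blob storage works in general: "folders" are just prefixes of blob names, so they exist exactly when some blob uses them.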