I'm trying to get a file's creation-time metadata.
The file is in: Azure Storage
Accessing the data through: Databricks
Right now I'm using:
file_path = my_storage_path
dbutils.fs.ls(file_path)
but it returns
[FileInfo(path='path_myFile.csv', name='fileName.csv', size=437940)]
This gives me no information about the creation time. Is there a way to get that information?
Other solutions on Stack Overflow refer to files that are already in Databricks (for example, "Does databricks dbfs support file metadata such as file/folder create date or modified date"). In my case, we access the data from Databricks, but the data is in Azure Storage.
Answer
It depends on the version of Databricks Runtime (DBR) that you're using. For example, the modification timestamp is available if you use DBR 10.2 (I didn't test with 10.0/10.1, but it's definitely not available on 9.1).
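To check what your runtime actually returns, a minimal sketch like the one below may help. It assumes the newer FileInfo objects expose the timestamp as an attribute named modificationTime and that older runtimes simply lack it; the helper name modification_times and the FakeFileInfo stand-in (including its timestamp value) are hypothetical and only there so the snippet can run outside a cluster.

```python
from collections import namedtuple


def modification_times(entries):
    """Map each entry's path to its modificationTime, or None if the attribute is absent."""
    return {entry.path: getattr(entry, "modificationTime", None) for entry in entries}


# Hypothetical stand-in for dbutils.fs.ls() output, so the sketch runs off-cluster.
# The timestamp value here is made up for illustration.
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name", "size", "modificationTime"])
sample = [FakeFileInfo("path_myFile.csv", "fileName.csv", 437940, 1640995200000)]
print(modification_times(sample))

# On Databricks, pass the real listing instead:
# print(modification_times(dbutils.fs.ls(file_path)))
```

If every value comes back as None, the runtime most likely does not report the timestamp through dbutils.fs.ls.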
If you need that information, you can use the Hadoop FileSystem API via the Py4J gateway, like this:
URI = sc._gateway.jvm.java.net.URI
Path = sc._gateway.jvm.org.apache.hadoop.fs.Path
FileSystem = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
Configuration = sc._gateway.jvm.org.apache.hadoop.conf.Configuration

fs = FileSystem.get(URI("/tmp"), Configuration())
status = fs.listStatus(Path('/tmp/'))
for fileStatus in status:
    print(f"path={fileStatus.getPath()}, size={fileStatus.getLen()}, mod_time={fileStatus.getModificationTime()}")
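Note that Hadoop's FileStatus reports a modification time (and an access time), but as far as I know it has no creation-time getter, so the modification time is the closest you'll get this way. getModificationTime() returns milliseconds since the Unix epoch, so you'll probably want to convert it. A minimal sketch, where the helper name millis_to_utc and the sample value are hypothetical:

```python
from datetime import datetime, timezone


def millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Sample value chosen for illustration only.
print(millis_to_utc(1640995200000).isoformat())  # 2022-01-01T00:00:00+00:00
```

Inside the loop you could then print millis_to_utc(fileStatus.getModificationTime()) instead of the raw number.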