I have a mount point that points to a blob storage container holding multiple files. We need to find the last modified date of each file along with the file name. The list of files and the script I am using are below:
/mnt/schema_id=na/184000-9.jsonl
/mnt/schema_id=na/185000-0.jsonl
/mnt/schema_id=na/185000-22.jsonl
/mnt/schema_id=na/185000-25.jsonl

import os
import time

# Path to the file/directory
path = "/mnt/schema_id=na"

ti_c = os.path.getctime(path)
ti_m = os.path.getmtime(path)

c_ti = time.ctime(ti_c)
m_ti = time.ctime(ti_m)

print(f"The file located at the path {path} was created at {c_ti} and was last modified at {m_ti}")
Answer
Operating-system-level calls such as os.path.getmtime work against the local file system, so they can't see /mnt/schema_id=na directly: on Databricks that location lives on the Databricks File System (DBFS).
To reach it from Python's file APIs, prepend /dbfs to the path, and query each file in the directory rather than the directory itself:
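import os

path = "/dbfs/mnt/schema_id=na"
for file_item in os.listdir(path):
    # The os.path functions need the full local path, including /dbfs
    file_path = os.path.join(path, file_item)
    ti_c = os.path.getctime(file_path)
    ti_m = os.path.getmtime(file_path)
    # DBFS-style path without the /dbfs prefix
    dbfs_path = file_path[5:]

Note the [5:] slice: it strips the /dbfs prefix from the path to make it compatible with DBFS. Apply it only after the os.path calls, which need the full /dbfs path.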
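To get the file name together with its last modified date, as the question asks, the loop can be extended roughly as follows. This is only a minimal sketch: the helper name `list_modified_times` is hypothetical (it is not part of any Databricks API), and it assumes the mount is readable through the local /dbfs path on your cluster. Timestamps are converted to UTC so the output does not depend on the cluster's time zone.

```python
import os
from datetime import datetime, timezone


def list_modified_times(directory):
    """Return (file name, last-modified datetime in UTC) pairs, newest first.

    Hypothetical helper for illustration; not part of any Databricks API.
    """
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories; only regular files are reported.
            if entry.is_file():
                modified = datetime.fromtimestamp(
                    entry.stat().st_mtime, tz=timezone.utc
                )
                results.append((entry.name, modified))
    # Sort so the most recently modified file comes first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    # Assumed location: the question's mount, seen through the local /dbfs path.
    path = "/dbfs/mnt/schema_id=na"
    if os.path.isdir(path):
        for name, modified in list_modified_times(path):
            print(f"{name} was last modified at {modified.isoformat()}")
```

Sorting newest first means the first pair returned is the most recently modified file, which is handy if you only need the latest one.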