Our organisation runs Databricks on Azure, used primarily by data scientists and analysts for notebook-based ad-hoc analysis and exploration. We also run Kubernetes clusters for ETL workflows that don’t require Spark. We would like to use Delta Lake as our storage layer, where both Databricks an…
Tag: databricks
Databricks – How to pass accessToken to spark._sc._gateway.jvm.java.sql.DriverManager?
I would like to use Databricks to run some custom SQL with the function below. May I know how to add the “accessToken” as a property? It returns: Thanks! Answer It doesn’t work because DriverManager doesn’t have a method that accepts a HashMap created from a Python dict – it has…
How to use Selenium in Databricks and accessing and moving downloaded files to mounted storage and keep Chrome and ChromeDriver versions in sync?
I’ve seen a couple of posts on using Selenium in Databricks using %sh to install Chrome and ChromeDriver. This works fine for me, but I had a lot of trouble when I needed to download a file. The file would download, but I could not find it in the filesystem in Databricks. Even if I changed the download…
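Once the browser has written the file to a driver-local directory, the remaining step is copying it onto the mounted storage. A minimal sketch of that step, with temporary directories standing in for the hypothetical download folder and mount point (e.g. a path under `/dbfs/mnt/...`) — the directory names here are assumptions, not Databricks defaults:

```python
import shutil
import tempfile
from pathlib import Path

def move_downloads(download_dir, mount_dir, pattern="*"):
    """Move every file matching pattern from the driver-local download
    directory to the mounted storage directory; return the new paths."""
    dest = Path(mount_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in Path(download_dir).glob(pattern):
        target = dest / f.name
        shutil.move(str(f), str(target))
        moved.append(str(target))
    return moved

# Demo: src plays the role of Chrome's download dir, dst the mount point.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    (Path(src) / "report.csv").write_text("a,b\n1,2\n")
    moved = move_downloads(src, dst, "*.csv")
    print([Path(p).name for p in moved])  # → ['report.csv']
```

On Databricks, mounted storage is typically reachable through the `/dbfs` FUSE prefix, so `mount_dir` would be something like `/dbfs/mnt/<your-mount>`.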
Script to get the file last modified date and file name in PySpark
I have a mount point location pointing to a blob storage account where we have multiple files. We need to find the last modified date for a file along with the file name. I am using the below script, and the list of files is as below: Answer If you’re using operating system-level commands to get file i…
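Since the answer refers to operating-system-level commands, a sketch of that approach using only the standard library: list each file with its modification time from `os.scandir`. On Databricks, a mount is visible to OS calls through the `/dbfs` FUSE prefix (e.g. `/dbfs/mnt/<your-mount>`); the temporary directory below is just a stand-in for the test.

```python
import os
import tempfile
from datetime import datetime, timezone

def list_files_with_mtime(path):
    """Return sorted (file_name, last_modified) pairs for files under path."""
    rows = []
    for entry in os.scandir(path):
        if entry.is_file():
            mtime = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc)
            rows.append((entry.name, mtime))
    return sorted(rows)

# Demo against a temporary directory standing in for /dbfs/mnt/<mount>.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "sales.csv"), "w").close()
    for name, modified in list_files_with_mtime(d):
        print(name, modified.isoformat())
```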
Importing count() data for use within bokeh
I am trying to create a visualisation using the bokeh package, which I have imported into the Databricks environment. I have transformed the data from a raw data frame into something resembling the following (albeit much larger): From there, I wish to create a line graph using the bokeh package to show the num…
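The usual shape of this task is aggregating counts into parallel x/y lists that a bokeh line glyph can consume. A sketch of the aggregation step with the standard library only — the `rows` data here is invented for illustration, and the bokeh calls are shown commented since they depend on the package being attached to the cluster:

```python
from collections import Counter

# Hypothetical rows: (date, event) pairs extracted from the raw data frame.
rows = [
    ("2021-01-01", "click"), ("2021-01-01", "click"),
    ("2021-01-02", "click"), ("2021-01-03", "click"),
    ("2021-01-03", "click"), ("2021-01-03", "click"),
]
counts = Counter(date for date, _ in rows)
x = sorted(counts)            # categorical x-axis values
y = [counts[d] for d in x]    # count per date
print(x, y)  # → ['2021-01-01', '2021-01-02', '2021-01-03'] [2, 1, 3]

# With bokeh installed, these lists feed straight into a line glyph:
# from bokeh.plotting import figure
# p = figure(x_range=x)
# p.line(x=x, y=y)
```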
Filter out specific errors from Flake8 results
We are writing notebooks in Databricks. When we push them to Git, we want to run flake8 on them to check for new problems in the code. As Databricks has some predefined variables, these are undefined in the code itself. Is it possible to filter out errors like: While keeping errors like I am aware of the –…
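For the specific case of Databricks-injected globals, flake8 exposes a `builtins` option that tells pyflakes those names exist, so only they stop triggering F821 while every other undefined name still does. A configuration sketch (the list of names is an assumption based on common Databricks globals):

```ini
; .flake8 (or the [flake8] section of setup.cfg)
[flake8]
; Treat Databricks-injected globals as builtins so they no longer
; raise F821 (undefined name), while other F821s are still reported.
builtins = spark, dbutils, sc, sqlContext, display
; Coarser alternative: suppress the code entirely.
; extend-ignore = F821
```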
localhost refused to connect in a databricks notebook calling the google api
I read the Google API documentation pages (Drive API, pyDrive) and created a Databricks notebook to connect to Google Drive. I used the sample code from the documentation page as follows: The CRED_PATH includes the credential file path in /dbfs/FileStore/shared_uploads. The script prompts me with the URL to autho…
Is there a smtp client included in Databricks platform to be able to send emails?
How can you send an email from the Databricks platform? I would like to send an email from a notebook in Databricks with Python. I’m wondering if there’s an SMTP client already configured that I can use. I tried, but didn’t succeed. Answer The answer is “no”. There…
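Since no relay is provided, sending mail means pointing Python's standard-library `smtplib` at an SMTP server you control. A sketch that builds the message with `email.message.EmailMessage` and shows the send commented out — the host, port, and credentials are placeholders, not Databricks-provided values:

```python
from email.message import EmailMessage

def build_report_email(sender, recipient, body):
    """Assemble a plain-text email message."""
    msg = EmailMessage()
    msg["Subject"] = "Notebook report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg

msg = build_report_email("me@example.com", "you@example.com", "Job finished.")
print(msg["Subject"])  # → Notebook report

# Sending requires an external relay; placeholders below are assumptions.
# import smtplib
# with smtplib.SMTP("smtp.example.com", 587) as s:
#     s.starttls()
#     s.login("user", "password")
#     s.send_message(msg)
```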
How to avoid zipfile error with python-pptx saving files
I am using the python-pptx package to create a number of .pptx files from a series of dataframes. All works well with adding slides and such until it comes time to call prs.save(), where “prs” is the Presentation. Doing so leads to a zipfile error about open handles needing to be closed. I have done …
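A .pptx file is a zip container, and one common workaround for handle-related zipfile errors is to write the archive to an in-memory buffer first and then flush the bytes to disk in a single short-lived file handle (python-pptx's `prs.save()` accepts a file-like object, so the same pattern applies there). A standard-library sketch of the pattern; the function name and demo contents are invented for illustration:

```python
import io
import os
import tempfile
import zipfile

def save_zip_via_buffer(out_path, members):
    """Build a zip archive fully in memory, then write it to disk with
    one explicit, immediately-closed file handle."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    with open(out_path, "wb") as f:
        f.write(buf.getvalue())

# Demo: the member name mimics part of a .pptx package.
with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "deck.pptx")
    save_zip_via_buffer(out, {"slide1.xml": b"<p:sld/>"})
    with zipfile.ZipFile(out) as zf:
        print(zf.namelist())  # → ['slide1.xml']
```

With python-pptx, the analogous move is `prs.save(buf)` followed by writing `buf.getvalue()` to the destination path yourself.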