Our organisation runs Databricks on Azure, used by data scientists and analysts primarily through notebooks for ad-hoc analysis and exploration. We also run Kubernetes clusters for ETL workflows that don't require Spark. We would like to use Delta Lake as our storage layer, with both Databricks and Kubernetes able to read and write as first-class citizens.
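A minimal sketch of what the Kubernetes side of such a setup could look like, assuming the open-source delta-spark package is installed on the Spark image; the ADLS path and storage account are placeholders:

```python
# A minimal sketch, assuming the open-source delta-spark package is installed
# on the Kubernetes Spark image. The table path below is a placeholder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("k8s-delta-writer")
    # Wire in the Delta Lake extensions so plain OSS Spark understands Delta.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical table path shared with Databricks.
table_path = "abfss://lake@mystorageaccount.dfs.core.windows.net/tables/events"

# Both sides read and write the same files; Delta's transaction log
# coordinates concurrent writers.
df = spark.read.format("delta").load(table_path)
df.write.format("delta").mode("append").save(table_path)
```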
Tag: databricks
Databricks – How to pass accessToken to spark._sc._gateway.jvm.java.sql.DriverManager?
I would like to use Databricks to run some custom SQL via the function below. May I know how to add the “accessToken” as a property? It returns: Thanks! Answer It doesn’t work because DriverManager doesn’t have an overload that accepts the HashMap created from a Python dict – it has one that accepts a Properties object. You can create an instance of
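A short sketch of the fix the answer describes: build a java.util.Properties object through the same Py4J gateway and pass that to getConnection. The JDBC URL and token are placeholders:

```python
# A minimal sketch, assuming a SQL Server JDBC endpoint; URL and token
# are placeholders. Properties is built via Py4J instead of a Python dict.
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager

properties = spark._sc._gateway.jvm.java.util.Properties()
properties.setProperty("accessToken", "<your-access-token>")

# DriverManager.getConnection(String url, Properties info) now matches.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
connection = driver_manager.getConnection(jdbc_url, properties)
statement = connection.createStatement()
statement.execute("SELECT 1")
connection.close()
```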
How to use Selenium in Databricks, access and move downloaded files to mounted storage, and keep Chrome and ChromeDriver versions in sync?
I’ve seen a couple of posts on using Selenium in Databricks, using %sh to install ChromeDriver and Chrome. This works fine for me, but I had a lot of trouble when I needed to download a file. The file would download, but I could not find it in the filesystem in Databricks. Even if I changed the download path when
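One hedged workaround sketch: point Chrome at a local download directory first, then copy the finished file onto the mount. The URL and paths below are placeholders:

```python
# A minimal sketch, assuming Chrome/ChromeDriver are already installed via %sh.
# Chrome downloads to local disk first (FUSE mounts handle Chrome's partial
# .crdownload writes poorly), then the file is copied to mounted storage.
import shutil
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

local_dir = "/tmp/downloads"
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_experimental_option("prefs", {"download.default_directory": local_dir})

driver = webdriver.Chrome(options=options)
driver.get("https://example.com/report.csv")   # hypothetical download link
driver.quit()

# Move the finished download onto the mount so other clusters can see it.
shutil.copy(f"{local_dir}/report.csv", "/dbfs/mnt/mycontainer/report.csv")
```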
Script to get the file last modified date and file name in PySpark
I have a mount point location pointing to blob storage where we have multiple files. We need to find the last modified date for a file along with the file name. I am using the below script, and the list of files is as below: Answer If you’re using operating system-level commands to get file information, then
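A quick hedged sketch of the dbutils-based approach, assuming a runtime where FileInfo exposes modificationTime (epoch milliseconds); the mount path is a placeholder:

```python
# A minimal sketch using dbutils.fs.ls, whose FileInfo entries expose
# name, path, size and (on recent runtimes) modificationTime in epoch ms.
# The mount path below is a placeholder.
from datetime import datetime

for f in dbutils.fs.ls("/mnt/mycontainer/data"):
    modified = datetime.fromtimestamp(f.modificationTime / 1000)
    print(f.name, modified)
```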
Importing count() data for use within bokeh
I am trying to create a visualisation using the bokeh package, which I have imported into the Databricks environment. I have transformed the data from a raw data frame into something resembling the following (albeit much larger): From there, I wish to create a line graph using the bokeh package to show the number of papers released per month (for
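A minimal sketch of one way this could look, with a toy frame standing in for the real data; displayHTML renders the Bokeh output, since Databricks doesn't display Bokeh figures natively:

```python
# A minimal sketch, assuming one row per paper and a 'month' column;
# the toy data below stands in for the real frame.
import pandas as pd
from bokeh.plotting import figure
from bokeh.embed import file_html
from bokeh.resources import CDN

df = pd.DataFrame({"month": ["2021-01", "2021-01", "2021-02", "2021-03", "2021-03"]})
counts = df.groupby("month").size().reset_index(name="papers")

p = figure(x_range=counts["month"].tolist(), title="Papers per month",
           x_axis_label="month", y_axis_label="papers")
p.line(x=counts["month"], y=counts["papers"], line_width=2)

# Render the standalone HTML inside the Databricks notebook.
displayHTML(file_html(p, CDN, "papers-per-month"))
```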
Filter out specific errors from Flake8 results
We are writing notebooks in Databricks. When we push them to Git, we want to run flake8 on them to check for new problems in the code. As Databricks provides some predefined variables, these are undefined in the code itself. Is it possible to filter out errors like: While keeping errors like: I am aware of the --ignore parameter, but
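A hedged sketch of one way to do this with flake8's builtins option, so the Databricks globals stop triggering F821 while genuinely undefined names still do; the exact variable list is an assumption:

```ini
# A sketch of a .flake8 (or setup.cfg) section; the list of injected
# globals below is an assumption and may need adjusting per runtime.
[flake8]
builtins = spark, dbutils, sc, display, displayHTML
```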
localhost refused to connect in a Databricks notebook calling the Google API
I read the Google API documentation pages (Drive API, PyDrive) and created a Databricks notebook to connect to Google Drive. I used the sample code from the documentation page as follows: The CRED_PATH contains the credential file path in /dbfs/FileStore/shared_uploads. The script prompts me with the URL to authorize the application, but immediately after allowing access it redirects to the
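A hedged sketch of one non-interactive alternative: a service account sidesteps the localhost redirect entirely, since the remote Databricks driver can't serve the OAuth callback. The credential path, scopes, and sharing setup are assumptions:

```python
# A minimal sketch, assuming a service-account JSON key uploaded to DBFS
# and Drive folders shared with the service account's email address.
from google.oauth2 import service_account
from googleapiclient.discovery import build

CRED_PATH = "/dbfs/FileStore/shared_uploads/service_account.json"
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]

creds = service_account.Credentials.from_service_account_file(CRED_PATH, scopes=SCOPES)
drive = build("drive", "v3", credentials=creds)

# List a few files the service account can see.
results = drive.files().list(pageSize=10, fields="files(id, name)").execute()
for f in results.get("files", []):
    print(f["name"], f["id"])
```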
Is there an SMTP client included in the Databricks platform to be able to send emails?
How can you send an email from the Databricks platform? I would like to send an email from a notebook in Databricks with Python. I’m wondering if there’s an SMTP client already configured that I can use. I tried to do it, but didn’t succeed. Answer The answer is “no”. There’s no SMTP client included in Databricks. But you
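Since Python's standard library ships smtplib, a notebook can talk to an external SMTP server directly; a minimal sketch with placeholder host, port, and credentials:

```python
# A minimal sketch against an external SMTP server, since Databricks ships
# no SMTP service of its own. Host, addresses and password are placeholders.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Job finished"
msg["From"] = "notebooks@example.com"
msg["To"] = "team@example.com"
msg.set_content("The notebook run completed successfully.")

with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()                      # upgrade to TLS before authenticating
    server.login("notebooks@example.com", "<password>")
    server.send_message(msg)
```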
How to avoid a zipfile error when saving files with python-pptx
I am using the python-pptx package to create a number of .pptx files from a series of dataframes. All works well with adding slides and such until it comes time to call prs.save(), where “prs” is the Presentation object. Doing so leads to a zipfile error about open handles needing to be closed. I have done some research on the history
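A hedged sketch of one common workaround: python-pptx writes a zip archive, which can misbehave against FUSE paths like /dbfs, so save to local disk first and copy the finished file across. Paths are placeholders:

```python
# A minimal sketch: save the finished .pptx to local disk, then copy it
# to mounted storage. The paths below are placeholders.
import shutil
from pptx import Presentation

prs = Presentation()
prs.slides.add_slide(prs.slide_layouts[5])   # add a slide from a built-in layout

local_path = "/tmp/report.pptx"
prs.save(local_path)                          # the zip is written and closed here

shutil.copy(local_path, "/dbfs/mnt/mycontainer/report.pptx")
```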