Our organisation runs Databricks on Azure, used by data scientists and analysts primarily through notebooks for ad-hoc analysis and exploration. We also run Kubernetes clusters for ETL workflows that don't require Spark. We would like to use Delta Lake as our storage layer, with both Databricks and Kubernetes able to read and write as first-class citizens.
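A minimal sketch of what the Kubernetes side of such a setup could look like, assuming the open-source delta-spark package is installed on the Spark image; the ADLS path and storage account are placeholders:

```python
# A minimal sketch, assuming the open-source delta-spark package is installed
# on the Kubernetes Spark image. The table path below is a placeholder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("k8s-delta-writer")
    # Wire in the Delta Lake extensions so plain OSS Spark understands Delta.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical table path shared with Databricks.
table_path = "abfss://lake@mystorageaccount.dfs.core.windows.net/tables/events"

# Both sides read and write the same files; Delta's transaction log
# coordinates concurrent writers.
df = spark.read.format("delta").load(table_path)
df.write.format("delta").mode("append").save(table_path)
```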
Tag: databricks
Databricks – How to pass accessToken to spark._sc._gateway.jvm.java.sql.DriverManager?
I would like to use Databricks to run some custom SQL via the function below. May I know how to add the “accessToken” as a property? It returns: Thanks! Answer It doesn’t work because DriverManager doesn’t have an overload that accepts the HashMap created from a Python dict – it has one that accepts a Properties object. You can create an instance of
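A short sketch of the fix the answer describes: build a java.util.Properties object through the same Py4J gateway and pass that to getConnection. The JDBC URL and token are placeholders:

```python
# A minimal sketch, assuming a SQL Server JDBC endpoint; URL and token
# are placeholders. Properties is built via Py4J instead of a Python dict.
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager

properties = spark._sc._gateway.jvm.java.util.Properties()
properties.setProperty("accessToken", "<your-access-token>")

# DriverManager.getConnection(String url, Properties info) now matches.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
connection = driver_manager.getConnection(jdbc_url, properties)
statement = connection.createStatement()
statement.execute("SELECT 1")
connection.close()
```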
How to use Selenium in Databricks, access and move downloaded files to mounted storage, and keep Chrome and ChromeDriver versions in sync?
I’ve seen a couple of posts on using Selenium in Databricks, using %sh to install ChromeDriver and Chrome. This works fine for me, but I had a lot of trouble when I needed to download a file. The file would download, but I could not find it in the filesystem in Databricks. Even if I changed the download path when
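One hedged workaround sketch: point Chrome at a local download directory first, then copy the finished file onto the mount. The URL and paths below are placeholders:

```python
# A minimal sketch, assuming Chrome/ChromeDriver are already installed via %sh.
# Chrome downloads to local disk first (FUSE mounts handle Chrome's partial
# .crdownload writes poorly), then the file is copied to mounted storage.
import shutil
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

local_dir = "/tmp/downloads"
options = Options()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_experimental_option("prefs", {"download.default_directory": local_dir})

driver = webdriver.Chrome(options=options)
driver.get("https://example.com/report.csv")   # hypothetical download link
driver.quit()

# Move the finished download onto the mount so other clusters can see it.
shutil.copy(f"{local_dir}/report.csv", "/dbfs/mnt/mycontainer/report.csv")
```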
Script to get the file last modified date and file name in PySpark
I have a mount point location pointing to blob storage where we have multiple files. We need to find the last modified date for a file along with the file name. I am using the below script, and the list of files is as below: Answer If you’re using operating system-level commands to get file information, then
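A quick hedged sketch of the dbutils-based approach, assuming a runtime where FileInfo exposes modificationTime (epoch milliseconds); the mount path is a placeholder:

```python
# A minimal sketch using dbutils.fs.ls, whose FileInfo entries expose
# name, path, size and (on recent runtimes) modificationTime in epoch ms.
# The mount path below is a placeholder.
from datetime import datetime

for f in dbutils.fs.ls("/mnt/mycontainer/data"):
    modified = datetime.fromtimestamp(f.modificationTime / 1000)
    print(f.name, modified)
```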
Importing count() data for use within bokeh
I am trying to create a visualisation using the bokeh package, which I have imported into the Databricks environment. I have transformed the data from a raw data frame into something resembling the following (albeit much larger): From there, I wish to create a line graph using the bokeh package to show the number of papers released per month (for
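A minimal sketch of one way this could look, with a toy frame standing in for the real data; displayHTML renders the Bokeh output, since Databricks doesn't display Bokeh figures natively:

```python
# A minimal sketch, assuming one row per paper and a 'month' column;
# the toy data below stands in for the real frame.
import pandas as pd
from bokeh.plotting import figure
from bokeh.embed import file_html
from bokeh.resources import CDN

df = pd.DataFrame({"month": ["2021-01", "2021-01", "2021-02", "2021-03", "2021-03"]})
counts = df.groupby("month").size().reset_index(name="papers")

p = figure(x_range=counts["month"].tolist(), title="Papers per month",
           x_axis_label="month", y_axis_label="papers")
p.line(x=counts["month"], y=counts["papers"], line_width=2)

# Render the standalone HTML inside the Databricks notebook.
displayHTML(file_html(p, CDN, "papers-per-month"))
```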
Filter out specific errors from Flake8 results
We are writing notebooks in Databricks. When we push them to Git, we want to run flake8 on them to check for new problems in the code. As Databricks provides some predefined variables, these are undefined in the code itself. Is it possible to filter out errors like: While keeping errors like: I am aware of the --ignore parameter, but
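A hedged sketch of one way to do this with flake8's builtins option, so the Databricks globals stop triggering F821 while genuinely undefined names still do; the exact variable list is an assumption:

```ini
# A sketch of a .flake8 (or setup.cfg) section; the list of injected
# globals below is an assumption and may need adjusting per runtime.
[flake8]
builtins = spark, dbutils, sc, display, displayHTML
```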
localhost refused to connect in a Databricks notebook calling the Google API
I read the Google API documentation pages (Drive API, PyDrive) and created a Databricks notebook to connect to Google Drive. I used the sample code from the documentation page as follows: The CRED_PATH contains the credential file path in /dbfs/FileStore/shared_uploads. The script prompts me with the URL to authorize the application, but immediately after allowing access it redirects to the
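A hedged sketch of one non-interactive alternative: a service account sidesteps the localhost redirect entirely, since the remote Databricks driver can't serve the OAuth callback. The credential path, scopes, and sharing setup are assumptions:

```python
# A minimal sketch, assuming a service-account JSON key uploaded to DBFS
# and Drive folders shared with the service account's email address.
from google.oauth2 import service_account
from googleapiclient.discovery import build

CRED_PATH = "/dbfs/FileStore/shared_uploads/service_account.json"
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]

creds = service_account.Credentials.from_service_account_file(CRED_PATH, scopes=SCOPES)
drive = build("drive", "v3", credentials=creds)

# List a few files the service account can see.
results = drive.files().list(pageSize=10, fields="files(id, name)").execute()
for f in results.get("files", []):
    print(f["name"], f["id"])
```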
Is there an SMTP client included in the Databricks platform to be able to send emails?
How can you send an email from the Databricks platform? I would like to send an email from a notebook in Databricks with Python. I’m wondering if there’s an SMTP client already configured that I can use. I tried to do it, but didn’t succeed. Answer The answer is “no”. There’s no SMTP client included in Databricks. But you
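Since Python's standard library ships smtplib, a notebook can talk to an external SMTP server directly; a minimal sketch with placeholder host, port, and credentials:

```python
# A minimal sketch against an external SMTP server, since Databricks ships
# no SMTP service of its own. Host, addresses and password are placeholders.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Job finished"
msg["From"] = "notebooks@example.com"
msg["To"] = "team@example.com"
msg.set_content("The notebook run completed successfully.")

with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()                      # upgrade to TLS before authenticating
    server.login("notebooks@example.com", "<password>")
    server.send_message(msg)
```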
How to avoid a zipfile error when saving files with python-pptx
I am using the python-pptx package to create a number of .pptx files from a series of dataframes. All works well with adding slides and such until it comes time to call prs.save(), where “prs” is the Presentation object. Doing so leads to a zipfile error about open handles needing to be closed. I have done some research on the history
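A hedged sketch of one common workaround: python-pptx writes a zip archive, which can misbehave against FUSE paths like /dbfs, so save to local disk first and copy the finished file across. Paths are placeholders:

```python
# A minimal sketch: save the finished .pptx to local disk, then copy it
# to mounted storage. The paths below are placeholders.
import shutil
from pptx import Presentation

prs = Presentation()
prs.slides.add_slide(prs.slide_layouts[5])   # add a slide from a built-in layout

local_path = "/tmp/report.pptx"
prs.save(local_path)                          # the zip is written and closed here

shutil.copy(local_path, "/dbfs/mnt/mycontainer/report.pptx")
```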