Skip to content

Tag: databricks

Python in Databricks

How to even start a basic query in databricks using python? The data I need is in databricks and so far I have been using Juypterhub to pull the data and modify few things. But now I want to eliminate a step of pulling the data in Jupyterhub and directly move my python code in databricks then schedule the job.

SAS Proc Transpose to Pyspark

I am trying to convert a SAS proc transpose statement to pyspark in databricks. With the following data as a sample: I would expect the result to look like this I tried using the pandas pivot_table() function with the following code however I ran into some performance issues with the size of the data: Is there a way to translate

Multi-processing in Azure Databricks

I have been tasked lately, to ingest JSON responses onto Databricks Delta-lake. I have to hit the REST API endpoint URL 6500 times with different parameters and pull the responses. I have tried two modules, ThreadPool and Pool from the multiprocessing library, to make each execution a little quicker. ThreadPool: How to choose the number of threads for ThreadPool, when

How to Send Emails From Databricks

I have used the code from Send email from Databricks Notebook with attachment to attempt sending code from my Databricks Community edition: I have used the following code: As you can see the code is almost identical. However, when I run the code I get the following error: Is this error also because I’m running on Databricks Community edition, as

How to write data to Delta Lake from Kubernetes

Our organisation runs Databricks on Azure that is used by data scientists & analysts primarily for Notebooks in order to do ad-hoc analysis and exploration. We also run Kubernetes clusters for non spark-requiring ETL workflows. We would like to use Delta Lakes as our storage layer where both Databricks and Kubernetes are able to read and write as first class