Skip to content
Advertisement

Tag: azure-databricks

Apache Spark unable to recognize columns in UTF-16 csv file

Question: Why I am getting following error on the last line of the code below, how the issue can be resolved? AttributeError: ‘DataFrame’ object has no attribute ‘OrderID’ CSV File encoding: UTF-16 LE BOM Number of columns: 150 Rows: 5000 Language etc.: Python, Apache Spark, Azure-Databricks MySampleDataFile.txt: Code sample: Output of display(df.limit(4)) It successfully displays the content of df in

json explode – return filtered array of records

I have some JSON I have exploded however I need to filter the return based on where the “locale” is en_GB and I only wish to return that data in the dataframe. I currently have However this obviously does as it says it returns me the rows where en_GB is in locale but I actually only want it to return

Multi-processing in Azure Databricks

I have been tasked lately, to ingest JSON responses onto Databricks Delta-lake. I have to hit the REST API endpoint URL 6500 times with different parameters and pull the responses. I have tried two modules, ThreadPool and Pool from the multiprocessing library, to make each execution a little quicker. ThreadPool: How to choose the number of threads for ThreadPool, when

How to Send Emails From Databricks

I have used the code from Send email from Databricks Notebook with attachment to attempt sending code from my Databricks Community edition: I have used the following code: As you can see the code is almost identical. However, when I run the code I get the following error: Is this error also because I’m running on Databricks Community edition, as

How to use Selenium in Databricks and accessing and moving downloaded files to mounted storage and keep Chrome and ChromeDriver versions in sync?

I’ve seen a couple of posts on using Selenium in Databricks using %shto install Chrome Drivers and Chrome. This works fine for me, but I had a lot of trouble when I needed to download a file. The file would download, but I could not find it in the filesystem in databricks. Even if I changed the download path when

Advertisement