Tag: azure-databricks

How to use multiple when conditions in PySpark for updating column values

azure-databricks azure-synapse dataframe pyspark python

I am looking for a way to use multiple when conditions to update a column's values in PySpark. Specifically, I am trying to work out how to update a column in Spark when several conditions apply. I have one dataframe with three columns, DATE, Flag_values, salary: After this I have to update

Databricks – Autoloader – Not Terminating?

azure-databricks blob databricks-autoloader parquet python

I’m new to Databricks and I have several Azure blob .parquet locations I’m pulling data from. I want to put them through the autoloader so that I can “create table … using delta location ”” in SQL in another step. (Each parquet file is in its own directory under the parent blob dir, so we will iterate over all dirs in the

Apache Spark unable to recognize columns in UTF-16 csv file

apache-spark azure-databricks python spark-notebook

Question: Why am I getting the following error on the last line of the code below, and how can the issue be resolved? AttributeError: ‘DataFrame’ object has no attribute ‘OrderID’ CSV file encoding: UTF-16 LE BOM Number of columns: 150 Rows: 5000 Language etc.: Python, Apache Spark, Azure Databricks MySampleDataFile.txt: Code sample: Output of display(df.limit(4)) It successfully displays the content of df in

Fetch SharePoint list data into Python dataframe

azure-databricks pandas python sharepoint

I have created a list in SharePoint -> My lists. Following is the URL While trying to load data from SharePoint using the above URL through site() I am getting the error below Please let me know what I can do to get rid of the error and load the SharePoint list data. Answer Currently, SharePoint is not supported in Azure Databricks. Any

json explode – return filtered array of records

azure-databricks python

I have some JSON that I have exploded; however, I need to filter the result to rows where the “locale” is en_GB, and I only wish to return that data in the dataframe. I currently have However, this obviously does as it says: it returns the rows where en_GB is in locale, but I actually only want it to return

Multi-processing in Azure Databricks

azure azure-databricks databricks python

I have recently been tasked with ingesting JSON responses into Databricks Delta Lake. I have to hit the REST API endpoint URL 6500 times with different parameters and pull the responses. I have tried two classes, ThreadPool and Pool from the multiprocessing library, to make each execution a little quicker. ThreadPool: How to choose the number of threads for ThreadPool, when
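For reference, a minimal sketch of the ThreadPool pattern the question describes, using only the standard library. The `fetch` function is a hypothetical stand-in that simulates one REST call with a short sleep (no real endpoint is contacted), and the thread count of 8 and the 20 sample parameters are arbitrary assumptions for illustration, not recommendations.

```python
from multiprocessing.pool import ThreadPool
import time


def fetch(param):
    # Hypothetical stand-in for one REST call. A real version would issue
    # an HTTP request with `param` and return the parsed JSON response.
    time.sleep(0.01)  # simulate network latency (I/O-bound wait)
    return {"param": param, "status": "ok"}


def fetch_all(params, n_threads=8):
    # pool.map distributes the calls across the worker threads and
    # returns the results in the same order as the inputs.
    with ThreadPool(n_threads) as pool:
        return pool.map(fetch, params)


# 20 parameters here for brevity; the question mentions 6500 calls.
responses = fetch_all(range(20))
```

Because the work is mostly waiting on the network, threads can overlap the waits even under the GIL, which is why ThreadPool is usually tried before the process-based Pool for this kind of task.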

How to Send Emails From Databricks

azure-databricks databricks email python

I have used the code from “Send email from Databricks Notebook with attachment” to attempt sending code from my Databricks Community edition: I have used the following code: As you can see, the code is almost identical. However, when I run the code I get the following error: Is this error also because I’m running on Databricks Community edition, as

File metadata such as time in Azure Storage from Databricks

azure-databricks azure-storage python

I’m trying to get file creation metadata. File is in: Azure Storage Accessing data through: Databricks Right now I’m using: but it returns I do not have any information about creation time. Is there a way to get that information? Other solutions on Stack Overflow refer to files that are already in Databricks. Does Databricks DBFS support file

How to use Selenium in Databricks and accessing and moving downloaded files to mounted storage and keep Chrome and ChromeDriver versions in sync?

azure-databricks databricks pyspark python selenium

I’ve seen a couple of posts on using Selenium in Databricks using %sh to install Chrome Drivers and Chrome. This works fine for me, but I had a lot of trouble when I needed to download a file. The file would download, but I could not find it in the filesystem in Databricks. Even if I changed the download path when

How can I generate the same UUID for multiple dataframes in spark?

azure-databricks pyspark python

I have a df that I read from a file. Then I give it a UUID column. Now I create a view. Now I create two new dataframes that take data from the view; both dataframes will use the original UUID column. All 3 dataframes will have different UUIDs. Is there a way to keep them the same across each
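One general idea that may help here, sketched in plain Python rather than Spark: a random UUID changes every time the expression that produces it is re-evaluated, whereas a name-based UUID (`uuid.uuid5`) derived from a stable row key comes out the same each time. This is an illustrative substitute technique, not the asker's code; the namespace value and the row keys below are hypothetical.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it
# never changes between runs.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def stable_uuid(key: str) -> str:
    # uuid5 hashes the namespace and key, so the same key always yields
    # the same UUID, unlike uuid4, which is random on every call.
    return str(uuid.uuid5(NAMESPACE, key))


# Hypothetical row keys standing in for a stable column in the data.
row_keys = ["row-1", "row-2"]

first_pass = [stable_uuid(k) for k in row_keys]
second_pass = [stable_uuid(k) for k in row_keys]  # a "recomputation"
```

Both passes produce identical values for each key, which is the property the question is after; this only applies when the data has a column (or combination of columns) that uniquely identifies each row.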