I am looking for a way to use multiple when conditions to update a column's values in PySpark. I have one dataframe with three columns, DATE, Flag_values, and salary: After this I have to update
Databricks – Autoloader – Not Terminating?
I’m new to Databricks and have several Azure Blob .parquet locations I’m pulling data from that I want to put through the autoloader so I can “create table … using delta location ”” in SQL in another step. (Each parquet file is in its own directory under the parent blob dir, so we will iterate over all dirs in the
Apache Spark unable to recognize columns in UTF-16 csv file
Question: Why am I getting the following error on the last line of the code below, and how can the issue be resolved? AttributeError: ‘DataFrame’ object has no attribute ‘OrderID’ CSV File encoding: UTF-16 LE BOM Number of columns: 150 Rows: 5000 Language etc.: Python, Apache Spark, Azure-Databricks MySampleDataFile.txt: Code sample: Output of display(df.limit(4)) It successfully displays the content of df in
Fetch SharePoint list data into Python dataframe
I have created a list in SharePoint -> My lists. Following is the URL: While trying to load data from SharePoint using the above URL through site() I get the error below: Please let me know what I can do to get rid of the error and load the SharePoint list data. Answer Currently, SharePoint is not supported in Azure Databricks. Any
json explode – return filtered array of records
I have exploded some JSON, but I need to filter the result to rows where the “locale” is en_GB, and I only want that data returned in the dataframe. I currently have: However, this obviously does as it says: it returns the rows where en_GB is in locale, but I actually only want it to return
Multi-processing in Azure Databricks
I was recently tasked with ingesting JSON responses into Databricks Delta Lake. I have to hit the REST API endpoint URL 6500 times with different parameters and pull the responses. I have tried two modules, ThreadPool and Pool from the multiprocessing library, to make each execution a little quicker. ThreadPool: How to choose the number of threads for ThreadPool, when
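A minimal sketch of the ThreadPool approach mentioned above, using only the standard library. The `fetch` function is a hypothetical stand-in for one REST call (it makes no network request), and `num_threads=16` is an arbitrary starting value, not a recommendation; for I/O-bound calls the usual approach is to measure and tune it.

```python
from multiprocessing.pool import ThreadPool


def fetch(params):
    """Hypothetical stand-in for one REST call; returns a fake response dict."""
    # A real implementation would issue an HTTP request here.
    return {"params": params, "status": 200}


def fetch_all(param_list, num_threads=16):
    """Run fetch() over every parameter set using a pool of worker threads."""
    # ThreadPool.map returns results in the same order as param_list.
    with ThreadPool(num_threads) as pool:
        return pool.map(fetch, param_list)


# 6500 parameter sets, matching the number of calls in the question.
param_list = [{"page": i} for i in range(6500)]
responses = fetch_all(param_list)
```

Threads rather than processes are used here because waiting on HTTP responses is I/O-bound, so the workers spend most of their time blocked rather than competing for the CPU.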
How to Send Emails From Databricks
I have used the code from “Send email from Databricks Notebook with attachment” to attempt sending code from my Databricks Community Edition: I have used the following code: As you can see, the code is almost identical. However, when I run the code I get the following error: Is this error also because I’m running on Databricks Community Edition, as
File metadata such as time in Azure Storage from Databricks
I'm trying to get file creation metadata. The file is in: Azure Storage. Accessing data through: Databricks. Right now I'm using: but it returns: I do not have any information about creation time. Is there a way to get that information? Other solutions on Stack Overflow refer to files that are already in Databricks. Does Databricks DBFS support file
How to use Selenium in Databricks and accessing and moving downloaded files to mounted storage and keep Chrome and ChromeDriver versions in sync?
I’ve seen a couple of posts on using Selenium in Databricks using %sh to install Chrome Drivers and Chrome. This works fine for me, but I had a lot of trouble when I needed to download a file. The file would download, but I could not find it in the filesystem in Databricks. Even if I changed the download path when
How can I generate the same UUID for multiple dataframes in spark?
I have a df that I read from a file. Then I give it a UUID column. Now I create a view. Now I create two new dataframes that take data from the view; both dataframes will use the original UUID column. All 3 dataframes end up with different UUIDs. Is there a way to keep them the same across each
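One possible explanation, stated as an assumption because the original code is not shown, is that the UUID column comes from a random generator that runs again each time the dataframe is recomputed. A minimal plain-Python sketch of an alternative technique is to derive a name-based UUID (`uuid.uuid5`) from the row's own key values, so the same input always yields the same ID. The namespace string, the `stable_uuid` helper, and the sample rows are hypothetical; applying this inside Spark would additionally require wrapping it in a UDF, which is not shown here.

```python
import uuid

# Hypothetical namespace for this dataset; any fixed UUID works.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def stable_uuid(*key_parts):
    """Derive a deterministic UUID from the values that identify a row."""
    # Join the key parts into one string; uuid5 hashes it with the namespace.
    key = "|".join(str(part) for part in key_parts)
    return str(uuid.uuid5(NAMESPACE, key))


# Hypothetical rows; each tuple holds the columns that identify the row.
rows = [("alice", 1), ("bob", 2)]

# Computing the IDs twice mimics two separate evaluations of the same data.
first_pass = [stable_uuid(*row) for row in rows]
second_pass = [stable_uuid(*row) for row in rows]
```

Because the ID depends only on the key values, every dataframe derived from the same rows gets the same UUIDs, provided the chosen key columns uniquely identify a row.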