
Tag: pyspark

Convert date, month, year and time to a date format in PySpark

I have a file with a timestamp column. When I read the file with a schema I designed myself, the datetime column is populated with null. The source file has data as below, and with the code snippet below, DF.display() shows null for every input. However, my expected output

Parse JSON string from PySpark DataFrame

I have a nested JSON dict that I need to convert to a Spark DataFrame. This JSON dict is stored in a DataFrame column. I have been trying to parse the dict in that column using “from_json” and “get_json_object”, but have been unable to read the data. Here’s the smallest snippet of the source data that I’ve been trying to

Get tables from AWS Glue using boto3

I need to harvest table and column names from the AWS Glue crawler metadata catalogue. I used boto3, but I keep getting only 100 tables even though there are more. Setting NextToken doesn’t help. Please help if possible. The desired result is a list as follows: lst = [table_one.col_one, table_one.col_two, table_two.col_one….table_n.col_n] UPDATED code, I still need tablename+columnname: Answer: Adding a sub-loop did
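
A minimal sketch, assuming glue is a boto3 Glue client (for example boto3.client("glue")); the function name list_table_columns is hypothetical. Each get_tables call returns only one page of results (the question observed 100 tables), so every page must be fetched. boto3's paginator for get_tables follows NextToken automatically, and an inner loop over each table's columns builds the "table.column" strings. Partition keys are included too, because Glue stores them in PartitionKeys rather than in StorageDescriptor.Columns.

```python
def list_table_columns(glue, database_names=None):
    """Return ["table.column", ...] for every table in the given Glue databases.

    `glue` is assumed to be a boto3 Glue client. If no database names are
    given, all databases visible to the client are scanned.
    """
    if database_names is None:
        database_names = []
        # Databases are paginated as well, so walk every page.
        for page in glue.get_paginator("get_databases").paginate():
            database_names.extend(db["Name"] for db in page["DatabaseList"])

    result = []
    table_pager = glue.get_paginator("get_tables")
    for db_name in database_names:
        # The paginator requests each following page with NextToken for us.
        for page in table_pager.paginate(DatabaseName=db_name):
            for table in page["TableList"]:
                columns = table.get("StorageDescriptor", {}).get("Columns", [])
                # Partition columns are kept in a separate list on the table.
                partition_keys = table.get("PartitionKeys", [])
                # Sub-loop: one "table.column" entry per column.
                result.extend(
                    f"{table['Name']}.{col['Name']}" for col in columns + partition_keys
                )
    return result
```

With real AWS credentials configured, it would be called as list_table_columns(boto3.client("glue")), or with an explicit list such as ["my_database"] to limit the scan.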
