I have a file with a timestamp column. When I read the file with a schema I defined myself, the datetime column is populated with null.
Source file has data as below
```
created_date
31-AUG-2016 02:48:38
31-AUG-2016 10:37:59
31-AUG-2016 23:37:51
```
and I am using the below code snippet:

```python
from pyspark.sql.types import *

Raw_schema = StructType([StructField("created_date", DateType(), True)])

DF = (spark.read
      .format("csv")
      .option("header", "true")
      .schema(Raw_schema)
      .load("path"))
DF.display()
```

```
created_date
null
null
null
```
In the above, DF.display() shows null for all the inputs. However, my expected output is as below:
```
Created_Date
31-08-2016
31-08-2016
31-08-2016
```
Answer
You need to provide the date format explicitly, because the format used in the CSV file is non-standard. Since the schema declares the column as DateType, only the date part of each parsed value is kept.
```python
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("dateFormat", "dd-MMM-yyyy HH:mm:ss")
      .schema(Raw_schema)
      .load("filepath"))
df.show()
```

```
+------------+
|created_date|
+------------+
|  2016-08-31|
|  2016-08-31|
|  2016-08-31|
+------------+
```
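If you would rather keep the time component instead of truncating it, a variation is to declare the column as TimestampType and set `timestampFormat`, then derive a date column afterwards. The sketch below is not from the original post; it assumes the same file layout and uses a hypothetical path `data.csv` and column name `created_day`.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date
from pyspark.sql.types import StructType, StructField, TimestampType

spark = SparkSession.builder.getOrCreate()

# Read created_date as a full timestamp instead of a date.
ts_schema = StructType([StructField("created_date", TimestampType(), True)])

df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("timestampFormat", "dd-MMM-yyyy HH:mm:ss")  # same pattern as above
      .schema(ts_schema)
      .load("data.csv"))  # hypothetical path

# Add a date-only column while keeping the original timestamp.
df = df.withColumn("created_day", to_date("created_date"))
df.show(truncate=False)
```

This keeps both granularities available, so downstream code can choose between the timestamp and the date.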