Question: Why am I getting the following error on the last line of the code below, and how can the issue be resolved?
AttributeError: 'DataFrame' object has no attribute 'OrderID'
CSV File encoding: UTF-16 LE BOM
Number of columns: 150
Rows: 5000
Language etc.: Python, Apache Spark, Azure-Databricks
MySampleDataFile.txt:
FirstName~LastName~OrderID~City~.....
Kim~Doe~1234~New York~...............
Bob~Mason~456~Seattle~...............
..................
Code sample:
from pyspark.sql.types import DoubleType

df = (
    spark.read
    .option("encoding", "UTF-16LE")
    .option("multiline", "true")
    .csv(
        "abfss://mycontainder@myAzureStorageAccount.dfs.core.windows.net/myFolder/MySampleDataFile.txt",
        sep="~",
        escape='"',
        header="true",
        inferSchema="false",
    )
)
display(df.limit(4))
# The next line is the one that raises the AttributeError
df1 = df.withColumn("OrderID", df.OrderID.cast(DoubleType()))
Output of display(df.limit(4)): it successfully displays the content of df in tabular format with a column header, similar to the example here:
---------------------------------------
|FirstName|LastName|OrderID|City|.....|
---------------------------------------
|Kim~Doe|1234|New York|...............|
|Bob|Mason|456|Seattle|...............|
|................ |
---------------------------------------
Answer
AttributeError: 'DataFrame' object has no attribute 'OrderID'
how the issue can be resolved?
You can try changing the data type as follows, selecting the column with bracket notation instead of attribute access:
df1 = df.withColumn("OrderID", df["OrderID"].cast(DoubleType()))
Alternatively, use pyspark.sql.functions.col, which returns a Column based on the given column name:
from pyspark.sql.functions import col
df1 = df.withColumn("OrderID", col("OrderID").cast(DoubleType()))
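If the error persists with either form, it may be worth checking that the column is really named exactly "OrderID". As far as I know, PySpark raises this AttributeError when the name is not found in df.columns, and a header read from a UTF-16 file could in principle carry invisible characters (a BOM, a NUL, stray whitespace). That is only a guess about this file, not something the question confirms. Below is a minimal plain-Python sketch of such a check; the helper names normalize and find_column and the sample header list are hypothetical, made up for illustration.

```python
import unicodedata


def normalize(name):
    """Drop control/format characters (e.g. BOM, NUL) and surrounding whitespace."""
    cleaned = "".join(
        ch for ch in name
        if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    return cleaned.strip()


def find_column(columns, wanted):
    """Return the actual column names whose normalized form equals `wanted`."""
    return [c for c in columns if normalize(c) == wanted]


# Hypothetical header, standing in for what df.columns might return
columns = ["\ufeffFirstName", "LastName", "OrderID\x00", "City"]

# repr() makes invisible characters visible
for c in columns:
    print(repr(c))

matches = find_column(columns, "OrderID")
print(matches)
```

With a real DataFrame, pass df.columns instead of the sample list. If the match is not exactly "OrderID", df.withColumnRenamed(matches[0], "OrderID") should give the column a clean name before casting.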