Question: Why am I getting the following error on the last line of the code below, and how can the issue be resolved?
AttributeError: 'DataFrame' object has no attribute 'OrderID'
CSV File encoding: UTF-16 LE BOM
Number of columns: 150
Rows: 5000
Language etc.: Python, Apache Spark, Azure-Databricks
MySampleDataFile.txt:
FirstName~LastName~OrderID~City~ ..
Kim~Doe~1234~New York~
Bob~Mason~456~Seattle~
Code sample:
from pyspark.sql.types import DoubleType
# Read the '~'-delimited file with a header row; every column is loaded as a string
df = spark.read.option("encoding","UTF-16LE").option("multiline","true").csv("abfss://mycontainder@myAzureStorageAccount.dfs.core.windows.net/myFolder/MySampleDataFile.txt", sep='~', escape="\"", header="true", inferSchema="false")

display(df.limit(4))
# The AttributeError is raised on this line
df1 = df.withColumn("OrderID", df.OrderID.cast(DoubleType()))
Output of display(df.limit(4)): it successfully displays the content of df in a tabular format with column headers, similar to this example:
---------------------------------------
|FirstName|LastName|OrderID|City| ..|
---------------------------------------
|Kim~Doe|1234|New York||
|Bob|Mason|456|Seattle||
| . |
---------------------------------------
Answer
AttributeError: 'DataFrame' object has no attribute 'OrderID'
How can the issue be resolved?
You can try changing the data type by selecting the column with bracket notation:
from pyspark.sql.types import DoubleType
df1 = df.withColumn("OrderID", df["OrderID"].cast(DoubleType()))
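Both df.OrderID and df["OrderID"] look the column up by name, so either form only works when the name stored in df.columns is exactly OrderID. If the error persists, it may be worth checking the header for characters that the display does not show. The following is a minimal sketch in plain Python, so it runs without a cluster; the helper find_suspicious_names and the sample names are hypothetical and not part of PySpark.

```python
import unicodedata

def find_suspicious_names(columns):
    """Return (name, reason) pairs for names that may differ from what is displayed."""
    findings = []
    for name in columns:
        if name != name.strip():
            findings.append((name, "leading or trailing whitespace"))
        # Categories Cc (control) and Cf (format) cover NUL, BOM and zero-width characters
        hidden = [ch for ch in name if unicodedata.category(ch) in ("Cc", "Cf")]
        if hidden:
            codes = ", ".join(f"U+{ord(ch):04X}" for ch in hidden)
            findings.append((name, "invisible characters: " + codes))
    return findings

# Made-up names standing in for df.columns (assumption, not taken from the real file)
sample_columns = ["\ufeffFirstName", "LastName", "OrderID ", "City"]
findings = find_suspicious_names(sample_columns)
for name, reason in findings:
    print(repr(name), "->", reason)
```

On a real DataFrame the same check would be called as find_suspicious_names(df.columns). An empty result suggests the column names are clean and the cause lies elsewhere.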
Alternatively, use pyspark.sql.functions.col instead of df["OrderID"]. It returns a Column based on the given column name.
from pyspark.sql.functions import col
from pyspark.sql.types import DoubleType
df1 = df.withColumn("OrderID", col("OrderID").cast(DoubleType()))
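Since the file is described as UTF-16 LE with a BOM and is read with the UTF-16LE encoding, one possible source of mismatched names is a BOM left in the header. The sketch below uses only Python's own codecs to show the effect on a hypothetical header line; Spark's decoding may behave differently, and in this illustration the BOM lands on the first column, not on OrderID. The helper clean_name is an assumption for illustration, not a PySpark function.

```python
import codecs

# Hypothetical header line, encoded as UTF-16 LE with a leading BOM
raw = codecs.BOM_UTF16_LE + "FirstName~LastName~OrderID~City".encode("utf-16-le")

# "utf-16-le" assumes the byte order and keeps the BOM as a character (U+FEFF)
names_le = raw.decode("utf-16-le").split("~")
# "utf-16" uses the BOM to detect the byte order and drops it
names_auto = raw.decode("utf-16").split("~")

def clean_name(name):
    """Remove a BOM, NUL characters and surrounding whitespace from a column name."""
    return name.replace("\ufeff", "").replace("\x00", "").strip()

cleaned = [clean_name(n) for n in names_le]
print([repr(n) for n in names_le])
print(cleaned)
```

If hidden characters do turn out to be present, the cleaned names could be applied in PySpark with df = df.toDF(*[clean_name(c) for c in df.columns]) before casting the column.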