
How to add a constant column in a Spark DataFrame?

I want to add a column to a DataFrame with some arbitrary value that is the same for every row. I get an error when I pass the value directly to withColumn.

It seems that I can trick the function into working as I want by adding and subtracting one of the other columns (so they cancel to zero) and then adding the number I want (10 in this case).

This is supremely hacky, right? I assume there is a more legit way to do this?


Answer

Spark 2.2+

Spark 2.2 introduces typedLit, which supports Seq, Map, and Tuples as literal values (SPARK-19254), in the Scala API.

Spark 1.3+ (lit), 1.4+ (array, struct), 2.0+ (map):

The second argument to DataFrame.withColumn must be a Column, so you have to wrap the constant in a literal with lit.

If you need complex columns, you can build them from blocks like array.

Exactly the same methods can be used in Scala.


To provide names for struct fields, use either alias on each field or cast on the whole object.

It is also possible, although slower, to use a UDF.

Note:

The same constructs can be used to pass constant arguments to UDFs or SQL functions.
