Tag: apache-spark

How to start a standalone cluster using pyspark?

I am using pyspark under Ubuntu with Python 2.7. I installed it using … and am trying to follow the instructions to set up a Spark cluster, but I can’t find the script start-master.sh. I assume this is because I installed pyspark and not regular Spark. I found here that I can connect a worker node t…
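A quick way to confirm whether an installation ships the script at all is to look for sbin/start-master.sh under the Spark home directory. The sketch below is only a check, not a fix: the helper name find_start_master is made up for illustration, and falling back to the pyspark package directory when SPARK_HOME is unset is an assumption about where a pip install keeps its files.

```python
import os


def find_start_master(spark_home):
    """Return the path to sbin/start-master.sh under spark_home, or None.

    Hypothetical helper for illustration; it only checks the file system.
    """
    candidate = os.path.join(spark_home, "sbin", "start-master.sh")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("pyspark is not importable in this environment")
    else:
        # Assumption: without SPARK_HOME, the pyspark package directory
        # plays the role of the Spark home for a pip install.
        home = os.environ.get("SPARK_HOME", os.path.dirname(pyspark.__file__))
        found = find_start_master(home)
        print(found or "start-master.sh not found under " + home)
```

If the script is reported as missing, that is consistent with the asker's guess that the pyspark package alone does not include the standalone-cluster launch scripts.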

Rename nested field in spark dataframe

Given a DataFrame df in Spark: … How can I rename the field array_field.a to array_field.a_renamed? [Update]: .withColumnRenamed() does not work with nested fields, so I tried this hacky and unsafe method: … I know that setting a private attribute is not good practice, but I don’t know another way to set the schema…
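One way to avoid touching private attributes is to rename the field in the schema's JSON form (the dict returned by df.schema.jsonValue()) and then build a new schema from it. The sketch below works on plain dicts so it runs without Spark. The helper name rename_nested_field and the example schema, including the long type of a, are assumptions for illustration, not the asker's actual code or data.

```python
import copy

# Hypothetical schema in the dict form that df.schema.jsonValue() produces:
# array_field is an array of structs with a single field "a".
EXAMPLE_SCHEMA = {
    "type": "struct",
    "fields": [
        {
            "name": "array_field",
            "nullable": True,
            "metadata": {},
            "type": {
                "type": "array",
                "containsNull": True,
                "elementType": {
                    "type": "struct",
                    "fields": [
                        {"name": "a", "type": "long",
                         "nullable": True, "metadata": {}},
                    ],
                },
            },
        }
    ],
}


def rename_nested_field(schema_json, path, new_name):
    """Return a copy of schema_json with the field at dotted path renamed."""
    result = copy.deepcopy(schema_json)  # leave the caller's dict untouched
    node = result
    parts = path.split(".")
    for i, part in enumerate(parts):
        # Step through array wrappers to reach the struct they contain.
        while isinstance(node, dict) and node.get("type") == "array":
            node = node["elementType"]
        if not isinstance(node, dict) or node.get("type") != "struct":
            raise KeyError(path)
        matches = [f for f in node["fields"] if f["name"] == part]
        if not matches:
            raise KeyError(path)
        field = matches[0]
        if i == len(parts) - 1:
            field["name"] = new_name
        else:
            node = field["type"]
    return result


renamed = rename_nested_field(EXAMPLE_SCHEMA, "array_field.a", "a_renamed")
```

With pyspark available, the renamed dict can be turned back into a schema with StructType.fromJson(renamed), and the column can then be cast to the new array type, for example df.withColumn("array_field", col("array_field").cast(new_schema["array_field"].dataType)). This relies on the assumption that a struct-to-struct cast matches fields by position, so it is worth verifying on a small sample first.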