
Tag: pyspark

NoSuchElementException: Failed to find a default value for layers in MultiLayerPerceptronClassifier

I am having a problem running a prediction using a saved MultiLayerPerceptronClassifier model. It throws an error: The original mlpc in the pipeline had layers defined: My attempts to solve it: If I run the pipeline model and make predictions without first saving the model, it works with no error. But saving and re-using the model throws this error. Any help
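A minimal sketch of the save-and-reload pattern the question describes, assuming a simple pipeline with a feature assembler; the column names, layer sizes, save path, and the train_df/test_df DataFrames are placeholders, and the PySpark class is spelled MultilayerPerceptronClassifier:

```python
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.feature import VectorAssembler

# Hypothetical feature columns and layer sizes, not taken from the question
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
mlpc = MultilayerPerceptronClassifier(
    layers=[3, 5, 2],          # input size, one hidden layer, output classes
    featuresCol="features",
    labelCol="label",
)
pipeline = Pipeline(stages=[assembler, mlpc])

model = pipeline.fit(train_df)              # predictions on this object work
model.write().overwrite().save("/tmp/mlpc_pipeline")

# Reloading the saved PipelineModel is where the reported error appears
reloaded = PipelineModel.load("/tmp/mlpc_pipeline")
predictions = reloaded.transform(test_df)
```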

Calculate the minimum distance to destinations for each origin in pyspark

I have a list of origins and destinations along with their geo coordinates. I need to calculate the minimum distance from each origin to the destinations. Below is my code: I got an error like below: My question is: it seems that there is something wrong with withColumn(‘Distance’, haversine_vector(F.col(‘Origin_Geo’), F.col(‘Destination_Geo’))). I do not know why. (I’m new to pyspark..) I have
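The excerpt refers to haversine_vector, which suggests a vectorised/pandas UDF; as a simplified stand-in, here is a sketch with a plain Python UDF that computes the haversine distance and then takes the minimum per origin. The column names Origin, Origin_Geo and Destination_Geo follow the excerpt, while the coordinate format (a lat/lon pair in degrees) and the grouping are assumptions:

```python
from math import radians, sin, cos, asin, sqrt
import pyspark.sql.functions as F
from pyspark.sql.types import DoubleType

@F.udf(DoubleType())
def haversine(origin, destination):
    # Each argument is assumed to be a (lat, lon) pair in degrees
    lat1, lon1 = float(origin[0]), float(origin[1])
    lat2, lon2 = float(destination[0]), float(destination[1])
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))   # great-circle distance in kilometres

result = (
    df.withColumn("Distance", haversine(F.col("Origin_Geo"), F.col("Destination_Geo")))
      .groupBy("Origin")
      .agg(F.min("Distance").alias("Min_Distance"))
)
```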

Sum value between overlapping interval slices per group

I have a PySpark DataFrame as below: And I want to sum only the consumption on overlapping interval slices per idx: Answer You can use sequence to expand the intervals into single days, explode the list of days, and then sum the consumption for each timestamp and idx: Output: Remarks: sequence includes the last value of the interval, so one day
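A sketch of the approach the answer describes, expanding each interval into individual days with sequence, exploding the resulting array, and summing the consumption per day and idx; the column names (idx, start, end, consumption) are assumptions:

```python
import pyspark.sql.functions as F

# One row per day covered by each interval (sequence includes the end date)
exploded = df.withColumn(
    "day",
    F.explode(F.sequence(F.col("start"), F.col("end"), F.expr("interval 1 day"))),
)

# Sum consumption over the overlapping slices per idx and day
result = exploded.groupBy("idx", "day").agg(F.sum("consumption").alias("consumption"))
```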

New column comparing dates in PySpark

I am struggling to create a new column based on a simple condition comparing two dates. I have tried the following: which yields a syntax error. I have also updated it as follows: but this yields a Python error that the Column is not callable. How would I create a new column that dynamically adjusts based on whether the date comparator
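A minimal sketch of building such a conditional column with when/otherwise from a date comparison; the column names date_a and date_b and the output labels are placeholders, not taken from the original question:

```python
import pyspark.sql.functions as F

# New column whose value depends on which of the two dates is later
df = df.withColumn(
    "flag",
    F.when(F.col("date_a") > F.col("date_b"), F.lit("after"))
     .otherwise(F.lit("before_or_equal")),
)
```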

How can I turn off rounding in Spark?

I have a dataframe and I’m doing this: I want to get just the first four digits after the dot, without rounding. When I cast to DecimalType, with .cast(DataTypes.createDecimalType(20,4)) or even with the round function, this number is rounded to 0.4220. The only way that I found without rounding is applying the function format_number(), but this function gives me a string,
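One way to keep four decimal places without rounding, as an alternative to cast or round, is to scale up, truncate with floor, and scale back down; the column name "value" is a placeholder, and note that floor truncates toward negative infinity, so negative numbers behave differently:

```python
import pyspark.sql.functions as F

# Keep the first four digits after the decimal point without rounding
df = df.withColumn("truncated", F.floor(F.col("value") * 10000) / 10000)
```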

Pivoting DataFrame with fixed column names

Let’s say I have the below DataFrame: and by design each user has 3 rows. I want to turn my DataFrame into: I was trying groupBy(col(‘user’)) and then pivoting by ticker, but it returns as many columns as there are different tickers, so instead I wish I could have a fixed number of columns. Is there any other Spark operator I
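A sketch of one way to get a fixed number of columns when each user has exactly 3 rows: number the rows per user with a window function and pivot on that position instead of on ticker. The column names (user, ticker, value) and the ordering by ticker are assumptions:

```python
import pyspark.sql.functions as F
from pyspark.sql import Window

# Assign a position 1..3 to each user's rows
w = Window.partitionBy("user").orderBy("ticker")
numbered = df.withColumn("pos", F.row_number().over(w))

result = (
    numbered.groupBy("user")
            .pivot("pos", [1, 2, 3])       # fixed value list -> fixed column names
            .agg(F.first("value"))
)
```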
