
Tag: apache-spark

New column comparing dates in PySpark

I am struggling to create a new column based on a simple condition comparing two dates. I have tried the following: Which yields a syntax error. I have also updated as follows: But this yields a Python error that the Column is not callable. How would I create a new column that dynamically adjusts based on whether the date comparator

How can I turn off rounding in Spark?

I have a dataframe and I'm doing this: I want to get just the first four digits after the decimal point, without rounding. When I cast to DecimalType, with .cast(DataTypes.createDecimalType(20,4)), or even with the round function, this number is rounded to 0.4220. The only way that I found without rounding is applying the function format_number(), but this function gives me a string,
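The difference between rounding and truncating to four decimal places can be sketched in plain Python. This is only an illustration of the arithmetic, not the asker's code: the input value 0.42196 and the helper name `truncate` are assumptions, since the excerpt does not show the original number. In Spark, a similar effect is often obtained by scaling the column, applying `floor`, and dividing back, although `floor` and truncation differ for negative values.

```python
from decimal import Decimal, ROUND_DOWN


def truncate(value: str, places: int = 4) -> Decimal:
    """Cut a decimal number to `places` fractional digits without rounding."""
    # Decimal(1).scaleb(-4) is 0.0001, the smallest step we want to keep.
    quantum = Decimal(1).scaleb(-places)
    # ROUND_DOWN discards the extra digits (rounds toward zero).
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)


# Hypothetical input: the excerpt only shows the rounded result 0.4220.
truncated = truncate("0.42196")          # keeps 0.4219
rounded = round(Decimal("0.42196"), 4)   # rounds up to 0.4220
print(truncated, rounded)
```

The result stays numeric (a `Decimal`), which is the property the question is after: `format_number()` avoids rounding only by returning a string.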

Pivoting DataFrame with fixed column names

Let's say I have the dataframe below: and by design each user has 3 rows. I want to turn my DataFrame into: I was trying to groupBy(col('user')) and then pivot by ticker, but that returns as many columns as there are distinct tickers, whereas I would like a fixed number of columns. Is there any other Spark operator I

PySpark Incremental Count on Condition

Given a Spark dataframe with the following columns, I am trying to construct an incremental/running count for each id based on when the contents of the event column evaluate to True. Here a new column called results would be created that contains the incremental count. I've tried using window functions but am stumped at this point. Ideally, the solution would
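The intended semantics of the running count can be sketched in plain Python. This is a minimal sketch under stated assumptions: the rows are taken to be already in the desired order, and the tuple layout `(id, event)` and the function name `running_count` are hypothetical, since the excerpt does not show the columns. In Spark this kind of result is usually expressed as a cumulative sum of the event flag cast to an integer over a window partitioned by id and ordered by some ordering column.

```python
from collections import defaultdict


def running_count(rows):
    """Return (id, event, results) where results counts True events so far per id."""
    counts = defaultdict(int)  # per-id counter, starts at 0
    out = []
    for row_id, event in rows:
        if event:
            counts[row_id] += 1  # only True events advance the count
        out.append((row_id, event, counts[row_id]))
    return out


# Hypothetical sample data, already sorted in event order.
sample = [("a", True), ("a", False), ("a", True), ("b", True)]
for row in running_count(sample):
    print(row)
```

Rows where event is False carry the previous count forward unchanged, which is the behavior a cumulative sum produces.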
