I have a dataframe and I’m doing this:
df = dataframe.withColumn("test", lit(0.4219759403))
I want to get just the first four numbers after the dot, without rounding.
When I cast to DecimalType, with .cast(DataTypes.createDecimalType(20,4)
or even with round
function, this number is rounded to 0.4220
.
The only way that I found without rounding is applying the function format_number()
, but this function gives me a string, and when I cast this string to DecimalType(20,4)
, the framework rounds the number again to 0.4220
.
I need to convert this number to DecimalType(20,4)
without rounding, and I expect to see 0.4219
.
Advertisement
Answer
If you have numbers with more than 1 digit before the decimal point, the substr
is not adapt. Instead, you can use a regex to always extract the first 4 decimal digits (if present).
You can do this using regexp_extract
df = dataframe.withColumn('rounded', F.regexp_extract(F.col('test'), 'd+.d{0,4}', 0))
Example
import pyspark.sql.functions as F dataframe = spark.createDataFrame([ (0.4219759403, ), (0.4, ), (1.0, ), (0.5431293, ), (123.769859, ) ], ['test']) df = dataframe.withColumn('rounded', F.regexp_extract(F.col('test'), 'd+.d{0,4}', 0)) df.show() +------------+--------+ | test| rounded| +------------+--------+ |0.4219759403| 0.4219| | 0.4| 0.4| | 1.0| 1.0| | 0.5431293| 0.5431| | 123.769859|123.7698| +------------+--------+