Interpolation in PySpark throws java.lang.IllegalArgumentException

Question

I don't know how to interpolate in PySpark when the DataFrame contains many columns. Let me explain. I need to group by webID and interpolate the counts values at 1-minute intervals. However, when I apply my code, it throws java.lang.IllegalArgumentException.

Accepted Answer

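Set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1. See https://spark.apache.org/docs/3.0.0-preview/sql-pyspark-pandas-with-arrow.html#compatibiliy-setting-for-pyarrow--0150-and-spark-23x-24x

The variable can be set inside the pandas UDF itself (the two lines marked "add this line"):

    from operator import attrgetter

    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import StructType

    def resample(schema, freq, timestamp_col="timestamp", **kwargs):
        @pandas_udf(
            StructType(sorted(schema, key=attrgetter("name"))),
            PandasUDFType.GROUPED_MAP)
        def _(pdf):
            import os                                      # add this line
            os.environ['ARROW_PRE_0_15_IPC_FORMAT'] = '1'  # add this line
            # Index by timestamp so pandas can resample at the given frequency
            pdf.set_index(timestamp_col, inplace=True)
            pdf = pdf.resample(freq).interpolate()
            # Forward-fill values that interpolate() leaves empty
            pdf.ffill(inplace=True)
            pdf.reset_index(drop=False, inplace=True)
            # Order columns by name to match the sorted schema above
            pdf.sort_index(axis=1, inplace=True)
            return pdf
        return _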
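If you would rather not touch the UDF body, the same variable can probably be set once, before the Spark session is created. The sketch below is only an illustration under assumptions: the helper name arrow_compat_settings is made up for this example, and it relies on Spark's spark.executorEnv.<NAME> configuration prefix to pass the variable to executor processes. Whether this is sufficient on a given cluster depends on how the executors are launched, so treat it as a starting point rather than a guaranteed fix.

```python
import os

ARROW_COMPAT_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def arrow_compat_settings(value="1"):
    """Hypothetical helper: set the variable for the current process and
    return Spark conf entries that request it for executors as well."""
    # Affects this process and any child process it starts afterwards.
    os.environ[ARROW_COMPAT_VAR] = value
    # spark.executorEnv.<NAME> adds an environment variable to executors.
    return {"spark.executorEnv." + ARROW_COMPAT_VAR: value}


settings = arrow_compat_settings()

# Possible usage (assumes pyspark is installed and no session exists yet):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, val in settings.items():
#       builder = builder.config(key, val)
#   spark = builder.getOrCreate()
```

The Spark documentation linked above also describes putting the variable into conf/spark-env.sh, which avoids changing any Python code.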

Answer