I have the following DataFrame:
```
+----+-------+
|item|   path|
+----+-------+
| -b-|  a-b-c|
| -b-|  e-b-f|
| -d-|e-b-d-h|
| -c-|g-h-c-b|
+----+-------+
```
I want to split the path column on the value of the item column in the same row:
```
+----+--------+
|item|    path|
+----+--------+
| -b-|  [a, c]|
| -b-|  [e, f]|
| -d-|[e-b, h]|
| -c-|[g-h, b]|
+----+--------+
```
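For context, here is a minimal sketch that reproduces the input DataFrame above (assuming an active SparkSession named spark; org is the variable name used below):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample rows matching the input table above
org = spark.createDataFrame(
    [("-b-", "a-b-c"),
     ("-b-", "e-b-f"),
     ("-d-", "e-b-d-h"),
     ("-c-", "g-h-c-b")],
    ["item", "path"],
)
```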
I've used this UDF:
```python
from pyspark.sql.functions import udf
from pyspark.sql import types as T

# Split each row's path by that row's item value, then keep the first piece
split_udf = udf(lambda a, b: a.split(b), T.ArrayType(T.StringType()))
org = org.withColumn('crb_url', split_udf('path', 'item')[0])
```
It worked very well. But I was wondering if there is another way to do it with a built-in PySpark function, because I can't use org in any way, to join it with another DataFrame or save it as a Delta table; it gives me this error:

```
AttributeError: 'NoneType' object has no attribute 'split'
```
Answer
Use .fillna("") to replace the null values with an empty string. The AttributeError means some rows have a null path, so the lambda receives None; because Spark evaluates lazily, the UDF only actually runs (and fails) when an action such as a join or a Delta write triggers it. Like this:

```python
org = org.fillna("").withColumn('crb_url', split_udf('path', 'item')[0])
```
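As for the original question about a built-in alternative: a minimal sketch, assuming Spark SQL's split function called through F.expr so that the separator can come from another column. Unlike the Python UDF, split returns null for a null input instead of raising, so fillna is not strictly needed; note that the second argument is treated as a regular expression, which is harmless here because the item values contain no regex metacharacters:

```python
from pyspark.sql import functions as F

# split(path, item) is evaluated per row; getItem(0) takes the first piece.
# A null path simply yields a null crb_url instead of an error.
org = org.withColumn('crb_url', F.expr("split(path, item)").getItem(0))
```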