I have the following DataFrame:
```
+----+-------+
|item|   path|
+----+-------+
| -a-|  a-b-c|
| -b-|  e-b-f|
| -d-|e-b-d-h|
| -c-|  g-h-c|
+----+-------+
```
I want to split the `path` column using the value of the `item` column in the same row:
```
+----+--------+
|item|    path|
+----+--------+
| -b-|  [a, c]|
| -b-|  [e, f]|
| -d-|[e-b, h]|
| -c-|[g-h, b]|
+----+--------+
```
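Per row, the rule being asked for is plain Python `str.split` on `path` with that row's `item` as a literal separator, as in rows 2 and 3 of the sample. A minimal sketch of that rule follows. The helper name `split_path` is hypothetical, and it assumes a null input should produce `None` rather than an error.

```python
from typing import List, Optional


def split_path(path: Optional[str], item: Optional[str]) -> Optional[List[str]]:
    """Split `path` on the literal separator `item` (hypothetical helper)."""
    # Spark hands SQL NULLs to Python UDFs as None; calling None.split(...)
    # is what raises "AttributeError: 'NoneType' object has no attribute 'split'".
    # str.split("") raises ValueError, so an empty separator is treated like a null.
    if path is None or not item:
        return None
    return path.split(item)


print(split_path("e-b-f", "-b-"))    # ['e', 'f']
print(split_path("e-b-d-h", "-d-"))  # ['e-b', 'h']
print(split_path(None, "-b-"))       # None
```

Wrapped as `udf(split_path, T.ArrayType(T.StringType()))`, it could stand in for a lambda-based UDF; a null row then yields a null array instead of failing the job.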
I've used this UDF:
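```python
from pyspark.sql.functions import udf
import pyspark.sql.types as T
# Split each row's path on that row's item, then keep the first piece
split_udf = udf(lambda a, b: a.split(b), T.ArrayType(T.StringType()))
org = org.withColumn('crb_url', split_udf('path', 'item')[0])
```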
It works, but I was wondering whether there's a way to do this with built-in PySpark functions instead. I can't use `org` for anything afterwards: joining it with another DataFrame or saving it as a Delta table fails with this error:
```
AttributeError: 'NoneType' object has no attribute 'split'
```
Answer
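Use `.fillna("")` to replace null values with an empty string before applying the UDF. The error is raised inside the lambda: Spark passes a null `path` to a Python UDF as `None`, and `None` has no `split` method. Because Spark evaluates transformations lazily, the failure only shows up when an action such as the join or the Delta write actually runs the UDF over the data. For example:

```python
org = org.fillna("").withColumn('crb_url', split_udf('path', 'item')[0])
```

If `item` can be null as well, it becomes an empty separator after `fillna("")`, and Python's `str.split("")` raises `ValueError: empty separator`, so such rows still need to be filtered out or guarded inside the lambda.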