
Tag: rdd

Comma-separated data in an RDD (PySpark): index out of bounds problem

I have a CSV file, which is comma separated. One of its columns contains data that is itself comma separated. Each row has a different number of words in that column, and hence a different number of commas. When I access the data or perform any operation on it, such as filtering (after splitting the data), PySpark throws index out of bounds errors. How shall I
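A minimal sketch of two ways the split could be made robust, written in plain Python so it runs without a cluster. It assumes either that the multi-value column is wrapped in double quotes in the file, or, if it is not quoted, that it is the last column. The names `parse_line` and `split_fixed`, the sample rows, and the file name in the closing comment are hypothetical and not taken from the question.

```python
import csv


def parse_line(line):
    """Parse one CSV line; commas inside double-quoted fields are preserved."""
    return next(csv.reader([line]))


def split_fixed(line, n_leading):
    """Split off n_leading columns; everything after them stays in one field.

    Only valid under the assumption that the multi-value column comes last.
    """
    return line.split(",", n_leading)


# Hypothetical sample rows: the third column holds a variable number of words.
sample_quoted = '1,alice,"red,green,blue"'
sample_unquoted = "2,bob,red,green"

quoted_fields = parse_line(sample_quoted)
unquoted_fields = split_fixed(sample_unquoted, 2)

print(quoted_fields)
print(unquoted_fields)

# With PySpark, either function could be passed to map on an RDD of lines
# (assuming an existing SparkContext named sc and a hypothetical path):
#   rows = sc.textFile("data.csv").map(parse_line)
#   safe = rows.filter(lambda fields: len(fields) == 3)
```

Both approaches give every row the same number of fields, so a fixed index such as `fields[2]` no longer depends on how many commas the last column contains. The length check in the commented filter is a guard against malformed rows. Parsing line by line will not handle quoted fields that span several lines.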
