
Adjusting incorrect data from a CSV file in a PySpark dataframe

I am trying to read a CSV file into a dataframe in PySpark, but the file has mixed data: part of one column's data belongs to the adjacent column. Is there any way to modify the dataframe in Python to get the expected output dataframe?

Sample CSV


Expected Output



Answer

You can do this by making use of regexp_extract from pyspark.sql.functions.

My approach would be something like this:

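A minimal sketch of how regexp_extract could be applied to this kind of problem. The sample data is not shown here, so everything specific below is an assumption for illustration: the column names `code` and `label`, the rule that `code` should hold only leading digits, and the sample rows are all hypothetical. Adjust the patterns to match the real file.

```python
# Hypothetical patterns: the leading digits belong in `code`,
# and anything after them spilled over from the adjacent `label` column.
DIGITS = r"^(\d+)"
REST = r"^\d+(.*)$"


def fix_columns(df):
    """Move the non-digit tail of `code` to the front of `label`.

    `code` and `label` are assumed column names, not taken from the question.
    """
    # Imported here so the patterns above can be checked without Spark installed.
    from pyspark.sql import functions as F

    # Text that follows the leading digits in `code` ("" when nothing spilled).
    spill = F.regexp_extract(F.col("code"), REST, 1)
    # Empty CSV fields are read as null; treat them as "" so concat is not null.
    label = F.coalesce(F.col("label"), F.lit(""))

    return (
        df.withColumn("label", F.concat(spill, label))
          .withColumn("code", F.regexp_extract(F.col("code"), DIGITS, 1))
    )


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    # Made-up rows: the first has "abc" stuck to the end of `code`.
    df = spark.createDataFrame(
        [("123abc", "def"), ("456", "ghi")],
        ["code", "label"],
    )
    fix_columns(df).show()
    spark.stop()
```

The `label` column is rebuilt before `code` is overwritten, so the spilled text is still available when it is needed. regexp_extract returns an empty string when the pattern does not match, which leaves already clean rows unchanged.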
User contributions licensed under: CC BY-SA