Tag: pyspark-dataframes

Remove duplicates from a dataframe in PySpark

I'm messing around with DataFrames in PySpark 1.4 locally and am having issues getting the dropDuplicates method to work. It keeps returning the error: "AttributeError: 'list' object has no attribute 'dropDuplicates'". Not quite sure why, as I seem to be following the syntax in the latest documentation.

Answer: It is not an import problem. You simply call .dropDuplicates() on a
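The error message itself says the method was looked up on a Python list, not on a DataFrame. As a rough illustration (not the asker's actual code, which is not shown here), the sketch below reproduces the message with a plain list, the kind of object that `DataFrame.collect()` or `DataFrame.take(n)` returns. The sample rows, the column names, and the helper `dedupe_with_spark` are hypothetical. The helper also assumes a newer PySpark that provides `SparkSession` (2.0+); on 1.4 you would build the DataFrame through a `SQLContext` instead.

```python
# Stand-in for what DataFrame.collect() or DataFrame.take(n) hands back:
# an ordinary Python list, which has no DataFrame methods.
rows = [("Alice", 5), ("Alice", 5), ("Bob", 7)]

try:
    rows.dropDuplicates()  # wrong object: this is a list, not a DataFrame
except AttributeError as exc:
    message = str(exc)
    print(message)


def dedupe_with_spark():
    """Hypothetical helper: deduplicate while the data is still a DataFrame.

    Assumes PySpark 2.0+ (for SparkSession) and a working local Spark install.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame(rows, ["name", "score"])

    deduped = df.dropDuplicates()          # compare all columns
    by_name = df.dropDuplicates(["name"])  # compare only the "name" column

    # collect() comes last, once no more DataFrame methods are needed.
    return deduped.collect(), by_name.collect()
```

The design point is ordering: call `dropDuplicates()` while the object is still a DataFrame, and only then convert to a local list with `collect()` or `take()`.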
