I’m using PySpark to do collaborative filtering with ALS. My original user and item IDs are strings, so I used StringIndexer to convert them to numeric indices (PySpark’s ALS implementation requires numeric IDs).
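The indexing step looks roughly like this (ratings and the column names are placeholders for my actual DataFrame):

from pyspark.ml.feature import StringIndexer

# One indexer per string ID column; ratings is the raw input DataFrame
user_indexer_model = StringIndexer(
    inputCol="userId", outputCol="userIdIndex").fit(ratings)
product_indexer_model = StringIndexer(
    inputCol="productId", outputCol="productIdIndex").fit(ratings)

indexed = product_indexer_model.transform(
    user_indexer_model.transform(ratings))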
After I’ve fitted the model, I can get the top 3 recommendations for each user like so:
recs = (
model
.recommendForAllUsers(3)
)
The recs DataFrame looks like this:
+-----------+--------------------+
|userIdIndex|     recommendations|
+-----------+--------------------+
|       1580|[[10096,3.6725707...|
|       4900|[[10096,3.0137873...|
|       5300|[[10096,2.7274625...|
|       6620|[[10096,2.4493625...|
|       7240|[[10096,2.4928937...|
+-----------+--------------------+
only showing top 5 rows

root
 |-- userIdIndex: integer (nullable = false)
 |-- recommendations: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- productIdIndex: integer (nullable = true)
 |    |    |-- rating: float (nullable = true)
I want to create a huge JSON dump of this DataFrame, and I can do so like this:
(
recs
.toJSON()
.saveAsTextFile("name_i_must_hide.recs")
)
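(toJSON turns the DataFrame into an RDD of JSON strings, one document per row, which is why saveAsTextFile works here.)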
and a sample of these JSON documents looks like this:
{
"userIdIndex": 1580,
"recommendations": [
{
"productIdIndex": 10096,
"rating": 3.6725707
},
{
"productIdIndex": 10141,
"rating": 3.61542
},
{
"productIdIndex": 11591,
"rating": 3.536216
}
]
}
The userIdIndex and productIdIndex keys are due to the StringIndexer transformation.
How can I get the original values of these columns back? I suspect I must use the IndexToString transformer, but I can’t quite figure out how, since the data is nested in an array inside the recs DataFrame.
I tried to use a Pipeline (stages=[StringIndexer, ALS, IndexToString]), but it looks like it doesn’t support these indexers.
Cheers!
Answer
In both cases you’ll need access to the lists of labels. These can be retrieved using either a StringIndexerModel:
user_indexer_model = ...  # type: StringIndexerModel
user_labels = user_indexer_model.labels

product_indexer_model = ...  # type: StringIndexerModel
product_labels = product_indexer_model.labels
or column metadata.
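If the fitted indexer models are no longer at hand, here is a sketch of the metadata route, assuming indexed is the DataFrame the StringIndexer models produced (StringIndexer records its labels in the column’s ML attribute metadata):

# Assuming `indexed` is the output of the StringIndexer models
user_labels = indexed.schema["userIdIndex"].metadata["ml_attr"]["vals"]
product_labels = indexed.schema["productIdIndex"].metadata["ml_attr"]["vals"]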
For userIdIndex you can just apply IndexToString:
from pyspark.ml.feature import IndexToString

# Map userIdIndex back to the original string user IDs using the saved labels
user_id_to_label = IndexToString(
    inputCol="userIdIndex", outputCol="userId", labels=user_labels)

user_id_to_label.transform(recs)
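transform returns a new DataFrame with a userId string column appended next to userIdIndex; the index column is left in place, so you can drop it afterwards if you don’t need it.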
For recommendations you’ll need either a udf or an expression like this:
from pyspark.sql.functions import array, col, lit, struct

n = 3  # Same as numItems passed to recommendForAllUsers

# An array literal of product labels, indexable by a column value
product_labels_ = array(*[lit(x) for x in product_labels])

# Rebuild the recommendations array, replacing each productIdIndex
# with the corresponding original productId
recommendations = array(*[struct(
    product_labels_[col("recommendations")[i]["productIdIndex"]].alias("productId"),
    col("recommendations")[i]["rating"].alias("rating")
) for i in range(n)])

recs.withColumn("recommendations", recommendations)
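If you’d rather take the udf route mentioned above, a minimal sketch could look like this (the helper name and return schema are illustrative, and product_labels is the plain Python list captured from the indexer model):

from pyspark.sql.functions import udf
from pyspark.sql.types import (
    ArrayType, FloatType, StringType, StructField, StructType)

# Same shape as before, but with the original string productId
rec_schema = ArrayType(StructType([
    StructField("productId", StringType()),
    StructField("rating", FloatType()),
]))

@udf(rec_schema)
def index_to_product(recommendations):
    # product_labels is captured from the enclosing scope
    if recommendations is None:
        return None
    return [(product_labels[rec.productIdIndex], float(rec.rating))
            for rec in recommendations]

recs.withColumn("recommendations", index_to_product("recommendations"))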