I am new to Spark and I am having a silly “what’s-the-best-approach” issue. Basically, I have a map (dict) that I would like to loop over. During each iteration, I want to search through a column in a Spark dataframe using an rlike regex and assign the key of the dict to a new column using withColumn
maps = {"groceries": ["hot chocolate", "milk", "sugar", "flour"],
"laundry": ["soap", "detergent", "fabric softener"]
}
The data sample is shown below
+--------------------+-----------+
|                  id|item_bought|
+--------------------+-----------+
|uiq7Zq52Bww4pZXc3xri|       Soap|
|fpJatwxTeObcbuJH25UI|  Detergent|
|MdK1q5gBygIGFYyvbz8J|       Milk|
+--------------------+-----------+
I want to get a dataframe that looks like this:
+--------------------+-----------+---------+
|                  id|item_bought|    class|
+--------------------+-----------+---------+
|uiq7Zq52Bww4pZXc3xri|       Soap|  Laundry|
|fpJatwxTeObcbuJH25UI|  Detergent|  Laundry|
|MdK1q5gBygIGFYyvbz8J|       Milk|Groceries|
+--------------------+-----------+---------+
I have over 100M records and I want an approach that follows Spark best practices (distributed computing). One approach that comes to mind is to loop through the map and use rlike (or contains) for the regex search, as shown below:
from pyspark.sql.functions import col

for key, value in maps.items():
    pattern = '|'.join([f'(?i){x}' for x in value])  # (?i) -> ignore case
    df = df.withColumn("class", col("item_bought").rlike(pattern))
But this returns true or false for the rlike search, whereas I want the new column to hold the matching key instead of true/false.
Also, considering that I have 100M (up to 150M) records, is looping through the map the best approach?
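For reference, one way to make a loop like the one above write the key instead of true/false would be to build a single chained when()/otherwise() expression. A minimal sketch, assuming the df and maps shown above and pyspark.sql.functions imported as F:

from pyspark.sql import functions as F

# Build one column expression that yields the matching class key, or null if nothing matches
class_col = F.lit(None)
for key, keywords in maps.items():
    pattern = '(?i)(' + '|'.join(keywords) + ')'  # one case-insensitive pattern per class
    class_col = F.when(F.col('item_bought').rlike(pattern), F.lit(key)).otherwise(class_col)

df = df.withColumn('class', class_col)

The Python loop only builds the expression on the driver; the actual matching still runs as a single distributed pass over the data.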
EDIT
What if the item_bought values in the df had special characters (or some extra text)?
+--------------------+----------------+
|                  id|     item_bought|
+--------------------+----------------+
|uiq7Zq52Bww4pZXc3xri|   Soap -&ju10kg|
|fpJatwxTeObcbuJH25UI|Detergent x.ju2i|
|MdK1q5gBygIGFYyvbz8J|            Milk|
+--------------------+----------------+
I don’t want to do a text cleanup first; I just want to assign classes based on a regex keyword search.
Answer
For your situation, I would turn the map into a dataframe. I assume the resulting dataframe will be relatively small, so use a broadcast join. What this does is distribute the small df to each worker node, avoiding a shuffle.
from pyspark.sql.functions import explode, initcap, broadcast, regexp_extract

#Create df from maps
df_ref = (spark.createDataFrame(list(maps.items()), schema=('class', 'item_bought'))
          .withColumn('item_bought', explode('item_bought'))
          .withColumn('item_bought', initcap('item_bought')))
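For reference, the df_ref built from the maps above should contain one row per keyword, roughly like this (row order may differ):

+---------+---------------+
|    class|    item_bought|
+---------+---------------+
|groceries|  Hot Chocolate|
|groceries|           Milk|
|groceries|          Sugar|
|groceries|          Flour|
|  laundry|           Soap|
|  laundry|      Detergent|
|  laundry|Fabric Softener|
+---------+---------------+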
#Broadcast join
df.join(broadcast(df_ref), how='left', on='item_bought').show()
+-----------+--------------------+---------+
|item_bought|                  id|    class|
+-----------+--------------------+---------+
|       Soap|uiq7Zq52Bww4pZXc3xri|  laundry|
|  Detergent|fpJatwxTeObcbuJH25UI|  laundry|
|       Milk|MdK1q5gBygIGFYyvbz8J|groceries|
+-----------+--------------------+---------+
Following your edit
df_ref = (spark.createDataFrame(list(maps.items()), schema=('class', 'item_bought1'))
          .withColumn('item_bought1', explode('item_bought1'))
          .withColumn('item_bought1', initcap('item_bought1')))

#Extract the leading word, then broadcast join on it
(df.withColumn('item_bought1', regexp_extract('item_bought', '^[A-Za-z]+', 0))
   .join(broadcast(df_ref), how='left', on='item_bought1')
   .show())
+------------+--------------------+----------------+---------+
|item_bought1|                  id|     item_bought|    class|
+------------+--------------------+----------------+---------+
|        Soap|uiq7Zq52Bww4pZXc3xri|            Soap|  laundry|
|   Detergent|fpJatwxTeObcbuJH25UI|       Detergent|  laundry|
|        Milk|MdK1q5gBygIGFYyvbz8J|            Milk|groceries|
|        Soap|uiq7Zq52Bww4pZXc3xri|   Soap -&ju10kg|  laundry|
|   Detergent|fpJatwxTeObcbuJH25UI|Detergent x.ju2i|  laundry|
|        Milk|MdK1q5gBygIGFYyvbz8J|            Milk|groceries|
+------------+--------------------+----------------+---------+
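If the keywords can appear anywhere in item_bought (including multi-word ones like "fabric softener"), a variation on the same broadcast-join idea is to join on a case-insensitive contains condition instead of extracting the leading word. A minimal sketch, assuming spark, df and maps as defined above:

from pyspark.sql.functions import broadcast, explode, lower

# One row per (class, keyword); keywords stay lowercase
df_kw = (spark.createDataFrame(list(maps.items()), schema=('class', 'keyword'))
         .withColumn('keyword', explode('keyword')))

# Non-equi join: a row gets a class if item_bought contains the keyword (case-insensitive)
(df.join(broadcast(df_kw), lower(df['item_bought']).contains(df_kw['keyword']), how='left')
   .drop('keyword')
   .show())

Because this is a non-equi join it falls back to a broadcast nested loop join, which is still fine as long as df_kw stays small. Note that if an item matches keywords from more than one class it will produce one row per match, so you may need to de-duplicate or pick a priority afterwards.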