Skip to content
Advertisement

Tag: apache-kafka

PySpark 2.x: Programmatically adding Maven JAR Coordinates to Spark

The following is my PySpark startup snippet, which is pretty reliable (I’ve been using it a long time). Today I added the two Maven Coordinates shown in the spark.jars.packages option (effectively “plugging” in Kafka support). Now that normally triggers dependency downloads (performed by Spark automatically): However the plugins aren’t downloading and/or loading when I run the snippet (e.g. ./python -i

Python: how to mock a kafka topic for unit tests?

We have a message scheduler that generates a hash-key from the message attributes before placing it on a Kafka topic queue with the key. This is done for de-duplication purposes. However, I am not sure how I could possibly test this deduplication without actually setting up a local cluster and checking that it is performing as expected. Searching online for

Advertisement