
PySpark “illegal reflective access operation” when executed in terminal

I've installed Spark and its components locally, and I can run PySpark code in Jupyter, IPython and via spark-submit. However, I'm getting the following warnings:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 07:54:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

The .py file executes, but should I be worried about these warnings? I don't want to start writing code only to find later that it stops running. FYI, I installed PySpark locally. Here's the code:

test.txt:

This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT

test.py:

import pyspark

sc = pyspark.SparkContext.getOrCreate()
# sc = pyspark.SparkContext(master='local[*]') # or 'local[2]' ?

lines = sc.textFile("test.txt")
llist = lines.collect()
for line in llist:
    print(line)

print("SparkContext version:\t", sc.version) # return SparkContext version
print("python version:\t", sc.pythonVer) # return python version
print("master URL:\t", sc.master) # master URL to connect to
print("path where spark is installed on worker nodes:\t", sc.sparkHome) # path where spark is installed on worker nodes
print("name of spark user running SparkContext:\t", sc.sparkUser()) # name of spark user running SparkContext

PATHs:

export SPARK_HOME=/Users/ayubk/spark-3.0.1-bin-hadoop3.2
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3

bash terminal:

$ spark-3.0.1-bin-hadoop3.2/bin/spark-submit test.py

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 08:00:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/27 08:00:01 INFO SparkContext: Running Spark version 3.0.1
20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO ResourceUtils: Resources for spark.driver:

20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO SparkContext: Submitted application: test.py
20/12/27 08:00:01 INFO SecurityManager: Changing view acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing view acls groups to:
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls groups to:
20/12/27 08:00:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ayubk); groups with view permissions: Set(); users  with modify permissions: Set(ayubk); groups with modify permissions: Set()
20/12/27 08:00:02 INFO Utils: Successfully started service 'sparkDriver' on port 51254.
20/12/27 08:00:02 INFO SparkEnv: Registering MapOutputTracker
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMaster
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/12/27 08:00:02 INFO DiskBlockManager: Created local directory at /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/blockmgr-a99e3df1-6d15-4158-8e09-568910c2b045
20/12/27 08:00:02 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
20/12/27 08:00:02 INFO SparkEnv: Registering OutputCommitCoordinator
20/12/27 08:00:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/12/27 08:00:02 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.101:4040
20/12/27 08:00:02 INFO Executor: Starting executor ID driver on host 192.168.1.101
20/12/27 08:00:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51255.
20/12/27 08:00:02 INFO NettyBlockTransferService: Server created on 192.168.1.101:51255
20/12/27 08:00:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/12/27 08:00:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.101:51255 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.101, 51255, None)

20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 175.8 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 27.1 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.101:51255 (size: 27.1 KiB, free: 434.4 MiB)
20/12/27 08:00:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0
20/12/27 08:00:04 INFO FileInputFormat: Total input files to process : 1
20/12/27 08:00:04 INFO SparkContext: Starting job: collect at /Users/ayubk/test.py:9
20/12/27 08:00:04 INFO DAGScheduler: Got job 0 (collect at /Users/ayubk/test.py:9) with 2 output partitions
20/12/27 08:00:04 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /Users/ayubk/test.py:9)
20/12/27 08:00:04 INFO DAGScheduler: Parents of final stage: List()
20/12/27 08:00:04 INFO DAGScheduler: Missing parents: List()
20/12/27 08:00:04 INFO DAGScheduler: Submitting ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0), which has no missing parents
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.101:51255 (size: 2.3 KiB, free: 434.4 MiB)
20/12/27 08:00:04 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1223
20/12/27 08:00:04 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1))
20/12/27 08:00:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/12/27 08:00:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.101, executor driver, partition 0, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.101, executor driver, partition 1, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/12/27 08:00:04 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:52+52
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:0+52
20/12/27 08:00:04 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 956 bytes result sent to driver
20/12/27 08:00:04 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1003 bytes result sent to driver
20/12/27 08:00:04 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 156 ms on 192.168.1.101 (executor driver) (1/2)
20/12/27 08:00:04 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 142 ms on 192.168.1.101 (executor driver) (2/2)
20/12/27 08:00:04 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/12/27 08:00:04 INFO DAGScheduler: ResultStage 0 (collect at /Users/ayubk/test.py:9) finished in 0.241 s
20/12/27 08:00:04 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/12/27 08:00:04 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/12/27 08:00:04 INFO DAGScheduler: Job 0 finished: collect at /Users/ayubk/test.py:9, took 0.296115 s
This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT

SparkContext version:    3.0.1
python version:  3.7
master URL:  local[*]
path where spark is installed on worker nodes:   None
name of spark user running SparkContext:     ayubk
20/12/27 08:00:04 INFO SparkContext: Invoking stop() from shutdown hook
20/12/27 08:00:04 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
20/12/27 08:00:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/12/27 08:00:04 INFO MemoryStore: MemoryStore cleared
20/12/27 08:00:04 INFO BlockManager: BlockManager stopped
20/12/27 08:00:04 INFO BlockManagerMaster: BlockManagerMaster stopped
20/12/27 08:00:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/12/27 08:00:04 INFO SparkContext: Successfully stopped SparkContext
20/12/27 08:00:04 INFO ShutdownHookManager: Shutdown hook called
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-76d186fb-cf42-4898-92db-050a73f9fcb7
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c/pyspark-ee1fe6ab-a27f-4be6-b8d8-06594704da12

Edit: I tried to install Java 8:

brew update
brew tap adoptopenjdk/openjdk
brew search jdk
brew install --cask adoptopenjdk8

However, when I run java -version, I still get this:

openjdk version "13" 2019-09-17
OpenJDK Runtime Environment (build 13+33)
OpenJDK 64-Bit Server VM (build 13+33, mixed mode, sharing)

Answer

Install Java 8 instead of Java 11 or later (your java -version output shows 13). Newer JDKs are known to produce this kind of warning with Spark.
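
To confirm which Java version is actually being picked up, something like the following may help. It is only a sketch: the helper names parse_java_major and current_java_major are made up for illustration, and it assumes the java found on your PATH is the one Spark launches (Spark's launch scripts prefer JAVA_HOME when that variable is set, so check it too).

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles the legacy numbering ("1.8.0_275" -> 8) as well as the
    newer one ("13" -> 13, "11.0.9" -> 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(f"no version string found in: {version_output!r}")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        # Legacy "1.x" scheme: the real major version is the second field.
        return int(match.group(2))
    return major


def current_java_major() -> int:
    """Run `java -version` and return the major version of the java on PATH."""
    # `java -version` prints its report to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

Calling current_java_major() before creating the SparkContext should return 8 once the Java 8 install is the active one. If it still returns 13 after installing Java 8, the shell is probably still resolving the previous default JDK; pointing JAVA_HOME at the Java 8 install is the usual remedy, though that is an assumption about this setup rather than something shown in the question.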
