I’ve installed Spark and its components locally, and I’m able to execute PySpark code in Jupyter, IPython, and via spark-submit. However, I’m receiving the following warnings:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 07:54:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
The .py file executes, but should I be worried about these warnings? I don’t want to start writing code only to find later that it won’t run. For reference, PySpark is installed locally. Here’s the code:
test.txt:

This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT
test.py:

import pyspark

sc = pyspark.SparkContext.getOrCreate()
# sc = pyspark.SparkContext(master='local[*]')  # or 'local[2]' ?

lines = sc.textFile("test.txt")
llist = lines.collect()
for line in llist:
    print(line)

print("SparkContext version:\t", sc.version)       # return SparkContext version
print("python version:\t", sc.pythonVer)           # return python version
print("master URL:\t", sc.master)                  # master URL to connect to
print("path where spark is installed on worker nodes:\t", sc.sparkHome)
print("name of spark user running SparkContext:\t", sc.sparkUser())
PATHs:
export SPARK_HOME=/Users/ayubk/spark-3.0.1-bin-hadoop3.2
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3
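As a quick sanity check, you can confirm which pyspark these settings resolve to before debugging anything else; this assumes python3 here is the same interpreter PYSPARK_PYTHON points at:

# Show the PySpark version and where it was imported from
python3 -c "import pyspark; print(pyspark.__version__, pyspark.__file__)"

If the version printed does not match the Spark install (3.0.1 here), the import is picking up a different pip-installed PySpark, which is worth knowing before chasing warnings.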
bash terminal:
$ spark-3.0.1-bin-hadoop3.2/bin/spark-submit test.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/ayubk/spark-3.0.1-bin-hadoop3.2/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/27 08:00:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/27 08:00:01 INFO SparkContext: Running Spark version 3.0.1
20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO ResourceUtils: Resources for spark.driver:
20/12/27 08:00:01 INFO ResourceUtils: ==============================================================
20/12/27 08:00:01 INFO SparkContext: Submitted application: test.py
20/12/27 08:00:01 INFO SecurityManager: Changing view acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls to: ayubk
20/12/27 08:00:01 INFO SecurityManager: Changing view acls groups to:
20/12/27 08:00:01 INFO SecurityManager: Changing modify acls groups to:
20/12/27 08:00:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ayubk); groups with view permissions: Set(); users with modify permissions: Set(ayubk); groups with modify permissions: Set()
20/12/27 08:00:02 INFO Utils: Successfully started service 'sparkDriver' on port 51254.
20/12/27 08:00:02 INFO SparkEnv: Registering MapOutputTracker
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMaster
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/12/27 08:00:02 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/12/27 08:00:02 INFO DiskBlockManager: Created local directory at /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/blockmgr-a99e3df1-6d15-4158-8e09-568910c2b045
20/12/27 08:00:02 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
20/12/27 08:00:02 INFO SparkEnv: Registering OutputCommitCoordinator
20/12/27 08:00:02 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/12/27 08:00:02 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.101:4040
20/12/27 08:00:02 INFO Executor: Starting executor ID driver on host 192.168.1.101
20/12/27 08:00:02 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51255.
20/12/27 08:00:02 INFO NettyBlockTransferService: Server created on 192.168.1.101:51255
20/12/27 08:00:02 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/12/27 08:00:02 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.101:51255 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:02 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:03 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.101, 51255, None)
20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 175.8 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 27.1 KiB, free 434.2 MiB)
20/12/27 08:00:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.101:51255 (size: 27.1 KiB, free: 434.4 MiB)
20/12/27 08:00:03 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:0
20/12/27 08:00:04 INFO FileInputFormat: Total input files to process : 1
20/12/27 08:00:04 INFO SparkContext: Starting job: collect at /Users/ayubk/test.py:9
20/12/27 08:00:04 INFO DAGScheduler: Got job 0 (collect at /Users/ayubk/test.py:9) with 2 output partitions
20/12/27 08:00:04 INFO DAGScheduler: Final stage: ResultStage 0 (collect at /Users/ayubk/test.py:9)
20/12/27 08:00:04 INFO DAGScheduler: Parents of final stage: List()
20/12/27 08:00:04 INFO DAGScheduler: Missing parents: List()
20/12/27 08:00:04 INFO DAGScheduler: Submitting ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0), which has no missing parents
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KiB, free 434.2 MiB)
20/12/27 08:00:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.101:51255 (size: 2.3 KiB, free: 434.4 MiB)
20/12/27 08:00:04 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1223
20/12/27 08:00:04 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (test.txt MapPartitionsRDD[1] at textFile at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1))
20/12/27 08:00:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
20/12/27 08:00:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.101, executor driver, partition 0, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.101, executor driver, partition 1, PROCESS_LOCAL, 7367 bytes)
20/12/27 08:00:04 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/12/27 08:00:04 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:52+52
20/12/27 08:00:04 INFO HadoopRDD: Input split: file:/Users/ayubk/test.txt:0+52
20/12/27 08:00:04 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 956 bytes result sent to driver
20/12/27 08:00:04 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1003 bytes result sent to driver
20/12/27 08:00:04 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 156 ms on 192.168.1.101 (executor driver) (1/2)
20/12/27 08:00:04 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 142 ms on 192.168.1.101 (executor driver) (2/2)
20/12/27 08:00:04 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/12/27 08:00:04 INFO DAGScheduler: ResultStage 0 (collect at /Users/ayubk/test.py:9) finished in 0.241 s
20/12/27 08:00:04 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/12/27 08:00:04 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/12/27 08:00:04 INFO DAGScheduler: Job 0 finished: collect at /Users/ayubk/test.py:9, took 0.296115 s
This is a test file
This is the second line - TEST
This is the third line
this IS THE fourth LINE - tEsT
SparkContext version: 3.0.1
python version: 3.7
master URL: local[*]
path where spark is installed on worker nodes: None
name of spark user running SparkContext: ayubk
20/12/27 08:00:04 INFO SparkContext: Invoking stop() from shutdown hook
20/12/27 08:00:04 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
20/12/27 08:00:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/12/27 08:00:04 INFO MemoryStore: MemoryStore cleared
20/12/27 08:00:04 INFO BlockManager: BlockManager stopped
20/12/27 08:00:04 INFO BlockManagerMaster: BlockManagerMaster stopped
20/12/27 08:00:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/12/27 08:00:04 INFO SparkContext: Successfully stopped SparkContext
20/12/27 08:00:04 INFO ShutdownHookManager: Shutdown hook called
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-76d186fb-cf42-4898-92db-050a73f9fcb7
20/12/27 08:00:04 INFO ShutdownHookManager: Deleting directory /private/var/folders/11/13mml0s91q39ckbt584szkp00000gn/T/spark-eb41b5d5-16e2-4938-8049-8f923e6cb46c/pyspark-ee1fe6ab-a27f-4be6-b8d8-06594704da12
Edit: I tried to install Java 8:
brew update
brew tap adoptopenjdk/openjdk
brew search jdk
brew install --cask adoptopenjdk8
However, when I type java -version, I’m still getting this:
openjdk version "13" 2019-09-17 OpenJDK Runtime Environment (build 13+33) OpenJDK 64-Bit Server VM (build 13+33, mixed mode, sharing)
Answer
Install Java 8. Java 11 and later (your java -version output shows Java 13) are known to produce this sort of warning with Spark.
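For example, on macOS you could point the shell and Spark at the Java 8 install by setting JAVA_HOME before running spark-submit. This is a minimal sketch assuming the adoptopenjdk8 cask from the question installed successfully; exact paths and version strings may differ on your machine:

# Resolve the home directory of the installed 1.8 JDK and export it for this session
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

# Confirm the switch took effect (should now report a 1.8.x build)
java -version

# Re-run the job; the illegal reflective access warnings should no longer appear
spark-3.0.1-bin-hadoop3.2/bin/spark-submit test.py

To make the change persistent, the export JAVA_HOME line can sit alongside the other exports in your shell profile (e.g. ~/.bash_profile or ~/.zshrc).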