✋ I have searched the open/closed issues and my issue is not listed.
We are seeing intermittent errors when creating a Spark job. We do not see any issues on the Kubernetes cluster side (such as resource pressure). The error mentions "External scheduler cannot be instantiated", but we are not using an external scheduler in our jobs.
Reproduction Code
No response
Expected behavior
No response
Actual behavior
No response
Environment & Versions
Kubernetes Version: 1.30.6
Spark Operator Version: v2.1.0-rc.0
Apache Spark Version: 3.5.3
Additional context
Below is an example stack trace we are seeing:
ERROR SparkContext: Error initializing SparkContext.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] org.apache.spark.SparkException: External scheduler cannot be instantiated
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3204)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext.<init>(SparkContext.scala:577)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.Gateway.invoke(Gateway.java:238)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.Thread.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.lang.reflect.InvocationTargetException
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.makeExecutorPodsAllocator(KubernetesClusterManager.scala:179)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:133)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3198)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 14 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [szrvglbec24091711-syncopp-15rt-vhntkk44-driver] in namespace: [spark] failed.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:187)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:141)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:92)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:96)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at scala.Option.map(Option.scala:230)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:94)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.io.IOException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:478)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:741)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:185)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 26 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 1 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Connection refused (Connection refused)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.Socket.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO SparkContext: SparkContext is stopping with exitCode 0.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO MemoryStore: MemoryStore cleared
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO BlockManager: BlockManager stopped
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO BlockManagerMaster: BlockManagerMaster stopped
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 WARN MetricsSystem: Stopping a MetricsSystem that is not running
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO SparkContext: Successfully stopped SparkContext
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Traceback (most recent call last):
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/work-dir/run_sync.py", line 133, in <module>
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] spark = ctx_helper.init_spark('sync')
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/work-dir/cien_utils/ctx_helper/init.py", line 345, in init_spark
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] spark = SparkSession.builder.appName(f"{get_coid().lower()}-{job_name.lower()}").config(conf=conf).getOrCreate()
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 497, in getOrCreate
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] sc = SparkContext.getOrCreate(sparkConf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 515, in getOrCreate
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] SparkContext(conf=conf or SparkConf())
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 203, in __init__
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] self._do_init(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 296, in _do_init
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] self._jsc = jsc or self._initialize_context(self._conf._jconf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 421, in _initialize_context
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] return self._jvm.JavaSparkContext(jconf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1587, in __call__
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] return_value = get_return_value(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] raise Py4JJavaError(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] : org.apache.spark.SparkException: External scheduler cannot be instantiated
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3204)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext.<init>(SparkContext.scala:577)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.Gateway.invoke(Gateway.java:238)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.Thread.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.lang.reflect.InvocationTargetException
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.makeExecutorPodsAllocator(KubernetesClusterManager.scala:179)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:133)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3198)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 14 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [szrvglbec24091711-syncopp-15rt-vhntkk44-driver] in namespace: [spark] failed.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:187)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:141)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:92)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:96)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at scala.Option.map(Option.scala:230)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:94)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.io.IOException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:478)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:741)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:185)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 26 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 1 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Connection refused (Connection refused)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.Socket.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
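Based on the logs, it looks like your driver pod cannot communicate with your cluster's control plane. Something may be blocking network traffic between the two, e.g. firewall rules or Kubernetes NetworkPolicies. It's hard to say without more context, but I'd encourage you to look down that path. The "External scheduler cannot be instantiated" message is only the outermost wrapper: Spark loads its Kubernetes backend as an external cluster manager, so any failure while setting it up surfaces under that message.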
For context, these are the specific logs that point me down that path:
Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
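To check whether the failure is at the network level, one option is to probe the API server endpoint from inside the driver pod (or another pod in the same namespace and on the same node). The following is a minimal sketch, not part of Spark or the operator: `can_connect` and `probe` are hypothetical helper names, and it only tests whether a TCP connection can be opened, not TLS or authentication.

```python
import socket
import time


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


def probe(host: str, port: int, attempts: int = 5, delay: float = 1.0) -> list:
    """Try to connect several times and return the outcome of each attempt."""
    results = []
    for i in range(attempts):
        results.append(can_connect(host, port))
        if i < attempts - 1:
            # Pause between attempts so intermittent failures have a chance to show.
            time.sleep(delay)
    return results
```

Calling `probe("kubernetes.default.svc", 443)` uses the host and port from the stack trace. Because the errors are intermittent, a mix of `True` and `False` results over repeated runs would suggest a flaky network path rather than a rule that blocks all traffic.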
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:187)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:141)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:92)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:96)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at scala.Option.map(Option.scala:230)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:94)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.io.IOException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:478)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:741)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:185)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 26 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 1 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Connection refused (Connection refused)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.Socket.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO SparkContext: SparkContext is stopping with exitCode 0.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO MemoryStore: MemoryStore cleared
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO BlockManager: BlockManager stopped
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO BlockManagerMaster: BlockManagerMaster stopped
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 WARN MetricsSystem: Stopping a MetricsSystem that is not running
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO SparkContext: Successfully stopped SparkContext
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Traceback (most recent call last):
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/work-dir/run_sync.py", line 133, in <module>
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] spark = ctx_helper.init_spark('sync')
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/work-dir/cien_utils/ctx_helper/__init__.py", line 345, in init_spark
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] spark = SparkSession.builder.appName(f"{get_coid().lower()}-{job_name.lower()}").config(conf=conf).getOrCreate()
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 497, in getOrCreate
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] sc = SparkContext.getOrCreate(sparkConf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 515, in getOrCreate
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] SparkContext(conf=conf or SparkConf())
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 203, in __init__
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] self._do_init(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 296, in _do_init
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] self._jsc = jsc or self._initialize_context(self._conf._jconf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 421, in _initialize_context
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] return self._jvm.JavaSparkContext(jconf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1587, in __call__
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] return_value = get_return_value(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] raise Py4JJavaError(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] : org.apache.spark.SparkException: External scheduler cannot be instantiated
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3204)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext.<init>(SparkContext.scala:577)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.Gateway.invoke(Gateway.java:238)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.Thread.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.lang.reflect.InvocationTargetException
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.makeExecutorPodsAllocator(KubernetesClusterManager.scala:179)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:133)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3198)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 14 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [szrvglbec24091711-syncopp-15rt-vhntkk44-driver] in namespace: [spark] failed.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:187)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:141)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:92)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:96)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at scala.Option.map(Option.scala:230)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:94)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.io.IOException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:478)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:741)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:185)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 26 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 1 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Connection refused (Connection refused)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.Socket.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
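The innermost cause in the trace above is `Connection refused` on `kubernetes.default.svc/10.0.0.1:443`, raised while the driver looks up its own pod, so the "External scheduler cannot be instantiated" message appears to be only the outer wrapper. Since the failure is intermittent, one possible stopgap is to retry session creation with a backoff. The following is a minimal sketch, not a confirmed fix: `retry_with_backoff` is a hypothetical helper, the attempt count and delays are arbitrary, and whether re-creating the SparkContext in the same driver process is safe after a failed initialization is an assumption that has not been verified here.

```python
import time


def retry_with_backoff(fn, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fn() until it succeeds or attempts run out.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between tries.
    The sleep function is injectable so the helper can be tested without waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: Py4JJavaError wraps the Java cause
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: surface the last error unchanged.
    raise last_error


# Assumed usage in run_sync.py (init_spark is the call shown in the traceback):
# spark = retry_with_backoff(lambda: ctx_helper.init_spark('sync'))
```

This only hides a transient API server connectivity problem from the job. It does not explain why the connection is refused in the first place.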
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍