v2.1.0-rc.0: External scheduler cannot be instantiated #2338

Open
1 task
benstrum opened this issue Nov 27, 2024 · 1 comment
Labels
kind/bug Something isn't working

Comments

@benstrum

What happened?

  • ✋ I have searched the open/closed issues and my issue is not listed.

We are seeing intermittent errors when creating a Spark job. We do not see any issues on the Kubernetes cluster side (such as resource pressure). The error says "External scheduler cannot be instantiated", but we are not using an external scheduler in our jobs.

Reproduction Code

No response

Expected behavior

No response

Actual behavior

No response

Environment & Versions

  • Kubernetes Version: 1.30.6
  • Spark Operator Version: v2.1.0-rc.0
  • Apache Spark Version: 3.5.3

Additional context

Below is an example stack trace we are seeing:

ERROR SparkContext: Error initializing SparkContext.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] org.apache.spark.SparkException: External scheduler cannot be instantiated
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3204)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext.<init>(SparkContext.scala:577)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.Gateway.invoke(Gateway.java:238)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.Thread.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.lang.reflect.InvocationTargetException
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.makeExecutorPodsAllocator(KubernetesClusterManager.scala:179)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:133)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3198)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 14 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [szrvglbec24091711-syncopp-15rt-vhntkk44-driver] in namespace: [spark] failed.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:187)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:141)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:92)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:96)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at scala.Option.map(Option.scala:230)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:94)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.io.IOException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:478)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:741)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:185)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 26 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 1 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Connection refused (Connection refused)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.Socket.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO SparkContext: SparkContext is stopping with exitCode 0.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO MemoryStore: MemoryStore cleared
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO BlockManager: BlockManager stopped
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO BlockManagerMaster: BlockManagerMaster stopped
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 WARN MetricsSystem: Stopping a MetricsSystem that is not running
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] 24/11/27 04:08:24 INFO SparkContext: Successfully stopped SparkContext
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Traceback (most recent call last):
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/work-dir/run_sync.py", line 133, in <module>
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] spark = ctx_helper.init_spark('sync')
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/work-dir/cien_utils/ctx_helper/__init__.py", line 345, in init_spark
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] spark = SparkSession.builder.appName(f"{get_coid().lower()}-{job_name.lower()}").config(conf=conf).getOrCreate()
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 497, in getOrCreate
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] sc = SparkContext.getOrCreate(sparkConf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 515, in getOrCreate
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] SparkContext(conf=conf or SparkConf())
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 203, in __init__
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] self._do_init(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 296, in _do_init
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] self._jsc = jsc or self._initialize_context(self._conf._jconf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 421, in _initialize_context
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] return self._jvm.JavaSparkContext(jconf)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1587, in __call__
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] return_value = get_return_value(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] raise Py4JJavaError(
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] : org.apache.spark.SparkException: External scheduler cannot be instantiated
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3204)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext.<init>(SparkContext.scala:577)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.Gateway.invoke(Gateway.java:238)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.Thread.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.lang.reflect.InvocationTargetException
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.makeExecutorPodsAllocator(KubernetesClusterManager.scala:179)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:133)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:3198)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 14 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [szrvglbec24091711-syncopp-15rt-vhntkk44-driver] in namespace: [spark] failed.
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:187)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:141)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:92)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:96)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at scala.Option.map(Option.scala:230)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:94)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.io.IOException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:515)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:535)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:478)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:741)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.requireFromServer(BaseOperation.java:185)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 26 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:249)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 1 more
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] Caused by: java.net.ConnectException: Connection refused (Connection refused)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.AbstractPlainSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.SocksSocketImpl.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at java.base/java.net.Socket.connect(Unknown Source)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
[2024-11-27, 05:08:31 CET] {pod_manager.py:472} INFO - [spark-kubernetes-driver] ... 21 more

Impacted by this bug?

Give it a 👍 We prioritize the issues with most 👍

@benstrum benstrum added the kind/bug Something isn't working label Nov 27, 2024
@jacobsalway
Member

jacobsalway commented Nov 30, 2024

Based on the logs, it looks like your driver pod cannot communicate with your cluster's control plane. Something might be blocking the network communication between the two, e.g. firewall rules or Kubernetes NetworkPolicies. It's hard to say without more context, but I'd encourage you to look down that path.

For context, these are the specific logs that point me down that path:

  • Caused by: java.net.ConnectException: Failed to connect to kubernetes.default.svc/10.0.0.1:443
  • Caused by: java.net.ConnectException: Connection refused (Connection refused)
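
One way to narrow this down might be to probe the API server endpoint from inside a pod that runs in the same namespace and on the same nodes as the failing driver. The snippet below is only a minimal sketch, not part of the operator or of Spark: `can_connect` is a hypothetical helper name, and the host and port are simply the ones shown in the stack trace. It checks that a TCP connection can be opened, which is the step that fails with "Connection refused" in the trace, and nothing more (no TLS or authentication).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers connection refused, timeouts, and DNS lookup failures.
        return False


if __name__ == "__main__":
    # Host and port are taken from the stack trace; adjust if your cluster differs.
    reachable = can_connect("kubernetes.default.svc", 443)
    print("API server reachable" if reachable else "API server NOT reachable")
```

Since the failures are reported as intermittent, a single successful probe would not rule out a network problem. Running it repeatedly, around the time the jobs fail, would likely be more informative.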
