SBT run - Pass dependencies to worker nodes for a Spark job
I have a streaming job that I run with SBT.
Whenever I do "sbt run", I see the error below. I believe it is because the workers are not able to get the required Kafka dependency.
Is there a way to pass the dependency jars along with the sbt run command?
ERROR:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 13.0 failed 4 times, most recent failure: Lost task 1.3 in stage 13.0 (TID 227, 10.148.9.12, executor 1): java.lang.ClassNotFoundException: org.apache.spark.sql.kafka010.KafkaContinuousDataReaderFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
build.sbt:
name := "MyAPP"
version := "0.5"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.1",
  "org.apache.spark" %% "spark-sql" % "2.3.1",
  "org.apache.spark" %% "spark-streaming" % "2.3.1",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.1",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.1",
  "com.typesafe" % "config" % "1.3.2",
  "org.apache.logging.log4j" % "log4j-api" % "2.11.0",
  "org.apache.logging.log4j" % "log4j-core" % "2.11.0",
  "org.apache.logging.log4j" %% "log4j-api-scala" % "11.0",
  "org.scalatest" %% "scalatest" % "3.0.5" % "test",
  "org.apache.kafka" % "kafka_2.11" % "0.10.2.2",
  "org.apache.kafka" % "kafka-clients" % "0.10.2.2",
  "ml.combust.mleap" %% "mleap-runtime" % "0.11.0",
  "com.typesafe.play" % "play-json_2.11" % "2.6.10",
  "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.8.11",
  "net.liftweb" %% "lift-json" % "3.3.0"
)
lazy val excludeJpountz = ExclusionRule(organization = "net.jpountz.lz4", name = "lz4")
lazy val kafkaClients = "org.apache.kafka" % "kafka-clients" % "0.10.2.2" excludeAll(excludeJpountz)
logBuffered in Test := false
fork in Test := true
// Don't run tests before assembling
test in assembly := {}
retrieveManaged := true
assemblyMergeStrategy in assembly := {
  case "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister" => MergeStrategy.concat
  case PathList("META-INF", xs@_*) => MergeStrategy.discard
  case "log4j.properties" => MergeStrategy.discard
  case x => MergeStrategy.first
}
unmanagedBase := baseDirectory.value / "lib"
Tags: apache-kafka, sbt, spark-structured-streaming
I doubt you need all of Kafka, including the server ("org.apache.kafka" % "kafka_2.11" % "0.10.2.2"), but you do need to create an uber JAR, which I don't know if SBT does by default, and you must add provided on the Spark Core, Streaming, and SQL lines: stackoverflow.com/a/28498443/2308683
– cricket_007, Nov 19 at 21:09
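A minimal sketch of what that suggestion could look like for this build. It assumes the sbt-assembly plugin is available (e.g. addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.7") in project/plugins.sbt, which the assembly settings in the posted build.sbt already imply) and that the cluster ships the Spark 2.3.1 runtime; versions simply mirror the question:

// build.sbt (sketch): Spark itself comes from the cluster, so mark it "provided"
// and keep it out of the uber JAR; spark-sql-kafka-0-10 is NOT on the executors,
// so leave it compile-scoped so it gets bundled and shipped with the job.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"           % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-sql"            % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming"      % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.1"   // bundled for the executors
)

Running sbt assembly would then produce a single JAR that contains the org.apache.spark.sql.kafka010 classes the executors failed to find above. Note that once the Spark modules are provided, plain sbt run will typically no longer see them either; the assembled JAR is meant to be launched with spark-submit instead.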
Why do you sbt run to execute the Spark app? Why not spark-submit --packages, which would take care of this issue?
– Jacek Laskowski, Nov 25 at 19:56
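For reference, such a spark-submit invocation could look roughly like the following; the main class, master URL, and JAR path are placeholders for whatever this project actually produces:

spark-submit \
  --class com.example.MyStreamingApp \
  --master <master-url> \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1 \
  target/scala-2.11/myapp_2.11-0.5.jar

--packages resolves the Kafka source (and its transitive dependencies) from Maven Central and distributes it to the driver and executors, which avoids the ClassNotFoundException without building an uber JAR.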
asked Nov 19 at 19:18 by Ramya_dj, edited Nov 25 at 19:56 by Jacek Laskowski