readStream on Apache Spark with a bad schema is retrying 1830 times
In Spark Structured Streaming, when an incoming record from S3 doesn't match the schema I enforced with .schema(..) and the record is large (mine is 397 KB), that record is retried exactly 1830 times. I have tested this multiple times. Has anyone noticed this weird behaviour?
apache-spark apache-spark-sql spark-structured-streaming
edited Nov 25 at 19:31 by Jacek Laskowski
asked Nov 19 at 18:01 by Naveen Cotha
1 Answer (accepted)
In my case the S3 object was a JSON array, and it turns out that Spark's JSON reader processes each element of the array as an individual record in the DataFrame. The S3 object had 1830 items, so it was not really being retried: the same S3 object was iterated once per item, producing 1830 records that each failed the schema. However, I could not find any official documentation for this behaviour.
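To make the arithmetic concrete, here is a minimal standard-library sketch of the behaviour described above. It is a simulation, not Spark's implementation: it assumes a top-level JSON array is expanded into one record per element, and it uses a hypothetical dict-based schema (`EXPECTED_SCHEMA`, field name to Python type) as a stand-in for the StructType passed to .schema(..). The helper names are likewise illustrative only.

```python
import json

# Hypothetical schema standing in for the StructType given to .schema(..).
# Maps field name -> expected Python type. This is NOT a Spark API.
EXPECTED_SCHEMA = {"id": int, "name": str}


def records_from_object(text):
    """Split one JSON document (e.g. one S3 object) into records, assuming a
    top-level array yields one record per element and anything else is one record."""
    doc = json.loads(text)
    return doc if isinstance(doc, list) else [doc]


def matches_schema(record, schema):
    """True if the record is a JSON object whose fields all exist with the expected type."""
    return isinstance(record, dict) and all(
        isinstance(record.get(field), expected) for field, expected in schema.items()
    )


def count_schema_errors(text, schema=EXPECTED_SCHEMA):
    """Return (number of records, number of records that fail the schema)."""
    records = records_from_object(text)
    errors = sum(1 for r in records if not matches_schema(r, schema))
    return len(records), errors


if __name__ == "__main__":
    # One object containing an 1830-element array where every element has the
    # wrong shape ("id" is a string, "name" is missing).
    bad_object = json.dumps([{"id": str(i), "label": "x"} for i in range(1830)])
    print(count_schema_errors(bad_object))  # (1830, 1830)
```

Under these assumptions a single malformed object surfaces as 1830 separate failures, one per array element, which matches the count in the question without any actual retrying.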
answered Nov 22 at 20:53 by Naveen Cotha