readStream on Apache Spark with a bad schema retries 1830 times


In Spark Structured Streaming, when an incoming record from S3 doesn't match the schema I enforced with .schema(..) and the record is large (mine is 397 KB), that record is retried exactly 1830 times. I have reproduced this several times. Has anyone else noticed this weird behaviour?

edited Nov 25 at 19:31

Jacek Laskowski

42.8k16126256

asked Nov 19 at 18:01

Naveen Cotha

1249


apache-spark apache-spark-sql spark-structured-streaming


1 Answer

Accepted answer

In my case the S3 object was a JSON array, and it turns out that Spark's JSON reader (reading from S3) treats each element of the array as an individual record (row) in the DataFrame. The S3 object had 1830 elements, which is why the same S3 object was processed 1830 times, once per element, each time with the schema error. However, I could not find any official documentation for this behaviour.
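To make the explanation concrete, here is a small plain-Python model of it. This is not Spark's implementation; the dict-of-types "schema" and the helper names (`records_from_json_text`, `matches_schema`, `count_schema_errors`) are hypothetical. It assumes, as the answer describes, that a top-level JSON array in one file is split into one record per element, and that each element is checked against the schema independently.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Hypothetical stand-in for a Spark schema: field name -> expected Python type.
Schema = Dict[str, type]


def records_from_json_text(text: str) -> Iterator[Any]:
    """Yield records as the answer describes: a top-level JSON array becomes
    one record per element; any other top-level value is a single record."""
    doc = json.loads(text)
    if isinstance(doc, list):
        yield from doc
    else:
        yield doc


def matches_schema(record: Any, schema: Schema) -> bool:
    """True if the record is an object whose declared fields have the expected types.
    A missing field yields None, which fails the isinstance check."""
    if not isinstance(record, dict):
        return False
    return all(isinstance(record.get(name), typ) for name, typ in schema.items())


def count_schema_errors(text: str, schema: Schema) -> Tuple[int, int]:
    """Return (records_seen, records_failing_schema) for one file's contents."""
    seen = failing = 0
    for record in records_from_json_text(text):
        seen += 1
        if not matches_schema(record, schema):
            failing += 1
    return seen, failing


if __name__ == "__main__":
    schema = {"id": int, "name": str}
    # One "file" holding a 1830-element array whose "id" is a string, not an int.
    bad_file = json.dumps([{"id": str(i), "name": "x"} for i in range(1830)])
    print(count_schema_errors(bad_file, schema))  # (1830, 1830)
```

In this model a single file produces one failure per array element, so the error count tracks the array length rather than the file size, which is consistent with the answer's explanation of the 1830 repetitions.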

answered Nov 22 at 20:53

Naveen Cotha

1249
