Exactly-Once Semantics in Kafka: Is the Claim Possible?
Regarding the Confluent blog post:
Exactly-once Semantics are Possible: Here’s How Kafka Does it
Exactly once semantics: even if a producer retries sending a message, it leads to the message being delivered exactly once to the end consumer. Exactly-once semantics is the most desirable guarantee, but also a poorly understood one. This is because it requires a cooperation between the messaging system itself and the application producing and consuming the messages. For instance, if after consuming a message successfully you rewind your Kafka consumer to a previous offset, you will receive all the messages from that offset to the latest one, all over again. This shows why the messaging system and the client application must cooperate to make exactly-once semantics happen.
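For illustration, the rewind scenario the quote describes could look roughly like this with the plain Java consumer (the topic name, partition and group id below are assumptions, not from the blog post):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RewindSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rewind-demo"); // assumed group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("payments", 0); // assumed topic
            consumer.assign(Collections.singletonList(tp));

            // First pass: consume whatever is currently in the partition.
            consumer.poll(Duration.ofSeconds(1));

            // Rewind to the beginning: every record is handed to the application
            // again, even though nothing was produced twice. The broker cannot know
            // whether the application already acted on them, hence the cooperation.
            consumer.seek(tp, 0);
            ConsumerRecords<String, String> again = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : again) {
                System.out.printf("re-delivered offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }
}
```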
My understanding is that the title and the quoted text above conflict with each other. Am I right or not?
On my last posting, the Kafka folks stated that Confluent takes care of all these things. So, am I to assume that using Kafka Connect with Confluent means I get exactly-once behaviour guaranteed, or not?
apache-kafka
Kafka Connect isn't specific to Confluent... It's up to the source/sink connector to implement offset storage for reading/storing data exactly once... The blog post was not calling out Connect, though, but rather the Producer/Consumer and, by extension, the Kafka Streams API.
– cricket_007
Nov 23 '18 at 1:47
I am aware of the Connect and Confluent point. It was simply a related, and valid, question. But more importantly, do I have a point on the question? I think so... @cricket_007
– thebluephantom
Nov 23 '18 at 5:51
It's not clear which connector you are referring to. The HDFS and S3 sinks claim to have exactly-once delivery, and sinks are easier to configure than sources because you can track the consumed offsets, as with any Kafka consumer client... For the JDBC source, for example, you only get primary-key scans or timestamp tracking; if you use bulk mode, then you are repeatedly scanning the database and you get duplicates. Also see issues.apache.org/jira/browse/KAFKA-6080
– cricket_007
Nov 23 '18 at 7:48
Also youtu.be/CeDivZQvdcs?t=301, and all the other KIPs and whitepapers about it...
– cricket_007
Nov 23 '18 at 7:57
"Here's how Kafka does it ... by adding some client properties which be understood by the messaging system and the client applications. i.e. Kafka isn't only the brokers. The clients must address the issue as well. Spark and Flink, for example, have external offset storage, and can de-dupe messages on their own using somedistinct
functions.
– cricket_007
Nov 23 '18 at 8:19
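To make the de-duplication point from the last comment concrete, here is a minimal Spark batch sketch (the topic name is an assumption, and it needs the spark-sql-kafka-0-10 package on the classpath):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DedupeSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dedupe-sketch")
                .master("local[*]")
                .getOrCreate();

        // Read a batch of records from Kafka; columns include key, value, offset, ...
        Dataset<Row> records = spark.read()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")
                .option("subscribe", "payments") // assumed topic
                .load();

        // If the producer may have written the same payload more than once,
        // the reading application can drop duplicates itself before processing.
        Dataset<Row> deduped = records.dropDuplicates("key", "value");
        deduped.show();

        spark.stop();
    }
}
```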
1 Answer
There is still work to do on the client side. By Confluent's own admission, the claim that Kafka alone does it is a little too optimistic.
cricket_007's comments allude to and confirm my point of view where exactly-once semantics are concerned.
Some Confluent connectors do have guarantees, as he points out, albeit that was already well understood.
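For illustration, a minimal sketch of the kind of client-side properties involved, using the plain Java clients (the topic name and transactional id are assumptions on my part):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EosClientSketch {
    public static void main(String[] args) {
        // Producer side: idempotence plus a transactional id enable exactly-once writes.
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-app-tx-1"); // assumed id
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("output-topic", "key", "value")); // assumed topic
            // In a real read-process-write loop the consumed offsets would be committed
            // in the same transaction via producer.sendOffsetsToTransaction(...).
            producer.commitTransaction();
        }

        // Consumer side: only hand committed transactional records to the application.
        Properties c = new Properties();
        c.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        // ...plus the usual bootstrap servers, group id and deserializers.
    }
}
```

None of this makes an end-to-end pipeline exactly-once by itself; the consuming application still has to keep its offset handling and its side effects consistent, which is the point made above.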