Kafka Multiple Consumers and Latency
I want to have multiple consumers subscribed to a topic with a number of partitions equal to the number of consumers.
Does the latency of reading messages from the partitions increase linearly with the number of consumers (and therefore partitions, since I have the same number of each), or non-linearly?
apache-kafka
asked Nov 25 '18 at 15:26
Novemberland
1 Answer
Since only one consumer thread within a consumer group can be assigned to a given partition at a time, the total lag summed over all partitions should go down faster if you have one dedicated consumer thread/application per partition.
The latency of processing the messages should also be lower, because each consumer isn't spreading its work over more than one partition.
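To make the one-consumer-per-partition point concrete, here is a minimal sketch of how partitions might be spread over the consumers of a single group. It is a simplified round-robin model, not Kafka's actual assignor implementation; `assign_partitions` and the consumer names are hypothetical and exist only for illustration.

```python
def assign_partitions(num_partitions, consumers):
    """Toy round-robin assignment for ONE consumer group (hypothetical helper).

    Each partition gets exactly one owner; a consumer may own zero, one,
    or several partitions depending on the counts.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        owner = consumers[p % len(consumers)]
        assignment[owner].append(p)
    return assignment


# Same number of consumers and partitions: one partition each.
equal = assign_partitions(4, ["c0", "c1", "c2", "c3"])
# Fewer consumers than partitions: each consumer serves several partitions.
fewer = assign_partitions(4, ["c0", "c1"])
# More consumers than partitions: the extra consumer owns nothing and sits idle.
extra = assign_partitions(2, ["c0", "c1", "c2"])

print(equal)
print(fewer)
print(extra)
```

In this model, the "equal" case is the setup from the question: no consumer has to divide its time among partitions, which is why lag should drain fastest there.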
The Kafka documentation, kafka.apache.org/documentation.html#theconsumer, states: "If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes." In that case, is the relationship between latency and the number of consumer groups linear or not? Thanks in advance.
– Novemberland
Nov 25 '18 at 17:53
If you have different consumer groups, then having the same number of groups as partitions doesn't really offer any benefit, and it might introduce some lag, though mostly at the network level rather than only from the Kafka API. The brokers will try to prefetch messages and load them into the page cache for fast access on the server side. I doubt it's linear, though, at least up to the point where you've saturated the network.
– cricket_007
Nov 25 '18 at 17:59
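As a rough illustration of the network-saturation point in the comment above: if every consumer group reads every record, the data leaving the brokers scales with the number of groups. The sketch below is a toy back-of-the-envelope model only. The function names and the example figures (50 MB/s produced, a 1250 MB/s link) are assumptions chosen for illustration, and the model ignores replication traffic, compression, and batching.

```python
def broker_egress_mb_s(produce_mb_s, num_groups):
    """Consumer-facing egress if each group reads the full stream (toy model)."""
    return produce_mb_s * num_groups


def max_groups_before_saturation(produce_mb_s, link_mb_s):
    """Largest group count whose combined egress still fits the link (toy model)."""
    return int(link_mb_s // produce_mb_s)


# Assumed figures, not measurements.
produce_rate = 50     # MB/s written to the topic
link_capacity = 1250  # MB/s of outbound network capacity

for groups in (1, 5, 25, 30):
    egress = broker_egress_mb_s(produce_rate, groups)
    status = "saturated" if egress > link_capacity else "ok"
    print(groups, egress, status)

print(max_groups_before_saturation(produce_rate, link_capacity))
```

The model says nothing about how latency behaves below the saturation point; it only shows where total egress would exceed the assumed link capacity.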
I have not understood how true broadcast is achieved. Is it achieved when all the consumer instances have different consumer groups and there is one partition? In that case, does increasing the number of groups cause a non-linear increase in latency? Or is it completely distributed?
– Novemberland
Nov 25 '18 at 18:26
In general, more consumers in different groups cause an increase in read times, but it's less than linear, mostly because the page cache holds the data that's being read. I have not personally run any performance tests to say how much it is affected... The part of the documentation you quoted is simply a fact of any pub/sub system: any subscriber to a channel/topic is going to receive each message it's interested in. I'm not sure I understand what you mean by "true broadcast", but each unique group coordinates itself and polls those records from the partition leaders.
– cricket_007
Nov 25 '18 at 19:39
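The "each unique group coordinates itself" behaviour described above can be sketched with a toy single-partition log in which every group keeps its own read offset. This is only a model of the semantics, not the Kafka client API; `ToyLog` and its methods are hypothetical names.

```python
class ToyLog:
    """A single partition modelled as an append-only list (illustration only)."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # group id -> next offset that group will read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        # Each group advances its own offset, so groups never steal
        # records from one another.
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch


log = ToyLog()
for record in ["a", "b", "c"]:
    log.append(record)

first_g1 = log.poll("g1")   # group g1 reads everything
first_g2 = log.poll("g2")   # group g2 independently reads everything too
second_g1 = log.poll("g1")  # g1 has nothing new until more records arrive

print(first_g1, first_g2, second_g1)
```

Because the records are stored once and only the per-group offsets differ, adding a group adds another reader of the same data rather than another copy of it, which is consistent with the page-cache argument in the comment.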
answered Nov 25 '18 at 17:45
cricket_007