Kafka Multiple Consumers and Latency












I want to have multiple consumers subscribed to a topic with a number of partitions equal to the number of consumers.



Does the latency of reading messages from the partitions increase linearly with the number of consumers (and therefore partitions, since the two counts are equal), or non-linearly?










      apache-kafka






      asked Nov 25 '18 at 15:26









      Novemberland
























          1 Answer














          Within a consumer group, each partition is assigned to at most one consumer thread at a time, so the total lag summed across all partitions should go down faster if you have one dedicated consumer thread or application per partition.

          The latency of processing the messages should also be lower, because each consumer only has to fetch from a single partition instead of dividing its time across several.
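
To make the one-consumer-per-partition point concrete, here is a minimal sketch of how partitions might be spread over the consumers of a single group. It is a simplified round-robin model written for illustration only, not Kafka's actual assignment strategy, and `assign_partitions` is a hypothetical helper name.

```python
def assign_partitions(num_partitions, num_consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Simplified model for illustration; this is an assumption, not the
    assignor that a real Kafka client uses.
    """
    assignment = {consumer: [] for consumer in range(num_consumers)}
    for partition in range(num_partitions):
        # Each partition goes to exactly one consumer in the group.
        assignment[partition % num_consumers].append(partition)
    return assignment


if __name__ == "__main__":
    print(assign_partitions(4, 4))  # one partition per consumer
    print(assign_partitions(4, 2))  # each consumer handles two partitions
    print(assign_partitions(4, 6))  # two consumers are left idle
```

With equal counts every consumer owns exactly one partition. With fewer consumers each one has to work through several partitions, and with more consumers than partitions the extra ones receive nothing.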






          answered Nov 25 '18 at 17:45









          cricket_007













          • The Kafka documentation (kafka.apache.org/documentation.html#theconsumer) states: "If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes." In this case, is the relationship between latency and the number of consumer groups linear or not? Thanks in advance.

            – Novemberland
            Nov 25 '18 at 17:53
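
The quoted sentence can be illustrated with a toy delivery model: every group sees every record, while inside a group each record reaches only one consumer. The partitioner and the in-group assignment below are deliberately simplistic assumptions, and `deliver` and the group names are hypothetical.

```python
from collections import defaultdict


def deliver(num_records, num_partitions, group_sizes):
    """Return {(group, consumer): [record ids]} for a toy topic.

    group_sizes maps a (hypothetical) group name to its consumer count.
    """
    received = defaultdict(list)
    for record in range(num_records):
        partition = record % num_partitions  # toy partitioner (assumption)
        for group, size in group_sizes.items():
            # Within one group, a partition belongs to a single consumer.
            consumer = partition % size
            received[(group, consumer)].append(record)
    return dict(received)


if __name__ == "__main__":
    for key, records in sorted(deliver(4, 2, {"a": 1, "b": 2}).items()):
        print(key, records)
```

Group "a" has a single consumer that receives all four records, while the two consumers of group "b" split the same four records between them. If every consumer is given its own group, each of them receives everything, which is the broadcast case from the documentation.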













          • If you have different consumer groups, then having the same number of groups as partitions doesn't really offer any benefit, and it might introduce some lag, though mostly at the network level rather than from the Kafka API itself. The brokers will try to keep messages in the page cache for fast access on the server side. I doubt it's linear, though, at least up to the point where you've saturated the network.

            – cricket_007
            Nov 25 '18 at 17:59













          • I have not understood how true broadcast is achieved. Is it achieved when all the consumer instances have different consumer groups and there is one partition? In this case, does increasing the number of different groups trigger a non-linear increase in latency? Or is it completely distributed?

            – Novemberland
            Nov 25 '18 at 18:26













          • In general, more consumers in different groups cause an increase in read times, but the increase is less than linear, mostly because the page cache holds the data that's being read. I have not personally run any performance tests to say how much it is affected... The part of the documentation you quoted is simply a fact of any pub/sub system: any subscriber to a channel/topic is going to receive each message it's interested in. I'm not sure I understand what you mean by "true broadcast", but each unique group coordinates itself and polls those records from the partition leaders.

            – cricket_007
            Nov 25 '18 at 19:39
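
The "less than linear" intuition can be sketched with a toy cost model. It assumes an unbounded page cache and groups that all read the same records, and it only counts operations. It is not a benchmark, and `simulate_reads` is a hypothetical helper, not part of any Kafka API.

```python
def simulate_reads(num_records, num_groups):
    """Count simulated disk reads and network sends on a broker.

    Assumptions: unbounded page cache, every group reads every record.
    """
    page_cache = set()
    disk_reads = 0
    network_sends = 0
    for _group in range(num_groups):
        for record in range(num_records):
            if record not in page_cache:
                # Only the first reader of a record pays for the disk read.
                disk_reads += 1
                page_cache.add(record)
            # Every group still needs its own copy sent over the network.
            network_sends += 1
    return disk_reads, network_sends


if __name__ == "__main__":
    for groups in (1, 2, 5):
        print(groups, simulate_reads(1000, groups))
```

In this model the disk work stays flat as groups are added, while the network traffic grows in proportion to the number of groups, which is in line with the comment that the network is the part that eventually saturates.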




















