How to avoid duplicates in clickhouse table?












0















I have created table and trying to insert the values multiple time to check the duplicates. I can see duplicates are inserting. Is there a way to avoid duplicates in clickhouse table?



CREATE TABLE sample.tmp_api_logs ( id UInt32, EventDate Date) ENGINE = MergeTree(EventDate, id, (EventDate,id), 8192);



insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');



select * from sample.tmp_api_logs;
┌─id─┬──EventDate─┐
│ 1 │ 2018-11-23 │
│ 2 │ 2018-11-23 │
└────┴────────────┘
┌─id─┬──EventDate─┐
│ 1 │ 2018-11-23 │
│ 2 │ 2018-11-23 │
└────┴────────────┘










share|improve this question



























    0















    I have created table and trying to insert the values multiple time to check the duplicates. I can see duplicates are inserting. Is there a way to avoid duplicates in clickhouse table?



    CREATE TABLE sample.tmp_api_logs ( id UInt32, EventDate Date) ENGINE = MergeTree(EventDate, id, (EventDate,id), 8192);



    insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
    insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');



    select * from sample.tmp_api_logs;
    ┌─id─┬──EventDate─┐
    │ 1 │ 2018-11-23 │
    │ 2 │ 2018-11-23 │
    └────┴────────────┘
    ┌─id─┬──EventDate─┐
    │ 1 │ 2018-11-23 │
    │ 2 │ 2018-11-23 │
    └────┴────────────┘










    share|improve this question

























      0












      0








      0








      I have created table and trying to insert the values multiple time to check the duplicates. I can see duplicates are inserting. Is there a way to avoid duplicates in clickhouse table?



      CREATE TABLE sample.tmp_api_logs ( id UInt32, EventDate Date) ENGINE = MergeTree(EventDate, id, (EventDate,id), 8192);



      insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
      insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');



      select * from sample.tmp_api_logs;
      ┌─id─┬──EventDate─┐
      │ 1 │ 2018-11-23 │
      │ 2 │ 2018-11-23 │
      └────┴────────────┘
      ┌─id─┬──EventDate─┐
      │ 1 │ 2018-11-23 │
      │ 2 │ 2018-11-23 │
      └────┴────────────┘










      share|improve this question














      I have created table and trying to insert the values multiple time to check the duplicates. I can see duplicates are inserting. Is there a way to avoid duplicates in clickhouse table?



      CREATE TABLE sample.tmp_api_logs ( id UInt32, EventDate Date) ENGINE = MergeTree(EventDate, id, (EventDate,id), 8192);



      insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
      insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');



      select * from sample.tmp_api_logs;
      ┌─id─┬──EventDate─┐
      │ 1 │ 2018-11-23 │
      │ 2 │ 2018-11-23 │
      └────┴────────────┘
      ┌─id─┬──EventDate─┐
      │ 1 │ 2018-11-23 │
      │ 2 │ 2018-11-23 │
      └────┴────────────┘







      clickhouse






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 23 '18 at 7:47









      user3383468user3383468

      111




      111
























          2 Answers
          2






          active

          oldest

          votes


















          0














          Most likely ReplacingMergeTree is what you need as long as duplicate records duplicate primary keys. You can also try out other MergeTree engines for more actions when replicate record is encountered. FINAL keyword can be used when doing queries to ensure uniquity.






          share|improve this answer































            0














            If raw data does not contain duplicates and they might appear only during retries of INSERT INTO, there's a deduplication feature in ReplicatedMergeTree. To make it work you should retry inserts of exactly the same batches of data (same set of rows in same order). You can use different replica for these retries and data block will still be inserted only once as block hashes are shared between replicas via ZooKeeper.



            Otherwise, you should deduplicate data externally before inserts to ClickHouse or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.






            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53442559%2fhow-to-avoid-duplicates-in-clickhouse-table%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              0














              Most likely ReplacingMergeTree is what you need as long as duplicate records duplicate primary keys. You can also try out other MergeTree engines for more actions when replicate record is encountered. FINAL keyword can be used when doing queries to ensure uniquity.






              share|improve this answer




























                0














                Most likely ReplacingMergeTree is what you need as long as duplicate records duplicate primary keys. You can also try out other MergeTree engines for more actions when replicate record is encountered. FINAL keyword can be used when doing queries to ensure uniquity.






                share|improve this answer


























                  0












                  0








                  0







                  Most likely ReplacingMergeTree is what you need as long as duplicate records duplicate primary keys. You can also try out other MergeTree engines for more actions when replicate record is encountered. FINAL keyword can be used when doing queries to ensure uniquity.






                  share|improve this answer













                  Most likely ReplacingMergeTree is what you need as long as duplicate records duplicate primary keys. You can also try out other MergeTree engines for more actions when replicate record is encountered. FINAL keyword can be used when doing queries to ensure uniquity.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 23 '18 at 13:37









                  AmosAmos

                  1,58531029




                  1,58531029

























                      0














                      If raw data does not contain duplicates and they might appear only during retries of INSERT INTO, there's a deduplication feature in ReplicatedMergeTree. To make it work you should retry inserts of exactly the same batches of data (same set of rows in same order). You can use different replica for these retries and data block will still be inserted only once as block hashes are shared between replicas via ZooKeeper.



                      Otherwise, you should deduplicate data externally before inserts to ClickHouse or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.






                      share|improve this answer




























                        0














                        If raw data does not contain duplicates and they might appear only during retries of INSERT INTO, there's a deduplication feature in ReplicatedMergeTree. To make it work you should retry inserts of exactly the same batches of data (same set of rows in same order). You can use different replica for these retries and data block will still be inserted only once as block hashes are shared between replicas via ZooKeeper.



                        Otherwise, you should deduplicate data externally before inserts to ClickHouse or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.






                        share|improve this answer


























                          0












                          0








                          0







                          If raw data does not contain duplicates and they might appear only during retries of INSERT INTO, there's a deduplication feature in ReplicatedMergeTree. To make it work you should retry inserts of exactly the same batches of data (same set of rows in same order). You can use different replica for these retries and data block will still be inserted only once as block hashes are shared between replicas via ZooKeeper.



                          Otherwise, you should deduplicate data externally before inserts to ClickHouse or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.






                          share|improve this answer













                          If raw data does not contain duplicates and they might appear only during retries of INSERT INTO, there's a deduplication feature in ReplicatedMergeTree. To make it work you should retry inserts of exactly the same batches of data (same set of rows in same order). You can use different replica for these retries and data block will still be inserted only once as block hashes are shared between replicas via ZooKeeper.



                          Otherwise, you should deduplicate data externally before inserts to ClickHouse or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Dec 10 '18 at 8:48









                          Ivan BlinkovIvan Blinkov

                          1,6081016




                          1,6081016






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53442559%2fhow-to-avoid-duplicates-in-clickhouse-table%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Costa Masnaga

                              Fotorealismo

                              Sidney Franklin