Detect changes during bulk indexing












2














We are using Elasticsearch v5.6.12 for our database. We update it frequently using the bulk REST API. Some of the time, individual requests won't change anything (i.e. the document in Elasticsearch is already up to date). How can I detect these instances?



I saw this (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html) but I'm not sure it's applicable in our situation.










      elasticsearch
















      asked Nov 20 at 15:28









      bm1729

























          1 Answer


















          3














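          You can use noop detection when checking the results of your bulk requests.



          When the bulk request returns, iterate over each update result and check whether the result field has the value noop (as opposed to updated). Noop detection applies to update actions only; an index action always rewrites the document, so it reports created or updated rather than noop.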



          # Say the document is indexed
          PUT test/doc/1
          {
            "test": "123"
          }

          # Now bulk update it. Expected result per action, in order:
          #   1. same value as stored      -> result: noop
          #   2. different value           -> result: updated
          #   3. new id with doc_as_upsert -> result: created
          POST test/doc/_bulk
          {"update":{"_id": "1"}}
          {"doc":{"test":"123"}}
          {"update":{"_id": "1"}}
          {"doc":{"test":"1234"}}
          {"update":{"_id": "2"}}
          {"doc":{"test":"3456"}, "doc_as_upsert": true}


          Result (note the result field of each item):



          {
            "took" : 6,
            "errors" : false,
            "items" : [
              {
                "update" : {
                  "_index" : "test",
                  "_type" : "doc",
                  "_id" : "1",
                  "_version" : 2,
                  "result" : "noop",
                  "_shards" : {
                    "total" : 2,
                    "successful" : 1,
                    "failed" : 0
                  },
                  "status" : 200
                }
              },
              {
                "update" : {
                  "_index" : "test",
                  "_type" : "doc",
                  "_id" : "1",
                  "_version" : 3,
                  "result" : "updated",
                  "_shards" : {
                    "total" : 2,
                    "successful" : 1,
                    "failed" : 0
                  },
                  "_seq_no" : 2,
                  "_primary_term" : 1,
                  "status" : 200
                }
              },
              {
                "update" : {
                  "_index" : "test",
                  "_type" : "doc",
                  "_id" : "2",
                  "_version" : 1,
                  "result" : "created",
                  "_shards" : {
                    "total" : 2,
                    "successful" : 1,
                    "failed" : 0
                  },
                  "_seq_no" : 0,
                  "_primary_term" : 1,
                  "status" : 201
                }
              }
            ]
          }


          As you can see, when doc_as_upsert: true is specified for the document with id 2, the document is created and the value of its result field is created.
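
To illustrate the "iterate over each update result" step on the client side, here is a minimal Python sketch. It assumes the bulk response has already been parsed into a dict; `classify_bulk_items` and the trimmed `sample` response are hypothetical names and data made up for this example, not part of any Elasticsearch client.

```python
from collections import defaultdict


def classify_bulk_items(response):
    """Group document ids from a parsed bulk response by their `result` value.

    Hypothetical helper: `response` is the bulk response body as a dict.
    Items that carry an `error` object are grouped under the key "error".
    """
    by_result = defaultdict(list)
    for item in response.get("items", []):
        # Each item is a single-key object such as {"update": {...}}.
        for _action, details in item.items():
            if "error" in details:
                by_result["error"].append(details.get("_id"))
            else:
                by_result[details.get("result")].append(details.get("_id"))
    return dict(by_result)


# Trimmed-down version of the response shown above (illustrative only).
sample = {
    "took": 6,
    "errors": False,
    "items": [
        {"update": {"_id": "1", "result": "noop", "status": 200}},
        {"update": {"_id": "1", "result": "updated", "status": 200}},
        {"update": {"_id": "2", "result": "created", "status": 201}},
    ],
}

print(classify_bulk_items(sample))
# {'noop': ['1'], 'updated': ['1'], 'created': ['2']}
```

The ids listed under noop are the requests that changed nothing. Items in a bulk response come back in the same order as the actions in the request, so if you need to know which request each result belongs to, match them by position instead of grouping by id.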





























          • What will the result field be in the case that the document doesn't already exist?
            – bm1729
            Nov 20 at 15:52










          • It depends on what you're sending in your bulk query (index, update, etc.). Feel free to show your bulk query.
            – Val
            Nov 20 at 15:53










          • Also, is it able to give you the diff (what was there and what you have changed it to)?
            – bm1729
            Nov 20 at 15:53










          • So far we've been using index.
            – bm1729
            Nov 20 at 15:53










          • No, ES will not give you the diff. See my updated answer to see if it fits your use case.
            – Val
            Nov 20 at 15:56
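
For a pipeline that currently sends index actions, as mentioned in the comments, one way to get noop reporting might be to emit update actions with doc_as_upsert instead. The Python sketch below only builds the newline-delimited request body; `build_upsert_bulk_body` and its input shape (a mapping of document id to document) are assumptions made for this example.

```python
import json


def build_upsert_bulk_body(docs):
    """Build a bulk request body of `update` actions with `doc_as_upsert`.

    Hypothetical helper: `docs` maps a document id to the document to send.
    """
    lines = []
    for doc_id, doc in docs.items():
        # Action line, then the payload line for that action.
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


body = build_upsert_bulk_body({"1": {"test": "123"}, "2": {"test": "3456"}})
print(body, end="")
```

The resulting string can be sent as the body of a request such as POST test/doc/_bulk. One design point to keep in mind: an update with doc merges the supplied fields into the existing document, whereas index replaces the whole document, so fields missing from the new payload are not removed.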











          edited Nov 20 at 15:55

























          answered Nov 20 at 15:36









          Val












