Apache Solr Field data type change from Strings












One of the fields in my Solr core uses the strings field type, but it cannot accommodate the length of the values I need to index, so I want to change it to another field type that can hold these strings.

Unfortunately, text_general does not seem to help, as it is similar to string rather than strings.
Is there another field type that could help?







apache solr lucene
















asked Nov 20 at 18:13









BARATH













  • The default text_general is a TextField, which is quite different from StrField. TextField has no size limit as such, while StrField is limited to about 32K characters. Another difference is that text_general is tokenized (split into words), which is probably what you want for search.
    – Toke Eskildsen
    Nov 20 at 19:18










  • I am getting a length issue with the strings field, so which data type could I change it to in order to handle these values?
    – BARATH
    Nov 20 at 19:47










  • The text_general field is by default a good match. What's the issue about using it?
    – MatsLindh
    Nov 20 at 21:29










  • text_general works well as a replacement for string, but we are using the strings data type, which is like an array and holds multiple strings. Is there anything similar to text_general that can hold a list of items?
    – BARATH
    Nov 20 at 21:31


















2 Answers
Whether a field is multiValued or not (which is what you're describing) is configured with a default value on the field type, but that value can be overridden for each field you define. So the difference between string and strings is just that the latter has multiValued="true" as the default, while string has multiValued="false" as the default.

When actually defining the field, you can override this to say whether your document allows a specific field to be multiValued, regardless of what the field type definition says.

<field name="string_field" type="string" multiValued="true"/>

would behave the same as the strings field type, since it explicitly allows the field to have multiple values.
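If the schema is managed through Solr's Schema API rather than edited by hand, the same override can be sent as a `replace-field` command, POSTed to the collection's `/schema` endpoint. The following is only a minimal sketch in Python that builds the JSON payload without contacting a server. The helper name `replace_field_command` is hypothetical, and changing a field definition generally means the affected documents have to be reindexed.

```python
import json


def replace_field_command(name, field_type, multi_valued):
    """Build a Schema API 'replace-field' command for one field.

    replace-field expects the full field definition, so any other
    properties the field needs (stored, indexed, ...) must be added too.
    """
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "multiValued": multi_valued,
        }
    }


# Same definition as the XML field line above, expressed as a JSON payload.
payload = replace_field_command("string_field", "string", True)
print(json.dumps(payload, indent=2))
```

The same helper could be called with `"text_general"` as the type to get a tokenized, multi-valued field.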



So in your case, you can use text_general. It might not be set to multiValued by default, but you can configure that when you define the field.

<field name="your_field_name" type="text_general" multiValued="true" />


The difference between text_general and string is that text_general has an analysis chain and tokenizer applied to it, so that the text is split internally into smaller tokens.

Lucene has a hard limit of 32,766 bytes (UTF-8 encoded) per term, and this is the limit you're hitting when indexing a larger value into a string field, where the whole value is indexed as a single term.
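To make the effect of tokenization on this limit concrete, here is a rough Python sketch. It assumes the limit is Lucene's `IndexWriter.MAX_TERM_LENGTH` (32,766 bytes of UTF-8), and it uses a crude regular-expression word split as a stand-in for a real analysis chain, so it only approximates what text_general would do. The function names are made up for the illustration.

```python
import re

MAX_TERM_BYTES = 32766  # Lucene's IndexWriter.MAX_TERM_LENGTH


def term_byte_lengths(value, tokenized):
    """Return the UTF-8 byte length of each term the value would produce.

    tokenized=False mimics a string field: the whole value is one term.
    tokenized=True uses a crude word split as a stand-in for a real analyzer.
    """
    terms = re.findall(r"\w+", value) if tokenized else [value]
    return [len(term.encode("utf-8")) for term in terms]


def fits(value, tokenized):
    """True if no single term exceeds the per-term byte limit."""
    return all(n <= MAX_TERM_BYTES for n in term_byte_lengths(value, tokenized))


long_value = "lorem ipsum " * 4000  # 48,000 bytes in total
print(fits(long_value, tokenized=False))  # False: one 48,000-byte term
print(fits(long_value, tokenized=True))   # True: many 5-byte terms
```

A check like this could be run on values before indexing to find the ones that would be rejected by a string field.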



If you're going to store large blobs in Solr, I'd probably recommend putting them in Amazon S3 or another data store instead, and storing only the generated id in Solr. That way the index size is kept lower and you reduce overhead when merging segments.
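The split between an external store and the index might look roughly like the Python sketch below. Everything in it is an assumption for illustration: a plain dictionary stands in for S3 or another data store, a list stands in for the documents sent to Solr, and the field name `blob_id` is invented.

```python
import uuid

blob_store = {}  # stand-in for S3 or another external store
solr_docs = []   # stand-in for the documents sent to Solr


def index_with_external_blob(doc_id, title, blob):
    """Store the large value externally and keep only its id in the Solr doc."""
    blob_id = str(uuid.uuid4())
    blob_store[blob_id] = blob
    solr_doc = {"id": doc_id, "title": title, "blob_id": blob_id}
    solr_docs.append(solr_doc)
    return solr_doc


doc = index_with_external_blob("doc-1", "Example", "x" * 100_000)
print(doc["blob_id"] in blob_store)     # the blob is retrievable by its id
print(len(blob_store[doc["blob_id"]]))  # the full payload lives outside the index
```

The trade-off is that the blob content itself is no longer searchable in Solr; only the fields kept in the indexed document are.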

















          answered Nov 21 at 9:19









          MatsLindh


























text_general is an array of strings, so if you are looking for a type similar to the strings data type, which is like an array, text_general should do.

Another advantage of text_general is that it allows tokenization; strings does not.

















                  answered Nov 21 at 1:32









                  user2110254






























