PostgreSQL Full Text Search and reserved words, preserving some words























I am using PostgreSQL full text search with the english dictionary. When I search for records containing certain English words, I get weird results.

For example:



SELECT id FROM table1 WHERE ts_vector1 @@ to_tsquery('it')


returns 0 results.



SELECT id FROM table1 WHERE ts_vector1 @@ to_tsquery('specialist & it')


returns more than 0 results (the word 'it' exists in the table and in the index). ts_vector1 is created as follows:



ts_vector1 = to_tsvector('english', some_text_column)


Is 'it' a reserved word? If so, what is the best way to 'escape' reserved words?










      postgresql full-text-search tsvector






      edited Nov 19 at 11:49









      Christophe Roussy











      asked Oct 2 '13 at 10:35









      Adam Lesiak

























          2 Answers




















          accepted










          'It' is ignored as a stop word, per the relevant docs:



          http://www.postgresql.org/docs/current/static/textsearch-controls.html




          From that page: "In the example above we see that the resulting tsvector does not contain the words a, on, or it, the word rats became rat, and the punctuation sign - was ignored."
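
To see the stop-word handling for yourself, a sketch like the following can be run in psql. It assumes the built-in english configuration, and the sample string 'it specialist' is only an illustration. It also suggests why the second query in the question returned rows: 'it' is dropped from the query as well, so only 'specialist' is actually searched for.

```sql
-- 'it' is discarded when the text is parsed with the english configuration.
SELECT to_tsvector('english', 'it specialist');

-- Token-by-token view: which dictionary handled each word and which lexemes
-- it produced. A stop word shows up with an empty lexemes array.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'it specialist');

-- A query made only of stop words ends up empty (PostgreSQL emits a NOTICE),
-- so it cannot match any row.
SELECT to_tsquery('english', 'it');

-- Here 'it' is dropped and only 'specialist' remains in the query.
SELECT to_tsquery('english', 'specialist & it');
```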




          You can change the list of stop words by configuring the needed dictionaries:



          http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html
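
One possible way to do that, sketched under assumptions: define a Snowball dictionary without a StopWords parameter and use it in a copy of the english configuration. The names english_stem_nostop and english_nostop are made up for this example; table1, ts_vector1 and some_text_column come from the question.

```sql
-- Snowball stemmer for English with no StopWords parameter,
-- so no word is discarded as a stop word.
CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    TEMPLATE = snowball,
    Language = english
);

-- Start from the built-in english configuration ...
CREATE TEXT SEARCH CONFIGURATION english_nostop (COPY = pg_catalog.english);

-- ... and swap the stock stemmer for the one without stop words.
ALTER TEXT SEARCH CONFIGURATION english_nostop
    ALTER MAPPING REPLACE english_stem WITH english_stem_nostop;

-- The stored vectors must be rebuilt with the new configuration,
-- and queries must use the same configuration.
UPDATE table1
SET ts_vector1 = to_tsvector('english_nostop', some_text_column);

SELECT id
FROM table1
WHERE ts_vector1 @@ to_tsquery('english_nostop', 'it');
```

Note that this keeps every stop word, including each pronoun "it", so the index grows and a search for 'it' matches far more than the acronym. A narrower alternative is a custom stop-word file that merely omits "it", which requires access to the server's tsearch_data directory.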






                  answered Oct 2 '13 at 12:13









                  Denis de Bernardy

                          This question is from 2013, but the problem is still relevant.
                          You want 'it' the pronoun removed because it is noise, but you want to keep 'IT' the acronym.
                          When it stands for information technology, the word is usually written in upper case as 'IT'.

                          Before building the tsvector with to_tsvector:

                          1. Tokenize your text
                          2. Replace the word "IT" with "information technology"

                          Before searching with to_tsquery:

                          1. Tokenize the search query text
                          2. Replace the word "IT" with "information technology"

                          The English "it" and the acronym "IT" then no longer conflict, which should work in most cases. You could also try to detect the context from other keywords before replacing.

                          Doing this entirely in the database is probably possible, but in most applications it can be done in the general-purpose language of your main server or program.
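
A minimal sketch of the in-database variant, under assumptions: it uses regexp_replace with PostgreSQL's word-boundary escapes instead of a separate tokenizer, and it switches the query side to plainto_tsquery, because the expanded text "information technology" contains a space and would not be valid to_tsquery input without an operator. The sample search string 'IT specialist' is illustrative; table1, ts_vector1 and some_text_column come from the question.

```sql
-- Indexing side: expand the upper-case acronym before to_tsvector
-- lowercases the text. \m and \M match the start and end of a word,
-- and the match is case-sensitive, so a lower-case "it" is left alone.
UPDATE table1
SET ts_vector1 = to_tsvector(
        'english',
        regexp_replace(some_text_column, '\mIT\M', 'information technology', 'g'));

-- Query side: apply the same expansion to the user's input.
SELECT id
FROM table1
WHERE ts_vector1 @@ plainto_tsquery(
        'english',
        regexp_replace('IT specialist', '\mIT\M', 'information technology', 'g'));
```

Text written entirely in capitals (for example "IT IS DONE") would also be expanded, which is where the context detection mentioned above could help.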







                          answered Nov 19 at 11:53









                          Christophe Roussy
