PostgreSQL Full Text Search and reserved words, preserving some words
I am using PostgreSQL full text search with the english dictionary. When I search for records containing certain English words, I get weird results.
For example:
SELECT id FROM table1 WHERE ts_vector1 @@ to_tsquery('it')
returns 0 results.
SELECT id FROM table1 WHERE ts_vector1 @@ to_tsquery('specialist & it')
returns more than 0 results (the word 'it' exists in the table and the index). ts_vector1 is created as follows:
ts_vector1 = to_tsvector('english', some_text_column)
Is 'it' a reserved word? If so, what is the best way to 'escape' reserved words?
postgresql full-text-search tsvector
edited Nov 19 at 11:49
Christophe Roussy
asked Oct 2 '13 at 10:35
Adam Lesiak
2 Answers
Accepted answer
'it' is not a reserved word. It is ignored as a stop word, per the relevant docs:
http://www.postgresql.org/docs/current/static/textsearch-controls.html
"In the example above we see that the resulting tsvector does not contain the words a, on, or it, the word rats became rat, and the punctuation sign - was ignored."
You can change the list of stop words by configuring the needed dictionaries:
http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html
answered Oct 2 '13 at 12:13
Denis de Bernardy
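As a rough illustration of the stop-word behaviour and of one way to reconfigure the dictionaries, the sketch below first shows the english configuration dropping 'it', then builds a configuration whose Snowball stemmer has no stop-word list. The names english_stem_nostop and english_nostop are made up for this example, and it assumes the stock english configuration maps its word tokens to english_stem, as in a default install.

```sql
-- 'it' is a stop word in the built-in english configuration, so the
-- resulting tsvector keeps only the lexeme for 'specialist'.
SELECT to_tsvector('english', 'it specialist');

-- The query side drops it as well, which is why 'specialist & it' behaves
-- like 'specialist'. A query consisting only of 'it' ends up empty, and
-- PostgreSQL raises a NOTICE saying the query contains only stop words.
SELECT to_tsquery('english', 'specialist & it');

-- Hypothetical dictionary: the Snowball english stemmer without a
-- StopWords parameter, so no word is discarded.
CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    TEMPLATE = snowball,
    Language = english
);

-- Hypothetical configuration: a copy of english that uses the new
-- dictionary wherever english_stem was mapped.
CREATE TEXT SEARCH CONFIGURATION english_nostop (COPY = pg_catalog.english);

ALTER TEXT SEARCH CONFIGURATION english_nostop
    ALTER MAPPING REPLACE english_stem WITH english_stem_nostop;

-- With this configuration 'it' is kept as a lexeme.
SELECT to_tsvector('english_nostop', 'it specialist');
```

With such a configuration every former stop word ('the', 'a', and so on) is indexed too, so the index grows, and both the tsvector column and the query have to be built with the same configuration. If only 'it' matters, a custom stop-word file that omits it may be a lighter option.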
2013 was a while ago, but the problem is still valid.
You want to drop 'it' because it is noise, but keep the word 'IT'.
'it' meaning information technology is usually written in upper case, as 'IT'.
Before feeding text to full-text search via to_tsvector:
- Tokenize your text
- Replace the word "IT" with "information technology"
Before running a search with to_tsquery:
- Tokenize the search query text
- Replace the word "IT" with "information technology"
There is then no longer a conflict between the English word "it" and "IT"; this should work in most cases. You could also try to detect the context from other keywords before doing the replacement.
Doing this entirely in the database is probably possible, but in most applications it can be done in your main server-side, general-purpose language.
answered Nov 19 at 11:53
Christophe Roussy
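If the replacement is done inside the database after all, one possible shape is a small SQL function applied on both the indexing and the query side. This is only a sketch: expand_it is a hypothetical helper, the regular expression is one assumption about what counts as the word "IT", and plainto_tsquery is used instead of to_tsquery because the replacement text contains a space, which to_tsquery would reject without an operator. table1, ts_vector1 and some_text_column are the names from the question.

```sql
-- Hypothetical helper: expands the upper-case word "IT" before text search.
-- Assumes standard_conforming_strings = on (the default), so the
-- backslashes in the pattern are taken literally.
CREATE FUNCTION expand_it(txt text) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
    -- \m and \M match at the start and end of a word. The match is
    -- case-sensitive, so the lower-case pronoun "it" is left untouched.
    SELECT regexp_replace(txt, '\mIT\M', 'information technology', 'g');
$$;

-- Indexing side: build the tsvector from the expanded text.
UPDATE table1
SET ts_vector1 = to_tsvector('english', expand_it(some_text_column));

-- Query side: apply the same expansion to the user's search text.
SELECT id
FROM table1
WHERE ts_vector1 @@ plainto_tsquery('english', expand_it('IT specialist'));
```

A purely case-based rule like this misfires on text written entirely in capitals ("IT IS ..."), which is where the context detection mentioned above would come in.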