Faster JavaScript fuzzy string matching function?
up vote
15
down vote
favorite
I'm using the following function to fuzzy match strings:
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
Example:
fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //false
It's very slow, though. How can I optimize that function?
javascript strings regex
add a comment |
up vote
15
down vote
favorite
I'm using the following function to fuzzy match strings:
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
Example:
fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //false
It's very slow, though. How can I optimize that function?
javascript strings regex
2
Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02
1
The third example returnstrue
as well.jquery.js
does matchj.*r
.
– John Dvorak
Mar 14 '13 at 9:07
add a comment |
up vote
15
down vote
favorite
up vote
15
down vote
favorite
I'm using the following function to fuzzy match strings:
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
Example:
fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //false
It's very slow, though. How can I optimize that function?
javascript strings regex
I'm using the following function to fuzzy match strings:
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
Example:
fuzzy_match("fogo","foo") //true
fuzzy_match("jquery.js","jqjs") //true
fuzzy_match("jquery.js","jr") //false
It's very slow, though. How can I optimize that function?
javascript strings regex
javascript strings regex
asked Mar 14 '13 at 6:38
MaiaVictor
6061411
6061411
2
Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02
1
The third example returnstrue
as well.jquery.js
does matchj.*r
.
– John Dvorak
Mar 14 '13 at 9:07
add a comment |
2
Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02
1
The third example returnstrue
as well.jquery.js
does matchj.*r
.
– John Dvorak
Mar 14 '13 at 9:07
2
2
Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02
Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02
1
1
The third example returns
true
as well. jquery.js
does match j.*r
.– John Dvorak
Mar 14 '13 at 9:07
The third example returns
true
as well. jquery.js
does match j.*r
.– John Dvorak
Mar 14 '13 at 9:07
add a comment |
2 Answers
2
active
oldest
votes
up vote
35
down vote
accepted
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that
reduce
already exists, and it's calledjoin
:
pattern = pattern.split("").join(".*")
The regex itself can be optimised:
.*
initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier:.*?
. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better:a[^b]*b
(at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b
) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RexEgp("^"+str.replace(/./g, function(x){
return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";
})+"$");
})
return function(str, pattern){
return cache(str).test(pattern)
})
})()
Concerning the last regex:
Given some pattern "ace"
, the regex you build (/a.*c.*e/
) tests if the string contains the characters of the pattern in the string, in the correct order.
If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/
. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?
), and flank the regex with string boundary anchors.
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27
1
@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50
add a comment |
up vote
0
down vote
My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:
In some cases, you might need to know it eg. to make parts of your input bold in the search results
It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/
It works like:
fuzzyString('liolor', 'lorem ipsum dolor sit');
// returns
{
parts: [
{ content: 'l', type: 'input' },
{ content: 'orem ', type: 'fuzzy' },
{ content: 'i', type: 'input' },
{ content: 'psum d', type: 'fuzzy' },
{ content: 'olor', type: 'input' },
{ content: ' sit', type: 'suggestion' },
],
score: 0.87,
}
Here is full implementation (Typescript)
type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';
interface FuzzyMatchPart {
content: string;
type: MatchRoleType;
}
interface FuzzyMatchData {
parts: FuzzyMatchPart;
score: number;
}
interface FuzzyMatchOptions {
truncateTooLongInput?: boolean;
isCaseSesitive?: boolean;
}
function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {
const getRoleLength = (role: MatchRoleType) =>
fuzzyMatchParts
.filter((part) => part.type === role)
.map((part) => part.content)
.join('').length;
const fullLength = fuzzyMatchParts.map((part) => part.content).join('')
.length;
const fuzzyLength = getRoleLength('fuzzy');
const inputLength = getRoleLength('input');
const suggestionLength = getRoleLength('suggestion');
return (
(inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength
);
}
function compareLetters(a: string, b: string, isCaseSensitive = false) {
if (isCaseSensitive) {
return a === b;
}
return a.toLowerCase() === b.toLowerCase();
}
function fuzzyString(
input: string,
stringToBeFound: string,
{ truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},
): FuzzyMatchData | false {
// make some validation first
// if input is longer than string to find, and we dont truncate it - it's incorrect
if (input.length > stringToBeFound.length && !truncateTooLongInput) {
return false;
}
// if truncate is enabled - do it
if (input.length > stringToBeFound.length && truncateTooLongInput) {
input = input.substr(0, stringToBeFound.length);
}
// if input is the same as string to be found - we dont need to look for fuzzy match - return it as match
if (input === stringToBeFound) {
return {
parts: [{ content: input, type: 'input' }],
score: 1,
};
}
const matchParts: FuzzyMatchPart = ;
const remainingInputLetters = input.split('');
// let's create letters buffers
// it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row
// we want to add them together as part of match
let ommitedLettersBuffer: string = ;
let matchedLettersBuffer: string = ;
// helper functions to clear the buffers and add them to match
function addOmmitedLettersAsFuzzy() {
if (ommitedLettersBuffer.length > 0) {
matchParts.push({
content: ommitedLettersBuffer.join(''),
type: 'fuzzy',
});
ommitedLettersBuffer = ;
}
}
function addMatchedLettersAsInput() {
if (matchedLettersBuffer.length > 0) {
matchParts.push({
content: matchedLettersBuffer.join(''),
type: 'input',
});
matchedLettersBuffer = ;
}
}
for (let anotherStringToBeFoundLetter of stringToBeFound) {
const inputLetterToMatch = remainingInputLetters[0];
// no more input - finish fuzzy matching
if (!inputLetterToMatch) {
break;
}
const isMatching = compareLetters(
anotherStringToBeFoundLetter,
inputLetterToMatch,
isCaseSesitive,
);
// if input letter doesnt match - we'll go to the next letter to try again
if (!isMatching) {
// add this letter to buffer of ommited letters
ommitedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in matched letters buffer - clear it as matching letters run ended
addMatchedLettersAsInput();
// go to the next input letter
continue;
}
// we have input letter matching!
// remove it from remaining input letters
remainingInputLetters.shift();
// add it to matched letters buffer
matchedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in ommited letters buffer - add it to the match now
addOmmitedLettersAsFuzzy();
// if there is no more letters in input - add this matched letter to match too
if (!remainingInputLetters.length) {
addMatchedLettersAsInput();
}
}
// if we still have letters left in input - means not all input was included in string to find - input was incorrect
if (remainingInputLetters.length > 0) {
return false;
}
// lets get entire matched part (from start to last letter of input)
const matchedPart = matchParts.map((match) => match.content).join('');
// get remaining part of string to be found
const suggestionPart = stringToBeFound.replace(matchedPart, '');
// if we have remaining part - add it as suggestion
if (suggestionPart) {
matchParts.push({ content: suggestionPart, type: 'suggestion' });
}
const score = calculateFuzzyMatchPartsScore(matchParts);
return {
score,
parts: matchParts,
};
}
New contributor
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
35
down vote
accepted
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that
reduce
already exists, and it's calledjoin
:
pattern = pattern.split("").join(".*")
The regex itself can be optimised:
.*
initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier:.*?
. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better:a[^b]*b
(at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b
) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RexEgp("^"+str.replace(/./g, function(x){
return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";
})+"$");
})
return function(str, pattern){
return cache(str).test(pattern)
})
})()
Concerning the last regex:
Given some pattern "ace"
, the regex you build (/a.*c.*e/
) tests if the string contains the characters of the pattern in the string, in the correct order.
If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/
. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?
), and flank the regex with string boundary anchors.
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27
1
@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50
add a comment |
up vote
35
down vote
accepted
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that
reduce
already exists, and it's calledjoin
:
pattern = pattern.split("").join(".*")
The regex itself can be optimised:
.*
initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier:.*?
. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better:a[^b]*b
(at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b
) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RexEgp("^"+str.replace(/./g, function(x){
return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";
})+"$");
})
return function(str, pattern){
return cache(str).test(pattern)
})
})()
Concerning the last regex:
Given some pattern "ace"
, the regex you build (/a.*c.*e/
) tests if the string contains the characters of the pattern in the string, in the correct order.
If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/
. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?
), and flank the regex with string boundary anchors.
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27
1
@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50
add a comment |
up vote
35
down vote
accepted
up vote
35
down vote
accepted
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that
reduce
already exists, and it's calledjoin
:
pattern = pattern.split("").join(".*")
The regex itself can be optimised:
.*
initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier:.*?
. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better:a[^b]*b
(at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b
) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RexEgp("^"+str.replace(/./g, function(x){
return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";
})+"$");
})
return function(str, pattern){
return cache(str).test(pattern)
})
})()
Concerning the last regex:
Given some pattern "ace"
, the regex you build (/a.*c.*e/
) tests if the string contains the characters of the pattern in the string, in the correct order.
If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/
. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?
), and flank the regex with string boundary anchors.
function fuzzy_match(str,pattern){
pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });
return (new RegExp(pattern)).test(str);
};
String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that
reduce
already exists, and it's calledjoin
:
pattern = pattern.split("").join(".*")
The regex itself can be optimised:
.*
initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier:.*?
. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better:a[^b]*b
(at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b
) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
var cache = _.memoize(function(pattern){
return new RegExp(pattern.split("").reduce(function(a,b){
return a+'[^'+b+']*'+b;
})
})
function fuzzy_match(str,pattern){
return cache(pattern).test(str)
};
If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
var fuzzy_match = (function(){
var cache = _.memoize(function(str){
return new RexEgp("^"+str.replace(/./g, function(x){
return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";
})+"$");
})
return function(str, pattern){
return cache(str).test(pattern)
})
})()
Concerning the last regex:
Given some pattern "ace"
, the regex you build (/a.*c.*e/
) tests if the string contains the characters of the pattern in the string, in the correct order.
If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/
. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?
), and flank the regex with string boundary anchors.
edited May 23 '17 at 11:33
Community♦
1
1
answered Mar 14 '13 at 12:37
John Dvorak
1,133814
1,133814
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27
1
@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50
add a comment |
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27
1
@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27
Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27
1
1
@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50
@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50
add a comment |
up vote
0
down vote
My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:
In some cases, you might need to know it eg. to make parts of your input bold in the search results
It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/
It works like:
fuzzyString('liolor', 'lorem ipsum dolor sit');
// returns
{
parts: [
{ content: 'l', type: 'input' },
{ content: 'orem ', type: 'fuzzy' },
{ content: 'i', type: 'input' },
{ content: 'psum d', type: 'fuzzy' },
{ content: 'olor', type: 'input' },
{ content: ' sit', type: 'suggestion' },
],
score: 0.87,
}
Here is full implementation (Typescript)
type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';
interface FuzzyMatchPart {
content: string;
type: MatchRoleType;
}
interface FuzzyMatchData {
parts: FuzzyMatchPart;
score: number;
}
interface FuzzyMatchOptions {
truncateTooLongInput?: boolean;
isCaseSesitive?: boolean;
}
function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {
const getRoleLength = (role: MatchRoleType) =>
fuzzyMatchParts
.filter((part) => part.type === role)
.map((part) => part.content)
.join('').length;
const fullLength = fuzzyMatchParts.map((part) => part.content).join('')
.length;
const fuzzyLength = getRoleLength('fuzzy');
const inputLength = getRoleLength('input');
const suggestionLength = getRoleLength('suggestion');
return (
(inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength
);
}
function compareLetters(a: string, b: string, isCaseSensitive = false) {
if (isCaseSensitive) {
return a === b;
}
return a.toLowerCase() === b.toLowerCase();
}
function fuzzyString(
input: string,
stringToBeFound: string,
{ truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},
): FuzzyMatchData | false {
// make some validation first
// if input is longer than string to find, and we dont truncate it - it's incorrect
if (input.length > stringToBeFound.length && !truncateTooLongInput) {
return false;
}
// if truncate is enabled - do it
if (input.length > stringToBeFound.length && truncateTooLongInput) {
input = input.substr(0, stringToBeFound.length);
}
// if input is the same as string to be found - we dont need to look for fuzzy match - return it as match
if (input === stringToBeFound) {
return {
parts: [{ content: input, type: 'input' }],
score: 1,
};
}
const matchParts: FuzzyMatchPart = ;
const remainingInputLetters = input.split('');
// let's create letters buffers
// it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row
// we want to add them together as part of match
let ommitedLettersBuffer: string = ;
let matchedLettersBuffer: string = ;
// helper functions to clear the buffers and add them to match
function addOmmitedLettersAsFuzzy() {
if (ommitedLettersBuffer.length > 0) {
matchParts.push({
content: ommitedLettersBuffer.join(''),
type: 'fuzzy',
});
ommitedLettersBuffer = ;
}
}
function addMatchedLettersAsInput() {
if (matchedLettersBuffer.length > 0) {
matchParts.push({
content: matchedLettersBuffer.join(''),
type: 'input',
});
matchedLettersBuffer = ;
}
}
for (let anotherStringToBeFoundLetter of stringToBeFound) {
const inputLetterToMatch = remainingInputLetters[0];
// no more input - finish fuzzy matching
if (!inputLetterToMatch) {
break;
}
const isMatching = compareLetters(
anotherStringToBeFoundLetter,
inputLetterToMatch,
isCaseSesitive,
);
// if input letter doesnt match - we'll go to the next letter to try again
if (!isMatching) {
// add this letter to buffer of ommited letters
ommitedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in matched letters buffer - clear it as matching letters run ended
addMatchedLettersAsInput();
// go to the next input letter
continue;
}
// we have input letter matching!
// remove it from remaining input letters
remainingInputLetters.shift();
// add it to matched letters buffer
matchedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in ommited letters buffer - add it to the match now
addOmmitedLettersAsFuzzy();
// if there is no more letters in input - add this matched letter to match too
if (!remainingInputLetters.length) {
addMatchedLettersAsInput();
}
}
// if we still have letters left in input - means not all input was included in string to find - input was incorrect
if (remainingInputLetters.length > 0) {
return false;
}
// lets get entire matched part (from start to last letter of input)
const matchedPart = matchParts.map((match) => match.content).join('');
// get remaining part of string to be found
const suggestionPart = stringToBeFound.replace(matchedPart, '');
// if we have remaining part - add it as suggestion
if (suggestionPart) {
matchParts.push({ content: suggestionPart, type: 'suggestion' });
}
const score = calculateFuzzyMatchPartsScore(matchParts);
return {
score,
parts: matchParts,
};
}
New contributor
add a comment |
up vote
0
down vote
My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:
In some cases, you might need to know it eg. to make parts of your input bold in the search results
It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/
It works like:
fuzzyString('liolor', 'lorem ipsum dolor sit');
// returns
{
parts: [
{ content: 'l', type: 'input' },
{ content: 'orem ', type: 'fuzzy' },
{ content: 'i', type: 'input' },
{ content: 'psum d', type: 'fuzzy' },
{ content: 'olor', type: 'input' },
{ content: ' sit', type: 'suggestion' },
],
score: 0.87,
}
Here is full implementation (Typescript)
type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';
interface FuzzyMatchPart {
content: string;
type: MatchRoleType;
}
interface FuzzyMatchData {
parts: FuzzyMatchPart;
score: number;
}
interface FuzzyMatchOptions {
truncateTooLongInput?: boolean;
isCaseSesitive?: boolean;
}
function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {
const getRoleLength = (role: MatchRoleType) =>
fuzzyMatchParts
.filter((part) => part.type === role)
.map((part) => part.content)
.join('').length;
const fullLength = fuzzyMatchParts.map((part) => part.content).join('')
.length;
const fuzzyLength = getRoleLength('fuzzy');
const inputLength = getRoleLength('input');
const suggestionLength = getRoleLength('suggestion');
return (
(inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength
);
}
function compareLetters(a: string, b: string, isCaseSensitive = false) {
if (isCaseSensitive) {
return a === b;
}
return a.toLowerCase() === b.toLowerCase();
}
function fuzzyString(
input: string,
stringToBeFound: string,
{ truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},
): FuzzyMatchData | false {
// make some validation first
// if input is longer than string to find, and we dont truncate it - it's incorrect
if (input.length > stringToBeFound.length && !truncateTooLongInput) {
return false;
}
// if truncate is enabled - do it
if (input.length > stringToBeFound.length && truncateTooLongInput) {
input = input.substr(0, stringToBeFound.length);
}
// if input is the same as string to be found - we dont need to look for fuzzy match - return it as match
if (input === stringToBeFound) {
return {
parts: [{ content: input, type: 'input' }],
score: 1,
};
}
const matchParts: FuzzyMatchPart = ;
const remainingInputLetters = input.split('');
// let's create letters buffers
// it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row
// we want to add them together as part of match
let ommitedLettersBuffer: string = ;
let matchedLettersBuffer: string = ;
// helper functions to clear the buffers and add them to match
function addOmmitedLettersAsFuzzy() {
if (ommitedLettersBuffer.length > 0) {
matchParts.push({
content: ommitedLettersBuffer.join(''),
type: 'fuzzy',
});
ommitedLettersBuffer = ;
}
}
function addMatchedLettersAsInput() {
if (matchedLettersBuffer.length > 0) {
matchParts.push({
content: matchedLettersBuffer.join(''),
type: 'input',
});
matchedLettersBuffer = ;
}
}
for (let anotherStringToBeFoundLetter of stringToBeFound) {
const inputLetterToMatch = remainingInputLetters[0];
// no more input - finish fuzzy matching
if (!inputLetterToMatch) {
break;
}
const isMatching = compareLetters(
anotherStringToBeFoundLetter,
inputLetterToMatch,
isCaseSesitive,
);
// if input letter doesnt match - we'll go to the next letter to try again
if (!isMatching) {
// add this letter to buffer of ommited letters
ommitedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in matched letters buffer - clear it as matching letters run ended
addMatchedLettersAsInput();
// go to the next input letter
continue;
}
// we have input letter matching!
// remove it from remaining input letters
remainingInputLetters.shift();
// add it to matched letters buffer
matchedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in ommited letters buffer - add it to the match now
addOmmitedLettersAsFuzzy();
// if there is no more letters in input - add this matched letter to match too
if (!remainingInputLetters.length) {
addMatchedLettersAsInput();
}
}
// if we still have letters left in input - means not all input was included in string to find - input was incorrect
if (remainingInputLetters.length > 0) {
return false;
}
// lets get entire matched part (from start to last letter of input)
const matchedPart = matchParts.map((match) => match.content).join('');
// get remaining part of string to be found
const suggestionPart = stringToBeFound.replace(matchedPart, '');
// if we have remaining part - add it as suggestion
if (suggestionPart) {
matchParts.push({ content: suggestionPart, type: 'suggestion' });
}
const score = calculateFuzzyMatchPartsScore(matchParts);
return {
score,
parts: matchParts,
};
}
New contributor
add a comment |
up vote
0
down vote
up vote
0
down vote
My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:
In some cases, you might need to know it eg. to make parts of your input bold in the search results
It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/
It works like:
fuzzyString('liolor', 'lorem ipsum dolor sit');
// returns
{
parts: [
{ content: 'l', type: 'input' },
{ content: 'orem ', type: 'fuzzy' },
{ content: 'i', type: 'input' },
{ content: 'psum d', type: 'fuzzy' },
{ content: 'olor', type: 'input' },
{ content: ' sit', type: 'suggestion' },
],
score: 0.87,
}
Here is full implementation (Typescript)
type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';
interface FuzzyMatchPart {
content: string;
type: MatchRoleType;
}
interface FuzzyMatchData {
parts: FuzzyMatchPart;
score: number;
}
interface FuzzyMatchOptions {
truncateTooLongInput?: boolean;
isCaseSesitive?: boolean;
}
function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {
const getRoleLength = (role: MatchRoleType) =>
fuzzyMatchParts
.filter((part) => part.type === role)
.map((part) => part.content)
.join('').length;
const fullLength = fuzzyMatchParts.map((part) => part.content).join('')
.length;
const fuzzyLength = getRoleLength('fuzzy');
const inputLength = getRoleLength('input');
const suggestionLength = getRoleLength('suggestion');
return (
(inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength
);
}
function compareLetters(a: string, b: string, isCaseSensitive = false) {
if (isCaseSensitive) {
return a === b;
}
return a.toLowerCase() === b.toLowerCase();
}
function fuzzyString(
input: string,
stringToBeFound: string,
{ truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},
): FuzzyMatchData | false {
// make some validation first
// if input is longer than string to find, and we dont truncate it - it's incorrect
if (input.length > stringToBeFound.length && !truncateTooLongInput) {
return false;
}
// if truncate is enabled - do it
if (input.length > stringToBeFound.length && truncateTooLongInput) {
input = input.substr(0, stringToBeFound.length);
}
// if input is the same as string to be found - we dont need to look for fuzzy match - return it as match
if (input === stringToBeFound) {
return {
parts: [{ content: input, type: 'input' }],
score: 1,
};
}
const matchParts: FuzzyMatchPart = ;
const remainingInputLetters = input.split('');
// let's create letters buffers
// it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row
// we want to add them together as part of match
let ommitedLettersBuffer: string = ;
let matchedLettersBuffer: string = ;
// helper functions to clear the buffers and add them to match
function addOmmitedLettersAsFuzzy() {
if (ommitedLettersBuffer.length > 0) {
matchParts.push({
content: ommitedLettersBuffer.join(''),
type: 'fuzzy',
});
ommitedLettersBuffer = ;
}
}
function addMatchedLettersAsInput() {
if (matchedLettersBuffer.length > 0) {
matchParts.push({
content: matchedLettersBuffer.join(''),
type: 'input',
});
matchedLettersBuffer = ;
}
}
for (let anotherStringToBeFoundLetter of stringToBeFound) {
const inputLetterToMatch = remainingInputLetters[0];
// no more input - finish fuzzy matching
if (!inputLetterToMatch) {
break;
}
const isMatching = compareLetters(
anotherStringToBeFoundLetter,
inputLetterToMatch,
isCaseSesitive,
);
// if input letter doesnt match - we'll go to the next letter to try again
if (!isMatching) {
// add this letter to buffer of ommited letters
ommitedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in matched letters buffer - clear it as matching letters run ended
addMatchedLettersAsInput();
// go to the next input letter
continue;
}
// we have input letter matching!
// remove it from remaining input letters
remainingInputLetters.shift();
// add it to matched letters buffer
matchedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in ommited letters buffer - add it to the match now
addOmmitedLettersAsFuzzy();
// if there is no more letters in input - add this matched letter to match too
if (!remainingInputLetters.length) {
addMatchedLettersAsInput();
}
}
// if we still have letters left in input - means not all input was included in string to find - input was incorrect
if (remainingInputLetters.length > 0) {
return false;
}
// lets get entire matched part (from start to last letter of input)
const matchedPart = matchParts.map((match) => match.content).join('');
// get remaining part of string to be found
const suggestionPart = stringToBeFound.replace(matchedPart, '');
// if we have remaining part - add it as suggestion
if (suggestionPart) {
matchParts.push({ content: suggestionPart, type: 'suggestion' });
}
const score = calculateFuzzyMatchPartsScore(matchParts);
return {
score,
parts: matchParts,
};
}
New contributor
My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:
In some cases, you might need to know it eg. to make parts of your input bold in the search results
It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/
It works like:
fuzzyString('liolor', 'lorem ipsum dolor sit');
// returns
{
parts: [
{ content: 'l', type: 'input' },
{ content: 'orem ', type: 'fuzzy' },
{ content: 'i', type: 'input' },
{ content: 'psum d', type: 'fuzzy' },
{ content: 'olor', type: 'input' },
{ content: ' sit', type: 'suggestion' },
],
score: 0.87,
}
Here is full implementation (Typescript)
type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';
interface FuzzyMatchPart {
content: string;
type: MatchRoleType;
}
interface FuzzyMatchData {
parts: FuzzyMatchPart;
score: number;
}
interface FuzzyMatchOptions {
truncateTooLongInput?: boolean;
isCaseSesitive?: boolean;
}
function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {
const getRoleLength = (role: MatchRoleType) =>
fuzzyMatchParts
.filter((part) => part.type === role)
.map((part) => part.content)
.join('').length;
const fullLength = fuzzyMatchParts.map((part) => part.content).join('')
.length;
const fuzzyLength = getRoleLength('fuzzy');
const inputLength = getRoleLength('input');
const suggestionLength = getRoleLength('suggestion');
return (
(inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength
);
}
function compareLetters(a: string, b: string, isCaseSensitive = false) {
if (isCaseSensitive) {
return a === b;
}
return a.toLowerCase() === b.toLowerCase();
}
function fuzzyString(
input: string,
stringToBeFound: string,
{ truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},
): FuzzyMatchData | false {
// make some validation first
// if input is longer than string to find, and we dont truncate it - it's incorrect
if (input.length > stringToBeFound.length && !truncateTooLongInput) {
return false;
}
// if truncate is enabled - do it
if (input.length > stringToBeFound.length && truncateTooLongInput) {
input = input.substr(0, stringToBeFound.length);
}
// if input is the same as string to be found - we dont need to look for fuzzy match - return it as match
if (input === stringToBeFound) {
return {
parts: [{ content: input, type: 'input' }],
score: 1,
};
}
const matchParts: FuzzyMatchPart = ;
const remainingInputLetters = input.split('');
// let's create letters buffers
// it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row
// we want to add them together as part of match
let ommitedLettersBuffer: string = ;
let matchedLettersBuffer: string = ;
// helper functions to clear the buffers and add them to match
function addOmmitedLettersAsFuzzy() {
if (ommitedLettersBuffer.length > 0) {
matchParts.push({
content: ommitedLettersBuffer.join(''),
type: 'fuzzy',
});
ommitedLettersBuffer = ;
}
}
function addMatchedLettersAsInput() {
if (matchedLettersBuffer.length > 0) {
matchParts.push({
content: matchedLettersBuffer.join(''),
type: 'input',
});
matchedLettersBuffer = ;
}
}
for (let anotherStringToBeFoundLetter of stringToBeFound) {
const inputLetterToMatch = remainingInputLetters[0];
// no more input - finish fuzzy matching
if (!inputLetterToMatch) {
break;
}
const isMatching = compareLetters(
anotherStringToBeFoundLetter,
inputLetterToMatch,
isCaseSesitive,
);
// if input letter doesnt match - we'll go to the next letter to try again
if (!isMatching) {
// add this letter to buffer of ommited letters
ommitedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in matched letters buffer - clear it as matching letters run ended
addMatchedLettersAsInput();
// go to the next input letter
continue;
}
// we have input letter matching!
// remove it from remaining input letters
remainingInputLetters.shift();
// add it to matched letters buffer
matchedLettersBuffer.push(anotherStringToBeFoundLetter);
// in case we had something in ommited letters buffer - add it to the match now
addOmmitedLettersAsFuzzy();
// if there is no more letters in input - add this matched letter to match too
if (!remainingInputLetters.length) {
addMatchedLettersAsInput();
}
}
// if we still have letters left in input - means not all input was included in string to find - input was incorrect
if (remainingInputLetters.length > 0) {
return false;
}
// lets get entire matched part (from start to last letter of input)
const matchedPart = matchParts.map((match) => match.content).join('');
// get remaining part of string to be found
const suggestionPart = stringToBeFound.replace(matchedPart, '');
// if we have remaining part - add it as suggestion
if (suggestionPart) {
matchParts.push({ content: suggestionPart, type: 'suggestion' });
}
const score = calculateFuzzyMatchPartsScore(matchParts);
return {
score,
parts: matchParts,
};
}
New contributor
New contributor
answered 18 mins ago
pie6k
101
101
New contributor
New contributor
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f23899%2ffaster-javascript-fuzzy-string-matching-function%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02
1
The third example returns
true
as well.jquery.js
does matchj.*r
.– John Dvorak
Mar 14 '13 at 9:07