Faster JavaScript fuzzy string matching function?

up vote
15
down vote

favorite

I'm using the following function to fuzzy match strings:

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

Example:

fuzzy_match("fogo","foo") //true

fuzzy_match("jquery.js","jqjs") //true

fuzzy_match("jquery.js","jr") //false

It's very slow, though. How can I optimize that function?

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

2

Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02

1

The third example returns true as well. jquery.js does match j.*r.
– John Dvorak
Mar 14 '13 at 9:07

add a comment |

up vote
15
down vote

favorite

I'm using the following function to fuzzy match strings:

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

Example:

fuzzy_match("fogo","foo") //true

fuzzy_match("jquery.js","jqjs") //true

fuzzy_match("jquery.js","jr") //false

It's very slow, though. How can I optimize that function?

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

2

Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02

1

The third example returns true as well. jquery.js does match j.*r.
– John Dvorak
Mar 14 '13 at 9:07

add a comment |

up vote
15
down vote

favorite

I'm using the following function to fuzzy match strings:

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

Example:

fuzzy_match("fogo","foo") //true

fuzzy_match("jquery.js","jqjs") //true

fuzzy_match("jquery.js","jr") //false

It's very slow, though. How can I optimize that function?

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

I'm using the following function to fuzzy match strings:

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

Example:

fuzzy_match("fogo","foo") //true

fuzzy_match("jquery.js","jqjs") //true

fuzzy_match("jquery.js","jr") //false

It's very slow, though. How can I optimize that function?

javascript strings regex

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

asked Mar 14 '13 at 6:38

MaiaVictor

6061411

2

Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02

1

The third example returns true as well. jquery.js does match j.*r.
– John Dvorak
Mar 14 '13 at 9:07

add a comment |

2

Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02

1

The third example returns true as well. jquery.js does match j.*r.
– John Dvorak
Mar 14 '13 at 9:07

Why is 360k operations per second slow? If your application is slow, I think the problem is not there.
– Florian Margaine
Mar 14 '13 at 9:02

The third example returns true as well. jquery.js does match j.*r.
– John Dvorak
Mar 14 '13 at 9:07

add a comment |

2 Answers
2

active

oldest

votes

up vote
35
down vote

accepted

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:
```
pattern = pattern.split("").join(".*")
```

The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
```
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
```

Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
```
var cache = _.memoize(function(pattern){

  return new RegExp(pattern.split("").reduce(function(a,b){

    return a+'[^'+b+']*'+b;

  })

})

function fuzzy_match(str,pattern){

  return cache(pattern).test(str)

};
```

If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
```
var fuzzy_match = (function(){

  var cache = _.memoize(function(str){

    return new RexEgp("^"+str.replace(/./g, function(x){

      return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";

    })+"$");

  })

  return function(str, pattern){

    return cache(str).test(pattern)

  })

})()
```

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

If you want to test if a given string "abcde" is matched some pattern: The pattern must only contain the characters of the string, in the correct order: /^a?b?c?d?e?$/. To do this, we regex-escape every regex special character (pattern source: CoolAj86), make every character optional (?), and flank the regex with string boundary anchors.

edited May 23 '17 at 11:33

Community♦

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27

1

@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50

add a comment |

up vote
0
down vote

My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:

In some cases, you might need to know it eg. to make parts of your input bold in the search results

It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/

It works like:

fuzzyString('liolor', 'lorem ipsum dolor sit');



// returns

{

  parts: [

    { content: 'l', type: 'input' },

    { content: 'orem ', type: 'fuzzy' },

    { content: 'i', type: 'input' },

    { content: 'psum d', type: 'fuzzy' },

    { content: 'olor', type: 'input' },

    { content: ' sit', type: 'suggestion' },

  ],

  score: 0.87,

}

Here is full implementation (Typescript)

type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';



interface FuzzyMatchPart {

  content: string;

  type: MatchRoleType;

}



interface FuzzyMatchData {

  parts: FuzzyMatchPart;

  score: number;

}



interface FuzzyMatchOptions {

  truncateTooLongInput?: boolean;

  isCaseSesitive?: boolean;

}



function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {

  const getRoleLength = (role: MatchRoleType) =>

    fuzzyMatchParts

      .filter((part) => part.type === role)

      .map((part) => part.content)

      .join('').length;



  const fullLength = fuzzyMatchParts.map((part) => part.content).join('')

    .length;

  const fuzzyLength = getRoleLength('fuzzy');

  const inputLength = getRoleLength('input');

  const suggestionLength = getRoleLength('suggestion');



  return (

    (inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength

  );

}



function compareLetters(a: string, b: string, isCaseSensitive = false) {

  if (isCaseSensitive) {

    return a === b;

  }

  return a.toLowerCase() === b.toLowerCase();

}



function fuzzyString(

  input: string,

  stringToBeFound: string,

  { truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},

): FuzzyMatchData | false {

  // make some validation first



  // if input is longer than string to find, and we dont truncate it - it's incorrect

  if (input.length > stringToBeFound.length && !truncateTooLongInput) {

    return false;

  }



  // if truncate is enabled - do it

  if (input.length > stringToBeFound.length && truncateTooLongInput) {

    input = input.substr(0, stringToBeFound.length);

  }



  // if input is the same as string to be found - we dont need to look for fuzzy match - return it as match

  if (input === stringToBeFound) {

    return {

      parts: [{ content: input, type: 'input' }],

      score: 1,

    };

  }



  const matchParts: FuzzyMatchPart = ;



  const remainingInputLetters = input.split('');



  // let's create letters buffers

  // it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row

  // we want to add them together as part of match

  let ommitedLettersBuffer: string = ;

  let matchedLettersBuffer: string = ;



  // helper functions to clear the buffers and add them to match

  function addOmmitedLettersAsFuzzy() {

    if (ommitedLettersBuffer.length > 0) {

      matchParts.push({

        content: ommitedLettersBuffer.join(''),

        type: 'fuzzy',

      });

      ommitedLettersBuffer = ;

    }

  }



  function addMatchedLettersAsInput() {

    if (matchedLettersBuffer.length > 0) {

      matchParts.push({

        content: matchedLettersBuffer.join(''),

        type: 'input',

      });

      matchedLettersBuffer = ;

    }

  }



  for (let anotherStringToBeFoundLetter of stringToBeFound) {

    const inputLetterToMatch = remainingInputLetters[0];



    // no more input - finish fuzzy matching

    if (!inputLetterToMatch) {

      break;

    }



    const isMatching = compareLetters(

      anotherStringToBeFoundLetter,

      inputLetterToMatch,

      isCaseSesitive,

    );



    // if input letter doesnt match - we'll go to the next letter to try again

    if (!isMatching) {

      // add this letter to buffer of ommited letters

      ommitedLettersBuffer.push(anotherStringToBeFoundLetter);

      // in case we had something in matched letters buffer - clear it as matching letters run ended

      addMatchedLettersAsInput();

      // go to the next input letter

      continue;

    }



    // we have input letter matching!



    // remove it from remaining input letters

    remainingInputLetters.shift();



    // add it to matched letters buffer

    matchedLettersBuffer.push(anotherStringToBeFoundLetter);

    // in case we had something in ommited letters buffer - add it to the match now

    addOmmitedLettersAsFuzzy();



    // if there is no more letters in input - add this matched letter to match too

    if (!remainingInputLetters.length) {

      addMatchedLettersAsInput();

    }

  }



  // if we still have letters left in input - means not all input was included in string to find - input was incorrect

  if (remainingInputLetters.length > 0) {

    return false;

  }



  // lets get entire matched part (from start to last letter of input)

  const matchedPart = matchParts.map((match) => match.content).join('');



  // get remaining part of string to be found

  const suggestionPart = stringToBeFound.replace(matchedPart, '');



  // if we have remaining part - add it as suggestion

  if (suggestionPart) {

    matchParts.push({ content: suggestionPart, type: 'suggestion' });

  }

  const score = calculateFuzzyMatchPartsScore(matchParts);



  return {

    score,

    parts: matchParts,

  };

}

answered 18 mins ago

pie6k

101

New contributor

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f23899%2ffaster-javascript-fuzzy-string-matching-function%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
35
down vote

accepted

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:
```
pattern = pattern.split("").join(".*")
```

The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
```
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
```

Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
```
var cache = _.memoize(function(pattern){

  return new RegExp(pattern.split("").reduce(function(a,b){

    return a+'[^'+b+']*'+b;

  })

})

function fuzzy_match(str,pattern){

  return cache(pattern).test(str)

};
```

If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
```
var fuzzy_match = (function(){

  var cache = _.memoize(function(str){

    return new RexEgp("^"+str.replace(/./g, function(x){

      return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";

    })+"$");

  })

  return function(str, pattern){

    return cache(str).test(pattern)

  })

})()
```

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

edited May 23 '17 at 11:33

Community♦

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27

1

@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50

add a comment |

up vote
35
down vote

accepted

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:
```
pattern = pattern.split("").join(".*")
```

The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
```
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
```

Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
```
var cache = _.memoize(function(pattern){

  return new RegExp(pattern.split("").reduce(function(a,b){

    return a+'[^'+b+']*'+b;

  })

})

function fuzzy_match(str,pattern){

  return cache(pattern).test(str)

};
```

If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
```
var fuzzy_match = (function(){

  var cache = _.memoize(function(str){

    return new RexEgp("^"+str.replace(/./g, function(x){

      return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";

    })+"$");

  })

  return function(str, pattern){

    return cache(str).test(pattern)

  })

})()
```

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

edited May 23 '17 at 11:33

Community♦

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27

1

@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50

add a comment |

up vote
35
down vote

accepted

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:
```
pattern = pattern.split("").join(".*")
```

The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
```
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
```

Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
```
var cache = _.memoize(function(pattern){

  return new RegExp(pattern.split("").reduce(function(a,b){

    return a+'[^'+b+']*'+b;

  })

})

function fuzzy_match(str,pattern){

  return cache(pattern).test(str)

};
```

If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
```
var fuzzy_match = (function(){

  var cache = _.memoize(function(str){

    return new RexEgp("^"+str.replace(/./g, function(x){

      return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";

    })+"$");

  })

  return function(str, pattern){

    return cache(str).test(pattern)

  })

})()
```

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

edited May 23 '17 at 11:33

Community♦

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

function fuzzy_match(str,pattern){

    pattern = pattern.split("").reduce(function(a,b){ return a+".*"+b; });

    return (new RegExp(pattern)).test(str);

};

String concatenation is slow. The reason is that you need to allocate new memory anytime you concatenate two strings. A smart compiler may optimise that, but not too much. However, what you are trying to do with that reduce already exists, and it's called join:
```
pattern = pattern.split("").join(".*")
```

The regex itself can be optimised: .* initially grabs as much as possible, then retracts only if the rest doesn't match. This takes a larger than neccessary amount of backtracking to find out. Instead, you could use a reluctant quantifier: .*?. This attempts to match as little as possible. If a match is found, it is found without backtracking. However, it still does a lot of backtracking if there is no match. We can do better: a[^b]*b (at this point, it doesn't really matter if the quantifier is greedy). A possessive quantifier (a[^b]*+b) would be even better, but javascript doesn't support these. Using a character class deprives us of a join, but see the next point.
```
pattern = pattern.split("").reduce(function(a,b){ return a+'[^'+b+']*'+b; });
```

Since you are complaining about an operation that takes about 3000 ns (as noted in the comments), it can be assumed you are doing a lot of queries. If there are very few patterns, you can cache your regexes. Underscore.js has a handy function that I'll demonstrate, but you could easily build the cache yourself.
```
var cache = _.memoize(function(pattern){

  return new RegExp(pattern.split("").reduce(function(a,b){

    return a+'[^'+b+']*'+b;

  })

})

function fuzzy_match(str,pattern){

  return cache(pattern).test(str)

};
```

If there is a lot of repetition among the tested strings, you should use these as a pattern instead, and the pattern as the tested string. This is even easier. I will also demonstrate how to scope your variables using the export pattern. Also, a bugfix must be inserted here (we can't assume all characters in the string are alphanumeric), to properly escape non-alphanumeric characters:
```
var fuzzy_match = (function(){

  var cache = _.memoize(function(str){

    return new RexEgp("^"+str.replace(/./g, function(x){

      return /[-[]/{}()*+?.\^$|]/.test(x) ? "\"+x+"?" : x+"?";

    })+"$");

  })

  return function(str, pattern){

    return cache(str).test(pattern)

  })

})()
```

Concerning the last regex:

Given some pattern "ace", the regex you build (/a.*c.*e/) tests if the string contains the characters of the pattern in the string, in the correct order.

edited May 23 '17 at 11:33

Community♦

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

edited May 23 '17 at 11:33

Community♦

edited May 23 '17 at 11:33

Community♦

edited May 23 '17 at 11:33

Community♦

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

answered Mar 14 '13 at 12:37

John Dvorak

1,133814

Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27

1

@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50

add a comment |

Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27

1

@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50

Awesome answer, thank you very much. If not abuse might I ask you to elaborate only a little bit on that last regexp?
– MaiaVictor
Mar 14 '13 at 13:27

@Dokkat explanation added. Proper escaping added.
– John Dvorak
Mar 14 '13 at 13:50

add a comment |

up vote
0
down vote

My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:

In some cases, you might need to know it eg. to make parts of your input bold in the search results

It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/

It works like:

fuzzyString('liolor', 'lorem ipsum dolor sit');



// returns

{

  parts: [

    { content: 'l', type: 'input' },

    { content: 'orem ', type: 'fuzzy' },

    { content: 'i', type: 'input' },

    { content: 'psum d', type: 'fuzzy' },

    { content: 'olor', type: 'input' },

    { content: ' sit', type: 'suggestion' },

  ],

  score: 0.87,

}

Here is full implementation (Typescript)

type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';



interface FuzzyMatchPart {

  content: string;

  type: MatchRoleType;

}



interface FuzzyMatchData {

  parts: FuzzyMatchPart;

  score: number;

}



interface FuzzyMatchOptions {

  truncateTooLongInput?: boolean;

  isCaseSesitive?: boolean;

}



function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {

  const getRoleLength = (role: MatchRoleType) =>

    fuzzyMatchParts

      .filter((part) => part.type === role)

      .map((part) => part.content)

      .join('').length;



  const fullLength = fuzzyMatchParts.map((part) => part.content).join('')

    .length;

  const fuzzyLength = getRoleLength('fuzzy');

  const inputLength = getRoleLength('input');

  const suggestionLength = getRoleLength('suggestion');



  return (

    (inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength

  );

}



function compareLetters(a: string, b: string, isCaseSensitive = false) {

  if (isCaseSensitive) {

    return a === b;

  }

  return a.toLowerCase() === b.toLowerCase();

}



function fuzzyString(

  input: string,

  stringToBeFound: string,

  { truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},

): FuzzyMatchData | false {

  // make some validation first



  // if input is longer than string to find, and we dont truncate it - it's incorrect

  if (input.length > stringToBeFound.length && !truncateTooLongInput) {

    return false;

  }



  // if truncate is enabled - do it

  if (input.length > stringToBeFound.length && truncateTooLongInput) {

    input = input.substr(0, stringToBeFound.length);

  }



  // if input is the same as string to be found - we dont need to look for fuzzy match - return it as match

  if (input === stringToBeFound) {

    return {

      parts: [{ content: input, type: 'input' }],

      score: 1,

    };

  }



  const matchParts: FuzzyMatchPart = ;



  const remainingInputLetters = input.split('');



  // let's create letters buffers

  // it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row

  // we want to add them together as part of match

  let ommitedLettersBuffer: string = ;

  let matchedLettersBuffer: string = ;



  // helper functions to clear the buffers and add them to match

  function addOmmitedLettersAsFuzzy() {

    if (ommitedLettersBuffer.length > 0) {

      matchParts.push({

        content: ommitedLettersBuffer.join(''),

        type: 'fuzzy',

      });

      ommitedLettersBuffer = ;

    }

  }



  function addMatchedLettersAsInput() {

    if (matchedLettersBuffer.length > 0) {

      matchParts.push({

        content: matchedLettersBuffer.join(''),

        type: 'input',

      });

      matchedLettersBuffer = ;

    }

  }



  for (let anotherStringToBeFoundLetter of stringToBeFound) {

    const inputLetterToMatch = remainingInputLetters[0];



    // no more input - finish fuzzy matching

    if (!inputLetterToMatch) {

      break;

    }



    const isMatching = compareLetters(

      anotherStringToBeFoundLetter,

      inputLetterToMatch,

      isCaseSesitive,

    );



    // if input letter doesnt match - we'll go to the next letter to try again

    if (!isMatching) {

      // add this letter to buffer of ommited letters

      ommitedLettersBuffer.push(anotherStringToBeFoundLetter);

      // in case we had something in matched letters buffer - clear it as matching letters run ended

      addMatchedLettersAsInput();

      // go to the next input letter

      continue;

    }



    // we have input letter matching!



    // remove it from remaining input letters

    remainingInputLetters.shift();



    // add it to matched letters buffer

    matchedLettersBuffer.push(anotherStringToBeFoundLetter);

    // in case we had something in ommited letters buffer - add it to the match now

    addOmmitedLettersAsFuzzy();



    // if there is no more letters in input - add this matched letter to match too

    if (!remainingInputLetters.length) {

      addMatchedLettersAsInput();

    }

  }



  // if we still have letters left in input - means not all input was included in string to find - input was incorrect

  if (remainingInputLetters.length > 0) {

    return false;

  }



  // lets get entire matched part (from start to last letter of input)

  const matchedPart = matchParts.map((match) => match.content).join('');



  // get remaining part of string to be found

  const suggestionPart = stringToBeFound.replace(matchedPart, '');



  // if we have remaining part - add it as suggestion

  if (suggestionPart) {

    matchParts.push({ content: suggestionPart, type: 'suggestion' });

  }

  const score = calculateFuzzyMatchPartsScore(matchParts);



  return {

    score,

    parts: matchParts,

  };

}

answered 18 mins ago

pie6k

101

New contributor

add a comment |

up vote
0
down vote

My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:

In some cases, you might need to know it eg. to make parts of your input bold in the search results

It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/

It works like:

fuzzyString('liolor', 'lorem ipsum dolor sit');



// returns

{

  parts: [

    { content: 'l', type: 'input' },

    { content: 'orem ', type: 'fuzzy' },

    { content: 'i', type: 'input' },

    { content: 'psum d', type: 'fuzzy' },

    { content: 'olor', type: 'input' },

    { content: ' sit', type: 'suggestion' },

  ],

  score: 0.87,

}

Here is full implementation (Typescript)

type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';



interface FuzzyMatchPart {

  content: string;

  type: MatchRoleType;

}



interface FuzzyMatchData {

  parts: FuzzyMatchPart;

  score: number;

}



interface FuzzyMatchOptions {

  truncateTooLongInput?: boolean;

  isCaseSesitive?: boolean;

}



function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {

  const getRoleLength = (role: MatchRoleType) =>

    fuzzyMatchParts

      .filter((part) => part.type === role)

      .map((part) => part.content)

      .join('').length;



  const fullLength = fuzzyMatchParts.map((part) => part.content).join('')

    .length;

  const fuzzyLength = getRoleLength('fuzzy');

  const inputLength = getRoleLength('input');

  const suggestionLength = getRoleLength('suggestion');



  return (

    (inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength

  );

}



function compareLetters(a: string, b: string, isCaseSensitive = false) {

  if (isCaseSensitive) {

    return a === b;

  }

  return a.toLowerCase() === b.toLowerCase();

}



function fuzzyString(

  input: string,

  stringToBeFound: string,

  { truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},

): FuzzyMatchData | false {

  // make some validation first



  // if input is longer than string to find, and we dont truncate it - it's incorrect

  if (input.length > stringToBeFound.length && !truncateTooLongInput) {

    return false;

  }



  // if truncate is enabled - do it

  if (input.length > stringToBeFound.length && truncateTooLongInput) {

    input = input.substr(0, stringToBeFound.length);

  }



  // if input is the same as string to be found - we dont need to look for fuzzy match - return it as match

  if (input === stringToBeFound) {

    return {

      parts: [{ content: input, type: 'input' }],

      score: 1,

    };

  }



  const matchParts: FuzzyMatchPart = ;



  const remainingInputLetters = input.split('');



  // let's create letters buffers

  // it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row

  // we want to add them together as part of match

  let ommitedLettersBuffer: string = ;

  let matchedLettersBuffer: string = ;



  // helper functions to clear the buffers and add them to match

  function addOmmitedLettersAsFuzzy() {

    if (ommitedLettersBuffer.length > 0) {

      matchParts.push({

        content: ommitedLettersBuffer.join(''),

        type: 'fuzzy',

      });

      ommitedLettersBuffer = ;

    }

  }



  function addMatchedLettersAsInput() {

    if (matchedLettersBuffer.length > 0) {

      matchParts.push({

        content: matchedLettersBuffer.join(''),

        type: 'input',

      });

      matchedLettersBuffer = ;

    }

  }



  for (let anotherStringToBeFoundLetter of stringToBeFound) {

    const inputLetterToMatch = remainingInputLetters[0];



    // no more input - finish fuzzy matching

    if (!inputLetterToMatch) {

      break;

    }



    const isMatching = compareLetters(

      anotherStringToBeFoundLetter,

      inputLetterToMatch,

      isCaseSesitive,

    );



    // if input letter doesnt match - we'll go to the next letter to try again

    if (!isMatching) {

      // add this letter to buffer of ommited letters

      ommitedLettersBuffer.push(anotherStringToBeFoundLetter);

      // in case we had something in matched letters buffer - clear it as matching letters run ended

      addMatchedLettersAsInput();

      // go to the next input letter

      continue;

    }



    // we have input letter matching!



    // remove it from remaining input letters

    remainingInputLetters.shift();



    // add it to matched letters buffer

    matchedLettersBuffer.push(anotherStringToBeFoundLetter);

    // in case we had something in ommited letters buffer - add it to the match now

    addOmmitedLettersAsFuzzy();



    // if there is no more letters in input - add this matched letter to match too

    if (!remainingInputLetters.length) {

      addMatchedLettersAsInput();

    }

  }



  // if we still have letters left in input - means not all input was included in string to find - input was incorrect

  if (remainingInputLetters.length > 0) {

    return false;

  }



  // lets get entire matched part (from start to last letter of input)

  const matchedPart = matchParts.map((match) => match.content).join('');



  // get remaining part of string to be found

  const suggestionPart = stringToBeFound.replace(matchedPart, '');



  // if we have remaining part - add it as suggestion

  if (suggestionPart) {

    matchParts.push({ content: suggestionPart, type: 'suggestion' });

  }

  const score = calculateFuzzyMatchPartsScore(matchParts);



  return {

    score,

    parts: matchParts,

  };

}

answered 18 mins ago

pie6k

101

New contributor

add a comment |

up vote
0
down vote

My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:

In some cases, you might need to know it eg. to make parts of your input bold in the search results

It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/

It works like:

fuzzyString('liolor', 'lorem ipsum dolor sit');



// returns

{

  parts: [

    { content: 'l', type: 'input' },

    { content: 'orem ', type: 'fuzzy' },

    { content: 'i', type: 'input' },

    { content: 'psum d', type: 'fuzzy' },

    { content: 'olor', type: 'input' },

    { content: ' sit', type: 'suggestion' },

  ],

  score: 0.87,

}

Here is full implementation (Typescript)

type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';



interface FuzzyMatchPart {

  content: string;

  type: MatchRoleType;

}



interface FuzzyMatchData {

  parts: FuzzyMatchPart;

  score: number;

}



interface FuzzyMatchOptions {

  truncateTooLongInput?: boolean;

  isCaseSesitive?: boolean;

}



function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {

  const getRoleLength = (role: MatchRoleType) =>

    fuzzyMatchParts

      .filter((part) => part.type === role)

      .map((part) => part.content)

      .join('').length;



  const fullLength = fuzzyMatchParts.map((part) => part.content).join('')

    .length;

  const fuzzyLength = getRoleLength('fuzzy');

  const inputLength = getRoleLength('input');

  const suggestionLength = getRoleLength('suggestion');



  return (

    (inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength

  );

}



function compareLetters(a: string, b: string, isCaseSensitive = false) {

  if (isCaseSensitive) {

    return a === b;

  }

  return a.toLowerCase() === b.toLowerCase();

}



function fuzzyString(

  input: string,

  stringToBeFound: string,

  { truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},

): FuzzyMatchData | false {

  // make some validation first



  // if input is longer than string to find, and we dont truncate it - it's incorrect

  if (input.length > stringToBeFound.length && !truncateTooLongInput) {

    return false;

  }



  // if truncate is enabled - do it

  if (input.length > stringToBeFound.length && truncateTooLongInput) {

    input = input.substr(0, stringToBeFound.length);

  }



  // if input is the same as string to be found - we dont need to look for fuzzy match - return it as match

  if (input === stringToBeFound) {

    return {

      parts: [{ content: input, type: 'input' }],

      score: 1,

    };

  }



  const matchParts: FuzzyMatchPart = ;



  const remainingInputLetters = input.split('');



  // let's create letters buffers

  // it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row

  // we want to add them together as part of match

  let ommitedLettersBuffer: string = ;

  let matchedLettersBuffer: string = ;



  // helper functions to clear the buffers and add them to match

  function addOmmitedLettersAsFuzzy() {

    if (ommitedLettersBuffer.length > 0) {

      matchParts.push({

        content: ommitedLettersBuffer.join(''),

        type: 'fuzzy',

      });

      ommitedLettersBuffer = ;

    }

  }



  function addMatchedLettersAsInput() {

    if (matchedLettersBuffer.length > 0) {

      matchParts.push({

        content: matchedLettersBuffer.join(''),

        type: 'input',

      });

      matchedLettersBuffer = ;

    }

  }



  for (let anotherStringToBeFoundLetter of stringToBeFound) {

    const inputLetterToMatch = remainingInputLetters[0];



    // no more input - finish fuzzy matching

    if (!inputLetterToMatch) {

      break;

    }



    const isMatching = compareLetters(

      anotherStringToBeFoundLetter,

      inputLetterToMatch,

      isCaseSesitive,

    );



    // if input letter doesnt match - we'll go to the next letter to try again

    if (!isMatching) {

      // add this letter to buffer of ommited letters

      ommitedLettersBuffer.push(anotherStringToBeFoundLetter);

      // in case we had something in matched letters buffer - clear it as matching letters run ended

      addMatchedLettersAsInput();

      // go to the next input letter

      continue;

    }



    // we have input letter matching!



    // remove it from remaining input letters

    remainingInputLetters.shift();



    // add it to matched letters buffer

    matchedLettersBuffer.push(anotherStringToBeFoundLetter);

    // in case we had something in ommited letters buffer - add it to the match now

    addOmmitedLettersAsFuzzy();



    // if there is no more letters in input - add this matched letter to match too

    if (!remainingInputLetters.length) {

      addMatchedLettersAsInput();

    }

  }



  // if we still have letters left in input - means not all input was included in string to find - input was incorrect

  if (remainingInputLetters.length > 0) {

    return false;

  }



  // lets get entire matched part (from start to last letter of input)

  const matchedPart = matchParts.map((match) => match.content).join('');



  // get remaining part of string to be found

  const suggestionPart = stringToBeFound.replace(matchedPart, '');



  // if we have remaining part - add it as suggestion

  if (suggestionPart) {

    matchParts.push({ content: suggestionPart, type: 'suggestion' });

  }

  const score = calculateFuzzyMatchPartsScore(matchParts);



  return {

    score,

    parts: matchParts,

  };

}

answered 18 mins ago

pie6k

101

New contributor

My solution might be not fast enough for you (200k ops/s) :) but beside true/false it also provides informations about the match:

In some cases, you might need to know it eg. to make parts of your input bold in the search results

It's written in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/

It works like:

fuzzyString('liolor', 'lorem ipsum dolor sit');



// returns

{

  parts: [

    { content: 'l', type: 'input' },

    { content: 'orem ', type: 'fuzzy' },

    { content: 'i', type: 'input' },

    { content: 'psum d', type: 'fuzzy' },

    { content: 'olor', type: 'input' },

    { content: ' sit', type: 'suggestion' },

  ],

  score: 0.87,

}

Here is full implementation (Typescript)

type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';



interface FuzzyMatchPart {

  content: string;

  type: MatchRoleType;

}



interface FuzzyMatchData {

  parts: FuzzyMatchPart;

  score: number;

}



interface FuzzyMatchOptions {

  truncateTooLongInput?: boolean;

  isCaseSesitive?: boolean;

}



function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart) {

  const getRoleLength = (role: MatchRoleType) =>

    fuzzyMatchParts

      .filter((part) => part.type === role)

      .map((part) => part.content)

      .join('').length;



  const fullLength = fuzzyMatchParts.map((part) => part.content).join('')

    .length;

  const fuzzyLength = getRoleLength('fuzzy');

  const inputLength = getRoleLength('input');

  const suggestionLength = getRoleLength('suggestion');



  return (

    (inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength

  );

}



function compareLetters(a: string, b: string, isCaseSensitive = false) {

  if (isCaseSensitive) {

    return a === b;

  }

  return a.toLowerCase() === b.toLowerCase();

}



function fuzzyString(

  input: string,

  stringToBeFound: string,

  { truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},

): FuzzyMatchData | false {

  // make some validation first



  // if input is longer than string to find, and we dont truncate it - it's incorrect

  if (input.length > stringToBeFound.length && !truncateTooLongInput) {

    return false;

  }



  // if truncate is enabled - do it

  if (input.length > stringToBeFound.length && truncateTooLongInput) {

    input = input.substr(0, stringToBeFound.length);

  }



  // if input is the same as string to be found - we dont need to look for fuzzy match - return it as match

  if (input === stringToBeFound) {

    return {

      parts: [{ content: input, type: 'input' }],

      score: 1,

    };

  }



  const matchParts: FuzzyMatchPart = ;



  const remainingInputLetters = input.split('');



  // let's create letters buffers

  // it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row

  // we want to add them together as part of match

  let ommitedLettersBuffer: string = ;

  let matchedLettersBuffer: string = ;



  // helper functions to clear the buffers and add them to match

  function addOmmitedLettersAsFuzzy() {

    if (ommitedLettersBuffer.length > 0) {

      matchParts.push({

        content: ommitedLettersBuffer.join(''),

        type: 'fuzzy',

      });

      ommitedLettersBuffer = ;

    }

  }



  function addMatchedLettersAsInput() {

    if (matchedLettersBuffer.length > 0) {

      matchParts.push({

        content: matchedLettersBuffer.join(''),

        type: 'input',

      });

      matchedLettersBuffer = ;

    }

  }



  for (let anotherStringToBeFoundLetter of stringToBeFound) {

    const inputLetterToMatch = remainingInputLetters[0];



    // no more input - finish fuzzy matching

    if (!inputLetterToMatch) {

      break;

    }



    const isMatching = compareLetters(

      anotherStringToBeFoundLetter,

      inputLetterToMatch,

      isCaseSesitive,

    );



    // if input letter doesnt match - we'll go to the next letter to try again

    if (!isMatching) {

      // add this letter to buffer of ommited letters

      ommitedLettersBuffer.push(anotherStringToBeFoundLetter);

      // in case we had something in matched letters buffer - clear it as matching letters run ended

      addMatchedLettersAsInput();

      // go to the next input letter

      continue;

    }



    // we have input letter matching!



    // remove it from remaining input letters

    remainingInputLetters.shift();



    // add it to matched letters buffer

    matchedLettersBuffer.push(anotherStringToBeFoundLetter);

    // in case we had something in ommited letters buffer - add it to the match now

    addOmmitedLettersAsFuzzy();



    // if there is no more letters in input - add this matched letter to match too

    if (!remainingInputLetters.length) {

      addMatchedLettersAsInput();

    }

  }



  // if we still have letters left in input - means not all input was included in string to find - input was incorrect

  if (remainingInputLetters.length > 0) {

    return false;

  }



  // lets get entire matched part (from start to last letter of input)

  const matchedPart = matchParts.map((match) => match.content).join('');



  // get remaining part of string to be found

  const suggestionPart = stringToBeFound.replace(matchedPart, '');



  // if we have remaining part - add it as suggestion

  if (suggestionPart) {

    matchParts.push({ content: suggestionPart, type: 'suggestion' });

  }

  const score = calculateFuzzyMatchPartsScore(matchParts);



  return {

    score,

    parts: matchParts,

  };

}

answered 18 mins ago

pie6k

101

New contributor

answered 18 mins ago

pie6k

101

New contributor

answered 18 mins ago

pie6k

101

answered 18 mins ago

pie6k

101

New contributor

pie6k is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Nsryjdtyk