Unique character lookup











up vote
3
down vote

favorite
1












I'm solving 387. First Unique Character in a String LeetCode problem defined as:




Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.



Examples:



s = "leetcode"
return 0.

s = "loveleetcode",
return 2.


Note: You may assume the string contain only lowercase letters.




Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.



Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?



class Solution {

public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();

for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}

int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}

}


LeetCode score



Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each loop performs better than standard for loop but I can't prove it. It's an observation based on few of my previous submissions.










share|improve this question









New contributor




Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 1




    I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
    – Bill K
    yesterday










  • @BillK Except firstSpot & secondSpot are local variables; they may not incur any memory cycles.
    – AJNeufeld
    yesterday















up vote
3
down vote

favorite
1












I'm solving 387. First Unique Character in a String LeetCode problem defined as:




Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.



Examples:



s = "leetcode"
return 0.

s = "loveleetcode",
return 2.


Note: You may assume the string contain only lowercase letters.




Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.



Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?



class Solution {

public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();

for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}

int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}

}


LeetCode score



Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each loop performs better than standard for loop but I can't prove it. It's an observation based on few of my previous submissions.










share|improve this question









New contributor




Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 1




    I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
    – Bill K
    yesterday










  • @BillK Except firstSpot & secondSpot are local variables; they may not incur any memory cycles.
    – AJNeufeld
    yesterday













up vote
3
down vote

favorite
1









up vote
3
down vote

favorite
1






1





I'm solving 387. First Unique Character in a String LeetCode problem defined as:




Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.



Examples:



s = "leetcode"
return 0.

s = "loveleetcode",
return 2.


Note: You may assume the string contain only lowercase letters.




Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.



Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?



class Solution {

public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();

for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}

int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}

}


LeetCode score



Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each loop performs better than standard for loop but I can't prove it. It's an observation based on few of my previous submissions.










share|improve this question









New contributor




Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











I'm solving 387. First Unique Character in a String LeetCode problem defined as:




Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.



Examples:



s = "leetcode"
return 0.

s = "loveleetcode",
return 2.


Note: You may assume the string contain only lowercase letters.




Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.



Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?



class Solution {

public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();

for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}

int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}

}


LeetCode score



Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each loop performs better than standard for loop but I can't prove it. It's an observation based on few of my previous submissions.







java algorithm






share|improve this question









New contributor




Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited yesterday









Jamal

30.2k11115226




30.2k11115226






New contributor




Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked yesterday









Karol Dowbecki

1163




1163




New contributor




Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Karol Dowbecki is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








  • 1




    I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
    – Bill K
    yesterday










  • @BillK Except firstSpot & secondSpot are local variables; they may not incur any memory cycles.
    – AJNeufeld
    yesterday














  • 1




    I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
    – Bill K
    yesterday










  • @BillK Except firstSpot & secondSpot are local variables; they may not incur any memory cycles.
    – AJNeufeld
    yesterday








1




1




I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday




I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday












@BillK Except firstSpot & secondSpot are local variables; they may not incur any memory cycles.
– AJNeufeld
yesterday




@BillK Except firstSpot & secondSpot are local variables; they may not incur any memory cycles.
– AJNeufeld
yesterday










5 Answers
5






active

oldest

votes

















up vote
4
down vote














  • When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.



  • That said,





    • the first/second spot logic looks suboptimal. Consider



      if ((firstSpot & mask) != 0) {
      secondSpot |= mask;
      } else {
      firstSpot |= mask;
      }



    (using else may also be redundant).




    • It is unclear (to me) how for (char c : chars) over the char array fares against other techniques. This could be an entertaining reading.


    Yet again, profile.



  • Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.







share|improve this answer























  • @AJNeufeld OK. Got carried away.
    – vnp
    yesterday










  • @AJNeufeld I should not RUI.
    – vnp
    yesterday










  • RUI? Is that “Reply Under the Influence”?
    – AJNeufeld
    yesterday










  • @AJNeufeld Almost. "Review Under the Influence".
    – vnp
    yesterday










  • @vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a Normalizer first could help?
    – Sylvain Leroux
    16 hours ago




















up vote
3
down vote













Some suggestions:




  • You could get rid of the class and put the entire method contents within your main method; that'll at least get rid of an instantiation.

  • I'm not sure whether 'a' is converted to an integer at compilation time or runtime. Try using 0x61 instead.

  • You can avoid converting to a char array.






share|improve this answer























  • LeetCode gives a method signature to fill, I don't believe I can change that.
    – Karol Dowbecki
    yesterday




















up vote
3
down vote













You don’t need to check if the second spot has been filled in; just unconditionally set it.



if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else {
secondSpot |= mask;
}




Avoiding branching usually helps.



secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
firstSpot |= mask;





share|improve this answer






























    up vote
    0
    down vote













    I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
    When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.



    Here is my shot (admittedly in c#).



    int firstUniqCharInString(string s)
    {
    Dictionary<char, int> seen = new Dictionary<char, int>();
    int t = -1;
    foreach (char c in s)
    {
    t++;
    if (seen.ContainsKey(c))
    {
    seen[c] = s.Length + 1;
    }
    else
    {
    seen.Add(c, t);
    }

    }
    int len = s.Length + 1;
    foreach(KeyValuePair<char, int> pair in seen)
    {
    if(pair.Value < len) {
    return pair.Value;
    }
    }
    return (-1);
    }





    share|improve this answer








    New contributor




    user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.


















    • A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length. s.Length alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t at -1 and pre-increment it. Start at 0 instead.
      – Konrad Rudolph
      12 hours ago












    • You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
      – Mast
      12 hours ago










    • I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
      – user185166
      12 hours ago


















    up vote
    0
    down vote














    Taking advantage of the input being fully lowercase ASCII




    While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.




    I created two bit vectors to track when we encounter a character for the first and second time.




    And that really requires the input to be ASCII. Because char can hold any value between 0 and 65535 the << might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.



    But if it's just ASCII characters between 'a' and 'a' + 31 the approach will work.




    Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?




    It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".



    As far as I see it:




    • You don't have any comments,

    • you don't have tests,

    • your variables have odd names. For example: firstSpot is a weird name for a bitarray that represents if a character has been found once, ...

    • you don't have benchmarks


    So my personal opinion is: There's lots of room for improvement.





    I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:




    • You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the toCharArray and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).


    • In case there is no single character you might even stop the loop early. As soon as your secondSpot bitarray contains an entry for all ASCII characters you can immediately return -1.


    • As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.







    share|improve this answer























      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "196"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });






      Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.










       

      draft saved


      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208020%2funique-character-lookup%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      4
      down vote














      • When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.



      • That said,





        • the first/second spot logic looks suboptimal. Consider



          if ((firstSpot & mask) != 0) {
          secondSpot |= mask;
          } else {
          firstSpot |= mask;
          }



        (using else may also be redundant).




        • It is unclear (to me) how for (char c : chars) over the char array fares against other techniques. This could be an entertaining reading.


        Yet again, profile.



      • Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.







      share|improve this answer























      • @AJNeufeld OK. Got carried away.
        – vnp
        yesterday










      • @AJNeufeld I should not RUI.
        – vnp
        yesterday










      • RUI? Is that “Reply Under the Influence”?
        – AJNeufeld
        yesterday










      • @AJNeufeld Almost. "Review Under the Influence".
        – vnp
        yesterday










      • @vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a Normalizer first could help?
        – Sylvain Leroux
        16 hours ago

















      up vote
      4
      down vote














      • When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.



      • That said,





        • the first/second spot logic looks suboptimal. Consider



          if ((firstSpot & mask) != 0) {
          secondSpot |= mask;
          } else {
          firstSpot |= mask;
          }



        (using else may also be redundant).




        • It is unclear (to me) how for (char c : chars) over the char array fares against other techniques. This could be an entertaining reading.


        Yet again, profile.



      • Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.







      share|improve this answer























      • @AJNeufeld OK. Got carried away.
        – vnp
        yesterday










      • @AJNeufeld I should not RUI.
        – vnp
        yesterday










      • RUI? Is that “Reply Under the Influence”?
        – AJNeufeld
        yesterday










      • @AJNeufeld Almost. "Review Under the Influence".
        – vnp
        yesterday










      • @vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a Normalizer first could help?
        – Sylvain Leroux
        16 hours ago















      up vote
      4
      down vote










      up vote
      4
      down vote










      • When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.



      • That said,





        • the first/second spot logic looks suboptimal. Consider



          if ((firstSpot & mask) != 0) {
          secondSpot |= mask;
          } else {
          firstSpot |= mask;
          }



        (using else may also be redundant).




        • It is unclear (to me) how for (char c : chars) over the char array fares against other techniques. This could be an entertaining reading.


        Yet again, profile.



      • Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.







      share|improve this answer















      • When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.



      • That said,





        • the first/second spot logic looks suboptimal. Consider



          if ((firstSpot & mask) != 0) {
          secondSpot |= mask;
          } else {
          firstSpot |= mask;
          }



        (using else may also be redundant).




        • It is unclear (to me) how for (char c : chars) over the char array fares against other techniques. This could be an entertaining reading.


        Yet again, profile.



      • Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.








      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited yesterday

























      answered yesterday









      vnp

      38.2k13096




      38.2k13096












      • @AJNeufeld OK. Got carried away.
        – vnp
        yesterday










      • @AJNeufeld I should not RUI.
        – vnp
        yesterday










      • RUI? Is that “Reply Under the Influence”?
        – AJNeufeld
        yesterday










      • @AJNeufeld Almost. "Review Under the Influence".
        – vnp
        yesterday










      • @vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a Normalizer first could help?
        – Sylvain Leroux
        16 hours ago




















      • @AJNeufeld OK. Got carried away.
        – vnp
        yesterday










      • @AJNeufeld I should not RUI.
        – vnp
        yesterday










      • RUI? Is that “Reply Under the Influence”?
        – AJNeufeld
        yesterday










      • @AJNeufeld Almost. "Review Under the Influence".
        – vnp
        yesterday










      • @vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a Normalizer first could help?
        – Sylvain Leroux
        16 hours ago


















      @AJNeufeld OK. Got carried away.
      – vnp
      yesterday




      @AJNeufeld OK. Got carried away.
      – vnp
      yesterday












      @AJNeufeld I should not RUI.
      – vnp
      yesterday




      @AJNeufeld I should not RUI.
      – vnp
      yesterday












      RUI? Is that “Reply Under the Influence”?
      – AJNeufeld
      yesterday




      RUI? Is that “Reply Under the Influence”?
      – AJNeufeld
      yesterday












      @AJNeufeld Almost. "Review Under the Influence".
      – vnp
      yesterday




      @AJNeufeld Almost. "Review Under the Influence".
      – vnp
      yesterday












      @vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a Normalizer first could help?
      – Sylvain Leroux
      16 hours ago






      @vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a Normalizer first could help?
      – Sylvain Leroux
      16 hours ago














      up vote
      3
      down vote













      Some suggestions:




      • You could get rid of the class and put the entire method contents within your main method; that'll at least get rid of an instantiation.

      • I'm not sure whether 'a' is converted to an integer at compilation time or runtime. Try using 0x61 instead.

      • You can avoid converting to a char array.






      share|improve this answer























      • LeetCode gives a method signature to fill, I don't believe I can change that.
        – Karol Dowbecki
        yesterday

















      up vote
      3
      down vote













      Some suggestions:




      • You could get rid of the class and put the entire method contents within your main method; that'll at least get rid of an instantiation.

      • I'm not sure whether 'a' is converted to an integer at compilation time or runtime. Try using 0x61 instead.

      • You can avoid converting to a char array.






      share|improve this answer























      • LeetCode gives a method signature to fill, I don't believe I can change that.
        – Karol Dowbecki
        yesterday















      up vote
      3
      down vote










      up vote
      3
      down vote









      Some suggestions:




      • You could get rid of the class and put the entire method contents within your main method; that'll at least get rid of an instantiation.

      • I'm not sure whether 'a' is converted to an integer at compilation time or runtime. Try using 0x61 instead.

      • You can avoid converting to a char array.






      share|improve this answer














      Some suggestions:




      • You could get rid of the class and put the entire method contents within your main method; that'll at least get rid of an instantiation.

      • I'm not sure whether 'a' is converted to an integer at compilation time or runtime. Try using 0x61 instead.

      • You can avoid converting to a char array.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited yesterday

























      answered yesterday









      l0b0

      3,877923




      3,877923












      • LeetCode gives a method signature to fill, I don't believe I can change that.
        – Karol Dowbecki
        yesterday




















      • LeetCode gives a method signature to fill, I don't believe I can change that.
        – Karol Dowbecki
        yesterday


















      LeetCode gives a method signature to fill, I don't believe I can change that.
      – Karol Dowbecki
      yesterday






      LeetCode gives a method signature to fill, I don't believe I can change that.
      – Karol Dowbecki
      yesterday












      up vote
      3
      down vote













      You don’t need to check if the second spot has been filled in; just unconditionally set it.



      if ((firstSpot & mask) == 0) {
      firstSpot |= mask;
      } else {
      secondSpot |= mask;
      }




      Avoiding branching usually helps.



      secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
      firstSpot |= mask;





      share|improve this answer



























        up vote
        3
        down vote













        You don’t need to check if the second spot has been filled in; just unconditionally set it.



        if ((firstSpot & mask) == 0) {
        firstSpot |= mask;
        } else {
        secondSpot |= mask;
        }




        Avoiding branching usually helps.



        secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
        firstSpot |= mask;





        share|improve this answer

























          up vote
          3
          down vote










          up vote
          3
          down vote









          You don’t need to check if the second spot has been filled in; just unconditionally set it.



          if ((firstSpot & mask) == 0) {
          firstSpot |= mask;
          } else {
          secondSpot |= mask;
          }




          Avoiding branching usually helps.



          secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
          firstSpot |= mask;





          share|improve this answer














          You don’t need to check if the second spot has been filled in; just unconditionally set it.



          if ((firstSpot & mask) == 0) {
          firstSpot |= mask;
          } else {
          secondSpot |= mask;
          }




          Avoiding branching usually helps.



          secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
          firstSpot |= mask;






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited yesterday

























          answered yesterday









          AJNeufeld

          3,585317




          3,585317






















              up vote
              0
              down vote













              I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
              When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.



              Here is my shot (admittedly in c#).



              int firstUniqCharInString(string s)
              {
              Dictionary<char, int> seen = new Dictionary<char, int>();
              int t = -1;
              foreach (char c in s)
              {
              t++;
              if (seen.ContainsKey(c))
              {
              seen[c] = s.Length + 1;
              }
              else
              {
              seen.Add(c, t);
              }

              }
              int len = s.Length + 1;
              foreach(KeyValuePair<char, int> pair in seen)
              {
              if(pair.Value < len) {
              return pair.Value;
              }
              }
              return (-1);
              }





              share|improve this answer








              New contributor




              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.


















              • A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length. s.Length alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t at -1 and pre-increment it. Start at 0 instead.
                – Konrad Rudolph
                12 hours ago












              • You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
                – Mast
                12 hours ago










              • I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
                – user185166
                12 hours ago















              up vote
              0
              down vote













              I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
              When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.



              Here is my shot (admittedly in c#).



              int firstUniqCharInString(string s)
              {
              Dictionary<char, int> seen = new Dictionary<char, int>();
              int t = -1;
              foreach (char c in s)
              {
              t++;
              if (seen.ContainsKey(c))
              {
              seen[c] = s.Length + 1;
              }
              else
              {
              seen.Add(c, t);
              }

              }
              int len = s.Length + 1;
              foreach(KeyValuePair<char, int> pair in seen)
              {
              if(pair.Value < len) {
              return pair.Value;
              }
              }
              return (-1);
              }





              share|improve this answer








              New contributor




              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.


















              • A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length. s.Length alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t at -1 and pre-increment it. Start at 0 instead.
                – Konrad Rudolph
                12 hours ago












              • You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
                – Mast
                12 hours ago










              • I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
                – user185166
                12 hours ago













              up vote
              0
              down vote










              up vote
              0
              down vote









              I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
              When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.



              Here is my shot (admittedly in c#).



              int firstUniqCharInString(string s)
              {
              Dictionary<char, int> seen = new Dictionary<char, int>();
              int t = -1;
              foreach (char c in s)
              {
              t++;
              if (seen.ContainsKey(c))
              {
              seen[c] = s.Length + 1;
              }
              else
              {
              seen.Add(c, t);
              }

              }
              int len = s.Length + 1;
              foreach(KeyValuePair<char, int> pair in seen)
              {
              if(pair.Value < len) {
              return pair.Value;
              }
              }
              return (-1);
              }





              share|improve this answer








              New contributor




              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.









              I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
              When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.



              Here is my shot (admittedly in c#).



              int firstUniqCharInString(string s)
              {
              Dictionary<char, int> seen = new Dictionary<char, int>();
              int t = -1;
              foreach (char c in s)
              {
              t++;
              if (seen.ContainsKey(c))
              {
              seen[c] = s.Length + 1;
              }
              else
              {
              seen.Add(c, t);
              }

              }
              int len = s.Length + 1;
              foreach(KeyValuePair<char, int> pair in seen)
              {
              if(pair.Value < len) {
              return pair.Value;
              }
              }
              return (-1);
              }






              share|improve this answer








              New contributor




              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.









              share|improve this answer



              share|improve this answer






              New contributor




              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.









              answered 15 hours ago









              user185166

              1




              1




              New contributor




              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.





              New contributor





              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.






              user185166 is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
              Check out our Code of Conduct.












              • A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length. s.Length alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t at -1 and pre-increment it. Start at 0 instead.
                – Konrad Rudolph
                12 hours ago












              • You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
                – Mast
                12 hours ago










              • I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
                – user185166
                12 hours ago


















              • A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length. s.Length alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t at -1 and pre-increment it. Start at 0 instead.
                – Konrad Rudolph
                12 hours ago












              • You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
                – Mast
                12 hours ago










              • I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
                – user185166
                12 hours ago
















              A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length. s.Length alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t at -1 and pre-increment it. Start at 0 instead.
              – Konrad Rudolph
              12 hours ago






              A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length. s.Length alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t at -1 and pre-increment it. Start at 0 instead.
              – Konrad Rudolph
              12 hours ago














              You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
              – Mast
              12 hours ago




              You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
              – Mast
              12 hours ago












              I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
              – user185166
              12 hours ago




              I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
              – user185166
              12 hours ago










              up vote
              0
              down vote














              Taking advantage of the input being fully lowercase ASCII




              While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.




              I created two bit vectors to track when we encounter a character for the first and second time.




              And that really requires the input to be ASCII. Because char can hold any value between 0 and 65535 the << might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.



              But if it's just ASCII characters between 'a' and 'a' + 31 the approach will work.




              Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?




              It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".



              As far as I see it:




              • You don't have any comments,

              • you don't have tests,

              • your variables have odd names. For example: firstSpot is a weird name for a bitarray that represents if a character has been found once, ...

              • you don't have benchmarks


              So my personal opinion is: There's lots of room for improvement.





              I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:




              • You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the toCharArray and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).


              • In case there is no single character you might even stop the loop early. As soon as your secondSpot bitarray contains an entry for all ASCII characters you can immediately return -1.


              • As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.







              share|improve this answer



























                up vote
                0
                down vote














                Taking advantage of the input being fully lowercase ASCII




                While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.




                I created two bit vectors to track when we encounter a character for the first and second time.




                And that really requires the input to be ASCII. Because char can hold any value between 0 and 65535 the << might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.



                But if it's just ASCII characters between 'a' and 'a' + 31 the approach will work.




                Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?




                It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".



                As far as I see it:




                • You don't have any comments,

                • you don't have tests,

                • your variables have odd names. For example: firstSpot is a weird name for a bitarray that represents if a character has been found once, ...

                • you don't have benchmarks


                So my personal opinion is: There's lots of room for improvement.





                I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:




                • You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the toCharArray and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).


                • In case there is no single character you might even stop the loop early. As soon as your secondSpot bitarray contains an entry for all ASCII characters you can immediately return -1.


                • As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.







                share|improve this answer

























                  up vote
                  0
                  down vote










                  up vote
                  0
                  down vote










                  Taking advantage of the input being fully lowercase ASCII




                  While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.




                  I created two bit vectors to track when we encounter a character for the first and second time.




                  And that really requires the input to be ASCII. Because char can hold any value between 0 and 65535 the << might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.



                  But if it's just ASCII characters between 'a' and 'a' + 31 the approach will work.




                  Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?




                  It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".



                  As far as I see it:




                  • You don't have any comments,

                  • you don't have tests,

                  • your variables have odd names. For example: firstSpot is a weird name for a bitarray that represents if a character has been found once, ...

                  • you don't have benchmarks


                  So my personal opinion is: There's lots of room for improvement.





                  I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:




                  • You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the toCharArray and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).


                  • In case there is no single character you might even stop the loop early. As soon as your secondSpot bitarray contains an entry for all ASCII characters you can immediately return -1.


                  • As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.







                  share|improve this answer















                  Taking advantage of the input being fully lowercase ASCII




                  While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.




                  I created two bit vectors to track when we encounter a character for the first and second time.




                  And that really requires the input to be ASCII. Because char can hold any value between 0 and 65535 the << might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.



                  But if it's just ASCII characters between 'a' and 'a' + 31 the approach will work.




                  Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?




                  It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".



                  As far as I see it:




                  • You don't have any comments,

                  • you don't have tests,

                  • your variables have odd names. For example: firstSpot is a weird name for a bitarray that represents if a character has been found once, ...

                  • you don't have benchmarks


                  So my personal opinion is: There's lots of room for improvement.





                  I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:




                  • You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the toCharArray and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).


                  • In case there is no single character you might even stop the loop early. As soon as your secondSpot bitarray contains an entry for all ASCII characters you can immediately return -1.


                  • As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.








                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited 10 hours ago

























                  answered 11 hours ago









                  MSeifert

                  570416




                  570416






















                      Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.










                       

                      draft saved


                      draft discarded


















                      Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.













                      Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.












                      Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.















                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208020%2funique-character-lookup%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Costa Masnaga

                      Fotorealismo

                      Sidney Franklin