Unique character lookup
up vote
3
down vote
favorite
I'm solving 387. First Unique Character in a String LeetCode problem defined as:
Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.
Examples:
s = "leetcode"
return 0.
s = "loveleetcode",
return 2.
Note: You may assume the string contain only lowercase letters.
Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
class Solution {
public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();
for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}
int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}
}
Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each
loop performs better than standard for
loop but I can't prove it. It's an observation based on few of my previous submissions.
java algorithm
New contributor
add a comment |
up vote
3
down vote
favorite
I'm solving 387. First Unique Character in a String LeetCode problem defined as:
Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.
Examples:
s = "leetcode"
return 0.
s = "loveleetcode",
return 2.
Note: You may assume the string contain only lowercase letters.
Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
class Solution {
public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();
for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}
int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}
}
Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each
loop performs better than standard for
loop but I can't prove it. It's an observation based on few of my previous submissions.
java algorithm
New contributor
1
I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday
@BillK ExceptfirstSpot
&secondSpot
are local variables; they may not incur any memory cycles.
– AJNeufeld
yesterday
add a comment |
up vote
3
down vote
favorite
up vote
3
down vote
favorite
I'm solving 387. First Unique Character in a String LeetCode problem defined as:
Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.
Examples:
s = "leetcode"
return 0.
s = "loveleetcode",
return 2.
Note: You may assume the string contain only lowercase letters.
Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
class Solution {
public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();
for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}
int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}
}
Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each
loop performs better than standard for
loop but I can't prove it. It's an observation based on few of my previous submissions.
java algorithm
New contributor
I'm solving 387. First Unique Character in a String LeetCode problem defined as:
Given a string, find the first non-repeating character in it and return it's index. If it doesn't exist, return -1.
Examples:
s = "leetcode"
return 0.
s = "loveleetcode",
return 2.
Note: You may assume the string contain only lowercase letters.
Taking advantage of the input being fully lowercase ASCII I created two bit vectors to track when we encounter a character for the first and second time.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
class Solution {
public int firstUniqChar(String s) {
int firstSpot = 0;
int secondSpot = 0;
char chars = s.toCharArray();
for (char c : chars) {
int mask = 1 << c - 'a';
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else if ((secondSpot & mask) == 0) {
secondSpot |= mask;
}
}
int i = 0;
for (char c : chars) {
int mask = 1 << c - 'a';
if ((secondSpot & mask) == 0) {
return i;
}
i++;
}
return -1;
}
}
Are there tricks that can be done to improve the LeetCode score? It seems that enhanced for-each
loop performs better than standard for
loop but I can't prove it. It's an observation based on few of my previous submissions.
java algorithm
java algorithm
New contributor
New contributor
edited yesterday
Jamal♦
30.2k11115226
30.2k11115226
New contributor
asked yesterday
Karol Dowbecki
1163
1163
New contributor
New contributor
1
I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday
@BillK ExceptfirstSpot
&secondSpot
are local variables; they may not incur any memory cycles.
– AJNeufeld
yesterday
add a comment |
1
I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday
@BillK ExceptfirstSpot
&secondSpot
are local variables; they may not incur any memory cycles.
– AJNeufeld
yesterday
1
1
I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday
I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday
@BillK Except
firstSpot
& secondSpot
are local variables; they may not incur any memory cycles.– AJNeufeld
yesterday
@BillK Except
firstSpot
& secondSpot
are local variables; they may not incur any memory cycles.– AJNeufeld
yesterday
add a comment |
5 Answers
5
active
oldest
votes
up vote
4
down vote
When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.
That said,
the first/second spot logic looks suboptimal. Consider
if ((firstSpot & mask) != 0) {
secondSpot |= mask;
} else {
firstSpot |= mask;
}
(using
else
may also be redundant).
- It is unclear (to me) how
for (char c : chars)
over the char array fares against other techniques. This could be an entertaining reading.
Yet again, profile.
Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.
@AJNeufeld OK. Got carried away.
– vnp
yesterday
@AJNeufeld I should not RUI.
– vnp
yesterday
RUI? Is that “Reply Under the Influence”?
– AJNeufeld
yesterday
@AJNeufeld Almost. "Review Under the Influence".
– vnp
yesterday
@vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using aNormalizer
first could help?
– Sylvain Leroux
16 hours ago
|
show 1 more comment
up vote
3
down vote
Some suggestions:
- You could get rid of the class and put the entire method contents within your
main
method; that'll at least get rid of an instantiation. - I'm not sure whether
'a'
is converted to an integer at compilation time or runtime. Try using 0x61 instead. - You can avoid converting to a char array.
LeetCode gives a method signature to fill, I don't believe I can change that.
– Karol Dowbecki
yesterday
add a comment |
up vote
3
down vote
You don’t need to check if the second spot has been filled in; just unconditionally set it.
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else {
secondSpot |= mask;
}
Avoiding branching usually helps.
secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
firstSpot |= mask;
add a comment |
up vote
0
down vote
I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.
Here is my shot (admittedly in c#).
int firstUniqCharInString(string s)
{
Dictionary<char, int> seen = new Dictionary<char, int>();
int t = -1;
foreach (char c in s)
{
t++;
if (seen.ContainsKey(c))
{
seen[c] = s.Length + 1;
}
else
{
seen.Add(c, t);
}
}
int len = s.Length + 1;
foreach(KeyValuePair<char, int> pair in seen)
{
if(pair.Value < len) {
return pair.Value;
}
}
return (-1);
}
New contributor
A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length.s.Length
alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’sS.C.G.Dictionary
does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start offt
at -1 and pre-increment it. Start at 0 instead.
– Konrad Rudolph
12 hours ago
You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
– Mast
12 hours ago
I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
– user185166
12 hours ago
add a comment |
up vote
0
down vote
Taking advantage of the input being fully lowercase ASCII
While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.
I created two bit vectors to track when we encounter a character for the first and second time.
And that really requires the input to be ASCII. Because char
can hold any value between 0 and 65535 the <<
might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.
But if it's just ASCII characters between 'a'
and 'a' + 31
the approach will work.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".
As far as I see it:
- You don't have any comments,
- you don't have tests,
- your variables have odd names. For example:
firstSpot
is a weird name for a bitarray that represents if a character has been found once, ... - you don't have benchmarks
So my personal opinion is: There's lots of room for improvement.
I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:
You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the
toCharArray
and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).In case there is no single character you might even stop the loop early. As soon as your
secondSpot
bitarray contains an entry for all ASCII characters you can immediately return-1
.As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.
add a comment |
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.
That said,
the first/second spot logic looks suboptimal. Consider
if ((firstSpot & mask) != 0) {
secondSpot |= mask;
} else {
firstSpot |= mask;
}
(using
else
may also be redundant).
- It is unclear (to me) how
for (char c : chars)
over the char array fares against other techniques. This could be an entertaining reading.
Yet again, profile.
Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.
@AJNeufeld OK. Got carried away.
– vnp
yesterday
@AJNeufeld I should not RUI.
– vnp
yesterday
RUI? Is that “Reply Under the Influence”?
– AJNeufeld
yesterday
@AJNeufeld Almost. "Review Under the Influence".
– vnp
yesterday
@vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using aNormalizer
first could help?
– Sylvain Leroux
16 hours ago
|
show 1 more comment
up vote
4
down vote
When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.
That said,
the first/second spot logic looks suboptimal. Consider
if ((firstSpot & mask) != 0) {
secondSpot |= mask;
} else {
firstSpot |= mask;
}
(using
else
may also be redundant).
- It is unclear (to me) how
for (char c : chars)
over the char array fares against other techniques. This could be an entertaining reading.
Yet again, profile.
Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.
@AJNeufeld OK. Got carried away.
– vnp
yesterday
@AJNeufeld I should not RUI.
– vnp
yesterday
RUI? Is that “Reply Under the Influence”?
– AJNeufeld
yesterday
@AJNeufeld Almost. "Review Under the Influence".
– vnp
yesterday
@vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using aNormalizer
first could help?
– Sylvain Leroux
16 hours ago
|
show 1 more comment
up vote
4
down vote
up vote
4
down vote
When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.
That said,
the first/second spot logic looks suboptimal. Consider
if ((firstSpot & mask) != 0) {
secondSpot |= mask;
} else {
firstSpot |= mask;
}
(using
else
may also be redundant).
- It is unclear (to me) how
for (char c : chars)
over the char array fares against other techniques. This could be an entertaining reading.
Yet again, profile.
Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.
When in doubt, profile. We may theorize about possible optimizations, but only the profiler can definitely say where the program does spend time.
That said,
the first/second spot logic looks suboptimal. Consider
if ((firstSpot & mask) != 0) {
secondSpot |= mask;
} else {
firstSpot |= mask;
}
(using
else
may also be redundant).
- It is unclear (to me) how
for (char c : chars)
over the char array fares against other techniques. This could be an entertaining reading.
Yet again, profile.
Strictly speaking, the challenge only says the string contain only lowercase letters, but does not specify the locale.
edited yesterday
answered yesterday
vnp
38.2k13096
38.2k13096
@AJNeufeld OK. Got carried away.
– vnp
yesterday
@AJNeufeld I should not RUI.
– vnp
yesterday
RUI? Is that “Reply Under the Influence”?
– AJNeufeld
yesterday
@AJNeufeld Almost. "Review Under the Influence".
– vnp
yesterday
@vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using aNormalizer
first could help?
– Sylvain Leroux
16 hours ago
|
show 1 more comment
@AJNeufeld OK. Got carried away.
– vnp
yesterday
@AJNeufeld I should not RUI.
– vnp
yesterday
RUI? Is that “Reply Under the Influence”?
– AJNeufeld
yesterday
@AJNeufeld Almost. "Review Under the Influence".
– vnp
yesterday
@vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using aNormalizer
first could help?
– Sylvain Leroux
16 hours ago
@AJNeufeld OK. Got carried away.
– vnp
yesterday
@AJNeufeld OK. Got carried away.
– vnp
yesterday
@AJNeufeld I should not RUI.
– vnp
yesterday
@AJNeufeld I should not RUI.
– vnp
yesterday
RUI? Is that “Reply Under the Influence”?
– AJNeufeld
yesterday
RUI? Is that “Reply Under the Influence”?
– AJNeufeld
yesterday
@AJNeufeld Almost. "Review Under the Influence".
– vnp
yesterday
@AJNeufeld Almost. "Review Under the Influence".
– vnp
yesterday
@vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a
Normalizer
first could help?– Sylvain Leroux
16 hours ago
@vnp I Would be curious to know how you handle accented letters (which may or may not be built using combining characters or am I wrong)? Maybe using a
Normalizer
first could help?– Sylvain Leroux
16 hours ago
|
show 1 more comment
up vote
3
down vote
Some suggestions:
- You could get rid of the class and put the entire method contents within your
main
method; that'll at least get rid of an instantiation. - I'm not sure whether
'a'
is converted to an integer at compilation time or runtime. Try using 0x61 instead. - You can avoid converting to a char array.
LeetCode gives a method signature to fill, I don't believe I can change that.
– Karol Dowbecki
yesterday
add a comment |
up vote
3
down vote
Some suggestions:
- You could get rid of the class and put the entire method contents within your
main
method; that'll at least get rid of an instantiation. - I'm not sure whether
'a'
is converted to an integer at compilation time or runtime. Try using 0x61 instead. - You can avoid converting to a char array.
LeetCode gives a method signature to fill, I don't believe I can change that.
– Karol Dowbecki
yesterday
add a comment |
up vote
3
down vote
up vote
3
down vote
Some suggestions:
- You could get rid of the class and put the entire method contents within your
main
method; that'll at least get rid of an instantiation. - I'm not sure whether
'a'
is converted to an integer at compilation time or runtime. Try using 0x61 instead. - You can avoid converting to a char array.
Some suggestions:
- You could get rid of the class and put the entire method contents within your
main
method; that'll at least get rid of an instantiation. - I'm not sure whether
'a'
is converted to an integer at compilation time or runtime. Try using 0x61 instead. - You can avoid converting to a char array.
edited yesterday
answered yesterday
l0b0
3,877923
3,877923
LeetCode gives a method signature to fill, I don't believe I can change that.
– Karol Dowbecki
yesterday
add a comment |
LeetCode gives a method signature to fill, I don't believe I can change that.
– Karol Dowbecki
yesterday
LeetCode gives a method signature to fill, I don't believe I can change that.
– Karol Dowbecki
yesterday
LeetCode gives a method signature to fill, I don't believe I can change that.
– Karol Dowbecki
yesterday
add a comment |
up vote
3
down vote
You don’t need to check if the second spot has been filled in; just unconditionally set it.
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else {
secondSpot |= mask;
}
Avoiding branching usually helps.
secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
firstSpot |= mask;
add a comment |
up vote
3
down vote
You don’t need to check if the second spot has been filled in; just unconditionally set it.
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else {
secondSpot |= mask;
}
Avoiding branching usually helps.
secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
firstSpot |= mask;
add a comment |
up vote
3
down vote
up vote
3
down vote
You don’t need to check if the second spot has been filled in; just unconditionally set it.
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else {
secondSpot |= mask;
}
Avoiding branching usually helps.
secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
firstSpot |= mask;
You don’t need to check if the second spot has been filled in; just unconditionally set it.
if ((firstSpot & mask) == 0) {
firstSpot |= mask;
} else {
secondSpot |= mask;
}
Avoiding branching usually helps.
secondSpot |= mask & firstSpot; // sets second spot bit unless first spot bit is still zero
firstSpot |= mask;
edited yesterday
answered yesterday
AJNeufeld
3,585317
3,585317
add a comment |
add a comment |
up vote
0
down vote
I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.
Here is my shot (admittedly in c#).
int firstUniqCharInString(string s)
{
Dictionary<char, int> seen = new Dictionary<char, int>();
int t = -1;
foreach (char c in s)
{
t++;
if (seen.ContainsKey(c))
{
seen[c] = s.Length + 1;
}
else
{
seen.Add(c, t);
}
}
int len = s.Length + 1;
foreach(KeyValuePair<char, int> pair in seen)
{
if(pair.Value < len) {
return pair.Value;
}
}
return (-1);
}
New contributor
A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length.s.Length
alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’sS.C.G.Dictionary
does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start offt
at -1 and pre-increment it. Start at 0 instead.
– Konrad Rudolph
12 hours ago
You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
– Mast
12 hours ago
I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
– user185166
12 hours ago
add a comment |
up vote
0
down vote
I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.
Here is my shot (admittedly in c#).
int firstUniqCharInString(string s)
{
Dictionary<char, int> seen = new Dictionary<char, int>();
int t = -1;
foreach (char c in s)
{
t++;
if (seen.ContainsKey(c))
{
seen[c] = s.Length + 1;
}
else
{
seen.Add(c, t);
}
}
int len = s.Length + 1;
foreach(KeyValuePair<char, int> pair in seen)
{
if(pair.Value < len) {
return pair.Value;
}
}
return (-1);
}
New contributor
A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length.s.Length
alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’sS.C.G.Dictionary
does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start offt
at -1 and pre-increment it. Start at 0 instead.
– Konrad Rudolph
12 hours ago
You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
– Mast
12 hours ago
I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
– user185166
12 hours ago
add a comment |
up vote
0
down vote
up vote
0
down vote
I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.
Here is my shot (admittedly in c#).
int firstUniqCharInString(string s)
{
Dictionary<char, int> seen = new Dictionary<char, int>();
int t = -1;
foreach (char c in s)
{
t++;
if (seen.ContainsKey(c))
{
seen[c] = s.Length + 1;
}
else
{
seen.Add(c, t);
}
}
int len = s.Length + 1;
foreach(KeyValuePair<char, int> pair in seen)
{
if(pair.Value < len) {
return pair.Value;
}
}
return (-1);
}
New contributor
I'd use a dictionary. Loop once over the characters in the string. For each found character, create an entry in the dictionary, the character being the key, the value being the index. If a character exists in the dictionary, set the value to [length of word] + 1.
When all characters are seen, loop over the dictionary. Pick the least (first by my testing) index encountered less than [length of word] + 1 and return it. If no value is returned, return -1.
Here is my shot (admittedly in c#).
int firstUniqCharInString(string s)
{
Dictionary<char, int> seen = new Dictionary<char, int>();
int t = -1;
foreach (char c in s)
{
t++;
if (seen.ContainsKey(c))
{
seen[c] = s.Length + 1;
}
else
{
seen.Add(c, t);
}
}
int len = s.Length + 1;
foreach(KeyValuePair<char, int> pair in seen)
{
if(pair.Value < len) {
return pair.Value;
}
}
return (-1);
}
New contributor
New contributor
answered 15 hours ago
user185166
1
1
New contributor
New contributor
A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length.s.Length
alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’sS.C.G.Dictionary
does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start offt
at -1 and pre-increment it. Start at 0 instead.
– Konrad Rudolph
12 hours ago
You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
– Mast
12 hours ago
I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
– user185166
12 hours ago
add a comment |
A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length.s.Length
alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’sS.C.G.Dictionary
does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start offt
at -1 and pre-increment it. Start at 0 instead.
– Konrad Rudolph
12 hours ago
You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
– Mast
12 hours ago
I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
– user185166
12 hours ago
A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length.
s.Length
alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary
does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t
at -1 and pre-increment it. Start at 0 instead.– Konrad Rudolph
12 hours ago
A generic dictionary implementation (which, in this case, is a hash table) will never beat direct lookup in a contiguous array or bit vector. Consequently, this will be much slower than OP’s solution. Apart from that you don’t need to add 1 to the string length.
s.Length
alone is sufficient. Thirdly, you are performing two lookups per character, which is also inefficient. Unfortunately C#’s S.C.G.Dictionary
does not contain any way of reducing this to a single lookup but other languages/APIs do. Lastly, it’s unclear why you start off t
at -1 and pre-increment it. Start at 0 instead.– Konrad Rudolph
12 hours ago
You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
– Mast
12 hours ago
You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process.
– Mast
12 hours ago
I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
– user185166
12 hours ago
I was trying to avoid checking each character twice, the main point of my answer. Looking at the accepted answer here: stackoverflow.com/questions/53383866/… checking twice might be the way to go anyway. Even after slightly optimizing my code after Konrads point, I think my code will be less efficient than the OP's. Would explaining my idea help anyway? I would downvote my answer or just remove it altogether.
– user185166
12 hours ago
add a comment |
up vote
0
down vote
Taking advantage of the input being fully lowercase ASCII
While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.
I created two bit vectors to track when we encounter a character for the first and second time.
And that really requires the input to be ASCII. Because char
can hold any value between 0 and 65535 the <<
might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.
But if it's just ASCII characters between 'a'
and 'a' + 31
the approach will work.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".
As far as I see it:
- You don't have any comments,
- you don't have tests,
- your variables have odd names. For example:
firstSpot
is a weird name for a bitarray that represents if a character has been found once, ... - you don't have benchmarks
So my personal opinion is: There's lots of room for improvement.
I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:
You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the
toCharArray
and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).In case there is no single character you might even stop the loop early. As soon as your
secondSpot
bitarray contains an entry for all ASCII characters you can immediately return-1
.As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.
add a comment |
up vote
0
down vote
Taking advantage of the input being fully lowercase ASCII
While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.
I created two bit vectors to track when we encounter a character for the first and second time.
And that really requires the input to be ASCII. Because char
can hold any value between 0 and 65535 the <<
might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.
But if it's just ASCII characters between 'a'
and 'a' + 31
the approach will work.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".
As far as I see it:
- You don't have any comments,
- you don't have tests,
- your variables have odd names. For example:
firstSpot
is a weird name for a bitarray that represents if a character has been found once, ... - you don't have benchmarks
So my personal opinion is: There's lots of room for improvement.
I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:
You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the
toCharArray
and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).In case there is no single character you might even stop the loop early. As soon as your
secondSpot
bitarray contains an entry for all ASCII characters you can immediately return-1
.As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.
add a comment |
up vote
0
down vote
up vote
0
down vote
Taking advantage of the input being fully lowercase ASCII
While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.
I created two bit vectors to track when we encounter a character for the first and second time.
And that really requires the input to be ASCII. Because char
can hold any value between 0 and 65535 the <<
might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.
But if it's just ASCII characters between 'a'
and 'a' + 31
the approach will work.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".
As far as I see it:
- You don't have any comments,
- you don't have tests,
- your variables have odd names. For example:
firstSpot
is a weird name for a bitarray that represents if a character has been found once, ... - you don't have benchmarks
So my personal opinion is: There's lots of room for improvement.
I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:
You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the
toCharArray
and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).In case there is no single character you might even stop the loop early. As soon as your
secondSpot
bitarray contains an entry for all ASCII characters you can immediately return-1
.As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.
Taking advantage of the input being fully lowercase ASCII
While it says that all characters are lowercase it doesn't say anything about ASCII. Given that it passed the tests that was okay, but generally lowercase doesn't imply ASCII.
I created two bit vectors to track when we encounter a character for the first and second time.
And that really requires the input to be ASCII. Because char
can hold any value between 0 and 65535 the <<
might not be safe to use, I'm not a Java programmer but it could result in an exception, an unwanted result or undefined behavior.
But if it's just ASCII characters between 'a'
and 'a' + 31
the approach will work.
Can this code be improved further? LeetCode says that below code is better than 94.33% solutions. What else could have been done by the last 5.67% solutions that they were better?
It seems like your definition of "improvement" is "better performance". But that again is probably not how every programmer defines "improvement".
As far as I see it:
- You don't have any comments,
- you don't have tests,
- your variables have odd names. For example:
firstSpot
is a weird name for a bitarray that represents if a character has been found once, ... - you don't have benchmarks
So my personal opinion is: There's lots of room for improvement.
I also have some thoughts about the algorithm and how it could be improved. As said I'm not a Java programmer so I'm just sharing some thoughts:
You iterate over the string three times while it would be possible to iterate only once. You could create an array of integers (26 elements long) to store the first occurrence of each character and after the loop you look at the elements that only occurred once and then take the minimum of the indices. Since you iterate with index you can remove the
toCharArray
and index into the string directly (which incidentally may be a faster way to iterate over the string altogether, according to this StackOverflow Q+A).In case there is no single character you might even stop the loop early. As soon as your
secondSpot
bitarray contains an entry for all ASCII characters you can immediately return-1
.As the others mentioned you could even drop some of the "branches" inside the loop which may give some performance improvement.
edited 10 hours ago
answered 11 hours ago
MSeifert
570416
570416
add a comment |
add a comment |
Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.
Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.
Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.
Karol Dowbecki is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208020%2funique-character-lookup%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
I would not recommend the bitfield unless you need it for memory usage--an int array will perform better than bits (Writing bitfields requires the compiler to read/update/write to memory instead of just write).
– Bill K
yesterday
@BillK Except
firstSpot
&secondSpot
are local variables; they may not incur any memory cycles.– AJNeufeld
yesterday