Split a string into a list of tokens
up vote
2
down vote
favorite
I have the following assignment that I succeeded in solving, but the code is very inefficient. I would appreciate if someone could show me a more efficient way, perhaps with substring. Note that I am not allowed to use imports or regexes or add more functions.
/**
* Separates a given string into tokens, which are the "words" that are
* separated by one or more occurrences of the given separator character.
* Returns the tokens as an array of String values.
*/
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
String tokens = new String[charRunCount (source,separator)+1];
String tmp = ""; // a string in order to take a word, then run over this string
int j = 0;
int i = 0;
while (i < tokens.length) {
if ( source.charAt (j) != separator ) {
do {
tmp += source.charAt (j);
if ( j >= source.length () - 1 ) {
break;
}
else { // so that we math the source length
j++;
}
} while (source.charAt (j) != separator);
}
if ( source.charAt (j) == separator ) {
j++;
while (source.charAt (j) == separator) {
j++;
}
}
tokens[i] = tmp;// taking the token into place
tmp = ""; //resetting the token so we can begin anew
i++;
}
return tokens;
}
the charRunCount() function:
public static int charRunCount(String str, char c){
char last = 0;
int counter = 0;
for (int i = 0; i < str.length(); i++) {
// whenever a run starts.
if (last != c && str.charAt(i) == c) {
counter++;
}
last = str.charAt(i);
}
return counter;
}
java performance strings parsing reinventing-the-wheel
add a comment |
up vote
2
down vote
favorite
I have the following assignment that I succeeded in solving, but the code is very inefficient. I would appreciate if someone could show me a more efficient way, perhaps with substring. Note that I am not allowed to use imports or regexes or add more functions.
/**
* Separates a given string into tokens, which are the "words" that are
* separated by one or more occurrences of the given separator character.
* Returns the tokens as an array of String values.
*/
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
String tokens = new String[charRunCount (source,separator)+1];
String tmp = ""; // a string in order to take a word, then run over this string
int j = 0;
int i = 0;
while (i < tokens.length) {
if ( source.charAt (j) != separator ) {
do {
tmp += source.charAt (j);
if ( j >= source.length () - 1 ) {
break;
}
else { // so that we math the source length
j++;
}
} while (source.charAt (j) != separator);
}
if ( source.charAt (j) == separator ) {
j++;
while (source.charAt (j) == separator) {
j++;
}
}
tokens[i] = tmp;// taking the token into place
tmp = ""; //resetting the token so we can begin anew
i++;
}
return tokens;
}
the charRunCount() function:
public static int charRunCount(String str, char c){
char last = 0;
int counter = 0;
for (int i = 0; i < str.length(); i++) {
// whenever a run starts.
if (last != c && str.charAt(i) == c) {
counter++;
}
last = str.charAt(i);
}
return counter;
}
java performance strings parsing reinventing-the-wheel
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Please check that I haven't misrepresented your code, and correct it if I have.
– Toby Speight
Nov 5 at 16:49
Does Java have string views? That is, a non-owning view onto a string? If it does, you should consider using them. You marked your question with the performance tag so using string views rather than allocating a string for each token will considerably improve performance.
– Kerndog73
Nov 6 at 7:26
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
I have the following assignment that I succeeded in solving, but the code is very inefficient. I would appreciate if someone could show me a more efficient way, perhaps with substring. Note that I am not allowed to use imports or regexes or add more functions.
/**
* Separates a given string into tokens, which are the "words" that are
* separated by one or more occurrences of the given separator character.
* Returns the tokens as an array of String values.
*/
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
String tokens = new String[charRunCount (source,separator)+1];
String tmp = ""; // a string in order to take a word, then run over this string
int j = 0;
int i = 0;
while (i < tokens.length) {
if ( source.charAt (j) != separator ) {
do {
tmp += source.charAt (j);
if ( j >= source.length () - 1 ) {
break;
}
else { // so that we math the source length
j++;
}
} while (source.charAt (j) != separator);
}
if ( source.charAt (j) == separator ) {
j++;
while (source.charAt (j) == separator) {
j++;
}
}
tokens[i] = tmp;// taking the token into place
tmp = ""; //resetting the token so we can begin anew
i++;
}
return tokens;
}
the charRunCount() function:
public static int charRunCount(String str, char c){
char last = 0;
int counter = 0;
for (int i = 0; i < str.length(); i++) {
// whenever a run starts.
if (last != c && str.charAt(i) == c) {
counter++;
}
last = str.charAt(i);
}
return counter;
}
java performance strings parsing reinventing-the-wheel
I have the following assignment that I succeeded in solving, but the code is very inefficient. I would appreciate if someone could show me a more efficient way, perhaps with substring. Note that I am not allowed to use imports or regexes or add more functions.
/**
* Separates a given string into tokens, which are the "words" that are
* separated by one or more occurrences of the given separator character.
* Returns the tokens as an array of String values.
*/
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
String tokens = new String[charRunCount (source,separator)+1];
String tmp = ""; // a string in order to take a word, then run over this string
int j = 0;
int i = 0;
while (i < tokens.length) {
if ( source.charAt (j) != separator ) {
do {
tmp += source.charAt (j);
if ( j >= source.length () - 1 ) {
break;
}
else { // so that we math the source length
j++;
}
} while (source.charAt (j) != separator);
}
if ( source.charAt (j) == separator ) {
j++;
while (source.charAt (j) == separator) {
j++;
}
}
tokens[i] = tmp;// taking the token into place
tmp = ""; //resetting the token so we can begin anew
i++;
}
return tokens;
}
the charRunCount() function:
public static int charRunCount(String str, char c){
char last = 0;
int counter = 0;
for (int i = 0; i < str.length(); i++) {
// whenever a run starts.
if (last != c && str.charAt(i) == c) {
counter++;
}
last = str.charAt(i);
}
return counter;
}
java performance strings parsing reinventing-the-wheel
java performance strings parsing reinventing-the-wheel
edited Nov 6 at 9:37
asked Nov 5 at 16:32
Yuki1112
112
112
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Please check that I haven't misrepresented your code, and correct it if I have.
– Toby Speight
Nov 5 at 16:49
Does Java have string views? That is, a non-owning view onto a string? If it does, you should consider using them. You marked your question with the performance tag so using string views rather than allocating a string for each token will considerably improve performance.
– Kerndog73
Nov 6 at 7:26
add a comment |
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Please check that I haven't misrepresented your code, and correct it if I have.
– Toby Speight
Nov 5 at 16:49
Does Java have string views? That is, a non-owning view onto a string? If it does, you should consider using them. You marked your question with the performance tag so using string views rather than allocating a string for each token will considerably improve performance.
– Kerndog73
Nov 6 at 7:26
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Please check that I haven't misrepresented your code, and correct it if I have.
– Toby Speight
Nov 5 at 16:49
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Please check that I haven't misrepresented your code, and correct it if I have.
– Toby Speight
Nov 5 at 16:49
Does Java have string views? That is, a non-owning view onto a string? If it does, you should consider using them. You marked your question with the performance tag so using string views rather than allocating a string for each token will considerably improve performance.
– Kerndog73
Nov 6 at 7:26
Does Java have string views? That is, a non-owning view onto a string? If it does, you should consider using them. You marked your question with the performance tag so using string views rather than allocating a string for each token will considerably improve performance.
– Kerndog73
Nov 6 at 7:26
add a comment |
4 Answers
4
active
oldest
votes
up vote
2
down vote
tmp is never a good name for a variable. In this case, you should call it token instead, or perhaps word. And you rightly complain that building strings using repeated += operations is inefficient, and correctly suggest that .substring() would be better.
Logically, then, you need to find the starting and ending indexes of each token! So, let's define those helper functions (as private static functions):
/**
* Considering str starting at startIndex, find the index at which the
* next token starts.
*
* @return The index of the start of a token (or str.length() if no more
* tokens).
*/
private static int start(String str, char sep, int startIndex) {
int i;
for (i = startIndex; i < str.length() && str.charAt(i) == sep; i++);
return i;
}
/**
* Considering str starting at startIndex, find the index at which the
* current token ends.
*
* @return The index just beyond the end of a token (the index of a
* sep character, or str.length() if this is the last token)
*/
private static int end(String str, char sep, int startIndex) {
assert(str.charAt(sep) != sep);
int i;
for (i = startIndex; i < str.length() && str.charAt(i) != sep; i++);
return i;
}
Then, we can use them in tokenize():
public static String tokenize(String str, char sep) {
int tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); e = end(str, sep, s)) {
tokenCount++;
}
String tokens = new String[tokenCount];
tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); ) {
tokens[tokenCount++] = str.substring(s, e = end(str, sep, s));
}
assert(tokens.length == tokenCount);
return tokens;
}
Notice that now you can take advantage of the helper functions to predetermine the size of the array. Also, a lot of the repetitiveness of your conditions and loops is eliminated.
All of the analysis is done using string indexes, so there is no string manipulation other than just the essential .substring() calls — and even the trim() call is gone!
thank you for your answer, I agree It's more efficient to separate the functions, however we are not allowed to do that in this assignment
– Yuki1112
Nov 6 at 9:36
What, exactly, are you not allowed to do, and why didn't you state the restrictions up front in your question? Didn't you define acharRunCount()helper function in your own solution?
– 200_success
Nov 6 at 13:52
add a comment |
up vote
0
down vote
Not sure of what you mean with "I am not allowed to use imports or regexes or add more functions." but if you can use String#indexOf this can be greatly simplified:
public class Tokenizer {
private final char separator;
public Tokenizer(char separator) {
this.separator = separator;
}
public List<String> tokenize(String string) {
List<String> tokens = new LinkedList<>();
int start = 0, end = 0;
while ( start < string.length() && (end = string.indexOf(separator, start))>-1 ) {
tokens.add(string.substring(start, end));
start = end+1;
}
tokens.add(string.substring(start));
return tokens;
}
}
1
Listwould be imported fromjava.util.
– 200_success
Nov 7 at 13:11
Indeed. Then we should either loop twice to count the occurrence of "separator" and create a good sized array then loop to populate it. Or manage the increase of the array size by ourselves.
– gervais.b
Nov 7 at 22:03
add a comment |
up vote
0
down vote
I'd like to suggest to do it in a a way, that
- doesn't have two loops that basically do the same thing
- is more Java-esque
Instead of storing the result in an array, which requires you to first count the potential tokens, store the tokens in a LinkedList.
Normally one would then just return that list instead of an array, because in Java arrays are usually used as an implementation detail hidden inside objects. If you yet need to return an array, LinkedList has a toArray method, that copies the list into an array.
My solution would look like this:
public static String tokenize(String str, char separator) {
Collection<String> result = new LinkedList<>();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray(new String[result.size()]);
}
EDIT: Here's a version without additional import :)
static class LinkedList {
private class Node {
Node(String value) {
this.value = value;
}
String value;
Node next;
}
private Node first = null;
private Node last = null;
private int size = 0;
public void add(String value) {
Node node = new Node(value);
size++;
if (first == null) {
first = last = node;
return;
}
last.next = node;
last = node;
}
public String toArray() {
String array = new String[size];
int i = 0;
Node current = first;
while (current != null) {
array[i] = current.value;
i++;
current = current.next;
}
return array;
}
}
public static String tokenize(String str, char separator) {
LinkedList result = new LinkedList();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray();
}
1
Collectionwould be imported fromjava.util.
– 200_success
Nov 6 at 13:53
@200_success Ah, I missed that requirement. I can't useLinkedListthen anyway. But in that case I'd just write a simple linked list class myself, which isn't against the rules :)
– RoToRa
Nov 7 at 7:38
add a comment |
up vote
0
down vote
I found a simple way for beginners:
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
// In the following statement, replace the 0 with the correct number of tokens, and complete the missing code.
String tokens = new String[charRunCount (source,separator)+1]; //since we start from 0.
int i = 0;
int j = 0;
int sourceLen = source.length ();
while (i < sourceLen) {
String tmp = "";
while (i < sourceLen && source.charAt (i) != separator) { // add each character in the source until you hit separator
tmp += source.charAt (i++);
}
if ( !(tmp.equals ("")) ) { //so that if i=separator don't go into the token
tokens[j++] = tmp;
}
i++;
}
return tokens;
}
add a comment |
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
tmp is never a good name for a variable. In this case, you should call it token instead, or perhaps word. And you rightly complain that building strings using repeated += operations is inefficient, and correctly suggest that .substring() would be better.
Logically, then, you need to find the starting and ending indexes of each token! So, let's define those helper functions (as private static functions):
/**
* Considering str starting at startIndex, find the index at which the
* next token starts.
*
* @return The index of the start of a token (or str.length() if no more
* tokens).
*/
private static int start(String str, char sep, int startIndex) {
int i;
for (i = startIndex; i < str.length() && str.charAt(i) == sep; i++);
return i;
}
/**
* Considering str starting at startIndex, find the index at which the
* current token ends.
*
* @return The index just beyond the end of a token (the index of a
* sep character, or str.length() if this is the last token)
*/
private static int end(String str, char sep, int startIndex) {
assert(str.charAt(sep) != sep);
int i;
for (i = startIndex; i < str.length() && str.charAt(i) != sep; i++);
return i;
}
Then, we can use them in tokenize():
public static String tokenize(String str, char sep) {
int tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); e = end(str, sep, s)) {
tokenCount++;
}
String tokens = new String[tokenCount];
tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); ) {
tokens[tokenCount++] = str.substring(s, e = end(str, sep, s));
}
assert(tokens.length == tokenCount);
return tokens;
}
Notice that now you can take advantage of the helper functions to predetermine the size of the array. Also, a lot of the repetitiveness of your conditions and loops is eliminated.
All of the analysis is done using string indexes, so there is no string manipulation other than just the essential .substring() calls — and even the trim() call is gone!
thank you for your answer, I agree It's more efficient to separate the functions, however we are not allowed to do that in this assignment
– Yuki1112
Nov 6 at 9:36
What, exactly, are you not allowed to do, and why didn't you state the restrictions up front in your question? Didn't you define acharRunCount()helper function in your own solution?
– 200_success
Nov 6 at 13:52
add a comment |
up vote
2
down vote
tmp is never a good name for a variable. In this case, you should call it token instead, or perhaps word. And you rightly complain that building strings using repeated += operations is inefficient, and correctly suggest that .substring() would be better.
Logically, then, you need to find the starting and ending indexes of each token! So, let's define those helper functions (as private static functions):
/**
* Considering str starting at startIndex, find the index at which the
* next token starts.
*
* @return The index of the start of a token (or str.length() if no more
* tokens).
*/
private static int start(String str, char sep, int startIndex) {
int i;
for (i = startIndex; i < str.length() && str.charAt(i) == sep; i++);
return i;
}
/**
* Considering str starting at startIndex, find the index at which the
* current token ends.
*
* @return The index just beyond the end of a token (the index of a
* sep character, or str.length() if this is the last token)
*/
private static int end(String str, char sep, int startIndex) {
assert(str.charAt(sep) != sep);
int i;
for (i = startIndex; i < str.length() && str.charAt(i) != sep; i++);
return i;
}
Then, we can use them in tokenize():
public static String tokenize(String str, char sep) {
int tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); e = end(str, sep, s)) {
tokenCount++;
}
String tokens = new String[tokenCount];
tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); ) {
tokens[tokenCount++] = str.substring(s, e = end(str, sep, s));
}
assert(tokens.length == tokenCount);
return tokens;
}
Notice that now you can take advantage of the helper functions to predetermine the size of the array. Also, a lot of the repetitiveness of your conditions and loops is eliminated.
All of the analysis is done using string indexes, so there is no string manipulation other than just the essential .substring() calls — and even the trim() call is gone!
thank you for your answer, I agree It's more efficient to separate the functions, however we are not allowed to do that in this assignment
– Yuki1112
Nov 6 at 9:36
What, exactly, are you not allowed to do, and why didn't you state the restrictions up front in your question? Didn't you define acharRunCount()helper function in your own solution?
– 200_success
Nov 6 at 13:52
add a comment |
up vote
2
down vote
up vote
2
down vote
tmp is never a good name for a variable. In this case, you should call it token instead, or perhaps word. And you rightly complain that building strings using repeated += operations is inefficient, and correctly suggest that .substring() would be better.
Logically, then, you need to find the starting and ending indexes of each token! So, let's define those helper functions (as private static functions):
/**
* Considering str starting at startIndex, find the index at which the
* next token starts.
*
* @return The index of the start of a token (or str.length() if no more
* tokens).
*/
private static int start(String str, char sep, int startIndex) {
int i;
for (i = startIndex; i < str.length() && str.charAt(i) == sep; i++);
return i;
}
/**
* Considering str starting at startIndex, find the index at which the
* current token ends.
*
* @return The index just beyond the end of a token (the index of a
* sep character, or str.length() if this is the last token)
*/
private static int end(String str, char sep, int startIndex) {
assert(str.charAt(sep) != sep);
int i;
for (i = startIndex; i < str.length() && str.charAt(i) != sep; i++);
return i;
}
Then, we can use them in tokenize():
public static String tokenize(String str, char sep) {
int tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); e = end(str, sep, s)) {
tokenCount++;
}
String tokens = new String[tokenCount];
tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); ) {
tokens[tokenCount++] = str.substring(s, e = end(str, sep, s));
}
assert(tokens.length == tokenCount);
return tokens;
}
Notice that now you can take advantage of the helper functions to predetermine the size of the array. Also, a lot of the repetitiveness of your conditions and loops is eliminated.
All of the analysis is done using string indexes, so there is no string manipulation other than just the essential .substring() calls — and even the trim() call is gone!
tmp is never a good name for a variable. In this case, you should call it token instead, or perhaps word. And you rightly complain that building strings using repeated += operations is inefficient, and correctly suggest that .substring() would be better.
Logically, then, you need to find the starting and ending indexes of each token! So, let's define those helper functions (as private static functions):
/**
* Considering str starting at startIndex, find the index at which the
* next token starts.
*
* @return The index of the start of a token (or str.length() if no more
* tokens).
*/
private static int start(String str, char sep, int startIndex) {
int i;
for (i = startIndex; i < str.length() && str.charAt(i) == sep; i++);
return i;
}
/**
* Considering str starting at startIndex, find the index at which the
* current token ends.
*
* @return The index just beyond the end of a token (the index of a
* sep character, or str.length() if this is the last token)
*/
private static int end(String str, char sep, int startIndex) {
assert(str.charAt(sep) != sep);
int i;
for (i = startIndex; i < str.length() && str.charAt(i) != sep; i++);
return i;
}
Then, we can use them in tokenize():
public static String tokenize(String str, char sep) {
int tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); e = end(str, sep, s)) {
tokenCount++;
}
String tokens = new String[tokenCount];
tokenCount = 0;
for (int s, e = 0; (s = start(str, sep, e)) < str.length(); ) {
tokens[tokenCount++] = str.substring(s, e = end(str, sep, s));
}
assert(tokens.length == tokenCount);
return tokens;
}
Notice that now you can take advantage of the helper functions to predetermine the size of the array. Also, a lot of the repetitiveness of your conditions and loops is eliminated.
All of the analysis is done using string indexes, so there is no string manipulation other than just the essential .substring() calls — and even the trim() call is gone!
answered Nov 6 at 7:43
200_success
127k15148410
127k15148410
thank you for your answer, I agree It's more efficient to separate the functions, however we are not allowed to do that in this assignment
– Yuki1112
Nov 6 at 9:36
What, exactly, are you not allowed to do, and why didn't you state the restrictions up front in your question? Didn't you define acharRunCount()helper function in your own solution?
– 200_success
Nov 6 at 13:52
add a comment |
thank you for your answer, I agree It's more efficient to separate the functions, however we are not allowed to do that in this assignment
– Yuki1112
Nov 6 at 9:36
What, exactly, are you not allowed to do, and why didn't you state the restrictions up front in your question? Didn't you define acharRunCount()helper function in your own solution?
– 200_success
Nov 6 at 13:52
thank you for your answer, I agree It's more efficient to separate the functions, however we are not allowed to do that in this assignment
– Yuki1112
Nov 6 at 9:36
thank you for your answer, I agree It's more efficient to separate the functions, however we are not allowed to do that in this assignment
– Yuki1112
Nov 6 at 9:36
What, exactly, are you not allowed to do, and why didn't you state the restrictions up front in your question? Didn't you define a
charRunCount() helper function in your own solution?– 200_success
Nov 6 at 13:52
What, exactly, are you not allowed to do, and why didn't you state the restrictions up front in your question? Didn't you define a
charRunCount() helper function in your own solution?– 200_success
Nov 6 at 13:52
add a comment |
up vote
0
down vote
Not sure of what you mean with "I am not allowed to use imports or regexes or add more functions." but if you can use String#indexOf this can be greatly simplified:
public class Tokenizer {
private final char separator;
public Tokenizer(char separator) {
this.separator = separator;
}
public List<String> tokenize(String string) {
List<String> tokens = new LinkedList<>();
int start = 0, end = 0;
while ( start < string.length() && (end = string.indexOf(separator, start))>-1 ) {
tokens.add(string.substring(start, end));
start = end+1;
}
tokens.add(string.substring(start));
return tokens;
}
}
1
Listwould be imported fromjava.util.
– 200_success
Nov 7 at 13:11
Indeed. Then we should either loop twice to count the occurrence of "separator" and create a good sized array then loop to populate it. Or manage the increase of the array size by ourselves.
– gervais.b
Nov 7 at 22:03
add a comment |
up vote
0
down vote
Not sure of what you mean with "I am not allowed to use imports or regexes or add more functions." but if you can use String#indexOf this can be greatly simplified:
public class Tokenizer {
private final char separator;
public Tokenizer(char separator) {
this.separator = separator;
}
public List<String> tokenize(String string) {
List<String> tokens = new LinkedList<>();
int start = 0, end = 0;
while ( start < string.length() && (end = string.indexOf(separator, start))>-1 ) {
tokens.add(string.substring(start, end));
start = end+1;
}
tokens.add(string.substring(start));
return tokens;
}
}
1
Listwould be imported fromjava.util.
– 200_success
Nov 7 at 13:11
Indeed. Then we should either loop twice to count the occurrence of "separator" and create a good sized array then loop to populate it. Or manage the increase of the array size by ourselves.
– gervais.b
Nov 7 at 22:03
add a comment |
up vote
0
down vote
up vote
0
down vote
Not sure of what you mean with "I am not allowed to use imports or regexes or add more functions." but if you can use String#indexOf this can be greatly simplified:
public class Tokenizer {
private final char separator;
public Tokenizer(char separator) {
this.separator = separator;
}
public List<String> tokenize(String string) {
List<String> tokens = new LinkedList<>();
int start = 0, end = 0;
while ( start < string.length() && (end = string.indexOf(separator, start))>-1 ) {
tokens.add(string.substring(start, end));
start = end+1;
}
tokens.add(string.substring(start));
return tokens;
}
}
Not sure of what you mean with "I am not allowed to use imports or regexes or add more functions." but if you can use String#indexOf this can be greatly simplified:
public class Tokenizer {
private final char separator;
public Tokenizer(char separator) {
this.separator = separator;
}
public List<String> tokenize(String string) {
List<String> tokens = new LinkedList<>();
int start = 0, end = 0;
while ( start < string.length() && (end = string.indexOf(separator, start))>-1 ) {
tokens.add(string.substring(start, end));
start = end+1;
}
tokens.add(string.substring(start));
return tokens;
}
}
answered Nov 7 at 7:17
gervais.b
96139
96139
1
Listwould be imported fromjava.util.
– 200_success
Nov 7 at 13:11
Indeed. Then we should either loop twice to count the occurrence of "separator" and create a good sized array then loop to populate it. Or manage the increase of the array size by ourselves.
– gervais.b
Nov 7 at 22:03
add a comment |
1
Listwould be imported fromjava.util.
– 200_success
Nov 7 at 13:11
Indeed. Then we should either loop twice to count the occurrence of "separator" and create a good sized array then loop to populate it. Or manage the increase of the array size by ourselves.
– gervais.b
Nov 7 at 22:03
1
1
List would be imported from java.util.– 200_success
Nov 7 at 13:11
List would be imported from java.util.– 200_success
Nov 7 at 13:11
Indeed. Then we should either loop twice to count the occurrence of "separator" and create a good sized array then loop to populate it. Or manage the increase of the array size by ourselves.
– gervais.b
Nov 7 at 22:03
Indeed. Then we should either loop twice to count the occurrence of "separator" and create a good sized array then loop to populate it. Or manage the increase of the array size by ourselves.
– gervais.b
Nov 7 at 22:03
add a comment |
up vote
0
down vote
I'd like to suggest to do it in a a way, that
- doesn't have two loops that basically do the same thing
- is more Java-esque
Instead of storing the result in an array, which requires you to first count the potential tokens, store the tokens in a LinkedList.
Normally one would then just return that list instead of an array, because in Java arrays are usually used as an implementation detail hidden inside objects. If you yet need to return an array, LinkedList has a toArray method, that copies the list into an array.
My solution would look like this:
public static String tokenize(String str, char separator) {
Collection<String> result = new LinkedList<>();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray(new String[result.size()]);
}
EDIT: Here's a version without additional import :)
static class LinkedList {
private class Node {
Node(String value) {
this.value = value;
}
String value;
Node next;
}
private Node first = null;
private Node last = null;
private int size = 0;
public void add(String value) {
Node node = new Node(value);
size++;
if (first == null) {
first = last = node;
return;
}
last.next = node;
last = node;
}
public String toArray() {
String array = new String[size];
int i = 0;
Node current = first;
while (current != null) {
array[i] = current.value;
i++;
current = current.next;
}
return array;
}
}
public static String tokenize(String str, char separator) {
LinkedList result = new LinkedList();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray();
}
1
Collectionwould be imported fromjava.util.
– 200_success
Nov 6 at 13:53
@200_success Ah, I missed that requirement. I can't useLinkedListthen anyway. But in that case I'd just write a simple linked list class myself, which isn't against the rules :)
– RoToRa
Nov 7 at 7:38
add a comment |
up vote
0
down vote
I'd like to suggest to do it in a a way, that
- doesn't have two loops that basically do the same thing
- is more Java-esque
Instead of storing the result in an array, which requires you to first count the potential tokens, store the tokens in a LinkedList.
Normally one would then just return that list instead of an array, because in Java arrays are usually used as an implementation detail hidden inside objects. If you yet need to return an array, LinkedList has a toArray method, that copies the list into an array.
My solution would look like this:
public static String tokenize(String str, char separator) {
Collection<String> result = new LinkedList<>();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray(new String[result.size()]);
}
EDIT: Here's a version without additional import :)
static class LinkedList {
private class Node {
Node(String value) {
this.value = value;
}
String value;
Node next;
}
private Node first = null;
private Node last = null;
private int size = 0;
public void add(String value) {
Node node = new Node(value);
size++;
if (first == null) {
first = last = node;
return;
}
last.next = node;
last = node;
}
public String toArray() {
String array = new String[size];
int i = 0;
Node current = first;
while (current != null) {
array[i] = current.value;
i++;
current = current.next;
}
return array;
}
}
public static String tokenize(String str, char separator) {
LinkedList result = new LinkedList();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray();
}
1
Collectionwould be imported fromjava.util.
– 200_success
Nov 6 at 13:53
@200_success Ah, I missed that requirement. I can't useLinkedListthen anyway. But in that case I'd just write a simple linked list class myself, which isn't against the rules :)
– RoToRa
Nov 7 at 7:38
add a comment |
up vote
0
down vote
up vote
0
down vote
I'd like to suggest to do it in a a way, that
- doesn't have two loops that basically do the same thing
- is more Java-esque
Instead of storing the result in an array, which requires you to first count the potential tokens, store the tokens in a LinkedList.
Normally one would then just return that list instead of an array, because in Java arrays are usually used as an implementation detail hidden inside objects. If you yet need to return an array, LinkedList has a toArray method, that copies the list into an array.
My solution would look like this:
public static String tokenize(String str, char separator) {
Collection<String> result = new LinkedList<>();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray(new String[result.size()]);
}
EDIT: Here's a version without additional import :)
static class LinkedList {
private class Node {
Node(String value) {
this.value = value;
}
String value;
Node next;
}
private Node first = null;
private Node last = null;
private int size = 0;
public void add(String value) {
Node node = new Node(value);
size++;
if (first == null) {
first = last = node;
return;
}
last.next = node;
last = node;
}
public String toArray() {
String array = new String[size];
int i = 0;
Node current = first;
while (current != null) {
array[i] = current.value;
i++;
current = current.next;
}
return array;
}
}
public static String tokenize(String str, char separator) {
LinkedList result = new LinkedList();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray();
}
I'd like to suggest to do it in a a way, that
- doesn't have two loops that basically do the same thing
- is more Java-esque
Instead of storing the result in an array, which requires you to first count the potential tokens, store the tokens in a LinkedList.
Normally one would then just return that list instead of an array, because in Java arrays are usually used as an implementation detail hidden inside objects. If you yet need to return an array, LinkedList has a toArray method, that copies the list into an array.
My solution would look like this:
public static String tokenize(String str, char separator) {
Collection<String> result = new LinkedList<>();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray(new String[result.size()]);
}
EDIT: Here's a version without additional import :)
static class LinkedList {
private class Node {
Node(String value) {
this.value = value;
}
String value;
Node next;
}
private Node first = null;
private Node last = null;
private int size = 0;
public void add(String value) {
Node node = new Node(value);
size++;
if (first == null) {
first = last = node;
return;
}
last.next = node;
last = node;
}
public String toArray() {
String array = new String[size];
int i = 0;
Node current = first;
while (current != null) {
array[i] = current.value;
i++;
current = current.next;
}
return array;
}
}
public static String tokenize(String str, char separator) {
LinkedList result = new LinkedList();
int len = str.length();
int pos = 0;
int start = 0;
while (pos < len) {
if (str.charAt(pos) != separator) {
pos++;
continue;
}
if (pos > start) {
result.add(str.substring(start, pos));
}
do {
pos++;
start = pos;
} while (pos < len && str.charAt(pos) == separator);
}
if (pos > start) {
result.add(str.substring(start, pos));
}
return result.toArray();
}
edited Nov 7 at 7:57
answered Nov 6 at 9:36
RoToRa
6,1761236
6,1761236
1
Collectionwould be imported fromjava.util.
– 200_success
Nov 6 at 13:53
@200_success Ah, I missed that requirement. I can't useLinkedListthen anyway. But in that case I'd just write a simple linked list class myself, which isn't against the rules :)
– RoToRa
Nov 7 at 7:38
add a comment |
1
Collectionwould be imported fromjava.util.
– 200_success
Nov 6 at 13:53
@200_success Ah, I missed that requirement. I can't useLinkedListthen anyway. But in that case I'd just write a simple linked list class myself, which isn't against the rules :)
– RoToRa
Nov 7 at 7:38
1
1
Collection would be imported from java.util.– 200_success
Nov 6 at 13:53
Collection would be imported from java.util.– 200_success
Nov 6 at 13:53
@200_success Ah, I missed that requirement. I can't use
LinkedList then anyway. But in that case I'd just write a simple linked list class myself, which isn't against the rules :)– RoToRa
Nov 7 at 7:38
@200_success Ah, I missed that requirement. I can't use
LinkedList then anyway. But in that case I'd just write a simple linked list class myself, which isn't against the rules :)– RoToRa
Nov 7 at 7:38
add a comment |
up vote
0
down vote
I found a simple way for beginners:
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
// In the following statement, replace the 0 with the correct number of tokens, and complete the missing code.
String tokens = new String[charRunCount (source,separator)+1]; //since we start from 0.
int i = 0;
int j = 0;
int sourceLen = source.length ();
while (i < sourceLen) {
String tmp = "";
while (i < sourceLen && source.charAt (i) != separator) { // add each character in the source until you hit separator
tmp += source.charAt (i++);
}
if ( !(tmp.equals ("")) ) { //so that if i=separator don't go into the token
tokens[j++] = tmp;
}
i++;
}
return tokens;
}
add a comment |
up vote
0
down vote
I found a simple way for beginners:
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
// In the following statement, replace the 0 with the correct number of tokens, and complete the missing code.
String tokens = new String[charRunCount (source,separator)+1]; //since we start from 0.
int i = 0;
int j = 0;
int sourceLen = source.length ();
while (i < sourceLen) {
String tmp = "";
while (i < sourceLen && source.charAt (i) != separator) { // add each character in the source until you hit separator
tmp += source.charAt (i++);
}
if ( !(tmp.equals ("")) ) { //so that if i=separator don't go into the token
tokens[j++] = tmp;
}
i++;
}
return tokens;
}
add a comment |
up vote
0
down vote
up vote
0
down vote
I found a simple way for beginners:
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
// In the following statement, replace the 0 with the correct number of tokens, and complete the missing code.
String tokens = new String[charRunCount (source,separator)+1]; //since we start from 0.
int i = 0;
int j = 0;
int sourceLen = source.length ();
while (i < sourceLen) {
String tmp = "";
while (i < sourceLen && source.charAt (i) != separator) { // add each character in the source until you hit separator
tmp += source.charAt (i++);
}
if ( !(tmp.equals ("")) ) { //so that if i=separator don't go into the token
tokens[j++] = tmp;
}
i++;
}
return tokens;
}
I found a simple way for beginners:
public static String tokenize (String str, char separator) {
// Removes all the occurrences of the separator at the beginning and end of str
String source = trim(str, separator);
// In the following statement, replace the 0 with the correct number of tokens, and complete the missing code.
String tokens = new String[charRunCount (source,separator)+1]; //since we start from 0.
int i = 0;
int j = 0;
int sourceLen = source.length ();
while (i < sourceLen) {
String tmp = "";
while (i < sourceLen && source.charAt (i) != separator) { // add each character in the source until you hit separator
tmp += source.charAt (i++);
}
if ( !(tmp.equals ("")) ) { //so that if i=separator don't go into the token
tokens[j++] = tmp;
}
i++;
}
return tokens;
}
answered yesterday
Yuki1112
112
112
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f206997%2fsplit-a-string-into-a-list-of-tokens%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Please check that I haven't misrepresented your code, and correct it if I have.
– Toby Speight
Nov 5 at 16:49
Does Java have string views? That is, a non-owning view onto a string? If it does, you should consider using them. You marked your question with the performance tag so using string views rather than allocating a string for each token will considerably improve performance.
– Kerndog73
Nov 6 at 7:26