Quicksort gets faster with duplicate keys (without three-way partitioning). What is going on?
I am a little confused. I've been testing quicksort, and everything seems to work fine, except that when the array contains many duplicate elements I get an unexpected result.


  • The code actually sorts the array.

  • The code sorts the array within a time frame that is expected of quicksort.

  • When the array is sorted or reverse-sorted, I get the expected inefficiency (this, of course, is when the pivot is set to the first element).

  • I use the Random class's nextInt method to fill the array.


When I decrease the data range (which consequently puts more duplicates in the array), quicksort runs faster:



  • 1 million elements (range 0 - 1 million): 155 ms

  • 1 million elements (range 0 - 2): 118 ms

  • 30 million elements (range 0 - 1 million): 3085 ms

  • 30 million elements (range 0 - 2): 1021 ms



I tried this many times, so it doesn't seem to be due to the randomization. The difference gets even more pronounced with larger data.



I couldn't find any explanation for why this might occur.



Here is my code:



public void qsort2median(int[] array, int lowest, int highest)
{
    // Hoare-style partitioning around the middle element
    // (lowest + highest cannot overflow for these array sizes)
    int pivot = array[(lowest + highest) / 2];
    int left = lowest;
    int right = highest;

    while (left <= right) {
        // scan from the left for an element >= pivot
        while (array[left] < pivot) {
            left++;
        }
        // scan from the right for an element <= pivot
        while (array[right] > pivot) {
            right--;
        }
        // swap the out-of-place pair and advance both indices
        if (left <= right) {
            int temp = array[left];
            array[left] = array[right];
            array[right] = temp;
            left++;
            right--;
        }
    }
    // recurse into the two partitions
    if (lowest < right) {
        qsort2median(array, lowest, right);
    }
    if (left < highest) {
        qsort2median(array, left, highest);
    }
}


and here is how I run it:



int[] arraytest = new int[1000000];
int[] arraytest1 = new int[1000000];

Quicksort qsorttest = new Quicksort();

// fill the array with random values in [0, 2)
qsorttest.randomArrayGenerator(arraytest, 2);

// keep an identical copy for comparison runs
System.arraycopy(arraytest, 0, arraytest1, 0, arraytest.length);
int endpoint = arraytest.length - 1;

// RUN FIRST ARRAY
long startTime = System.currentTimeMillis();
qsorttest.qsort2median(arraytest, 0, endpoint);
long endTime = System.currentTimeMillis();

long duration = endTime - startTime;
System.out.println("Quicksort with Median Element pivot ran in: " + duration);


and here is my randomArrayGenerator method:



// requires: import java.util.Random;
void randomArrayGenerator(int[] array, int n) {

    Random generator = new Random();

    System.out.println();
    System.out.println(array.length);

    // fill each slot with a uniform value in [0, n)
    for (int i = 0; i < array.length; i++) {
        array[i] = generator.nextInt(n);
    }
}


Thanks for any input.



As of Monday, I am still trying to figure this out.

java arrays performance quicksort

asked Nov 18 at 10:04, edited Nov 20 at 0:16
– C. O.

  • It seems that when sorting 1 million elements in the range up to 1 million, your code performs around 28,500,000 comparisons of elements (not counting comparisons of indices) and 5,000,000 swaps. When the range is only up to 2 (exclusive), that is, when there are many duplicate elements, it performs only 19,000,000 comparisons but now 9,500,000 swaps. I have not yet understood why that is, but the fewer comparisons must account for the shorter time (if it's not just random variation; after all, we are using random input, and other factors also vary from run to run).
    – Ole V.V.
    Nov 18 at 11:23

  • @OleV.V. Thanks for trying to help.
    – C. O.
    Nov 18 at 15:01

  • Not quite sure what's unexpected here? Quicksort on an array with lots of duplicates performs faster; that's the expected behavior...
    – Andriy Berestovskyy
    Nov 20 at 10:29

  • @AndriyBerestovskyy I think there has been a misunderstanding on my part. The text I was reading was based on Lomuto's partitioning, which indeed gets slower with a smaller range (I tested it). But it gets faster with Hoare's partitioning, which is what this code uses.
    – C. O.
    Nov 20 at 13:36
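
For anyone who wants to reproduce counts like those in the first comment, one minimal approach is to route every element comparison and swap through counting helpers and substitute them into qsort2median. This is an illustrative sketch; the helper names and static counters are assumptions, not part of the original code:

// Illustrative instrumentation for reproducing comparison/swap counts.
// less() and swap() are hypothetical helpers, not from the original code.
static long comparisons = 0;
static long swaps = 0;

static boolean less(int a, int b) {
    comparisons++;          // counts every element comparison, even failing ones
    return a < b;
}

static void swap(int[] array, int i, int j) {
    swaps++;
    int temp = array[i];
    array[i] = array[j];
    array[j] = temp;
}

Replacing array[left] < pivot with less(array[left], pivot) (and likewise for the right-hand scan and the swap) in a copy of qsort2median, then printing the two totals after one run, should yield counts comparable to those quoted above.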
1 Answer
I guess I am allowed to answer my own question.


Upon further research, I discovered that duplicates only become a problem when the quicksort algorithm uses Lomuto's partitioning. This code, on the other hand, uses Hoare's partitioning.


I tested both of them and got the expected result.


So Lomuto's partitioning gets slower as duplicates increase, and Hoare's partitioning gets faster with duplicates.
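
For illustration, here is a minimal Lomuto-style quicksort sketch (the common textbook variant that swaps on A[j] <= pivot; the method name is mine, not part of the original Quicksort class):

// Lomuto-style quicksort sketch (illustrative; not from the original class).
// The <= comparison sends every key equal to the pivot into the left
// partition, so an array of mostly-equal keys splits (n-1)/0 at each level
// and the running time degrades toward O(n^2).
void qsortLomuto(int[] array, int low, int high) {
    if (low >= high) {
        return;
    }
    int pivot = array[high];          // last element as the pivot
    int i = low - 1;                  // boundary of the "<= pivot" region
    for (int j = low; j < high; j++) {
        if (array[j] <= pivot) {      // equal keys all go left
            i++;
            int temp = array[i];
            array[i] = array[j];
            array[j] = temp;
        }
    }
    int temp = array[i + 1];          // put the pivot between the partitions
    array[i + 1] = array[high];
    array[high] = temp;
    qsortLomuto(array, low, i);
    qsortLomuto(array, i + 2, high);
}

Hoare's scheme, by contrast, stops both scanning pointers on keys equal to the pivot. That costs extra swaps (which matches the swap counts in the comments above), but it splits a run of equal keys almost exactly in half, so the recursion stays balanced and the sort actually gets faster.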

answered Nov 20 at 13:52
– C. O.

  • Right, there are algorithms on the internet which will swap elements even if they are equal to the pivot, i.e. if A[j] <= pivot then ... swap(). Those will indeed be bad for duplicates. But have a look at the Lomuto partition on Wikipedia: it swaps only elements which are strictly less than the pivot, i.e. if A[j] < pivot then ... swap(). I am pretty sure that Lomuto implementation should be fine with duplicates, though I did not test it...
    – Andriy Berestovskyy
    Nov 22 at 8:46
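
Since the title asks about the absence of three-way partitioning: three-way (Dutch national flag) partitioning is the standard remedy for duplicate keys under either scheme, because it gathers all keys equal to the pivot into a middle band that is never recursed into. A minimal illustrative sketch, not part of the code above:

// Three-way quicksort sketch (Dutch national flag partitioning; illustrative).
// Keys equal to the pivot end up in array[lt..gt] and are never revisited,
// so many duplicates make the recursion shallower, not deeper.
void qsort3way(int[] array, int low, int high) {
    if (low >= high) {
        return;
    }
    int pivot = array[low];
    int lt = low;        // array[low..lt-1]  < pivot
    int gt = high;       // array[gt+1..high] > pivot
    int i = low + 1;     // array[lt..i-1]   == pivot
    while (i <= gt) {
        if (array[i] < pivot) {
            int temp = array[lt]; array[lt] = array[i]; array[i] = temp;
            lt++;
            i++;
        } else if (array[i] > pivot) {
            int temp = array[i]; array[i] = array[gt]; array[gt] = temp;
            gt--;        // don't advance i: the swapped-in key is unexamined
        } else {
            i++;         // equal to the pivot: leave it in the middle band
        }
    }
    qsort3way(array, low, lt - 1);
    qsort3way(array, gt + 1, high);
}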