Quicksort gets faster with duplicate keys (without three-way partitioning). What is going on?
I am a little confused. I've been testing quicksort, and everything seems to work fine, except that when the array contains many duplicate elements I get an unexpected result.


  • The code actually sorts the array.

  • The code sorts the array within a time frame that is expected of quicksort.

  • When the array is sorted or reverse-sorted, I get the expected inefficiency (this, of course, is when the pivot is set to the first element).

  • I use the Random class's nextInt method to fill the array.


When I decrease the data range (which consequently puts more duplicates in the array), quicksort runs faster:



  • 1 million elements (range 0 - 1 million): 155 ms

  • 1 million elements (range 0 - 2): 118 ms

  • 30 million elements (range 0 - 1 million): 3085 ms

  • 30 million elements (range 0 - 2): 1021 ms



I tried this many times, so it doesn't seem to be due to the randomization. The difference gets even more pronounced with larger data.



I couldn't find any explanation for why this might occur.



Here is my code:



public void qsort2median(int[] array, int lowest, int highest)
{
    // Hoare-style partitioning around the middle element
    // (lowest + highest cannot overflow for these array sizes)
    int pivot = array[(lowest + highest) / 2];
    int left = lowest;
    int right = highest;

    while (left <= right) {
        // scan from the left for an element >= pivot
        while (array[left] < pivot) {
            left++;
        }
        // scan from the right for an element <= pivot
        while (array[right] > pivot) {
            right--;
        }
        // swap the out-of-place pair and advance both indices
        if (left <= right) {
            int temp = array[left];
            array[left] = array[right];
            array[right] = temp;
            left++;
            right--;
        }
    }
    // recurse into the two partitions
    if (lowest < right) {
        qsort2median(array, lowest, right);
    }
    if (left < highest) {
        qsort2median(array, left, highest);
    }
}


and here is how I run it:



int[] arraytest = new int[1000000];
int[] arraytest1 = new int[1000000];

Quicksort qsorttest = new Quicksort();

// fill the array with random values in [0, 2)
qsorttest.randomArrayGenerator(arraytest, 2);

// keep an identical copy for comparison runs
System.arraycopy(arraytest, 0, arraytest1, 0, arraytest.length);
int endpoint = arraytest.length - 1;

// RUN FIRST ARRAY
long startTime = System.currentTimeMillis();
qsorttest.qsort2median(arraytest, 0, endpoint);
long endTime = System.currentTimeMillis();

long duration = endTime - startTime;
System.out.println("Quicksort with Median Element pivot ran in: " + duration);


and here is my randomArrayGenerator method:



// requires: import java.util.Random;
void randomArrayGenerator(int[] array, int n) {

    Random generator = new Random();

    System.out.println();
    System.out.println(array.length);

    // fill each slot with a uniform value in [0, n)
    for (int i = 0; i < array.length; i++) {
        array[i] = generator.nextInt(n);
    }
}


Thanks for any input.



As of Monday, I am still trying to figure this out.

java arrays performance quicksort

asked Nov 18 at 10:04, edited Nov 20 at 0:16
– C. O.

  • It seems that when sorting 1 million elements in the range up to 1 million, your code performs around 28,500,000 comparisons of elements (not counting comparisons of indices) and 5,000,000 swaps. When the range is only up to 2 (exclusive), that is, when there are many duplicate elements, it performs only 19,000,000 comparisons but now 9,500,000 swaps. I have not yet understood why that is, but the fewer comparisons must account for the shorter time (if it's not just random variation; after all, we are using random input, and other factors also vary from run to run).
    – Ole V.V.
    Nov 18 at 11:23

  • @OleV.V. Thanks for trying to help.
    – C. O.
    Nov 18 at 15:01

  • Not quite sure what's unexpected here? Quicksort on an array with lots of duplicates performs faster; that's the expected behavior...
    – Andriy Berestovskyy
    Nov 20 at 10:29

  • @AndriyBerestovskyy I think there has been a misunderstanding on my part. The text I was reading was based on Lomuto's partitioning, which indeed gets slower with a smaller range (I tested it). But it gets faster with Hoare's partitioning, which is what this code uses.
    – C. O.
    Nov 20 at 13:36
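
For anyone who wants to reproduce counts like those in the first comment, one minimal approach is to route every element comparison and swap through counting helpers and substitute them into qsort2median. This is an illustrative sketch; the helper names and static counters are assumptions, not part of the original code:

// Illustrative instrumentation for reproducing comparison/swap counts.
// less() and swap() are hypothetical helpers, not from the original code.
static long comparisons = 0;
static long swaps = 0;

static boolean less(int a, int b) {
    comparisons++;          // counts every element comparison, even failing ones
    return a < b;
}

static void swap(int[] array, int i, int j) {
    swaps++;
    int temp = array[i];
    array[i] = array[j];
    array[j] = temp;
}

Replacing array[left] < pivot with less(array[left], pivot) (and likewise for the right-hand scan and the swap) in a copy of qsort2median, then printing the two totals after one run, should yield counts comparable to those quoted above.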
1 Answer
I guess I am allowed to answer my own question.


Upon further research, I discovered that duplicates only become a problem when the quicksort algorithm uses Lomuto's partitioning. This code, on the other hand, uses Hoare's partitioning.


I tested both of them and got the expected result.


So Lomuto's partitioning gets slower as duplicates increase, and Hoare's partitioning gets faster with duplicates.
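
For illustration, here is a minimal Lomuto-style quicksort sketch (the common textbook variant that swaps on A[j] <= pivot; the method name is mine, not part of the original Quicksort class):

// Lomuto-style quicksort sketch (illustrative; not from the original class).
// The <= comparison sends every key equal to the pivot into the left
// partition, so an array of mostly-equal keys splits (n-1)/0 at each level
// and the running time degrades toward O(n^2).
void qsortLomuto(int[] array, int low, int high) {
    if (low >= high) {
        return;
    }
    int pivot = array[high];          // last element as the pivot
    int i = low - 1;                  // boundary of the "<= pivot" region
    for (int j = low; j < high; j++) {
        if (array[j] <= pivot) {      // equal keys all go left
            i++;
            int temp = array[i];
            array[i] = array[j];
            array[j] = temp;
        }
    }
    int temp = array[i + 1];          // put the pivot between the partitions
    array[i + 1] = array[high];
    array[high] = temp;
    qsortLomuto(array, low, i);
    qsortLomuto(array, i + 2, high);
}

Hoare's scheme, by contrast, stops both scanning pointers on keys equal to the pivot. That costs extra swaps (which matches the swap counts in the comments above), but it splits a run of equal keys almost exactly in half, so the recursion stays balanced and the sort actually gets faster.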

answered Nov 20 at 13:52
– C. O.

  • Right, there are algorithms on the internet which will swap elements even if they are equal to the pivot, i.e. if A[j] <= pivot then ... swap(). Those will indeed be bad for duplicates. But have a look at the Lomuto partition on Wikipedia: it swaps only elements which are strictly less than the pivot, i.e. if A[j] < pivot then ... swap(). I am pretty sure that Lomuto implementation should be fine with duplicates, though I did not test it...
    – Andriy Berestovskyy
    Nov 22 at 8:46
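
Since the title asks about the absence of three-way partitioning: three-way (Dutch national flag) partitioning is the standard remedy for duplicate keys under either scheme, because it gathers all keys equal to the pivot into a middle band that is never recursed into. A minimal illustrative sketch, not part of the code above:

// Three-way quicksort sketch (Dutch national flag partitioning; illustrative).
// Keys equal to the pivot end up in array[lt..gt] and are never revisited,
// so many duplicates make the recursion shallower, not deeper.
void qsort3way(int[] array, int low, int high) {
    if (low >= high) {
        return;
    }
    int pivot = array[low];
    int lt = low;        // array[low..lt-1]  < pivot
    int gt = high;       // array[gt+1..high] > pivot
    int i = low + 1;     // array[lt..i-1]   == pivot
    while (i <= gt) {
        if (array[i] < pivot) {
            int temp = array[lt]; array[lt] = array[i]; array[i] = temp;
            lt++;
            i++;
        } else if (array[i] > pivot) {
            int temp = array[i]; array[i] = array[gt]; array[gt] = temp;
            gt--;        // don't advance i: the swapped-in key is unexamined
        } else {
            i++;         // equal to the pivot: leave it in the middle band
        }
    }
    qsort3way(array, low, lt - 1);
    qsort3way(array, gt + 1, high);
}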