random.shuffle very slow in Python 3 with list

I am using Python 3 and trying to generate a list of index combinations and shuffle them, so that I can later use them to select random values from a sample. The sample has two parameters: the sample size and the number of dimensions. Here is how I generate the list of indexes and then shuffle it:

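import itertools
import random

dimension = 5
sample_size = 100

# every sorted tuple of `dimension` distinct indexes below `sample_size`
generate_indexes = itertools.combinations(range(sample_size), dimension)
all_indexes = list(generate_indexes)

# here I do the shuffle
random.shuffle(all_indexes)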

The problem is that as I increase the dimension, it takes a long time to produce the result. Even with a dimension of 5 it takes very long or never finishes.

Is there any way to make it faster?

I ask because I have a multidimensional sample of values, and I want to select a random number of values from that sample based on all_indexes.

edited Nov 19 at 16:31

asked Nov 19 at 16:15

azeez


You're generating a list of 75287520 elements to shuffle - that's likely not to be quick... Do you definitely need to do this for what you're trying to achieve?
– Jon Clements♦
Nov 19 at 16:24

Yes, I need 20 dimensions for my problem, not 5. I think the way I am doing it is not the right one.
– azeez
Nov 19 at 16:25

Yeah, you should definitely rethink your approach. Even without the shuffle, this will run you out of memory well before you hit 20 dimensions.
– glibdud
Nov 19 at 16:29
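To put numbers on the two comments above, here is a minimal sketch that counts the combinations for a few dimensions. It assumes Python 3.8+ for math.comb; the helper name combination_count and the 8-byte list pointer are illustrative assumptions, and the memory figure is only a rough lower bound because it ignores the integer objects themselves.

```python
import math
import sys

def combination_count(sample_size, dimension):
    """Number of tuples that itertools.combinations would produce."""
    return math.comb(sample_size, dimension)

sample_size = 100
for dimension in (2, 5, 10, 20):
    count = combination_count(sample_size, dimension)
    # Lower bound per entry: the tuple object itself plus one list pointer
    # (assumed to be 8 bytes, as on a typical 64-bit build).
    bytes_per_entry = sys.getsizeof(tuple(range(dimension))) + 8
    gigabytes = count * bytes_per_entry / 1e9
    print(f"dimension={dimension}: {count} combinations, at least {gigabytes:.3g} GB")
```

The count grows so quickly with the dimension that materializing the list is only practical for small values.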

It is the list creation that is consuming all of the time. A generator-based approach might be faster.
– Austin
Nov 19 at 16:30

@Austin I had the same thought and tested it. The shuffle is taking longer than the list creation by a large margin.
– Andrew McDowell
Nov 19 at 16:33
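For anyone who wants to repeat that comparison, a small timing sketch follows. The function name time_build_and_shuffle is hypothetical, and it uses dimension 3 instead of 5 so that it finishes quickly; absolute timings depend on the machine, so none are quoted here.

```python
import itertools
import random
import time

def time_build_and_shuffle(sample_size, dimension):
    """Return (number of combinations, seconds to build, seconds to shuffle)."""
    start = time.perf_counter()
    all_indexes = list(itertools.combinations(range(sample_size), dimension))
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    random.shuffle(all_indexes)
    shuffle_seconds = time.perf_counter() - start

    return len(all_indexes), build_seconds, shuffle_seconds

count, build_s, shuffle_s = time_build_and_shuffle(100, 3)
print(f"{count} combinations: build {build_s:.3f}s, shuffle {shuffle_s:.3f}s")
```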


python-3.x list random shuffle

1 Answer

accepted

As pointed out in the comments, you're generating a very large list and then shuffling it. This is not going to be quick but depending on what you actually need, there may be faster ways of getting what you want.

I ran your code on my machine and found that generating the list of all combinations took about 8 seconds and shuffling it took around 75 seconds. If you increase the dimension, this time will increase significantly, and the memory needed to store such a large list will also become a problem.

If you are not going to need all of the random indexes, you may be better off drawing a fresh sample each time using:

random.sample(range(sample_size), dimension)

This returns a list of dimension distinct elements chosen at random from 0 to sample_size - 1. It took about 0.0001 seconds to run with your values of dimension and sample_size. If you don't need too many of the random values, it will be much quicker (and more memory efficient) to generate a new one each time.

There are two issues with this that I can see. Firstly you aren't guaranteed that each new sample won't be a repeat of a previous one, but this can be easily resolved by storing them as you go, and checking if they've been used already.

random_indexes = []  # samples accepted so far

new_sample = random.sample(range(sample_size), dimension)

if new_sample not in random_indexes:
    random_indexes.append(new_sample)
else:
    pass  # Handle this however you need.

This does add more run time, but again will be faster if you don't need too many of your samples.

The other difference is that the approach you have used generates tuples whose elements are always sorted, so (1,2,3,4,5) will be an element of all_indexes, but (5,4,3,2,1) will not. random.sample can return the elements in any order, so both could occur. If this is an issue, you will have to handle it. One option is to convert each sample to a set before adding it to the list, since sets compare equal regardless of element order:

new_sample = set(random.sample(range(sample_size), dimension))
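Putting the two points above together, here is a minimal sketch of drawing a fixed number of distinct combinations without ever building all_indexes. The function name sample_unique_combinations and its count parameter are my own, not part of the random module, and I have swapped the set for a sorted tuple so that each sample is hashable and matches the sorted tuples that itertools.combinations produces. It assumes Python 3.8+ for math.comb.

```python
import math
import random

def sample_unique_combinations(sample_size, dimension, count):
    """Draw `count` distinct sorted index tuples, retrying on repeats."""
    total = math.comb(sample_size, dimension)
    if count > total:
        raise ValueError(f"only {total} distinct combinations exist")

    seen = set()   # fast duplicate check
    result = []    # keeps the order in which samples were drawn
    while len(result) < count:
        combo = tuple(sorted(random.sample(range(sample_size), dimension)))
        if combo not in seen:
            seen.add(combo)
            result.append(combo)
    return result

random_indexes = sample_unique_combinations(100, 5, 100)
print(random_indexes[:3])
```

Using a set for the duplicate check keeps each lookup cheap, whereas `in` on a list gets slower as the list grows. The retry loop is only efficient while count is small compared with the total number of combinations.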

answered Nov 19 at 17:09

Andrew McDowell


Thank you very much for this good answer. The problem is that yes, I do need all of the random index tuples. That means if my sample_size is 10, then I need 10 random index tuples, e.g. [(0, 2), (0, 1), ...] if the dimension is 2.
– azeez
Nov 19 at 17:40


In the end, I use random_indexx = random.sample(all_indexes, sample_size) to select a number of random index tuples equal to sample_size. I did it this way because I want to make sure I will not have any duplicates.
– azeez
Nov 19 at 18:01

That's good, you're definitely better off sampling then. If you only need sample_size samples, then you don't need all of all_indexes. With dimension = 2 and sample_size = 10 for example, the size of all_indexes would be 45. In your original example, sample_size is 100, but all_indexes is of size 75287520.
– Andrew McDowell
Nov 19 at 20:44
