Fast arbitrary distribution random sampling

up vote
13
down vote

favorite

The random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss will sample random point from a normal distribution with a given mean and sigma values.

I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:

def my_dist(x):

    # Some distribution, assume c1,c2,c3 and c4 are known.

    f = c1*exp(-((x-c2)**c3)/c4)

    return f



# Draw N random samples from my distribution between given limits a,b.

N = 1000

N_rand_samples = ran_func_sample(my_dist, a, b, N)

where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?

edited Sep 16 '16 at 15:38

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28

My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30

2

Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32

I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34

1

I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06

|
show 2 more comments

up vote
13
down vote

favorite

I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:

def my_dist(x):

    # Some distribution, assume c1,c2,c3 and c4 are known.

    f = c1*exp(-((x-c2)**c3)/c4)

    return f



# Draw N random samples from my distribution between given limits a,b.

N = 1000

N_rand_samples = ran_func_sample(my_dist, a, b, N)

where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?

edited Sep 16 '16 at 15:38

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28

My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30

2

Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32

I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34

1

I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06

|
show 2 more comments

up vote
13
down vote

favorite

I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:

def my_dist(x):

    # Some distribution, assume c1,c2,c3 and c4 are known.

    f = c1*exp(-((x-c2)**c3)/c4)

    return f



# Draw N random samples from my distribution between given limits a,b.

N = 1000

N_rand_samples = ran_func_sample(my_dist, a, b, N)

where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?

edited Sep 16 '16 at 15:38

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:

def my_dist(x):

    # Some distribution, assume c1,c2,c3 and c4 are known.

    f = c1*exp(-((x-c2)**c3)/c4)

    return f



# Draw N random samples from my distribution between given limits a,b.

N = 1000

N_rand_samples = ran_func_sample(my_dist, a, b, N)

where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?

python performance random

edited Sep 16 '16 at 15:38

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

edited Sep 16 '16 at 15:38

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

edited Sep 16 '16 at 15:38

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

asked Jan 13 '14 at 20:26

Gabriel

10.8k33114228

You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28

My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30

2

Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32

I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34

1

I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06

|
show 2 more comments

You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28

My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30

2

Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32

I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34

1

I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06

You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28

My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30

Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32

I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34

I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06

|
show 2 more comments

4 Answers
4

active

oldest

votes

up vote
11
down vote

accepted

You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].

After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:

[inverted_function(random.random()) for x in range(1000)]

More on Inverse Transform Sampling:

http://en.wikipedia.org/wiki/Inverse_transform_sampling

Also, there is a good question on StackOverflow related to the topic:

Pythonic way to select list elements with different probability

edited May 23 '17 at 12:17

Community♦

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

1

I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59

add a comment |

up vote
8
down vote

This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.

class Distribution(object):

    """

    draws samples from a one dimensional probability distribution,

    by means of inversion of a discrete inverstion of a cumulative density function



    the pdf can be sorted first to prevent numerical error in the cumulative sum

    this is set as default; for big density functions with high contrast,

    it is absolutely necessary, and for small density functions,

    the overhead is minimal



    a call to this distibution object returns indices into density array

    """

    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):

        self.shape          = pdf.shape

        self.pdf            = pdf.ravel()

        self.sort           = sort

        self.interpolation  = interpolation

        self.transform      = transform



        #a pdf can not be negative

        assert(np.all(pdf>=0))



        #sort the pdf by magnitude

        if self.sort:

            self.sortindex = np.argsort(self.pdf, axis=None)

            self.pdf = self.pdf[self.sortindex]

        #construct the cumulative distribution function

        self.cdf = np.cumsum(self.pdf)

    @property

    def ndim(self):

        return len(self.shape)

    @property

    def sum(self):

        """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""

        return self.cdf[-1]

    def __call__(self, N):

        """draw """

        #pick numbers which are uniformly random over the cumulative distribution function

        choice = np.random.uniform(high = self.sum, size = N)

        #find the indices corresponding to this point on the CDF

        index = np.searchsorted(self.cdf, choice)

        #if necessary, map the indices back to their original ordering

        if self.sort:

            index = self.sortindex[index]

        #map back to multi-dimensional indexing

        index = np.unravel_index(index, self.shape)

        index = np.vstack(index)

        #is this a discrete or piecewise continuous distribution?

        if self.interpolation:

            index = index + np.random.uniform(size=index.shape)

        return self.transform(index)





if __name__=='__main__':

    shape = 3,3

    pdf = np.ones(shape)

    pdf[1]=0

    dist = Distribution(pdf, transform=lambda i:i-1.5)

    print dist(10)

    import matplotlib.pyplot as pp

    pp.scatter(*dist(1000))

    pp.show()

And as a more real-world relevant example:

x = np.linspace(-100, 100, 512)

p = np.exp(-x**2)

pdf = p[:,None]*p[None,:]     #2d gaussian

dist = Distribution(pdf, transform=lambda i:i-256)

print dist(1000000).mean(axis=1)    #should be in the 1/sqrt(1e6) range

import matplotlib.pyplot as pp

pp.scatter(*dist(1000))

pp.show()

edited Jan 13 '14 at 21:53

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45

add a comment |

up vote
3
down vote

import numpy as np

import scipy.interpolate as interpolate



def inverse_transform_sampling(data, n_bins, n_samples):

    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)

    cum_values = np.zeros(bin_edges.shape)

    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))

    inv_cdf = interpolate.interp1d(cum_values, bin_edges)

    r = np.random.rand(n_samples)

    return inv_cdf(r)

So if we give our data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples variable.

edited Jul 13 '15 at 12:35

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

9

Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10

add a comment |

up vote
1
down vote

I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).

def metropolis_hastings(target_density, size=500000):

    burnin_size = 10000

    size += burnin_size

    x0 = np.array([[0, 0]])

    xt = x0

    samples = 

    for i in range(size):

        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])

        accept_prob = (target_density(xt_candidate))/(target_density(xt))

        if np.random.uniform(0, 1) < accept_prob:

            xt = xt_candidate

        samples.append(xt)

    samples = np.array(samples[burnin_size:])

    samples = np.reshape(samples, [samples.shape[0], 2])

    return samples

This function requires a function target_density which takes in a data-point and computes its probability.

For details check-out this detailed answer of mine.

answered 2 days ago

Abdul Fatir

4,00421641

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f21100716%2ffast-arbitrary-distribution-random-sampling%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
11
down vote

accepted

After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:

[inverted_function(random.random()) for x in range(1000)]

More on Inverse Transform Sampling:

http://en.wikipedia.org/wiki/Inverse_transform_sampling

Also, there is a good question on StackOverflow related to the topic:

Pythonic way to select list elements with different probability

edited May 23 '17 at 12:17

Community♦

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

1

I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59

add a comment |

up vote
11
down vote

accepted

After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:

[inverted_function(random.random()) for x in range(1000)]

More on Inverse Transform Sampling:

http://en.wikipedia.org/wiki/Inverse_transform_sampling

Also, there is a good question on StackOverflow related to the topic:

Pythonic way to select list elements with different probability

edited May 23 '17 at 12:17

Community♦

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

1

I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59

add a comment |

up vote
11
down vote

accepted

After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:

[inverted_function(random.random()) for x in range(1000)]

More on Inverse Transform Sampling:

http://en.wikipedia.org/wiki/Inverse_transform_sampling

Also, there is a good question on StackOverflow related to the topic:

Pythonic way to select list elements with different probability

edited May 23 '17 at 12:17

Community♦

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:

[inverted_function(random.random()) for x in range(1000)]

More on Inverse Transform Sampling:

http://en.wikipedia.org/wiki/Inverse_transform_sampling

Also, there is a good question on StackOverflow related to the topic:

Pythonic way to select list elements with different probability

edited May 23 '17 at 12:17

Community♦

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

edited May 23 '17 at 12:17

Community♦

edited May 23 '17 at 12:17

Community♦

edited May 23 '17 at 12:17

Community♦

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

answered Jan 13 '14 at 20:45

Igor Chubin

35.9k784108

1

I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59

add a comment |

1

I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59

I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59

add a comment |

up vote
8
down vote

class Distribution(object):

    """

    draws samples from a one dimensional probability distribution,

    by means of inversion of a discrete inverstion of a cumulative density function



    the pdf can be sorted first to prevent numerical error in the cumulative sum

    this is set as default; for big density functions with high contrast,

    it is absolutely necessary, and for small density functions,

    the overhead is minimal



    a call to this distibution object returns indices into density array

    """

    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):

        self.shape          = pdf.shape

        self.pdf            = pdf.ravel()

        self.sort           = sort

        self.interpolation  = interpolation

        self.transform      = transform



        #a pdf can not be negative

        assert(np.all(pdf>=0))



        #sort the pdf by magnitude

        if self.sort:

            self.sortindex = np.argsort(self.pdf, axis=None)

            self.pdf = self.pdf[self.sortindex]

        #construct the cumulative distribution function

        self.cdf = np.cumsum(self.pdf)

    @property

    def ndim(self):

        return len(self.shape)

    @property

    def sum(self):

        """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""

        return self.cdf[-1]

    def __call__(self, N):

        """draw """

        #pick numbers which are uniformly random over the cumulative distribution function

        choice = np.random.uniform(high = self.sum, size = N)

        #find the indices corresponding to this point on the CDF

        index = np.searchsorted(self.cdf, choice)

        #if necessary, map the indices back to their original ordering

        if self.sort:

            index = self.sortindex[index]

        #map back to multi-dimensional indexing

        index = np.unravel_index(index, self.shape)

        index = np.vstack(index)

        #is this a discrete or piecewise continuous distribution?

        if self.interpolation:

            index = index + np.random.uniform(size=index.shape)

        return self.transform(index)





if __name__=='__main__':

    shape = 3,3

    pdf = np.ones(shape)

    pdf[1]=0

    dist = Distribution(pdf, transform=lambda i:i-1.5)

    print dist(10)

    import matplotlib.pyplot as pp

    pp.scatter(*dist(1000))

    pp.show()

And as a more real-world relevant example:

x = np.linspace(-100, 100, 512)

p = np.exp(-x**2)

pdf = p[:,None]*p[None,:]     #2d gaussian

dist = Distribution(pdf, transform=lambda i:i-256)

print dist(1000000).mean(axis=1)    #should be in the 1/sqrt(1e6) range

import matplotlib.pyplot as pp

pp.scatter(*dist(1000))

pp.show()

edited Jan 13 '14 at 21:53

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45

add a comment |

up vote
8
down vote

class Distribution(object):

    """

    draws samples from a one dimensional probability distribution,

    by means of inversion of a discrete inverstion of a cumulative density function



    the pdf can be sorted first to prevent numerical error in the cumulative sum

    this is set as default; for big density functions with high contrast,

    it is absolutely necessary, and for small density functions,

    the overhead is minimal



    a call to this distibution object returns indices into density array

    """

    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):

        self.shape          = pdf.shape

        self.pdf            = pdf.ravel()

        self.sort           = sort

        self.interpolation  = interpolation

        self.transform      = transform



        #a pdf can not be negative

        assert(np.all(pdf>=0))



        #sort the pdf by magnitude

        if self.sort:

            self.sortindex = np.argsort(self.pdf, axis=None)

            self.pdf = self.pdf[self.sortindex]

        #construct the cumulative distribution function

        self.cdf = np.cumsum(self.pdf)

    @property

    def ndim(self):

        return len(self.shape)

    @property

    def sum(self):

        """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""

        return self.cdf[-1]

    def __call__(self, N):

        """draw """

        #pick numbers which are uniformly random over the cumulative distribution function

        choice = np.random.uniform(high = self.sum, size = N)

        #find the indices corresponding to this point on the CDF

        index = np.searchsorted(self.cdf, choice)

        #if necessary, map the indices back to their original ordering

        if self.sort:

            index = self.sortindex[index]

        #map back to multi-dimensional indexing

        index = np.unravel_index(index, self.shape)

        index = np.vstack(index)

        #is this a discrete or piecewise continuous distribution?

        if self.interpolation:

            index = index + np.random.uniform(size=index.shape)

        return self.transform(index)





if __name__=='__main__':

    shape = 3,3

    pdf = np.ones(shape)

    pdf[1]=0

    dist = Distribution(pdf, transform=lambda i:i-1.5)

    print dist(10)

    import matplotlib.pyplot as pp

    pp.scatter(*dist(1000))

    pp.show()

And as a more real-world relevant example:

x = np.linspace(-100, 100, 512)

p = np.exp(-x**2)

pdf = p[:,None]*p[None,:]     #2d gaussian

dist = Distribution(pdf, transform=lambda i:i-256)

print dist(1000000).mean(axis=1)    #should be in the 1/sqrt(1e6) range

import matplotlib.pyplot as pp

pp.scatter(*dist(1000))

pp.show()

edited Jan 13 '14 at 21:53

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45

add a comment |

up vote
8
down vote

class Distribution(object):

    """

    draws samples from a one dimensional probability distribution,

    by means of inversion of a discrete inverstion of a cumulative density function



    the pdf can be sorted first to prevent numerical error in the cumulative sum

    this is set as default; for big density functions with high contrast,

    it is absolutely necessary, and for small density functions,

    the overhead is minimal



    a call to this distibution object returns indices into density array

    """

    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):

        self.shape          = pdf.shape

        self.pdf            = pdf.ravel()

        self.sort           = sort

        self.interpolation  = interpolation

        self.transform      = transform



        #a pdf can not be negative

        assert(np.all(pdf>=0))



        #sort the pdf by magnitude

        if self.sort:

            self.sortindex = np.argsort(self.pdf, axis=None)

            self.pdf = self.pdf[self.sortindex]

        #construct the cumulative distribution function

        self.cdf = np.cumsum(self.pdf)

    @property

    def ndim(self):

        return len(self.shape)

    @property

    def sum(self):

        """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""

        return self.cdf[-1]

    def __call__(self, N):

        """draw """

        #pick numbers which are uniformly random over the cumulative distribution function

        choice = np.random.uniform(high = self.sum, size = N)

        #find the indices corresponding to this point on the CDF

        index = np.searchsorted(self.cdf, choice)

        #if necessary, map the indices back to their original ordering

        if self.sort:

            index = self.sortindex[index]

        #map back to multi-dimensional indexing

        index = np.unravel_index(index, self.shape)

        index = np.vstack(index)

        #is this a discrete or piecewise continuous distribution?

        if self.interpolation:

            index = index + np.random.uniform(size=index.shape)

        return self.transform(index)





if __name__=='__main__':

    shape = 3,3

    pdf = np.ones(shape)

    pdf[1]=0

    dist = Distribution(pdf, transform=lambda i:i-1.5)

    print dist(10)

    import matplotlib.pyplot as pp

    pp.scatter(*dist(1000))

    pp.show()

And as a more real-world relevant example:

x = np.linspace(-100, 100, 512)

p = np.exp(-x**2)

pdf = p[:,None]*p[None,:]     #2d gaussian

dist = Distribution(pdf, transform=lambda i:i-256)

print dist(1000000).mean(axis=1)    #should be in the 1/sqrt(1e6) range

import matplotlib.pyplot as pp

pp.scatter(*dist(1000))

pp.show()

edited Jan 13 '14 at 21:53

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

class Distribution(object):

    """

    draws samples from a one dimensional probability distribution,

    by means of inversion of a discrete inverstion of a cumulative density function



    the pdf can be sorted first to prevent numerical error in the cumulative sum

    this is set as default; for big density functions with high contrast,

    it is absolutely necessary, and for small density functions,

    the overhead is minimal



    a call to this distibution object returns indices into density array

    """

    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):

        self.shape          = pdf.shape

        self.pdf            = pdf.ravel()

        self.sort           = sort

        self.interpolation  = interpolation

        self.transform      = transform



        #a pdf can not be negative

        assert(np.all(pdf>=0))



        #sort the pdf by magnitude

        if self.sort:

            self.sortindex = np.argsort(self.pdf, axis=None)

            self.pdf = self.pdf[self.sortindex]

        #construct the cumulative distribution function

        self.cdf = np.cumsum(self.pdf)

    @property

    def ndim(self):

        return len(self.shape)

    @property

    def sum(self):

        """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""

        return self.cdf[-1]

    def __call__(self, N):

        """draw """

        #pick numbers which are uniformly random over the cumulative distribution function

        choice = np.random.uniform(high = self.sum, size = N)

        #find the indices corresponding to this point on the CDF

        index = np.searchsorted(self.cdf, choice)

        #if necessary, map the indices back to their original ordering

        if self.sort:

            index = self.sortindex[index]

        #map back to multi-dimensional indexing

        index = np.unravel_index(index, self.shape)

        index = np.vstack(index)

        #is this a discrete or piecewise continuous distribution?

        if self.interpolation:

            index = index + np.random.uniform(size=index.shape)

        return self.transform(index)





if __name__=='__main__':

    shape = 3,3

    pdf = np.ones(shape)

    pdf[1]=0

    dist = Distribution(pdf, transform=lambda i:i-1.5)

    print dist(10)

    import matplotlib.pyplot as pp

    pp.scatter(*dist(1000))

    pp.show()

And as a more real-world relevant example:

x = np.linspace(-100, 100, 512)

p = np.exp(-x**2)

pdf = p[:,None]*p[None,:]     #2d gaussian

dist = Distribution(pdf, transform=lambda i:i-256)

print dist(1000000).mean(axis=1)    #should be in the 1/sqrt(1e6) range

import matplotlib.pyplot as pp

pp.scatter(*dist(1000))

pp.show()

edited Jan 13 '14 at 21:53

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

edited Jan 13 '14 at 21:53

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

answered Jan 13 '14 at 21:20

Eelco Hoogendoorn

6,26512234

Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45

add a comment |

Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45

Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45

add a comment |

up vote
3
down vote

import numpy as np

import scipy.interpolate as interpolate



def inverse_transform_sampling(data, n_bins, n_samples):

    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)

    cum_values = np.zeros(bin_edges.shape)

    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))

    inv_cdf = interpolate.interp1d(cum_values, bin_edges)

    r = np.random.rand(n_samples)

    return inv_cdf(r)

edited Jul 13 '15 at 12:35

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

9

Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10

add a comment |

up vote
3
down vote

import numpy as np

import scipy.interpolate as interpolate



def inverse_transform_sampling(data, n_bins, n_samples):

    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)

    cum_values = np.zeros(bin_edges.shape)

    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))

    inv_cdf = interpolate.interp1d(cum_values, bin_edges)

    r = np.random.rand(n_samples)

    return inv_cdf(r)

edited Jul 13 '15 at 12:35

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

9

Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10

add a comment |

up vote
3
down vote

import numpy as np

import scipy.interpolate as interpolate



def inverse_transform_sampling(data, n_bins, n_samples):

    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)

    cum_values = np.zeros(bin_edges.shape)

    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))

    inv_cdf = interpolate.interp1d(cum_values, bin_edges)

    r = np.random.rand(n_samples)

    return inv_cdf(r)

edited Jul 13 '15 at 12:35

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

import numpy as np

import scipy.interpolate as interpolate



def inverse_transform_sampling(data, n_bins, n_samples):

    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)

    cum_values = np.zeros(bin_edges.shape)

    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))

    inv_cdf = interpolate.interp1d(cum_values, bin_edges)

    r = np.random.rand(n_samples)

    return inv_cdf(r)

edited Jul 13 '15 at 12:35

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

edited Jul 13 '15 at 12:35

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

answered Jul 13 '15 at 10:19

ThePredator

4,69133255

9

Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10

add a comment |

9

Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10

Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10

add a comment |

up vote
1
down vote

I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).

def metropolis_hastings(target_density, size=500000):

    burnin_size = 10000

    size += burnin_size

    x0 = np.array([[0, 0]])

    xt = x0

    samples = 

    for i in range(size):

        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])

        accept_prob = (target_density(xt_candidate))/(target_density(xt))

        if np.random.uniform(0, 1) < accept_prob:

            xt = xt_candidate

        samples.append(xt)

    samples = np.array(samples[burnin_size:])

    samples = np.reshape(samples, [samples.shape[0], 2])

    return samples

This function requires a function target_density which takes in a data-point and computes its probability.

For details check-out this detailed answer of mine.

answered 2 days ago

Abdul Fatir

4,00421641

add a comment |

up vote
1
down vote

I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).

def metropolis_hastings(target_density, size=500000):

    burnin_size = 10000

    size += burnin_size

    x0 = np.array([[0, 0]])

    xt = x0

    samples = 

    for i in range(size):

        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])

        accept_prob = (target_density(xt_candidate))/(target_density(xt))

        if np.random.uniform(0, 1) < accept_prob:

            xt = xt_candidate

        samples.append(xt)

    samples = np.array(samples[burnin_size:])

    samples = np.reshape(samples, [samples.shape[0], 2])

    return samples

This function requires a function target_density which takes in a data-point and computes its probability.

For details check-out this detailed answer of mine.

answered 2 days ago

Abdul Fatir

4,00421641

add a comment |

up vote
1
down vote

I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).

def metropolis_hastings(target_density, size=500000):

    burnin_size = 10000

    size += burnin_size

    x0 = np.array([[0, 0]])

    xt = x0

    samples = 

    for i in range(size):

        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])

        accept_prob = (target_density(xt_candidate))/(target_density(xt))

        if np.random.uniform(0, 1) < accept_prob:

            xt = xt_candidate

        samples.append(xt)

    samples = np.array(samples[burnin_size:])

    samples = np.reshape(samples, [samples.shape[0], 2])

    return samples

This function requires a function target_density which takes in a data-point and computes its probability.

For details check-out this detailed answer of mine.

answered 2 days ago

Abdul Fatir

4,00421641

I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).

def metropolis_hastings(target_density, size=500000):

    burnin_size = 10000

    size += burnin_size

    x0 = np.array([[0, 0]])

    xt = x0

    samples = 

    for i in range(size):

        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])

        accept_prob = (target_density(xt_candidate))/(target_density(xt))

        if np.random.uniform(0, 1) < accept_prob:

            xt = xt_candidate

        samples.append(xt)

    samples = np.array(samples[burnin_size:])

    samples = np.reshape(samples, [samples.shape[0], 2])

    return samples

This function requires a function target_density which takes in a data-point and computes its probability.

For details check-out this detailed answer of mine.

answered 2 days ago

Abdul Fatir

4,00421641

answered 2 days ago

Abdul Fatir

4,00421641

answered 2 days ago

Abdul Fatir

4,00421641

answered 2 days ago

Abdul Fatir

4,00421641

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Nsryjdtyk