Fast arbitrary distribution random sampling











up vote
13
down vote

favorite
8












The random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss will sample random point from a normal distribution with a given mean and sigma values.



I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:



def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f

# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)


where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?










share|improve this question
























  • You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
    – BrenBarn
    Jan 13 '14 at 20:28










  • My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
    – Gabriel
    Jan 13 '14 at 20:30






  • 2




    Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
    – BrenBarn
    Jan 13 '14 at 20:32












  • I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
    – Gabriel
    Jan 13 '14 at 20:34






  • 1




    I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
    – Eelco Hoogendoorn
    Jan 13 '14 at 21:06















up vote
13
down vote

favorite
8












The random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss will sample random point from a normal distribution with a given mean and sigma values.



I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:



def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f

# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)


where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?










share|improve this question
























  • You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
    – BrenBarn
    Jan 13 '14 at 20:28










  • My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
    – Gabriel
    Jan 13 '14 at 20:30






  • 2




    Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
    – BrenBarn
    Jan 13 '14 at 20:32












  • I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
    – Gabriel
    Jan 13 '14 at 20:34






  • 1




    I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
    – Eelco Hoogendoorn
    Jan 13 '14 at 21:06













up vote
13
down vote

favorite
8









up vote
13
down vote

favorite
8






8





The random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss will sample random point from a normal distribution with a given mean and sigma values.



I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:



def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f

# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)


where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?










share|improve this question















The random module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss will sample random point from a normal distribution with a given mean and sigma values.



I'm looking for a way to extract a number N of random samples between a given interval using my own distribution as fast as possible in python. This is what I mean:



def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f

# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)


where ran_func_sample is what I'm after and a, b are the limits from which to draw the samples. Is there anything of that sort in python?







python performance random






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Sep 16 '16 at 15:38

























asked Jan 13 '14 at 20:26









Gabriel

10.8k33114228




10.8k33114228












  • You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
    – BrenBarn
    Jan 13 '14 at 20:28










  • My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
    – Gabriel
    Jan 13 '14 at 20:30






  • 2




    Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
    – BrenBarn
    Jan 13 '14 at 20:32












  • I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
    – Gabriel
    Jan 13 '14 at 20:34






  • 1




    I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
    – Eelco Hoogendoorn
    Jan 13 '14 at 21:06


















  • You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
    – BrenBarn
    Jan 13 '14 at 20:28










  • My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
    – Gabriel
    Jan 13 '14 at 20:30






  • 2




    Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
    – BrenBarn
    Jan 13 '14 at 20:32












  • I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
    – Gabriel
    Jan 13 '14 at 20:34






  • 1




    I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
    – Eelco Hoogendoorn
    Jan 13 '14 at 21:06
















You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28




You can just call your function N times. However, you still need to specify what distribution you want your x values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28












My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30




My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30




2




2




Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32






Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random x values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32














I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34




I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a x**2 function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34




1




1




I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06




I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06












4 Answers
4






active

oldest

votes

















up vote
11
down vote



accepted










You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].



After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:



[inverted_function(random.random()) for x in range(1000)]


More on Inverse Transform Sampling:




  • http://en.wikipedia.org/wiki/Inverse_transform_sampling


Also, there is a good question on StackOverflow related to the topic:




  • Pythonic way to select list elements with different probability






share|improve this answer



















  • 1




    I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
    – Georgy
    Jun 11 at 15:59


















up vote
8
down vote













This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.



class Distribution(object):
"""
draws samples from a one dimensional probability distribution,
by means of inversion of a discrete inverstion of a cumulative density function

the pdf can be sorted first to prevent numerical error in the cumulative sum
this is set as default; for big density functions with high contrast,
it is absolutely necessary, and for small density functions,
the overhead is minimal

a call to this distibution object returns indices into density array
"""
def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
self.shape = pdf.shape
self.pdf = pdf.ravel()
self.sort = sort
self.interpolation = interpolation
self.transform = transform

#a pdf can not be negative
assert(np.all(pdf>=0))

#sort the pdf by magnitude
if self.sort:
self.sortindex = np.argsort(self.pdf, axis=None)
self.pdf = self.pdf[self.sortindex]
#construct the cumulative distribution function
self.cdf = np.cumsum(self.pdf)
@property
def ndim(self):
return len(self.shape)
@property
def sum(self):
"""cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
return self.cdf[-1]
def __call__(self, N):
"""draw """
#pick numbers which are uniformly random over the cumulative distribution function
choice = np.random.uniform(high = self.sum, size = N)
#find the indices corresponding to this point on the CDF
index = np.searchsorted(self.cdf, choice)
#if necessary, map the indices back to their original ordering
if self.sort:
index = self.sortindex[index]
#map back to multi-dimensional indexing
index = np.unravel_index(index, self.shape)
index = np.vstack(index)
#is this a discrete or piecewise continuous distribution?
if self.interpolation:
index = index + np.random.uniform(size=index.shape)
return self.transform(index)


if __name__=='__main__':
shape = 3,3
pdf = np.ones(shape)
pdf[1]=0
dist = Distribution(pdf, transform=lambda i:i-1.5)
print dist(10)
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()


And as a more real-world relevant example:



x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:] #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()





share|improve this answer























  • Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
    – Eelco Hoogendoorn
    Jan 21 '14 at 22:45


















up vote
3
down vote













import numpy as np
import scipy.interpolate as interpolate

def inverse_transform_sampling(data, n_bins, n_samples):
hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
cum_values = np.zeros(bin_edges.shape)
cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
inv_cdf = interpolate.interp1d(cum_values, bin_edges)
r = np.random.rand(n_samples)
return inv_cdf(r)


So if we give our data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples variable.






share|improve this answer



















  • 9




    Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
    – Jost
    Jan 20 '17 at 14:10


















up vote
1
down vote













I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).



def metropolis_hastings(target_density, size=500000):
burnin_size = 10000
size += burnin_size
x0 = np.array([[0, 0]])
xt = x0
samples =
for i in range(size):
xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
accept_prob = (target_density(xt_candidate))/(target_density(xt))
if np.random.uniform(0, 1) < accept_prob:
xt = xt_candidate
samples.append(xt)
samples = np.array(samples[burnin_size:])
samples = np.reshape(samples, [samples.shape[0], 2])
return samples


This function requires a function target_density which takes in a data-point and computes its probability.



For details check-out this detailed answer of mine.






share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f21100716%2ffast-arbitrary-distribution-random-sampling%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    11
    down vote



    accepted










    You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
    to random numbers having standard uniform distribution in the interval [0,1].



    After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:



    [inverted_function(random.random()) for x in range(1000)]


    More on Inverse Transform Sampling:




    • http://en.wikipedia.org/wiki/Inverse_transform_sampling


    Also, there is a good question on StackOverflow related to the topic:




    • Pythonic way to select list elements with different probability






    share|improve this answer



















    • 1




      I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
      – Georgy
      Jun 11 at 15:59















    up vote
    11
    down vote



    accepted










    You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
    to random numbers having standard uniform distribution in the interval [0,1].



    After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:



    [inverted_function(random.random()) for x in range(1000)]


    More on Inverse Transform Sampling:




    • http://en.wikipedia.org/wiki/Inverse_transform_sampling


    Also, there is a good question on StackOverflow related to the topic:




    • Pythonic way to select list elements with different probability






    share|improve this answer



















    • 1




      I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
      – Georgy
      Jun 11 at 15:59













    up vote
    11
    down vote



    accepted







    up vote
    11
    down vote



    accepted






    You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
    to random numbers having standard uniform distribution in the interval [0,1].



    After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:



    [inverted_function(random.random()) for x in range(1000)]


    More on Inverse Transform Sampling:




    • http://en.wikipedia.org/wiki/Inverse_transform_sampling


    Also, there is a good question on StackOverflow related to the topic:




    • Pythonic way to select list elements with different probability






    share|improve this answer














    You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
    to random numbers having standard uniform distribution in the interval [0,1].



    After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:



    [inverted_function(random.random()) for x in range(1000)]


    More on Inverse Transform Sampling:




    • http://en.wikipedia.org/wiki/Inverse_transform_sampling


    Also, there is a good question on StackOverflow related to the topic:




    • Pythonic way to select list elements with different probability







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited May 23 '17 at 12:17









    Community

    11




    11










    answered Jan 13 '14 at 20:45









    Igor Chubin

    35.9k784108




    35.9k784108








    • 1




      I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
      – Georgy
      Jun 11 at 15:59














    • 1




      I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
      – Georgy
      Jun 11 at 15:59








    1




    1




    I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
    – Georgy
    Jun 11 at 15:59




    I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
    – Georgy
    Jun 11 at 15:59












    up vote
    8
    down vote













    This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.



    class Distribution(object):
    """
    draws samples from a one dimensional probability distribution,
    by means of inversion of a discrete inverstion of a cumulative density function

    the pdf can be sorted first to prevent numerical error in the cumulative sum
    this is set as default; for big density functions with high contrast,
    it is absolutely necessary, and for small density functions,
    the overhead is minimal

    a call to this distibution object returns indices into density array
    """
    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
    self.shape = pdf.shape
    self.pdf = pdf.ravel()
    self.sort = sort
    self.interpolation = interpolation
    self.transform = transform

    #a pdf can not be negative
    assert(np.all(pdf>=0))

    #sort the pdf by magnitude
    if self.sort:
    self.sortindex = np.argsort(self.pdf, axis=None)
    self.pdf = self.pdf[self.sortindex]
    #construct the cumulative distribution function
    self.cdf = np.cumsum(self.pdf)
    @property
    def ndim(self):
    return len(self.shape)
    @property
    def sum(self):
    """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
    return self.cdf[-1]
    def __call__(self, N):
    """draw """
    #pick numbers which are uniformly random over the cumulative distribution function
    choice = np.random.uniform(high = self.sum, size = N)
    #find the indices corresponding to this point on the CDF
    index = np.searchsorted(self.cdf, choice)
    #if necessary, map the indices back to their original ordering
    if self.sort:
    index = self.sortindex[index]
    #map back to multi-dimensional indexing
    index = np.unravel_index(index, self.shape)
    index = np.vstack(index)
    #is this a discrete or piecewise continuous distribution?
    if self.interpolation:
    index = index + np.random.uniform(size=index.shape)
    return self.transform(index)


    if __name__=='__main__':
    shape = 3,3
    pdf = np.ones(shape)
    pdf[1]=0
    dist = Distribution(pdf, transform=lambda i:i-1.5)
    print dist(10)
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()


    And as a more real-world relevant example:



    x = np.linspace(-100, 100, 512)
    p = np.exp(-x**2)
    pdf = p[:,None]*p[None,:] #2d gaussian
    dist = Distribution(pdf, transform=lambda i:i-256)
    print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()





    share|improve this answer























    • Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
      – Eelco Hoogendoorn
      Jan 21 '14 at 22:45















    up vote
    8
    down vote













    This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.



    class Distribution(object):
    """
    draws samples from a one dimensional probability distribution,
    by means of inversion of a discrete inverstion of a cumulative density function

    the pdf can be sorted first to prevent numerical error in the cumulative sum
    this is set as default; for big density functions with high contrast,
    it is absolutely necessary, and for small density functions,
    the overhead is minimal

    a call to this distibution object returns indices into density array
    """
    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
    self.shape = pdf.shape
    self.pdf = pdf.ravel()
    self.sort = sort
    self.interpolation = interpolation
    self.transform = transform

    #a pdf can not be negative
    assert(np.all(pdf>=0))

    #sort the pdf by magnitude
    if self.sort:
    self.sortindex = np.argsort(self.pdf, axis=None)
    self.pdf = self.pdf[self.sortindex]
    #construct the cumulative distribution function
    self.cdf = np.cumsum(self.pdf)
    @property
    def ndim(self):
    return len(self.shape)
    @property
    def sum(self):
    """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
    return self.cdf[-1]
    def __call__(self, N):
    """draw """
    #pick numbers which are uniformly random over the cumulative distribution function
    choice = np.random.uniform(high = self.sum, size = N)
    #find the indices corresponding to this point on the CDF
    index = np.searchsorted(self.cdf, choice)
    #if necessary, map the indices back to their original ordering
    if self.sort:
    index = self.sortindex[index]
    #map back to multi-dimensional indexing
    index = np.unravel_index(index, self.shape)
    index = np.vstack(index)
    #is this a discrete or piecewise continuous distribution?
    if self.interpolation:
    index = index + np.random.uniform(size=index.shape)
    return self.transform(index)


    if __name__=='__main__':
    shape = 3,3
    pdf = np.ones(shape)
    pdf[1]=0
    dist = Distribution(pdf, transform=lambda i:i-1.5)
    print dist(10)
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()


    And as a more real-world relevant example:



    x = np.linspace(-100, 100, 512)
    p = np.exp(-x**2)
    pdf = p[:,None]*p[None,:] #2d gaussian
    dist = Distribution(pdf, transform=lambda i:i-256)
    print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()





    share|improve this answer























    • Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
      – Eelco Hoogendoorn
      Jan 21 '14 at 22:45













    up vote
    8
    down vote










    up vote
    8
    down vote









    This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.



    class Distribution(object):
    """
    draws samples from a one dimensional probability distribution,
    by means of inversion of a discrete inverstion of a cumulative density function

    the pdf can be sorted first to prevent numerical error in the cumulative sum
    this is set as default; for big density functions with high contrast,
    it is absolutely necessary, and for small density functions,
    the overhead is minimal

    a call to this distibution object returns indices into density array
    """
    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
    self.shape = pdf.shape
    self.pdf = pdf.ravel()
    self.sort = sort
    self.interpolation = interpolation
    self.transform = transform

    #a pdf can not be negative
    assert(np.all(pdf>=0))

    #sort the pdf by magnitude
    if self.sort:
    self.sortindex = np.argsort(self.pdf, axis=None)
    self.pdf = self.pdf[self.sortindex]
    #construct the cumulative distribution function
    self.cdf = np.cumsum(self.pdf)
    @property
    def ndim(self):
    return len(self.shape)
    @property
    def sum(self):
    """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
    return self.cdf[-1]
    def __call__(self, N):
    """draw """
    #pick numbers which are uniformly random over the cumulative distribution function
    choice = np.random.uniform(high = self.sum, size = N)
    #find the indices corresponding to this point on the CDF
    index = np.searchsorted(self.cdf, choice)
    #if necessary, map the indices back to their original ordering
    if self.sort:
    index = self.sortindex[index]
    #map back to multi-dimensional indexing
    index = np.unravel_index(index, self.shape)
    index = np.vstack(index)
    #is this a discrete or piecewise continuous distribution?
    if self.interpolation:
    index = index + np.random.uniform(size=index.shape)
    return self.transform(index)


    if __name__=='__main__':
    shape = 3,3
    pdf = np.ones(shape)
    pdf[1]=0
    dist = Distribution(pdf, transform=lambda i:i-1.5)
    print dist(10)
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()


    And as a more real-world relevant example:



    x = np.linspace(-100, 100, 512)
    p = np.exp(-x**2)
    pdf = p[:,None]*p[None,:] #2d gaussian
    dist = Distribution(pdf, transform=lambda i:i-256)
    print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()





    share|improve this answer














    This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.



    class Distribution(object):
    """
    draws samples from a one dimensional probability distribution,
    by means of inversion of a discrete inverstion of a cumulative density function

    the pdf can be sorted first to prevent numerical error in the cumulative sum
    this is set as default; for big density functions with high contrast,
    it is absolutely necessary, and for small density functions,
    the overhead is minimal

    a call to this distibution object returns indices into density array
    """
    def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
    self.shape = pdf.shape
    self.pdf = pdf.ravel()
    self.sort = sort
    self.interpolation = interpolation
    self.transform = transform

    #a pdf can not be negative
    assert(np.all(pdf>=0))

    #sort the pdf by magnitude
    if self.sort:
    self.sortindex = np.argsort(self.pdf, axis=None)
    self.pdf = self.pdf[self.sortindex]
    #construct the cumulative distribution function
    self.cdf = np.cumsum(self.pdf)
    @property
    def ndim(self):
    return len(self.shape)
    @property
    def sum(self):
    """cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
    return self.cdf[-1]
    def __call__(self, N):
    """draw """
    #pick numbers which are uniformly random over the cumulative distribution function
    choice = np.random.uniform(high = self.sum, size = N)
    #find the indices corresponding to this point on the CDF
    index = np.searchsorted(self.cdf, choice)
    #if necessary, map the indices back to their original ordering
    if self.sort:
    index = self.sortindex[index]
    #map back to multi-dimensional indexing
    index = np.unravel_index(index, self.shape)
    index = np.vstack(index)
    #is this a discrete or piecewise continuous distribution?
    if self.interpolation:
    index = index + np.random.uniform(size=index.shape)
    return self.transform(index)


    if __name__=='__main__':
    shape = 3,3
    pdf = np.ones(shape)
    pdf[1]=0
    dist = Distribution(pdf, transform=lambda i:i-1.5)
    print dist(10)
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()


    And as a more real-world relevant example:



    x = np.linspace(-100, 100, 512)
    p = np.exp(-x**2)
    pdf = p[:,None]*p[None,:] #2d gaussian
    dist = Distribution(pdf, transform=lambda i:i-256)
    print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
    import matplotlib.pyplot as pp
    pp.scatter(*dist(1000))
    pp.show()






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Jan 13 '14 at 21:53

























    answered Jan 13 '14 at 21:20









    Eelco Hoogendoorn

    6,26512234




    6,26512234












    • Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
      – Eelco Hoogendoorn
      Jan 21 '14 at 22:45


















    • Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
      – Eelco Hoogendoorn
      Jan 21 '14 at 22:45
















    Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
    – Eelco Hoogendoorn
    Jan 21 '14 at 22:45




    Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
    – Eelco Hoogendoorn
    Jan 21 '14 at 22:45










    up vote
    3
    down vote













    import numpy as np
    import scipy.interpolate as interpolate

    def inverse_transform_sampling(data, n_bins, n_samples):
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
    cum_values = np.zeros(bin_edges.shape)
    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
    inv_cdf = interpolate.interp1d(cum_values, bin_edges)
    r = np.random.rand(n_samples)
    return inv_cdf(r)


    So if we give our data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples variable.






    share|improve this answer



















    • 9




      Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
      – Jost
      Jan 20 '17 at 14:10















    up vote
    3
    down vote













    import numpy as np
    import scipy.interpolate as interpolate

    def inverse_transform_sampling(data, n_bins, n_samples):
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
    cum_values = np.zeros(bin_edges.shape)
    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
    inv_cdf = interpolate.interp1d(cum_values, bin_edges)
    r = np.random.rand(n_samples)
    return inv_cdf(r)


    So if we give our data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples variable.






    share|improve this answer



















    • 9




      Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
      – Jost
      Jan 20 '17 at 14:10













    up vote
    3
    down vote










    up vote
    3
    down vote









    import numpy as np
    import scipy.interpolate as interpolate

    def inverse_transform_sampling(data, n_bins, n_samples):
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
    cum_values = np.zeros(bin_edges.shape)
    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
    inv_cdf = interpolate.interp1d(cum_values, bin_edges)
    r = np.random.rand(n_samples)
    return inv_cdf(r)


    So if we give our data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples variable.






    share|improve this answer














    import numpy as np
    import scipy.interpolate as interpolate

    def inverse_transform_sampling(data, n_bins, n_samples):
    hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
    cum_values = np.zeros(bin_edges.shape)
    cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
    inv_cdf = interpolate.interp1d(cum_values, bin_edges)
    r = np.random.rand(n_samples)
    return inv_cdf(r)


    So if we give our data sample that has a specific distribution, the inverse_transform_sampling function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples variable.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Jul 13 '15 at 12:35

























    answered Jul 13 '15 at 10:19









    ThePredator

    4,69133255




    4,69133255








    • 9




      Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
      – Jost
      Jan 20 '17 at 14:10














    • 9




      Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
      – Jost
      Jan 20 '17 at 14:10








    9




    9




    Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
    – Jost
    Jan 20 '17 at 14:10




    Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
    – Jost
    Jan 20 '17 at 14:10










    up vote
    1
    down vote













    I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).



    def metropolis_hastings(target_density, size=500000):
    burnin_size = 10000
    size += burnin_size
    x0 = np.array([[0, 0]])
    xt = x0
    samples =
    for i in range(size):
    xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
    accept_prob = (target_density(xt_candidate))/(target_density(xt))
    if np.random.uniform(0, 1) < accept_prob:
    xt = xt_candidate
    samples.append(xt)
    samples = np.array(samples[burnin_size:])
    samples = np.reshape(samples, [samples.shape[0], 2])
    return samples


    This function requires a function target_density which takes in a data-point and computes its probability.



    For details check-out this detailed answer of mine.






    share|improve this answer

























      up vote
      1
      down vote













      I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).



      def metropolis_hastings(target_density, size=500000):
      burnin_size = 10000
      size += burnin_size
      x0 = np.array([[0, 0]])
      xt = x0
      samples =
      for i in range(size):
      xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
      accept_prob = (target_density(xt_candidate))/(target_density(xt))
      if np.random.uniform(0, 1) < accept_prob:
      xt = xt_candidate
      samples.append(xt)
      samples = np.array(samples[burnin_size:])
      samples = np.reshape(samples, [samples.shape[0], 2])
      return samples


      This function requires a function target_density which takes in a data-point and computes its probability.



      For details check-out this detailed answer of mine.






      share|improve this answer























        up vote
        1
        down vote










        up vote
        1
        down vote









        I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).



        def metropolis_hastings(target_density, size=500000):
        burnin_size = 10000
        size += burnin_size
        x0 = np.array([[0, 0]])
        xt = x0
        samples =
        for i in range(size):
        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
        accept_prob = (target_density(xt_candidate))/(target_density(xt))
        if np.random.uniform(0, 1) < accept_prob:
        xt = xt_candidate
        samples.append(xt)
        samples = np.array(samples[burnin_size:])
        samples = np.reshape(samples, [samples.shape[0], 2])
        return samples


        This function requires a function target_density which takes in a data-point and computes its probability.



        For details check-out this detailed answer of mine.






        share|improve this answer












        I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).



        def metropolis_hastings(target_density, size=500000):
        burnin_size = 10000
        size += burnin_size
        x0 = np.array([[0, 0]])
        xt = x0
        samples =
        for i in range(size):
        xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
        accept_prob = (target_density(xt_candidate))/(target_density(xt))
        if np.random.uniform(0, 1) < accept_prob:
        xt = xt_candidate
        samples.append(xt)
        samples = np.array(samples[burnin_size:])
        samples = np.reshape(samples, [samples.shape[0], 2])
        return samples


        This function requires a function target_density which takes in a data-point and computes its probability.



        For details check-out this detailed answer of mine.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered 2 days ago









        Abdul Fatir

        4,00421641




        4,00421641






























             

            draft saved


            draft discarded



















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f21100716%2ffast-arbitrary-distribution-random-sampling%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Costa Masnaga

            Fotorealismo

            Sidney Franklin