Fast arbitrary distribution random sampling
up vote
13
down vote
favorite
The random
module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss
will sample random point from a normal distribution with a given mean and sigma values.
I'm looking for a way to extract a number N
of random samples between a given interval using my own distribution as fast as possible in python
. This is what I mean:
def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f
# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)
where ran_func_sample
is what I'm after and a, b
are the limits from which to draw the samples. Is there anything of that sort in python
?
python performance random
|
show 2 more comments
up vote
13
down vote
favorite
The random
module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss
will sample random point from a normal distribution with a given mean and sigma values.
I'm looking for a way to extract a number N
of random samples between a given interval using my own distribution as fast as possible in python
. This is what I mean:
def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f
# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)
where ran_func_sample
is what I'm after and a, b
are the limits from which to draw the samples. Is there anything of that sort in python
?
python performance random
You can just call your function N times. However, you still need to specify what distribution you want yourx
values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28
My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30
2
Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the randomx
values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32
I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving ax**2
function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34
1
I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06
|
show 2 more comments
up vote
13
down vote
favorite
up vote
13
down vote
favorite
The random
module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss
will sample random point from a normal distribution with a given mean and sigma values.
I'm looking for a way to extract a number N
of random samples between a given interval using my own distribution as fast as possible in python
. This is what I mean:
def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f
# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)
where ran_func_sample
is what I'm after and a, b
are the limits from which to draw the samples. Is there anything of that sort in python
?
python performance random
The random
module (http://docs.python.org/2/library/random.html) has several fixed functions to randomly sample from. For example random.gauss
will sample random point from a normal distribution with a given mean and sigma values.
I'm looking for a way to extract a number N
of random samples between a given interval using my own distribution as fast as possible in python
. This is what I mean:
def my_dist(x):
# Some distribution, assume c1,c2,c3 and c4 are known.
f = c1*exp(-((x-c2)**c3)/c4)
return f
# Draw N random samples from my distribution between given limits a,b.
N = 1000
N_rand_samples = ran_func_sample(my_dist, a, b, N)
where ran_func_sample
is what I'm after and a, b
are the limits from which to draw the samples. Is there anything of that sort in python
?
python performance random
python performance random
edited Sep 16 '16 at 15:38
asked Jan 13 '14 at 20:26
Gabriel
10.8k33114228
10.8k33114228
You can just call your function N times. However, you still need to specify what distribution you want yourx
values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28
My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30
2
Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the randomx
values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32
I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving ax**2
function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34
1
I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06
|
show 2 more comments
You can just call your function N times. However, you still need to specify what distribution you want yourx
values to be chosen from.
– BrenBarn
Jan 13 '14 at 20:28
My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30
2
Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the randomx
values to be chosen uniformly from that interval, or in some other way?
– BrenBarn
Jan 13 '14 at 20:32
I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving ax**2
function and not a distribution. I'll try to fix that now.
– Gabriel
Jan 13 '14 at 20:34
1
I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06
You can just call your function N times. However, you still need to specify what distribution you want your
x
values to be chosen from.– BrenBarn
Jan 13 '14 at 20:28
You can just call your function N times. However, you still need to specify what distribution you want your
x
values to be chosen from.– BrenBarn
Jan 13 '14 at 20:28
My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30
My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30
2
2
Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random
x
values to be chosen uniformly from that interval, or in some other way?– BrenBarn
Jan 13 '14 at 20:32
Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random
x
values to be chosen uniformly from that interval, or in some other way?– BrenBarn
Jan 13 '14 at 20:32
I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a
x**2
function and not a distribution. I'll try to fix that now.– Gabriel
Jan 13 '14 at 20:34
I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a
x**2
function and not a distribution. I'll try to fix that now.– Gabriel
Jan 13 '14 at 20:34
1
1
I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06
I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06
|
show 2 more comments
4 Answers
4
active
oldest
votes
up vote
11
down vote
accepted
You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].
After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:
[inverted_function(random.random()) for x in range(1000)]
More on Inverse Transform Sampling:
- http://en.wikipedia.org/wiki/Inverse_transform_sampling
Also, there is a good question on StackOverflow related to the topic:
- Pythonic way to select list elements with different probability
1
I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59
add a comment |
up vote
8
down vote
This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.
class Distribution(object):
"""
draws samples from a one dimensional probability distribution,
by means of inversion of a discrete inverstion of a cumulative density function
the pdf can be sorted first to prevent numerical error in the cumulative sum
this is set as default; for big density functions with high contrast,
it is absolutely necessary, and for small density functions,
the overhead is minimal
a call to this distibution object returns indices into density array
"""
def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
self.shape = pdf.shape
self.pdf = pdf.ravel()
self.sort = sort
self.interpolation = interpolation
self.transform = transform
#a pdf can not be negative
assert(np.all(pdf>=0))
#sort the pdf by magnitude
if self.sort:
self.sortindex = np.argsort(self.pdf, axis=None)
self.pdf = self.pdf[self.sortindex]
#construct the cumulative distribution function
self.cdf = np.cumsum(self.pdf)
@property
def ndim(self):
return len(self.shape)
@property
def sum(self):
"""cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
return self.cdf[-1]
def __call__(self, N):
"""draw """
#pick numbers which are uniformly random over the cumulative distribution function
choice = np.random.uniform(high = self.sum, size = N)
#find the indices corresponding to this point on the CDF
index = np.searchsorted(self.cdf, choice)
#if necessary, map the indices back to their original ordering
if self.sort:
index = self.sortindex[index]
#map back to multi-dimensional indexing
index = np.unravel_index(index, self.shape)
index = np.vstack(index)
#is this a discrete or piecewise continuous distribution?
if self.interpolation:
index = index + np.random.uniform(size=index.shape)
return self.transform(index)
if __name__=='__main__':
shape = 3,3
pdf = np.ones(shape)
pdf[1]=0
dist = Distribution(pdf, transform=lambda i:i-1.5)
print dist(10)
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
And as a more real-world relevant example:
x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:] #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45
add a comment |
up vote
3
down vote
import numpy as np
import scipy.interpolate as interpolate
def inverse_transform_sampling(data, n_bins, n_samples):
hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
cum_values = np.zeros(bin_edges.shape)
cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
inv_cdf = interpolate.interp1d(cum_values, bin_edges)
r = np.random.rand(n_samples)
return inv_cdf(r)
So if we give our data sample that has a specific distribution, the inverse_transform_sampling
function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples
variable.
9
Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10
add a comment |
up vote
1
down vote
I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).
def metropolis_hastings(target_density, size=500000):
burnin_size = 10000
size += burnin_size
x0 = np.array([[0, 0]])
xt = x0
samples =
for i in range(size):
xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
accept_prob = (target_density(xt_candidate))/(target_density(xt))
if np.random.uniform(0, 1) < accept_prob:
xt = xt_candidate
samples.append(xt)
samples = np.array(samples[burnin_size:])
samples = np.reshape(samples, [samples.shape[0], 2])
return samples
This function requires a function target_density
which takes in a data-point and computes its probability.
For details check-out this detailed answer of mine.
add a comment |
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
11
down vote
accepted
You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].
After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:
[inverted_function(random.random()) for x in range(1000)]
More on Inverse Transform Sampling:
- http://en.wikipedia.org/wiki/Inverse_transform_sampling
Also, there is a good question on StackOverflow related to the topic:
- Pythonic way to select list elements with different probability
1
I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59
add a comment |
up vote
11
down vote
accepted
You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].
After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:
[inverted_function(random.random()) for x in range(1000)]
More on Inverse Transform Sampling:
- http://en.wikipedia.org/wiki/Inverse_transform_sampling
Also, there is a good question on StackOverflow related to the topic:
- Pythonic way to select list elements with different probability
1
I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59
add a comment |
up vote
11
down vote
accepted
up vote
11
down vote
accepted
You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].
After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:
[inverted_function(random.random()) for x in range(1000)]
More on Inverse Transform Sampling:
- http://en.wikipedia.org/wiki/Inverse_transform_sampling
Also, there is a good question on StackOverflow related to the topic:
- Pythonic way to select list elements with different probability
You need to use Inverse transform sampling method to get random values distributed according to a law you want. Using this method you can just apply inverted function
to random numbers having standard uniform distribution in the interval [0,1].
After you find the inverted function, you get 1000 numbers distributed according to the needed distribution this obvious way:
[inverted_function(random.random()) for x in range(1000)]
More on Inverse Transform Sampling:
- http://en.wikipedia.org/wiki/Inverse_transform_sampling
Also, there is a good question on StackOverflow related to the topic:
- Pythonic way to select list elements with different probability
edited May 23 '17 at 12:17
Community♦
11
11
answered Jan 13 '14 at 20:45
Igor Chubin
35.9k784108
35.9k784108
1
I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59
add a comment |
1
I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59
1
1
I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59
I implemented a function doing the inverse transform sampling with a help of SymPy and asked for a review on a Code Review Stack Exchange: Link. Maybe someone can find it helpful.
– Georgy
Jun 11 at 15:59
add a comment |
up vote
8
down vote
This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.
class Distribution(object):
"""
draws samples from a one dimensional probability distribution,
by means of inversion of a discrete inverstion of a cumulative density function
the pdf can be sorted first to prevent numerical error in the cumulative sum
this is set as default; for big density functions with high contrast,
it is absolutely necessary, and for small density functions,
the overhead is minimal
a call to this distibution object returns indices into density array
"""
def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
self.shape = pdf.shape
self.pdf = pdf.ravel()
self.sort = sort
self.interpolation = interpolation
self.transform = transform
#a pdf can not be negative
assert(np.all(pdf>=0))
#sort the pdf by magnitude
if self.sort:
self.sortindex = np.argsort(self.pdf, axis=None)
self.pdf = self.pdf[self.sortindex]
#construct the cumulative distribution function
self.cdf = np.cumsum(self.pdf)
@property
def ndim(self):
return len(self.shape)
@property
def sum(self):
"""cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
return self.cdf[-1]
def __call__(self, N):
"""draw """
#pick numbers which are uniformly random over the cumulative distribution function
choice = np.random.uniform(high = self.sum, size = N)
#find the indices corresponding to this point on the CDF
index = np.searchsorted(self.cdf, choice)
#if necessary, map the indices back to their original ordering
if self.sort:
index = self.sortindex[index]
#map back to multi-dimensional indexing
index = np.unravel_index(index, self.shape)
index = np.vstack(index)
#is this a discrete or piecewise continuous distribution?
if self.interpolation:
index = index + np.random.uniform(size=index.shape)
return self.transform(index)
if __name__=='__main__':
shape = 3,3
pdf = np.ones(shape)
pdf[1]=0
dist = Distribution(pdf, transform=lambda i:i-1.5)
print dist(10)
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
And as a more real-world relevant example:
x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:] #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45
add a comment |
up vote
8
down vote
This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.
class Distribution(object):
"""
draws samples from a one dimensional probability distribution,
by means of inversion of a discrete inverstion of a cumulative density function
the pdf can be sorted first to prevent numerical error in the cumulative sum
this is set as default; for big density functions with high contrast,
it is absolutely necessary, and for small density functions,
the overhead is minimal
a call to this distibution object returns indices into density array
"""
def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
self.shape = pdf.shape
self.pdf = pdf.ravel()
self.sort = sort
self.interpolation = interpolation
self.transform = transform
#a pdf can not be negative
assert(np.all(pdf>=0))
#sort the pdf by magnitude
if self.sort:
self.sortindex = np.argsort(self.pdf, axis=None)
self.pdf = self.pdf[self.sortindex]
#construct the cumulative distribution function
self.cdf = np.cumsum(self.pdf)
@property
def ndim(self):
return len(self.shape)
@property
def sum(self):
"""cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
return self.cdf[-1]
def __call__(self, N):
"""draw """
#pick numbers which are uniformly random over the cumulative distribution function
choice = np.random.uniform(high = self.sum, size = N)
#find the indices corresponding to this point on the CDF
index = np.searchsorted(self.cdf, choice)
#if necessary, map the indices back to their original ordering
if self.sort:
index = self.sortindex[index]
#map back to multi-dimensional indexing
index = np.unravel_index(index, self.shape)
index = np.vstack(index)
#is this a discrete or piecewise continuous distribution?
if self.interpolation:
index = index + np.random.uniform(size=index.shape)
return self.transform(index)
if __name__=='__main__':
shape = 3,3
pdf = np.ones(shape)
pdf[1]=0
dist = Distribution(pdf, transform=lambda i:i-1.5)
print dist(10)
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
And as a more real-world relevant example:
x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:] #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45
add a comment |
up vote
8
down vote
up vote
8
down vote
This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.
class Distribution(object):
"""
draws samples from a one dimensional probability distribution,
by means of inversion of a discrete inverstion of a cumulative density function
the pdf can be sorted first to prevent numerical error in the cumulative sum
this is set as default; for big density functions with high contrast,
it is absolutely necessary, and for small density functions,
the overhead is minimal
a call to this distibution object returns indices into density array
"""
def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
self.shape = pdf.shape
self.pdf = pdf.ravel()
self.sort = sort
self.interpolation = interpolation
self.transform = transform
#a pdf can not be negative
assert(np.all(pdf>=0))
#sort the pdf by magnitude
if self.sort:
self.sortindex = np.argsort(self.pdf, axis=None)
self.pdf = self.pdf[self.sortindex]
#construct the cumulative distribution function
self.cdf = np.cumsum(self.pdf)
@property
def ndim(self):
return len(self.shape)
@property
def sum(self):
"""cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
return self.cdf[-1]
def __call__(self, N):
"""draw """
#pick numbers which are uniformly random over the cumulative distribution function
choice = np.random.uniform(high = self.sum, size = N)
#find the indices corresponding to this point on the CDF
index = np.searchsorted(self.cdf, choice)
#if necessary, map the indices back to their original ordering
if self.sort:
index = self.sortindex[index]
#map back to multi-dimensional indexing
index = np.unravel_index(index, self.shape)
index = np.vstack(index)
#is this a discrete or piecewise continuous distribution?
if self.interpolation:
index = index + np.random.uniform(size=index.shape)
return self.transform(index)
if __name__=='__main__':
shape = 3,3
pdf = np.ones(shape)
pdf[1]=0
dist = Distribution(pdf, transform=lambda i:i-1.5)
print dist(10)
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
And as a more real-world relevant example:
x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:] #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
This code implements the sampling of n-d discrete probability distributions. By setting a flag on the object, it can also be made to be used as a piecewise constant probability distribution, which can then be used to approximate arbitrary pdf's. Well, arbitrary pdfs with compact support; if you efficiently want to sample extremely long tails, a non-uniform description of the pdf would be required. But this is still efficient even for things like airy-point-spread functions (which I created it for, initially). The internal sorting of values is absolutely critical there to get accuracy; the many small values in the tails should contribute substantially, but they will get drowned out in fp accuracy without sorting.
class Distribution(object):
"""
draws samples from a one dimensional probability distribution,
by means of inversion of a discrete inverstion of a cumulative density function
the pdf can be sorted first to prevent numerical error in the cumulative sum
this is set as default; for big density functions with high contrast,
it is absolutely necessary, and for small density functions,
the overhead is minimal
a call to this distibution object returns indices into density array
"""
def __init__(self, pdf, sort = True, interpolation = True, transform = lambda x: x):
self.shape = pdf.shape
self.pdf = pdf.ravel()
self.sort = sort
self.interpolation = interpolation
self.transform = transform
#a pdf can not be negative
assert(np.all(pdf>=0))
#sort the pdf by magnitude
if self.sort:
self.sortindex = np.argsort(self.pdf, axis=None)
self.pdf = self.pdf[self.sortindex]
#construct the cumulative distribution function
self.cdf = np.cumsum(self.pdf)
@property
def ndim(self):
return len(self.shape)
@property
def sum(self):
"""cached sum of all pdf values; the pdf need not sum to one, and is imlpicitly normalized"""
return self.cdf[-1]
def __call__(self, N):
"""draw """
#pick numbers which are uniformly random over the cumulative distribution function
choice = np.random.uniform(high = self.sum, size = N)
#find the indices corresponding to this point on the CDF
index = np.searchsorted(self.cdf, choice)
#if necessary, map the indices back to their original ordering
if self.sort:
index = self.sortindex[index]
#map back to multi-dimensional indexing
index = np.unravel_index(index, self.shape)
index = np.vstack(index)
#is this a discrete or piecewise continuous distribution?
if self.interpolation:
index = index + np.random.uniform(size=index.shape)
return self.transform(index)
if __name__=='__main__':
shape = 3,3
pdf = np.ones(shape)
pdf[1]=0
dist = Distribution(pdf, transform=lambda i:i-1.5)
print dist(10)
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
And as a more real-world relevant example:
x = np.linspace(-100, 100, 512)
p = np.exp(-x**2)
pdf = p[:,None]*p[None,:] #2d gaussian
dist = Distribution(pdf, transform=lambda i:i-256)
print dist(1000000).mean(axis=1) #should be in the 1/sqrt(1e6) range
import matplotlib.pyplot as pp
pp.scatter(*dist(1000))
pp.show()
edited Jan 13 '14 at 21:53
answered Jan 13 '14 at 21:20
Eelco Hoogendoorn
6,26512234
6,26512234
Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45
add a comment |
Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45
Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45
Glad I could help. Does approximating the distribution as piecewise continuous suffice for your application? How fast this approach is depends on the resolution you aim for; generating the distribution is N log(N), and sampling has complexity N, with a low time constant. Though I havnt tested it, I could imagine it achieves sufficient accuracy much more efficiently in many scenarios, even where a closed form solution exists. But the main appeal to me is in the flexibility of the approach, permitting arbitrary distributions.
– Eelco Hoogendoorn
Jan 21 '14 at 22:45
add a comment |
up vote
3
down vote
import numpy as np
import scipy.interpolate as interpolate
def inverse_transform_sampling(data, n_bins, n_samples):
hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
cum_values = np.zeros(bin_edges.shape)
cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
inv_cdf = interpolate.interp1d(cum_values, bin_edges)
r = np.random.rand(n_samples)
return inv_cdf(r)
So if we give our data sample that has a specific distribution, the inverse_transform_sampling
function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples
variable.
9
Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10
add a comment |
up vote
3
down vote
import numpy as np
import scipy.interpolate as interpolate
def inverse_transform_sampling(data, n_bins, n_samples):
hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
cum_values = np.zeros(bin_edges.shape)
cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
inv_cdf = interpolate.interp1d(cum_values, bin_edges)
r = np.random.rand(n_samples)
return inv_cdf(r)
So if we give our data sample that has a specific distribution, the inverse_transform_sampling
function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples
variable.
9
Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10
add a comment |
up vote
3
down vote
up vote
3
down vote
import numpy as np
import scipy.interpolate as interpolate
def inverse_transform_sampling(data, n_bins, n_samples):
hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
cum_values = np.zeros(bin_edges.shape)
cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
inv_cdf = interpolate.interp1d(cum_values, bin_edges)
r = np.random.rand(n_samples)
return inv_cdf(r)
So if we give our data sample that has a specific distribution, the inverse_transform_sampling
function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples
variable.
import numpy as np
import scipy.interpolate as interpolate
def inverse_transform_sampling(data, n_bins, n_samples):
hist, bin_edges = np.histogram(data, bins=n_bins, density=True)
cum_values = np.zeros(bin_edges.shape)
cum_values[1:] = np.cumsum(hist*np.diff(bin_edges))
inv_cdf = interpolate.interp1d(cum_values, bin_edges)
r = np.random.rand(n_samples)
return inv_cdf(r)
So if we give our data sample that has a specific distribution, the inverse_transform_sampling
function will return a dataset with exactly the same distribution. Here the advantage is that we can get our own sample size by specifying it in the n_samples
variable.
edited Jul 13 '15 at 12:35
answered Jul 13 '15 at 10:19
ThePredator
4,69133255
4,69133255
9
Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10
add a comment |
9
Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10
9
9
Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10
Either you are the same person or you probably should have quoted the code's origin in this blog. Which has some nice explanatory graphics as well.
– Jost
Jan 20 '17 at 14:10
add a comment |
up vote
1
down vote
I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).
def metropolis_hastings(target_density, size=500000):
burnin_size = 10000
size += burnin_size
x0 = np.array([[0, 0]])
xt = x0
samples =
for i in range(size):
xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
accept_prob = (target_density(xt_candidate))/(target_density(xt))
if np.random.uniform(0, 1) < accept_prob:
xt = xt_candidate
samples.append(xt)
samples = np.array(samples[burnin_size:])
samples = np.reshape(samples, [samples.shape[0], 2])
return samples
This function requires a function target_density
which takes in a data-point and computes its probability.
For details check-out this detailed answer of mine.
add a comment |
up vote
1
down vote
I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).
def metropolis_hastings(target_density, size=500000):
burnin_size = 10000
size += burnin_size
x0 = np.array([[0, 0]])
xt = x0
samples =
for i in range(size):
xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
accept_prob = (target_density(xt_candidate))/(target_density(xt))
if np.random.uniform(0, 1) < accept_prob:
xt = xt_candidate
samples.append(xt)
samples = np.array(samples[burnin_size:])
samples = np.reshape(samples, [samples.shape[0], 2])
return samples
This function requires a function target_density
which takes in a data-point and computes its probability.
For details check-out this detailed answer of mine.
add a comment |
up vote
1
down vote
up vote
1
down vote
I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).
def metropolis_hastings(target_density, size=500000):
burnin_size = 10000
size += burnin_size
x0 = np.array([[0, 0]])
xt = x0
samples =
for i in range(size):
xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
accept_prob = (target_density(xt_candidate))/(target_density(xt))
if np.random.uniform(0, 1) < accept_prob:
xt = xt_candidate
samples.append(xt)
samples = np.array(samples[burnin_size:])
samples = np.reshape(samples, [samples.shape[0], 2])
return samples
This function requires a function target_density
which takes in a data-point and computes its probability.
For details check-out this detailed answer of mine.
I was in a similar situation but I wanted to sample from a multivariate distribution, so, I implemented a rudimentary version of Metropolis-Hastings (which is an MCMC method).
def metropolis_hastings(target_density, size=500000):
burnin_size = 10000
size += burnin_size
x0 = np.array([[0, 0]])
xt = x0
samples =
for i in range(size):
xt_candidate = np.array([np.random.multivariate_normal(xt[0], np.eye(2))])
accept_prob = (target_density(xt_candidate))/(target_density(xt))
if np.random.uniform(0, 1) < accept_prob:
xt = xt_candidate
samples.append(xt)
samples = np.array(samples[burnin_size:])
samples = np.reshape(samples, [samples.shape[0], 2])
return samples
This function requires a function target_density
which takes in a data-point and computes its probability.
For details check-out this detailed answer of mine.
answered 2 days ago
Abdul Fatir
4,00421641
4,00421641
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f21100716%2ffast-arbitrary-distribution-random-sampling%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You can just call your function N times. However, you still need to specify what distribution you want your
x
values to be chosen from.– BrenBarn
Jan 13 '14 at 20:28
My distribution is my function. I need to evaluate that function randomly N times between a certain interval.
– Gabriel
Jan 13 '14 at 20:30
2
Your function isn't a distribution. You need to decide what the distribution is on the arguments you call it with. If you want to pass it N random values "between a certain interval", where are you specifying that interval in your code example? Do you want the random
x
values to be chosen uniformly from that interval, or in some other way?– BrenBarn
Jan 13 '14 at 20:32
I forgot to specify the interval, I'll add it to the code. You are right, I explained myself poorly giving a
x**2
function and not a distribution. I'll try to fix that now.– Gabriel
Jan 13 '14 at 20:34
1
I have such code for discrete distributions. Everything can be approximated with a discrete distribution, and it makes things a lot simpler (though still nontrivial, to get numerical robustness). If that helps you I could wrap it up.
– Eelco Hoogendoorn
Jan 13 '14 at 21:06