Download Notebook
Contents
- Generating samples from N(0,1)
- Rejection Sampling Implementation
- Expectation values
- Expectations from Importance Sampling
- Some notes on variance
- A different importance sampling estimator
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")
Generating samples from N(0,1)
In what I call my “fake-news” examples, we are going to generate samples from $N(0,1)$ using a t-distribution with 2 degrees of freedom to majorize it.
from scipy.stats import norm, t
n = norm()
t2 = t(2)
Here is a plot of the two distributions. Note that since the t has fatter tails, it is shorter than the normal in the middle. We also make a rugplot of samples from the t, since we will be using such samples below.
with sns.plotting_context('poster'):
    xx = np.linspace(-6., 6., 100)
    plt.plot(xx, n.pdf(xx), label='N(0,1)');
    plt.plot(xx, t2.pdf(xx), label='t_2');
    t2samps = t2.rvs(200)
    ax = plt.gca()
    sns.rugplot(t2samps, ax=ax);
    plt.xlim(-6, 6)
    plt.legend();
We are interested in the ratio $f/g$ for these distributions, since we will want to set $M$ to the supremum of this ratio over the domain we are interested in.
fbyg = n.pdf(xx)/t2.pdf(xx)
plt.plot(xx, fbyg);
As you can see, the supremum is achieved at two (symmetric) points. Let's pick one of them, and thus obtain $M$.
M = fbyg[np.argmax(fbyg)]
M
1.2569255634535634
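The grid above may slightly undershoot the true supremum. As a sanity check (a minimal sketch I'm adding, not part of the original notebook), we can refine $M$ with a one-dimensional bounded optimizer; since the ratio is symmetric, searching $x > 0$ is enough.
from scipy.optimize import minimize_scalar
# maximize f/g by minimizing its negative over (0, 6)
neg_ratio = lambda x: -n.pdf(x) / t2.pdf(x)
res = minimize_scalar(neg_ratio, bounds=(0., 6.), method='bounded')
print(-res.fun, res.x)  # the supremum (at or just above the grid value) and where it is attained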
Let's rescale the t by $M$ and plot it so that we can see the majorization.
with sns.plotting_context('poster'):
    xx = np.linspace(-6., 6., 100)
    plt.plot(xx, n.pdf(xx), label='N(0,1)');
    plt.plot(xx, M*t2.pdf(xx), label='M x t_2');
    plt.xlim(-6, 6)
    plt.legend();
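As a quick numerical check (a small sketch on the same grid as above), $M g$ should dominate $f$ everywhere in the domain:
# verify the majorization M*g >= f on the grid, allowing for floating-point slop
print(np.all(M*t2.pdf(xx) >= n.pdf(xx) - 1e-12))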
Rejection Sampling Implementation
# domain limits
xmin = -6 # the lower limit of our domain
xmax = 6 # the upper limit of our domain
N = 10000 # the total number of samples we wish to generate
accepted = 0 # the number of accepted samples
samples = np.zeros(N)
count = 0 # the total count of proposals
outside = 0
f = norm.pdf
g = t2.pdf
# generation loop
while (accepted < N):
    while 1:
        xproposal = t2.rvs()
        if xproposal > xmin and xproposal < xmax:
            break
        outside += 1
    # pick a uniform number on [0, 1)
    y = np.random.uniform(0, 1)
    # Do the accept/reject comparison
    if y < f(xproposal)/(M*g(xproposal)):
        samples[accepted] = xproposal
        accepted += 1
    count += 1
print("Count", count, "Accepted", accepted, "Outside", outside)
# plot the histogram
sns.distplot(samples);
plt.plot(xx, n.pdf(xx), label='N(0,1)');
# turn on the legend
plt.legend();
Count 12149 Accepted 10000 Outside 307
Notice that roughly a fifth of the proposals were rejected. This is because the probability of acceptance is $1/M \approx 0.8$.
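A couple of quick sanity checks (a sketch I'm adding here): the empirical acceptance rate against $1/M$, and the sample moments against those of $N(0,1)$. The empirical rate comes out slightly above $1/M$ because proposals falling outside $[-6, 6]$ are redrawn rather than counted.
# empirical acceptance rate vs the theoretical 1/M
print("acceptance rate:", accepted / count, "1/M:", 1./M)
# the accepted samples should look like draws from N(0,1)
print("mean:", samples.mean(), "std:", samples.std())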
Expectation values
Let's calculate the expectation value $E_f[h]$ using the rejection samples we just obtained.
Here is an $h$:
hfun = lambda x: np.log(x**2) - 2*x**2 +2*x +100
plt.plot(xx, hfun(xx), label='h');
plt.plot(xx, f(xx)*hfun(xx), label='f*h');
plt.plot(xx, 100*g(xx), label = '100*g')
plt.legend();
and here is its expectation:
exp_f_of_h = np.mean(hfun(samples))
exp_f_of_h
96.72828784400869
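As a cross-check (a sketch using scipy.integrate.quad, not in the original), we can also compute $E_f[h]$ by numerical quadrature over the same domain; the logarithmic singularity of $h$ at 0 is integrable, and we just tell quad where it is.
from scipy.integrate import quad
# integrate h(x) f(x) over [-6, 6]; points=[0] flags the integrable log singularity
truth, err = quad(lambda x: hfun(x)*n.pdf(x), xmin, xmax, points=[0])
print(truth)  # should be close to the rejection-sampling estimate above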
Expectations from Importance Sampling
Remember that in importance sampling we are calculating
\[E_f[h(x)] = E_g\left[ \frac{f(x)}{g(x)} h(x)\right] = E_g[w(x)h(x)]\]
where
\[w(x) = f(x)/g(x).\]
Since we restrict ourselves to the domain $[-6, 6]$, we rescale the weights by the ratio of the probability mass that $g$ and $f$ put on this interval:
t2.cdf(6) - t2.cdf(-6), n.cdf(6) - n.cdf(-6)
(0.97332852678457527, 0.9999999980268246)
normer = (t2.cdf(6) - t2.cdf(-6))/(n.cdf(6) - n.cdf(-6))
normer
0.97332852870512321
t2samps = t2.rvs(10300)
t2samps = t2samps[ (t2samps<xmax) & (t2samps>xmin)]
print(t2samps.shape)
fbygfunc = lambda x: (normer*n.pdf(x))/t2.pdf(x)
expec = np.mean(hfun(t2samps)*fbygfunc(t2samps))
expec
(10005,)
96.597949242826999
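One more quick check (a sketch I'm adding): with this rescaling, the weights are the ratio of the two densities restricted to $[-6, 6]$, so they should average to about 1 under the truncated proposal.
# the mean importance weight should be close to 1
print(np.mean(fbygfunc(t2samps)))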
Some notes on variance
Let's check the “vanilla” Monte Carlo variance:
vanilla = []
for k in range(1000):
    vm = np.mean(hfun(n.rvs(2000)))
    vanilla.append(vm)
np.mean(vanilla), np.std(vanilla)
(96.730847038738176, 0.068037907988511573)
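This spread is just the usual $\sqrt{Var_f[h]/n}$ Monte Carlo rate with $n = 2000$. A quick empirical check (a sketch, estimating $Var_f[h]$ from a large sample):
# estimate Var_f[h] and compare sqrt(Var_f[h]/n) with the std observed above
big = hfun(n.rvs(100000))
print(np.std(big)/np.sqrt(2000))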
Let's compare the importance sampler's variance:
eval_is = []
for k in range(1000):
    t2samps = t2.rvs(1000)
    t2samps = t2samps[(t2samps < xmax) & (t2samps > xmin)]
    fbygfunc = lambda x: (normer*n.pdf(x))/t2.pdf(x)
    expec = np.mean(hfun(t2samps)*fbygfunc(t2samps))
    eval_is.append(expec)
np.mean(eval_is), np.std(eval_is)
(96.774862122671578, 1.0746718309648289)
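Much of this spread is easy to trace (a rough diagnostic I'm adding, not in the original notebook): $h$ is dominated by the constant $+100$, so the plain estimator inherits roughly $100$ times the fluctuation of the mean weight.
# per-batch standard error of the mean weight, scaled by the ~100 offset in h
s = t2.rvs(1000)
s = s[(s < xmax) & (s > xmin)]
w = fbygfunc(s)
print(100.*np.std(w)/np.sqrt(len(w)))  # same order as the importance-sampling std above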
A different importance sampling estimator
Wow, look at the variance, it's really high. An alternative to the standard importance sampling estimator ($(1/N) \sum_{x_i \sim g} w(x_i) h(x_i)$) is the “self-normalized” estimator:
\[\frac{\sum_{x_i \sim g} w(x_i) h(x_i)}{\sum_{x_i \sim g} w(x_i)}\]
Notice that in the limit $N \to \infty$ the average weight $\frac{1}{N}\sum_{x_i \sim g} w(x_i)$ converges to 1, so the whole expression converges to our usual estimator. But for a finite number of samples (and worse for smaller sample sizes), this latter estimator is much more stable, leading to lower variance. Notice also that any normalization constant in $w$ cancels between numerator and denominator, which is why we can drop normer below, and why the $+100$ offset in $h$ is no longer multiplied by a fluctuating mean weight. Let's try it (we have only 1000 samples):
eval_is = []
for k in range(1000):
    t2samps = t2.rvs(1000)
    t2samps = t2samps[(t2samps < xmax) & (t2samps > xmin)]
    fbygfunc = lambda x: n.pdf(x)/t2.pdf(x)
    expec = np.mean(hfun(t2samps)*fbygfunc(t2samps))
    expec = expec/np.mean(fbygfunc(t2samps))
    eval_is.append(expec)
np.mean(eval_is), np.std(eval_is)
(96.734905750040383, 0.085451606923249823)
I leave further experimentation, including trying other $h$'s and $g$'s, to you.