Keywords: identifiability | mcmc | bayesian
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")
We generate some test data from $N(0,1)$:
from scipy.stats import norm
data = norm.rvs(size=100)
data
array([ 1.36915564e+00, 1.05529290e+00, -4.60818168e-01,
4.43641268e-01, 1.04113884e+00, 5.37649494e-01,
7.60942560e-01, -1.19804968e+00, 2.60566303e-01,
-1.59689277e-01, 1.20547931e+00, 5.49728756e-01,
-3.98610594e-01, 1.17620621e+00, -1.02786937e-01,
5.69037802e-01, -6.01246985e-01, -1.13331329e+00,
8.54294530e-01, -3.08324755e-01, 1.70618430e-01,
4.51807215e-01, -9.09119383e-02, -1.78929328e-01,
5.08269848e-01, -1.24816874e+00, 4.75595913e-01,
1.54785631e+00, -4.71245561e-01, 1.62311337e+00,
-3.41351283e-01, -1.80469802e-01, 2.11632172e+00,
8.41353133e-01, -7.59104066e-01, -1.55689174e+00,
-2.41292745e-01, 2.24845053e-01, 3.91140426e-01,
-6.85331082e-01, 5.79668372e-01, 8.36376400e-01,
-2.54014208e-01, 1.75048511e+00, -3.77872885e-01,
-1.25172135e+00, -2.17600397e-01, 3.15190627e-01,
3.09352205e-01, 5.82187822e-03, 8.46971134e-01,
-1.27378792e+00, -1.58238529e+00, 3.79882049e-01,
4.05398087e-01, -5.24250939e-01, 1.82095389e-01,
-1.44264482e+00, -8.30774322e-01, -1.53947998e+00,
3.71236071e-01, -8.84748037e-01, 5.15176219e-01,
2.75972541e-01, -7.00062965e-01, 1.48180541e+00,
2.61253233e-01, -1.14039049e-01, 8.74695837e-01,
2.92856746e+00, -9.60566331e-01, 1.50764549e-01,
-1.95244936e-03, 6.28764490e-01, 9.96449749e-01,
6.79706207e-01, 1.79320769e-01, 5.80139066e-01,
-5.35478677e-01, 1.42260090e+00, -1.54703643e-01,
3.67620982e-01, 6.78943636e-01, -8.96368493e-01,
-4.90099004e-01, -7.11463855e-01, -1.57853576e+00,
2.33149688e+00, -6.36936390e-01, 4.93011087e-01,
-1.55102354e-01, 6.52594170e-01, 2.07283645e+00,
-1.41202558e+00, -7.99693611e-01, -5.45509876e-01,
1.20850780e+00, 7.32805993e-01, -6.08890816e-01,
4.91920477e-01])
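Note that these draws change on every run. A minimal sketch of how one might fix the seed for reproducibility (the seed value 42 is an arbitrary choice, not something used above):

np.random.seed(42)           # arbitrary seed, purely for reproducibility
data = norm.rvs(size=100)    # scipy draws from numpy's global random state here

# sanity check: sample moments should be roughly 0 and 1
print(data.mean(), data.std())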
We fit this data using the following model:
\[
y \sim N(\mu, \sigma) \\
\mu = \alpha_1 + \alpha_2 \\
\alpha_1 \sim \text{Unif}(-\infty, \infty) \\
\alpha_2 \sim \text{Unif}(-\infty, \infty) \\
\sigma \sim \text{HalfCauchy}(0, 1)
\]
import pymc3 as pm
In the code below the improper flat priors are approximated by $Unif(-10^6, 10^6)$. In our sampler we have chosen njobs=2, which runs two separate chains in two parallel processes.
with pm.Model() as ni:
    sigma = pm.HalfCauchy("sigma", beta=1)
    alpha1 = pm.Uniform('alpha1', lower=-10**6, upper=10**6)
    alpha2 = pm.Uniform('alpha2', lower=-10**6, upper=10**6)
    mu = pm.Deterministic("mu", alpha1 + alpha2)
    y = pm.Normal("data", mu=mu, sd=sigma, observed=data)
    stepper = pm.Metropolis()
    traceni = pm.sample(100000, step=stepper, njobs=2)
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Metropolis: [alpha2]
>Metropolis: [alpha1]
>Metropolis: [sigma]
Sampling 2 chains: 100%|██████████| 201000/201000 [01:09<00:00, 2875.45draws/s]
The gelman-rubin statistic is larger than 1.2 for some parameters.
The estimated number of effective samples is smaller than 200 for some parameters.
pm.model_to_graphviz(ni)
pm.summary(traceni)
|  | mean | sd | mc_error | hpd_2.5 | hpd_97.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| sigma | 0.929859 | 0.067029 | 0.000488 | 0.798521 | 1.059213 | 21927.306462 | 1.000014 |
| alpha1 | 0.965169 | 1.275430 | 0.125124 | -1.363886 | 3.242092 | 3.084729 | 1.371864 |
| alpha2 | -0.829637 | 1.276056 | 0.125050 | -3.128876 | 1.486237 | 3.073467 | 1.372306 |
| mu | 0.135532 | 0.091854 | 0.001345 | -0.037689 | 0.322161 | 4918.178358 | 1.000050 |
pm.traceplot(traceni);
Look at the traces for $\alpha_1$ and $\alpha_2$: they are bad and, worse, they look entirely different for the two chains. Despite this, $\mu$ looks totally fine. The correlations in the trace make the problem explicit:
df=pm.trace_to_dataframe(traceni)
df.corr()
|  | sigma | alpha1 | alpha2 | mu |
|---|---|---|---|---|
| sigma | 1.000000 | 0.001891 | -0.002385 | -0.006882 |
| alpha1 | 0.001891 | 1.000000 | -0.997408 | 0.029191 |
| alpha2 | -0.002385 | -0.997408 | 1.000000 | 0.042806 |
| mu | -0.006882 | 0.029191 | 0.042806 | 1.000000 |
Just like in our uncentered regression example, we have $\alpha_1$ and $\alpha_2$ sharing information: they are totally negatively correlated and unidentifiable. Indeed our intuition probably told us as much.
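To see the degeneracy directly, we can scatter posterior draws of $\alpha_1$ against $\alpha_2$; they should collapse onto a ridge where $\alpha_1 + \alpha_2$ is roughly the sample mean. A quick sketch using the trace dataframe df built above (the thinning factor of 100 is an arbitrary choice to keep the plot readable):

thinned = df.iloc[::100]                 # thin the trace so the scatter stays legible
plt.scatter(thinned['alpha1'], thinned['alpha2'], alpha=0.3, s=10)
plt.xlabel('alpha1')
plt.ylabel('alpha2')
plt.title('Draws collapse onto a ridge where alpha1 + alpha2 is constant');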
pm.autocorrplot(traceni);
A look at the effective number of samples across the two chains tells us that we have only about three effective samples each for $\alpha_1$ and $\alpha_2$.
pm.effective_n(traceni)
{'alpha1': 3.0847290976843218,
'alpha2': 3.0734670584705572,
'mu': 4918.1783580295532,
'sigma': 21927.306461639713}
The Gelman-Rubin statistic is awful for them. No convergence.
pm.gelman_rubin(traceni)
{'alpha1': 1.3718637421441382,
'alpha2': 1.3723057686966671,
'mu': 1.0000502933374407,
'sigma': 1.0000144819583978}
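As a reminder of what this statistic measures, here is a rough sketch of the (non-split) Gelman-Rubin computation for a single parameter, using only numpy; it illustrates the formula rather than reproducing pm.gelman_rubin exactly:

def rhat(chains):
    # chains: list of equal-length 1-D arrays, one per chain
    n = len(chains[0])
    chain_means = np.array([c.mean() for c in chains])
    chain_vars = np.array([c.var(ddof=1) for c in chains])
    W = chain_vars.mean()                    # average within-chain variance
    B = n * chain_means.var(ddof=1)          # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

# compare the two chains of alpha1
alpha1_chains = [traceni.get_values('alpha1', chains=[c]) for c in traceni.chains]
rhat(alpha1_chains)     # far from 1, in line with pm.gelman_rubin above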
It's going to be hard to break this unidentifiability. We try by using strong Normal priors that pull $\alpha_1$ towards $+5$ and $\alpha_2$ towards $-5$.
with pm.Model() as ni2:
    sigma = pm.HalfCauchy("sigma", beta=1)
    alpha1 = pm.Normal('alpha1', mu=5, sd=1)
    alpha2 = pm.Normal('alpha2', mu=-5, sd=1)
    mu = pm.Deterministic("mu", alpha1 + alpha2)
    y = pm.Normal("data", mu=mu, sd=sigma, observed=data)
    #stepper = pm.Metropolis()
    #traceni2 = pm.sample(100000, step=stepper, njobs=2)
    traceni2 = pm.sample(100000)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [alpha2, alpha1, sigma]
Sampling 2 chains: 100%|██████████| 201000/201000 [08:29<00:00, 394.59draws/s]
pm.model_to_graphviz(ni2)
Notice we are now using the built-in NUTS sampler. It takes longer per draw but explores the distributions far better. This is directly related to our priors confining the parameters to sensible regions; I could not even get NUTS to run on the previous model in any reasonable time.
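To get a feel for how much these priors help, note that under $\alpha_1 \sim N(5,1)$ and $\alpha_2 \sim N(-5,1)$ the implied prior on $\mu = \alpha_1 + \alpha_2$ is $N(0, \sqrt{2})$, already tight around the true mean of the data. A quick check by simulation (the sample size of 100000 is arbitrary):

a1 = np.random.normal(5, 1, size=100000)    # draws from the alpha1 prior
a2 = np.random.normal(-5, 1, size=100000)   # draws from the alpha2 prior
mu_prior = a1 + a2
print(mu_prior.mean(), mu_prior.std())      # roughly 0 and 1.41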
pm.traceplot(traceni2);
Our extremely strong priors have helped us do a much better job.
pm.summary(traceni2)
|  | mean | sd | mc_error | hpd_2.5 | hpd_97.5 | n_eff | Rhat |
|---|---|---|---|---|---|---|---|
| alpha1 | 5.066310 | 0.708625 | 0.002569 | 3.678977 | 6.456753 | 77586.667382 | 1.000020 |
| alpha2 | -4.931296 | 0.708567 | 0.002562 | -6.333776 | -3.554861 | 77570.663271 | 1.000019 |
| sigma | 0.930199 | 0.066613 | 0.000198 | 0.805719 | 1.064302 | 96892.104502 | 0.999996 |
| mu | 0.135013 | 0.093098 | 0.000204 | -0.045187 | 0.319260 | 195950.078927 | 0.999996 |
Our effective sample sizes are now large and the traces mix well: things are much better.
pm.effective_n(traceni2)
{'alpha1': 77586.667381948821,
'alpha2': 77570.663270675213,
'mu': 195950.07892697665,
'sigma': 96892.104501958092}
pm.gelman_rubin(traceni2)
{'alpha1': 1.0000203740430271,
'alpha2': 1.000018798766553,
'mu': 0.99999646960401545,
'sigma': 0.99999594999714658}
…and this shows in our Gelman-Rubin statistics as well…
pm.trace_to_dataframe(traceni2).corr()
|  | alpha1 | alpha2 | sigma | mu |
|---|---|---|---|---|
| alpha1 | 1.000000 | -0.991369 | -0.003822 | 0.066316 |
| alpha2 | -0.991369 | 1.000000 | 0.003936 | 0.065067 |
| sigma | -0.003822 | 0.003936 | 1.000000 | 0.000868 |
| mu | 0.066316 | 0.065067 | 0.000868 | 1.000000 |
…but the correlation shows that the unidentifiability is still there: $\alpha_1$ and $\alpha_2$ remain almost perfectly negatively correlated. This reflects the fundamental unidentifiability and sharing of information in our model: since $\mu = \alpha_1 + \alpha_2$, all the priors do is artificially peg one of the parameters, and once one is pegged the other is too, because of the symmetry.
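The cleaner fix, which the analysis above does not show, is to reparameterize and sample the identified quantity $\mu$ directly (equivalently, drop one of the $\alpha$s). A minimal sketch under an arbitrary weakly-informative $N(0, 10)$ prior on $\mu$:

with pm.Model() as identified:
    sigma = pm.HalfCauchy("sigma", beta=1)
    mu = pm.Normal("mu", mu=0, sd=10)    # one location parameter instead of two
    y = pm.Normal("data", mu=mu, sd=sigma, observed=data)
    trace_identified = pm.sample(10000)

With a single location parameter there is no ridge for the sampler to wander along, so the chains mix quickly and the correlation problem disappears.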