Poisson Regression - tools on islands, part 1

Setting up the model and data
- M1
Implementation in pymc
Fixing by centering

%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")
import pymc3 as pm

From Mcelreath:

The island societies of Oceania provide a natural experiment in technological evolution. Different historical island populations possessed tool kits of different size. These kits include fish hooks, axes, boats, hand plows, and many other types of tools. A number of theories predict that larger populations will both develop and sustain more complex tool kits. So the natural variation in population size induced by natural variation in island size in Oceania provides a natural experiment to test these ideas. It’s also suggested that contact rates among populations effectively increase population size, as it’s relevant to technological evolution. So variation in contact rates among Oceanic societies is also relevant. (McElreath 313)

Setting up the model and data

Some points to take into account:

sample size is not umber of rows, after all this is a count model
the data is small, so we will need regularizing to avoid overfitting
outcome will be total_tools which we will model as proportional to log(population) as theory says it depends on order of magnitude
number of tools incereases with contact rate
we will, over multiple attempts, be testing the idea that the impact of population on tool counts is increased by high contact. This is an example of an interaction. Specifically this is a positive interaction between log(population) and contact.

df=pd.read_csv("data/islands.csv", sep=';')
df

	culture	population	contact	total_tools	mean_TU
0	Malekula	1100	low	13	3.2
1	Tikopia	1500	low	22	4.7
2	Santa Cruz	3600	low	24	4.0
3	Yap	4791	high	43	5.0
4	Lau Fiji	7400	high	33	5.0
5	Trobriand	8000	high	19	4.0
6	Chuuk	9200	high	40	3.8
7	Manus	13000	low	28	6.6
8	Tonga	17500	high	55	5.4
9	Hawaii	275000	low	71	6.6

df['logpop']=np.log(df.population)
df['clevel']=(df.contact=='high')*1
df

	culture	population	contact	total_tools	mean_TU	logpop	clevel
0	Malekula	1100	low	13	3.2	7.003065	0
1	Tikopia	1500	low	22	4.7	7.313220	0
2	Santa Cruz	3600	low	24	4.0	8.188689	0
3	Yap	4791	high	43	5.0	8.474494	1
4	Lau Fiji	7400	high	33	5.0	8.909235	1
5	Trobriand	8000	high	19	4.0	8.987197	1
6	Chuuk	9200	high	40	3.8	9.126959	1
7	Manus	13000	low	28	6.6	9.472705	0
8	Tonga	17500	high	55	5.4	9.769956	1
9	Hawaii	275000	low	71	6.6	12.524526	0

Lets write down the model we plan to fit.

M1

\[\begin{eqnarray} T_i & \sim & Poisson(\lambda_i)\\ log(\lambda_i) & = & \alpha + \beta_P log(P_i) + \beta_C C_i + \beta_{PC} C_i log(P_i)\\ \alpha & \sim & N(0,100)\\ \beta_P & \sim & N(0,1)\\ \beta_C & \sim & N(0,1)\\ \beta_{PC} & \sim & N(0,1) \end{eqnarray}\]

The $\beta$s have strongly regularizing priors on them, because the sample is small, while the $\alpha$ prior is essentially a flat prior.

Implementation in pymc

import theano.tensor as t
with pm.Model() as m1:
    betap = pm.Normal("betap", 0, 1)
    betac = pm.Normal("betac", 0, 1)
    betapc = pm.Normal("betapc", 0, 1)
    alpha = pm.Normal("alpha", 0, 100)
    loglam = alpha + betap*df.logpop + betac*df.clevel + betapc*df.clevel*df.logpop
    y = pm.Poisson("ntools", mu=t.exp(loglam), observed=df.total_tools)
    

with m1:
    trace=pm.sample(10000, njobs=2)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [alpha, betapc, betac, betap]
Sampling 2 chains: 100%|██████████| 21000/21000 [01:17<00:00, 269.30draws/s]
The acceptance probability does not match the target. It is 0.88659413662, but should be close to 0.8. Try to increase the number of tuning steps.

pm.summary(trace)

	mean	sd	mc_error	hpd_2.5	hpd_97.5	n_eff	Rhat
betap	0.264372	0.035345	0.000443	0.194924	0.333177	6225.835834	1.000744
betac	-0.083529	0.838570	0.009645	-1.700037	1.597633	6954.452047	1.000067
betapc	0.042019	0.091893	0.001062	-0.147786	0.214488	6987.273344	1.000076
alpha	0.934902	0.366691	0.004492	0.208745	1.645146	6253.602039	1.000709

pm.traceplot(trace);

//anaconda/envs/py3l/lib/python3.6/site-packages/matplotlib/axes/_base.py:3449: MatplotlibDeprecationWarning: 
The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.
  alternative='`bottom`', obj_type='argument')

png

pm.autocorrplot(trace);

png

Our traces an autocorrelations look pretty good. pymc3 does quick work on the model

Posteriors

pm.plot_posterior(trace);

png

Looking at the posteriors reveals something interesting. The posterior for $\beta_p$ is, as expected from theory, showing a positive effect. The posterior is fairly tightly constrained. The posteriors for $\beta_c$ and $\beta_{pc}$ both overlap 0 substantially, and seem comparatively poorly constrained.

At this point you might be willing to say that there is no substantial effect of contact rate, directly or through the interaction.

You would be wrong.

Posterior check with counterfactual predictions.

Lets get $\lambda$ traces for high-contact and low contact

lamlow = lambda logpop: trace['alpha']+trace['betap']*logpop
lamhigh = lambda logpop: trace['alpha']+(trace['betap'] + trace['betapc'])*logpop + trace['betac'] 

Now let us see what happens at an intermediate log(pop) of 8:

sns.distplot(lamhigh(8) - lamlow(8));
plt.axvline(0);

//anaconda/envs/py3l/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6499: MatplotlibDeprecationWarning: 
The 'normed' kwarg was deprecated in Matplotlib 2.1 and will be removed in 3.1. Use 'density' instead.
  alternative="'density'", removal="3.1")

png

We can see evidence of a fairly strong positive effect of contact in this “counterfactual posterior”, with most of the weight above 0.

So what happened?

Posterior scatter plots

We make posterior scatter plots and this give us the answer.

def postscat(trace, thevars):
    d={}
    for v in thevars:
        d[v] = trace.get_values(v)
    df = pd.DataFrame.from_dict(d)
    return sns.pairplot(df)

postscat(trace,trace.varnames)

<seaborn.axisgrid.PairGrid at 0x116913160>

png

Look at the very strong negative correlations between $\alpha$ and $\beta_p$, and the very strong ones between $\beta_c$ and $\beta_{pc}$. The latter is the cause for the 0-overlaps. When $\beta_c$ is high, $\beta_{pc}$ must be low, and vice-versa. As a result, its not enough to observe just the marginal uncertainty of each parameter; you must look at the joint uncertainty of the correlated variables.

You would have seen that this might be a problem if you looked at $n_{eff}$:

pm.effective_n(trace)

{'alpha': 6253.6020387847238,
 'betac': 6954.4520465741789,
 'betap': 6225.8358335534804,
 'betapc': 6987.2733444159694}

Fixing by centering

As usual, centering the log-population fixes things:

df.logpop_c = df.logpop - df.logpop.mean()

  """Entry point for launching an IPython kernel.

with pm.Model() as m1c:
    betap = pm.Normal("betap", 0, 1)
    betac = pm.Normal("betac", 0, 1)
    betapc = pm.Normal("betapc", 0, 1)
    alpha = pm.Normal("alpha", 0, 100)
    loglam = alpha + betap*df.logpop_c + betac*df.clevel + betapc*df.clevel*df.logpop_c
    y = pm.Poisson("ntools", mu=t.exp(loglam), observed=df.total_tools)

with m1c:
    trace1c = pm.sample(10000, njobs=2)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [alpha, betapc, betac, betap]
Sampling 2 chains: 100%|██████████| 21000/21000 [00:16<00:00, 1284.28draws/s]

pm.effective_n(trace1c)

{'alpha': 13059.09068128995,
 'betac': 13486.057010456128,
 'betap': 14787.07665103561,
 'betapc': 14659.70637220611}

pm.summary(trace1c)

	mean	sd	mc_error	hpd_2.5	hpd_97.5	n_eff	Rhat
betap	0.263160	0.035343	0.000270	0.191687	0.330985	14787.076651	0.999950
betac	0.285020	0.116811	0.000988	0.051495	0.510018	13486.057010	0.999965
betapc	0.064929	0.168961	0.001466	-0.270487	0.394762	14659.706372	1.000043
alpha	3.311022	0.089226	0.000771	3.140614	3.491820	13059.090681	0.999956

postscat(trace1c,trace1c.varnames)

<seaborn.axisgrid.PairGrid at 0x111642160>

png

pm.plot_posterior(trace1c);

png

How do we decide whether the interaction is significant or not? We’ll use model comparison to achieve this!