{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"hide": true
},
"source": [
"# Python tech for distributions\n",
"\n",
"##### Keywords: bernoulli distribution, uniform distribution, empirical distribution, elections"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"hide": true
},
"outputs": [],
"source": [
"# The %... syntax is an IPython \"magic\" command, not part of the Python language.\n",
"# Here we're telling the plotting library to draw figures inline in the\n",
"# notebook, instead of in a separate window.\n",
"%matplotlib inline\n",
"# See all the \"as ...\" constructs? They just alias the package names.\n",
"# That way we can call methods like plt.plot() instead of matplotlib.pyplot.plot().\n",
"import numpy as np\n",
"import scipy as sp\n",
"import matplotlib as mpl\n",
"import matplotlib.cm as cm\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import seaborn as sns  # older notebooks used seaborn.apionly, which current seaborn no longer provides"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us consider a table of probabilities that [PredictWise](http://www.predictwise.com/results/2012/president) made on October 2, 2012 for the US presidential election.\n",
"PredictWise aggregated polling data and, for each state, estimated the probability that Obama or Romney would win. Here are those estimated probabilities, loaded into a `pandas` dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Obama Romney States Votes\n",
"0 0.000 1.000 Alabama 9\n",
"1 0.000 1.000 Alaska 3\n",
"2 0.062 0.938 Arizona 11\n",
"3 0.000 1.000 Arkansas 6\n",
"4 1.000 0.000 California 55"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise = pd.read_csv('data/predictwise.csv')\n",
"predictwise.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can set the states as the index; after all, each state has a unique name."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Obama Romney Votes\n",
"States \n",
"Alabama 0.000 1.000 9\n",
"Alaska 0.000 1.000 3\n",
"Arizona 0.062 0.938 11\n",
"Arkansas 0.000 1.000 6\n",
"California 1.000 0.000 55\n",
"Colorado 0.807 0.193 9\n",
"Connecticut 1.000 0.000 7\n",
"Delaware 1.000 0.000 3\n",
"District of Columbia 1.000 0.000 3\n",
"Florida 0.720 0.280 29\n",
"Georgia 0.004 0.996 16\n",
"Hawaii 1.000 0.000 4\n",
"Idaho 0.000 1.000 4\n",
"Illinois 1.000 0.000 20\n",
"Indiana 0.036 0.964 11\n",
"Iowa 0.837 0.163 6\n",
"Kansas 0.000 1.000 6\n",
"Kentucky 0.000 1.000 8\n",
"Louisiana 0.000 1.000 8\n",
"Maine 1.000 0.000 4\n",
"Maryland 1.000 0.000 10\n",
"Massachusetts 1.000 0.000 11\n",
"Michigan 0.987 0.013 16\n",
"Minnesota 0.982 0.018 10\n",
"Mississippi 0.000 1.000 6\n",
"Missouri 0.074 0.926 10\n",
"Montana 0.046 0.954 3\n",
"Nebraska 0.000 1.000 5\n",
"Nevada 0.851 0.149 6\n",
"New Hampshire 0.857 0.143 4\n",
"New Jersey 0.998 0.002 14\n",
"New Mexico 0.985 0.015 5\n",
"New York 1.000 0.000 29\n",
"North Carolina 0.349 0.651 15\n",
"North Dakota 0.025 0.975 3\n",
"Ohio 0.890 0.110 18\n",
"Oklahoma 0.000 1.000 7\n",
"Oregon 0.976 0.024 7\n",
"Pennsylvania 0.978 0.022 20\n",
"Rhode Island 1.000 0.000 4\n",
"South Carolina 0.000 1.000 9\n",
"South Dakota 0.001 0.999 3\n",
"Tennessee 0.001 0.999 11\n",
"Texas 0.000 1.000 38\n",
"Utah 0.000 1.000 6\n",
"Vermont 1.000 0.000 3\n",
"Virginia 0.798 0.202 13\n",
"Washington 0.999 0.001 12\n",
"West Virginia 0.002 0.998 5\n",
"Wisconsin 0.925 0.075 10\n",
"Wyoming 0.000 1.000 3"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise = predictwise.set_index('States')\n",
"predictwise"
]
},
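{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `States` as the index, rows can be looked up by name using `.loc`. The next cell is a minimal, self-contained sketch: instead of reading `data/predictwise.csv`, it rebuilds only the five rows shown in the `head()` output above, so it runs without the data file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Five rows copied from the head() output above (the full table is in data/predictwise.csv)\n",
"small = pd.DataFrame({\n",
"    'Obama': [0.000, 0.000, 0.062, 0.000, 1.000],\n",
"    'Romney': [1.000, 1.000, 0.938, 1.000, 0.000],\n",
"    'States': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California'],\n",
"    'Votes': [9, 3, 11, 6, 55],\n",
"})\n",
"small = small.set_index('States')\n",
"\n",
"# Label-based lookups: a whole row (returned as a Series) and a single cell\n",
"print(small.loc['Arizona'])\n",
"print(small.loc['California', 'Votes'])"
]
},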
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## numpy\n",
"\n",
"Let's do a short intro to numpy. You can obtain a numpy array from a column of a pandas dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0. , 0.062, 0. , 1. , 0.807, 1. , 1. ,\n",
" 1. , 0.72 , 0.004, 1. , 0. , 1. , 0.036, 0.837,\n",
" 0. , 0. , 0. , 1. , 1. , 1. , 0.987, 0.982,\n",
" 0. , 0.074, 0.046, 0. , 0.851, 0.857, 0.998, 0.985,\n",
" 1. , 0.349, 0.025, 0.89 , 0. , 0.976, 0.978, 1. ,\n",
" 0. , 0.001, 0.001, 0. , 0. , 1. , 0.798, 0.999,\n",
" 0.002, 0.925, 0. ])"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise.Obama.values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A numpy array has a shape:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(51,)"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise.Obama.values.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And a data type (`dtype`). All elements of an array share this one type, which is a large part of what makes numpy arrays efficient:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise.Obama.values.dtype"
]
},
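{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides `shape` and `dtype`, a few other attributes describe an array. A small sketch on a hand-made array of probabilities (illustrative values, not the full PredictWise column):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"probs = np.array([0.0, 0.062, 0.807, 1.0])\n",
"print(probs.ndim)    # number of dimensions\n",
"print(probs.size)    # total number of elements\n",
"print(probs.shape)   # length along each dimension\n",
"print(probs.dtype)   # the element type shared by every entry\n",
"print(probs.nbytes)  # size * itemsize, in bytes"
]
},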
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can construct a sub-dataframe in pandas by passing a list of column names inside the selection brackets, which is why the brackets are doubled below:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Obama Romney\n",
"States \n",
"Alabama 0.000 1.000\n",
"Alaska 0.000 1.000\n",
"Arizona 0.062 0.938\n",
"Arkansas 0.000 1.000\n",
"California 1.000 0.000\n",
"Colorado 0.807 0.193\n",
"Connecticut 1.000 0.000\n",
"Delaware 1.000 0.000\n",
"District of Columbia 1.000 0.000\n",
"Florida 0.720 0.280\n",
"Georgia 0.004 0.996\n",
"Hawaii 1.000 0.000\n",
"Idaho 0.000 1.000\n",
"Illinois 1.000 0.000\n",
"Indiana 0.036 0.964\n",
"Iowa 0.837 0.163\n",
"Kansas 0.000 1.000\n",
"Kentucky 0.000 1.000\n",
"Louisiana 0.000 1.000\n",
"Maine 1.000 0.000\n",
"Maryland 1.000 0.000\n",
"Massachusetts 1.000 0.000\n",
"Michigan 0.987 0.013\n",
"Minnesota 0.982 0.018\n",
"Mississippi 0.000 1.000\n",
"Missouri 0.074 0.926\n",
"Montana 0.046 0.954\n",
"Nebraska 0.000 1.000\n",
"Nevada 0.851 0.149\n",
"New Hampshire 0.857 0.143\n",
"New Jersey 0.998 0.002\n",
"New Mexico 0.985 0.015\n",
"New York 1.000 0.000\n",
"North Carolina 0.349 0.651\n",
"North Dakota 0.025 0.975\n",
"Ohio 0.890 0.110\n",
"Oklahoma 0.000 1.000\n",
"Oregon 0.976 0.024\n",
"Pennsylvania 0.978 0.022\n",
"Rhode Island 1.000 0.000\n",
"South Carolina 0.000 1.000\n",
"South Dakota 0.001 0.999\n",
"Tennessee 0.001 0.999\n",
"Texas 0.000 1.000\n",
"Utah 0.000 1.000\n",
"Vermont 1.000 0.000\n",
"Virginia 0.798 0.202\n",
"Washington 0.999 0.001\n",
"West Virginia 0.002 0.998\n",
"Wisconsin 0.925 0.075\n",
"Wyoming 0.000 1.000"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise[['Obama', 'Romney']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...and when you take its `.values`, you get a 2D numpy array (more on 2D arrays further down in this document):"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0. , 1. ],\n",
" [ 0. , 1. ],\n",
" [ 0.062, 0.938],\n",
" [ 0. , 1. ],\n",
" [ 1. , 0. ],\n",
" [ 0.807, 0.193],\n",
" [ 1. , 0. ],\n",
" [ 1. , 0. ],\n",
" [ 1. , 0. ],\n",
" [ 0.72 , 0.28 ],\n",
" [ 0.004, 0.996],\n",
" [ 1. , 0. ],\n",
" [ 0. , 1. ],\n",
" [ 1. , 0. ],\n",
" [ 0.036, 0.964],\n",
" [ 0.837, 0.163],\n",
" [ 0. , 1. ],\n",
" [ 0. , 1. ],\n",
" [ 0. , 1. ],\n",
" [ 1. , 0. ],\n",
" [ 1. , 0. ],\n",
" [ 1. , 0. ],\n",
" [ 0.987, 0.013],\n",
" [ 0.982, 0.018],\n",
" [ 0. , 1. ],\n",
" [ 0.074, 0.926],\n",
" [ 0.046, 0.954],\n",
" [ 0. , 1. ],\n",
" [ 0.851, 0.149],\n",
" [ 0.857, 0.143],\n",
" [ 0.998, 0.002],\n",
" [ 0.985, 0.015],\n",
" [ 1. , 0. ],\n",
" [ 0.349, 0.651],\n",
" [ 0.025, 0.975],\n",
" [ 0.89 , 0.11 ],\n",
" [ 0. , 1. ],\n",
" [ 0.976, 0.024],\n",
" [ 0.978, 0.022],\n",
" [ 1. , 0. ],\n",
" [ 0. , 1. ],\n",
" [ 0.001, 0.999],\n",
" [ 0.001, 0.999],\n",
" [ 0. , 1. ],\n",
" [ 0. , 1. ],\n",
" [ 1. , 0. ],\n",
" [ 0.798, 0.202],\n",
" [ 0.999, 0.001],\n",
" [ 0.002, 0.998],\n",
" [ 0.925, 0.075],\n",
" [ 0. , 1. ]])"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise[['Obama', 'Romney']].values"
]
},
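{
"cell_type": "markdown",
"metadata": {},
"source": [
"`.values` works, but recent pandas versions recommend `DataFrame.to_numpy()`, which returns the same array. A sketch on a tiny two-row frame whose values are copied from the table above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"two = pd.DataFrame({'Obama': [0.062, 0.807], 'Romney': [0.938, 0.193]},\n",
"                   index=['Arizona', 'Colorado'])\n",
"arr = two[['Obama', 'Romney']].to_numpy()\n",
"print(arr.shape)  # one row per state, one column per candidate\n",
"print(np.array_equal(arr, two[['Obama', 'Romney']].values))"
]
},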
{
"cell_type": "markdown",
"metadata": {},
"source": [
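"A `reshape` can convert a 1D numpy array into a 2D one. The `-1` tells numpy to infer that dimension from the array's length, so here we get a single column of shape `(51, 1)`:"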
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0. ],\n",
" [ 0. ],\n",
" [ 0.062],\n",
" [ 0. ],\n",
" [ 1. ],\n",
" [ 0.807],\n",
" [ 1. ],\n",
" [ 1. ],\n",
" [ 1. ],\n",
" [ 0.72 ],\n",
" [ 0.004],\n",
" [ 1. ],\n",
" [ 0. ],\n",
" [ 1. ],\n",
" [ 0.036],\n",
" [ 0.837],\n",
" [ 0. ],\n",
" [ 0. ],\n",
" [ 0. ],\n",
" [ 1. ],\n",
" [ 1. ],\n",
" [ 1. ],\n",
" [ 0.987],\n",
" [ 0.982],\n",
" [ 0. ],\n",
" [ 0.074],\n",
" [ 0.046],\n",
" [ 0. ],\n",
" [ 0.851],\n",
" [ 0.857],\n",
" [ 0.998],\n",
" [ 0.985],\n",
" [ 1. ],\n",
" [ 0.349],\n",
" [ 0.025],\n",
" [ 0.89 ],\n",
" [ 0. ],\n",
" [ 0.976],\n",
" [ 0.978],\n",
" [ 1. ],\n",
" [ 0. ],\n",
" [ 0.001],\n",
" [ 0.001],\n",
" [ 0. ],\n",
" [ 0. ],\n",
" [ 1. ],\n",
" [ 0.798],\n",
" [ 0.999],\n",
" [ 0.002],\n",
" [ 0.925],\n",
" [ 0. ]])"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictwise['Obama'].values.reshape(-1,1)"
]
},
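{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch comparing the shapes produced by `reshape` with `-1` in either position, plus the equivalent `np.newaxis` indexing, on a short illustrative array:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"v = np.array([0.0, 0.062, 0.807, 1.0])\n",
"col = v.reshape(-1, 1)   # column vector, shape (4, 1)\n",
"row = v.reshape(1, -1)   # row vector, shape (1, 4)\n",
"col2 = v[:, np.newaxis]  # same result as reshape(-1, 1)\n",
"print(col.shape, row.shape, col2.shape)\n",
"print(np.array_equal(col, col2))"
]
},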
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also construct a numpy array directly, for example from a Python list:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_array = np.array([1, 2, 3, 4])\n",
"my_array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
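"**In general you should manipulate numpy arrays using numpy functions and methods** (`np.mean`, or equivalently the `.mean()` method, for example) rather than Python loops or built-ins like `sum`. The numpy versions run in compiled code, so they are much faster on large arrays."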
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.5\n",
"2.5\n"
]
}
],
"source": [
"print(my_array.mean())\n",
"print(np.mean(my_array))"
]
},
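{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the difference, compare a plain Python loop with the vectorized `np.mean` on the same data. Both give the same answer; the numpy call does its loop in compiled code, which is typically much faster on large arrays (try `%timeit` on each to check on your machine)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"values = np.arange(1, 100001, dtype=float)\n",
"\n",
"# Pure-Python loop: one interpreted iteration per element\n",
"total = 0.0\n",
"for v in values:\n",
"    total += v\n",
"loop_mean = total / len(values)\n",
"\n",
"# Vectorized numpy call: the loop happens in compiled code\n",
"np_mean = np.mean(values)\n",
"\n",
"print(loop_mean, np_mean)"
]
},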
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way we constructed the numpy array above may seem redundant: after all, we already had a regular Python list. Indeed, it is the other ways of constructing numpy arrays that make them so useful.\n",
"\n",
"There are many such numpy array *constructors*. Here are some commonly used ones; look them up in the documentation for their full options."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n"
]
},
{
"data": {
"text/plain": [
"array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(np.ones(10))\n",
"np.ones(10, dtype='int') # generates 10 integer ones"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.dtype(float).itemsize # in bytes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy gains a lot of its efficiency from being typed: all elements in an array have the same type, such as integer or floating point. The default type for constructors like `np.ones` is, as seen above, a 64-bit float (`float64`, 8 bytes per element); you can ask for another type with the `dtype` argument."
]
},
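{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because every element shares one `dtype`, numpy converts values to fit it. A short sketch: mixing integers and floats in the constructor upcasts everything to float, while assigning a float into an integer array silently drops the fractional part."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"mixed = np.array([1, 2.5, 3])\n",
"print(mixed.dtype)  # float64: the integers were upcast\n",
"\n",
"ints = np.array([1, 2, 3])\n",
"ints[0] = 2.7       # stored as 2 because the array holds integers\n",
"print(ints, ints.dtype)\n",
"\n",
"# Bytes per element for two common types\n",
"print(np.dtype('int32').itemsize, np.dtype('float64').itemsize)"
]
},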
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n",
"[ 0.09500079 0.56750287 0.88898257 0.03858948 0.22709491 0.46560001\n",
" 0.9110299 0.87729626 0.47123426 0.64749469]\n",
"The sample mean and standard deviation are 0.031173 and 0.986213, respectively.\n"
]
}
],
"source": [
"print(np.zeros(10))\n",
"print(np.random.random(10))\n",
"normal_array = np.random.randn(1000)\n",
"print(\"The sample mean and standard deviation are %f and %f, respectively.\" %(np.mean(normal_array), np.std(normal_array)))"
]
},
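{
"cell_type": "markdown",
"metadata": {},
"source": [
"The draws above change every time the cell runs. To make random draws reproducible, here is a minimal sketch using numpy's `Generator` interface (`np.random.default_rng`, available in numpy 1.17 and later); the seed `42` is arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)           # arbitrary seed\n",
"uniform_draws = rng.random(10)            # Uniform on [0, 1)\n",
"normal_draws = rng.standard_normal(1000)  # standard normal\n",
"\n",
"# Re-creating the generator with the same seed reproduces the same draws\n",
"rng_again = np.random.default_rng(42)\n",
"print(np.array_equal(uniform_draws, rng_again.random(10)))\n",
"print(normal_draws.mean(), normal_draws.std())"
]
},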
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot a histogram of these 1000 samples from the standard normal distribution:"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAADcVJREFUeJzt3X+sX/Vdx/Hna4WNDZyDcNNUfnj5o5nWRcXckOnMQlLmcF1W/EMCcaYqSbNkc8xotrIlEjUkXTTLFqPGBtAaG5YKGBqZSu0gc3/AdvmhAwqWbHSA/XG3BTc02dbx9o97mFfo7f3e77nf+/3ez30+kuZ7zud7vt/zbtO++r7nfM45qSokSe163bgLkCSNlkEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJatxZ4y4A4MILL6zp6elxlyFJa8rDDz/8jaqaWmq7iQj66elpZmdnx12GJK0pSY4Osp2HbiSpcUsGfZLbk5xM8viCsQuSHExypHs9f8F7NyV5JsnTSd49qsIlSYMZpKP/a+DqV43tAg5V1WbgULdOki3AdcBPdZ/58yQbVqxaSdKyLRn0VfUF4FuvGt4O7O2W9wLXLBj/bFV9t6q+BjwDXLFCtUqShjDsMfqNVXWsWz4ObOyWLwKeW7Dd893YayTZmWQ2yezc3NyQZUiSltL7ZGzNP7lk2U8vqao9VTVTVTNTU0vODpIkDWnYoD+RZBNA93qyG38BuGTBdhd3Y5KkMRk26A8AO7rlHcA9C8avS/KGJJcBm4Ev9StRktTHkhdMJbkDuBK4MMnzwM3AbmB/khuAo8C1AFX1RJL9wJPAKeCDVfWDEdUuSRrAkkFfVdcv8tbWRba/BbilT1HSapjede9A2z27e9uIK5FGyytjJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1Lizxl2ANIjpXfcOtN2zu7eNuJL+Wvq9aG2wo5ekxhn0ktQ4g16SGmfQS1LjegV9kt9J8kSSx5PckeScJBckOZjkSPd6/koVK0lavqGDPslFwIeBmap6G7ABuA7YBRyqqs3AoW5dkjQmfadXngW8Mcn3gTcB/wncBFzZvb8XeAD4WM/9SBNv0GmT0mobuqOvqheAPwG+DhwD/quq7gM2VtWxbrPjwMbeVUqShjZ0R98de98OXAa8CPxdkvcv3KaqKkkt8vmdwE6ASy+9dNgypJGzU9da1+dk7FXA16pqrqq+D9wN/AJwIskmgO715Ok+XFV7qmqmqmampqZ6lCFJOpM+Qf914O1J3pQkwFbgMHAA2NFtswO4p1+JkqQ+hj50U1UPJbkTeAQ4BTwK7AHOA/YnuQE4Cly7EoVKkobTa9ZNVd0M3Pyq4e8y391LkiaAV8ZKUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXF9HyUoTRQfEiK9lh29JDXOoJekxhn0ktQ4j9FrrDymLo2eHb0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXG9gj7JW5LcmeSpJIeT/HySC5IcTHKkez1/pYqVJC1f347+M8A/VdVPAD8DHAZ2AYeqajNwqFuXJI3J0EGf5EeBdwK3AVTV96rqRWA7sLfbbC9wTd8iJUnD69PRXwbMAX+V5NEktyY5F9hYVce6bY4DG/sWKUkaXp+gPwv4OeAvqupy4L951WGaqiqgTvfhJDuTzCaZnZub61GGJOlM+gT988DzVfVQt34n88F/IskmgO715Ok+XFV7qmqmqmampqZ6lCFJOpOhg76qjgPPJXlrN7QVeBI4AOzoxnYA9/SqUJLUy1k9P//bwL4krwe+Cv
wm8/957E9yA3AUuLbnPiRJPfQK+qp6DJg5zVtb+3yvJGnleGWsJDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJalzfefSSRmR6170Dbffs7m0jrkRrnR29JDXOoJekxhn0ktQ4j9FrJAY9vixp9OzoJalxdvRaFjt1ae2xo5ekxhn0ktQ4g16SGmfQS1LjDHpJapyzbqQ1znviaCl29JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxTq+U1onl3JDOqZhtsaOXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNa530CfZkOTRJP/QrV+Q5GCSI93r+f3LlCQNayU6+huBwwvWdwGHqmozcKhblySNSa+gT3IxsA24dcHwdmBvt7wXuKbPPiRJ/fTt6D8NfBR4ecHYxqo61i0fBzae7oNJdiaZTTI7NzfXswxJ0mKGDvok7wVOVtXDi21TVQXUIu/tqaqZqpqZmpoatgxJ0hL63L3yHcD7krwHOAd4c5K/BU4k2VRVx5JsAk6uRKGSpOEM3dFX1U1VdXFVTQPXAZ+vqvcDB4Ad3WY7gHt6VylJGtoo5tHvBt6V5AhwVbcuSRqTFXnwSFU9ADzQLX8T2LoS3ytJ6s8rYyWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMatyL1utPZN77p33CVIGhE7eklqnB29pNcY9Ce8Z3dvG3ElWgl29JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGuejBCUNzUcOrg1Dd/RJLklyf5InkzyR5MZu/IIkB5Mc6V7PX7lyJUnL1efQzSngd6tqC/B24INJtgC7gENVtRk41K1LksZk6KCvqmNV9Ui3/B3gMHARsB3Y2222F7imb5GSpOGtyMnYJNPA5cBDwMaqOta9dRzYuBL7kCQNp3fQJzkPuAv4SFV9e+F7VVVALfK5nUlmk8zOzc31LUOStIheQZ/kbOZDfl9V3d0Nn0iyqXt/E3DydJ+tqj1VNVNVM1NTU33KkCSdQZ9ZNwFuAw5X1acWvHUA2NEt7wDuGb48SVJffebRvwP4deArSR7rxj4O7Ab2J7kBOApc269ESVIfQwd9VX0RyCJvbx32eyVJK8tbIEhS47wFwho06GXnkgR29JLUPINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc559JJGzkcOjpcdvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqc0yslrTlO11weO3pJapwd/QTxgSKSRsGOXpIaZ9BLUuMMeklqnEEvSY0z6CWpcc66kTQxnHk2Gnb0ktQ4O3pJzfIK2nl29JLUODv6HuwWJK0FdvSS1Dg7+lXgTAJJ42RHL0mNM+glqXEjO3ST5GrgM8AG4Naq2j2qfQ3Kk6eSVssk5c1IOvokG4A/A34Z2AJcn2TLKPYlSTqzUXX0VwDPVNVXAZJ8FtgOPDmKna30yU5Pnkrry3L+za/Fn/hHdYz+IuC5BevPd2OSpFU2tumVSXYCO7vVl5I8vUq7vhD4xirtayVZ9+pbq7Vb9wjlk6cdHrr2Rb5vUD8+yEajCvoXgEsWrF/cjf1QVe0B9oxo/4tKMltVM6u9376se/Wt1dqte/VNeu2jOnTzZWBzksuSvB64Djgwon1Jks5gJB19VZ1K8iHgn5mfXnl7VT0xin1Jks5sZMfoq+pzwOdG9f09rPrhohVi3atvrdZu3atvomtPVY27BknSCHkLBElq3LoL+iR/lOTfkzyW5L4kPzbumgaV5I+TPNXV//dJ3jLumgaR5FeTPJHk5SQTOzPhFUmuTvJ0kmeS7Bp3PYNKcnuSk0keH3cty5HkkiT3J3my+3ty47hrGkSSc5J8Kcm/dXX/wbhrWsy6O3ST5M1V9e1u+c
PAlqr6wJjLGkiSXwI+353s/iRAVX1szGUtKclPAi8Dfwn8XlXNjrmkRXW37/gP4F3MX+j3ZeD6qhrJVd0rKck7gZeAv6mqt427nkEl2QRsqqpHkvwI8DBwzaT/mScJcG5VvZTkbOCLwI1V9eCYS3uNddfRvxLynXOBNfM/XVXdV1WnutUHmb8+YeJV1eGqWq0L4vr64e07qup7wCu375h4VfUF4FvjrmO5qupYVT3SLX8HOMwauJK+5r3UrZ7d/ZrIPFl3QQ+Q5JYkzwG/Bvz+uOsZ0m8B/zjuIhrk7TvGKMk0cDnw0HgrGUySDUkeA04CB6tqIutuMuiT/EuSx0/zaztAVX2iqi4B9gEfGm+1/99StXfbfAI4xXz9E2GQuqUzSXIecBfwkVf95D2xquoHVfWzzP90fUWSiTxk1uSjBKvqqgE33cf8XP+bR1jOsixVe5LfAN4LbK0JOsGyjD/zSbfk7Tu08rpj3HcB+6rq7nHXs1xV9WKS+4GrgYk7Gd5kR38mSTYvWN0OPDWuWpare5jLR4H3VdX/jLueRnn7jlXWndS8DThcVZ8adz2DSjL1ysy3JG9k/gT+RObJepx1cxfwVuZngRwFPlBVa6JjS/IM8Abgm93Qg2thxlCSXwH+FJgCXgQeq6p3j7eqxSV5D/Bp/u/2HbeMuaSBJLkDuJL5OymeAG6uqtvGWtQAkvwi8K/AV5j/dwnw8e7q+omV5KeBvcz/PXkdsL+q/nC8VZ3eugt6SVpv1t2hG0labwx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIa978n1GOnN9EDtQAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(normal_array, bins=30);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can normalize this histogram so that its total area is 1; the bar heights then estimate the probability density:"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAADQxJREFUeJzt3X+oX/ddx/Hna9k6pToG5oI1iSZiag1af3CN/jH8gUbTIqZFxdbhqHOEgHFbZdjiYEPLwCLYidTFsAUVimXQbQTNaKsOpsxqbkftlrYpoTKSMu1d55xlYhf79o/7zfz27ib33OT7vd9733k+4ML3nPPJPe+E5JX3/ZzPOSdVhSSpl9fMugBJ0uQZ7pLUkOEuSQ0Z7pLUkOEuSQ0Z7pLUkOEuSQ0Z7pLUkOEuSQ29dlYn3rp1a+3cuXNWp5ekTenxxx//QlXNrTZuZuG+c+dOFhYWZnV6SdqUknxuyDinZSSpIcNdkhoy3CWpIcNdkhoy3CWpIcNdkhoy3CWpIcNdkhoy3CWpoZndoSqtt/sefXbw2Dv3XT/FSqTps3OXpIYMd0lqyHCXpIYMd0lqyHCXpIYMd0lqyHCXpIYMd0lqyHCXpIYMd0lqyHCXpIYMd0lqyHCXpIYMd0lqyHCXpIYMd0lqyHCXpIYGhXuS/UlOJzmT5O5LjPvhJOeT/OLkSpQkrdWq4Z5kC3A/cBOwB7g9yZ6LjLsXeGTSRUqS1mZI574XOFNVz1XVy8CDwIEVxv0m8BDwwgTrkyRdhiHhvg04O7Z9brTva5JsA24FPjC50iRJl2tSF1TfD9xVVa9calCSg0kWkiwsLi5O6NSSpOVeO2DM88COse3to33j5oEHkwBsBW5Ocr6qPjY+qKqOAkcB5ufn63KLliRd2pBwPwnsTrKLpVC/DfiV8QFVtevC5yR/BvzV8mCXJK2fVcO9qs4nOQw8DGwBjlXVqSSHRsePTLlGSdIaDencqaoTwIll+1YM9aq648rLkiRdCe9QlaSGDHdJamjQtIw0C/c9+uygcXfuu37KlVy5Tr8XbQ527pLUkOEuSQ0Z7pLUkOEuSQ0Z7pLUkOEuSQ25FFK6AkOXOErrzc5dkhqyc5dWYEeuzc7OXZIaMtwlqSHDXZIaMtwlqSHDXZIaMtwlqSHDXZIaMtwlqSHDXZIaMtwlqSHDXZIaMtwlqSHDXZIaMtwlqSHDXZIaMtwlqSFf1qFNzxdrSF/Pzl2SGjLcJakhw12SGnLOXevOOXJp+uzcJakhw12SGjLcJakhw12SGhoU7kn2Jzmd5EySu1c4fiDJk0meSLKQ5E2TL1WSNNSqq2WSbAHuB/YB54CTSY5X1VNjw/4WOF5VleRG4MPADdMoWJK0uiGd+17gTFU9V1UvAw8CB8YHVNVLVVWjzWuBQpI0M0PCfRtwdmz73GjfqyS5NckzwF8Db51MeZKkyzGxC6pV9dGqugG4BbhnpTFJDo7m5BcWFxcndWpJ0jJDwv15YMfY9vbRvhVV1SeB70yydYVjR6tqvqrm5+bm1lysJGmYIeF+EtidZFeSa4DbgOPjA5J8V5KMPv8Q8HrgxUkXK0kaZtXVMlV1Pslh4GFgC3Csqk4lOTQ6fgT4BeAtSb4K/Dfwy2MXWCVJ62zQg8Oq6gRwYtm+I2Of7wXunWxpkqTL5R2qktSQ4S5JDRnuktSQ4S5JDRnuktSQ4S5JDRnuktSQ4S5JDRnuktSQ4S5JDRnuktSQ4S5JDRnuktSQ4S5JDRnuktSQ4S5JDRnuktSQ4S5JDQ16zZ6k9XHfo88OGnfnvuunXIk2Ozt3SWrIcJekhgx3SWrIOXdNzND5YknTZ+cuSQ3ZuWtVduTS5mPnLkkNGe6S1JDhLkkNGe6S1JDhLkkNuVpG2oR8Bo1WY+cuSQ0Z7pLUkOEuSQ0Z7pLUkOEuSQ0Z7pLUkEshpcZcMnn1GtS5J9mf5HSSM0nuXuH4m5M8meQzST6V5PsnX6okaahVwz3JFuB+4CZgD3B7kj3Lhv0r8ONV9X3APcDRSRcqSRpuSOe+FzhTVc9V1cvAg8CB8QFV9a
mq+o/R5mPA9smWKUlaiyHhvg04O7Z9brTvYn4d+PhKB5IcTLKQZGFxcXF4lZKkNZnoapkkP8lSuN+10vGqOlpV81U1Pzc3N8lTS5LGDFkt8zywY2x7+2jfqyS5EfggcFNVvTiZ8iRJl2NI534S2J1kV5JrgNuA4+MDknw78BHgV6vKF25K0oyt2rlX1fkkh4GHgS3Asao6leTQ6PgR4D3AtwB/kgTgfFXNT69sSdKlDLqJqapOACeW7Tsy9vltwNsmW5ok6XL5+AFJashwl6SGDHdJashwl6SGDHdJashwl6SGDHdJashwl6SGDHdJashwl6SGDHdJashwl6SGDHdJashwl6SGDHdJashwl6SGDHdJashwl6SGDHdJamjQO1TV032PPjvrEiRNiZ27JDVk5y5pTT/F3bnv+ilWokmxc5ekhgx3SWrIcJekhgx3SWrIcJekhgx3SWrIcJekhgx3SWrIcJekhgx3SWrIcJekhgx3SWrIcJekhgx3SWrIcJekhgaFe5L9SU4nOZPk7hWO35DkH5P8T5J3Tb5MSdJarPqyjiRbgPuBfcA54GSS41X11NiwLwJvB26ZSpWSpDUZ0rnvBc5U1XNV9TLwIHBgfEBVvVBVJ4GvTqFGSdIaDXnN3jbg7Nj2OeBHplOOpI1u6Cv5fB3fbK3rBdUkB5MsJFlYXFxcz1NL0lVlSLg/D+wY294+2rdmVXW0quaran5ubu5yvoUkaYAh4X4S2J1kV5JrgNuA49MtS5J0JVadc6+q80kOAw8DW4BjVXUqyaHR8SNJvhVYAN4AvJLkncCeqvryFGuXJF3EkAuqVNUJ4MSyfUfGPv8bS9M1kqQNwDtUJakhw12SGjLcJakhw12SGjLcJamhQatlNHtDb/mWJLBzl6SWDHdJashwl6SGDHdJashwl6SGDHdJashwl6SGXOcuaSp8Hd9s2blLUkOGuyQ1ZLhLUkOGuyQ1ZLhLUkOGuyQ15FJISZuCSyvXxs5dkhqyc58xX8IhaRrs3CWpIcNdkhoy3CWpIcNdkhoy3CWpIVfLSJopV4xNh527JDVk5y6pFe9kXWLnLkkN2bmvkV2BpM3Azl2SGrJznxJXAEiaJTt3SWrIcJekhgZNyyTZD/wRsAX4YFX9/rLjGR2/GfgKcEdVfXrCtV4WL4BKWi8bKW9W7dyTbAHuB24C9gC3J9mzbNhNwO7R10HgAxOuU5K0BkM6973Amap6DiDJg8AB4KmxMQeAv6iqAh5L8sYk11XV5ydeMdO5WOkFUOnqspG67GkYMue+DTg7tn1utG+tYyRJ62Rdl0ImOcjStA3AS0lOr9OptwJfWKdzTZJ1r7/NWrt1T8lvXfzQZdd+ie85xHcMGTQk3J8Hdoxtbx/tW+sYquoocHRIYZOUZKGq5tf7vFfKutffZq3dutffRq99yLTMSWB3kl1JrgFuA44vG3MceEuW/Cjwn9Oab5ckrW7Vzr2qzic5DDzM0lLIY1V1Ksmh0fEjwAmWlkGeYWkp5K9Nr2RJ0moGzblX1QmWAnx835GxzwX8xmRLm6h1nwqaEOtef5u1dutefxu69izlsiSpEx8/IEkNXRXhnuSeJE8meSLJI0m+bdY1DZXkD5I8M6r/o0neOOuahkjyS0lOJXklyYZdUXBBkv1JTic5k+TuWdczVJJjSV5I8tlZ17IWSXYk+USSp0Z/T94x65qGSPINSf45yb+M6v7dWdd0MVfFtEySN1TVl0ef3w7sqapDMy5rkCQ/A/zd6ML2vQBVddeMy1pVku8BXgH+FHhXVS3MuKSLGj1i41lgH0s34J0Ebq+qpy75CzeAJD8GvMTSHeLfO+t6hkpyHXBdVX06yTcDjwO3bPQ/89FztK6tqpeSvA74B+AdVfXYjEv7OldF534h2EeuBTbN/2hV9UhVnR9tPsbSPQQbXlU9XVXrdZPalfraIzaq6mXgwiM2Nryq+iTwxVnXsVZV9fkLDxesqv8CnmYT3NVeS14abb
5u9LUh8+SqCHeAJO9LchZ4M/CeWddzmd4KfHzWRTTk4zNmKMlO4AeBf5ptJcMk2ZLkCeAF4NGq2pB1twn3JH+T5LMrfB0AqKp3V9UO4AHg8GyrfbXVah+NeTdwnqX6N4QhdUuXkuSbgIeAdy77CXvDqqr/raofYOmn6L1JNuR0WJvX7FXVTw8c+gBLa/bfO8Vy1mS12pPcAfwc8FO1gS6SrOHPfKMb9PgMTdZozvoh4IGq+sis61mrqvpSkk8A+4ENd0G7Ted+KUl2j20eAJ6ZVS1rNXpRym8DP19VX5l1PU0NecSGJmh0YfJDwNNV9YezrmeoJHMXVqwl+UaWLsJvyDy5WlbLPAR8N0urNz4HHKqqTdGZJTkDvB54cbTrsc2w0ifJrcAfA3PAl4AnqupnZ1vVxSW5GXg///+IjffNuKRBkvwl8BMsPaHw34H3VtWHZlrUAEneBPw98BmW/l0C/M7obvgNK8mNwJ+z9PfkNcCHq+r3ZlvVyq6KcJekq81VMS0jSVcbw12SGjLcJakhw12SGjLcJakhw12SGjLcJakhw12SGvo/i2tcXMAR/+cAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(normal_array, bins=30, density=True, alpha=0.5);"
]
},
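{
"cell_type": "markdown",
"metadata": {},
"source": [
"`np.histogram(..., density=True)` computes normalized bar heights without plotting, so we can check the total-area property directly. A sketch on freshly drawn samples (the seed is arbitrary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"samples = rng.standard_normal(1000)\n",
"\n",
"heights, edges = np.histogram(samples, bins=30, density=True)\n",
"widths = np.diff(edges)          # width of each of the 30 bins\n",
"area = np.sum(heights * widths)  # total area under the histogram\n",
"print(len(heights), len(edges), area)"
]
},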
{
"cell_type": "markdown",
"metadata": {},
"source": [
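"Even better, you can use seaborn's `distplot`, which overlays a kernel density estimate on the normalized histogram. (`distplot` is deprecated in seaborn 0.11 and later; there, `sns.histplot(normal_array, stat='density', kde=True)` produces a similar plot.)"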
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.distplot(normal_array)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]\n"
]
},
{
"data": {
"text/plain": [
"array([ 0.8, 0.1, 0.2, 0.4, 0.6])"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grid = np.arange(0., 1.01, 0.1)\n",
"print(grid)\n",
"np.random.choice(grid, 5, replace=False)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0.8, 0. , 0.6, 1. , 0.6, 0. , 0.8, 0.6, 0.3, 0.6,\n",
" 0. , 0.7, 1. , 0.6, 0.8, 0.2, 0.3, 0.2, 0.6, 0.6, 0.8,\n",
" 0.8, 0.6, 0.9])"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.choice(grid, 25, replace=True)"
]
},
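{
"cell_type": "markdown",
"metadata": {},
"source": [
"Random sampling is the basic tool for simulating elections. One way to do it is to treat each state as a Bernoulli trial: Obama wins the state with the PredictWise probability `p`, simulated by drawing a uniform number in [0, 1) and checking whether it falls below `p`. A minimal sketch, using only the five states from the `head()` output above as a hypothetical subset (a full simulation would use all 51 rows):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Obama win probabilities and electoral votes for the first five rows shown earlier\n",
"p = np.array([0.000, 0.000, 0.062, 0.000, 1.000])  # AL, AK, AZ, AR, CA\n",
"votes = np.array([9, 3, 11, 6, 55])\n",
"\n",
"rng = np.random.default_rng(1)  # arbitrary seed\n",
"n_sims = 10000\n",
"\n",
"# One row per simulated election: True where Obama wins that state, i.e. a Bernoulli(p) draw\n",
"wins = rng.random((n_sims, len(p))) < p\n",
"obama_votes = wins.astype(int) @ votes  # electoral votes won in each simulation\n",
"\n",
"print(np.unique(obama_votes))\n",
"print(wins[:, 2].mean())  # fraction of simulations Arizona goes to Obama, close to 0.062"
]
},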
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Vector operations\n",
"\n",
"Numpy arrays support *vectorized* operations: instead of looping over two arrays and adding them element by element, you can simply add the two arrays. Note that this behavior is very different from Python lists, where `+` concatenates."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 2., 2., 2., 2., 2.])"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first = np.ones(5)\n",
"second = np.ones(5)\n",
"first + second"
]
},
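{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the difference from Python lists (the variable names are illustrative): for lists, `+` concatenates, while for numpy arrays it adds element by element. Getting the elementwise sum from plain lists needs an explicit loop or comprehension."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# illustrative inputs\n",
"list_a = [1, 2, 3]\n",
"list_b = [10, 20, 30]\n",
"arr_a = np.array(list_a)\n",
"arr_b = np.array(list_b)\n",
"\n",
"list_sum = list_a + list_b   # lists: + concatenates -> [1, 2, 3, 10, 20, 30]\n",
"arr_sum = arr_a + arr_b      # arrays: + adds elementwise -> [11, 22, 33]\n",
"# the list equivalent of the array sum needs an explicit comprehension\n",
"loop_sum = [x + y for x, y in zip(list_a, list_b)]\n",
"print(list_sum)\n",
"print(arr_sum)\n",
"print(loop_sum)"
]
},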
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy supports a concept known as *broadcasting*, which dictates how arrays of different sizes are combined together. There are too many rules to list here, but importantly, multiplying an array by a number multiplies each element by the number. Adding a number adds the number to each element."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 2. 2. 2. 2. 2.]\n",
"[ 4. 4. 4. 4. 4.]\n",
"[ 1. 1. 1. 1. 1.]\n",
"5.0\n"
]
}
],
"source": [
"print(first + 1)\n",
"print(first*4)\n",
"print(first*second) # itemwise\n",
"print(first@second) # dot product, identical to np.dot(first, second)"
]
},
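{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcasting also works between two arrays when one of them has a length-1 axis. The sketch below (the names `col`, `row`, `draws` and `probs` are illustrative) adds a `(3, 1)` column to a length-4 row to get a `(3, 4)` table, then compares a `(2, 2)` array of draws against a `(2, 1)` column holding one probability per row. The second comparison has the same shape pattern as the `reshape(-1, 1)` trick used in the election simulation below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"col = np.array([0., 10., 20.]).reshape(-1, 1)  # shape (3, 1)\n",
"row = np.array([1., 2., 3., 4.])               # shape (4,), treated as (1, 4)\n",
"table = col + row                              # broadcast to shape (3, 4)\n",
"print(table.shape)\n",
"print(table)\n",
"\n",
"# one probability per row, compared against every column of draws\n",
"draws = np.array([[0.1, 0.9], [0.6, 0.2]])     # shape (2, 2)\n",
"probs = np.array([0.5, 0.7]).reshape(-1, 1)    # shape (2, 1)\n",
"print(draws < probs)                           # [[True, False], [True, True]]"
]
},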
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2D arrays\n",
"Similarly, we can create two-dimensional arrays."
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1. 1. 1. 1.]\n",
" [ 1. 1. 1. 1.]\n",
" [ 1. 1. 1. 1.]]\n",
"[[ 0.99510521 0.99719296 0.99950497 0.99616235]\n",
" [ 0.9913809 1.00196707 1.00545023 1.00486261]\n",
" [ 1.01868421 0.99814142 1.00291637 0.99805381]]\n",
"[[ 1. 0. 0.]\n",
" [ 0. 1. 0.]\n",
" [ 0. 0. 1.]]\n"
]
}
],
"source": [
"my_array2d = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12] ])\n",
"\n",
"# 3 x 4 array of ones\n",
"ones_2d = np.ones([3, 4])\n",
"print(ones_2d)\n",
"# 3 x 4 array of ones with random noise\n",
"ones_noise = ones_2d + .01*np.random.randn(3, 4)\n",
"print(ones_noise)\n",
"# 3 x 3 identity matrix\n",
"my_identity = np.eye(3)\n",
"print(my_identity)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy arrays have set length (array dimensions), can be sliced, and can be iterated over with loop. Below is a schematic illustrating slicing two-dimensional arrays. \n",
"\n",
"
"
]
},
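{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal slicing sketch on a small example array `a` (illustrative): `[row, column]` indexing, `:` to select a whole axis, `start:stop` ranges that exclude `stop`, negative indices that count from the end, and a loop over the rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.arange(12).reshape(3, 4)  # rows 0..3, 4..7, 8..11\n",
"print(a[1])         # second row: 4 5 6 7\n",
"print(a[:, 2])      # third column: 2 6 10\n",
"print(a[0:2, 1:3])  # rows 0-1, columns 1-2: [[1, 2], [5, 6]]\n",
"print(a[-1, -1])    # last element: 11\n",
"# iterating over a 2D array yields its rows\n",
"for r in a:\n",
"    print(r.sum())  # 6, then 22, then 38"
]
},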
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Earlier when we generated the one-dimensional arrays of ones and random numbers, we gave `ones` and `random` the number of elements we wanted in the arrays. In two dimensions, we need to provide the shape of the array, ie, the number of rows and columns of the array."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., 1., 1., 1.],\n",
" [ 1., 1., 1., 1.],\n",
" [ 1., 1., 1., 1.]])"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"onesarray = np.ones([3,4])\n",
"onesarray"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(3, 4)\n"
]
},
{
"data": {
"text/plain": [
"array([[ 1., 1., 1.],\n",
" [ 1., 1., 1.],\n",
" [ 1., 1., 1.],\n",
" [ 1., 1., 1.]])"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(onesarray.shape)\n",
"onesarray.T"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4, 3)"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"onesarray.T.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Matrix multiplication is accomplished by `np.dot` (or `@`). "
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 4. 4. 4.]\n",
" [ 4. 4. 4.]\n",
" [ 4. 4. 4.]]\n"
]
},
{
"data": {
"text/plain": [
"array([[ 3., 3., 3., 3.],\n",
" [ 3., 3., 3., 3.],\n",
" [ 3., 3., 3., 3.],\n",
" [ 3., 3., 3., 3.]])"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(np.dot(onesarray, onesarray.T)) # 3 x 3 matrix\n",
"np.dot(onesarray.T, onesarray) # 4 x 4 matrix"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12.0"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sum(onesarray)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The axis 0 is the one going downwards (the $y$-axis, so to speak), whereas axis 1 is the one going across (the $x$-axis). You will often use functions such as `mean`, `sum`, with an axis."
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([ 3., 3., 3., 3.]), array([ 4., 4., 4.]))"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sum(onesarray, axis=0), np.sum(onesarray, axis=1)"
]
},
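{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because every entry of `onesarray` is 1, the sums above do not make it obvious which axis is which. A sketch with distinct entries (the array `b` is illustrative) makes it clearer: `axis=0` collapses the rows and leaves one value per column, while `axis=1` collapses the columns. The election simulation below uses `sum(axis=0)` in exactly this way, adding up the 51 state rows of each simulated election (column)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"b = np.arange(12).reshape(3, 4)\n",
"col_sums = np.sum(b, axis=0)  # collapse the rows: one total per column\n",
"row_sums = np.sum(b, axis=1)  # collapse the columns: one total per row\n",
"col_means = b.mean(axis=0)\n",
"print(col_sums)   # 12 15 18 21\n",
"print(row_sums)   # 6 22 38\n",
"print(col_means)  # 4. 5. 6. 7."
]
},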
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simulating a simple election model\n",
"\n",
"\n",
"Say you toss a coin and have a model which says that the probability of heads is 0.5 (you have figured this out from symmetry, or physics, or something). Still, there will be sequences of flips in which more or less than half the flips are heads. These **fluctuations** induce a distribution on the number of heads (say k) in N coin tosses (this is a binomial distribution).\n",
"\n",
"Similarly, here, if the probability of Romney winning in Arizona is 0.938, it means that if somehow, there were 10000 replications (as if we were running the election in 10000 parallel universes) with an election each, Romney would win in 9380 of those Arizonas **on the average** across the replications. And there would be some replications with Romney winning more, and some with less. We can run these **simulated** universes or replications on a computer though not in real life.\n",
"\n",
"\n",
"To do this, \n",
"we will assume that the outcome in each state is the result of an independent coin flip whose probability of coming up Obama is given by the Predictwise state-wise win probabilities. Lets write a function `simulate_election` that uses this **predictive model** to simulate the outcome of the election given a table of probabilities.\n",
"\n",
"### Bernoulli Random Variables (in scipy.stats)\n",
"\n",
"The **Bernoulli Distribution** represents the distribution for coin flips. Let the random variable X represent such a coin flip, where X=1 is heads, and X=0 is tails. Let us further say that the probability of heads is p (p=0.5 is a fair coin). \n",
"\n",
"We then say:\n",
"\n",
"$$X \\sim Bernoulli(p),$$\n",
"\n",
"which is to be read as **X has distribution Bernoulli(p)**. The **probability distribution function (pdf)** or **probability mass function** associated with the Bernoulli distribution is\n",
"\n",
"$$\\begin{eqnarray}\n",
"P(X = 1) &=& p \\\\\n",
"P(X = 0) &=& 1 - p \n",
"\\end{eqnarray}$$\n",
"\n",
"for p in the range 0 to 1. \n",
"The **pdf**, or the probability that random variable $X=x$ may thus be written as \n",
"\n",
"$$P(X=x) = p^x(1-p)^{1-x}$$\n",
"\n",
"for x in the set {0,1}.\n",
"\n",
"The Predictwise probability of Obama winning in each state is a Bernoulli Parameter. You can think of it as a different loaded coin being tossed in each state, and thus there is a bernoulli distribution for each state"
]
},
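{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of the formula $P(X=x) = p^x(1-p)^{1-x}$, here is a sketch that codes it directly (`bernoulli_pmf` is a hypothetical helper, not a scipy function) and compares $p$ with the fraction of 1s in many simulated Bernoulli draws. The draws use numpy's `binomial` with `n=1`, which is equivalent to a Bernoulli draw; `scipy.stats.bernoulli`, used below, would work just as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def bernoulli_pmf(x, p):\n",
"    # hypothetical helper: P(X = x) = p**x * (1 - p)**(1 - x), for x in {0, 1}\n",
"    return p**x * (1 - p)**(1 - x)\n",
"\n",
"p = 0.3\n",
"print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # p and 1 - p\n",
"\n",
"# the fraction of 1s in many Bernoulli(p) draws should be close to p\n",
"rng = np.random.default_rng(0)\n",
"draws = rng.binomial(1, p, size=100000)  # binomial with n=1 is a Bernoulli draw\n",
"print(draws.mean())"
]
},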
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: **some of the code, and ALL of the visual style for the distribution plots below was shamelessly stolen from https://gist.github.com/mattions/6113437/ **."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 1 0 0 1 0 1 1 0 0 0 0 1 1 1 1 0 0 0 1]\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from scipy.stats import bernoulli\n",
"#bernoulli random variable\n",
"brv=bernoulli(p=0.3)\n",
"print(brv.rvs(size=20))\n",
"event_space=[0,1]\n",
"plt.figure(figsize=(12,8))\n",
"colors=sns.color_palette()\n",
"for i, p in enumerate([0.1, 0.2, 0.5, 0.7]):\n",
" ax = plt.subplot(1, 4, i+1)\n",
" plt.bar(event_space, bernoulli.pmf(event_space, p), label=p, color=colors[i], alpha=0.5)\n",
" plt.plot(event_space, bernoulli.cdf(event_space, p), color=colors[i], alpha=0.5)\n",
"\n",
" ax.xaxis.set_ticks(event_space)\n",
" \n",
" plt.ylim((0,1))\n",
" plt.legend(loc=0)\n",
" if i == 0:\n",
" plt.ylabel(\"PDF at $k$\")\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running the simulation using the Uniform distribution\n",
"\n",
"In the code below, each column simulates a single outcome from the 50 states + DC by choosing a random number between 0 and 1. Obama wins that simulation if the random number is $<$ the win probability. If he wins that simulation, we add in the electoral votes for that state, otherwise we dont. We do this `n_sim` times and return a list of total Obama electoral votes in each simulation."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([328, 319, 308, ..., 329, 272, 332])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n_sim = 10000\n",
"simulations = np.random.uniform(size=(51, n_sim))\n",
"obama_votes = (simulations < predictwise.Obama.values.reshape(-1, 1)) * predictwise.Votes.values.reshape(-1, 1)\n",
"#summing over rows gives the total electoral votes for each simulation\n",
"obama_votes.sum(axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first thing to pick up on here is that `np.random.uniform` gives you a random number between 0 and 1, uniformly. In other words, the number is equally likely to be between 0 and 0.1, 0.1 and 0.2, and so on. This is a very intuitive idea, but it is formalized by the notion of the **Uniform Distribution**.\n",
"\n",
"We then say:\n",
"\n",
"$$X \\sim Uniform([0,1),$$\n",
"\n",
"which is to be read as **X has distribution Uniform([0,1])**. The **probability distribution function (pdf)** associated with the Uniform distribution is\n",
"\n",
"\\begin{eqnarray}\n",
"P(X = x) &=& 1 \\, for \\, x \\in [0,1] \\\\\n",
"P(X = x) &=& 0 \\, for \\, x \\notin [0,1]\n",
"\\end{eqnarray}\n",
"\n",
"What assigning the vote to Obama when the random variable **drawn** from the Uniform distribution is less than the Predictwise probability of Obama winning (which is a Bernoulli Parameter) does for us is this: if we have a large number of simulations and $p_{Obama}=0.7$ , then 70\\% of the time, the random numbes drawn will be below 0.7. And then, assigning those as Obama wins will hew to the frequentist notion of probability of the Obama win. But remember, of course, that in 30% of the simulations, Obama wont win, and this will induce fluctuations and a distribution on the total number of electoral college votes that Obama gets. And this is what we will see in the histogram below. "
]
},
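{
"cell_type": "markdown",
"metadata": {},
"source": [
"An empirical sketch of why the comparison works (`p_obama = 0.7` is the illustrative value from the text, and the seed is arbitrary): the fraction of uniform draws falling below 0.7 is close to 0.7, while the win count within each batch of 10,000 draws fluctuates from batch to batch. That batch-to-batch fluctuation is the source of the spread in the election histogram."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"p_obama = 0.7                    # illustrative win probability from the text\n",
"u = rng.uniform(size=100000)     # draws from the Uniform distribution on [0, 1)\n",
"wins = u < p_obama               # True where the state goes to Obama\n",
"print(wins.mean())               # close to 0.7, since P(U < p) = p\n",
"\n",
"# split into 10 batches of 10,000: the win count fluctuates from batch to batch\n",
"batch_counts = wins.reshape(10, 10000).sum(axis=1)\n",
"print(batch_counts)"
]
},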
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"def simulate_election(model, n_sim):\n",
" simulations = np.random.uniform(size=(51, n_sim))\n",
" obama_votes = (simulations < model.Obama.values.reshape(-1, 1)) * model.Votes.values.reshape(-1, 1)\n",
" #summing over rows gives the total electoral votes for each simulation\n",
" return obama_votes.sum(axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running the simulation using the Bernoulli distribution\n",
"\n",
"We can directly use the Bernoulli distribution instead"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 333., 327., 333., ..., 332., 322., 299.])"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n_sim=10000\n",
"simulations = np.zeros(shape=(51, n_sim))\n",
"obama_votes = np.zeros(shape=(51, n_sim))\n",
"for i in range(51):\n",
" simulations[i,:] = bernoulli(p=predictwise.Obama.values[i]).rvs(size=n_sim)\n",
" obama_votes[i,:] = simulations[i]*predictwise.Votes.values[i]\n",
"obama_votes.sum(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def simulate_election2(model, n_sim):\n",
" simulations = np.zeros(shape=(51, n_sim))\n",
" obama_votes = np.zeros(shape=(51, n_sim))\n",
" for i in range(51):\n",
" simulations[i,:] = bernoulli(p=predictwise.Obama.values[i]).rvs(size=n_sim)\n",
" obama_votes[i,:] = simulations[i]*predictwise.Votes.values[i]\n",
" return obama_votes.sum(axis=0)"
]
},
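{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of a fully vectorized Bernoulli version (an alternative to the per-state loop in `simulate_election2`), numpy's `binomial` with `n=1` draws Bernoulli outcomes and broadcasts a column of per-state probabilities across all simulations. `simulate_election_vectorized` is a hypothetical helper that takes plain arrays rather than the `predictwise` DataFrame, and the toy three-state numbers are made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def simulate_election_vectorized(probs, votes, n_sim, rng=None):\n",
"    # hypothetical helper: probs and votes are 1D arrays, one entry per state\n",
"    rng = np.random.default_rng() if rng is None else rng\n",
"    probs = np.asarray(probs, dtype=float).reshape(-1, 1)  # shape (n_states, 1)\n",
"    votes = np.asarray(votes).reshape(-1, 1)               # shape (n_states, 1)\n",
"    # binomial with n=1 is a Bernoulli draw; probs broadcasts across the n_sim columns\n",
"    outcomes = rng.binomial(1, probs, size=(probs.shape[0], n_sim))\n",
"    # sum over the states (axis 0) to get each simulation's electoral-vote total\n",
"    return (outcomes * votes).sum(axis=0)\n",
"\n",
"# toy three-state example with made-up numbers (not Predictwise data)\n",
"toy = simulate_election_vectorized([1.0, 0.0, 0.5], [10, 20, 5], n_sim=1000,\n",
"                                   rng=np.random.default_rng(1))\n",
"print(np.unique(toy))  # only totals of 10 or 15 are possible\n",
"# with the notebook's data, the call would be\n",
"# simulate_election_vectorized(predictwise.Obama.values, predictwise.Votes.values, 10000)"
]
},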
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code takes the necessary probabilities for the Predictwise data, and runs 10000 simulations. If you think of this in terms of our coins, think of it as having 51 biased coins, one for each state, and tossing them 10,000 times each.\n",
"\n",
"We use the results to compute the number of simulations, according to this predictive model, that Obama wins the election (i.e., the probability that he receives 269 or more electoral college votes)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9960\n"
]
}
],
"source": [
"result = simulate_election(predictwise, 10000)\n",
"print((result >= 269).sum())"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9947\n"
]
}
],
"source": [
"result2 = simulate_election2(predictwise, 10000)\n",
"print((result2 >= 269).sum())"
]
},
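{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two runs give slightly different counts (9960 and 9947) because each is a random sample. A rough gauge of how much such estimates wobble is the binomial standard error $\\sqrt{\\hat{p}(1-\\hat{p})/n}$. The sketch below computes it; `win_probability` is a hypothetical helper and `fake_result` is a synthetic stand-in with 9960 wins out of 10,000. The result, about 0.0006 (roughly 6 simulations in 10,000), suggests that differences of a dozen or so between runs are not surprising."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def win_probability(result, threshold=269):\n",
"    # hypothetical helper: fraction of simulations at or above the threshold,\n",
"    # plus its binomial (Monte Carlo) standard error sqrt(p(1-p)/n)\n",
"    wins = np.asarray(result) >= threshold\n",
"    p_hat = wins.mean()\n",
"    se = np.sqrt(p_hat * (1 - p_hat) / wins.size)\n",
"    return p_hat, se\n",
"\n",
"# synthetic stand-in with 9960 wins out of 10,000, like the first run above\n",
"fake_result = np.array([300] * 9960 + [250] * 40)\n",
"p_hat, se = win_probability(fake_result)\n",
"print(p_hat, se)  # 0.996 and about 0.0006"
]
},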
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are roughly only 50 simulations in which Romney wins the election!\n",
"\n",
"## Displaying the prediction\n",
"\n",
"Now, lets visualize the simulation. We will build a histogram from the result of `simulate_election`. We will **normalize** the histogram by dividing the frequency of a vote tally by the number of simulations. We'll overplot the \"victory threshold\" of 269 votes as a vertical black line and the result (Obama winning 332 votes) as a vertical red line.\n",
"\n",
"We also compute the number of votes at the 5th and 95th quantiles, which we call the spread, and display it (this is an estimate of the outcome's uncertainty). By 5th quantile we mean that if we ordered the number of votes Obama gets in each simulation in increasing order, the 5th quantile is the number below which 5\\% of the simulations lie. \n",
"\n",
"We also display the probability of an Obama victory \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"def plot_simulation(simulation): \n",
" plt.hist(simulation, bins=np.arange(200, 538, 1), \n",
" label='simulations', align='left', normed=True)\n",
" plt.axvline(332, 0, .5, color='r', label='Actual Outcome')\n",
" plt.axvline(269, 0, .5, color='k', label='Victory Threshold')\n",
" p05 = np.percentile(simulation, 5.)\n",
" p95 = np.percentile(simulation, 95.)\n",
" iq = int(p95 - p05)\n",
" pwin = ((simulation >= 269).mean() * 100)\n",
" plt.title(\"Chance of Obama Victory: %0.2f%%, Spread: %d votes\" % (pwin, iq))\n",
" plt.legend(frameon=False, loc='upper left')\n",
" plt.xlabel(\"Obama Electoral College Votes\")\n",
" plt.ylabel(\"Probability\")\n",
" sns.despine()"
]
},
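{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of the spread computation on a stand-in array (`toy_totals`, the integers 1 to 100, is illustrative and not simulation output). With numpy's default linear interpolation, the 5th and 95th percentiles come out as 5.95 and 95.05, so the spread, computed as `int(p95 - p05)` as in `plot_simulation`, is 89."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"toy_totals = np.arange(1, 101)        # illustrative stand-in: 1, 2, ..., 100\n",
"p05 = np.percentile(toy_totals, 5.)   # 5.95 with the default linear interpolation\n",
"p95 = np.percentile(toy_totals, 95.)  # 95.05\n",
"spread = int(p95 - p05)               # 89, truncated like iq in plot_simulation\n",
"print(p05, p95, spread)"
]
},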
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with sns.plotting_context('poster'):\n",
" plot_simulation(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model created by combining the probabilities we obtained from Predictwise with the simulation of a biased coin flip corresponding to the win probability in each states leads us to obtain a histogram of election outcomes. We are plotting the probabilities of a prediction, so we call this distribution over outcomes the **predictive distribution**. Simulating from our model and plotting a histogram allows us to visualize this predictive distribution. In general, such a set of probabilities is called a **probability mass function**. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Empirical Distribution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is an **empirical Probability Mass Function**. \n",
"\n",
"Lets summarize: the way the mass function arose here that we did ran 10,000 tosses (for each state), and depending on the value, assigned the state to Obama or Romney, and then summed up the electoral votes over the states.\n",
"\n",
"There is a second, very useful question, we can ask of any such probability mass or probability density: what is the probability that a random variable is less than some value. In other words: $P(X < x)$. This is *also* a probability distribution and is called the **Cumulative Distribution Function**, or CDF (sometimes just called the **distribution**, as opposed to the **density**, or **mass function**). Its obtained by \"summing\" the probability density function for all $X$ less than $x$."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Obama Win CDF at votes= 200 is 0.0\n",
"Obama Win CDF at votes= 300 is 0.152\n",
"Obama Win CDF at votes= 320 is 0.4561\n",
"Obama Win CDF at votes= 340 is 0.8431\n",
"Obama Win CDF at votes= 360 is 0.9968\n",
"Obama Win CDF at votes= 400 is 1.0\n",
"Obama Win CDF at votes= 500 is 1.0\n"
]
}
],
"source": [
"CDF = lambda x: np.float(np.sum(result < x))/result.shape[0]\n",
"for votes in [200, 300, 320, 340, 360, 400, 500]:\n",
" print(\"Obama Win CDF at votes=\", votes, \" is \", CDF(votes))"
]
},
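{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `CDF` lambda above rescans all 10,000 results for each `x`. A sketch of an alternative sorts once and then uses `np.searchsorted`; with `side='left'` it counts the values strictly less than `x`, so it matches the lambda's $P(X < x)$. `empirical_cdf` and the toy data are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def empirical_cdf(samples, x):\n",
"    # hypothetical helper: fraction of samples strictly less than x, i.e. P(X < x)\n",
"    s = np.sort(np.asarray(samples))\n",
"    # side='left' gives the number of sorted values < x; x may be a scalar or an array\n",
"    return np.searchsorted(s, x, side='left') / s.size\n",
"\n",
"toy_samples = np.array([300, 320, 320, 340])  # made-up vote totals\n",
"print(empirical_cdf(toy_samples, [300, 320, 330, 400]))  # 0, 0.25, 0.75, 1\n",
"# with the notebook's data: empirical_cdf(result, [200, 300, 320, 340])"
]
},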
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"votelist=np.arange(0, 540, 5)\n",
"plt.plot(votelist, [CDF(v) for v in votelist], '.-');\n",
"plt.xlim([200,400])\n",
"plt.ylim([-0.1,1.1])\n",
"plt.xlabel(\"votes for Obama\")\n",
"plt.ylabel(\"probability of Obama win\");"
]
}
],
"metadata": {
"anaconda-cloud": {},
"celltoolbar": "Edit Metadata",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}