A Graduate Introduction to Probability and Statistics for Scientists and Engineers

Philip B. Stark, Department of Statistics, University of California, Berkeley

First offering: a 10-hour short course at the University of Tokyo, August 2015

Software requirements

Supplemental Texts

Index

These notes are in draft form, with large gaps. I'm happy to hear about any errors, and I hope eventually to fill in some of the missing pieces.

  1. Overview
  2. Introduction to Jupyter and Python
  3. Sets, Combinatorics, & Probability
  4. Theories of Probability
  5. Random Variables, Expectation, Random Vectors, and Stochastic Processes
  6. Probability Inequalities
  7. Inference
  8. Confidence Sets

Rough Syllabus for Tokyo Short Course

Preamble: Introduction to Jupyter and Python

  1. Jupyter notebook
    • Cells, markdown, MathJax
  2. Less Python than you need
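
Even "less Python than you need" includes functions, list comprehensions, and numpy arrays; a minimal illustration (the names here are throwaway examples, not part of the notes):

```python
# A few Python idioms the notes rely on: functions, list comprehensions,
# and numpy arrays for vectorized arithmetic.
import numpy as np

def mean(x):
    """Arithmetic mean of a sequence, written out longhand."""
    return sum(x) / len(x)

squares = [k**2 for k in range(1, 6)]   # list comprehension: [1, 4, 9, 16, 25]
a = np.array(squares)                   # numpy array supports vectorized ops
print(mean(squares), a.sum(), a.mean())
```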

Lecture 1: Probability

  1. What's the difference between Probability and Statistics?
  2. Counting and combinatorics

    • Sets: unions, intersections, partitions
    • De Morgan's Laws
    • The Inclusion-Exclusion principle
    • The Fundamental Rule of Counting
    • Combinations
    • Permutations
    • Strategies for counting
  3. Axiomatic Probability

    • Outcome space and events, events as sets
    • Kolmogorov's axioms (finite and countable)
    • Analogies between probability and area or mass
    • Consequences of the axioms
      • Probabilities of unions and intersections
      • Bounds on probabilities
      • Bonferroni's inequality
      • The inclusion-exclusion rule for probabilities
    • Conditional probability
      • The Multiplication Rule
      • Independence
      • Bayes' Rule

Lecture 2: Probability, continued

  4. Theories of probability
    • Equally likely outcomes
    • Frequency Theory
    • Subjective Theory
    • Shortcomings of the theories
    • Rates versus probabilities
    • Measurement error
    • Where does probability come from in physical problems?
    • Making sense of geophysical probabilities
      • Earthquake probabilities
      • Probability of magnetic reversals
      • Probability that the Earth is more than 5 billion years old
  5. Random variables
    • Probability distributions of real-valued random variables
    • Cumulative distribution functions
    • Discrete random variables
      • Probability mass functions
      • The uniform distribution on a finite set
      • Bernoulli random variables
      • Random variables derived from the Bernoulli
        • Binomial random variables
        • Geometric
        • Negative binomial
      • Hypergeometric random variables
      • Poisson random variables: countably infinite outcome spaces
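
The discrete distributions above can be checked numerically; for instance, the binomial pmf built directly from counting agrees with scipy.stats (an illustrative sketch, with n and p chosen arbitrarily):

```python
# The binomial pmf built from the Fundamental Rule of Counting:
# P(X = k) = C(n, k) p^k (1-p)^(n-k), checked against scipy.stats.
from math import comb
from scipy.stats import binom

n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
assert abs(sum(pmf) - 1) < 1e-12                 # the pmf sums to 1
assert abs(pmf[3] - binom.pmf(3, n, p)) < 1e-12  # matches the library value
```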

Lecture 3: Random variables, contd.

  1. Random variables, continued
    • Continuous and "mixed" random variables
    • Probability densities
      • The uniform distribution on an interval
      • The Gaussian distribution
    • The CDF of discrete, continuous, and mixed distributions
    • Distribution of measurement errors
      • The box model for random error
      • Systematic and stochastic error
  2. Independence of random variables
    • Events derived from random variables
    • Definitions of independence
    • Independence and "informativeness"
    • Examples of independent and dependent random variables
    • IID random variables
    • Exchangeability of random variables
  3. Marginal distributions
  4. Point processes
    • Poisson processes
      • Homogeneous and inhomogeneous Poisson processes
      • Spatially heterogeneous, temporally homogeneous Poisson processes as a model for seismicity
      • The conditional distribution of Poisson processes given N
    • Marked point processes
    • Inter-arrival times and inter-arrival distributions
    • Branching processes
      • ETAS
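
The Poisson-process constructions above lend themselves to simulation; a sketch with illustrative rate and horizon, showing the inter-arrival construction alongside the conditional-on-N construction:

```python
# A homogeneous Poisson process on [0, T] two ways: cumulative sums of
# exponential inter-arrival times, and--conditionally on N--iid uniform
# points. The rate and horizon are illustrative choices.
import numpy as np

rng = np.random.default_rng(12345)
rate, T = 2.0, 100.0

# (1) inter-arrival construction: gaps are iid Exponential(rate)
gaps = rng.exponential(scale=1 / rate, size=int(3 * rate * T))
times = np.cumsum(gaps)
times = times[times <= T]

# (2) conditional construction: N ~ Poisson(rate * T), then N iid Uniform[0, T]
N = rng.poisson(rate * T)
cond_times = np.sort(rng.uniform(0, T, size=N))

print(len(times), N)   # both counts should be near rate * T = 200
```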

Lecture 4: Expectation, Probability Inequalities, and Simulation

  1. Expectation
    • The Law of Large Numbers
    • The Expected Value
      • Expected value of a discrete univariate distribution
        • Special cases: Bernoulli, Binomial, Geometric, Hypergeometric, Poisson
      • Expected value of a continuous univariate distribution
        • Special cases: uniform, exponential, normal
      • Expected value of a multivariate distribution
    • Standard Error and Variance
      • Discrete examples
      • Continuous examples
      • The square-root law
      • Standardization and Studentization
      • The Central Limit Theorem
    • The tail-sum formula for the expected value
    • Conditional expectation
      • The conditional expectation is a random variable
      • The expectation of the conditional expectation is the unconditional expectation
    • Useful probability inequalities
      • Markov's Inequality
      • Chebychev's Inequality
      • Hoeffding's Inequality
      • Jensen's inequality
  2. Simulation
    • Pseudo-random number generation
      • The importance of the PRNG: period, DIEHARD tests
    • Assumptions
    • Uncertainties
    • Sampling distributions
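
The square-root law and sampling distributions above are easy to see by simulation; a sketch, with an arbitrary population (Uniform[0, 1]) and sample sizes chosen for illustration:

```python
# Simulating the sampling distribution of the sample mean to see the
# square-root law: the SE of the mean of n iid draws is sigma / sqrt(n).
import numpy as np

rng = np.random.default_rng(2015)
reps = 20_000
pop_sd = 1 / np.sqrt(12)        # SD of the Uniform[0, 1] population

for n in (25, 100):
    means = rng.uniform(0, 1, size=(reps, n)).mean(axis=1)
    # empirical SD of the simulated means vs. the square-root-law prediction
    print(n, means.std(), pop_sd / np.sqrt(n))
```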

Lecture 5: Testing

  1. Hypothesis tests
    • Null and alternative hypotheses, "omnibus" hypotheses
    • Type I and Type II errors
    • Significance level and power
    • Approximate, exact, and conservative tests
    • Families of tests
    • P-values
      • Estimating P-values by simulation
    • Test statistics
      • Selecting a test statistic
      • The null distribution of a test statistic
      • One-sided and two-sided tests
    • Null hypotheses involving actual, hypothetical, and counterfactual randomness
    • Multiplicity
      • Per-comparison error rate (PCER)
      • Familywise error rate (FWER)
      • The False Discovery Rate (FDR)
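
Estimating a P-value by simulation, as listed above, can be sketched in a few lines (the data, 60 heads in 100 tosses, are hypothetical):

```python
# Estimating a P-value by simulation: having observed 60 heads in 100 tosses,
# estimate the upper-tail P-value under the fair-coin null by simulating the
# null distribution, and compare with the exact binomial tail probability.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(8)
n, observed, reps = 100, 60, 100_000
sims = rng.binomial(n, 0.5, size=reps)     # test statistic under the null
p_sim = (sims >= observed).mean()          # simulated P-value
p_exact = binom.sf(observed - 1, n, 0.5)   # exact P(X >= 60) under the null
print(p_sim, p_exact)
```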

Lecture 6: Tests and Confidence sets

  1. Tests, continued
    • Parametric and nonparametric tests
      • The Kolmogorov-Smirnov test and the MDKW inequality
      • Example: Testing for uniformity
      • Conditional test for Poisson behavior
    • Permutation and randomization tests
      • Invariances of distributions
      • Exchangeability
      • The permutation distribution of test statistics
      • Approximating permutation distributions by simulation
      • The two-sample problem
    • Testing when there are nuisance parameters
  2. Confidence sets
    • Definition
    • Interpretation
    • Duality between hypothesis tests and confidence sets
    • Tests and confidence sets for Binomial p
    • Pivoting
      • Confidence sets for a normal mean
        • known variance
        • unknown variance; Student's t distribution
    • Approximate confidence intervals using the normal approximation
      • Empirical coverage
      • Failures
    • Nonparametric confidence bounds for the mean of a nonnegative population
    • Multiplicity
      • Simultaneous coverage
      • Selective coverage
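
Empirical coverage, listed above, is readily checked by simulation; a sketch for the normal-approximation interval, with illustrative choices of n and number of replications:

```python
# Empirical coverage of the normal-approximation interval
# mean +/- 1.96 * s / sqrt(n); nominal coverage is 95%.
import numpy as np

rng = np.random.default_rng(42)
reps, n, mu = 10_000, 50, 0.0
covered = 0
for _ in range(reps):
    x = rng.normal(mu, 1, size=n)
    half = 1.96 * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half <= mu <= x.mean() + half)
print(covered / reps)   # close to 0.95 (a bit low: s replaces sigma)
```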

Rough Syllabus for the Complete 45-Hour Course


Descriptive Statistics

  1. Summarizing data
    1. Types of data: categorical, ordinal, quantitative
    2. Univariate data.
      1. Measures of location and spread: mean, median, mode, quantiles, inter-quartile range, range, standard deviation, RMS
      2. Markov's and Chebychev's inequalities for quantitative lists
      3. Ranks and ordinal categorical data
      4. Frequency tables and histograms
      5. Bar charts
    3. Multivariate data
      1. Scatterplots
      2. Measures of association: Pearson and Spearman correlation coefficients
      3. Linear regression
        1. The Least Squares principle
        2. The Projection Theorem
        3. The Normal Equations
          1. Numerical solution of the normal equations
          2. Numerical linear algebra is not the same as abstract linear algebra
          3. Condition number
          4. Do not invert matrices to solve linear systems: use backsubstitution or factorization
        4. Errors in regression: RMS error of linear regression
        5. Least Absolute Value regression
      4. Principal components and approximation by subspaces: another application of the Projection Theorem
      5. Clustering
        1. Distance functions
        2. Hierarchical methods, tree-based methods
        3. Centroid methods: K-means
        4. Density-based clustering: kernel methods, DBSCAN
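
The advice above (do not invert matrices to solve linear systems) is easy to follow in practice; a least-squares sketch with simulated data (the design and coefficients are invented for illustration):

```python
# Least squares without explicitly inverting X^T X: numpy's lstsq uses an
# orthogonal factorization, which is better conditioned than forming the
# normal equations, and far better than inverting a matrix.
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(-1, 1, n)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, n)   # true intercept 2, slope 3

X = np.column_stack([np.ones(n), x])        # design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # approximately [2, 3]
```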

Probability

  1. Counting and combinatorics

    1. Sets: unions, intersections, partitions
    2. De Morgan's Laws
    3. The Inclusion-Exclusion principle
    4. The Fundamental Rule of Counting
    5. Combinations. Application (using the Inclusion-Exclusion Principle): counting derangements
    6. Permutations
    7. Strategies for complex counting problems
  2. Theories of probability

    1. Equally likely outcomes
    2. Frequency Theory
    3. Subjective Theory
    4. Shortcomings of the theories
  3. Axiomatic Probability

    1. Outcome space and events, events as sets
    2. Kolmogorov's axioms (finite and countable)
    3. Analogies between probability and area or mass
    4. Consequences of the axioms
      1. Probabilities of unions and intersections
      2. Bounds on probabilities
      3. Bonferroni's inequality
      4. The inclusion-exclusion rule for probabilities
    5. Conditional probability
      1. The Multiplication Rule
      2. Independence
      3. Bayes' Rule
  4. Random variables

    1. Probability distributions
    2. Cumulative distribution functions for real-valued random variables
    3. Discrete random variables
      1. Probability mass functions
      2. The uniform distribution on a finite set
      3. Bernoulli random variables
      4. Random variables derived from the Bernoulli
        1. Binomial random variables
        2. Geometric
        3. Negative binomial
      5. Poisson random variables: countably infinite outcome spaces
      6. Hypergeometric random variables
      7. Examples of other discrete random variables
    4. Continuous and "mixed" random variables
      1. Probability densities
      2. The uniform distribution on an interval
      3. The exponential distribution and double-exponential distributions
      4. The Gaussian distribution
      5. The CDF of discrete, continuous, and mixed distributions
    5. Survival functions and hazard functions
      1. Counting processes
    6. Joint distributions of collections of random variables, random vectors
      1. The multivariate uniform distribution
      2. The multivariate normal distribution
      3. Independence of random variables
        1. Events derived from random variables
        2. Definitions of independence
      4. Marginal distributions
      5. Conditional distributions
        1. The "memoryless property" of the exponential distribution
      6. The Central Limit Theorem
    7. Stochastic processes
      1. Point processes
        1. Intensity functions and conditional intensity functions
        2. Poisson processes
          1. Homogeneous and inhomogeneous Poisson processes
          2. The conditional distribution of Poisson processes given N
        3. Marked point processes
        4. Inter-arrival times and inter-arrival distributions
        5. The conditional distribution of a Poisson process
      2. Random walks
      3. Markov chains
      4. Brownian motion
  5. Expectation

    1. The Law of Large Numbers
    2. The Expected Value
      1. Expected value of a discrete univariate distribution
        1. Special cases: Bernoulli, Binomial, Geometric, Hypergeometric, Poisson
      2. Expected value of a continuous univariate distribution
        1. Special cases: uniform, exponential, normal
      3. (Aside: measurability, Lebesgue integration, and the CDF as a measure)
      4. Expected value of a multivariate distribution
    3. Expected values of functions of a random variable
      1. Change-of-variables formulas for probability mass functions and densities
    4. Standard Error and Variance
      1. Discrete examples
      2. Continuous examples
      3. The square-root law
    5. The tail-sum formula for the expected value
    6. Conditional expectation
      1. The expectation of the conditional expectation is the unconditional expectation
    7. Useful probability inequalities
      1. Markov's Inequality
      2. Chebychev's Inequality
      3. Hoeffding's Inequality
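
The tail-sum formula above can be verified numerically; a sketch for a Poisson distribution (the rate and truncation point are illustrative):

```python
# The tail-sum formula for a nonnegative integer-valued random variable:
# E[X] = sum_{k >= 1} P(X >= k), checked for a Poisson(3) distribution,
# truncated far into the tail so the truncation error is negligible.
from scipy.stats import poisson

lam, K = 3.0, 200
tail_sum = sum(poisson.sf(k - 1, lam) for k in range(1, K))  # P(X >= k)
print(tail_sum)   # equals E[X] = lambda = 3, up to truncation error
```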

Sampling

  1. Empirical distributions

    1. The ECDF for univariate distributions
    2. The Kolmogorov-Smirnov statistic and the Massart-Dvoretzky-Kiefer-Wolfowitz (MDKW) inequality
    3. Inference: inverting the MDKW inequality
    4. Q-Q plots
  2. Random sampling

    1. Types of samples
      1. Samples of convenience
      2. Quota sampling
      3. Systematic sampling
      4. The importance of random sampling: stirring the soup
      5. Systematic random sampling
      6. Random sampling with replacement
      7. Simple random sampling
      8. Stratified random sampling
      9. Cluster sampling
      10. Multistage sampling
      11. Weighted random samples
      12. Sampling with probability proportional to size
    2. Sampling frames
    3. Nonresponse and missing data
    4. Sampling bias
  3. Simulation

    1. Pseudo-random number generators
      1. Why the PRNG matters
      2. Uniformity, period, independence
      3. Assessing PRNGs: DIEHARD and other tests
      4. Linear congruential PRNGs, including the Wichmann-Hill; group-induced patterns
      5. Statistically "adequate" PRNGs, including the Mersenne Twister
      6. Cryptographic quality PRNGs, including cryptographic hashes
    2. Generating pseudorandom permutations
    3. Taking pseudorandom samples
    4. Simulating sampling distributions
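
Pseudorandom permutations and samples, listed above, can be sketched with numpy's Generator interface (note that its default bit generator is PCG64; numpy's legacy RandomState used the Mersenne Twister):

```python
# Pseudorandom permutations and simple random samples via numpy's Generator.
import numpy as np

rng = np.random.default_rng(20150823)
population = np.arange(100)

perm = rng.permutation(population)                    # pseudorandom permutation
srs = rng.choice(population, size=10, replace=False)  # SRS of size 10

print(perm[:5], srs)
```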

Estimation and Inference

  1. Estimating parameters using random samples

    1. Sampling distributions
    2. The Central Limit Theorem
    3. Measures of accuracy: mean squared error, median absolute deviation, etc.
    4. Maximum likelihood
    5. Loss functions, Risk, and decision theory
    6. Minimax estimates
    7. Bayes estimates
    8. The Bootstrap
    9. Shrinkage and regularization
  2. Inference

    1. Hypothesis tests
      1. Null and alternative hypotheses, "omnibus" hypotheses
      2. Type I and Type II errors
      3. Significance level and power
      4. Approximate, exact, and conservative tests
      5. Families of tests
      6. P-values
        1. Estimating P-values by simulation
      7. Test statistics
        1. Selecting a test statistic
        2. The null distribution of a test statistic
        3. One-sided and two-sided tests
      8. Null hypotheses involving actual, hypothetical, and counterfactual randomness
      9. Multiplicity
        1. Per-comparison error rate
        2. Familywise error rate
        3. The False Discovery Rate
    2. Approaches to testing
      1. Parametric and nonparametric tests
      2. Likelihood ratio tests
      3. Permutation and randomization tests
        1. Invariances of distributions
        2. Exchangeability
        3. Other symmetries
        4. The permutation distribution of test statistics
        5. Approximating permutation distributions by simulation
    3. Confidence sets
    4. Duality between hypothesis tests and confidence sets
    5. Conditional tests, conditional and unconditional significance levels
  3. Tests of particular hypotheses

    1. The Neyman model of a randomized experiment
      1. Strong and weak null hypotheses
      2. Testing the strong null hypothesis
        1. The distribution of a test statistic under the strong null
      3. "Interference"
      4. Blocking and other designs
        1. Ensuring that the null hypothesis matches the experiment
    2. Tests for Binomial p
    3. The Sign test
      1. The sign test for the median; tests for other quantiles
      2. The sign test for a difference in medians
    4. Tests based on the normal approximation
      1. The Z statistic and the Z test
      2. The t statistic and the t test
      3. 2-sample problems, paired and unpaired tests
      4. Tests based on ranks
        1. The Wilcoxon test
        2. The Wilcoxon signed rank test
      5. Tests using actual values
    5. Tests of association
      1. The hypothesis of exchangeability
      2. The Spearman test
      3. The permutation distribution of the Pearson correlation
    6. Tests of randomness and independence
      1. The runs test
    7. Tests of symmetry
      1. Tests of exchangeability
      2. Tests of spherical symmetry
    8. The two-sample problem
      1. Selecting the test statistic: what's the alternative?
        1. Mean, sum, Student t
        2. Smirnov statistic
        3. Other choices
      2. The permutation distribution of the test statistic
      3. The two-sample problem for complex data
        1. Test statistics
      4. The k-sample problem
    9. Stratified permutation tests
    10. Fisher's Exact Test
    11. Tests of homogeneity and ANOVA
      1. The F statistic
      2. The permutation distribution of the F statistic
      3. Other statistics
      4. Ordered alternatives
    12. Tests based on the distribution function: The Kolmogorov-Smirnov Test
      1. The universality of the null distribution for continuous variables
      2. Using the K-S test to test for Poisson behavior
    13. Sequential tests and Wald's SPRT
      1. Random walks and Gambler's ruin
      2. Wald's Theorem
  4. Confidence intervals for particular parameters

    1. Confidence intervals for a shift in the Neyman model
    2. Confidence intervals for Binomial p
      1. Application: confidence bounds for P-values estimated by simulation
      2. Application: intervals for quantiles by inverting binomial tests
    3. Confidence intervals for a Normal mean using the Z and t distributions
    4. Confidence intervals for the mean
      1. Nonparametric confidence bounds for a population mean
        1. The need for a priori bounds
        2. Nonnegative random variables
        3. Bounded random variables
    5. Confidence sets for multivariate parameters
  5. Density estimation

    1. Histogram estimates
    2. Kernel estimates
    3. Confidence bounds for monotone and shape-restricted densities
    4. Lower confidence bounds on the number of modes
  6. Function estimation

    1. Splines and penalized splines
      1. Polynomial splines
      2. Periodic splines
      3. Smoothing splines as least-squares
      4. B-splines
      5. L1 splines
    2. Constraints
      1. Balls and ellipsoids
        1. Smoothness and norms
        2. Lipschitz conditions
        3. Sobolev conditions
      2. Cones
        1. Nonnegativity
        2. Shape restrictions
          1. Monotonicity
          2. Convexity
      3. Star-shaped constraints
        1. Sparsity and minimum L1 methods
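
The binomial confidence intervals above, obtained by inverting exact tests, have closed-form endpoints in terms of Beta quantiles (the Clopper-Pearson interval); a sketch, with hypothetical data of 60 successes in 100 trials:

```python
# An exact (conservative) confidence interval for Binomial p, obtained by
# inverting the two tail tests; the endpoints are Beta quantiles
# (the Clopper-Pearson interval).
from scipy.stats import beta

def clopper_pearson(k, n, conf=0.95):
    """Exact two-sided confidence interval for Binomial p given k successes."""
    alpha = 1 - conf
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

lo, hi = clopper_pearson(60, 100)
print(lo, hi)   # roughly (0.50, 0.70)
```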

Sketchy from here down

Experiments

  1. Experiments versus observational studies

    1. Controls and the Method of Comparison
    2. Randomization
    3. Blinding
  2. Experimental design

    1. Blocking
    2. Orthogonal designs
    3. Latin hypercube design
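
Latin hypercube designs can be sketched in a few lines of numpy (a toy implementation, not a library routine; n, d, and the seed are arbitrary):

```python
# A minimal Latin hypercube sampler: stratify each coordinate into n equal
# strata, place exactly one point per stratum in each coordinate, and jitter
# uniformly within strata.
import numpy as np

def latin_hypercube(n, d, rng):
    jitter = rng.random((n, d))                               # within-stratum jitter
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + jitter) / n                              # points in [0, 1)^d

rng = np.random.default_rng(3)
pts = latin_hypercube(10, 2, rng)
# each column has exactly one point in each interval [i/10, (i+1)/10)
print(np.sort(np.floor(pts * 10).astype(int), axis=0))
```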
In [1]:
# Version information
%load_ext version_information
%version_information scipy, numpy, pandas, matplotlib
Out[1]:
Software     Version
Python       2.7.10 64bit [GCC 4.2.1 (Apple Inc. build 5577)]
IPython      3.2.1
OS           Darwin 14.5.0 x86_64 i386 64bit
scipy        0.14.0
numpy        1.9.2
pandas       0.14.1
matplotlib   1.4.3
Sun Aug 23 17:00:13 2015 PDT