## Correlation

Like variance, covariance is hard to interpret by itself, but it is used in other computations, such as correlation.

Pearson's correlation is a standardized covariance, where "standardizing" means dividing by the product of the standard deviations of the two variables: ρ = Cov(X,Y) / (S_X S_Y).
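Here is a minimal sketch of that definition. It assumes NumPy is available and uses population (divide-by-n) statistics throughout. `cov` and `corr` are illustrative helper names, not a library API, and the data are randomly generated for illustration:

```python
import numpy as np

def cov(xs, ys):
    """Covariance: the mean of the product of deviations from the means (divides by n)."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    return np.mean((xs - xs.mean()) * (ys - ys.mean()))

def corr(xs, ys):
    """Pearson's correlation: covariance standardized by both standard deviations."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    # np.std also divides by n by default (ddof=0), consistent with cov above
    return cov(xs, ys) / (xs.std() * ys.std())

# Hypothetical data: ys depends linearly on xs plus noise
rng = np.random.default_rng(17)
xs = rng.normal(170, 7, size=1000)
ys = 0.5 * xs + rng.normal(0, 5, size=1000)

print(corr(xs, ys))               # our version
print(np.corrcoef(xs, ys)[0, 1])  # NumPy's version, should agree
```

Because the divide-by-n factors cancel in the ratio, the result does not depend on whether you use n or n - 1, as long as you are consistent.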

What are the units of covariance? Correlation?

What's Cov(X,X)?
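One way to check your answers numerically is sketched below. It assumes NumPy and uses made-up height and weight data; the variable names are hypothetical. It shows that Cov(X,X) equals Var(X), and that converting heights from meters to centimeters multiplies the covariance by 100 while leaving the correlation unchanged:

```python
import numpy as np

def cov(xs, ys):
    # Covariance dividing by n, matching np.var's default ddof=0
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    return np.mean((xs - xs.mean()) * (ys - ys.mean()))

rng = np.random.default_rng(0)
heights_m = rng.normal(1.7, 0.07, size=500)          # hypothetical heights in meters
weights_kg = 40 * heights_m + rng.normal(0, 5, 500)  # hypothetical weights in kg

# Cov(X, X) is just Var(X)
print(np.isclose(cov(heights_m, heights_m), np.var(heights_m)))

# Covariance has units of X times units of Y (here m*kg), so changing units changes it
heights_cm = 100 * heights_m
print(cov(heights_cm, weights_kg) / cov(heights_m, weights_kg))  # approximately 100

# Correlation is dimensionless, so the change of units has no effect on it
r_m = np.corrcoef(heights_m, weights_kg)[0, 1]
r_cm = np.corrcoef(heights_cm, weights_kg)[0, 1]
print(np.isclose(r_m, r_cm))
```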

Anscombe's quartet makes a similar point:

Correlation measures only linear relationships. If the relationship is nonlinear, correlation can understate its strength, possibly by a lot!
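The following sketch shows this in code. It uses the published values of Anscombe's quartet and NumPy's `np.corrcoef`. All four datasets have nearly the same correlation (about 0.82), even though their scatter plots look nothing alike. The last few lines use a hypothetical parabola, where y is completely determined by x but the correlation is zero:

```python
import numpy as np

# Anscombe's quartet (Anscombe, 1973): the first three sets share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Four very different shapes, nearly the same Pearson correlation
for name, (xs, ys) in quartet.items():
    print(name, round(np.corrcoef(xs, ys)[0, 1], 3))

# A perfect but nonlinear relationship: y is a function of x, yet the
# correlation is zero because x is symmetric around 0
xs = np.arange(-50, 51)
ys = xs ** 2
print(np.corrcoef(xs, ys)[0, 1])
```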

### Linear least squares

What's so great about least squares fits?

They have generally good statistical properties, and they are easy to compute.
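For a single explanatory variable, least squares has a closed-form solution. The slope is Cov(X,Y) / Var(X), and the intercept makes the line pass through the point of means. Below is a minimal sketch that assumes NumPy. `least_squares` is a hypothetical helper, the data are synthetic, and the result is cross-checked against `np.polyfit`:

```python
import numpy as np

def least_squares(xs, ys):
    """Fit ys ~ inter + slope * xs by minimizing the sum of squared residuals.

    Closed form for simple linear regression:
      slope = Cov(X, Y) / Var(X)
      inter = mean(Y) - slope * mean(X)
    """
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    x_bar, y_bar = xs.mean(), ys.mean()
    slope = np.mean((xs - x_bar) * (ys - y_bar)) / np.var(xs)
    inter = y_bar - slope * x_bar
    return inter, slope

# Hypothetical data generated with intercept 2 and slope 0.5 plus noise
rng = np.random.default_rng(1)
xs = rng.uniform(0, 10, size=200)
ys = 2.0 + 0.5 * xs + rng.normal(0, 1, size=200)

inter, slope = least_squares(xs, ys)
# np.polyfit with deg=1 returns [slope, intercept]; it should agree with the closed form
slope_np, inter_np = np.polyfit(xs, ys, 1)
print(inter, slope)
print(inter_np, slope_np)
```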

But least squares is not always the right choice; for example, squaring the residuals gives outliers a lot of weight.

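Coefficient of determination, R²: the fraction of the variability in the dependent variable that is explained by the model, OR, equivalently, the relative reduction in MSE when you use the model to guess each value instead of guessing the mean.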
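The sketch below computes R² both ways on synthetic data. It assumes NumPy, and the data and coefficients are made up. For a simple linear least squares fit with an intercept, both views give the same number, and that number also equals the square of Pearson's correlation:

```python
import numpy as np

# Hypothetical data
rng = np.random.default_rng(2)
xs = rng.normal(0, 1, size=1000)
ys = 1.0 + 3.0 * xs + rng.normal(0, 2, size=1000)

# Least squares fit (np.polyfit returns [slope, intercept] for deg=1)
slope, inter = np.polyfit(xs, ys, 1)
residuals = ys - (inter + slope * xs)

# View 1: fraction of the variance of ys explained by the model
r2 = 1 - np.var(residuals) / np.var(ys)

# View 2: relative reduction in MSE from guessing with the model instead of the mean
mse_mean = np.mean((ys - ys.mean()) ** 2)  # guess mean(ys) for every point
mse_model = np.mean(residuals ** 2)        # guess inter + slope * x
r2_mse = (mse_mean - mse_model) / mse_mean

# For simple linear least squares, R^2 is also the squared correlation
rho = np.corrcoef(xs, ys)[0, 1]
print(r2, r2_mse, rho ** 2)
```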

Exercise 9.8: SAT scores and IQ.

### Correlation and causation

As always, xkcd says it best:

This is probably familiar territory for you, but just so I feel like I've done my job:

In general, a relationship between two variables does not tell you for sure whether one causes the other, or the other way around, or both, or whether they might both be caused by something else altogether.

So what can you do to provide evidence of causation?

1) **Use time**. If A comes before B, then A can cause B but not the other way around.

But this does not rule out spurious relationships: a third factor could cause both A and B.

2) **Use randomness**. If you divide a large population into two groups at random, then for any property, X, you expect the difference in the mean of X to be small (with what caveat?).

If the groups are nearly identical in all properties but one, you can eliminate spurious relationships.

This works even if you don't know what the confounding variables are.

But it works even better if you do, because you can check that the groups are identical.
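Here is a small simulation of that argument. It assumes NumPy; the population, the property (an age-like variable), and the helper `random_split_gap` are all hypothetical. It also hints at one answer to the caveat question. The difference between the groups is only small on average, and the typical gap shrinks roughly like 1/sqrt(n) as the groups get larger:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_split_gap(n, rng):
    """Split a hypothetical population of size n into two equal groups at random
    and return the difference in the mean of one property (an age-like variable)."""
    ages = rng.normal(40, 12, size=n)         # hypothetical property X
    in_group_a = rng.permutation(n) < n // 2  # exactly half assigned to group A
    return ages[in_group_a].mean() - ages[~in_group_a].mean()

sizes = [20, 200, 2000, 20000]
spread = {}
for n in sizes:
    gaps = [random_split_gap(n, rng) for _ in range(500)]
    # Standard deviation of the gap across 500 random splits
    spread[n] = np.std(gaps)
    print(n, round(spread[n], 3))
```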

If you combine these two ideas, the result is a randomized controlled trial, which is the most reliable way we know of to demonstrate a causal relationship, and the defining characteristic of so-called "Western medicine."
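The toy simulation below shows why randomization matters. It assumes NumPy, and everything in it is hypothetical: a confounder (`health`) that affects both who chooses the treatment and the outcome, and a true treatment effect of 1.0. Comparing people who chose the treatment with those who did not overstates the effect. Assigning treatment by coin flip recovers it:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
TRUE_EFFECT = 1.0  # hypothetical causal effect of the treatment on the outcome

# Hypothetical confounder: healthier people are more likely to choose the
# treatment AND have better outcomes regardless of treatment
health = rng.normal(0, 1, size=n)

def outcome(treated):
    # Outcome depends on treatment, on the confounder, and on random noise
    return TRUE_EFFECT * treated + 2.0 * health + rng.normal(0, 1, size=n)

# Observational study: people self-select into treatment based on health
chose = rng.random(n) < 1 / (1 + np.exp(-2 * health))
y_obs = outcome(chose)
naive = y_obs[chose].mean() - y_obs[~chose].mean()

# Randomized controlled trial: a coin flip decides who is treated
assigned = rng.random(n) < 0.5
y_rct = outcome(assigned)
rct = y_rct[assigned].mean() - y_rct[~assigned].mean()

print(f"observational estimate: {naive:.2f}")  # inflated by the confounder
print(f"randomized estimate:    {rct:.2f}")    # close to TRUE_EFFECT
```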

### Everything is correlated with income

Unfortunately, controlled trials are only possible in a few domains of knowledge.

*Introduction to Stochastic Processes*

*The course will study basic random processes and their applications. Topics covered will include random walks, Markov chains, Bernoulli and Poisson processes, and if time permits, Brownian Motion, Gaussian Random Processes, and Martingale Theory. Applications in Operations Research (queuing, data networks, traffic), communication systems and information theory (modeling data, signals) and mathematical finance (portfolio theory, gambling).*