Regression and inference

Materials for class on Monday, September 9, 2019

Contents

Slides

Download the slides from today’s class.

First slide

Regression with R

Open the RStudio.cloud project for today.

You can also download the project to your computer and run the R Markdown files locally:

Different styles of formulas

There’s unfortunately no standard notation for regression model math. Since everyone does it differently, here’s a brief guide to translating between different forms. I’ll write two different models in a bunch of different ways. The models come from chapter 2 of Mastering ’Metrics where they estimated the effect of private university on earnings:

R code

When specifying a model with R, you don’t need to worry about Greek letters (or any coefficient letters really), since the point of running the model is to find what the actual values of those letters would be. The code for these two models would look something like this:

model_simple <- lm(earnings ~ private + group_a, data = schools)
model_complex <- lm(log(earnings) ~ private + group_a + sat_score + parental_income,
                    data = schools)

Mastering ’Metrics and econometrics in general

In Mastering ’Metrics, Angrist and Pischke like to use lots of different Greek letters to help distinguish between the different parts of a model. For instance, Equation 2.1 on page 57 is

\[ Y_i = \alpha + \beta P_i + \gamma A_i + e_i \]

Here’s what all these things mean:

Each of these terms has a subscripted i to show that the model is being fit for individuals, not groups. It’s more of an esoteric point and we don’t care much about that distinction for this class.

For the more complex model, Angrist and Pischke use Equation 2.2 on page 61:

\[ \ln Y_i = \alpha + \beta P_i + \gamma A_i + \delta_1 \text{SAT}_i + \delta_2 \text{PI}_i + e_i \]

The alpha, beta, and gamma terms are all the same as before (intercept, treatment coefficient, idenfication coefficient), but there are some new pieces:

All \(\beta\)s

My preferred method is to not distinguish between the different types of coefficients (i.e. beta vs. gamma vs. delta) and just call everything beta. Here’s what the two models look like when written this way:

\[ \begin{aligned} Y =& \beta_0 + \beta_1 P + \beta_2 A + \epsilon \\ \ln Y =& \beta_0 + \beta_1 P + \beta_2 A + \beta_3 \text{SAT} + \beta_4 \text{PI} + \epsilon \end{aligned} \]

You’ll sometimes see the intercept \(\beta_0\) written as \(\alpha\), which is fine—just start with \(\beta_1\):

\[ \begin{aligned} Y =& \alpha + \beta_1 P + \beta_2 A + \epsilon \\ \ln Y =& \alpha + \beta_1 P + \beta_2 A + \beta_3 \text{SAT} + \beta_4 \text{PI} + \epsilon \end{aligned} \]

Use real names

If you’re not constrained with space, feel free to use actual words instead of things like \(P\), \(A\), or \(PI\):

\[ \begin{aligned} Y =& \beta_0 + \beta_1 \text{Private} + \beta_2 \text{Group A} + \epsilon \\ \ln Y =& \beta_0 + \beta_1 \text{Private} + \beta_2 \text{Group A} + \\ & \beta_3 \text{SAT score} + \beta_4 \text{Parental income} + \epsilon \end{aligned} \]

Clearest and muddiest things

Go to this form and answer these three questions:

  1. What was the muddiest thing from class today? What are you still wondering about?
  2. What was the clearest thing from class today?
  3. What was the most exciting thing you learned?

I’ll compile the questions and send out answers after class.