In this lab, we will estimate the average effect of years of schooling on individuals’ earnings. This classic empirical relationship — often called the Mincer earnings function — is one of the most influential findings in labor economics and the social sciences.
Jacob Mincer (1974) proposed a simple model relating the logarithm of an individual’s wage to their years of schooling and work experience. The model provides an intuitive measure of the “returns to education” — how much wages increase, on average, for each additional year of schooling.
We’ll start with a simple regression model that uses only schooling as a predictor, then extend it to include experience.
We will estimate two models. The first is:
\[ \log(\text{wage}) = \alpha + \beta_1 \times \text{schooling} + \varepsilon \] And the second model is:
\[ \log(\text{wage}) = \alpha + \beta_1 \times \text{schooling} + \beta_2 \times \text{experience} + \varepsilon \]
We’ll use the dataset SchoolingReturns, where each observation represents one individual. The variables of interest are:

- lwage: log of hourly wage
- education: years of schooling completed
- experience: years of labor market experience
- female: 1 if respondent is female, 0 otherwise

Load the data with the following chunk:
library(ivreg)
data("SchoolingReturns", package = "ivreg")
df <- SchoolingReturns
Please create the variable lwage in the dataframe using
mutate().
To get an intuitive sense of the data, compute the average log wage for people with 12 or fewer years of schooling and for those with more than 12 years. What is the difference between these two averages?
# [Your Code Here]
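One possible sketch, assuming lwage has already been created with mutate() and dplyr is loaded; the indicator name more_than_hs is just an illustrative choice:
df %>%
  mutate(more_than_hs = education > 12) %>%   # TRUE if more than 12 years of schooling
  group_by(more_than_hs) %>%
  summarize(mean_lwage = mean(lwage))         # average log wage in each group
The difference between the two group means is the gap you are asked to report.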
Using our first model, estimate a linear regression model where log wages are predicted by years of schooling.
df <- df %>%
mutate(lwage = log(wage))
lm(lwage ~ education, data = df)
##
## Call:
## lm(formula = lwage ~ education, data = df)
##
## Coefficients:
## (Intercept) education
## 5.57088 0.05209
Write out the fitted regression equation and provide a substantive interpretation of the coefficient \(\beta_1\). What does it tell us about the relationship between schooling and log wages? How would you express this in percentage terms?
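As a hint for the percentage interpretation: a log-point coefficient \(\beta_1\) corresponds to an exact percentage change of \(e^{\beta_1} - 1\), which for small coefficients is approximately \(\beta_1\) itself. A quick check in R:
exp(0.05209) - 1  # roughly 0.053, i.e. about a 5.3% higher wage per additional year of schooling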
Now let’s use our second model, which includes experience in the regression.
lm(lwage ~ education + experience, data = df)
##
## Call:
## lm(formula = lwage ~ education + experience, data = df)
##
## Coefficients:
## (Intercept) education experience
## 4.66603 0.09317 0.04066
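To see how the schooling coefficient changes once experience is controlled for, one possible sketch that re-fits both models and extracts the education coefficients with coef():
m1 <- lm(lwage ~ education, data = df)
m2 <- lm(lwage ~ education + experience, data = df)
c(bivariate = unname(coef(m1)["education"]),
  with_experience = unname(coef(m2)["education"]))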
Create a scatterplot of the relationship between schooling and log
wages. Add the best-fit line based on the first model in
"red", and the best-fit line based on the second model in
"darkred". What does this tell us about the effect of
adding controls in a regression analysis?
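One possible sketch, assuming ggplot2 is available; because the second model has two predictors, its line is drawn by holding experience fixed at its sample mean:
library(ggplot2)

m1 <- lm(lwage ~ education, data = df)
m2 <- lm(lwage ~ education + experience, data = df)

ggplot(df, aes(x = education, y = lwage)) +
  geom_point(alpha = 0.3) +
  # best-fit line from the first (schooling-only) model
  geom_abline(intercept = coef(m1)[1],
              slope = coef(m1)["education"],
              color = "red") +
  # best-fit line from the second model, with experience held at its mean
  geom_abline(intercept = coef(m2)[1] + coef(m2)["experience"] * mean(df$experience),
              slope = coef(m2)["education"],
              color = "darkred") +
  labs(x = "Years of schooling", y = "Log hourly wage")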