Part 1

Scenario: You are a data assistant to a newsroom editor. They want a one-page brief answering:
1) How has U.S. voter turnout changed since 1980?
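2) Which measure, turnout based on the voting-eligible population (VEP) or on the voting-age population (VAP), better matches self-reported turnout in the American National Election Studies (ANES)?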
3) How do presidential and midterm elections compare on VEP turnout?

The R Markdown file for this lab can be found here for Part 1 and here for Part 2. The data for this lab can be found here.

Part A — Loading Data

Load the data into a data frame named turnout and quickly verify its structure. Store the year column as a vector, and store the column names as well.
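A minimal sketch of this step follows. The real file location is not given here, so the code writes a small made-up CSV to a temporary path and reads it back; the column names (year, VEP, VAP, total, ANES, overseas) and all values are assumptions for illustration, not the lab data.

```r
# Made-up stand-in for the lab file; column names and values are hypothetical.
toy <- data.frame(
  year     = c(1980, 1982, 1984, 1986),
  VEP      = c(160, 150, 180, 150),
  VAP      = c(198, 198, 223, 198),
  total    = c(80, 60, 90, 60),
  ANES     = c(70, 55, 72, 52),
  overseas = c(2, 2, 2, 2)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# In the lab, replace `path` with the location of the real data file.
turnout <- read.csv(path)

str(turnout)             # column types and the first few values
dim(turnout)             # number of rows and columns

year <- turnout$year     # vector of election years
vars <- names(turnout)   # character vector of column names
```

Note that names() returns a character vector rather than an R list; that is usually what is meant by "the list of column names".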

Part B — Construct the two turnout measures

Compute VAP-based turnout and VEP-based turnout (percent). Label both with the corresponding year so your later subsetting is self-documenting.

VAP turnout: total votes divided by (VAP + overseas) × 100
VEP turnout: total votes divided by VEP × 100

Name your vectors VAPtr and VEPtr.
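The two formulas above could be computed along these lines. This is a sketch on made-up numbers, and the column names total, VAP, VEP and overseas are assumptions about how the data frame is laid out.

```r
# Hypothetical toy data; the real lab data will have different values.
turnout <- data.frame(
  year     = c(1980, 1982, 1984, 1986),
  VEP      = c(160, 150, 180, 150),
  VAP      = c(198, 198, 223, 198),
  total    = c(80, 60, 90, 60),
  overseas = c(2, 2, 2, 2)
)

# Parentheses matter: the sum is formed first, then the ratio is scaled to percent.
VAPtr <- turnout$total / (turnout$VAP + turnout$overseas) * 100
VEPtr <- turnout$total / turnout$VEP * 100

# Label each element with its election year (stored as character names).
names(VAPtr) <- turnout$year
names(VEPtr) <- turnout$year

VEPtr["1984"]   # subsetting by name reads as "VEP turnout in 1984"
```

Naming the elements means a later subset such as VEPtr["1984"] carries its year with it, which is what makes the subsetting self-documenting.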

Part C — Reliability vs. ANES

Create the gaps between ANES self-reported turnout and each measure: gapVAP and gapVEP. Summarize both and decide which measure aligns better.
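One way to set up the comparison is sketched below on made-up vectors. The direction of the subtraction (ANES minus the turnout measure) is an assumption; the text does not fix the sign, only that both gaps are computed the same way.

```r
# Hypothetical values standing in for the vectors built in Part B.
yrs   <- c(1980, 1982, 1984, 1986)
ANES  <- c(70, 55, 72, 52)
VAPtr <- c(40, 30, 40, 30)
VEPtr <- c(50, 40, 50, 40)
names(ANES) <- names(VAPtr) <- names(VEPtr) <- yrs

# Gap = self-reported turnout minus the measured turnout, in percentage points.
gapVAP <- ANES - VAPtr
gapVEP <- ANES - VEPtr

summary(gapVAP)
summary(gapVEP)

# The measure with the smaller average gap tracks the self-reports more closely.
mean(gapVAP)
mean(gapVEP)
```

Comparing the means is the simplest summary; looking at the range of each gap is a reasonable second check.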

Part D — Presidential vs. Midterms using indices

Define index vectors for presidential (pres) and midterm (mids) rows (years alternate). Subset VEPtr into pVEPtr and mVEPtr. Compute average VEP turnout for each and the difference.
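The index-based subsetting might look like the sketch below. It assumes the first row is a presidential year and that rows strictly alternate after that; the values are made up. If a year is missing from the data, the alternation breaks, so the check on the names is worth keeping.

```r
# Hypothetical VEP turnout values labelled by year.
VEPtr <- c(50, 40, 50, 40, 54, 38)
names(VEPtr) <- seq(1980, 1990, by = 2)

n <- length(VEPtr)
pres <- seq(from = 1, to = n, by = 2)   # rows 1, 3, 5: presidential years
mids <- seq(from = 2, to = n, by = 2)   # rows 2, 4, 6: midterm years

pVEPtr <- VEPtr[pres]
mVEPtr <- VEPtr[mids]

# Sanity check: U.S. presidential election years are divisible by 4.
all(as.numeric(names(pVEPtr)) %% 4 == 0)

mean(pVEPtr)
mean(mVEPtr)
diffVEP <- mean(pVEPtr) - mean(mVEPtr)  # presidential minus midterm, in points
diffVEP
```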

Part 2

Let’s apply some of the syntax we’ve learned this week to some problems. This should be similar to Problem Set 1, but be warned that the data is different.

Your problem set is based on resources produced by Elena Llaudet and Kosuke Imai. It draws on the data and research in the following publication:

Raghabendra Chattopadhyay and Esther Duflo. 2004. “Women as Policy Makers: Evidence from a Randomized Policy Experiment in India.” Econometrica, 72 (5): 1409–43.

Today our data will be completely made up. The scenario is as follows:

I hypothesize that ingesting spiders while asleep increases individuals’ propensity to buy blind boxes. To test this, I survey some number of individuals in my neighborhood, collecting data on:

adult: whether they are an adult. =1 if yes and =0 if no.
spiders: how many spiders they ingested in their sleep the previous week.
blindbox: how many blind boxes they bought this week.

Getting Started

Read the CSV file “lab2data.csv” into an object called df. Display the first few observations of the dataset.

What does each observation in this dataset represent?
Please substantively interpret the first observation in the dataset.
What is the type of each variable in the dataset?
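A sketch of the reading and inspection steps follows. So that it runs on its own, it first writes a tiny made-up file called lab2data.csv into a temporary folder; the five rows are invented and are not the lab data.

```r
# Made-up rows, written out only so the example is self-contained.
toy <- data.frame(
  adult    = c(1, 0, 1, 1, 0),
  spiders  = c(2, 0, 1, 3, 0),
  blindbox = c(4, 1, 2, 5, 0)
)
path <- file.path(tempdir(), "lab2data.csv")
write.csv(toy, path, row.names = FALSE)

# In the lab, point read.csv() at the real lab2data.csv instead.
df <- read.csv(path)

head(df)            # first few observations (up to six rows)
sapply(df, class)   # storage type of each variable
```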

Exploratory Data Analysis

Please provide your code for the following statistics:

How many observations are in the dataset?
What percent of survey respondents were adults?
On average, how many spiders did respondents ingest the previous week?
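These three statistics can be sketched as below, again on an invented data frame rather than the lab file. Because adult is coded 0/1, its mean is the proportion of adults.

```r
# Hypothetical survey responses; values are made up for illustration.
df <- data.frame(
  adult    = c(1, 0, 1, 1, 0),
  spiders  = c(2, 0, 1, 3, 0),
  blindbox = c(4, 1, 2, 5, 0)
)

n_obs       <- nrow(df)               # number of observations
pct_adult   <- mean(df$adult) * 100   # mean of a 0/1 variable, as a percent
avg_spiders <- mean(df$spiders)       # average spiders ingested last week

n_obs
pct_adult
avg_spiders
```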

Methodology

We want to estimate the average causal effect of ingesting spiders on buying blind boxes.

What would be the treatment variable?
What would be the outcome variable?
What would be the treatment group?
What would be the control group?

We also want to estimate the average causal effect of being an adult on the propensity to buy blind boxes:

What would be the treatment variable?
What would be the outcome variable?
What would be the treatment group?
What would be the control group?
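For a binary variable such as adult, one common starting point is a difference in mean outcomes between the two groups. The sketch below uses made-up data and assumes that approach; a difference in means only has a causal reading if the groups are otherwise comparable, which a neighborhood survey does not guarantee.

```r
# Hypothetical survey responses; values are made up for illustration.
df <- data.frame(
  adult    = c(1, 0, 1, 1, 0),
  blindbox = c(4, 1, 2, 5, 0)
)

# Average number of blind boxes bought in each group.
mean_adult    <- mean(df$blindbox[df$adult == 1])
mean_nonadult <- mean(df$blindbox[df$adult == 0])

# Difference in means: adults minus non-adults.
diff_means <- mean_adult - mean_nonadult
diff_means
```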

Lab 2 - Basic Data Management and Manipulation