
Chapter 19 - Instrumental Variables

A drawing of someone pushing over the first of a row of dominoes.

19.1 How Does It Work?

19.1.1 Isolating Variation

If you want to build a statue, one way to go about it is to get a big block of marble and chip away at it until you reveal the desired form underneath. Another way is to take something like concrete or molten steel, and pour it into a mold that only allows the desired form to show through. Research designs based on closing back doors and removing undesirable variation are sort of like the former. Instrumental variables designs are like the latter.

Instrumental variables are more like using a mold because instead of trying to strip away all the undesirable variation using controls, it finds a source of variation that allows you to isolate just the front-door path you are interested in. This is similar to how a randomized controlled experiment works. In a randomized experiment, you have control over who gets treatment and who doesn’t. You can choose to assign that treatment in a random way. When you do so, the variation in treatment among participants in your experiment has no back doors. By using randomized assignment to treatment instead of real-world treatment, you isolate only the covariation between treatment and outcome that’s due to the treatment, and completely purge the variation that’s due to those nasty back doors.

Instrumental variables designs seize directly on the concept of randomized controlled experiments, and are in effect an attempt to mimic a randomized experiment, but using statistics and opportune settings instead of actually being able to influence or randomize anything.524 You will recognize a lot of the logic here from Chapter 9 on Finding Front Doors and isolating front-door paths.

In a typical setting, if we have a \(Treatment\) and an \(Outcome\), we want to know the effect of treatment on outcome, \(Treatment \rightarrow Outcome\). However, there is, to say the least, an annoying back door path. There could be a lot going on here but let’s just say we also have \(Treatment \leftarrow Annoyance \rightarrow Outcome\). Further, because in the social sciences \(Annoyance\) represents a bunch of different things, it’s unlikely we can control for it all. A randomized experiment shakes up this system by adding a new source of variation, \(Randomization\), that’s completely unrelated to \(Annoyance\).525 In all of this, we can refer back to the lingo from Chapter 13: \(Treatment\) is endogenous due to how it is determined by \(Annoyance\). But \(Randomization\) is exogenous, or we might say it’s “an exogenous source of variation,” because it’s not caused by anything else in the system. This is all shown in Figure 19.1.

Figure 19.1: Randomization Helps Avoid Back Doors

A causal diagram in which Annoyance causes both Treatment and Outcome, and Randomization causes Treatment causes Outcome

The instrumental variables design works in the exact same way. The only difference is that, instead of randomizing the variable ourselves, we hope that something has already randomized it for us. We look in the real world for a source of randomization that has no back doors,526 Or at least has back doors that are easily closed by controlling. and use that to mimic a randomized controlled experiment.

An instrumental variables design does not remove the requirement to identify an effect by closing back doors. But it does move the requirement, hopefully to something easier! Instead of needing to close the back doors between \(Treatment\) and \(Outcome\), which would require us to control for \(Annoyance\), we instead just need to close the back doors between \(Randomization\) and \(Outcome\) (if there are any), as well as any front door between \(Randomization\) and \(Outcome\) that doesn’t pass through \(Treatment\).

The mechanics for actually implementing instrumental variables are, in effect, a means of trying to do what an experiment does when you don't actually have perfect control of the situation - which makes sense, since mimicking an experiment is exactly what we're trying to do.

How can we mimic a randomized experiment? Well, in a randomized experiment, we generate some random variation and then just isolate the part of the treatment that is driven by that random variation - we only use the data from the experiment. So in instrumental variables we’re going to isolate just the part of the treatment that is driven by the instrument, but statistically.

In other words, we’re going to:

  1. Use the instrument to explain the treatment
  2. Remove any part of the treatment that is not explained by the instrument
  3. Use the instrument to explain the outcome
  4. Remove any part of the outcome that is not explained by the instrument527 Steps 3 and 4 here turn out to not actually make a difference for your estimate in the typical case - you end up getting the explained part of the outcome anyway if you use instrument-explained treatment to explain regular-ol’ outcome. But I keep it in because I think it makes the concept clearer.
  5. Look at the relationship between the remaining, instrument-explained part of the outcome and the remaining, instrument-explained part of the treatment

This is, in effect, the opposite of controlling for a variable. When we get the effect of \(X\) on \(Y\) while controlling for \(W\), we use \(W\) to explain \(X\) and \(Y\) and remove the explained parts, since we want to close any path that goes through \(W\). But when we get the effect of \(X\) on \(Y\) using \(Z\) as an instrument, we use \(Z\) to explain \(X\) and \(Y\) and remove the unexplained parts, to give us only the paths that come from \(Z\). The part that’s explained by \(Z\) is the part with no back doors, which is what we want.

The whole story can be seen in Figure 19.2, which demonstrates what an instrumental variables design actually does to data to get its estimate. The figure uses a binary instrument - this isn’t necessary for instrumental variables generally, but the figure is much easier to follow this way. We start with Figure 19.2 (a), graphing the raw data, which shows a slight positive relationship between \(X\) and \(Y\). We can also see the different values of \(Z\) on different parts of the graph - the 0s to the bottom-right and the 1s to the top-left.

We can move on to Figure 19.2 (b), which shows us taking the mean of \(X\) within the different \(Z\) values. Instrumental variables uses the part of \(X\) that is explained by \(Z\). When \(Z\) is binary that’s just the mean of \(X\) for each group. We only want to use the part of \(X\) that’s explained by \(Z\) - that’s the part without any back doors, so in Figure 19.2 (c) we get rid of all the variation in \(X\) that isn’t explained by \(Z\), leaving us with only those predicted (mean-within-\(Z\)) \(X\) values.

Then in Figure 19.2 Panels (d) and (e) we repeat the process with \(Y\). We do this, again, to close the back doors between \(X\) and \(Y\). That nice \(Z\)-driven variation is free of back doors, so we want to isolate just that part.

Finally, in Figure 19.2 (f), we look at the relationship between the predicted part of \(X\) and the predicted part of \(Y\). Since \(Z\) is binary, each observation in the data gets only one of two predictions - the \(Z=0\) prediction for \(X\) and \(Y\), or the \(Z=1\) prediction for \(X\) and \(Y\). Only two predictions means only two dots left on the graph, and we can get the relationship between \(X\) and \(Y\) by just drawing a line between those two points. The slope of that line is the effect of \(X\) on \(Y\), as estimated using \(Z\) as an instrumental variable, i.e., isolating only the part of the \(X\)/\(Y\) relationship that is predicted by \(Z\). The slope we have here is negative, unlike the positive relationship from (a). And since I generated this data myself I can tell you that the true relationship between \(X\) and \(Y\) in this data is in fact negative, just like the slope in (f).

Six graphs showing how instrumental variables works. In the first, there is a raw-data scatterplot colored for the two different binary instrument values. In the second, the means of the treatment for each value of the IV are shown. In the third, all variation in the treatment other than the predicted means is removed. In the fourth, means of the outcome for each value of the IV are shown. In the fifth, all variation in the outcome other than the predicted means is removed, leaving only two points, the mean of treatment and outcome for each of the two IV values. In the sixth, the slope of the line between the two points is the IV effect.

Figure 19.2: Instrumental Variables with a Binary Instrument, Step by Step
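To make the steps in Figure 19.2 concrete, here's a minimal sketch in R on simulated data (the variable names and the true effect of -1 are invented for illustration, not taken from any real study). It isolates the \(Z\)-explained parts of \(X\) and \(Y\) by taking means within each value of a binary instrument, takes the slope between the two remaining points, and checks that this matches what a standard IV command like ivreg() from the AER package reports.

R Code

# A minimal simulated version of Figure 19.2's steps.
# All numbers here are invented; the true effect of X on Y is -1.
library(tidyverse); library(AER)

set.seed(1000)
n <- 1000
W <- rnorm(n)                  # the back door ("Annoyance")
Z <- rbinom(n, 1, .5)          # a binary instrument with no back doors
X <- 2*Z + 3*W + rnorm(n)
Y <- -1*X + 5*W + rnorm(n)
d <- tibble(X, Y, Z)

# Steps 1-4: with a binary Z, the Z-explained parts of X and Y
# are just the means of X and Y within each value of Z
sumd <- d %>%
    group_by(Z) %>%
    summarize(mean_X = mean(X), mean_Y = mean(Y))

# Step 5: the slope between the two remaining points
diff(sumd$mean_Y) / diff(sumd$mean_X)

# The same answer from a standard IV command
ivreg(Y ~ X | Z, data = d)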

19.1.2 Assumptions for Instrumental Variables

For instrumental variables to work, we must satisfy two assumptions: relevance of the instrument, and validity of the instrument.528 Rather, we must satisfy at least two assumptions. But these are the two with real star power that everyone talks about. Regular OLS assumptions also apply, as well as monotonicity - the assumption that the relationship between the IV and the endogenous variable is always of the same sign (or zero). We’ll talk more about monotonicity in the How Is It Performed? section.

Relevance is fairly straightforward. The idea of instrumental variables is that we use the part of \(X\), the treatment, that is explained by \(Z\), the instrument. But what if no part of \(X\) is explained by \(Z\)? What if they’re completely unrelated? In that case, instrumental variables doesn’t work.

This follows pretty intuitively - we can't isolate the part of \(X\) explained by \(Z\) if there isn't a \(Z\)-explained part of \(X\) to isolate. Also, if we follow the steps described earlier in the chapter,529 And are using a standard linear one-treatment one-instrument version of IV. then instrumental variables in effect asks “for each \(Z\)-explained movement in \(X\), how much \(Z\)-explained movement in \(Y\) was there?” In other words, in its basic form instrumental variables gives us \(Cov(Z,Y)/Cov(Z,X)\).530 Referring back to Figure 19.2 above, this is the slope you get after isolating the \(Z\)-explained parts of both variables. After all, rise (how much \(Y\) increases for an increase in \(Z\)) over run (how much \(X\) increases for a given increase in \(Z\)) is another way of thinking about a slope. If \(Z\) doesn't explain \(X\), then \(Cov(Z,X) = 0\) and we get \(Cov(Z,Y)/0\). And we can't divide by 0. The estimation simply wouldn't work.

Now in real-world settings, hardly any correlation is truly zero. But even if \(Cov(Z,X)\) is just small rather than zero, we still have problems. If the covariance \(Cov(Z,X)\) is small, we’d call \(Z\) a weak instrument for \(X\). The estimate of \(Cov(Z,X)\) is likely to jump around a bit from sample to sample - that’s the nature of sampling variation. If \(Cov(Z,X)\) is, for example, on average 1, and it varies a bit from sample to sample, we might get values from .95 to 1.05, changing our \(Cov(Z,Y)/Cov(Z,X)\) instrumental variables estimate by maybe 10% across samples. But if \(Cov(Z,X)\) follows the same range for a much lower baseline, maybe from .01 to .11, that changes our \(Cov(Z,Y)/Cov(Z,X)\) estimate by as much as 1100% across samples. Big, swingy estimates! Big and swingy may be positive qualities in some areas of life but not in statistics.
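Here's a rough simulation sketch of that swinginess (every number is invented): draw many samples with a strong instrument and with a weak one, compute \(Cov(Z,Y)/Cov(Z,X)\) in each, and compare how much the estimates bounce around from sample to sample.

R Code

# How much do IV estimates swing when the instrument is weak?
# Everything here is invented; the true effect of X on Y is 1.
set.seed(2000)

one_iv_estimate <- function(strength, n = 500) {
    W <- rnorm(n)                   # the back door
    Z <- rnorm(n)                   # the instrument
    X <- strength*Z + W + rnorm(n)
    Y <- X + W + rnorm(n)
    cov(Z, Y) / cov(Z, X)           # the basic IV ratio
}

# 500 samples each with a strong and a weak first stage
strong <- replicate(500, one_iv_estimate(strength = 1))
weak <- replicate(500, one_iv_estimate(strength = .05))

# The weak-instrument estimates are spread far more widely
sd(strong)
sd(weak)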

So we need to be sure that \(Z\) actually relates to \(X\). Thankfully, this is one of the few assumptions in this half of the book we can really confidently check. Simply look at the relationship between \(X\) and \(Z\) and see how strong it is. The stronger it is, the more confident you can be in the relevance assumption, and the less the estimate will jump around from sample to sample.531 How strong does it need to be? Good question. We’ll talk about it in the next section.

Perhaps the more fraught of the two assumptions is the assumption of validity. Validity is, in effect, the assumption that the instrument \(Z\) is a variable that has no open back doors of its own.532 Validity is sometimes also called an “exclusion restriction,” because it is an assumption that \(Z\) can reasonably be excluded from the model of \(Y\) after the \(Z\rightarrow X\) path is included.

Perhaps a bit more precisely, any paths between the instrument \(Z\) and the outcome \(Y\) must either pass through the treatment \(X\) or be closed. Remember, instrumental variables doesn’t relieve us of the duty of closing all the back doors that are there, it just moves that responsibility from the treatment to the instrument, and hopefully the instrument is easier to close back doors for.

Take Figure 19.3 for example. If we want to use \(Z\) as an instrumental variable to identify the effect of \(X\) on \(Y\), what needs to be true for the validity assumption to hold?

Figure 19.3: A Diagram Amenable to Instrumental Variables Estimation… If We Pick Controls Carefully!

A causal diagram in which Z causes X causes Y. Also, Z causes A causes Y. B causes Z and Y, C causes Z and X, and D causes X and Y.

There are a few paths from \(Z\) to \(Y\) we need to think about:

  • \(Z \rightarrow X \rightarrow Y\)
  • \(Z \rightarrow X \leftarrow D \rightarrow Y\)
  • \(Z \leftarrow C \rightarrow X \rightarrow Y\)
  • \(Z \leftarrow C \rightarrow X \leftarrow D \rightarrow Y\)
  • \(Z \leftarrow B \rightarrow Y\)
  • \(Z \rightarrow A \rightarrow Y\)

We want all open paths from \(Z\) to \(Y\) to contain \(X\). The first four paths in the list contain \(X\), so those are good to go. It’s not even a problem that we have \(Z \leftarrow C \rightarrow X\). If that path is left open, we can’t identify \(Z \rightarrow X\). But who cares? We don’t have to identify \(Z \rightarrow X\), we just need there to be no way to get from \(Z\) to \(Y\) except through \(X\).533 We probably don’t want to control for \(C\) (although doing so wouldn’t make \(Z\) invalid). The presence of \(C\) probably strengthens the predictive effect of \(Z\) on \(X\), depending on whether those arrows are positive or negative. So controlling for \(C\) might make \(Z\) less relevant. That’s a shocker - our identification of \(X\rightarrow Y\) is actually helped in this instance by failing to identify \(Z\rightarrow X\). \(^,\)534 \(C\) would also be a great instrument in its own right on this diagram.

How about \(D\)? That’s the annoying source of endogeneity that we presumably couldn’t control for that made \(X \rightarrow Y\) unidentified and required us to use IV in the first place. But even though we have \(Z \rightarrow X \leftarrow D \rightarrow Y\), that’s fine. Once we isolate the variation in \(X\) driven by \(Z\), any other arrow leading to \(X\) basically doesn’t matter anymore. Those arrows might go to \(X\), but they don’t go to \(Z\), and we’re only using variation that starts with \(Z\).

Putting the paths with \(X\) in them to the side, that leaves us with two: \(Z \leftarrow B \rightarrow Y\) and \(Z \rightarrow A \rightarrow Y\). These two paths are problems and must be shut down for \(Z\) to be valid. \(Z \leftarrow B \rightarrow Y\) is an obvious one. That looks just like any normal back door we’d be concerned about.

But why is \(Z \rightarrow A \rightarrow Y\) a problem, too? Because it gives \(Z\) another way to be related to \(Y\) other than through \(X\). When we isolate the variation in \(X\) driven by \(Z\), that variation will also be closely related to \(A\).535 In fact, as long as we’re talking about linear models, the variation in \(X\) driven by \(Z\) will be perfectly correlated with the variation in \(A\) driven by \(Z\). So when we then look at the relationship between the variation in \(X\) driven by \(Z\) and \(Y\), we’ll mix together the effect of \(X\) and the effect of \(A\).

So for the validity assumption to hold for \(Z\), we need to close both the \(Z \leftarrow B \rightarrow Y\) path and the \(Z \rightarrow A \rightarrow Y\) path. If we can control for both \(A\) and \(B\), then we’re fine.536 Because we need to control for things to get validity to hold in this instance, we’d say that validity holds conditional on \(A\) and \(B\).
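As a sanity check on this logic, here's a quick simulation sketch of the diagram in Figure 19.3 (every coefficient is invented; the true effect of \(X\) on \(Y\) is set to 2). \(D\) stays uncontrolled, just like in the story, but \(A\) and \(B\) go in as controls so that validity holds conditional on them - and the IV estimate lands close to the true effect.

R Code

# Simulate the diagram in Figure 19.3. All numbers are invented;
# the true effect of X on Y is 2.
library(AER)
set.seed(3000)

n <- 10000
B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
Z <- B + C + rnorm(n)           # B -> Z and C -> Z
A <- Z + rnorm(n)               # Z -> A
X <- Z + C + D + rnorm(n)       # Z -> X, C -> X, D -> X
Y <- 2*X + A + B + D + rnorm(n) # X, A, B, and D all -> Y

# Control for A and B (validity holds conditional on them);
# leave D alone - isolating the Z-driven variation handles it
m <- ivreg(Y ~ X + A + B | Z + A + B)
coef(m)['X']    # should be close to 2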

We need relevance and validity. How realistic is validity, anyway? We ideally want our instrument to behave just like randomization in an experiment. But in the real world, how likely is that to actually happen? Or, if it’s an IV that requires control variables to be valid, how confident can we be that the controls really do everything we need them to?

In the long-ago times, researchers were happy to use instruments without thinking too hard about validity. If you go back to the 1970s or 1980s you can find people using things like parental education as an instrument for your own education (surely your parents’ education can’t possibly affect your outcomes except through your own education!). It was the Wild West out there…

But these days, go to any seminar where an instrumental variables paper is presented and you’ll hear no end of worries and arguments about whether the instrument is valid. And as time goes on, it seems like people have gotten more and more difficult to convince when it comes to validity.537 This focus on validity is good, but sometimes comes at the expense of thinking about other IV considerations, like monotonicity (we’ll get there) or even basic stuff like how good the data is.

There’s good reason to be concerned. Not only is it hard to justify that there exists a variable strongly related to treatment that somehow isn’t at all related to all the sources of hard-to-control-for back doors that the treatment had in the first place, we also have plenty of history of instruments that we thought sounded pretty good that turned out not to work so well.

To pick two examples of many, we can first look at Acemoglu and Johnson (2007Acemoglu, Daron, and Simon Johnson. 2007. “Disease and Development: The Effect of Life Expectancy on Economic Growth.” Journal of Political Economy 115 (6): 925–85.). In this study, they want to understand the effect that a population’s health has on economic growth in that country. We’d expect that a healthier population leads to better economic growth, right? They use the timing of when new medical technology (like vaccines for certain diseases) is introduced as an instrument for a country’s health (specifically, mortality from certain diseases). They find that changes in health in a given year driven by changes in medical technology that year actually have a negative effect on a country’s economic growth!

Bloom, Canning, and Fink (2014Bloom, David E, David Canning, and Günther Fink. 2014. “Disease and Development Revisited.” Journal of Political Economy 122 (6): 1355–66.) have a different view on the instrument that Acemoglu and Johnson used. Bloom, Canning, and Fink point out that the original study didn’t account for the possibility that changes in health may have long-term effects on growth. Why might this matter? Because the healthier you already are, the less a new medical technology can improve your mortality rates even further. So the effectiveness of that medical technology is related to pre-existing health, which is related to pre-existing economic factors, which is related to current economic growth. A back door emerges. Bloom, Canning, and Fink find that when they add a control for pre-existing factors, presumably fixing the validity issue, the negative effect that Acemoglu and Johnson found actually becomes positive.538 On a personal level I basically never believe the validity assumption when the data is on the level of a whole country - closing all the necessary back doors seems impossible for something so big and interconnected! “Another macro IV paper?” I chuckle and roll my eyes, tutting at the world. The fools. Then they get thousands of cites and probably a Nobel or something while I sit alone at 3AM in the dark wondering if anyone would really notice if I submitted to a predatory pay-to-publish journal.

A particularly well-known example of a popular IV falling into disrepute is rainfall, which was a commonly-used instrument in studies of modern agricultural societies.539 These days, saying you’re going to use rainfall as an instrument is a sort of almost-funny in-joke among a certain set of causal inference nerds. The joke is completely incomprehensible to all normal people, just like the things causal inference nerds say that aren’t jokes. Rainfall levels have been used as an instrument for all sorts of things, like personal income or economic activity.540 In general, the same instrument being used as an instrument for multiple different treatments should be a cause for suspicion. After all, this implies that your instrument causes multiple things that also cause your outcome. So to get rid of all paths that don’t go through your treatment, you’re going to have to control for all the other treatments your instrument affects. This is often not possible. Less rain (or too much rain) can harm agricultural output. If everyone’s a farmer, well… that’s no good! Since farmers don’t have any control over rainfall, it seems reasonable to treat it as exogenous and without any back doors.

To follow one example of a study that uses rainfall as an instrument, Miguel, Satyanath, and Sergenti (2004Miguel, Edward, Shanker Satyanath, and Ernest Sergenti. 2004. “Economic Shocks and Civil Conflict: An Instrumental Variables Approach.” Journal of Political Economy 112 (4): 725–53.) use rainfall as an instrument for economic growth in different countries in Africa, and then look at how economic growth affects civil conflict. They find a pretty big effect: a negative change in income growth of five percentage points raises the chances of civil war by 50% in the following year.

But is rainfall a valid instrument for income? Sarsons (2015Sarsons, Heather. 2015. “Rainfall and Conflict: A Cautionary Tale.” Journal of Development Economics 115: 62–72.) looks at data from India and finds that the relationship between rainfall and conflict is similar in areas irrigated by dam water (where rainfall doesn’t affect income much) and in areas without dams (where rainfall matters for income a lot). This implies that rainfall must be affecting conflict via some mechanism other than income.541 Other authors have other reasons for doubting the validity of rainfall, this is only one of them. Sarsons lists a few other studies in her paper if you’re interested. Some of her potential explanations include rainfall spurring migration, rainfall affecting dam-fed areas by spillovers from nearby non-dam-fed areas, or rainfall making it difficult to riot or protest.

If validity is so difficult, why bother? You can’t use data to prove that your instrument is valid, and convincing others it’s valid on the basis that you think your causal diagram is right is difficult. That said, that doesn’t mean we need to give up on instrumental variables entirely. We just may need to be choosy in when we apply it.

First, there really are some situations where the instrument is as-good-as-random. You just have to pay attention and get lucky to find one! Some force happens to act to assign treatment that truly comes from outside the system or is applied almost completely randomly. A really good instrument usually takes one of two forms. Either it represents real randomization, like an actual randomized experiment, or something like “Mendelian randomization,” which uses the random process of combining parental genes to produce a child as an instrument.542 Even with something really random like Mendelian randomization, validity is not assured. If a given gene affects your treatment of interest but also some other variable that affects the outcome (perhaps the same gene affects height and weight, both of which affect athleticism), that’s a back door! Alternately (and more commonly in the social sciences), a good instrument is probably one that you would never think to include in a model of the outcome variable, and in fact you may be surprised to find that it ever had anything to do with assigning treatment.

To pick one example of this kind of thing, I’ll toot my own horn and mention the only study I’ve ever worked on using a validity assumption (Goldhaber, Grout, and Huntington-Klein 2017Goldhaber, Dan, Cyrus Grout, and Nick Huntington-Klein. 2017. “Screen Twice, Cut Once: Assessing the Predictive Validity of Applicant Selection Tools.” Education Finance and Policy 12 (2): 197–223.).543 We ran a “selection model” in this paper rather than using instrumental variables, but this method similarly required a validity assumption. The paper was about the teacher hiring process in a school district that used a scoring system to evaluate applicants before hiring them. When we looked at the data, we realized that the evaluation scores on the different criteria were added up to produce an overall score by hand. Sometimes this adding-up was done incorrectly. Addition errors were a cause of making it to the next stage of the hiring process, and sound pretty random to me.544 Of course, maybe we missed something. Maybe evaluators were more likely to make errors if they didn’t like someone for some reason. It’s super important to think about what drives your instrument, just as you would think about what drives your treatment in a regular non-IV setting.

Second, there are some forms of IV analysis that allow for a little bit of failure of the validity assumption. So as long as you can make the validity assumption close to being true, you can get something out of it. We’ll talk about that in the How the Pros Do It section.

Third, there are certain settings where randomness is really more believable. For one, you can apply instrumental variables to actual randomized experiments, and in fact this is commonly done when not everybody does what you’ve randomized them to do (“imperfect compliance”).545 If you recall, we already talked about this back in Chapter 10 on treatment effects.

Did you randomize some people to take a new medication and others not to, but some people skipped their medication doses anyway? If you just analyze the experiment as normal, you’ll underestimate the effect of the medication, since you’ll have people who never actually took it mixed in there. This would be the “intent-to-treat” estimate, the effect of assigning people to take medication, rather than the effect of the medication itself. But if you use the random assignment as an instrument for taking the medication (treatment), you get the effect of treatment. And that random assignment sounds pretty darn valid. After all, it really was random. No alternate paths!
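Here's a minimal sketch of that logic on simulated data (the compliance rate and the effect size of 10 are invented). The intent-to-treat regression of the outcome on random assignment understates the effect of actually taking the medication, while using assignment as an instrument for taking it recovers the effect among the people whose behavior the assignment moved.

R Code

# Random assignment with imperfect compliance. All numbers invented;
# the true effect of taking the medication is 10.
library(AER)
set.seed(4000)

n <- 5000
assigned <- rbinom(n, 1, .5)        # randomized to medication or not
complier <- rbinom(n, 1, .7)        # 70% would follow their assignment
took_med <- ifelse(complier == 1, assigned, 0)
outcome <- 10*took_med + rnorm(n)

# Intent-to-treat: effect of being assigned (roughly 10 times .7)
lm(outcome ~ assigned)

# IV: assignment as an instrument for actually taking the medication
ivreg(outcome ~ took_med | assigned)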

Another place where IV pops up with a really believable validity assumption is in application to regression discontinuity designs. We’ll get to those in Chapter 20. But for now, let’s get a little deeper into how we can actually use IV ourselves, if we do happen to believe that validity assumption.

19.1.3 Canonical Designs

The use of instrumental variables means you can do causal inference using all sorts of clever designs, and in all sorts of places that seem hopelessly chock-full of back doors. “Clever” is probably the most common adjective you’ll hear for a good instrument.546 Although whether that word implies excitement about the design or an indictment of the whole idea of instruments depends on who is talking.

One downside of super-clever instruments, though, is that they’re often highly context-dependent. But some instrumental variables designs seem to be pretty plausible, apply in lots of different situations, and get used over and over again.547 These are generally pairs of instruments and treatments that get reused. If you had one instrument used over and over again for different treatments, that would imply the instrument wasn’t very good, since for any one treatment you’re likely to have some open front doors from instrument to outcome that don’t go through that treatment - they go through the other treatments instead. Some of these I’ve already discussed - charter school lotteries from Chapter 9 are an example. And of course the use of instruments in the case of random experiments with imperfect compliance. Another is - uh-oh - rainfall as an instrument for agricultural productivity. But others have weathered years of criticism a bit better than rainfall did.548 You may notice that most of these surviving designs have some element of explicit randomization to them; take that as you will.

One good example of a canonical instrumental variables design is judge assignment (Aizer and Doyle Jr 2015Aizer, Anna, and Joseph J Doyle Jr. 2015. “Juvenile Incarceration, Human Capital, and Future Crime: Evidence from Randomly Assigned Judges.” The Quarterly Journal of Economics 130 (2): 759–803.). In many court systems, when you are about to go on trial, the process of assigning a judge to your trial is more or less random. This is important because some judges are harsher, and others are less harsh. This means that \(JudgeHarshness\) can act as an instrument for \(Punishment\). Simply estimate the harshness of each judge using their prior rulings. Then, the harshness of your judge is an instrument for your punishment. This can be used anywhere you have randomly-assigned judges and want to know the impact of harsher punishments (or even being judged guilty) on some later outcome.

Another is the use of the military draft as an instrument for military service (Angrist 1990Angrist, Joshua D. 1990. “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records.” The American Economic Review, 313–36.). Military drafts are usually done semi-randomly, with the order in which people are drafted assigned in a random way. Whether you are drafted or not is a combination of where you are in the order and how many people they want to draft. So where you fall in the random draft order is a source of randomization for military service. It can also be a source of randomization for other things, like going to college, that may let you avoid the draft.549 Uh-oh. An instrument for two treatments! This one is a bit less of a concern than rainfall though, since these activities are sort of alternatives to each other. An instrument for any treatment would also be an instrument for not-getting-treatment. You could instead think of draft order as an instrument for “what you do after high school.”

Perhaps the most commonly used instrumental variables design ever is compulsory schooling as an instrument for years of education (Angrist and Krueger 1991Angrist, Joshua D., and Alan B. Krueger. 1991. “Does Compulsory School Attendance Affect Schooling and Earnings?” The Quarterly Journal of Economics 106 (4): 979–1014.).550 Boy that Angrist guy gets around, huh? There are a zillion things that go into your decision of how long to continue in education and when to stop. But in many places, there is a compulsory schooling age - you can’t legally drop out until a certain age. So if you’d like to drop out at fifteen, but the law keeps you in until sixteen, that’s a little nudge of extra education for you. Variation between regions in these laws over time can act as an instrument for how much education you get.

Lastly, there’s the “Bartik Shift-Share IV” (Bartik 1991Bartik, Timothy J. 1991. Who Benefits from State and Local Economic Development Policies? WE Upjohn Institute for Employment Research.). In this design, shared economy-wide trends are combined with the distribution of industries across different regions as an instrument for economic activity. In other words, if you live in Ice Cream Town, which makes a lot of ice cream, and the next town over is Popcorn Town, which makes a lot of popcorn, and over the past ten years ice cream has gotten way more popular on a national level, that’s going to be a fairly-random boon for your town. Easy! Recent work looks more closely at this design and finds a few changes in how we interpret it (Goldsmith-Pinkham, Sorkin, and Swift 2020Goldsmith-Pinkham, Paul, Isaac Sorkin, and Henry Swift. 2020. “Bartik Instruments: What, When, Why, and How.” American Economic Review 110 (8): 2586–2624.), but the design lives on.

There are plenty of other canonical designs, of course. Settling locations of an original wave of immigrants as an instrument for further immigration, the birth of twins as an instrument for the number of children, housing supply elasticity as an instrument for housing prices, natural disasters as instruments for all sorts of things… I could go on. In the course of the chapter, I’ll talk about three other canonical designs: the direction of the wind as an instrument for pollution, whether your first two children are the same sex as an instrument for having a third, and “Mendelian randomization,” which uses someone’s genetic code as an instrument for all sorts of things.

19.2 How Is It Performed?

19.2.1 Instrumental Variables Estimators

We can start with our approach to estimating instrumental variables by taking it pretty literally. Instrumental variables as a research design is all about isolating the variation in the treatment that is explained by the instrument. So let’s just, uh, do that.

Two-stage least squares, or 2SLS,551 Or TSLS, but that one just looks wrong if you ask me. is a method that uses two regressions to estimate an instrumental variables model. The “first stage” uses the instrument (and other controls) to predict the treatment variable. Then, you take the predicted (explained) values of the treatment variable from that first stage, and use that to predict the outcome in the second stage (along with the controls again).

Given our instrument \(Z\), treatment \(X\), outcome \(Y\), and controls \(W\), we would estimate the models:

\[\begin{equation} \tag{19.1} X = \gamma_0 + \gamma_1Z+ \gamma_2W+ \nu \end{equation}\]

\[\begin{equation} \tag{19.2} Y = \beta_0 + \beta_1\hat{X} + \beta_2W+ \varepsilon \end{equation}\]

where \(\nu\) and \(\varepsilon\) are both error terms, \(\hat{X}\) are the predicted values of \(X\), predicted using an OLS estimation of the first equation, and \(\gamma\) are regression coefficients just like \(\beta\), only given a different Greek letter to avoid confusing them with the \(\beta\)s.552 If the goal is to close the back doors associated with the parts of \(X\) not explained by \(Z\), why don’t we take the residuals instead of the predicted values, and then control for them in the second stage, alongside regular ol’ \(X\)? Well… you could! This is called the control function approach. In standard linear IV this produces basically the same results as 2SLS. But it has some important applications for nonlinear IV, which I’ll get to later in the chapter.

The procedure is quite easy to do by hand (although I wouldn’t recommend it, for reasons I’ll get to in a moment). Simply run OLS of \(X\) on \(Z\) (lm in R, regress in Stata, sm.OLS().fit() in Python with statsmodels). Then, predict \(X\) using the results of that regression (predict() in R, predict in Stata, sm.OLS().fit().predict() in Python). Finally, do a regression of \(Y\) on the predicted values. Don’t forget to include any controls in both the first and second stages.
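Here's what that by-hand procedure looks like in R, as a sketch on simulated data (the variable names and the true effect of 3 are invented). The coefficient on the predicted treatment matches what a dedicated IV command reports; the standard errors do not.

R Code

# Two-stage least squares by hand. All numbers are invented;
# the true effect of X on Y is 3.
library(AER)
set.seed(5000)

n <- 1000
W <- rnorm(n)                       # a control
Z <- rnorm(n)                       # the instrument
X <- Z + W + rnorm(n)               # the treatment
Y <- 3*X + W + rnorm(n)
d <- data.frame(Y, X, Z, W)

# First stage: use the instrument (and controls) to explain the treatment
first_stage <- lm(X ~ Z + W, data = d)
d$X_hat <- predict(first_stage)

# Second stage: the outcome on predicted treatment (and controls again)
second_stage <- lm(Y ~ X_hat + W, data = d)
coef(second_stage)['X_hat']

# Same coefficient, but with properly adjusted standard errors
summary(ivreg(Y ~ X + W | Z + W, data = d))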

We’re not entirely done yet - if we simply do this procedure as I’ve described it, our standard errors will be wrong. So there’s a standard error adjustment to be done, changing them to account for the fact that we’ve estimated \(\hat{X}\) rather than measuring it, and therefore there’s more uncertainty in those values than OLS will pick up on.553 For this reason, you generally will want to use a software command specifically designed for IV to run IV.

But with that under our belt, we have created our own instrumental variables estimate!

What is it doing, precisely, anyway? 2SLS produces a ratio of effects, dividing the effect of \(Z\) on \(Y\) by the effect of \(Z\) on \(X\). It asks “for each movement in \(X\) we get by changing \(Z\), how much movement in \(Y\) does that lead to?” The answer to this question, since \(Z\) has no back doors, should give us the causal effect of \(X\) on \(Y\).

2SLS has some nice features - it’s easy to estimate, it’s flexible (adding more instruments to the first stage is super easy, although adding more treatment variables is less easy), and since it really just uses OLS it’s easy to understand. These are the reasons why 2SLS is by far the most common way of implementing instrumental variables.

2SLS has some downsides too. It doesn’t perform that well in small samples, for one. While the instrument in theory has no back doors, in an actual data set the relationship between \(Z\) and the non-\(X\) parts of \(Y\) is going to be at least a little nonzero, just by random chance. The smaller the sample is, the more often this “nonzero by random chance” is going to be not just nonzero but fairly large, driving \(Z\) to not be quite valid in a given sample and giving you bias. Additionally, 2SLS doesn’t perform particularly well when the errors are heteroskedastic.554 Heteroskedasticity-robust standard errors only do an okay job fixing this.

While two-stage least squares is the most literal way of thinking about instrumental variables, it is only one estimator of many. And there’s a good case to be made that it isn’t even that great a pick among the different estimators, despite its popularity.

Here, I’ll talk a bit about the generalized method of moments (GMM) approach to estimating instrumental variables. Some other methods, each with their own strengths, will pop up throughout the chapter.

GMM is an approach to estimation that’s much broader than instrumental variables, but in this chapter at least we’re just using it for IV.555 Many good textbooks out there go into more detail. A lot of them are titled, surprisingly enough, “Generalized Method of Moments” with a few extra words tacked on. The basic idea is this: based on your assumptions and theory, construct some statistical moments (means, variances, covariances, etc.) that should have certain values.

For example, if we want to use GMM to estimate the expected value (mean, basically) of a variable \(Y\), we’d say that the difference between our sample estimation of the expected value \(\mu\) and the population expected value \(E(Y)\) should be zero. So we’d make \(\mu - E(Y) = 0\) a condition of our estimation. Replace \(E(Y)\) with its sample value \(\frac{1}{N}\sum Y\) and solve the equation to get \(\hat{\mu} = \frac{1}{N}\sum Y\). GMM will pick the \(\mu\) that makes the moment condition true on average. Or for OLS, we assume that \(X\) is unrelated to the error term \(\varepsilon\). So we use the condition that \(Cov(X,\varepsilon) = 0\). In the actual data that works out to \(\sum Xr = 0\), where \(r\) is the residual. Do a little math and you end up back at the same estimate for \(\beta_1\) that we already had.556 How? Let’s simplify by assuming everything is mean-zero so we don’t need an intercept. Start with \(\sum Xr = 0\). Then plug in \(r = Y - \beta_1X\) to get \(\sum X(Y-\beta_1X) = \sum(XY - \beta_1X^2) = 0\). Solve for \(\beta_1\) to get \(\beta_1 = \sum(XY)/\sum(X^2)\), which is what we had before. \(^,\)557 In the case of the mean and OLS, the solution it comes to is the same as we’ve already gotten - take the mean of the sample data. But the way we got there is different.

You might be able to guess how GMM works with IV from the OLS example. We assume that \(Z\) is unrelated to the second-stage error term \(\varepsilon\). So we use the condition \(Cov(Z,\varepsilon) = 0\). Substitute in the sample-data version of the covariance and do a little math to end up with \(\beta_1 = \sum(ZY)/\sum(ZX)\), which is what we had before.
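As a quick numeric check of that formula, here's a sketch with invented data and no controls, with everything demeaned so we can skip the intercept. The moment-condition solution \(\sum(ZY)/\sum(ZX)\) lines up with what a standard IV command reports.

R Code

# Check that sum(ZY)/sum(ZX) matches the usual IV estimate.
# All numbers invented; the true effect of X on Y is 2.
library(AER)
set.seed(6000)

n <- 1000
W <- rnorm(n)
Z <- rnorm(n)
X <- Z + W + rnorm(n)
Y <- 2*X + W + rnorm(n)

# Demean everything so we can ignore the intercept
Zd <- Z - mean(Z); Xd <- X - mean(X); Yd <- Y - mean(Y)

# The moment-condition solution...
sum(Zd*Yd) / sum(Zd*Xd)

# ...matches the standard IV estimate
coef(ivreg(Y ~ X | Z))['X']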

So, same answer. Big whoop, right? Except that things get a bit more interesting when we either have heteroskedasticity (which we probably do) or add more instruments.

GMM and 2SLS are both capable of handling heteroskedasticity. GMM does so naturally, as it doesn’t really make assumptions about the error terms in the same way that the OLS equations making up 2SLS do. But we can get there with 2SLS by simply using heteroskedasticity-robust standard errors.

But things get more interesting when the number of instruments is bigger than the number of treatment/endogenous variables you have, which is called being “overidentified.”

When your model is overidentified, 2SLS and GMM diverge.558 In overidentified cases, GMM can’t actually satisfy all of the moment conditions exactly, so it needs to pick weights for the conditions to decide which are more important. Roughly, it weights the conditions by how difficult they are to satisfy. GMM is going to be more precise, at least if there’s heteroskedasticity involved. GMM will have less sampling variation (and thus smaller standard errors) under heteroskedasticity than 2SLS, even if you add heteroskedasticity-robust standard errors. This continues to be true if you start applying clustered standard errors or corrections for autocorrelation.

Keep in mind - GMM isn’t just a different way of adjusting the standard errors. The estimates themselves will actually be different when there’s overidentification. The GMM standard errors are smaller not because we choose to claim we have more information and thus more precision, but because the method itself produces more precise estimates, and the smaller standard errors reflect that.
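To see that divergence concretely, here's a small overidentified sketch (two invented instruments for one treatment, heteroskedastic errors, true effect of 2). 2SLS via ivreg() and two-step GMM via the gmm package - the same package used for the real example below - produce similar but not identical coefficient estimates.

R Code

# 2SLS vs. GMM when overidentified and heteroskedastic.
# All numbers invented; the true effect of X on Y is 2.
library(AER); library(gmm)
set.seed(7000)

n <- 1000
W <- rnorm(n)
Z1 <- rnorm(n); Z2 <- rnorm(n)
X <- Z1 + .5*Z2 + W + rnorm(n)
Y <- 2*X + W + rnorm(n, sd = 1 + abs(Z1))  # heteroskedastic error
d <- data.frame(Y, X, Z1, Z2)

# 2SLS with two instruments for one endogenous variable
ivreg(Y ~ X | Z1 + Z2, data = d)

# Two-step efficient GMM with the same instruments
gmm(Y ~ X, ~ Z1 + Z2, data = d)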

Let’s code up some instrumental variables. For this exercise we’ll be following along with the paper “Social Networks and the Decision to Insure” by Cai, De Janvry, and Sadoulet (2015Cai, Jing, Alain De Janvry, and Elisabeth Sadoulet. 2015. “Social Networks and the Decision to Insure.” American Economic Journal: Applied Economics 7 (2): 81–108.). The authors are looking into the decision that farmers make about whether to buy insurance against weather events. In particular, they’re interested in whether information about insurance travels through social networks.

They look at a randomized experiment in rural China, where households were randomized into two rounds of different informational sessions about insurance. The question then is: how much does what your friends learn about insurance affect your own takeup of insurance? They look at people in the second round of sessions, and they look at what their friends did and saw in the first round of sessions, to see how the former is affected by the latter.559 By structuring things in this way where they look only at the effect of the first round on the second, they avoid “reflection,” a common problem with studies about how your friends affect what you do; these studies are also known as studies of “peer effects.” The reflection problem, attributed to perennial Nobel overlookee Charles Manski, points out the identification problem you face if you want to learn about how your friends affect you… the problem being that you affect your friends too! There’s a feedback loop, and we know that a causal diagram hates a feedback loop. By having a first round and a second round, we know which direction the causal arrow points.

Cai, De Janvry, and Sadoulet do in general find that farmers’ decisions were affected by what their friends saw and the information they received. I’ll be looking at a particular one of their analyses where they ask whether farmers’ decisions were affected by what their friends did: does your friends actually buying insurance make you more likely to buy?

We want to identify the effect of \(FriendsPurchaseBehavior\) on \(YourPurchaseBehavior\) among people in the second-round informational sessions, looking at the average purchasing behavior of their friends who were in the first-round informational sessions. This effect has some obvious back doors. Preferences for insurance may be higher or lower by region, or you may simply be more likely to have friends with preferences similar to yours, including on topics like insurance.

As an instrument for \(FriendsPurchaseBehavior\) they use the variable \(FirstRoundDefault\), which is a binary indicator for whether your friends were randomly assigned to a “default buy” informational session, where attendees were assigned to buy insurance by default, and had to specify their preference not to buy it, or a “default no buy” session, where attendees were assigned to not buy insurance by default, and had to specify their preference to buy it. Everyone had the same options and got the same information, but the defaults were different. People follow defaults! Those in the “default buy” sessions were twelve percentage points more likely to buy insurance than those in the “default no buy” sessions. The fact that \(FirstRoundDefault\) is randomized makes the argument that it’s a valid instrument pretty believable. Plus, a twelve percentage point jump seems like plenty to satisfy the relevance assumption.560 We’re going to use the analysis they did on the subsample of second-round participants who were told what purchasing decisions their first-round friends had made. They did another analysis with second-round participants who were not told, and found no effect.

Okay, now, finally let’s code the thing. We’ll start with 2SLS, and then, using the same data, will do GMM, LIML, and IV with fixed effects (“panel IV”). Wait, I slipped “LIML” in there - what’s that? That’s limited-information maximum likelihood. I’ll show how to code it up here, since it’s easy enough to switch out methods, but I’ll actually talk about the method later in the chapter.

R Code

# There are many ways to run 2SLS; 
# the most common is ivreg from the AER package. 
# But we'll use feols from fixest for speed and ease 
# of fixed-effects additions later
library(tidyverse); library(modelsummary); library(fixest)
d <- causaldata::social_insure

# Include just the outcome and controls first, then endogenous ~ instrument 
# in the second part, and for this study we cluster on address
m <- feols(takeup_survey ~ male + age + agpop + ricearea_2010 +
            literacy + intensive + risk_averse + disaster_prob +
            factor(village) | pre_takeup_rate ~ default, 
            cluster = ~address, data = d)

# Show the first and second stage, omitting all
# the controls for ease of visibility
msummary(list('First Stage' = m$iv_first_stage[[1]],
                'Second Stage' = m),
                coef_map = c(default = 'First Round Default',
                fit_pre_takeup_rate = 'Friends Purchase Behavior'),         
                stars = c('*' = .1, '**' = .05, '***' = .01))

Stata Code

causaldata social_insure.dta, use clear download

* We want village fixed effects, but that's currently a string
encode village, g(villid)

* The order doesn't matter, but we need controls here
* as well as (endogenous = instrument)
* don't forget to specify the estimator 2sls!
* and we cluster on address
* and also show the first stage with the first option
ivregress 2sls takeup_survey (pre_takeup_rate = default) male age agpop ///
    ricearea_2010 literacy intensive risk_averse disaster_prob ///
    i.villid, cluster(address) first

Python Code

import pandas as pd
from linearmodels.iv import IV2SLS
from causaldata import social_insure
d = social_insure.load_pandas().data

# Create a control-variable DataFrame
# including dummies for village
controls = pd.concat([d[['male', 'age', 'agpop', 'ricearea_2010',
                         'literacy', 'intensive', 'risk_averse',
                         'disaster_prob']],
                      pd.get_dummies(d[['village']])],
                     axis = 1)

# Create model and fit separately
# since we want to cluster, and will use
# m.notnull to see which observations to drop
m = IV2SLS(d['takeup_survey'],
           controls,
           d['pre_takeup_rate'],
           d['default'])
second_stage = m.fit(cov_type = 'clustered',
                     clusters = d['address'][m.notnull])

# If we want the first stage we must do it ourselves!
first_stage = IV2SLS(d['pre_takeup_rate'],
                     pd.concat([controls, d['default']], axis = 1),
                     None, None).fit(cov_type = 'clustered',
                                     clusters = d['address'][m.notnull])

first_stage
second_stage

This gives us the result in Table 19.1 (keeping in mind this table doesn’t show a bunch of coefficients for all the control variables).561 Copyright American Economic Association; reproduced with permission of the American Economic Journal: Applied Economics. The first stage regression has the endogenous variable (whether your friends purchased insurance) as the dependent variable, and a coefficient for our instrument. The coefficient is .118 and statistically significant. It’s showing that your friends being assigned to the “default-purchase” experimental condition leads to an 11.8 percentage point increase in the probability that they’ll buy insurance. We predict whether your friends bought insurance using that .118 bump (as well as the other predictors not shown on the table) and use those predicted values in the second stage.

Table 19.1: Instrumental Variables Regression from Cai, de Janvry, and Sadoulet (2015)
                            First Stage    Second Stage
First Round Default         0.118***
                            (0.034)
Friends Purchase Behavior                  0.791***
                                           (0.273)
Num.Obs.                    1378           1378
R2                          0.469          0.127
R2 Adj.                     0.448          0.092
Std.Errors                  by: address    by: address
* p < 0.1, ** p < 0.05, *** p < 0.01
Controls for gender, age, agricultural proportion, farming area, literacy, intensiveness of assigned treatment, risk aversion, perceived disaster probability, and village excluded from table.

The second stage has the actual outcome of you buying insurance as the outcome. A one-unit increase in the rate at which your friends buy insurance, using only the random variation driven by the random experimental assignment and the controls, increases your chances of buying insurance by .791. That’s a pretty strong spillover effect!

That’s 2SLS. How about GMM and LIML? In Stata and Python the transition is easy. In Stata, run ivregress gmm or ivregress liml instead of ivregress 2sls. Unfortunately, there are some important LIML parameters \(\alpha\) and \(\kappa\) we’ll discuss later that ivregress liml won’t let you set on your own. In Python, you can use IVGMM or IVLIML instead of IV2SLS, and here you do get control over \(\alpha\) and \(\kappa\).

And in R? In R we unfortunately have to switch packages, at least for now. The ivmodel() function in the ivmodel package is capable of doing LIML, with options for setting \(\alpha\) and \(\kappa\). How about GMM? For the moment, you’ll have to set up the whole two-equation GMM model yourself using the gmm() function in the gmm package, although your options for standard error adjustments are a bit more limited. The syntax for this example is:

R Code

library(tidyverse); library(modelsummary); library(gmm)
d <- causaldata::social_insure
# Remove all missing observations ourselves
d <- d %>%
    select(takeup_survey, male, age, agpop, ricearea_2010,
           literacy, intensive, risk_averse, disaster_prob,
           village, address, pre_takeup_rate, default) %>%
    na.omit()

m <- gmm(takeup_survey ~ male + age + agpop + ricearea_2010 +
         literacy + intensive + risk_averse + disaster_prob +
         factor(village) + pre_takeup_rate,
         ~ male + age + agpop + ricearea_2010 +
         literacy + intensive + risk_averse + disaster_prob +
         factor(village) + default, data = d)

# We can apply the address clustering most easily in msummary
msummary(m, vcov = ~address, stars = c('*' = .1, '**' = .05, '***' = .01))

Finally, what if we have a lot of fixed effects in our IV model? There are some technical adjustments that must be made in these cases. This time it’s R that has the easiest transition. The feols() function we already used can easily incorporate fixed effects - the +factor(village) just becomes | village. Stata and Python aren’t too hard though. In Stata you switch from ivregress to xtivreg. This comes with a few other changes - you must xtset your data to tell Stata what the panel structure is, it only does 2SLS rather than GMM or LIML, and you must tell it whether you want fixed effects (fe), random effects (re), or something else (see the help file). At the moment there is no Python equivalent for this, although our typical Python syntax will let you estimate IV models with fixed effects in them by adding sets of binary indicator variables.
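For concreteness, here's what that looks like with the model we've been using - a sketch that just moves village out of factor() and into fixest's fixed-effects slot, leaving everything else from the earlier specification alone.

R Code

# The same IV model as before, but with village as a fixed effect.
# This mirrors the earlier feols() specification; only the village
# term has moved.
library(fixest)
d <- causaldata::social_insure

m_fe <- feols(takeup_survey ~ male + age + agpop + ricearea_2010 +
            literacy + intensive + risk_averse + disaster_prob |
            village |                        # fixed effects
            pre_takeup_rate ~ default,       # endogenous ~ instrument
            cluster = ~address, data = d)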

19.2.2 Instrumental Variables and Treatment Effects

What does instrumental variables estimate? We can refer back to Chapter 10 when thinking about what kind of average treatment effect instrumental variables produces.

In general, we know that IV is a method all about isolating the variation in treatment that is explained by the instruments. This means we are looking at a local average treatment effect, where the individual treatment effects are weighted by how responsive that individual observation is to the instrument.

In the case of a standard estimator like 2SLS or GMM with one treatment/endogenous variable and one instrument, the weights are what the individual effect of the instrument would be for you in the first stage.

For example, say there has been a recent set of television advertisements that encourage people to exercise more. You want to use exposure to the advertisements as an instrument for how much you exercise, and then will look at the effect of exercise on blood pressure.

Consider three people in the sample: Jakeila, Kyle, and Li. The advertisements would make Jakeila exercise an additional half hour each week, and an additional hour of exercise each week would lower her blood pressure by 2 points. You can see these values, as well as the values for Kyle and Li, on Table 19.2.

Table 19.2: Effect Sizes for Three People in our Exercise Study

Name      Effect of Ads on Exercise Hours   Effect of Exercise Hours on Blood Pressure
Jakeila                0.50                                    -2
Kyle                   0.25                                    -8
Li                     0.00                                   -10

Keep in mind - those effects of ads on exercise hours are what the effect theoretically would be for those people. Obviously we can’t see Jakeila both advertised to and not advertised to at the same time. But this is saying that Jakeila-with-ads exercises half an hour more each week than Jakeila-without-ads.

What will 2SLS tell us the effect of exercise hours on blood pressure is? Well, Jakeila responds the strongest to the ads, so the -2 effect of exercise that she gets will be more heavily weighted. Specifically, it gets the .5 weight she has on the effect of ads. Similarly, Kyle gets a weight of .25. Li, on the other hand, doesn’t respond to the ads at all - they make no difference to him. So it turns out he makes no difference to the 2SLS estimate. He gets a weight of 0.

So the estimated LATE is

\[\begin{equation} \tag{19.3} \frac{(.5\times (-2) + .25 \times (-8) + 0\times(-10))}{(.5 + .25 + 0)} = \frac{(-1 + -2 + 0)}{(.75)} = -4 \end{equation}\]

That’s what 2SLS will give us. This is in contrast to the average treatment effect which is \(((-2) + (-8) + (-10))/3 = -6.67\).
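The same arithmetic in R, just to make the weighting explicit (the effects and weights come straight from Table 19.2):

R Code

# The LATE as a weighted average, using the numbers in Table 19.2
effects <- c(Jakeila = -2, Kyle = -8, Li = -10)
weights <- c(Jakeila = .5, Kyle = .25, Li = 0)

# Weighted by responsiveness to the instrument: the 2SLS LATE
weighted.mean(effects, weights)    # -4

# The unweighted average treatment effect
mean(effects)                      # -6.67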

This immediately points us to a very important result: 2SLS will give different results depending on which IV is used. If we had picked an instrument that was really effective at getting Li to exercise but not so effective for Jakeila, then 2SLS would estimate a much stronger effect of exercise on blood pressure.

What if there’s more than one instrument? The same logic applies - the stronger the effect of the instrument for you, the more strongly your treatment effect will be weighted. But at this point the math gets a bit more complex, because it’s going to be a mix of the different weights you’d have had from the different instruments. So if one instrument would give Jakeila a .5 weight, but a different one would give her .8, the weight she’d get from using both instruments would be some mix of the .5 and .8 (and not necessarily just an average of the two).

One thing to emphasize here is that the specific weights we get are dependent on the estimator. Different ways of estimating instrumental variables can produce different weighted average treatment effects. If you’re not using 2SLS, do look into what precisely you are getting. Although outside of some corner cases you are generally still getting something that weights you more strongly the more you are affected by the instrument. The idea that we’re estimating a LATE does hold up in most cases, although the specifics on which LATE we’re getting change from estimator to estimator.

There’s a common terminology used in instrumental variables when thinking about these weights, and in particular the 0s. We can divide the sample into three groups:

  • Compliers: For compliers, the effect of the instrument on the treatment is in the expected direction. Jakeila and Kyle were compliers in Table 19.2 because the ad telling them to exercise more got them to exercise more.
  • Always-takers/never-takers: Always-takers/never-takers are completely unaffected by the instrument. Li was an always-taker/never-taker in Table 19.2 because the effect of the ads on his exercise level was zero.562 The terminology here comes from cases where the treatment variable is binary. “Always-takers” always get the treatment, regardless of the instrument, and “never-takers” never get the treatment, regardless of the instrument. The terms are a little odd applied to continuous treatments like we have here in exercise, but it’s the same idea.
  • Defiers: Defiers are affected by the instrument in the opposite of the expected direction.563 If there are people affected in both directions in the sample, it’s a bit arbitrary to call one of them “compliers” and the other “defiers,” but the important thing really is that there are people in both directions.

From this terminology we can get one result and one assumption we need to make:

First, the result: if all of the compliers are affected by the instrument to the same degree, then 2SLS gives the average treatment effect… among compliers.564 You will find no shortage of people telling you that the LATE is the same thing as the average treatment effect among compliers. But that’s not really true, and relies on this assumption that the instrument is equally effective on all compliers. Neat! Although it is unlikely that everyone is affected in the same way by the instrument.

Second, the assumption: for all of this to work, we need to assume that there are no defiers. Imagine we add a fourth person to Table 19.2: Macy, who is so annoyed by the ads that she decides to exercise .25 hours less if she sees them. An hour of weekly exercise would reduce her blood pressure by 8 points.

How do the LATE weights work out now? Macy gets a weight of -.25. The weighted-average calculation now gives us

\[\begin{multline} \tag{19.4} \frac{(.5\times (-2) + .25 \times (-8) + 0\times(-10) + -.25\times (-8))}{(.5 + .25 + 0 + -.25)} = \\ \frac{(-1 + -2 + 2)}{(.5)} = -2 \end{multline}\]

Exercise was actually more effective for Macy than the effect we already estimated, but adding her made the effect shrink! This is because she had a negative weight, which makes the math of weighted averages get all wonky so they’re not really weighted averages any more.565 Importantly, the problem here isn’t that Macy is affected negatively, it’s that she’s affected in a different direction to the others. Conformity is pretty helpful here. It wouldn’t be a problem if everyone had a negative effect. Then, all the negatives on the weights would just factor out. It’s like multiplying the original LATE calculation by \(-1/-1 = 1\). Makes no difference.

The assumption that there are no defiers is also known as the monotonicity assumption. Along with validity and relevance, this is another key assumption we need to make for instrumental variables, although this one tends to receive a lot less attention.

So if you have an instrument that has an effect on average, think carefully about whether that effect is likely to be in the same direction for everyone. There are plenty of cases where it wouldn’t be - for example, if there are people out there who would be so annoyed by an intervention that it has the opposite of its intended effect.

Or, more broadly, people are just different and react in different ways. Angrist and Evans (1998Angrist, Joshua D., and William N. Evans. 1998. “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size.” American Economic Review 88 (3): 450–77.) are a good example for thinking about this. They observed that families appeared to have a preference for having both a boy and a girl. A family that happens to have two boys as their first two children, or two girls, would be more likely to have a third child so as to try for a mix. So, “your first two kids being the same gender” has been used as an instrument for “having a third child” in a whole bunch of studies, following on from Angrist and Evans.

But for this to work, there have to be no defiers. Even if most people would be more likely to have the third kid if the first two are the same gender (or not base their third-kid decision on that at all), if there are some people who would be less likely to have the third kid because the first two are the same gender, then monotonicity is violated. Maybe some parents are terrified by the possibility of three kids of the same gender? Whatever story we come up with, people have a lot of complex reasons for choosing to have more kids, and some of those reasons might push in the opposite direction from the one we need. So studies using this instrument need to think carefully about whether monotonicity is likely to be satisfied, and what they can do about it.

19.2.3 Checking the Instrumental Variables Assumptions

For IV to do what we want it to, we are relying on the relevance assumption. Remember, the relevance assumption is the assumption that the instrumental variable \(Z\) and the treatment/endogenous variable \(X\) are related to each other.

We can go a bit further and say that we need to assume that \(X\) and \(Z\) are strongly enough related to each other that we don’t run into the “weak instruments problem.”

A weak instrument is one that is valid and does predict the treatment variable, but it only predicts the treatment variable a little bit. It predicts weakly. Keeping in mind our general intuition that IV gives us \(Cov(Z,Y)/Cov(Z,X)\), then if \(Cov(Z,X)\) is small, we’re nearing a divide-by-zero problem! The estimate as a whole balloons up really big (since you’re dividing by something tiny) and the sampling variation gets huge.
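A tiny simulation sketch (with made-up numbers) makes the divide-by-something-tiny problem concrete: the IV ratio is stable across samples when the first stage is strong, and swings wildly when it's weak.

```r
# Small simulation (hypothetical setup): Cov(Z,Y)/Cov(Z,X) across repeated samples
set.seed(1)
iv_ratio <- function(first_stage_strength) {
  z <- rnorm(500)
  x <- first_stage_strength * z + rnorm(500)  # first stage
  y <- 2 * x + rnorm(500)                     # true effect of X on Y is 2
  cov(z, y) / cov(z, x)                       # the IV estimate
}
sd(replicate(2000, iv_ratio(1)))     # strong instrument: estimates cluster near 2
sd(replicate(2000, iv_ratio(0.03)))  # weak instrument: estimates swing wildly
```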

Thankfully, unlike a lot of the assumptions we cover, weak instruments (and by extension relevance) is fairly easy to test for.566 Although maybe it’s not the best way to deal with the weak-instrument problem… read to the end of the section. After all, we just need to make sure that \(Z\) and \(X\) are related, and not in a weak way. We know how to measure and test relationships!

By far the most common way to check for relevance is the first-stage F-statistic test.567 This is sometimes referred to as an underidentification test. Conveniently, it’s also the easiest. All you have to do is:

  1. Estimate the first stage of the model (regress the treatment/endogenous variable on the controls and instruments)
  2. Do a joint F test on the instruments
  3. Get the F statistic from that joint F-test and use it to decide if the instrument is relevant

That’s it! The calculation gets a bit trickier if there’s more than one treatment/endogenous variable (since there’s not really a single first stage in the same way) but the idea remains the same.
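As a minimal sketch of doing that by hand in R (the variable names - treatment X, instrument Z, control W, data frame d - are hypothetical), you can compare the first stage with and without the instrument:

```r
# First-stage joint F test "by hand" with base R (hypothetical variable names)
first_stage <- lm(X ~ W + Z, data = d)   # first stage: controls plus the instrument
restricted  <- lm(X ~ W, data = d)       # same regression without the instrument
anova(restricted, first_stage)           # the F statistic here is the joint test on the instrument(s)
```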

We covered the code for doing a joint F test back in Chapter 13 on regression, but because this is such a common test, your IV command will often do it for you. In R, if your feols regression is stored as m, summary(m) will report a “first-stage F statistic.”568 If you’re using a different IV command this may not work. Many of them, such as ivreg from the AER package, work with summary(m, diagnostics = TRUE). In Stata you can follow your ivregress command with estat firststage.569 If you’ve done 2SLS, this will also give you the different relevant critical values, which we’ll discuss in a second. In Python with the linearmodels package you can do IV2SLS().fit().first_stage and look at the “partial F-stat.”

We’ve got our first-stage F-statistic now. How big does it need to be? Checking if the instrument is statistically significant is not nearly enough - we aren’t just concerned that the relationship is zero, we’re concerned that it’s small.

Since there’s no single precise definition of “small,” there’s no single correct cutoff F-statistic to look for. Instead, we have a tradeoff. The bigger your F-statistic, the less bias you get. So the F-statistic you want will be based on how much bias you’re willing to accept.

Wait - what bias? Who said anything about bias? Weak instruments lead to bias because, even if the instrument is truly valid, in an actual sample of data the instrument will have a nonzero relationship with the error term just by random chance, worsening validity and giving you bias. The weaker the instrument is, the worse this gets.

We can frame our choice of cutoff F-statistic in responding to this tradeoff. Stock and Yogo (2005Stock, James H, and Motohiro Yogo. 2005. “Testing for Weak Instruments in Linear IV Regression.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, 80–108. Cambridge: Cambridge University Press.) calculate the bias that you get with instrumental variables relative to the bias you’d get by just running OLS on its own. The stronger the instrument is, the less the IV bias will be relative to the OLS bias. Their tables at the end of the paper will show you that, for example, if you have one treatment/endogenous variable and three instruments, you need an F-statistic above 13.91 to reduce IV bias to less than 5% of OLS bias in 2SLS, but only an F-statistic above 9.08 to reduce IV bias to less than 10% of OLS bias. What if you have four instruments? Then you need 16.85 or 10.27. Some first-stage test commands (like Stata’s) will tell you the relevant cutoffs automatically.570 There is another rule of thumb that just says your F-statistic must be 10 or above in general. That’s certainly much easier to remember than looking up a value in a table, but it’s also very rough. Like many one-size-fits-all values in statistics, this is a tradition probably better left behind.

What to do if your F-statistic doesn’t measure up? You could just give up at that point - more than a few researchers would advise ditching a project with a small first-stage F-statistic. But this practice is bad if everyone does it, especially if you’re talking about projects that might end up getting published. If low-F-statistic projects get dropped based on an F-statistic cutoff, then the projects that do get published will be a mix of actually-strong IVs and actually-weak IVs where the researcher just happened to get a sample where the F-statistic was large. When an actually-weak IV happens to produce a large F-statistic, that’s what gives us the exact kind of weak-instrument bias we were worried about. We’re specifically selecting projects to complete based on getting samples where the instrument is invalid by random chance. So the F-statistic cutoff leads to a larger proportion of biased results in the published literature.

What to do? In the case of weak instruments, as long as you are indeed pretty certain that the instrument is valid, instead of ditching the project you might want to try an approach to IV that is more robust to weak instruments. A few of these methods, like Anderson-Rubin confidence intervals, are shown in the How the Pros Do It section.

In fact, you may just want to go ahead and use those weak-instrument-robust methods from How the Pros Do It anyway. The problem with weak instruments is that they introduce a lot of sampling variation and mess with your standard errors. The idea with pre-testing is that you only pursue analyses where you know the problem isn’t too big. But that doesn’t mean the problem goes away. Lee et al. (2020Lee, David L, Justin McCrary, Marcelo J Moreira, and Jack Porter. 2020. “Valid t-Ratio Inference for IV.” arXiv Preprint arXiv:2010.05058.) show that your confidence intervals will at least be a teeny bit wrong all the way up to a first-stage \(F\) statistic of 104.7. Yikes! That doesn’t mean toss out anything with \(F\) below 104.7. Instead it just means that maybe pre-testing isn’t going to solve the real problem.

So we can test for relevance and weak instruments. How about validity? There are a few tests I should probably mention, despite the fact that they make me grumpy. In particular, there are several very well-known tests that are designed to test the validity assumption. While these tests have their uses, they are commonly applied in a way that I find thoroughly unconvincing, for reasons I’ll mention. However, they are very common and so worth at least knowing enough about to know what’s going on.

First off, what does it mean exactly to test for validity? We want to test whether there are any open back doors between the instrument \(Z\) and the outcome \(Y\). We can reframe this as checking whether \(Z\) is related to the second-stage error term \(\varepsilon\). We want \(Z\) and \(\varepsilon\) to be unrelated. If they’re related, validity is violated.

Testing for validity has some obvious hurdles to overcome. First off, we can’t actually observe the error term \(\varepsilon\), and so can’t just look at the relationship between \(Z\) and \(\varepsilon\).

Why not just look at the relationship between \(Z\) and the residual \(r\) instead? We use \(r\) in place of \(\varepsilon\) all the time for calculating things like standard errors. However, if \(Z\) is invalid, then the second-stage estimates will be biased, and biased estimates make the residuals a poor stand-in for the errors. The error is the difference between the outcome and the true-model prediction, and a biased estimated model isn’t the true model even on average, so its residuals don’t represent the errors very well.

What, then, can we do?

One approach is to run the second-stage model but include the instrument as a control.

\[\begin{equation} \tag{19.5} Y = \beta_0 + \beta_1 X + \beta_2Z+ \varepsilon \end{equation}\]

If the coefficient on the instrument \(Z\) is nonzero, that suggests a violation of validity. Why? Because all open paths from \(Z\) to \(Y\) should run through \(X\), right? So if we look at the effect of \(Z\) on \(Y\) while controlling for \(X\), that should close all pathways. If there’s still a relationship evident in \(\hat{\beta}_2\), then it seems that there are other, validity-breaking pathways still open!571 This particular test is a bit ad-hoc and is used more because it makes sense than because it has any nice statistical properties. I bring it up here more so you know what the logic is when you see it, rather than really suggesting you use it.
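A bare-bones version of this check in R (hypothetical variable names again), with the footnote's caveat in mind that this is more of a gut check than a formal test:

```r
# Ad-hoc validity check (hypothetical names): regress the outcome on the
# treatment and the instrument together and look at the instrument's coefficient
check <- lm(Y ~ X + Z, data = d)
summary(check)   # a clearly nonzero coefficient on Z hints at a validity problem
```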

Another approach uses the Durbin-Wu-Hausman test. The Durbin-Wu-Hausman test compares the results from two models, one of which may have inconsistent results if some assumptions are wrong, while the other doesn’t rely on those assumptions but is less precise. If the results are similar, it suggests that the assumptions at least aren’t so wrong that they mess up the results, and that’s a green light to use the more precise model.

In the context of instrumental variables, Durbin-Wu-Hausman is used in two ways. First, it can be used to compare OLS (inconsistent if \(X\) is related to \(\varepsilon\)) to IV (less precise). If the results are different, that means that \(X\) really does have open back doors, and we should probably be using IV.

Second, Durbin-Wu-Hausman can be used to compare two different IV models in an overidentification test. If we have more instruments than we need (we are overidentified), then as long as we’re really certain that we have as many valid instruments as we need, we can compare the IV model that uses all of our instruments (inconsistent if some of them are invalid) against an IV model that only uses the instruments we’re really sure about (less precise). If they’re different, that tells us that the additional instruments are likely invalid.

In practice, overidentification tests are more commonly performed for 2SLS using a Sargan test, which gets the residuals from the second stage of a 2SLS model and looks at the relationship between those residuals and the instruments. For GMM models, often a Hansen test is used, which I won’t go into deeply here. For all of these tests there are many ways to go at them.

In R you get both Durbin-Wu-Hausman exogeneity tests and Sargan overidentification tests automatically when you summary() the IV model you get from feols. If you’re using a different command you may need to use the diagnostics = TRUE option. In Stata you can follow your regression with estat endog to test OLS vs IV or estat overid to do the Sargan test. In Python you can get the Durbin-Wu-Hausman exogeneity test from your model with the .wu_hausman() method, or the Sargan test with .sargan().
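For example, with one treatment and two instruments (hypothetical variable names), an overidentified feols model and its diagnostics might look like this:

```r
# Overidentified IV in fixest (hypothetical names): instruments Z1 and Z2
# for one treatment X, plus a control W
library(fixest)
m <- feols(Y ~ W | X ~ Z1 + Z2, data = d)
summary(m)   # the printout includes Wu-Hausman and Sargan statistics
```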

So those are the main tools in our belt for testing for validity of instruments. Why do they make me grumpy? Because people try to use them to justify iffy instruments. Imagine someone was showing you some research with an instrument that you didn’t think was valid, or an OLS model with a treatment you thought had some open back doors remaining. But ah - no worries! They’ve done an endogeneity test and have failed to reject the null hypothesis that the iffy treatment/instrument is valid. No problem, right?

Still a problem! There’s a lot that goes into whether a null hypothesis is rejected other than whether the underlying thing you’re testing is true or not. Statistical power, sampling variation, and so on. Failing to find evidence of a validity violation could mean the instrument is valid, or it could mean one of those other things.

A more reasonable use of these tests is when you have an instrument that you are really certain about, but are worried that maybe you’ve just had bad luck of the draw. Using these tests to disappoint yourself about an instrument you were pretty certain on is a much more justifiable use of them than using them to reassure yourself about an instrument you’re uncertain on.572 Good ol’ statistics, always looking for a way to disappoint.

There’s another reason these tests, in particular overidentification tests, make me grumpy. It has to do with the local average treatment effect (LATE) estimate being produced by the instrumental variables model. When you have more than one instrument, those instruments will generate different LATE estimates, because they explain different parts of variation in the treatment. When you combine the two instruments you’ll get yet again another LATE. So if your overidentification test finds that the model produces different results with different instruments, well… that doesn’t necessarily mean that the instruments are invalid. It just means that the instruments don’t produce the same results (Parente and Silva 2012Parente, Paulo MDC, and JMC Santos Silva. 2012. “A Cautionary Note on Tests of Overidentifying Restrictions.” Economics Letters 115 (2): 314–17.).

19.3 How the Pros Do It

19.3.1 Don’t Just TEST for Weakness, Fix It!

Weak instruments are a real problem. I’ve already discussed ways to try to detect weak instruments using an F-test. But this test has its own problems. We could, instead, just go ahead and use a method for estimating instrumental variables that is not as strongly affected by weak instruments. They do exist!

The first fix is also the easiest. In the case where we have one treatment/endogenous variable and one instrument,573 Let’s be honest, this is by far the most common scenario. Why did we bother with those overidentification tests again? we can adjust the standard errors to account for the possibility that the instruments are weak.

This solution is very old, dating all the way back to 1949. Anderson-Rubin confidence intervals provide valid measures of uncertainty in our estimate of the effect even if the instruments are weak (Anderson and Rubin 1949Anderson, Theodore W, and Herman Rubin. 1949. “Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations.” The Annals of Mathematical Statistics 20 (1): 46–63.).574 This is not to be confused with the Anderson-Rubin test of weak instruments. Confusing, I know. This doesn’t really solve the problem of having a weak instrument. That is, you still have a weak instrument and have the sampling variation issues that go along with it. But it does make sure that your results reflect that variation. In other words, they’ll be more honest about your weak-instrument problems.

That’s… that’s really it. If you have one instrument and one treatment/endogenous variable, you may as well skip the whole process of testing for weak instruments and just report Anderson-Rubin confidence intervals whether your instrument is weak or not. It’s kind of a shocker that this isn’t a completely universal practice. As you’ll see in the next paragraph, it isn’t even included as an option in common IV commands. We’ve known about it for ages. Ah, well.

In R you can get Anderson-Rubin confidence intervals by first estimating your IV model using the ivreg function in the AER package, and then passing that to the anderson.rubin.ci function in the ivpack package. In Stata you can get Anderson-Rubin confidence intervals by following up your ivregress model with the weakiv function from the weakiv package. To my knowledge, there’s no prepackaged way of getting Anderson-Rubin confidence intervals in Python.
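As a rough sketch of that R route (hypothetical variable names, and assuming the ivpack function signature described in its documentation):

```r
# Anderson-Rubin confidence interval sketch (hypothetical names),
# following the AER + ivpack route described above
library(AER)
library(ivpack)
m <- ivreg(Y ~ X + W | Z + W, data = d)   # IV: treatment X, instrument Z, control W
anderson.rubin.ci(m, conflevel = 0.95)    # weak-instrument-robust confidence interval
```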

We can more honestly report our weak-instrument problem, but can we really fix it? Maybe!

A common estimation method that attempts to perform better when the instrument is weak is limited-information maximum likelihood (LIML).575 Limited-information maximum likelihood is often supported by the same IV estimation software that supports 2SLS and GMM, and the code for estimation using LIML was discussed earlier in this chapter.

In short, instrumental variables uses the parts of the treatment/endogenous variable \(X\) and the outcome \(Y\) that are predicted by the instrument \(Z\). LIML in the context of instrumental variables does the same thing, except that it scales down the prediction, making it weaker, using a parameter \(\kappa\), which is generally estimated in the model based on the data. If \(\kappa=1\), we scale the prediction by 1, making no change, and ending up back with 2SLS. But if \(\kappa<1\), we bring back a little of the original endogenous variable we would have used in OLS, relying less on that weak instrument.

We can also make adjustments to \(\kappa\) using the parameter known as Fuller’s \(\alpha\). Instead of \(\kappa = \hat{\kappa}\) estimated on the basis of the data, \(\alpha\) makes it even smaller, using instead \(\kappa = \hat{\kappa} - \alpha/(N-N_I)\), where \(N\) is the number of observations and \(N_I\) is the number of instruments. \(\alpha = 4\), for example, minimizes the mean squared error of the estimate across samples. It can also be estimated using the data. The LIML procedure reduces the bias that comes along with weak instruments, but this may come, surprise surprise, at the expense of worse precision (Blomquist and Dahlberg 1999Blomquist, Sören, and Matz Dahlberg. 1999. “Small Sample Properties of LIML and Jackknife IV Estimators: Experiments with Weak Instruments.” Journal of Applied Econometrics 14 (1): 69–88.).

Anything else? One of the most promising avenues of research looks at the use of lots of weak instruments. Sure, no one of them means much for the treatment. But an army of them, all working together? That might be a different story. I cannot possibly do the whole literature justice here, but one starting place is with Chao and Swanson (2005Chao, John C, and Norman R Swanson. 2005. “Consistent Estimation with a Large Number of Weak Instruments.” Econometrica 73 (5): 1673–92.).

19.3.2 Way Past LATE

Are we stuck with local average treatment effects? Not at all! A local average treatment effect is what 2SLS gives us by default, but there are other approaches. There’s a wide literature out there on various approaches to instrumental variables estimation that produce average treatment effects (or other averages) instead of a local average treatment effect. Naturally, they all rely on different assumptions or apply in different contexts.

I’ll mention only two here, briefly. These are only two papers representing two different general approaches to getting average treatment effects, and those two approaches are far from the only two.576 And a third, in fact, will pop up in the next section. These are starting points.

First, Heckman and Vytlacil (1999Heckman, James J, and Edward J Vytlacil. 1999. “Local Instrumental Variables and Latent Variable Models for Identifying and Bounding Treatment Effects.” Proceedings of the National Academy of Sciences 96 (8): 4730–34.) look at instrumental variables that take lots of different values. They realize that you can identify a bunch of effects by comparing different values - what’s the treatment effect among those pushed from the very lowest value of the instrument to just above that? What’s the effect among those pushed from that value to the next one up? And so on. So now you’ve got a whole bunch of effects for the range of the instrument. Under the condition that the probability of treatment is zero for some values of the instrument and one for others, you can average out this whole distribution of effects to get an average treatment effect.

Second is the literature using correlated random effects (remember those from Chapter 16?) in combination with 2SLS. Another way to think about heterogeneous treatment effects is that the treatment interacts with some individual variation we can’t observe. Correlated random effects try to model unobserved individual variation. Seems like a good starting place.

By interacting treatment with a set of control variables, and then estimating the model with correlated random effects, you can produce an average treatment effect. Or, heck, get the conditional average treatment effect for any group you like, defined by the control variables you’ve given the model. Wooldridge (2008Wooldridge, Jeffrey M. 2008. “Instrumental Variables Estimation of the Average Treatment Effect in the Correlated Random Coefficient Model.” In Modelling and Evaluating Treatment Effects in Econometrics. Emerald Group Publishing Limited.) provides a readable overview of the method and when it does or doesn’t work well. One important note is that this method tends to work better the more continuous the treatment variable is, and things get pretty dicey for a binary treatment.

19.3.3 Nonlinear Instrumental Variables

What can we do with instrumental variables and a nonlinear model? What we’ve talked about so far with instrumental variables, and especially with two-stage least squares, has all assumed that we’ve been in the realm of linear models. But what if we have a binary outcome or treatment/endogenous variable? Won’t we want to run a probit or logit?

The obvious temptation in the common case where you have a binary treatment/endogenous variable is to run the “first stage” of 2SLS using probit or logit, take the predicted values, and use those to estimate the second stage. Maybe run a probit or logit in that second stage too if the outcome is binary. This approach has been termed by econometrician Jerry Hausman as the “forbidden regression” as it simply does not work the way you want it to (Wooldridge 2010Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT press.). The reason it doesn’t work has to do with the fact that in a nonlinear regression, each variable’s effect depends on the values of the other variables. So while the instrument \(Z\) and the second-stage error term \(\varepsilon\) are unrelated, the fitted values \(\hat{X}\) no longer get to borrow that nice unrelatedness from \(Z\). The back door remains.

Another temptation, upon learning that you can’t just mix probit/logit and 2SLS all willy-nilly, is to simply run a linear model, that is, a linear probability model, ignoring the nonlinearity issue.577 Economists tend to do this. In general, economists are pretty likely to look at a research design that gives nonlinear models a hard time, like fixed effects or instrumental variables, figure that the additional “hard time” makes fixing the nonlinearity introduce more problems than it solves, and go with a linear model. I admit to being sympathetic to this view. This isn’t the worst idea - 2SLS does work this way, unlike with the “forbidden regression.” But on the other hand, you do get all the downsides of linear probability models relative to probit/logit. Don’t forget the poor performance of OLS vs. probit/logit when the mean of the binary variable is near 0 or 1. Quite a few binary treatment variables we might be interested in are very rare, or very common. Plus, this approach produces estimates that are less precise than models that do properly take into account the nonlinearity.578 And IV is already less precise than regular regression. Do we really need less precision?

So then what to do? Our options are many at this point, and depend on where it is you want to model the nonlinearity - the first or second stage. In this section we’ll keep it simple and only focus on binary treatments/outcomes, rather than the infinite alternative forms of nonlinear modeling we might be worried about. I’ll also keep from going into every way of estimating these models, as there is no shortage of approaches.

Let’s start with binary treatments. There is a handy and easy-to-implement method for incorporating a binary treatment into 2SLS popularized by Jeffrey Wooldridge (2010Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT press.).

First, estimate the first stage using nonlinear regression of the treatment/endogenous variable on the instruments and controls. Run that probit! Then, get the predicted values.

Then, instead of sticking those predicted values into the second stage as in the forbidden regression, use those predicted values in place of the instrument in 2SLS.

That’s it! Under this process, the nonlinearity no longer biases the estimate, and we get more precise estimates than if we’d run a linear probability model.
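Here's a minimal sketch of that two-step process in R (hypothetical variable names: binary treatment X, instrument Z, control W, outcome Y, all in a data frame d):

```r
# Probit first stage, then use the fitted probabilities as the instrument in 2SLS
library(fixest)

step1  <- glm(X ~ Z + W, data = d, family = binomial(link = "probit"))
d$Xhat <- predict(step1, type = "response")   # fitted treatment probabilities

# Note: Xhat goes in as the *instrument*, not as a replacement for X
m <- feols(Y ~ W | X ~ Xhat, data = d)
summary(m)
```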

You may prefer a “treatment effect regression” approach, which avoids 2SLS entirely and directly models the actual binary structure of the data. These approaches basically estimate the probit first stage and the linear second stage at the same time,579 Or at least the maximum likelihood approach does. Two-stage approaches to these estimators are more similar to the control function approach described in the next section, but do some adjustment with the first-stage predicted values to scale them properly. allowing the instrument to influence the treatment/endogenous variable as usual. To estimate a treatment effect regression you can look into treatReg in the sampleSelection package in R, or the etregress function in Stata. These functions also have tools for helping to estimate treatment effect averages other than the LATE. You’ll have to do it yourself in Python, at least for now.

What if the outcome is binary instead? In these cases a common fix is to use the control function approach to instrumental variables.

The control function approach is a lot like 2SLS, except that instead of isolating the explained part of \(X\) and using that in the second stage, you instead use regular ol’ \(X\), but also control for the unexplained part of \(X\). In regular linear instrumental variables, 2SLS and the control function give the same point estimates.580 It makes a lot of intuitive sense that you’d get the same answer whether you’re isolating explained variation or controlling away unexplained variation. Sort of comforting that it actually works.

However, when you’re looking at nonlinear instrumental variables, the two methods are no longer the same. And while the 2SLS approach produces a biased estimate, the control function approach where the residuals from a linear first stage are added as a control to a probit second stage does not.
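To make the logic concrete, here is a bare-bones sketch with hypothetical variable names - keeping in mind, as the next paragraph notes, that the standard errors and other details need adjustment in practice:

```r
# Bare-bones control function sketch (hypothetical names): linear first stage,
# then a probit second stage that controls for the first-stage residual.
# Standard errors from this two-step version need adjustment (e.g., bootstrapping).
first_stage <- lm(X ~ Z + W, data = d)
d$v_hat     <- resid(first_stage)              # the "unexplained part" of X
second      <- glm(Y ~ X + W + v_hat, data = d,
                   family = binomial(link = "probit"))
summary(second)
```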

That said, like with 2SLS there are a few adjustments that must be made to use the control function approach properly with a binary outcome variable, so you will generally want to use an estimation function designed for the job. In R this can be done in the ivprobit package. In Stata this is ivprobit with the twostep option.581 For both the R and Stata versions listed here, they default to a non-control-function approach with maximum likelihood. Nothing wrong with that. To my knowledge, if you want to do this in Python you’ll have to do it yourself. Walking through the code for the R ivprobit package is a good place to start on that project.

Uh oh. What if they’re BOTH binary? In these cases it’s common to apply a bivariate probit model. Remember the treatment effect regression from a few paragraphs ago? In that model, we estimate the probit first stage and the linear second stage at the same time using maximum likelihood.

In the bivariate probit, we use maximum likelihood to estimate two probit models at the same time, where the dependent variables of the two models are correlated. So if we make one of those dependent variables a predictor of the other and also give it a predictor excluded from the other model, then that means we are estimating the probit first stage and the probit second stage at the same time. Instrumental variables! The technical details of the treatment effect regression and bivariate probit models are different, but the concept is the same.582 Technically, the nonlinearity of the system would allow you to estimate the model without an actual instrument, which is different from linear instrumental variables, but skipping the instrument is generally considered a pretty bad idea.

Bivariate probit is important as a 2SLS alternative because 2SLS can be especially imprecise when both stages are binary. As a bonus, bivariate probit can give you an average treatment effect, as opposed to a local average treatment effect. That can come in handy (Chiburis, Das, and Lokshin 2012Chiburis, Richard C, Jishnu Das, and Michael Lokshin. 2012. “A Practical Comparison of the Bivariate Probit and Linear IV Estimators.” Economics Letters 117 (3): 762–66.).

Bivariate probit can be estimated in R using gjrm in the GJRM package with the options Model = "B" (bivariate model) and margins = c('probit','probit'). In Stata you can use the biprobit command. In Python you are unfortunately once again on your own.
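A rough sketch of the R version (hypothetical variable names, assuming GJRM's list-of-equations interface):

```r
# Bivariate probit IV sketch (hypothetical names): binary treatment X with
# instrument Z, binary outcome Y, control W, both equations estimated jointly
library(GJRM)
eq1 <- X ~ Z + W   # treatment equation, which includes the instrument
eq2 <- Y ~ X + W   # outcome equation, which includes the treatment
biv <- gjrm(list(eq1, eq2), data = d,
            Model = "B", margins = c("probit", "probit"))
summary(biv)
```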

19.3.4 Okay, You Can Have a Little Validity Violation

Once again we ask ourselves, is it worth it? We futz around with all these details - estimation approaches, tests for the plausibility of assumptions. IV with panel data. IV with nonlinear variables. Weak instruments. Monotonicity. But none of it matters if validity doesn’t hold. And we are rightfully skeptical about whether it does outside of actually-randomized treatments.

Thankfully there is an ongoing literature on approaches to using IV that work even if validity doesn’t perfectly hold.

To be clear, by “work” I mean that you can get useful inferences out of the analysis. You may have to bring in other kinds of information, or the inferences you get might not be quite as precise as with a fully valid IV.

There are two popular approaches here. The first tries to do what it can with a mostly-valid instrument, and the second makes up for invalidity by using a large number of instruments.

As long as the instrument is pushing around the treatment, surely we can do something with it, right? Yes!

One thing we can do, instead of treating an invalid instrument as a reason to throw out the analysis, is to think about how bad that validity violation is - how strongly the instrument is related to the second stage error term - and construct a range of plausible estimates based on the plausible range of violations.

This approach gives us something called “partial identification,” since instead of giving you a single estimate (identifying an estimate exactly), it gives you a range, based on a range of different assumptions you might make. A key paper in this literature is Conley, Hansen, and Rossi (2012Conley, Timothy G, Christian B Hansen, and Peter E Rossi. 2012. “Plausibly Exogenous.” Review of Economics and Statistics 94 (1): 260–72.). They show how, instead of assuming the instrument is completely valid, you can construct a range of plausible invalidity values, and produce a range of reasonable estimates that go with those values. Major validity violation? Then your range is big. But the stronger your instrument, the narrower your range as well.

Another common approach is to replace your validity assumption with some other assumption about the instrument that may be more plausible than validity, and see how far that assumption can take you. There are many of these papers, each relying on different assumptions about the instrument. Maybe there’s a paper out there using assumptions that are very plausible for your instrument!

One such paper is Nevo and Rosen (2012Nevo, Aviv, and Adam M Rosen. 2012. “Identification with Imperfect Instruments.” Review of Economics and Statistics 94 (3): 659–71.). Here, the key additional assumption we have to make about the instrument is that the correlation between the instrument and the second-stage error term is the same sign as the correlation between the endogenous/treatment variable and the error term, but is smaller. This produces a partial-identification range of estimates based on where exactly the correlation with the instrument is - somewhere between 0 and the correlation with the endogenous/treatment variable.

Another is Flores and Flores-Lagunes (2013Flores, Carlos A, and Alfonso Flores-Lagunes. 2013. “Partial Identification of Local Average Treatment Effects with an Invalid Instrument.” Journal of Business & Economic Statistics 31 (4): 534–45.). The assumption they rely on here is monotonicity - specifically, that the effect of the instrument has the same sign within groups. From this, they are once again able to produce partial-identification bounds on the estimate by taking advantage of how the instrument is always pushing in the same direction.

These approaches, of course, require us to check whether our instrument satisfies these alternative assumptions theoretically. And hopefully they do! But there might be some more firepower we can bring to the party.

Another large and growing literature on invalid instruments really stresses the “s” in “instruments.” In other words, it focuses on situations where many instruments are available. And yes, there are situations where many instruments are available. Often, this shows up in places where an instrument has many different subcomponents that can each be measured separately and treated as their own instrument. For example, the instrument might be assignment to a particular group, which results in a whole bunch of binary-variable fixed-effect instruments, like the randomly-assigned judges design from the Canonical Designs section earlier in the chapter. Or take Mendelian randomization, which uses genes as instruments - each part of the gene (of which there are many) can be its own instrument!

If you think back to the discussion of regularized regression, LASSO in particular, in Chapter 13, it may not come as much of a surprise that it gets applied here too. LASSO is a method that modifies OLS to tell it to minimize a sum of the absolute values of the coefficients, in addition to the sum of squared residuals, in effect encouraging it to toss variables out of the model entirely.

LASSO can be used as it normally is, here, selecting the most important predictors among both control variables and instruments (Chernozhukov, Hansen, and Spindler 2015Chernozhukov, Victor, Christian Hansen, and Martin Spindler. 2015. “Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments.” American Economic Review 105 (5): 486–90.). However, LASSO can also be used to help spot invalid instruments. By including all of the instruments in the second stage and estimating the model with LASSO, you would expect that the valid instruments would have no predictive power once the treatment is controlled for, and so would be chucked out of the second stage (and back into the first stage where they belong). Windmeijer et al. (2019Windmeijer, Frank, Helmut Farbmacher, Neil Davies, and George Davey Smith. 2019. “On the Use of the LASSO for Instrumental Variables Estimation with Some Invalid Instruments.” Journal of the American Statistical Association 114 (527): 1339–50.) refine this approach so that it can be expected to work as long as fewer than half of the instruments are invalid.
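As a hedged sketch of the instrument-selection use (not the invalid-instrument detection), one route is the hdm package, written by the authors of the Chernozhukov, Hansen, and Spindler paper; all variable names here are hypothetical:

```r
# LASSO-based selection among many controls and instruments with hdm
# (hypothetical objects: controls, treatment, outcome, instruments)
library(hdm)
m <- rlassoIV(x = controls, d = treatment, y = outcome, z = instruments,
              select.X = TRUE, select.Z = TRUE)   # let LASSO pick among both
summary(m)
```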

The fact that we have lots of instruments (many of them valid, we think) might also just make our standard methods work all on their own. Kolesár et al. (2015Kolesár, Michal, Raj Chetty, John Friedman, Edward Glaeser, and Guido W Imbens. 2015. “Identification and Inference with Many Invalid Instruments.” Journal of Business & Economic Statistics 33 (4): 474–84.) show that in a many-instruments setting, 2SLS just plain works even with some invalid instruments, under some conditions.583 Although maximum likelihood doesn’t. Since the 2SLS first stage sorta squashes together the effects of all the instruments, the valid instruments can wash out the invalid instruments. This doesn’t happen all the time, but they lay out the conditions under which it does, which rely on the relationship between the effect of the valid instruments and the invalid ones. Their result also doesn’t come from regular-ol’-2SLS; they use a version that comes with a bias correction.
