The Effect: An Introduction to Research Design and Causality

Nick Huntington-Klein

Chapter 12 - Opening the Toolbox


12.1 Concept and Execution

So far, this book has covered the concepts of data generating processes and causality. We’ve discussed how to isolate our paths of interest, and how to identify the paths we want, either by shutting down the back door paths we don’t want or by isolating the paths we want directly.

But that’s all conceptual. How do we actually do those things?

While the first half of this book covered concepts and intuition, the second half covers execution. We’ll be delving into the toolbox of methods that are commonly used by researchers.

Many of these methods, especially those after the “Regression,” “Matching,” and “Simulation” chapters, are based around the idea of simplifying a causal diagram. In the real world, causal diagrams get so complex and intricate that it can be very difficult to measure and adjust for all the variables we need, as I discussed in Chapter 11.

But there are certain kinds of what we might call “template” causal diagrams that can be solved easily.[186] We can ask ourselves whether our context and research question of interest fit one of those templates. If they do, the associated method gives us a shortcut to identification that may be more plausible to people reading your work than trying to convince them you’ve really thought of and closed every back door.[187]

[186] I attribute this concept of “template” diagrams to researcher Jason Abaluck, who has successfully used it to deeply annoy causal-diagram purists.

[187] Depending on who you talk to, these methods might be called “reduced form” or “quasi-experimental.”

So how do we use these methods?

As always, first we want to model our data generating process and draw a causal diagram! These methods are no replacement for understanding our data generating process, and indeed, knowing which method to use depends on it.

Second, we ask ourselves: does our diagram look the way it needs to look to use one of these methods? For example, as you’ll read in Chapter 19, to use the instrumental variables method there must be an “instrumental” variable that causes our treatment, and every path from that instrumental variable to the outcome must go through the treatment.
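To make that condition concrete, here is a minimal Python sketch that checks it on a toy diagram. The variable names (Z, D, Y, U), the arrows, and the helper functions are all hypothetical, made up for illustration. The check is deliberately literal: it ignores arrow direction when listing paths (as back door paths do), requires a direct arrow from the instrument to the treatment, and asks only whether the treatment sits on every path from the instrument to the outcome. It does not ask whether those paths are open or closed, so treat it as a simplification rather than a full identification check.

```python
# A toy causal diagram: each key lists the variables it has arrows pointing to.
# Hypothetical names: Z = instrument, D = treatment, Y = outcome,
# U = an unobserved confounder of D and Y.
DAG = {
    "Z": ["D"],
    "D": ["Y"],
    "U": ["D", "Y"],
    "Y": [],
}


def neighbors(dag):
    """Connections in the diagram, ignoring which way each arrow points."""
    nbrs = {}
    for parent, children in dag.items():
        nbrs.setdefault(parent, set())
        for child in children:
            nbrs[parent].add(child)
            nbrs.setdefault(child, set()).add(parent)
    return nbrs


def all_paths(dag, start, end):
    """Every path from start to end that never visits a variable twice."""
    nbrs = neighbors(dag)
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            found.append(path)
            continue
        for nxt in sorted(nbrs[path[-1]]):
            if nxt not in path:
                stack.append(path + [nxt])
    return found


def looks_like_iv(dag, instrument, treatment, outcome):
    """Literal check of the two conditions described in the text:
    1. the instrument causes the treatment (simplified here to a direct arrow), and
    2. every path from the instrument to the outcome goes through the treatment."""
    causes_treatment = treatment in dag.get(instrument, [])
    paths = all_paths(dag, instrument, outcome)
    return causes_treatment and all(treatment in p for p in paths)


print(all_paths(DAG, "Z", "Y"))
print(looks_like_iv(DAG, "Z", "D", "Y"))
```

Adding an arrow from Z straight to Y, or from U to Z, creates a path from Z to Y that skips D, and the check fails.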

If it does, we can use the method. We’ve solved our research design problem. From that point, we can start concerning ourselves with statistical issues rather than design issues.

12.2 The Toolbox Chapters

The Toolbox chapters from Chapter 16 through Chapter 20 focus on “template” research designs, in which the same sort of causal diagram, and thus the same design, applies in lots of different settings. These chapters share the same structure, with three sub-chapters:

  1. How Does It Work?: A conceptual overview of how the method identifies causal effects, a look at the kinds of diagrams that each of the methods works with, and a demonstration of how the method manipulates data to give you what you need.
  2. How Is It Performed?: This sub-chapter gets a little more into the weeds, showing how the method is executed, usually using regression (which we’ll talk more about in Chapter 13).
  3. How the Pros Do It: If you want to actually use a given method in a real-world research project, the basic version is often not enough. This sub-chapter will discuss some of the additional considerations, adjustments, or methods that actual researchers often use when implementing these methods. These chapters can’t possibly cover everything that researchers actually do. So they are designed not so much to show you everything a researcher knows, but rather to make you aware of the kinds of things actual researchers are thinking about, and why.

Because this book focuses more on research design than on econometrics proper, there is little in the way of statistical proofs. If you are interested in these, I recommend the excellent textbooks by Jeffrey Wooldridge, such as Introductory Econometrics: A Modern Approach (Nelson Education, 2016). Or, if you want the really advanced stuff, see William Greene’s Econometric Analysis (Pearson Education India, 2003).

It’s also an undeniable fact that the How the Pros Do It sections do not tell you everything you need to actually do it like the pros do it. This is because the way the pros actually do it is to read a voluminous and ever-changing literature on the newest approaches to these methods, or at least to read a bunch of other studies using the same general method and then largely follow their lead.[188] Trying to keep up in textbook form would be fruitless, and would require nearly a whole book on each method.

[188] There are, of course, some pros who never move beyond what they learned in their textbooks. Depending on what they’re doing, this is sometimes fine. Some research questions can be answered handily with tools that have been around long enough to make it into textbooks. Other times, well… there’s plenty of work out there by pros that could be better.

Rather, the How the Pros Do It sections focus on highlighting some of the most important caveats and extensions, and on giving you what you need to go learn about the state of the art on your own. If you are hoping for additional up-to-date applications of these methods, or information on their history, I recommend Causal Inference: The Mixtape by Scott Cunningham (Yale University Press, 2021).

12.3 Code Examples

All of the chapters in The Toolbox will include code examples in R, Stata, and Python, showing you how methods can be executed in code.

These code chunks may rely on packages that you have to install. Anywhere you see library(X) or X:: in R, or import X or from X import in Python, that’s a package X that will need to be installed if it isn’t already. You can install it with install.packages('X') in R, or with a package manager like pip or conda in Python. In Stata, packages don’t need to be loaded each time they’re used, so I’ll always specify in the code example if there’s a package that might need to be installed. In all three languages, you only have to install each package once, and then you can load it as many times as you want.
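If you’d rather check ahead of time which packages a Python example needs, here is a minimal sketch using the standard library’s importlib.util.find_spec, which returns None when a top-level module can’t be found. The missing_packages helper and the package list are hypothetical. One caveat: the name you import isn’t always the name you install (for example, you import sklearn but install scikit-learn), so the install hint assumes the two match, as they do for causaldata.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python can't currently import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: replace with the packages your code example imports.
needed = ["pandas", "causaldata"]
missing = missing_packages(needed)
if missing:
    # Assumes the pip name matches the import name, as it does for causaldata.
    print("Not installed yet. Try: pip install " + " ".join(missing))
else:
    print("All packages found.")
```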

One additional package you’ll want to install to run these code examples is causaldata, a package of data sets I’ve made for this book (and several other books) that is available for all three languages. To install it, run install.packages('causaldata') in R, ssc install causaldata in Stata, or, if you use pip, pip install causaldata in Python.

The data sets all come with documentation. Using the mortgages data as an example: in R, see the description with help(mortgages, package = 'causaldata') (or just help(mortgages) if you already loaded the package with library(causaldata)). You can also view a description of each variable while you work by running library(vtable) and then vtable(mortgages) after loading the data. In Stata, variable labels can be seen in the Variable Explorer as normal, and you can get a description of the data set with the command causaldata mortgages. In Python, after loading the data with from causaldata import mortgages, you can see the data and variable descriptions with, respectively, print(mortgages.DESCRLONG) and print(mortgages.NOTE).
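Putting the Python steps above together, here is a minimal sketch that prints both descriptions and then loads the data, falling back to a message if causaldata isn’t installed. The describe helper is hypothetical. The load_pandas().data call is an assumption: it follows the statsmodels-style data set layout that causaldata’s Python modules appear to use, so check the package’s own documentation if it doesn’t work for you.

```python
import importlib


def describe(dataset):
    """Return (data description, variable descriptions) for a causaldata
    data set, or None if it can't be imported."""
    try:
        module = importlib.import_module(f"causaldata.{dataset}")
    except ImportError:  # also catches ModuleNotFoundError
        return None
    return module.DESCRLONG, module.NOTE


info = describe("mortgages")
if info is None:
    print("causaldata not found. Try: pip install causaldata")
else:
    data_description, variable_notes = info
    print(data_description)  # same as print(mortgages.DESCRLONG)
    print(variable_notes)    # same as print(mortgages.NOTE)
    # Assumption: a statsmodels-style loader that returns a pandas DataFrame.
    df = importlib.import_module("causaldata.mortgages").load_pandas().data
    print(df.head())
```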

These code examples have been run using R 4.1, Stata 15.1, and Python 3.8. If your version of the language is at that level or newer, you should be good to go! If it’s older, you may want to upgrade. I haven’t tested the code examples on every older version, but you’re probably still fine as long as your R is version 3+ and your Stata is 14+ (except for the one example that relies on Stata 16+, which I’ll warn you about). For Python, it’s strongly recommended that you use at least Python 3.0+, as there are a number of major syntax changes from Python 2 to Python 3. For all three languages, you may still get somewhat different results because of updates to downloadable packages that occur after the publication of this book.[189]

[189] For example, the modelsummary R package updated just before publication of this book and changed the significance star levels it displays. I had to change all the example code so it would keep producing the results I already had in the book! Who knows what package updates will occur after publication. If you spot such a change, please feel free to contact me.
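For Python, you can check where you stand with the standard sys module. This minimal sketch compares your interpreter against the 3.8 the examples were run on and the 3.0 floor recommended above; the version_status helper and its wording are hypothetical.

```python
import sys

TESTED = (3, 8)              # version the book's Python examples were run on
RECOMMENDED_MINIMUM = (3, 0)  # Python 2 syntax differs in major ways


def version_status(version_info=sys.version_info):
    """Classify a Python version relative to the tested and minimum versions."""
    current = tuple(version_info[:2])
    if current >= TESTED:
        return "tested or newer"
    if current >= RECOMMENDED_MINIMUM:
        return "older than tested; consider upgrading"
    return "Python 2: upgrade strongly recommended"


print(sys.version.split()[0], "->", version_status())
```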

One final note on code, specifically in Stata: Stata doesn’t naturally allow you to split one command across two lines. However, some lines of code are too long to fit on one line of the book and so must be split! This can be done by putting /// at the end of a line, which means “the line isn’t over yet, keep reading on to the next one.” However, for some reason, Stata has decided that this will only work if you are running code using the “Execute” button in the do-file editor. It doesn’t work if you’re just copying and pasting code into the Stata console. So if you see /// at the end of a line, either be sure to run that code using Execute from the do-file editor, or just erase the /// and combine that line with the following line of code (and keep going until you hit a line that doesn’t end in ///).
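If you paste a lot of Stata code into the console, the join-the-lines fix above can be automated. Here is a minimal sketch in Python; the join_continuations helper is hypothetical, not part of Stata or of this book’s code. It merges each line ending in /// with the lines that follow it, and it assumes /// only ever appears at the very end of a line, as in this book’s examples.

```python
def join_continuations(code):
    """Merge Stata lines ending in /// with the lines that follow them,
    so the result can be pasted straight into the Stata console."""
    merged, pending = [], []
    for line in code.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("///"):
            # Drop the marker and keep collecting pieces of this command.
            pending.append(stripped[:-3].strip())
            continue
        pending.append(stripped.strip())
        merged.append(" ".join(piece for piece in pending if piece))
        pending = []
    if pending:  # input ended mid-command; keep what we have
        merged.append(" ".join(piece for piece in pending if piece))
    return "\n".join(merged)


stata = "regress y x1 x2 ///\n    x3, robust\ndisplay 1"
print(join_continuations(stata))
```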

Page built: 2023-02-13 using R version 4.2.2 (2022-10-31 ucrt)