Essays on panel data and sample selection methods
Abstract
The availability of panel data allows researchers to control for unobserved heterogeneity in economic models, but raises important computational and statistical challenges. For instance, fixed effects estimators suffer from the incidental parameter problem and lead to high-dimensional estimation problems. In this dissertation, I aim to address both theoretical and practical issues in the estimation of panel data models.
Sample selection is one of the most common forms of endogeneity in empirical economics. It arises when the main dependent variable is selected into the sample through a nonrandom process. The classical solution to account for sample selection is the Heckman selection model (HSM). In this dissertation, I extend the HSM in two dimensions: (1) I relax the homogeneity restrictions that the HSM imposes; and (2) I develop a panel data version of the model that accounts for unobserved heterogeneity.
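For reference, the classical HSM can be written in its textbook form as below. The notation (y*, d, x, z, β, γ, ρ, σ) is chosen here for illustration and is not taken from the dissertation itself.

```latex
\begin{aligned}
y_i^{*} &= x_i'\beta + \varepsilon_i, \\
d_i &= 1\{ z_i'\gamma + \nu_i > 0 \}, \\
y_i &= y_i^{*} \quad \text{observed only if } d_i = 1, \\
(\varepsilon_i, \nu_i) \mid x_i, z_i &\sim N\!\left( 0,\;
  \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right), \\
E[\, y_i \mid x_i, z_i, d_i = 1 \,] &= x_i'\beta + \rho\sigma\,\lambda(z_i'\gamma),
\qquad \lambda(u) = \frac{\phi(u)}{\Phi(u)} .
\end{aligned}
```

In this form, β, ρ and σ are single constants shared by all observations, which is one way to read the homogeneity restrictions mentioned above.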
In Chapter 1, I develop a distribution regression model with sample selection for panel and network data. The model is a semiparametric generalization of the HSM that accommodates much richer patterns of heterogeneity in the selection process, covariates and unobserved effects. I provide a computationally attractive two-step fixed-effect estimation procedure, a bias correction method and a multiplier bootstrap algorithm to conduct uniform inference on the function-valued model parameters. I apply this model to the gravity equation for the international trade network, accounting for possibly endogenous zero-trade decisions and unobserved country heterogeneity.
Chapter 2 focuses on the distribution regression model with sample selection for cross-sectional data. In this chapter, I study the identification of the model and apply it to wage decompositions in the UK, accounting for possibly endogenous selection into employment. Here I decompose the difference between the male and female wage distributions into four effects: composition, wage structure, selection structure and selection sorting.
In Chapter 3, I propose a novel estimation algorithm for panel data models with multiple high-dimensional fixed effects and missing data. The algorithm absorbs the fixed effects iteratively until they are eventually eliminated. Applying this algorithm to a large-scale US employer-based health insurance dataset, I conclude that narrow network plans reduce health care utilization.
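The idea of absorbing fixed effects iteratively can be illustrated with a minimal sketch. This is the generic alternating-demeaning approach for a linear model, not the algorithm proposed in Chapter 3. The function names `absorb_fixed_effects` and `fe_ols` are hypothetical, and missing data is handled here only in the sense that the panel may be unbalanced (rows exist only for observed cells).

```python
import numpy as np

def absorb_fixed_effects(v, groups, tol=1e-10, max_iter=10_000):
    """Sweep group means out of v, one fixed-effect dimension at a time,
    repeating until a full pass no longer changes the values.

    v      : 1-D array of observations (only observed rows are included)
    groups : list of 1-D integer arrays, one per fixed-effect dimension,
             giving each row's group id (ids must be non-negative ints)
    """
    v = np.asarray(v, dtype=float).copy()
    for _ in range(max_iter):
        prev = v.copy()
        for g in groups:
            sums = np.bincount(g, weights=v)
            counts = np.bincount(g)
            # Guard against unused ids so that empty groups do not divide by zero.
            means = sums / np.maximum(counts, 1)
            v -= means[g]
        if np.max(np.abs(v - prev)) < tol:
            break
    return v

def fe_ols(y, X, groups):
    """OLS slope estimates after absorbing all fixed effects from y and
    from each column of X (Frisch-Waugh-Lovell logic)."""
    X = np.asarray(X, dtype=float)
    y_t = absorb_fixed_effects(y, groups)
    X_t = np.column_stack(
        [absorb_fixed_effects(X[:, j], groups) for j in range(X.shape[1])]
    )
    beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return beta
```

Calling `fe_ols(y, X, [unit_id, time_id])` returns slope estimates without ever forming the full dummy-variable matrix, which is the main appeal of this family of methods when the fixed effects are high-dimensional.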