This thesis addresses the problem of linear regression estimation with selectively observed response data when selection is endogenous.

The approach relies critically on the existence of an instrument that is independent of the selection, conditional on potential outcomes and other covariates. A parametric two-step estimation procedure is proposed. In the first step, the probability of selection is estimated with a generalized method of moments (GMM) estimator. The second step uses the estimated probabilities as weights in an inverse probability weighted least squares estimation. Two potential estimators are presented, and expressions for their asymptotic variance-covariance matrices are provided.
As an extension, it is shown how the concept can be used in a multiple-period setup with a pooled weighted least squares estimator. Finite sample properties are illustrated in a Monte Carlo simulation study. An empirical illustration applies the theory to wage regressions, using the Survey of Health, Ageing and Retirement in Europe (SHARE) dataset.
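The two steps can be sketched in Python with NumPy. Everything below is an illustrative assumption rather than the thesis's specification: a logit response model π(Y*, X; γ) = Λ(γ0 + γ1·Y* + γ2·X), instruments h = (1, X, Z), the just-identified moment condition E[(Δ/π − 1)·h] = 0, an invented data-generating process, and the hypothetical function names `gmm_selection` and `ipw_ls`. The second step returns two IPW least-squares variants, one weighting only the responses and one weighting responses and covariates; that these match the thesis's two estimators is likewise an assumption.

```python
import numpy as np


def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))


def gmm_selection(delta, y, x, z, tol=1e-10, max_iter=100):
    """Step 1 (hypothetical sketch): just-identified GMM for the logit response
    probability pi = logistic(g0 + g1*y_star + g2*x) with instruments (1, x, z).
    Moments: mean((delta / pi - 1) * h) = 0, which uses y_star only where delta == 1."""
    n = delta.size
    D = np.column_stack([np.ones(n), y, x])  # regressors of the selection index
    H = np.column_stack([np.ones(n), x, z])  # instruments
    p = delta.mean()
    gamma = np.array([np.log(p / (1.0 - p)), 0.0, 0.0])  # start at the raw response rate

    def moments(g):
        # delta / pi - 1 = delta * (1 + exp(-index)) - 1
        return H.T @ (delta * (1.0 + np.exp(-D @ g)) - 1.0) / n

    g_val = moments(gamma)
    for _ in range(max_iter):
        if np.linalg.norm(g_val) < tol:
            break
        jac = -(H * (delta * np.exp(-D @ gamma))[:, None]).T @ D / n
        step = np.linalg.solve(jac, g_val)
        t = 1.0  # damped Newton: halve the step until the moment norm falls
        while True:
            cand = gamma - t * step
            cand_val = moments(cand)
            if np.linalg.norm(cand_val) < np.linalg.norm(g_val) or t < 1e-8:
                break
            t /= 2.0
        gamma, g_val = cand, cand_val
    return gamma


def ipw_ls(delta, y, x, pi):
    """Step 2 (hypothetical sketch): two inverse probability weighted LS variants."""
    X = np.column_stack([np.ones(delta.size), x])
    w = delta / pi  # zero for non-respondents, so their unobserved y never enters
    beta_resp = np.linalg.solve(X.T @ X, X.T @ (w * y))  # weights the responses only
    Xw = X * w[:, None]
    beta_both = np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weights responses and covariates
    return beta_resp, beta_both


# Simulated example with an assumed data-generating process
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
z = rng.normal(size=n)  # instrument: shifts y_star but not selection given (y_star, x)
y_star = 1.0 + 2.0 * x + z + rng.normal(size=n)  # E[y_star | x] = 1 + 2x
gamma_true = np.array([0.5, 0.5, -0.25])
pi_true = logistic(gamma_true[0] + gamma_true[1] * y_star + gamma_true[2] * x)
delta = (rng.uniform(size=n) < pi_true).astype(float)
y = delta * y_star  # observed response Y = Delta * Y*

gamma_hat = gmm_selection(delta, y, x, z)
# pi_hat is wrong for non-respondents (y = 0 there), but their weight is zero anyway
pi_hat = logistic(gamma_hat[0] + gamma_hat[1] * y + gamma_hat[2] * x)
beta_resp, beta_both = ipw_ls(delta, y, x, pi_hat)
```

Starting the Newton iterations at a constant response probability keeps the first step well conditioned, and step halving guards against overshooting; standard errors that account for the first-step estimation are not computed here.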

All proofs and background for mathematical statements are provided in the extensive Annex.

Reading Sample

1 Introduction

2 Theory and Models

2.1 Literature Review and General Theory

2.2 Application to the Linear Model in a Cross-Sectional Setting

2.2.1 A Tale of Two Estimators

2.2.2 On the Estimation of Standard Errors

2.3 Testability of the Independence Condition

2.4 Application to the Linear Model in a Pooled OLS Setting

2.4.1 Pooled OLS with Two Periods

2.4.2 On the Estimation of Standard Errors with Potential Serial Correlation

3 Simulations

3.1 Cross-Sectional Simulations

3.2 Pooled OLS Simulations

3.3 Sargan-Hansen J-Test Simulations

3.4 Conclusions from the Simulations

4 Data and Statistical Analysis

4.1 Empirical Literature Review

4.2 Data Description, Variables and Manipulations

4.2.1 Data Description

4.2.2 Variables and Manipulations

4.3 Statistical Analysis

4.3.1 Cross-Sectional Analysis in Wave One

4.3.2 Cross-Sectional Analysis in Wave Two

4.3.3 Pooled OLS with Two Waves

5 Conclusions

Objectives and Topics

This master's thesis addresses the problem of linear regression estimation when response data is selectively observed and the selection process is endogenous. The primary research goal is to develop and analyze a parametric two-step estimation procedure that utilizes an instrumental variable approach and inverse probability weighting (IPW) to recover conditional expectations and correct for selection bias. Key thematic areas include:

Methodological development of IPW estimators for cross-sectional and pooled OLS frameworks.
Estimation of asymptotic variance-covariance matrices that account for the influence of the first-step estimation.
Implementation of GMM for parametric modeling of the selection mechanism.
Validation through extensive Monte Carlo simulation studies regarding finite sample properties.
Empirical application investigating the returns to education in wage regressions, using the Survey of Health, Ageing and Retirement in Europe (SHARE) dataset.

Excerpt from the Book

2.1 Literature Review and General Theory

To formalize the problem, assume that the researcher is interested in the relationship between a response variable Y* and a vector of covariates X. More precisely, she wants to investigate the conditional expectation E[Y*|X]. The econometrician observes realizations of the random variables (Y, Δ, X), where Δ is an indicator variable that equals one if Y* is observed and zero otherwise. The observed variable Y is defined as Y = ΔY*.

In such a setting, knowing how the non-response mechanism works is central to identifying the conditional expectation of interest. By the law of total expectation, E[Y*|X] = E[Y*|X, Δ = 1]P(Δ = 1|X) + E[Y*|X, Δ = 0]P(Δ = 0|X). The data identify E[Y*|X, Δ = 1] and the response probabilities, but not E[Y*|X, Δ = 0], because Y* is never observed when Δ = 0.

If the selection mechanism is missing completely at random (MCAR), then Δ ⊥ (Y*, X), meaning that selection is independent of Y* and X jointly. Under MCAR, E[Y*|Δ = 0] = E[Y*|Δ = 1] = E[Y*] and E[Y*|X, Δ = 0] = E[Y*|X, Δ = 1] = E[Y*|X], so the law of total expectation reduces to E[Y*|X] = E[Y*|X, Δ = 1]. Thus, deleting the non-response observations is justified, and the econometrician need not worry much about the missing values.
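The contrast between MCAR and endogenous selection can be made concrete with a small NumPy simulation. The design is an illustrative assumption, not taken from the thesis: Y* = 1 + 2X + ε, a 70% response rate under MCAR, and a logistic response probability in Y* for the endogenous case. Complete-case OLS should recover the coefficients (1, 2) under MCAR, but not when response depends on Y* itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(size=n)  # E[Y* | X] = 1 + 2X


def complete_case_ols(delta):
    """OLS of y_star on (1, x) using only the units with delta == True."""
    X = np.column_stack([np.ones(delta.sum()), x[delta]])
    return np.linalg.lstsq(X, y_star[delta], rcond=None)[0]


# MCAR: the response indicator is drawn independently of (Y*, X)
delta_mcar = rng.uniform(size=n) < 0.7
# Endogenous selection: the response probability rises with Y* itself
delta_endo = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-y_star))

beta_mcar = complete_case_ols(delta_mcar)  # close to (1, 2)
beta_endo = complete_case_ols(delta_endo)  # intercept biased upward, slope attenuated
```

Under endogenous selection, respondents with a given X have systematically higher Y*, and this shift is largest where the response probability is low, which is why both coefficients are distorted.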

Summary of Chapters

1 Introduction: Discusses the prevalence of non-response in economic data and illustrates the potential for significant bias through an anecdote regarding poverty risk estimation.

2 Theory and Models: Establishes the theoretical foundation for handling endogenous selection using instrumental variables and inverse probability weighting, including the derivation of two-step estimators and their asymptotic properties.

3 Simulations: Evaluates the finite sample performance of the developed estimators and the validity of the Sargan-Hansen J-test via Monte Carlo methods.

4 Data and Statistical Analysis: Applies the theoretical framework to estimate the returns to education using the SHARE dataset, comparing results across different estimation specifications.

5 Conclusions: Summarizes the key findings, confirming that the estimator weighting both responses and covariates performs better, while acknowledging limitations regarding the validity of available instruments.

Keywords

Endogenous Selection, Inverse Probability Weighting, GMM, Instrumental Variables, Wage Regressions, Pooled OLS, Sample Selection Bias, Asymptotic Theory, Monte Carlo Simulation, SHARE dataset, Returns to Education, Missing Data, Robust Standard Errors, M-Estimators, Identification

Frequently Asked Questions

What is the core issue this thesis investigates?

The thesis explores the problem of estimating linear regression models when the response variable is not observed for all units and the missingness is endogenously related to the variable of interest, causing bias in standard estimation techniques.

Which central thematic areas does the author explore?

The work focuses on the development of parametric two-step estimators using inverse probability weighting (IPW), large-sample estimation of variance-covariance matrices, and empirical applications in labor economics.

What is the primary goal of the study?

The main objective is to provide a parametric, two-step estimation procedure that corrects for selection bias by using an instrumental variable to model the probability of non-response via the Generalized Method of Moments (GMM).
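The role of the instrument can be sketched in a short derivation. Assume, for illustration, that the response probability is modelled as π(Y*, X; γ) = P(Δ = 1 | Y*, X) > 0 with true value γ0, and that the instrument Z satisfies Δ ⊥ Z | (Y*, X). The law of iterated expectations then yields moment conditions for any function h(X, Z) of always-observed variables; the thesis's exact moment functions may differ.

```latex
\begin{aligned}
\mathbb{E}\!\left[\frac{\Delta}{\pi(Y^*,X;\gamma_0)} \,\middle|\, X, Z\right]
  &= \mathbb{E}\!\left[\frac{\mathbb{E}[\Delta \mid Y^*, X, Z]}{\pi(Y^*,X;\gamma_0)} \,\middle|\, X, Z\right]
   = \mathbb{E}\!\left[\frac{\pi(Y^*,X;\gamma_0)}{\pi(Y^*,X;\gamma_0)} \,\middle|\, X, Z\right] = 1, \\
\Longrightarrow\quad
\mathbb{E}\!\left[\left(\frac{\Delta}{\pi(Y^*,X;\gamma_0)} - 1\right) h(X, Z)\right] &= 0 .
\end{aligned}
```

Because Δ/π(Y*, X; γ) is zero whenever Δ = 0, the sample analogue of these moments only requires Y* for respondents, which is what makes GMM estimation of γ feasible despite the missing responses.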

Which statistical methodology is primarily utilized?

The author employs a two-step approach: first, estimating response probabilities using GMM with instrumental variables, and second, performing inverse probability weighted least squares estimation on the outcome equation.

What is covered in the main body of the work?

The main body covers the formal theory and identification assumptions, the derivation of estimators for standard errors in two-step procedures, extensions to pooled OLS settings for multi-period data, and comprehensive Monte Carlo simulation studies.

Which keywords best describe the research?

Key concepts include Endogenous Selection, Inverse Probability Weighting (IPW), GMM, Instrumental Variables, and Sample Selection Bias.

Why is the Sargan-Hansen J-test criticized in this thesis?

The thesis finds through simulations that the J-test is unreliable in the studied context, as it tends to over-reject the null hypothesis (that the instrument is valid) even when the assumption holds, making it a poor diagnostic tool for this specific problem.
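For reference, the statistic behind the test is J = n·ḡ(θ̂)′Ŝ⁻¹ḡ(θ̂), where ḡ is the sample mean of the m moment functions at the two-step efficient GMM estimate and Ŝ estimates their covariance; under the null of valid moments it is asymptotically χ² with m − k degrees of freedom. The NumPy sketch below assumes a linear IV model rather than the thesis's selection moments, and its data-generating process and the function name `gmm_iv_j` are hypothetical. It only shows how J reacts to a valid versus an invalid instrument, not the over-rejection reported in the thesis.

```python
import numpy as np


def gmm_iv_j(y, X, Z):
    """Two-step efficient GMM for y = X b + u with E[Z u] = 0; returns (b, J)."""
    n = y.size
    ZX, Zy = Z.T @ X / n, Z.T @ y / n

    def gmm(W):
        return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

    b1 = gmm(np.linalg.inv(Z.T @ Z / n))  # step 1: 2SLS weighting
    u1 = y - X @ b1
    S = (Z * (u1 ** 2)[:, None]).T @ Z / n  # heteroskedasticity-robust moment covariance
    W = np.linalg.inv(S)
    b2 = gmm(W)  # step 2: efficient weighting
    g_bar = Z.T @ (y - X @ b2) / n
    return b2, n * g_bar @ W @ g_bar  # Hansen J, approx. chi2(m - k) under the null


rng = np.random.default_rng(2)
n = 20_000
z1, z2, v, e = rng.normal(size=(4, n))
x = z1 + z2 + v
u = 0.5 * v + e  # x is endogenous through v
Z = np.column_stack([np.ones(n), z1, z2])  # m = 3 moments
X = np.column_stack([np.ones(n), x])  # k = 2 parameters, so J has 1 degree of freedom

_, J_valid = gmm_iv_j(1.0 + 2.0 * x + u, X, Z)
_, J_invalid = gmm_iv_j(1.0 + 2.0 * x + 0.5 * z2 + u, X, Z)  # z2 enters y directly
```

With a valid instrument set, J_valid behaves like a draw from χ²(1); when z2 violates the exclusion restriction, J_invalid is far into the rejection region.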

What does the empirical application suggest about educational returns?

The analysis indicates that standard available-case regressions likely overestimate the returns to educational attainment, and applying the proposed IPW correction typically leads to a downward adjustment of these returns.

End of the reading sample (108 pages in total)

Details

Title: Estimation in Case of Endogenous Selection with Application to Wage Regression
University: Humboldt-Universität zu Berlin (Institute for Statistics and Econometrics)
Grade: 1.0
Author: Michael Lebacher
Year of publication: 2016
Pages: 108
Catalog number: V369392
ISBN (eBook): 9783668480162
ISBN (Book): 9783668480179
Language: English
Keywords: Inverse Probability, Sample Selection, GMM, Endogenous Selection, Wage Regression, IV Estimation, OLS, Two-Stage-Estimation
Product safety: GRIN Publishing GmbH

Cite this work: Michael Lebacher (Author), 2016, Estimation in Case of Endogenous Selection with Application to Wage Regression, München, GRIN Verlag, https://www.grin.com/document/369392
