Missing data are a ubiquitous problem in statistical analyses and have become an important research field in applied statistics. A highly useful technique for handling missing values in many settings is multiple imputation, which was first proposed by Rubin (1977, 1978) and extended in Rubin (1987). Owing to the steady improvement in computing power over the last 10 years, multiple imputation has become a well-known and widely used tool in statistical analyses.
However, obtaining significance levels from multiply-imputed data in general remains a problem, because the standard multiple-imputation inference requires normally distributed or t-distributed complete-data estimators. Today there are essentially three methods that extend the suggestions given in Rubin (1987). First, Li, Raghunathan, and Rubin (1991) proposed a procedure in which significance levels are obtained by computing a modified Wald-test statistic that is then referred to an F-distribution. This procedure is essentially calibrated, and the loss of power due to a finite number of imputations is quite modest in cases likely to occur in practice. However, it requires access to the completed-data estimates and their variance-covariance matrices, which may not be available in practice with standard software. Second, Meng and Rubin (1992) proposed a complete-data two-stage likelihood-ratio-test-based procedure that is equivalent to the previous one in large samples. This procedure requires access to the code that calculates the log-likelihood-ratio statistics, and common statistical software does not provide such access in its standard analysis routines. Third, Li, Meng, Raghunathan, and Rubin (1991) developed an improved version of a method in Rubin (1987) that requires only the chi-square statistics from a usual complete-data Wald-test. This method is only approximately calibrated and suffers a substantial loss of power compared with the previous two.
To sum up, several procedures exist for generating significance levels in general from multiply-imputed data, but none of them is satisfactorily applicable, for the reasons given above. Since many statistical analyses are based on hypothesis tests, especially on the Wald-test in regression analyses, it is very important to find a method that retains the advantages and overcomes the disadvantages of the existing procedures. Developing such a method was the aim of the present thesis.
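To illustrate the third approach, the following Python sketch pools m complete-data chi-square statistics into a single statistic that is referred to an F-distribution, using the formulas as they are commonly stated for Li, Meng, Raghunathan, and Rubin (1991). The function name `pool_chi_square` and its arguments are illustrative assumptions and do not come from the thesis; the sketch returns the statistic and its degrees of freedom only.

```python
import math
from statistics import mean, variance


def pool_chi_square(d, k):
    """Pool m complete-data chi-square statistics, each with k degrees of freedom.

    Returns (d2, k, w): the pooled statistic and the numerator and
    denominator degrees of freedom of its F reference distribution.
    Hypothetical helper, written for illustration only.
    """
    m = len(d)
    if m < 2:
        raise ValueError("at least two imputations are required")
    d_bar = mean(d)
    # r2 estimates the relative increase in variance due to missingness
    # from the sample variance of the square roots of the statistics.
    r2 = (1 + 1 / m) * variance([math.sqrt(x) for x in d])
    d2 = (d_bar / k - (m + 1) / (m - 1) * r2) / (1 + r2)
    # Denominator degrees of freedom; infinite if all statistics coincide.
    w = math.inf if r2 == 0 else k ** (-3 / m) * (m - 1) * (1 + 1 / r2) ** 2
    return d2, k, w
```

The significance level is the upper-tail probability of an F-distribution with k and w degrees of freedom evaluated at the pooled statistic, for example `scipy.stats.f.sf(d2, k, w)` if SciPy is available. The pooled statistic can be negative when the individual statistics vary strongly, in which case the significance level is 1.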
Table of Contents
- 1 Introduction
- 2 Multiple imputation
- 3 Significance levels from multiply-imputed data
- 3.1 Significance levels from multiply-imputed data using moment-based statistics and an improved F-reference-distribution
- 3.2 Significance levels from multiply-imputed data using parameter estimates and likelihood-ratio statistics
- 3.3 Significance levels from repeated p-values with multiply-imputed data
- 4 z-transformation procedure for combining repeated p-values
- 4.1 The new z-transformation procedure
- 4.2 z-test
- 4.3 t-test
- 4.4 Wald-test
- 5 How to handle the multi-dimensional test problem
- 5.1 Idea
- 5.2 Simulation study
- 5.3 Further problems
- 6 Small-sample significance levels from repeated p-values using a componentwise-moment-based method
- 6.1 Small-sample degrees of freedom with multiple imputation
- 6.2 Significance levels from multiply imputed data with small sample size based on Sa ·
- 7 Comparing the four methods for generating significance levels from multiply-imputed data
- 7.1 Simulation study
- 7.2 Results
- 7.2.1 ANOVA
- 7.2.2 Combination of method and appropriate degrees of freedom
- 7.2.3 Rejection rates
- 7.2.4 Conclusions
- 8 Summary and practical advice
- 9 Future tasks and outlook
Objectives and Key Themes
The main objective of this dissertation is to develop a method for generating significance levels from multiply-imputed data that overcomes the limitations of existing procedures, in particular their restricted applicability and their loss of power. The method is intended to provide a reliable and efficient way to conduct hypothesis tests when data are missing, particularly in regression analysis.
- Multiple imputation techniques for handling missing data.
- Generating significance levels from multiply-imputed data.
- Comparison and evaluation of different methods for generating significance levels.
- Development of a new method for generating significance levels that addresses the limitations of existing procedures.
- Simulation studies to assess the performance of the proposed method and compare it to existing methods.
Chapter Summaries
Chapter 1 introduces the concept of missing data and its significance in statistical analysis. It highlights the importance of multiple imputation as a technique for handling missing data and outlines the existing methods for generating significance levels from multiply-imputed data. The chapter also identifies the limitations of these existing methods.
Chapter 2 provides a comprehensive overview of multiple imputation, describing its principles, benefits, and limitations. It discusses the theoretical foundation of multiple imputation and explores various approaches for imputing missing data. The chapter also reviews existing software packages and resources available for implementing multiple imputation.
Chapter 3 examines the different methods for generating significance levels from multiply-imputed data. It discusses the three main approaches: moment-based statistics with an improved F-reference-distribution, parameter estimates with likelihood-ratio statistics, and repeated p-values with multiply-imputed data. The chapter analyzes each method and highlights its strengths and weaknesses.
Chapter 4 focuses on a new z-transformation procedure for combining repeated p-values. It introduces the theoretical framework for this procedure and discusses its application to various statistical tests, including the z-test, the t-test, and the Wald-test. The chapter also explains the methodology in detail and sets out its advantages over existing methods.
Chapter 5 examines how to handle multi-dimensional test problems. It presents a simulation study that assesses the performance of the proposed method in such settings and discusses further challenges and potential solutions.
Chapter 6 presents a componentwise-moment-based method for generating significance levels from repeated p-values in small-sample scenarios. The chapter focuses on deriving appropriate degrees of freedom for such situations and discusses the use of Sa· for obtaining significance levels from multiply imputed data with small sample size.
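A widely used small-sample adjustment of this kind is the degrees-of-freedom formula of Barnard and Rubin (1999); whether Chapter 6 uses exactly this formula is an assumption here. The sketch below pools scalar estimates with Rubin's combining rules and returns the adjusted degrees of freedom. The function name `pool_scalar` and the parameter `nu_com` (the complete-data degrees of freedom) are illustrative, and the within-imputation variances are assumed to be strictly positive.

```python
import math
from statistics import mean, variance


def pool_scalar(estimates, variances, nu_com=math.inf):
    """Pool m scalar estimates and their variances (hypothetical helper).

    Returns (q_bar, t, nu): pooled estimate, total variance, and degrees
    of freedom of the t reference distribution.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("at least two imputations are required")
    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # within-imputation variance
    b = variance(estimates)   # between-imputation variance
    t = u_bar + (1 + 1 / m) * b
    gamma = (1 + 1 / m) * b / t           # share of variance due to missingness
    inv_nu_old = gamma ** 2 / (m - 1)     # reciprocal of the large-sample df
    if math.isinf(nu_com):
        nu = math.inf if inv_nu_old == 0 else 1 / inv_nu_old
    else:
        # Observed-data df, shrunk from the complete-data df.
        nu_obs = (nu_com + 1) / (nu_com + 3) * nu_com * (1 - gamma)
        nu = 1 / (inv_nu_old + 1 / nu_obs)
    return q_bar, t, nu
```

With a finite `nu_com`, the adjusted degrees of freedom never exceed the complete-data degrees of freedom, which is the property that motivates the adjustment for small samples.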
Chapter 7 provides a comprehensive comparison of the four methods for generating significance levels from multiply-imputed data. It presents results from a simulation study conducted to evaluate the performance of each method, including their rejection rates and overall effectiveness. The chapter also analyzes the interplay between different methods and their respective degrees of freedom. Finally, it draws conclusions based on the simulation results and identifies the most suitable method for different scenarios.
Keywords
Missing data, multiple imputation, significance levels, hypothesis testing, Wald-test, repeated p-values, z-transformation procedure, simulation studies, small-sample sizes, regression analysis, statistical software.
- Cite this work
- Christine Aust (Author), 2010, New methods for generating significance levels from multiply-imputed data, München, GRIN Verlag, https://www.grin.com/document/418195