
## Contents

**1 Introduction**

**2 Multiple imputation**

**3 Significance levels from multiply-imputed data**

3.1 Significance levels from multiply-imputed data using moment-based statistics and an improved *F*-reference-distribution

3.2 Significance levels from multiply-imputed data using parameter estimates and likelihood-ratio statistics

3.3 Significance levels from repeated p-values with multiply-imputed data

**4 *z*-transformation procedure for combining repeated p-values**

4.1 The new *z*-transformation procedure

4.2 *z*-test

4.3 *t*-test

4.4 Wald-test

**5 How to handle the multi-dimensional test problem**

5.1 Idea

5.2 Simulation study

5.3 Further problems

**6 Small-sample significance levels from repeated p-values using a componentwise-moment-based method**

6.1 Small-sample degrees of freedom with multiple imputation

6.2 Significance levels from multiply imputed data with small sample size based on $\tilde{S}_d$

**7 Comparing the four methods for generating significance levels from multiply-imputed data**

7.1 Simulation study

7.2 Results

7.2.1 ANOVA

7.2.2 Combination of method and appropriate degrees of freedom

7.2.3 Rejection rates

7.2.4 Conclusions

**8 Summary and practical advices**

**9 Future tasks and outlook**

**List of figures**

**List of tables**

**A Derivation of (3.1)-(3.5) from Section 3.1**

**B Derivation of the degrees of freedom *δ* and *w* in the moment-based procedure described in Section 3.1**

**References**

## Introduction

Missing data are a ubiquitous problem in statistical analyses that has
become an important research field in applied statistics because missing
values are frequently encountered in practice, especially in survey data.
Many statistical methods have been developed to deal with this issue.
Substantial advances in computing power, as well as in theory, over the last
30 years enable applied researchers to use these methods.
A highly useful technique to handle missing values in many settings is
multiple imputation, which was first proposed by Rubin (1977, 1978) and
extended in Rubin (1987). The key idea of multiple imputation is to replace
the missing values with more than one, say *m*, sets of plausible
values, thereby generating *m* completed data sets. Each of these
completed data sets is then analyzed using standard complete-data methods.
These repeated analyses are combined to create one imputation inference
that correctly takes into account the uncertainty due to missing data.
Multiple imputation retains the major advantages and simultaneously
overcomes the major disadvantages inherent in single imputation techniques.
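The combining step for a scalar quantity can be sketched in a few lines. This is a minimal illustration of the standard combining rules usually attributed to Rubin (pooled estimate, within- and between-imputation variance, total variance, and the large-sample degrees of freedom of the *t* reference distribution); the function name `pool_scalar` and the input numbers are hypothetical and not taken from the thesis.

```python
from statistics import mean, variance


def pool_scalar(estimates, variances):
    """Combine m completed-data estimates of one scalar quantity.

    estimates: point estimates from the m completed data sets
    variances: their estimated (complete-data) variances
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b   # total variance

    if b == 0:
        df = float("inf")         # no between-imputation variation
    else:
        r = (1 + 1 / m) * b / u_bar   # relative increase in variance
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, t, df


# Hypothetical results from m = 5 completed data sets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
q_bar, t, df = pool_scalar(estimates, variances)
```

The pooled estimate is then compared with its null value using a *t* reference distribution with `df` degrees of freedom, which is exactly the step that does not carry over directly to multi-dimensional test problems.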

Due to the ongoing improvement in computer power in the last 10 years,
multiple imputation has become a well-known and often-used tool in
statistical analyses. Multiple imputation routines are now implemented in
many statistical software packages. However, obtaining significance levels
from multiply-imputed data in general remains a problem,
because Rubin’s combining rules (1978) for the completed-data estimates
require normally distributed or *t*-distributed complete-data
estimators. Some procedures were offered in Rubin (1987), but they had
limitations. Today there are basically three methods that extend the
suggestions given in Rubin (1987). First, Li, Raghunathan, and Rubin (1991)
proposed a procedure in which significance levels are created by computing a
modified Wald-test statistic that is then referred to an *F*-distribution.
This procedure is essentially calibrated, and the loss of
power due to a finite number of imputations is quite modest in cases likely
to occur in practice. But this procedure requires access to the
completed-data estimates and their variance-covariance matrices. The full
variance-covariance matrix may not be available in practice with standard
software, especially when the dimensionality of the estimand is high. This
can easily occur, e.g., with partially classified multidimensional
contingency tables. Second, Meng and Rubin (1992) proposed a complete-data
two-stage likelihood-ratio-test-based procedure, which was motivated by the
well-known relationship between the Wald-test statistic and the
likelihood-ratio-test statistic. In large samples this procedure is
equivalent to the previous one and only requires the complete-data
log-likelihood-ratio statistic for each multiply-imputed data set. However,
common statistical software does not provide access to the code for the
calculation of the log-likelihood-ratio statistics in its standard
analysis routines. Third, Li, Meng, Raghunathan, and Rubin (1991) developed
an improved version of a method in Rubin (1987) that only requires the
$\chi^2_k$-statistics from a usual complete-data Wald-test.
These statistics are provided by every statistical software package. Unfortunately,
this method is only approximately calibrated and has a substantial loss of
power compared to the previous two.
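To make the third approach more concrete, the following sketch combines *m* complete-data Wald $\chi^2_k$ statistics into one statistic and its denominator degrees of freedom. It uses the form in which this rule is commonly stated in the multiple-imputation literature; the notation in Section 3.3 of the thesis may differ, and the function name `combine_chi2` and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def combine_chi2(stats, k):
    """Combine m chi-square statistics, each with k degrees of freedom.

    Returns (d, nu): the combined statistic and the denominator degrees
    of freedom of its approximate F(k, nu) reference distribution.
    """
    m = len(stats)
    if m < 2:
        raise ValueError("need at least two imputations")

    d_bar = mean(stats)
    # Relative increase in variance, estimated from the spread of the
    # square roots of the repeated statistics.
    r = (1 + 1 / m) * variance([sqrt(d) for d in stats])

    d = (d_bar / k - (m + 1) / (m - 1) * r) / (1 + r)
    if r == 0:
        nu = float("inf")   # identical statistics in every imputation
    else:
        nu = k ** (-3 / m) * (m - 1) * (1 + 1 / r) ** 2
    return d, nu


# Hypothetical Wald statistics from m = 3 completed data sets, k = 2.
d, nu = combine_chi2([4.0, 9.0, 16.0], k=2)
```

The significance level is the upper tail probability of `d` under an $F_{k,\nu}$ distribution, which could be obtained, for instance, with `scipy.stats.f.sf(d, k, nu)`. Only the *m* statistics and *k* are needed, which is why this method is so easy to apply with standard software.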

To sum up, there exist several relatively "easy" to use procedures to generate significance levels in general from multiply-imputed data, but none of them is satisfactorily applicable, for the reasons mentioned above. Since many statistical analyses are based on hypothesis tests, especially on the Wald-test in regression analyses, it is very important to find a method that retains the advantages and overcomes the disadvantages of the existing procedures, just as multiple imputation does with the existing techniques to handle missing data. Developing such a method was the aim of the present thesis, which results from a close co-operation with my advisor Susanne Raessler and especially with my second advisor, the "father" of multiple imputation, Donald B. Rubin.

In Chapter 2 we briefly introduce the multiple imputation theory and give some
important notations and definitions. In Chapter 3 we describe in detail the
three existing procedures mentioned above that create significance levels
from multiply-imputed data. In Chapter 4 we present a new procedure based
on a *z*-transformation. First we examine this new *z*-transformation-based
procedure for simple hypothesis tests like the *z*-test in Section 4.2 and the *t*-test in Section 4.3,
before we consider the Wald-test in Section 4.4. Despite the success of
this new *z*-transformation procedure in several practical settings,
problems arise when two-sided tests are performed. Therefore we develop and
discuss a possible solution in the first section of Chapter 5. Based on a
comprehensive simulation study described in Section 5.2, in Section 5.3 we
discover an interesting general statistical problem: using a $\chi^2_k$-distribution rather than an $F_{k,n}$-distribution
can lead to a non-negligible error for small sample sizes *n*,
especially with larger *k*. This problem seems to have gone unnoticed until
now. In addition, we show the influence of the sample size on generating
accurate significance levels from multiply-imputed data. Due to these
problems described in Chapter 5, in Chapter 6 we present an adjusted
procedure, the componentwise-moment-based method, to easily calculate
correct significance levels from multiply-imputed data under some
assumptions. In Chapter 7 we examine this new componentwise-moment-based
method and the already existing procedures in detail in an extensive
simulation study and compare them with each other. We also compare the
results with earlier simulation studies by Li, Raghunathan, Meng, and Rubin
(1991, 1992), which simulated draws from the theoretically calculated
distributions of the test statistics because, given the computing power
available at that time, it was too computationally expensive to generate
data sets and impute them several times. Our simulation study enables us to
give some practical advice in Chapter 8 on how to calculate correct
significance levels from multiply-imputed data. Finally, in Chapter 9 we
give an overview of the many challenging tasks left for future
research. [...]
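The remark about referring a statistic to a $\chi^2_k$-distribution instead of an $F_{k,n}$-distribution can be illustrated with a small Monte Carlo sketch. It assumes, purely for illustration, a Wald-type statistic *W* for which *W*/*k* follows an $F_{k,n}$-distribution exactly, and it counts how often *W* exceeds the 5% critical value of the $\chi^2_k$-distribution. The function names, the seed, and the choices *k* = 5, *n* = 20 and *n* = 2000 are hypothetical and are not the simulation design of the thesis.

```python
import random


def chi2_draw(rng, df):
    """One draw from a chi-square distribution (gamma with scale 2)."""
    return rng.gammavariate(df / 2, 2.0)


def rejection_rate(k, n, crit, reps=20000, seed=1):
    """Share of draws of W = k * F(k, n) that exceed the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        f = (chi2_draw(rng, k) / k) / (chi2_draw(rng, n) / n)
        if k * f > crit:
            rejections += 1
    return rejections / reps


# 0.95 quantile of the chi-square distribution with 5 degrees of freedom.
CHI2_95_K5 = 11.0705

small_n = rejection_rate(k=5, n=20, crit=CHI2_95_K5)
large_n = rejection_rate(k=5, n=2000, crit=CHI2_95_K5)
print(f"n = 20:   rejection rate {small_n:.3f}")
print(f"n = 2000: rejection rate {large_n:.3f}")
```

Under these assumptions the rejection rate for *n* = 20 lies noticeably above the nominal 5%, while for *n* = 2000 it is close to 5%, which is the qualitative pattern described above.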

- Quote paper
- Christine Aust (Author), 2010, New methods for generating significance levels from multiply-imputed data, Munich, GRIN Verlag, https://www.grin.com/document/418195
