Likelihood Method for Randomized Time-to-Event Studies with All-or-None Compliance

Causal Inference in Survival analysis


Research Paper (postgraduate), 2009
154 Pages, Grade: A

Free online reading

Table of Contents
Summary 1
1 Introduction
2
2
The HIP study 6
2.1
The HIP breast cancer program
6
3
Notation, assumptions, basic concepts
11
3.1
Notation
11
3.2
Assumptions
12
3.3
Basic
Concepts
14
3.3.1 Compliance Types
14
3.3.2 Causal Effects
16
3.3.3 Compliers Average Causal Effect
(CACE)
17
4
Model Structure and Likelihood
18
4.1
Compliance
types
distributions
18
4.2
The Structure of the joint distribution
20
4.2.1 The joint distribution of T, C, D and Z
20
4.2.2 The joint distribution of the observable variables
26
4.3
The
likelihood
function
28
4.3.1 The idealised likelihood function
28
4.3.2 Likelihood function for complete data
30
4.3.3 Likelihood function for incomplete data
36
5
Simplification of model parameter
EE 37
6
EM algorithm
42
6.1
The Expectation step
43
6.2
The Maximisation step
45
7
Model Specifications
45
7.1.
Specific non-ignorable PPO survival models
46
7.2
Parameter of interest
T in specific models
47
7.2.1
Parameter
T in the NIGN-WW model
47
7.2.2
Parameter
T in the NIGN-LL model
49

8
Computational issues
50
8.1
Maximisation of the Likelihood Function using the EM algorithm 51
8.1.1 Posterior Probabilities of Compliance type U
i
51
8.2
Structures of the Expectation function Q (
T, T
k
)
57
8.3
Maximisation of the Expectation function Q (
T, T
k
) 60
8.4
Estimation of standard errors
61
8.4.1 The standard error for the estimated model parameters 61
8.4.2 The observed information matric (OIM)
63
9
Application
65
9.1
Pre-processing of the HIP breast cancer data
65
9.2
HIP data analysis: via non-ignorable censoring mechanism -
PPO-survival models
70
9.2.1 Model specification for the HIP trial
71
9.2.2 Estimation of causal effects: CACE(t) and ACE(t) 73
9.2.3 Estimation of relevant survival probabilities 80
9.3
HIP data analysis: via the Frangakis-Rubin method
84
9.3.1 Frangakis-Rubin (F-R) method
84
9.3.2 Estimation of relevant probabilities 86
9.3.3 Estimation of causal effects: CACE(t) and ACE(t) 87
9.4
Model comparisons 91
9.4.1 Non-ignorable PPO-survival model vs F-R method
91
9.4.2 The NIGN-WW model vs IGN-W model
98
9.5
Results of the analysis via the non-ignorable PPO-survival model 105
9.5.1 Estimated CACE(t) for the HIP
7
data
105
9.5.2 Estimated CACE(t) for the HIP
8
data
105
10
Conclusions and Discussions
107
References 112
Appendix B
118
Appendix C
137
Appendix D
145
Appendix E
148

Likelihood Method for Randomized
Time-to-Event Studies
with All-or-None Compliance
Zhaojing Gong, Irene Hudson, Patrick Graham
SUMMARY
Estimating causal effects in clinical trials often suffers from treatment
non-compliance and missing outcomes. In time-to-event studies, it is more
complicated because of censoring, the mechanism of which may be non-
ignorable. While new estimators have recently been proposed to account
for non-compliance and missing outcomes, few papers have specifically con-
sidered time-to-event outcomes, where even the intention-to-treat (ITT)
estimator is potentially biased for estimating causal effects of assigned treat-
ment. In this paper we develop a likelihood based method for randomized
clinical trials (RCTs) with non-compliance for time-to-event data and adapt
the EM algorithm according to derive the maximum likelihood estimators
(MLEs).
In addition, we give formulations of the average causal effect
(ACE) and compliers average causal effect (CACE) to suit survival anal-
ysis. To illustrate the likelihood-based method (EM algorithm), a breast
1

cancer trial data (Baker, 1998) was re-analyzed using a model, which as-
sumes that the failure times and censored times both follow Weibull and
Lognormal distributions, respectively (termed the NIGN-WW model and
NIGN-LL model).
keywords: Causal inference; Noncompliance; Survival analysis; Maximum
likelihood; EM algorithm; Weibull distribution; HIP breast cancer trial; CACE;
ITT analysis.
1
INTRODUCTION
Estimating causal effect of interventions is often required for empirical studies
in medicine. Randomized clinical trials (RCTs) are only generally approach for
causal inference, but it often suffers from noncompliance. Noncompliance is a
common complication in the analysis of randomized clinical trials (RCTs). It can
take many forms (Baker, 1997), such as switching from the assigned treatment to
another treatment, or not attending all scheduled visits to receive treatment. The
main feature of RCTs with noncompliance is that not every patient in each group
of the RCT receive the same treatment. All-or-none compliance is a special type of
noncompliance, in which switching occurs immediately after randomization, and
every patient either does or does not comply with their assigned treatment com-
pletely, no partial compliance is assumed. All-or-none compliance is of particular
interest because it greatly simplifies modeling and has received more attention
over the last decade. (Imbens & Rubin, 1997; Baker, 1998; Frangakis & Rubin,
1999; Hirano & Imbens, 2000; Jo, 2002; Loeys & Goetghebeur, 2003; Levy et al.,
2004; Peng et al., 2004; O'Malley & Normand, 2005; White, 2005; Gong et al.,
2005).
2

Besides noncompliance, censoring is another problem has to be dealt with in
many randomized clinical trials, since in medicine studies, the outcomes of interest
most likely are time of event of interest. This kind of data is well known as time-
to-event data, also called survival data. It is a special type of data collated in
long-term follow-up studies, in which the outcome of interest is a time of a specific
event, usually treated as "failure time "or "survival time ".
Time-to-event data often appears in the areas of medicine and health study
(survival time), as well as in economics and engineer industry fields (failure time),
usually time to death or the manifestation of a symptom or occurrence of a dis-
ease. Subjects who may not experience the event of interest during the period of
study (observation), or may withdraw from the clinical trial for some reasons are
considered censored. In any one of these cases mentioned above, subjects´survival
time (time of event) will not be observed, this is given in terminology as censoring.
The standard method of analysis of the time-to-event data is so-called Survival
Analysis, in which censoring is the main problem need to be faced.
The standard analysis of a RCT with noncompliance is an intention-to-treat
(ITT) analysis (Angrist et al., 1996; Imbens & Rubin, 1997), which compares out-
comes of subjects by assigned treatment, ignoring the actual treatments received.
ITT only provides an estimator in terms of the causal effect of assigned treatment
(denoted by Z, of an outcome of interest). It has been denoted by ACE in terms
of Average Causal Effect (Holland, 1986; Graham, 2000, 2001) or IT T (Imbens
& Rubin, 1997; Loeys & Goetghebeur, 2003; O'Malley & Normand, 2005). ITT
analysis is a very plausible method for causal inference (Holland, 1986; Pearl,
2001; Greenland, 2004; Rubin, 2005), as it avoids those biases that associated
with estimating the causal effect "as-treated "(comparing outcomes of subjects
by received treatment) or causal effect "per protocol "(comparing outcomes of
3

subjects who comply with their assigned treatment by assigned treatment) (?).
ITT works well only when the assigned treatment and received treatment are
identical for all patients in the trial, at least if most patients satisfy this as-
sumption, otherwise, a biased estimator of causal effect may be produced (Loeys
& Goetghebeur, 2003; Hirano & Imbens, 2000; White, 2005; Baker & Kramer,
2005).
Amongst an extensive body of the literature on noncompliance, most papers
are address noncompliance as missing covariates. Some recently published papers
consider noncompliance with missing outcomes together (Baker et al., 2000; Hi-
rano & Imbens, 2000; Barnard et al., 2003; Peng et al., 2004; Mealli et al., 2004;
O'Malley & Normand, 2005; Baker & Kramer, 2005), however, they are all on
binary or dichotomous outcomes or normal distributions data, rather than the
time-to-event data.
For time-to-event data, to address the noncompliance and missing outcomes at
the same time is a big challenge for standard survival analysis. As in general, sur-
vival analysis follows the assumption of Non-informative censoring (Kalbfleisch
& Prentice, 2002; Klein, 1997; Cox & Oakes, 1984), which assumes that censoring
mechanism is ignorable. Implying that subjects censored at time t do not con-
tribute to the likelihood function. But when compliance is imperfect in a clinical
trial, its censoring mechanism is more than likely to be non-ignorable (Baker, 1994,
1998). This means that the conventional methods of survival analysis can not be
applied simply for analysis those time-to-event data from randomized clinical trial
with noncompliance.
Therefore, to develop a new approach for analyzing causal effects for time-to-
event data from RCTs with noncompliance and missing outcomes, is necessary
and vital (Fleming & Lin, 2000; Fisher, 1999).
4

In survival study area, to the date, we only find a few of papers are related
to noncompliance issue (Robins & Tsiatis, 1991; Baker, 1994, 1998; Frangakis &
Rubin, 1999; Loeys & Goetghebeur, 2003; Herring & Lipsitz, 2004; Gong et al.,
2005). But none of them are focused on general likelihood method with considering
noncompliance and and non-ignorable simultaneously.
Robins & Tsiatis (1991) focus on structural failure time model; Baker (1994,
1998) are based on cause-specific hazards; Frangakis & Rubin (1999) is based
on the Kaplan-Meier estimator (Cox & Oakes, 1984; Klein, 1997; Kalbfleisch &
Prentice, 2002). Herring & Lipsitz (2004) Loeys & Goetghebeur (2003) and Gong
et al. (2005) are based on Cox model Cox & Oakes (1984). Some of them, such as
Loeys & Goetghebeur (2003); Gong et al. (2005) are still under the assumption
of non-informative censoring.
In this paper, we propose a new approach based on the consideration of non-
compliance and missing outcomes together, which is like the circumstance consid-
ered in Frangakis & Rubin (1999), and focus on likelihood function rather than
using survival estimators; as uncompleted covariate is included, EM algorithm is
adapted, to the author's knowledge, this is the first study to date to develop like-
lihood based (EM) approach t survival data RCT's accommodating both missing
outcomes and noncompliance simultaneously.
To illustrate the plausible of proposed likelihood based approach, two specific
Parametric Potential-Outcome (PPO) Survival models, namely the NIGN-WW
and the NIGN-LL models, have been given and applied for re-analysis a time-to-
event data (breast cancer data)from randomized clinical trial conducted by Health
Insurance Plan (HIP) of Greater New York during the 1960s. For comparison,
the method introduced in Frangakis & Rubin (1999), which can be viewed as a
nonparametric method for imperfect compliance survival data, is also implement
5

on computer and applied for the HIP data.
The rest of the paper has a structure as following: Section 2 gives the back-
ground of the HIP data; Section 3 defines notation used in this paper and as well
as assumptions required for our PPO-survival model plus some basic concepts
which are important for establishing our PPO-survival model; Section 4 provides
the likelihood structure; the PPO-survival model parameter is specified in Section
5 and Section 6 is about the EM algorithm. Seall results of re-analysis of HIP
data with different methods are given in Section 9. Conclusion and discussion are
given in Section 10. All related equation of computations are given in Appendix,
as well as tables and figures.
2
The HIP study
2.1
HIP breast cancer program
The breast cancer data used in this paper was collected from an breast screening
study in USA conducted by the Health Insurance Plan (HIP) of Greater New
York during the 1960s.Participants were those women who enrolled in the HIP for
one year or more when the program started in 1963 with asymptomatic of breast
cancer and ages between 40 and 64.
In total 60696 eligible women were included in this program, and all were
randomly assigned into either the "screening "or "control "group. Women in
the "screening "group were offered free annual breast examinations four times.
These examinations consisted of film mammography (cephalocaudal and later
views of each breast); a clinical examination of the breast by a physician (usually
a surgeon), an interview for demographic, other background information and a
health history. While the women in "control "group only received their usual
6

health care. All participants in this study were followed up in the long-term up
to 18 years from their date of entry. Mortality due to breast cancer was the
main outcome variable in the HIP study, and was used to evaluate the efficacy
of the screening program (Shapiro et al., 1988; Baker, 1998). Therefore the event
of interest was death from breast cancer and the outcome of interest was the
observed time (in years) for each individual from entry to death by breast cancer
or loss to follow-up. For those women diagnosed with breast cancer in the study
there was no loss to follow up.
Approximately one-third of women who were assigned to the
"screening
"group refused the offer of a screening examination. Our analysis assumes no
women in the "control "group received "screening " this assumption has been
given by Frangakis & Rubin (1999); Loeys & Goetghebeur (2003) in terms of No
access of treatment for the control group.
We also assume "All-or-none compliance "meaning that we assume women
who took up the offer of screening, complied fully with the screening programme
protocol by attending (at least one of) all scheduled screening visits whereas
women who initially declined to participate in the programme are assumed not to
have attended any screening visits.
Previous research on the HIP breast cancer data (Shapiro, 1977; Shapiro et al.,
1982, 1988) was mostly based on cumulative breast cancer's mortality rate and
survival rate over particular fixed follow-up durations rather than on estimating
survival or hazard functions. In addition, with the exception of Baker (1998)
Richardson and Wells (1995?).
The previous analysis of the HIP by Shapiro (1977); Shapiro et al. (1982, 1988)
have ignored compliance information and estimated the intention-to-treat (ITT)
effect of assigned treatment.
7

Non compliance presents a challenge to traditional statistical methods for the
analysis of RCTs because neither an analysis which ignores non-compliers, nor an
analysis which focuses on treatment actually received can provide [a precise and
reliable answer] valid inferences for on the effect of mammography on breast cancer
mortality. In the former case, the observed outcomes for the intervention group
are a mixture of outcomes for women receiving and not receiving the intervention
In the latter case, the two groups are not necessarily comparable, because the self-
selection of treatment for the intervention group means that treatment received
is not randomized and consequently treatment groups may differ on background
factors related to breast cancer and mortality risk.
The summary statistics of HIP data is given in Table 2.
8

T
able
1:
Primary
notation.
P
oten
tial
and
Observ
able
data
T
erm
Data
for
sub
ject
i
Assigned
treatmen
t
Z
i
=
z
(1
=
in
terv
en
tion
,
0
=
con
trol)
Compliance
ty
p
e
U
i
=
u
(c
=
complier
,n
=
nev
er-tak
er,
and
a
=
alw
a
ys-tak
er)
P
oten
tial
data
Observ
able
Data
Receiv
ed
treatmen
t
D
i
=[
D
i
(1)
,D
i
(0)]
D
obs
i
=
zD
i
(1
)+(
1
-
z
)D
i
(0)
F
ailure
time
T
i
=[
T
i
(1)
,T
i
(0)]
T
obs
i
=
zT
i
(1
)+(
1
-
z
)T
i
(0)
Censored
time
C
i
=[
C
i
(1)
,C
i
(0)]
C
obs
i
=
zC
i
(1
)+(
1
-
z
)C
i
(0)
Censored
indicator
R
i
=[
R
i
(1)
,R
i
(0)]
R
obs
i
=
zR
i
(1
)+(
1
-
z
)R
i
(0)
Observ
ed
time
Y
i
(z
)=
min
(T
i
(z
),C
i
(z
))
Y
i
=
zY
i
(1
)+(
1
-
z
)Y
i
(0)
P
opulation
parameters
P
arameter
Notation
Subp
opulation
Mo
del
parameter
=(
T
,uz
,
C
,uz
,
u,z
d
)(
U,
Z
)
i
=(
u,
z
)
Surviv
al
probabilit
y
S
uz
(t
)(
U,
Z
)
i
=(
u,
z
)
Prop
ortion
U
i
=
u
u
All
Prop
ortion
(Z,
D
)
i
=(
z,
d
)
zd
All
Data
summaries
Statistics
Notation
Subsample
Sample
size
n
zd
r
(Z,
D
,R
)
i
=(
z
,d,
r
)
Mean
observ
ed
time
¯ Y
zd
(Z,
D
)
i
=(
z,
d
)
Prop
ortion
of
ev
en
t
¯ R
(Z,
D
,R
)
i
=(
z
,d,
r
)
9

T
able
2:
Summary
statistics:
HIP
data.
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
60,696
30,131
30,565
20,147
9,984
40,549
BC
patien
ts
2,325
1,171
1,154
835
336
1490
Deaths
of
BC
786
372
414
247
125
539
Mean
of
observ
ed
time
¯ Y
14.051
14.1578
13.9457
14.905
12.6499
13.6266
Prop
ortion
of
¯ R
=
1
0.0129
0.0123
0.0135
0.0123
0.0125
0.0133
Prop
ortion
of
¯ Z
=
1
0.4964
1
0
1
1
0.2462
Prop
ortion
of
¯ D
=
1
0.3319
0.6686
0
1
0
0
R
=
1
:
ev
en
t
o
ccurs;
Z
=
1
:
randomly
assigned
to
the
screening
group;
D
=
1
:
receiv
ed
the
screening
examination.
10

3
Notation, assumptions and basic concepts
3.1
Notation
We develop our model under the same randomization conditions considered in
Frangakis and Rubin (1999). First, a sample of N individuals is randomly sam-
pled from a large population of individuals all of whom are eligible to participate
in the study, and could therefore be assigned to any of the possible treatment
levels. Under this set-up it meaningful to consider contrasts between outcomes
under alternative treatment assignments for the population from which the study
sample is assumed to have been obtained. A randomized controlled trial is con-
structed by randomly allocating each of the N sampled individuals to a treatment
group. Our aim is to make inferences about the population causal effects of treat-
ment assignment, and of treatment received on the basis of the data from the N
individuals enrolled in the study.
We adopt a potential outcomes model (Neyman 1923, Rubin, 1978) for sur-
vival times under alternative treatment assignments.
Let Z
i
= z denote the
assigned treatment (z = 1, for invitation of screen and z = 0 for control); and
let T
i
(1) denote the failure time for the ith individual in the population if as-
signed treatment 1 (for example, invitation for screening) and let T
i
(0) denote the
failure time for that same individual if assigned to treatment 0 (not invited for
screening). Similarly let C
i
(1) and C
i
(0) denote potential censoring times corre-
sponding, respectively to treatment assignments 1 and 0. The censoring times are
the times at which an individual will be censored. As in Imbens & Rubin (1997),
our model also requires indicators for the treatment received under the alterna-
tive treatment assignments,. Thus D(1) denotes treatment actually received if
assigned to treatment, 1, and D(0) the treatment actually received if assigned to
11

treatment 0. In the potential outcomes models observable data is represented as
particular elements of the potential outcome vectors, and the elements of these
vectors actually observed, reflects the treatment assignment. Thus if Z
i
= 1, the
observable treatment received indicator is D
i
(Z
i
) = D
i
(1), the observable failure
time is T
i
(Z
i
) = T
i
(1) and the observable censoring time is C
i
(Z
i
) = C
i
(1), and
similarly if Z
i
= 0. Note that although we refer to T
i
(Z
i
) and C
i
(Z
i
) as 'observ-
able', at most one of these variables is fully observed. If an individual is, in fact,
censored then C
i
(Z
i
) is observed and T
i
(Z
i
) is only partially observed in the sense
that it is known that T
i
(Z
i
) > C
i
(Z
i
). On the other hand if the failure time for
the ith individual is observed the censoring time is only partially observed and
known to be greater than the survival time. While it is standard in much applied
work on survival analysis to structure the analysis in terms of the observable min-
imum of survival and censoring time and a censoring indicator, such a set-up is
valid only under the assumption of ignorable censoring,which is equivalent to the
assumption of the independence of failure time and censored time given in section
3.2 in Klein (1997).
Because our analysis allows for non-ignorable censoring we work directly with
the more fundamental failure time and censoring time variables.
More details of notation used in this paper are given in Table 1.
3.2
Assumptions
Before giving the structure of our PPO-survival model, we list all assumption
required first which most follow those assumptions given in Frangakis & Rubin
(1999).
1. The stable unit treatment value assumption (SUTVA) (Rubin, 1986; Imbens
& Rubin, 1997; Peng et al., 2004): this assumes that the potential outcomes
12

of each individual are unrelated to the treatment status of other individuals.
2. Random assignment: this assumes that each individual has been assigned
randomly to a treatment group.
3. Exclusion restriction: this assumes that the assigned treatment has an effect
on the outcome only through its effect on the received treatment.
4. Monotonicity: the actual received treatment D
i
(z) is a monotone function
with respect to the potential treatment levels, 1 for being in the treatment
group and 0 for being in the control group.
5. Latent ignorability: this assumes that the potential outcomes (of failure
time) and the associated potential non-responses (censoring time) are inde-
pendent at each level of the latent compliance types.
In the literature, the latent ignorability assumption for general data from RCTs
has been expressed as follows:
P r(Y
i
, R
i
|U
i
, Z
i
) = P r(Y
i
|U
i
, Z
i
)P r(R
i
|U
i
, Z
i
);
(3.1)
where Y
i
and R
i
represent the potential outcome variable and the response indi-
cator for the ith individual, respectively; the notation for U
i
and Z
i
are the same
as in Frangakis & Rubin (1999), Mealli et al. (2004) and O'Malley & Normand
(2005). U
i
denotes the compliance type, and Z
i
indicates the assigned treatment.
The analogous assumption for time-to-event data is:
P r(T
i
, C
i
|U
i
, Z
i
) = P r(T
i
|U
i
, Z
i
)P r(C
i
|U
i
, Z
i
),
(3.2)
13

where T
i
and C
i
denote the potential outcome of failure time and censoring
time, respectively. This assumes that in RCTs with time-to-event outcomes, the
potential failure and censoring times are independent conditional on the latent
compliance type. We regard equation (3.2) as the time-to-event version of the
latent ignorabilty assumption given by Frangakis & Rubin (1999). Note that in
this chapter, we use the bold symbols, such as T
i
and C
i
, to denote potential
outcome vectors.
3.3
Basic concepts
For convenience of understanding the model structure given in Section 4, we in-
troduce a few basic concepts in the following.
3.3.1
Compliance types
In literature, a variety of names that have been used to describe individuals' com-
pliance behaviour, such as compliance type (Imbens & Rubin, 1997), compliance
state (O'Malley & Normand, 2005), or principle strata (Frangakis & Rubin, 2002;
White, 2005). This is shown that individuals' compliance behaviour has drawn
much attention from many authors, as doing noncompliance analysis, individuals'
compliance behaviour in RCTs is a very important issue that has to be concerned.
In this paper we follow the work of Imbens & Rubin (1997) and refer to this new
variable as the compliance type and denote it by U.
The definition of compliance type given by Imbens & Rubin (1997) is as follows:
1. Compliers (c) are those individuals who receive the treatment when they
are assigned to the treated group. They would not receive any treatment
when assigned to the control group. That is, compliers always comply with
their assigned treatment Z
i
.
14

2. Always-takers (a) are those individuals who always receive the treatment
regardless which groups they are assigned to. That is, always-takers only
comply with their assigned treatment when Z
i
= 1.
3. Never-takers (n) are individuals who would never receive the treatment re-
gardless which group they are assigned to. That is, never-takers only comply
with their assigned treatment when Z
i
= 0;
4. Defiers (d) are individuals who receive the treatment when they are assigned
to the control group and would not receive the treatment when they are
assigned to the treatment group. That is, defiers never comply with their
assigned treatment Z
i
= z no matter whatever value Z
i
has.
Using the symbol of potential received treatment D
i
= [D
i
(1), D
i
(0)], these
four compliance types can be written mathematically as follows,
U
i
=
c,
(i.e. subject i is a complier)
if D
i
(1) = 1 and D
i
(0) = 0;
n,
(i.e. subject i is a never-taker)
if D
i
(1) = 0 and D
i
(0) = 0;
a,
(i.e. subject i is an always-taker)
if D
i
(1) = 1 and D
i
(0) = 1;
d,
(i.e. subject i is a defier)
if D
i
(1) = 0 and D
i
(0) = 1.
(3.3)
This compliance type variable, U
i
, allows us to classify a population into four
subpopulations by the four categories of individuals' compliance behaviour, that
is compliers (U
i
= c), always-takers (U
i
= a), never-takers (U
i
= n) and defiers
(U
i
= d).
In the literature, the fourth compliance type, defier (U
i
= d), is often omitted,
since there is no reason to suppose that individuals who agree to participate in
a trial will necessarily always counter their experimenters. To avoid this extreme
15

situation, a very common assumption named Monotonicity (Angrist et al., 1996;
Imbens & Rubin, 1997; Frangakis & Rubin, 1999) is given, which supposes that
the component of the received treatment D
i
is a monotone function with respect
to Z
i
, that is D
i
(1) and D
i
(0) have a relationship as follows,
D
i
(1)
D
i
(0).
(3.4)
Instead of using the terminology of Monotonicity, some authors directly as-
sume, however, that there are strictly no defiers in the study (Baker, 1998; Baker
& Kramer, 2005).
3.3.2
Causal Effects
Using the potential outcomes notation it is straightforward to define causal ef-
fects of treatment assignment, for example T
i
(1)
- T
i
(0) is the the causal effect
of treatment assignment for the i
th
individual. Within the potential outcomes
framework it is conventional to define causal effects contrasts between population
averages under alternative treatment assignments (Rubin, 1978, Imbens & Rubin,
1997, Hirano et al, 2000). However for failure time data the full survival curve
is potentially of interest and we therefore define causal effects in a time depen-
dent manner, contrasting population survival curves under alternative treatment
assignments.
Let S(a, t) denote the population survival curve under assignment to treatment
a. This is the survival curve, defined as Pr(T (a) > t), that would be observed
if all members of the population were assigned to treatment a and followed until
failure. Here the probability notation, Pr() can be interpreted as the population
proportion or as a relative frequency under repeated sampling from the population.
16

For any pair of treatment levels labeled, for convenience, 0 and 1, , an average
causal effect of treatment assignment at time t, can be defined as
ACE(t) = S(1, t)
- S(0, t)
(3.5)
Thus the population survival curves induce a population - level causal effect
curve. This time varying notion of causal effect can be related to time-independent
definitions of average causal effect by considering, for, any time t, the indicator
variables I(a, t) defined to equal 1, if T (a) > t and to equal 0, otherwise. For
the ithindividual, the causal effect of treatment at time, t, is then ICE
i
(t) =
I
i
(1, t)
- I
i
(0, t) and the average causal effect is the average of these individual
effects effect at time t, can then be written
ACE(t) = E (I(1, t)
- I(0, t))
= S(1, t)
- S(0, t)
(3.6)
3.3.3
Compliers average causal effect
Because the treatment received by always takers and never takers is not affected
by treatment assignment the subpopulation average causal effect functions for
these groups do not reflect the causal effect of treatment. On the other hand, the
sub-population average causal effect functions for compliers and defiers are of of
interest because these effects may reflect the effect of treatment. For example the
Complier Average Causal Effect (CACE):
CACE = S
c
(1, t)
- S
c
(0, t)
(3.7)
contrasts survival under alternative treatment assignments for a sub-population
17

know to comply with these assignments. It is therefore more reasonable to at-
tribute any effect of assigned treatment in this sub-population to the causal effect
of treatment,than is the case for the full population. Nevertheless, Hirano et al
(2000) that such causal attributions, remain assumptions, the plausibility of which
is increased by study design strategies such as blinding and double blinding.
Implicit in the above definitions is the Stable-Unit-Value Assumption (SUTVA)
(Rubin, 1986 comment ), which holds that outcomes for any given individual are
not affected by the treatment assignment or outcomes of other individuals. This
seems reasonable in screening trials such as HIP and in many other studies of
interventions delivered at the level of the individual. When SUTVA is violated
the vector of potential outcomes must be expanded to accommodate, not only the
outcomes corresponding to each individual's potential exposure but also combi-
nations of potential exposures for other individuals (Infectious disease reference
here?) Definition of causal effects is then correspondingly more difficult.
4
Model Structure and Likelihood
4.1
Compliance type's distribution
Noncompliance is the main issue we have to consider in our PPO-survival approach
and it is often described by compliance type variable, which is defined by the vector
of potential received treatment D
i
= [D
i
(1), D
i
(0)] (equation (3.3)). By following
the property of conditional probability, P r(D
i
|Z
i
, ), the conditional distribution
of vector D
i
given the assigned treatment Z
i
, (where is the associated parameter
18

for this distribution) is broken down in equation (4.1).
P r(D
i
|Z
i
; ) = P r(D
mis
i
, D
obs
i
|Z
i
; )
= P r(D
mis
i
|D
obs
i
, Z
i
;
zd
)P r(D
obs
i
|Z
i
;
z
),
(4.1)
where
zd
and
z
represent the parameters associated with these two conditional
distributions of D
mis
i
and D
obs
i
respectively. This is because D
i
can also be ex-
pressed as [D
mis
i
, D
obs
i
], where D
mis
i
and D
obs
i
denote the unobservable and ob-
servable components of potential received treatment vector D
i
. Note that when
Z
i
is a binary variable, we can write D
mis
i
= D
i
(1
- Z
i
) and D
obs
i
= D
i
(Z
i
).
On the other hand, by the definition of compliance type U
i
, given the actual
received treatment D
obs
i
= D
i
(Z
i
), learning the value of D
mis
i
is equivalent to
learning the value of U
i
. This implies that the conditional probability distribution
of the compliance type U
i
given D
obs
i
= D
i
(Z
i
) and Z
i
, P r(U
i
|D
i
(Z
i
), Z
i
, ;
), can
be obtained from P r(D
mis
i
|D
obs
i
, Z
i
; ), the conditional probability distribution of
D
mis
i
, the unobservable component of the potential received treatment vector D
i
,
given the actual received treatment D
obs
i
and assigned treatment Z
i
. In other
words:
P r(U
i
|D
obs
i
, Z
i
;
)
d
P r(D
mis
i
|D
obs
i
, Z
i
;
zd
);
(4.2)
where
d
indicates distributional equivalence: any probability on one side of
the equation can be obtained from the distribution on the other side. For example,
P r(U
i
= c
|D
obs
i
= 1, Z
i
= 1,
) = P r(D
mis
i
= 0
|D
obs
i
= 1, Z
i
= 1,
zd
).
After substituting equation (4.2) into the second line in equation (4.1), we
have:
P r(D
i
|Z
i
, )
d
P r(U
i
|D
obs
i
, Z
i
;
)P r(D
obs
i
|Z
i
;
z
),
(4.3)
where
denotes the parameter corresponding to the distribution of U
i
condi-
19

tional on the actual received treatment D
obs
i
and assigned treatment Z
i
, and
z
is the parameter associated with the distribution of D
obs
i
given Z
i
. This implies
that P r(D
i
|Z
i
, ) can be obtained from the product of P r(U
i
|D
obs
i
, Z
i
;
) and
P r(D
obs
i
|Z
i
;
z
). In other words, P r(D
i
|Z
i
, ), the conditional distribution of D
i
(the potential received treatment vector) given assigned treatment Z
i
, can also
be obtained from the product of two conditional distributions, P r(U
i
|D
obs
i
, Z
i
;
)
and P r(D
obs
i
|Z
i
;
z
), the former is the conditional distribution of U
i
given the
actual received treatment D
obs
i
and assigned treatment Z
i
, and the latter is the
conditional distribution of D
obs
i
given Z
i
(equation (4.3)).
4.2
The structure of the joint distribution
4.2.1
The joint distribution of T, C, D and Z
Suppose a RCT includes N individuals, each component of the potential outcomes
vector, including failure time T
i
= [T
i
(1), T
i
(0)], censoring time C
i
= [C
i
(1), C
i
(0)]
and the potential received treatment D
i
= [D
i
(1), D
i
(0)], can then be written as
a vector of length N, as follows:
T(1) = [T
1
(1), . . . , T
N
(1)] , the failure time if assigned to the treatment group;
T(0) = [T
1
(0), . . . , T
N
(0)], the failure time if assigned to the control group;
C(1) = [C
1
(1), . . . , C
N
(1)], the censoring time if assigned to the treatment group;
C(0) = [C
1
(0), . . . , C
N
(0)], the censoring time if assigned to the control group;
D(1) = [D
1
(1), . . . , D
N
(1)], the received treatment if assigned to the treatment
group;
D(0) = [D
1
(0), . . . , D
N
(0)], the received treatment if assigned to the control
group;
20

We also assume that all related variables are independent and identically dis-
tributed (i.i.d.) random variables, which include failure time T, censoring time
C, potential received treatment D and assigned treatment Z. Their joint distri-
bution can then be expressed as a product of the joint distribution for N single
observations, that is:
P r(T(1), T(0), C(1), C(0), D(1), D(0), Z; )
=
i
P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0), D
i
(1), D
i
(0), Z
i
; ),
(4.4)
where denotes the model parameter vector.
We suppose that the model parameter can be broken down into three parts,
namely = (
, , ), where is the parameter associated with the joint dis-
tribution of failure and censoring times (outcomes), is the parameter for the
distribution of the vector of potential received treatment D
i
(post-treatment co-
variates), and is the parameter corresponding to the distribution of assigned
treatment Z
i
(pre-treatment covariates). According to the rule of conditional dis-
tribution, the joint probability of all the aforementioned variables, corresponding
to the ith individual, P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0), D
i
(1), D
i
(0), Z
i
; ), can thus
be expressed as a product of three distributions:
· The joint distribution of failure time T
i
= [T
i
(1), T
i
(0)] and censoring time
C
i
= [C
i
(1), C
i
(0)] given potential received treatment D
i
= [D
i
(1), D
i
(0)]
and assigned treatment Z
i
, P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0)
|D
i
(1), D
i
(0), Z
i
;
).
· The distribution of potential received treatment D
i
= [D
i
(1), D
i
(0)] given
assigned treatment Z
i
, P r(D
i
(1), D
i
(0)
|Z
i
; ).
· The distribution of assigned treatment Z
i
, P r(Z
i
, ).
21

Note that
, and denote the parameters associated with these above three
distributions, respectively. We then have:
P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0), D
i
(1), D
i
(0), Z
i
; ) =
P r(T
i
(1), T
i
(0), C
i
(1), C
i
(0)
|D
i
(1), D
i
(0), Z
i
;
)P r(D
i
(1), D
i
(0)
|Z
i
; )P r(Z
i
; ).
(4.5)
Using the notation of potential-outcome vectors, equation (4.5) can thus be ex-
pressed as follows:
P r(T
i
,C
i
, D
i
, Z
i
; ) =
P r(T
i
, C
i
|D
i
, Z
i
;
)P r(D
i
|Z
i
; )P r(Z
i
; );
(4.6)
for = (
, , ), the model parameter vector defined earlier.
In Section 4.1, we derived equation (4.2), which allows us to replace P r(D
i
|Z
i
; )
with P r(U
i
|D
obs
i
, Z
i
; )P r(D
obs
i
|Z
i
;
z
). The joint distribution of all the aforemen-
tioned variables given in equation (4.4) can now be rewritten as follows:
P r(T
i
,C
i
, D
i
, Z
i
; )
d
P r(T
i
, C
i
|Z
i
, D
i
;
)P r(U
i
|Z
i
, D
obs
i
; )P r(D
obs
i
|Z
i
;
z
)P r(Z
i
; ),
= P r(T
i
, C
i
|Z
i
, D
i
;
)P r(U
i
|Z
i
, D
obs
i
; )P r(D
obs
i
, Z
i
; );
(4.7)
where
d
indicates equivalence in distribution.
The last equality holds in equation (4.7) is because
P r(D
obs
i
, Z
i
; ) = P r(D
obs
i
|Z
i
,
z
)P r(Z
i
; ).
(4.8)
Therefore, these two parameters
z
and , have been replaced with the parameter
of . The advantage of using to replace
z
and is that it simplifies notation of
22

the parameters involved in the model. Recall that the main parameter of interest
in our approach is
, the parameter associated with the joint distribution of failure
time and censoring time (survival outcomes).
In addition, the vector of received treatment D
i
can be viewed as being
equivalent to compliance type U
i
. This is because, on one hand, U
i
= u (for
u
{c, a, n, d}) is determined by D
i
= [D
i
(1) = d
1
, D
i
(0) = d
0
] (for d
0
, d
1
{0, 1}) (equation (3.3)), and, on the other hand, if we know the compliance type
U
i
= u, the values of vector D
i
= [D
i
(1), D
i
(0)] are then also known. So we write
(U
i
= u)
(D
i
= [D
i
(1), D
i
(0)] = [d
1
, d
0
]) where
denotes equivalence.
Let U
i
replace D
i
. We have:
P r(T
i
, C
i
, D
i
, Z
i
; )
d
P r(T
i
, C
i
, U
i
, Z
i
; );
(4.9)
then equation (4.7) is equivalent to
P r(T
i
,C
i
, U
i
, Z
i
; )
d
P r(T
i
, C
i
|Z
i
, U
i
;
)P r(U
i
|Z
i
, D
obs
i
; )P r(D
obs
i
, Z
i
; ).
(4.10)
Furthermore, it is useful to impose some additional structure on the model.
This is achieved by partitioning the parameter vector as
= (
T
,
C
,
T C
)
T
,
where
T
denotes transpose. We also require that the marginal bivariate failure
time distribution depends only on
T
and that the marginal bivariate censoring
time distribution depends only on
C
. Hence the
T C
parameters act as association
parameters. This latter condition is only required to complete the parametrisation
of the joint distribution of the potential failure and censoring times.
By following the assumption of latent ignorability, equation (3.2), which gives
the joint distribution of failure and censoring times conditional on U
i
and Z
i
,
23

denoted by P r(T
i
, C
i
|Z
i
, U
i
;
), can be separated as a product of two distributions
of the failure time and the censoring time (conditional on compliance type U
i
only
because of randomisation). Specifically:
P r(T
i
, C
i
|U
i
, Z
i
;
) = P r(T
i
|U
i
, Z
i
;
T
)P r(C
i
|U
i
, Z
i
;
C
),
(4.11)
This means we can break down
into = (
T
,
C
)
T
. In other words, under the
latent ignorability assumption, the third part of
, namely
T C
, has been omitted.
Consequently, the model parameter can then be broken down as =
{
T
,
C
,
, }, where the subcomponent parameters of are detailed below:
T
corresponds to the parametrisation of the distribution of failure time,
C
corresponds to the parametrisation of the distribution of censoring time,
corresponds to the parametrisation of the conditional distribution of compliance
type U
i
, and
corresponds to the parametrisation of the joint distribution of (D
obs
i
, Z
i
).
Equation (4.11) can also be expressed as the following equivalence relation-
ships:
Pr(T
i
(1), C
i
(1)
|U
i
, Z
i
= 1;
) =Pr(T
i
(1)
|U
i
, Z
i
= 1;
T 1
)Pr(C
i
(1)
|U
i
, Z
i
= 1;
C1
)
Pr(T
i
(0), C
i
(0)
|U
i
, Z
i
= 0;
) =Pr(T
i
(0)
|U
i
, Z
i
= 0;
T 0
)Pr(C
i
(0)
|U
i
, Z
i
= 0;
C0
).
(4.12)
We propose a similar structuring of the marginal bivariate failure and cen-
soring time models by assuming
T
= (
T 1
,
T 0
,
T 10
) and
C
= (
C1
,
C0
,
C10
)
24

respectively. We then have:
Pr(T
i
(0)
|U
i
;
T
) = Pr(T
i
(0)
|U
i
;
T 0
)
(4.13)
Pr(T
i
(1)
|U
i
;
T
) = Pr(T
i
(1)
|U
i
;
T 1
).
(4.14)
Pr(C
i
(0)
|U
i
;
C
) = Pr(C
i
(0)
|U
i
;
C0
)
(4.15)
Pr(C
i
(1)
|U
i
;
C
) = Pr(C
i
(1)
|U
i
;
C1
).
(4.16)
Note that the third components of
T
and
C
, denoted by
T 10
and
C10
, were
not involved. This can be interpreted as that an individual in a RCT can only be
assigned into one of treatment groups (Z
i
= 1 or 0). No one can be assigned into
both two treatment groups (Z
i
= 1 and 0). In other words, in practice no such a
situation exists which need parameter
T 10
or
C10
to describe. So we simplify
T
and
C
as
T
= (
T 1
,
T 0
) and
C
= (
C1
,
C0
).
Under randomisation, the assigned treatment Z is independent of all other
variables. This is denoted as Z
(T, C, D, U), where means independent. It
therefore follows that:
P r(T
i
, C
i
|Z
i
, U
i
;
) = P r(T
i
, C
i
|U
i
;
).
(4.17)
Equation (4.17) represents a major simplification because it means that the
dependency of the potential survival (and censoring) times on assigned treatment
(Z
i
) does not have to be explicitly modelled. In non-randomised studies, this
dependency normally has to be modelled. Without additional assumptions, this
generally leads to problems with identifiability (Rubin, 1978; Graham, 2000).
25

The joint distribution of all potential entities for the ith observation can then
be expressed as follows:
P r(T
i
,C
i
, D
i
, Z
i
; )
d
P r(T
i
|U
i
;
T
)P r(C
i
|, U
i
;
C
)P r(U
i
|D
obs
i
;
)pr(D
obs
i
, Z
i
; );
(4.18)
where the model parameter is =
{
T
,
C
,
, }.
Our primary focus is on
T
. In particular, we wish to focus on (
T 1
,
T 0
),
the parameters of the marginal failure time distributions. Given estimates of
these parameters and given our specific distributional assumptions, the estimated
potential survivor functions (under the alternative assigned treatment) and the
estimators of ACE(t) and CACE(t) can then be obtained. Here ACE(t) and
CACE(t) are the forms of average causal effect (ACE) and complier average causal
effect (CACE) (Angrist et al., 1996; Imbens & Rubin, 1997; Yau & Little, 2001;
Peng et al., 2004; Mattei & Mealli, 2007), respectively, for time-to-event studies.
Note that they both are functions of time t.
4.2.2
The joint distribution of the observable variables
For deriving the formulation of the likelihood function, we need to address the
construct of the joint distribution of observed variables first, as it is well known
that the likelihood function of the model parameter , denoted by L
F
(), is a
function of , for fixed observed variables, and the function is proportional to the
joint distribution of all observed variables.
In time-to-event studies, however, failure time T
i
and censoring time C
i
cannot
be fully observed jointly. It is convenient to present a construct of the joint distri-
bution of all observable variables (denoted by P r(T
obs
, C
obs
, D
obs
, Z; )), rather
than the joint distribution of all observed variables, as the former can be ob-
26

tained directly by integrating P r(T
i
, C
i
, D
i
, Z
i
; )) with respect to the unobserv-
able/missing variables, which are T
mis
i
= T
i
(1
- Z
i
), C
mis
i
= C
i
(1
- Z
i
) and
D
mis
i
= D
i
(1
- Z
i
) when Z
i
is a binary. Therefore:
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; )
=
P r(T
i
, C
i
, D
i
, Z
i
; )dD
mis
i
dC
mis
i
dT
mis
i
,
(4.19)
where P r(T
i
, C
i
, D
i
, Z
i
; ) is given in equation (4.18).
Because of equation (4.18), equation (4.19) can then be expressed as follows
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; )
d
P r(T
i
|U
i
;
T
)P r(C
i
|U
i
;
C
)P r(U
i
|D
obs
i
;
)P r(D
obs
i
, Z
i
; )dU
i
dC
mis
i
dT
mis
i
,
(4.20)
where dD
mis
i
is replaced by dU
i
because, given D
obs
i
and Z
i
, we have D
mis
i
U
i
.
Recall that D
obs
i
= D
i
(Z
i
), this implies that for the ith individual, his/her assigned
treatment Z
i
and actual received treatment D
obs
i
are always observed.
Table 3: Compliance types U
i
= u given observed values of Z
i
and D
i
(Z
i
).
Z
i
D
i
(Z
i
)
Possible U
i
1
1
a or c
1
0
n
0
1
a
0
0
n or c
Under the assumptions of exclusion restriction and monotonicity, compliance
type U
i
has up to three categories, which are compliers (U
i
= c), never-takers
(U
i
= n) and always-takers (U
i
= a) and shown in Table 3. Then integration in
equation (4.20), with respect to U
i
(compliance type), can be further replaced by
27

its summation. We then write:
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; )
d
U
i
{c,n,a}
P r(T
i
|U
i
;
T
)P r(C
i
|U
i
;
C
)P r(U
i
|D
obs
i
;
)P r(D
obs
i
, Z
i
; )dC
mis
i
dT
mis
i
,
= P r(D
obs
i
, Z
i
; )
U
i
{c,n,a}
P r(U
i
|D
obs
i
;
)
P r(T
i
|U
i
;
T
)dT
mis
i
P r(C
i
|U
i
;
C
)dC
mis
i
,
= P r(D
obs
i
, Z
i
; )
U
i
{c,n,a}
P r(T
obs
i
|U
i
;
T
)P r(C
obs
i
|U
i
;
C
)P r(U
i
|D
obs
i
;
).
(4.21)
where
d
denotes distributional equivalence.
4.3
The likelihood function
As stated in Section 4.2.2, the likelihood function with respect to the model pa-
rameter can be a function that is proportional to the joint distribution of all
observed variables (p9, Tanner (1993)). However, because in time-to-event stud-
ies for the ith individual, either failure time (T
obs
i
) or censoring time (C
obs
i
) can
be observed, but not both. This implies that P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; ) can only
be viewed as the joint distribution for the observable variables (equation (4.21))
rather than the joint distribution for the observed variables.
Hence, instead of the likelihood function, we formulate a function of the model
parameter that is proportional to the joint distribution of all observable vari-
ables, the so-called the idealised likelihood function with respect to the model
parameter denoted by IDL().
4.3.1
The idealised likelihood function
Given the joint distribution of the observable variables, P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
|),
in equation (4.21), for the ith observation, the idealised likelihood function with
28

respect to the model parameter , denoted by IDL
i
(), has the following form:
IDL
i
()
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; ).
(4.22)
On the other hand, by following equation (4.21), the product of four distribu-
tions, P r(T
obs
i
|U
i
, Z
i
;
T
),P r(C
obs
i
|U
i
, Z
i
;
C
), P r(U
i
|D
obs
i
, Z
i
; ) and P r(D
obs
i
, Z
i
; )
can be viewed as a function of the model parameter = (
T
,
C
,
, ), and
it is proportional to the joint distribution of the idealised observed variables,
P r(T
obs
i
, C
obs
i
, D
obs
i
, Z
i
; ).
The idealised likelihood function with respect to the model parameter ,
(IDL
i
()), can then be obtained from the right hand side of equation (4.21),
so it has the formulation as follows:
IDL
i
()
=
U
i
{c,n,a}
P r(T
obs
i
|U
i
, Z
i
;
T
)P r(C
obs
i
|U
i
, Z
i
;
C
)P r(U
i
|D
obs
i
, Z
i
;
)P r(D
obs
i
, Z
i
; )
=P r(D
obs
i
, Z
i
; )
U
i
{c,n,a}
P r(T
obs
i
|U
i
, Z
i
;
T
)P r(C
obs
i
|U
i
, Z
i
;
C
)P r(U
i
|D
obs
i
, Z
i
;
),
(4.23)
where the second equation holds because the term P r(D
obs
i
, Z
i
; ) is not related
to the compliance covariate U
i
involved in the summation (equation (4.23)).
Note that in equation (4.23), assigned treatment Z
i
is involved in each term of
conditional distribution on the right-hand side. This is true because the observable
variables, namely T
obs
i
, C
obs
i
and D
obs
i
, all rely on the given value of Z
i
.
Recall that in our approach, the main parameter of interest is
T
, the param-
eter associated with the distribution of failure time T
i
. However, from equation
(4.23), we can see that because the three parameters,
T
,
C
and
, are all
involved in the summation over compliance type U
i
, the likelihood for these pa-
rameters does not separate into distinct components which can be maximised
29

separately. This implies that estimation of
T
cannot proceed independently. So
we write these three parameters as a row vector,
= (
T
,
C
,
), which are asso-
ciated with the distributions of failure time, censoring time and compliance type,
respectively. We then focus on deriving the idealised likelihood function for
,
denoted by IDL
i
(
) rather than IDL
i
(). Then IDL
i
(
) can be expressed as
follows:
IDL
i
(
)
=
U
i
{c,n,a}
P r(T
obs
i
|U
i
, Z
i
;
T
)P r(C
obs
i
|U
i
, Z
i
;
C
)P r(U
i
|D
obs
i
, Z
i
;
).
(4.24)
4.3.2
Likelihood function for complete data
Using R
i
to denote the censoring indicator, which indicates the situation of ob-
served failure time or censoring time, (1 for the event occurring (observed failure
time), 0 for censoring (observed censoring time)). Therefore, once given the cen-
soring indicator R
i
for the ith individual, the idealised likelihood function IDL
i
(
)
can be specified to the likelihood function L
i
(
), which can be written as the fol-
lowing equation:
L
i
(
)
=
U
i
{c,n,a}
pr(T
obs
i
= t
|U
i
, Z
i
;
T
)
×P r(C
obs
i
> t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
; )
for R
i
= 1, T
i
(z) = t,
U
i
{c,n,a}
P r(T
obs
i
> t
|U
i
, Z
i
;
T
)
×pr(C
obs
i
= t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
;
)
for R
i
= 0, C
i
(z) = t.
(4.25)
30

where P r() and pr() denote the probability function and the probability density
function, respectively. However, in practice, the compliance type U
i
cannot be
observed fully. Hence the parameter
cannot be estimated through the form of
L
i
(
) given in equation (4.25). Dempster et al. (1977) proposed an approach for
MLE of incomplete data, named the EM-algorithm, which can be employed in
our case as U
i
is considered unknown and can be treated as incomplete data.
Briefly, the EM algorithm involves two steps: the E-step and the M-step,
which denote the expectation and maximisation steps, respectively (Dempster
et al., 1977). The E-step aims to obtain the conditional expectation of the log-
likelihood function for complete data log(L
F
(
)), denoted by Q(,
k
), given the
observed data T (z), C(z), D(z) and Z = z, and given the current parameter
values
k
. More details for the EM algorithm is given in Section 6.
In the EM-algorithm, the first step requires the explication of the likelihood
form for complete data denoted by L
F
(
). This depends on the assumption that
all compliance covariates are known. We then find the related likelihood form
for the ith observation, which can be written as follows (for known compliance
covariates U
i
= u):
L
F
i
(
) = L
i
(
, u)
=
pr(T
obs
i
= t
|U
i
, Z
i
;
T
)
×P r(C
obs
i
> t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
; )
for R
i
= 1, T
i
(z) = t,
P r(T
obs
i
> t
|U
i
, Z
i
;
T
)
×pr(C
obs
i
= t
|U
i
, Z
i
;
C
)
×P r(U
i
|D
obs
i
, Z
i
; )
for R
i
= 0, C
i
(z) = t.
(4.26)
Compliance type U
i
= u is used to describe an individual's compliance behaviour.
31

Therefore, we have redefined the parameter of interest,
T 1
= (
T,u1
) and
T 0
=
(
T,u0
) (for u
{c, a, n}). Recall that
T
= (
T 1
,
T 0
). More specifically, the
parameter of interest associated with the failure time distribution,
T
, is redefined
as
T
= (
T,c1
,
T,n1
,
T,a1
,
T,c0
,
T,n0
,
T,a0
). This is equivalent to
T
= (
T,uz
)
(for u
{a, c, n} and z {0, 1}). Here the subscripts indicate failure time (T )
with U = u for u
{a, c, n} and assigned treatment Z = z for z {0, 1}.
Similarly, we use
C,uz
for u
{a, c, n}, and z {0, 1} to denote the compo-
nents of
C
, the parameter associated with the censoring time distribution. In
other words, we have
C
= (
C,c1
,
C,n1
,
C,a1
,
C,c0
,
C,n0
,
C,a0
).
Hence, for the specific assigned treatment Z
i
= z (with known U
i
= u), equa-
tion (4.26) can be simplified as follows:
L
i
(
, u)
=
pr(T (z)
i
= t
|U
i
, Z
i
;
T,uz
)
×P r(C(z)
i
> t
|U
i
, Z
i
;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
;
u,zd
)
for R
i
= 1, T
i
(z) = t,
P r(T (z)
i
> t
|U
i
, Z
i
;
T,uz
)
×pr(C(z)
i
= t
|U
i
, Z
i
;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
;
u,zd
)
for R
i
= 0, C
i
(z) = t.
(4.27)
where R
i
is the observed indicator of censoring, which is 1 for the event of in-
terest and 0 for censoring. The parameter
u,zd
is associated with the condi-
tional probability of compliance, given Z
i
= z and D
i
(z) = d, and is defined by
u,zd
= P r(U
i
= u
|Z
i
= z, D
i
(z) = d) for u = (a, c, n) and z, d = 0, 1. Details
about the estimation of the parameter
u,zd
will be discussed in Section 8.
According to equation (4.27), an individual randomised to treatment Z
i
= 1
who dies or is censored at (observed) time t has a contribution to the likelihood
32

as follows:
L
i
{i:Z
i
=1}
(
, u)
=
pr(T (1)
i
= t
|U
i
= u, Z
i
= 1;
T,u1
)
×P r(C(1)
i
> t
|U
i
= u, Z
i
= 1;
C,u1
)
×P r(U
i
|D
i
(1), Z
i
= 1;
u,1d
)
for R
i
= 1, T (1)
i
= t,
P r(T (1)
i
> t
|U
i
= u, Z
i
= 1;
T,u1
)
×pr(C(1)
i
= t
|U
i
= u, Z
i
= 1;
C,u1
)
×P r(U
i
|D
i
(1), Z
i
= 1;
u,1d
)
for R
i
= 0, C(1)
i
= t.
(4.28)
for d = 0 or 1 and u
{c, a, n} .
Similarly, an individual randomised to treatment Z
i
= 0 who dies or is censored
at (observed) time t has a contribution to the likelihood for
as follows:
L
i
{i:Z
i
=0}
(
, u)
=
pr(T (0)
i
= t
|U
i
= u, Z
i
= 0;
T,u0
)
×P r(C(0)
i
> t
|U
i
= u, Z
i
= 0;
C,u0
)
×P r(U
i
|D
i
(0), Z
i
= 0;
u,0d
)
for R
i
= 1, T (0)
i
= t,
P r(T (0)
i
> t
|U
i
= u, Z
i
= 0;
T,u0
)
×pr(C(0)
i
= t
|U
i
= u, Z
i
= 0;
C,u1
)
×P r(U
i
|D
i
(0), Z
i
= 0;
u,0d
)
for R
i
= 0, C(0)
i
= t.
(4.29)
for d = 0 or 1 and u
{c, n, a} .
Recall from Table 3 that knowing the values of Z
i
= z and D
i
(z) = d means
that compliance type U
i
= u can be determined for up to two types. Hence, given
the value of z and d, equations (4.28) and (4.29) can then be expressed as the
33

following form:
L
i
{i:Z
i
=z,D
i
(z)=d}
(
; u)
=
pr(T (z)
i
= t
|U
i
= u, Z
i
= z;
T,uz
)
×P r(C(z)
i
> t
|U
i
= u, Z
i
= z;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
= z;
u,zd
)
for R
i
= 1, T (z)
i
= t,
P r(T (z)
i
> t
|U
i
= u, Z
i
= z;
T,uz
)
×pr(C(z)
i
= t
|U
i
= u, Z
i
= z;
C,uz
)
×P r(U
i
|D
i
(z), Z
i
= z;
u,zd
)
for R
i
= 0, C(z)
i
= t.
(4.30)
for z and d = 0 or 1 and u
{c, a, n} . Note that equation (4.30) given above is
a likelihood function of our non-ignorable PPO-survival model which is analo-
gous to the one given in O'Malley & Normand (2005) model that is for normally
distributed outcomes.
For notational convenience, we introduce the following conventions:
f
i
u
(v, t;
T,uv
) = pr(T
i
(v) = t
|U
i
= u,
T,uv
)
for v = 0 or 1,
S
i
u
(v, t;
T,uv
) = P r(T
i
(v) > t
|U
i
= u,
T,uv
)
for v = 0 or 1,
e
i
u
(v, t;
C,uv
) = pr(C
i
(v) = t
|U
i
= u,
C,uv
)
for v = 0 or 1,
G
i
u
(v, t;
C,uv
) = P r(C
i
(v) > t
|U
i
= u,
C,uv
) for v = 0 or 1.
(4.31)
As mentioned before, P r() and pr() denote the probability and probability
density function, respectively and v indicates potential treatment levels.
Equation (4.30) can then be rewritten as:
L
i
{i:Z
i
=z,D
i
(z)=d}
(
; u)
= f
i
u
(z, t;
T,uz
)G
i
u
(z, t;
C,uz
)
R
i
S
i
u
(z, t;
T,uz
)e
i
u
(z, t;
C,uz
)
1-R
i
u,zd
.
(4.32)
34

Furthermore, we write:
F
i
u,zd
(
T,uz
,
C,uz
)
= f
i
u
(z, t;
T,uz
)G
i
u
(z, t;
C,uz
)
R
i
S
i
u
(z, t;
T,uz
)e
i
u
(z, t;
C,uz
)
1-R
i
.
(4.33)
Equation (4.30) can then be expressed more simply as:
L
i
{i:Z
i
=z,D
i
(z)=d}
(
; u) = F
i
u,zd
(
T,uz
,
C,uz
)
u,zd
= l
i
(
; u)l
i
(
; u)
(4.34)
Because of the independent and identically distributed (i.i.d.) assumption for
all observable variables (conditional on model parameters), the likelihood function
for complete data denoted by L
F
() = L(, u) can be expressed by the product
of the fully likelihood function associated with each individual. In other words,
we have:
L
F
(
) =
i=1
L
i
(
, u) =
i=1
F
i
u,zd
(
T,uz
,
C,uz
)
u,zd
=
i=1
l
i
(
; u)l
i
(
; u). (4.35)
Given observed Z
i
= z and D
i
(z) = d, the likelihood form of for complete
data, in which compliance type U
i
(for each individual) is presumed to be known,
35

can then be expressed as:
L
F
(
) = L(, u) =
i
L
i
(
, u)
=
i
{i:U
i
=c,Z
i
=1,D
i
(1)=1}
F
i
c,11
(
T,c1
,
C,c1
)
c,11
i
{i:U
i
=a,Z
i
=1,D
i
(1)=1}
F
i
a,11
(
T,a1
,
C,a
)
a,11
i
{i:U
i
=c,Z
i
=0,D(0)=0}
F
i
c,00
(
T,c0
,
C,c0
)
c,00
i
{i:U
i
=n,Z
i
=0,D
i
(0)=0}
F
i
n,00
(
T,n0
,
C,n0
)
n,00
i
{i:U
i
=n,Z
i
=1,D(0)=0}
F
i
n,10
(
T,n0
,
C,n0
)
n,10
i
{i:U
i
=a,Z
i
=0,D(0)=1}
F
i
a,01
(
T,a1
,
C,a1
)
a,01
.
(4.36)
4.3.3
The likelihood form for incomplete data
Since compliance type U
i
is unobservable, U
i
needs to be treated as a variable
with missing data. Hence, data associated with the case where noncompliance
occurs have to be viewed as incomplete data.
Let
A(z, d) denote the subset of subjects with specified values of Z and D;
more specifically,
A(z, d) = {i : Z
i
= z, and D
i
= d
}. The likelihood form for
incomplete data for the non-ignorable PPO-survival models can then be written
as shown below:
L(
) =
u
{c,n,a}
L(
, u) =
u
{c,n,a} i
L
i
(
, u)
=
i
A(1,1)
F
i
c,11
(
T,c1
,
C,c1
)
c,11
+ F
i
a,11
(
T,a1
,
C,a1
)
a,11
i
A(0,0)
F
i
c,00
(
T,c0
,
C,c0
)
c,00
+ F
i
n,00
(
T,n0
,
C,n0
)
n,00
i
A(1,0)
F
i
n,10
(
T,n0
,
C,n0
)
n,10
i
A(0,1)
F
i
a,01
(
T,a1
,
C,a1
)
a,01
.
(4.37)
The likelihood function of
for incomplete data, given in equation (4.37),
36

does not appear to be readily broken down into products of functions involving
only
T
and
C
, respectively. That is L(
) = L(
T
)L(
C
). That is the case where
censoring is not ignorable (Baker, 1997).
5
Simplification of model parameter
In equation (4.36) (on page 36), we observe that the parameter
has the form
= (
T,uz
,
C,uz
,
u,zd
) for u = (a, c, n) and z, d = (0, 1). However, the number of
model parameters can be further reduced by following the assumption of compound
exclusion restriction (Angrist et al., 1996; Frangakis & Rubin, 1999).
The compound exclusion restriction assumes that all potential-outcome vectors
can be affected by the received treatment D
i
(Z), rather than by the assigned
treatment Z
i
. This implies that the outcome of interest for an individual, as
long as his/her received treatment has the same value, will have the same value
when he/she is assigned into either treatment group (if possible in practice). For
convenience to express the assumption of exclusion restriction mathematically, we
use the notation T
i
(Z
i
= z, D
i
(z) = d) and C
i
(Z
i
= z, D
i
(z) = d) to denote the ith
individual's failure and censoring time where his/her assigned treatment Z
i
and
actual received treatment D
i
(Z
i
) are given by Z
i
= z and D
i
(z) = d, respectively.
Recall that in a two-arm RCT, we have z and d
{1, 0}. Hence the exclusion
restriction assumption can be expressed as follows:
T
i
(Z
i
= 1, D
i
(1) = d) = T
i
(Z
i
= 0, D
i
(0) = d)
for failure time,
C
i
(Z
i
= 1, D
i
(1) = d) = C
i
(Z
i
= 0, D
i
(0) = d)
for censoring time.
(5.1)
This also implies that individuals who are never-takers or always-takers would
have the same failure and censoring times probability distributions, whether they
37

are in the treated group or in the control group. In other words:
P r(T
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 0, U
i
= u)
= P r(T
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 1, U
i
= u)
for u = a or n;
P r(C
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 0, U
i
= u)
= P r(C
i
(Z
i
, D
i
(Z
i
))
|Z
i
= 1, U
i
= u)
for u = a or n.
(5.2)
For convenience, in the remainder of this thesis, we use T
i
(Z
i
) and C
i
(Z
i
) to
denote the failure and censoring times corresponding to the assigned treatment for
the ith individual, whose assigned treatment is Z
i
= z and whose actual received
treatment is D
i
(z) = d. Equation (5.2) can then be re-written as:
P r(T
i
(0)
|Z
i
= 0, U
i
= u)
= P r(T
i
(1)
|Z
i
= 1, U
i
= u)
for u = a or n;
P r(C
i
(0)
|Z
i
= 0, U
i
= u)
= P r(C
i
(1)
|Z
i
= 1, U
i
= u)
for u = a or n.
(5.3)
Consequently, the parameters associated with the outcome distributions (fail-
ure and censoring), which correspond to the subgroup with U
i
= n and Z
i
= 1 or
0, and to subgroup with U
i
= a and Z
i
= 1 or 0, are equivalent. Hence, we have
the following relationships:
T,n1
=
T,n0
T,a1
=
T,a0
C,n1
=
C,n0
C,a1
=
T,a0
.
(5.4)
Therefore we can use
T,n
to represent both
T,n1
and
T,n0
, as they are identical
under the assumption of exclusion restriction. Similarly, the two parameters
T,a1
and
T,a0
are identical under the same assumption. So we can use
T,a
to denote
38

both
T,a1
and
T,a0
. By following the same logic, we write
C,n
=
C,n1
=
C,n0
and
C,a
=
C,a1
=
C,a0
.
Equation (5.4) implies that the parameters
T,uz
and
C,uz
, which are associated
with the failure and censoring times for the subgroup with U
i
= u and assigned
treatment Z
i
= z, include the following components:
T,uz
= (
T,c1
,
T,c0
,
T,n
,
T,a
)
C,uz
= (
C,c1
,
C,c0
,
C,n
,
C,a
).
(5.5)
Note that no parameter for the subgroup with U
i
= d and Z
i
= 1 or 0 exists,
because under the assumption of Monotonicity, no defiers exist.
Thereby, because we have assumed monotonicity, up to two categories exist
for compliance type U
i
= u (knowing the value of Z
i
and D
i
(Z
i
)) (Table 3).
The parameter
u,zd
= P r(U
i
= u
|Z
i
= z, D
i
(z) = d), which is defined as the
conditional probability of compliance type U
i
given the assigned treatment Z
i
= z
and actual received treatment D
obs
= d, can now satisfy the following equations:
c,11
+
a,11
= 1 for u
{c, a} when Z
i
= 1 and D
i
(1) = 1,
c,00
+
n,00
= 1 for u
{c, n} when Z
i
= 0 and D
i
(0) = 0,
n,10
= 1
for u
{n}
when Z
i
= 1 and D
i
(1) = 0,
a,01
= 1,
for u
{a}
when Z
i
= 0 and D
i
(0) = 1.
(5.6)
The other
u,zd
for u
{c, n, a} and z, d = 1 or 0 have zero values.
Now the model parameters of interest
can then be specified as = (, ),
where
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
C,c1
,
C,c0
,
C,n
,
C,a
),
= (
c,11
,
c,00
).
(5.7)
In addition, using
u
to denote the probability of compliance type U
i
= u, i.e.
39

u
= P r(U
i
= u), and by following the theorem of total probability, we have:
P r(U
i
= u) =
d
{1,0} z{1,0}
P r(U
i
= u, D
i
(z) = d, Z
i
= z)
=
d
{1,0} v{1,0}
P r(U
i
= u
|D
i
(z) = d, Z
i
= z)P r(D
i
(z) = d, Z
i
= z).
(5.8)
Recall that
u,zd
= P r(U
i
= u
|D
i
(z) = d, Z
i
= z). Denote
zd
= P r(D
i
(z) =
d, Z
i
= z) to correspond to the probability of individuals with values (Z
i
=
z, D
i
(z) = d)) in the population. Then equation (5.8) can be re-written as:
u
=
d
{1,0} z{1,0}
u,zd
zd
.
(5.9)
By following this definition,
zd
can then be estimated by
zd
=
N
zd
N
. Here
N
zd
is the size of the subgroup in a trial where all individuals have Z
i
= z and
D
i
(z) = d, and N is the total number of individuals involved in the study trial.
Furthermore, we have P r(U
i
= u
|Z
i
= 1) = P r(U
i
= u
|Z
i
= 0) as randomisation
guarantees a balance between the treated (Z
i
= 1) and the control (Z
i
= 0)
groups. Therefore we can write
u
= P r(U
i
= u
|Z
i
= 1) = P r(U
i
= u
|Z
i
= 0).
Since the number of subjects who do or do not actually receive the treatment
in the treatment group (Z
i
= 1) are observable (denoted by N
11
and N
10
respec-
tively), we can then estimate
n
by
n
=
N
10
N
1
. If always-takers exist in a trial,
then the number of subjects N
01
in the control group who receive the treatment
is known. Then we have
a
=
N
01
N
0
. Also
c
+
a
+
n
= 1 because no defiers exist.
Then the probability of compliers can be estimated by
c
= 1
-
a
-
n
(O'Malley
40

& Normand, 2005). These can be expressed as:
n
=
N
10
N
1
;
a
=
N
01
N
0
;
c
= 1
-
a
-
n
.
(5.10)
Recall that individuals from the subgroup with Z
i
= 1 and D
i
(1) = 1 can
be written as i
A(1, 1). Their compliance type can be compliers, U
i
= c, or
always-takers, U
i
= a. Similarly, individuals from the subgroup with Z
i
= 0
and D
i
(0) = 0, denoted by i
A(0, 0), may either be compliers, U
i
= c, or
never-takers, U
i
= n. Then, following the definition of
u,zd
and by Table 3, we
have:
c,11
=
c
c
+
a
,
c,00
=
c
c
+
n
.
(5.11)
By following equation (4.35) and considering the relationships given in equa-
tion (5.11), the likelihood function L
F
(
) with respect to , can be replaced by
a product of l
F
(
) and l
F
(
). That is:
L
F
(
) = l
F
(
)l
F
(
),
(5.12)
where
l
F
(
) =
i=1
F
i
u,zd
(
T,uz
,
C,uz
),
l
F
(
) =
i=1
u,zd
.
(5.13)
Therefore, to maximise L
F
(
) is equivalent to maximise l
F
(
) and l
F
(
), sepa-
rately.
Because of the relationships given equation (5.11), which implies that
c,11
and
41

c,00
can be calculated directly through the value of
u
for u
{c, n, a}, we then
write:
= (
c
,
n
,
a
).
(5.14)
Therefore, to estimate parameter
= (, ) is equivalent to estimating (,
).
As we stated before, the MLE of
can be estimated by equation (5.10). Hence
we need only to maximise l
F
(
) for obtaining the MLE of . The EM algorithm is
employed here because an unobservable variable, compliance type U
i
, is involved
in the likelihood function l
F
(
).
In other words, there is only the first part of parameter
, denoted by need
to be further estimated through the EM algorithm. That is:
= (
T,uz
C,uz
);
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
C,c1
,
C,c0
,
C,n
,
C,a
).
(5.15)
The rest part of
, the parameter
can be regarded as a known parameter
when we estimate the parameter
. As a consequence, the parameter is also
known because of equation (5.11). For simplification of notation, in the remains
of this chapter we still use
rather than
to refer to the rest part of
.
6
EM algorithm
The EM algorithm is a generally applicable algorithm that provides an iterative
procedure for computing MLEs in such situations. Indeed, the EM algorithm is
a commonly used method of maximum likelihood to deal with incomplete data
(Laird & Louis, 1982; Baker, 1992; Meng & Rubin, 1991, 1993; Imbens & Rubin,
1997; O'Malley & Normand, 2005; Peng et al., 2004).
In general, the likelihood method requires the score equation to be solved.
42

Unfortunately, the likelihood function of
for incomplete data, as given in equa-
tion (4.37), cannot be formulated. This is because, at least for those individuals
assigned to the control group, their compliance type, U
i
, will never be observed.
The EM algorithm approaches this problem (of solving the incomplete data likeli-
hood score function) indirectly by proceeding iteratively in terms of the complete
data log likelihood function of
, denoted by log(L
F
(
)) (equation (5.12)).
6.1
The expectation step
According to Dempster et al. (1977), as introduced in Section 4.3.2, the first
step in the EM algorithm is to obtain the conditional expectation of log(L
F
(
)),
the Q-function Q(
,
k
). Now it is replaced by the conditional expectation of
log(l
F
(
)), the Q-function Q(,
k
). In other words:
Q(
,
k
) = E
u
{log(l
F
(
)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d), U
i
= u, ;
k
},
(6.1)
where u
{c, n, a} and l
F
(
) denotes the likelihood function with respect to
parameter
for the complete data that is given in equation (5.13). (T
i
, C
i
)
obs
indicates the observable part of survival outcomes (T
i
, C
i
), which is (T
obs
i
=
t, C
obs
i
> t) if the ith individual experienced the event at time t; or (T
obs
i
>
t, C
obs
i
= t) if the ith individual is censored at time t.
Note that equations (5.11) and (5.10) have shown that
is only determined by
the observed values of subgroup sizes, which are denoted by N
1
, N
10
, N
0
, and N
01
.
This implies that for a certain dataset, L
F
(
) would be a constant because in the
dataset N
1
, N
10
, N
0
, and N
01
are fixed. Hence, to maximise l
F
(
) is equivalent
to maximise L
F
(
|) = l
F
(
)l
F
(
) = Kl
F
(
) when is known. (K refers to
any constant here). So we give the likelihood function with respect to parameter
43

as l
F
(
) = L
F
(
|), which has the same structure as L
F
(
), the likelihood
function of
given in equation (4.36), but is viewed as a known vector. The
advantage of using this form of the likelihood function l
F
(
) = L
F
(
|) will be
seen in Chapter 7 when we create an extended PPO-survival model for a more
complicated situation in time-to-event study.
As compliance type, U
i
, can be treated as a discrete random variable with
c, a, n (three states), then, by the definition of expectation of discrete random
variables, equation (6.1) can be expressed as follows:
Q
i
(
,
k
)
=
u
{c,n,a}
P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d;
k
)
{log(l(, u)}
(6.2)
where l(
, u) is another form of l
F
(
), the likelihood function with respect to
for complete data, we give its form as follows:
l
F
(
) = L(, u|),
(6.3)
because L
F
(
) = L(, u).
In addition, we can further assume that compliance type, U
i
, follows the
Bernoulli distribution, given observed data T
i
(z), C
i
(z), D
i
(z).
This is true
because of the assumptions of compound exclusion restriction and monotonicity,
given (Z
i
= z, D
i
(z) = d). Recall that compliance type has up to two compli-
ance categories for this development (refer to Table 3). More details about the
structure of the Q-function Q(
,
k
), the expectation function, are given Section
8.2.
44

6.2
The maximisation step
After obtaining the formulation of Q-function Q(
,
k
) given the current param-
eter values
k
, the second step of the EM algorithm is to find a new estimate,
k+1
, which maximizes the Q-function Q(
,
k
). This implies that the following
inequality is always satisfied:
Q(
k+1
,
k
)
Q(,
k
)
for all
,
(6.4)
where
is the parameter space of
.
Further information on the process of maximisation for the non-ignorable
PPO-survival model is given in Section 8.3.
7
Model specifications
For the purpose of showing how our non-ignorable models work, we focus mainly
on the Weibull model as the survival distribution. This is due to its special fea-
tures. Firstly, the Weibull distribution is flexible for many different patterns of
data. This can be seen from its varying types of hazard curve (Klein, 1997; Cox
& Oakes, 1984). Secondly, the Weibull distribution is associated with relatively
simple survival, hazards and probability density functions (p.19, Cox & Oakes
(1984)). This makes the Weibull a very popular parametric model in conventional
survival analysis. Furthermore, the Weibull distribution is the only distribution
which satisfies both the accelerated failure time (AFT) and the Cox proportional
hazard (PH) models (Cox & Oakes, 1984). The log-normal model is another plau-
sible parametric distribution used in conventional survival analysis. We also spec-
ify the log-normal distribution for the non-ignorable PPO-survival model (Section
7.1). In this thesis, our parametric potential-outcome (PPO) models are specified
45

in terms of the Weibull and the log-normal distributions. Other distributions are
the topic of future research.
7.1
Specific non-ignorable PPO-survival models
When accounting for a censoring time distribution, the non-ignorable survival
model can be specified in more than two forms. This is true, even when the
Weibull and the log-normal are the only pre-specified distributions. Note that
failure and censoring times may follow different types of distribution. For instance,
suppose that the failure time follows a Weibull distribution and its associated
censoring time has an underlying log-normal distribution. In this case, the non-
ignorable survival model has to be specified as a NIGN-WL model; namely a
non-ignorable survival Weibull log-normal model.
However, when failure and
censoring times follow the log-normal and the Weibull distribution, respectively,
we have the NIGN-LW model. In this thesis, we focus, however, on specific non-
ignorable survival models with the same distributions for both the failure and
censoring times. These are clearly denoted as the NIGN-WW model and the
NIGN-LL model. Table 4 gives the nomenclature for the non-ignorable survival
model specifications.
Table 4: Specification of the non-ignorable PPO-survival models.
Distribution
non-ignorable survival model
Weibull
NIGN-WW model
log-normal
NIGN-LL model
46

7.2
Parameter of interest
in the specific models
The parameter
= (
T,uz
,
C,uz
) is associated with failure and censoring time
distributions, respectively, corresponding to four subgroups each. In total,
in-
cludes 8
× 2 = 16 components. This is true, as both a Weibull and a log-normal
distribution include two model parameters: the scale parameter and the shape
parameter for the Weibull; the mean and the variance of the logarithms of
survival outcome for the log-normal.
7.2.1
Parameter
in the NIGN-WW model
Suppose failure time T
0i
, corresponding to the baseline, follows a Weibull distribu-
tion with parameter
0
T
and
0
T
. By following Theorem 1 (proved in Appendix 10),
the parameter of most interest,
T
=
T,uz
, which is associated with the distribu-
tions of failure time and is specified in equation (5.15) (the first four parameters
corresponding to 4 possible subgroups), has the following relationships with the
parameters associated with the standard distribution for the baseline failure time,
0
T
and
0
T
, and the regression coefficients in AFT model, which are:
T,c1
= (
T,c1
,
T,c1
);
where
T,c1
=
0
T
exp(
(c)
+
(c)
) and
T,c1
=
0
T
,
T,c0
= (
T,c0
,
T,c0
);
where
T,c0
=
0
T
exp(
(c)
) and
T,c0
=
0
T
,
T,n
= (
T,n
,
T,n
);
where
T,n
=
0
T
and
T,n
=
0
T
,
T,a
= (
T,a
,
T,a
);
where
T,a
=
0
T
exp(
(a)
) and
T,c0
=
0
T
.
(7.1)
Similarly, we suppose that the censoring time C
0
, corresponding to the base-
line, also follows a Weibull distribution with parameters
0
C
and
0
C
. In this case,
47

C
=
C,uz
) can be specified as follows:
C,c1
= (
C,c1
,
C,c1
);
where
C,c1
=
0
C
exp(
(c)
+
(c)
) and
C,c1
=
0
C
,
C,c0
= (
C,c0
,
C,c0
);
where
C,c0
=
0
C
exp(
(c)
) and
C,c0
=
0
C
,
C,n
= (
C,n
,
C,n
);
where
C,n
=
0
C
and
C,n
=
0
C
,
C,a
= (
c,a
,
C,a
);
where
C,a
=
0
C
exp(
(a)
) and
C,c0
=
0
C
.
(7.2)
Consequently, the parameter
given in equations (5.15) for the NIGN-WW
model can be written as follows:
= (
T,uz
,
C,uz
),
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
).
(7.3)
Here
T
=
0
T
and
C
=
0
C
denote the shape parameters of the Weibull models
associated with the failure and censoring times (which correspond to the baseline),
respectively. The number of components involved in is now reduced from 16
to 10. Note that equations (7.1) and (7.2) show that these scale parameters
(corresponding to the four subgroups), namely
T,c1
,
T,c0
,
T,n
,
T,a
, have been
linked to
0
T
( the scale parameter in the Weibull distribution, associated with
the failure time for the baseline), through
(c)
,
(c)
,
(a)
, the regression coefficients
in the AFT model. However, we use the scale parameters associated with the
four subgroups, rather than the AFT regression coefficients because at this stage,
the former clearly describe model parameters in which each subgroup (either for
failure time or censoring time) has its own scale parameter and these subgroups
share the same shape parameter. Recall that these four subgroups are as follows:
Subgroup 1 and Subgroup 2 consist of compliers who are all assigned to the treated
group (U
i
= c and Z
i
= 1) or the control group (U
i
= c and Z
i
= 0), respectively;
48

Subgroup 3 includes never-takers (U
i
= a) only; and Subgroup 4 includes always-
takers (U
i
= a).
7.2.2
Parameter
in the NIGN-LL model
As in the NIGN-WW model, the parameter
given in equation (5.15) for the
NIGN-LL model can then be specified as follows:
= (
T,uz
,
C,uz
),
= (
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
).
(7.4)
The parameters
T
and
C
, given in equations (7.4), denote the variance pa-
rameters of the logarithm of the failure time log(T
0i
) (
(2)
T
) and the logarithm
of the censoring times log(C
0i
) (
(2)
C
) associated with the baseline
T
=
0
T
and
C
=
0
C
By following Theorem 1,
given in Appendix 10, the parameters associated with the failure time distribution
satisfy the following equations:
T,c1
= (
T,c1
,
T,c1
)
where
T,c1
=
0
T
+
(c)
+
(c)
and
T,c1
=
0
T
,
T,c0
= (
T,c0
,
T,c0
)
where
T,c0
=
0
T
+
(c)
and
T,c0
=
0
T
,
T,n
= (
T,n
,
T,n
)
where
T,n
=
0
T
and
T,n
=
0
T
,
T,a
= (
T,a
,
T,a
)
where
T,a
=
0
T
+
(a)
and
T,c0
=
0
T
.
(7.5)
Parameters associated with the censoring time distribution have the following
49

relationships:
C,c1
= (
C,c1
,
C,c1
)
where
C,c1
=
C
0
+
(c)
+
(c)
and
C,c1
=
0
C
,
C,c0
= (
C,c0
,
C,c0
)
where
C,c0
=
C
0
+
(c)
and
C,c0
=
0
C
,
C,n
= (
v,n
,
C,n
)
where
C,n
=
0
C
and
C,n
=
0
C
,
C,a
= (
C,a
,
C,a
)
where
C,a
=
0
C
+
(a)
and
C,c0
=
0
C
.
(7.6)
Table 5 shows the parameters involved in the NIGN-WW and the NIGN-LL
models.
Table 5: Non-ignorable PPO-survival model parameter vectors.
Model name
model parameter vector
NIGN-WW
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
NIGN-LL
T,c1
,
T,c0
,
T,n
,
T,a
,
T
,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
8
Computational issues
Since the compliance type variable, U
i
= u, is unobservable, those general MLE
methods are not easily implemented for the PPO-survival model. Alternatively,
as mentioned before (Section 6), the EM algorithm (Dempster et al., 1977) can be
employed for our developments. Since the posterior probabilities of compliance
type U
i
, P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d;
k
), have been involved in the
Q-function, Q(
,
k
), (equation (6.2)), we need to formulate it first before giving
the specified Q-function Q(
,
k
).
50

8.1
Maximization of the likelihood function using the EM
algorithm
8.1.1
Posterior probabilities of compliance type U
i
For convenience of expression, we use P
i
ac
, P
i
nc
, P
i
n
and P
i
a
to denote the related
posterior probabilities for non-censored R
i
= 1 associated with the following cases:
(1) Z
i
= 1, D
i
(1) = 1; (2) Z
i
= 0, D
i
(0) = 0; (3) Z
i
= 1, D
i
(1) = 0; and (4)
Z
i
= 0, D
i
(0) = 1. Note that in the first two cases given above, compliance type
U
i
can possible be two categories, say compliers (U
i
= c) or always-takers (U
i
= a)
for case (1) and compliers (U
i
= c) or never-takers (U
i
= n) for case (2); while in
case (3), compliance type U
i
can only be never-takers (U
i
= n); and in case (4),
compliance type U
i
can only be always-takers (U
i
= a) (Table 3). This is because
of the monotonicity assumption, which excludes defiers.
So we define P
i
ac
, P
i
nc
, P
i
n
and P
i
a
as follows:
P
i
ac
= P r(U
i
= c
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 1;
k
)
for R
i
= 1 ,
P
i
nc
= P r(U
i
= c
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 0;
k
)
for R
i
= 1 ,
P
i
n
= P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 0;
k
)
for R
i
= 1 ,
P
i
a
= P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 1;
k
)
for R
i
= 1 .
(8.1)
Similarly, we use Q
i
ac
, Q
i
nc
, Q
i
n
and Q
i
a
to denote the related posterior probabilities
for censored R
i
= 0 associated with the aforementioned four cases, they are defined
as the following:
Q
i
ac
= P r(U
i
= c
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 1;
k
)
for R
i
= 0 ,
Q
i
nc
= P r(U
i
= c
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 0;
k
)
for R
i
= 0,
Q
i
n
= P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
(1) = 0;
k
)
for R
i
= 0,
Q
i
a
= P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
(0) = 1;
k
)
for R
i
= 0 .
(8.2)
51

Note that when the event occurs (non-censored and denoted by R
i
= 1), the
observed data includes: failure time T
i
(z), censoring time (C
i
(z) > T
i
(z)), assigned
treatment Z
i
= z and actual received treatment D
i
(z), and when an individual is
censored (denoted by R
i
= 0), the observed data includes: censoring time C
i
(z),
censored failure time T
i
(z) > C
i
(z), assigned treatment Z
i
and actual received
treatment D
i
(z).
The formulations for the posterior probabilities corresponding to the last two
cases, namely Z
i
= 1, D
i
(1) = 0 ; and Z
i
= 0, D
i
(0) = 1, are as follows:
P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
= 0;
k
) = 1;
for R
i
= 1;
P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
= 1;
k
) = 1;
for R
i
= 1;
P r(U
i
= n
|(T
i
(1), C
i
(1))
obs
, Z
i
= 1, D
i
= 0;
k
) = 1;
for R
i
= 0;
P r(U
i
= a
|(T
i
(0), C
i
(0))
obs
, Z
i
= 0, D
i
= 1;
k
) = 1;
for R
i
= 0.
(8.3)
This is true, because no defiers exist and by the definition of compliance type
U
i
, we know that an individual is a never-taker if he/she is assigned to the treat-
ment group Z
i
= 1 but does not receive the treatment D
i
(1) = 0 or is an always-
taker if the individual is assigned to the control group Z
i
= 0 but for some reason
does receive the treatment.
Equation (8.3) gives the following equivalences:
P
i
n
= 1
for i
C(1, 0, 1),
Q
i
n
= 1
for i
C(1, 0, 0),
P
i
a
= 1
for i
C(0, 1, 1),
Q
i
a
= 1
for i
C(0, 1, 0).
(8.4)
where
C(z, d, r) denotes the subset of individuals, which is defined by the following:
C(z, d, r) = {i : Z
i
= z, D
i
= d, R
i
= r
} for all z, d, r {1, 0}. Now, we
52

formulate the posterior probability of compliance type U
i
for the first two cases,
i.e. Z
i
= 1 and D
i
(1) = 1, and Z
i
= 0 and D
i
(0) = 0. Since both failure and
censoring times are involved in the non-ignorable PPO-survival model, so too
are their posterior probabilities. As per Bayes' theorem, these can be written as
the formulae given in equations (8.5) and (8.6) (see next page). For the non-
ignorable PPO-survival model, P
i
ac
, P
i
nc
, Q
i
ac
and Q
i
nc
can then be written as the
equivalences shown in equation (8.7) on page 55.
53

Pr
(U
i
=
u|
(T
i
(1)
,C
i
(1))
obs
,Z
i
=1
,D
i
(1)
=
1
;
k
)
=
f
u
(1
,t
;
T,
u
1
)G
u
(1
,t
;
C,
u
1
)
u,
11
f
c
(1
,t
;
T,
c
1
)G
c
(1
,t
;
C,
c
1
)
c,
11
+
f
a
(1
,t
;
T,
a
1
)G
a
(1
,t
;
C,
a
1
)
a,
11
,if
R
i
=1
,T
i
(1)
=
t;
Pr
(U
i
=
u|
(T
i
(1)
,C
i
(1))
obs
Z
i
=1
,D
i
(1)
=
1
;
k
)
=
S
u
(1
,t
;
T,
u
1
)e
u
(1
,t
;
C,
u
1
)
u,
11
S
c
(1
,t
;
T,
c
1
)e
c
(1
,t
;
C,
c
1
)
c,
11
+
S
a
(1
,t
;
T,
a
1
)e
a
(1
,t
;
C,
a
1
)
a,
11
,if
R
i
=0
,C
i
(1)
=
t;
(8.5)
for
u
{
c,
a}
;
and
Pr
(U
i
=
u|
(T
i
(0)
,C
i
(0))
obs
,Z
i
=0
,D
i
(0)
=
0
;
k
)
=
f
u
(0
,t
;
T,
u
0
)G
u
(1
,t
;
C,
u
0
)
u,
00
f
c
(0
,t
;
C,
c
0
)G
c
(0
,t
;
C,
c
0
)
c,
00
+
f
n
(0
,t
;
T,
n
0
)G
n
(0
,t
;
C,
n
0
)
n,
00
,
if
R
i
=1
,
T
i
(0)
=
t;
Pr
(U
i
=
u|
(T
i
(0)
,C
i
(0))
obs
,Z
i
=0
,D
i
(0)
=
0
;
k
)
=
S
u
(0
,t
;
T,
u
0
)e
u
(0
,t
;
C,
u
0
)
u,
00
S
c
(0
,t
;
T,
c
0
)e
c
(0
,t
;
C,
c
0
)
c,
00
+
S
n
(0
,t
;
T,
n
0
)e
n
(0
,t
;
C,
n
0
)
n,
00
,
if
R
i
=0
,C
i
(0)
=
t;
(8.6)
for
u
{
c,
n}
.
54

P
i
ac
=
f
c
(1
,t
;
T,
c
1
)G
c
(1
,t
;
C,
c
1
)
c,
11
f
c
(1
,t
;
T,
c
1
)G
c
(1
,t
;
C,
c
1
)
c,
11
+
f
a
(1
,t
;
T,
a
1
)G
a
(1
,t
;
C,
a
1
)
a,
11
;
for
iC
(1
,1
,1)
P
i
nc
=
f
c
(0
,t
;
T,
c
0
)G
c
(1
,t
;
C,
c
0
)
c,
00
f
c
(0
,t
;
T,
c
0
)G
c
(0
,t
;
C,
c
0
)
c,
00
+
f
n
0
(0
,t
;
T,
n
0
)G
n
0
(0
,t
;
C,
n
0
)
n,
00
;
for
iC
(0
,0
,1)
Q
i
ac
=
S
c
(1
,t
;
T,
c
1
)e
c
(1
,t
;
C,
c
1
)
c,
11
S
c
(1
,t
;
T,
c
1
)e
c
(1
,t
;
C,
c
1
)
c,
11
+
S
a
(1
,t
;
T,
a
1
)e
a
(1
,t
;
C,
a
1
)
a,
11
;
for
iC
(1
,1
,0)
Q
i
nc
=
S
c
(0
,t
;
T,
c
0
)e
c
(0
,t
;
C,
c
0
)
c,
00
S
c
(0
,t
;
T,
c
0
)e
c
(0
,t
;
C,
c
0
)
c,
00
+
S
n
(0
,t
;
T,
n
0
)e
n
(0
,t
;
C,
n
0
)
n,
00
;
for
iC
(0
,0
,0)
.
(8.7)
55

For convenience, we use W
i
(u,
k
) to denote the posterior probability of com-
pliance type P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d,
k
) given the observed data
(and parameter values) for the non-ignorable PPO-survival model. That is:
W
i
(u,
k
) = P r(U
i
= u
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d;
k
),
(8.8)
where (T
i
, C
i
)
obs
is same as given in equation (6.1), and indicates the observable
part of survival outcomes (T
i
, C
i
). (T
i
, C
i
)
obs
is equivalent to T
obs
i
= T
i
(z) given
Z
i
= z, when R
i
= 1; (T
i
, C
i
)
obs
is equivalent to C
obs
i
= C
i
(z) given Z
i
= z, when
R
i
= 0; and
k
represents the given value of the relevant model parameters, the
components of which are:
(
T,c1
,
T,c0
,
T,n
,
T,a
, ,
C,c1
,
C,c0
,
C,n
,
C,a
,
C
)
(8.9)
for the NIGN-WW model and
(
T,c1
,
T,c0
,
T,n
,
T,a
, ,
C,c1
,
C,c0
,
C,n
,
C,a
,
c
)
(8.10)
for the NIGN-LL model.
The posterior probability W
i
(u,
k
) of the ith individual can then be written
56

as follows:
W
i
(u,
k
) =
(P
i
ac
)
I
{U
i
=c}
(1
- P
i
ac
);
I
{U
i
=a}
for i
C(1, 1, 1),
(P
i
nc
)
I
{U
i
=c}
(1
- P
i
nc
);
I
{U
i
=n}
for i
C(0, 0, 1),
(Q
i
ac
)
I
{U
i
=c}
(1
- Q
i
ac
);
I
{U
i
=a}
for i
C(1, 1, 0),
(Q
i
nc
)
I
{U
i
=c}
(1
- Q
i
nc
);
I
{U
i
=n}
for i
C(0, 0, 0),
P
i
n
= 1;
for i
C(1, 0, 1),
Q
i
n
= 1;
for i
C(1, 0, 0),
P
i
a
= 1;
for i
C(0, 1, 1),
Q
i
a
= 1;
for i
C(0, 1, 0);
(8.11)
where I
{} is an indicator function, and P
i
ac
, P
i
nc
, Q
i
ac
and Q
i
nc
denote the pos-
terior probabilities for the specific cases, given by equation (8.7) for the non-
ignorable PPO-survival model. Recall that
C(z, d, r) denotes the subset of indi-
viduals
C(z, d, r) = {i : Z
i
= z, D
i
= d, R
i
= r
} for all z, d, r {1, 0}.
8.2
Structures of the expectation function Q(
,
k
)
After defining the posterior probability, W
i
(u,
k
), which is given in equation
(8.11), the expectation equation, Q-function Q(
,
k
) specified in equation (6.2)
can then be re-written as:
Q
i
(
,
k
) =
u
{c,n,a}
W
i
(u,
k
)
{log(l
F
(
)}
=
u
{c,n,a}
W
i
(u,
k
)
{log(l(, u)},
(8.12)
where l
F
(
) is given in equation (4.36).
57

After substituting equations (8.7), (8.11) and (4.36) into equation (8.12),
Q(
,
k
) can be specified as in equation (8.13) overleaf.
58

Q
(
,
k
)=
i
Q
i
(
,
k
)=
iC
(1
,1
,1)
{
lo
g
(f
i
c
(1
,t
;
T,
c
1
)G
i
c
(1
,t
;
C,
c
1
)
c,
11
)
P
i
ac
+
lo
g
(f
i
a
(1
,t
;
T,
a
)G
i
a
(1
,t
;
C,
a
)
a,
11
)(
1
-
P
i
ac
)}
iC
(1
,1
,0)
{
lo
g
(S
i
c
(1
,t
;
T,
c
1
)e
i
c
(1
,t
;
C,
c
1
)
c,
11
)
Q
i
ac
+
lo
g
(S
i
a
(1
,t
;
T,
a
)e
i
a
(1
,t
;
C,
a
)
a,
11
)(
1
-
Q
i
ac
)}
iC
(0
,0
,1)
{
lo
g
(f
i
c
(0
,t
;
T,
c
0
)G
i
c
(0
,t
;
C,
c
0
)
c,
00
)
P
i
nc
+
lo
g
(f
i
n
(0
,t
;
T,
n
)G
i
n
(0
,t
;
C,
n
)
n,
00
)(
1
-
P
i
nc
)}
iC
(0
,0
,0)
{
lo
g
(S
i
c
(0
,t
;
T,
c
0
)e
i
c
(0
,t
;
C,
c
0
)
c,
00
)
Q
i
nc
+
lo
g
(S
i
n
(0
,t
;
T,
n
)e
i
n
(0
,t
;
C,
n
)
n,
00
)(
1
-
Q
i
nc
)}
iC
(1
,0
,1)
lo
g
(f
i
n
(1
,t
;
T,
n
)G
i
n
(1
,t
;
C,
n
)
n,
10
)
iC
(1
,0
,0)
lo
g
(S
i
n
(1
,t
;
T,
n
)e
i
n
(1
,t
;
C,
n
)
n,
10
)
iC
(0
,1
,1)
lo
g
(f
i
a
(0
,t
;
T,
a
)G
i
a
(0
,t
;
C,
a
)
a,
01
)
iC
(0
,1
,0)
lo
g
(S
i
a
(0
,t
;
T,
a
)e
i
a
(0
,t
;
C,
a
)
a,
01
)
.
(8.13)
59

8.3
Maximization of the expectation function Q(
,
k
)
As stated in Section 6, the M-step aims to maximise the expectation function
given in equation (8.13). To realise this goal, the score equation (McLachlan &
Krishnan, 1997; Casella & Berger, 2002) is often used in the M-step. Specifically,
for the ith individual, we take the first partial derivative of Q
i
(,
k
), with respect
to the parameter in equation (8.13) for the non-ignorable PPO model. Then
the i
th
score equation is:
S
obs
i
(
) =
Q
i
(
,
k
) =
u
{c,n,a}
W
i
(u,
k
)
logl
i
(
, u).
(8.14)
As before, let S
i,
(u,
) denote
logl
i
(, u). Then the score equation for the
i
th
individual can be expressed as:
S
obs
i
(
) =
u
W
i
(u,
k
)S
i,
(u,
).
(8.15)
All parameters can then be estimated simultaneously by solving:
n
i
S
obs
i
(
) = 0.
(8.16)
Let
(k+1)
denote the solution of equation (8.16). We then substitute
(k)
in
Q(
,
(k)
) with
(k+1)
and repeat the M-step until convergence.
The models proposed in this thesis use a built-in function in MATLAB (7.4.0,
R2007a) (Higham & Higham, 2000). This function, fminsearch, is based on
the Nelder-Mead method (Lagarias et al., 1998) for multidimensional uncon-
strained minimisation.
We use it to minimise the negative expectation func-
tion
-Q(,
(k)
), which is equivalent to maximising the expectation function
Q(
,
(k)
).
60

8.4
Estimation of the standard errors
8.4.1
The standard error for the estimated model parameter
The EM algorithm relies only on the convergence of iterative procedures, in which
no gradient matrices are derived. As such, the EM algorithm does not provide the
information matrix required for traditional calculation of the asymptotic variance-
covariance matrix of the MLEs, from either the expected information matrix or
the observed information matrix (OIM). Consequently, no standard errors of the
MLEs can be obtained, even after possible adaptation of the EM algorithm Louis
(1982), however, proposed an approach for calculating the observed information
matrix (OIM) when the EM algorithm is used to find MLEs for so-called incom-
plete data problems. Louis's approach (Louis, 1982) only requires computation
of the complete-data gradient vector or computation of the second derivative ma-
trix, (which is not associated with the incomplete-data likelihood function). After
obtaining the OIM, the covariance matrix can then be calculated easily. Only
the inverse of the OIM is then required. The standard error of the MLEs vectors
can thus be obtained easily, in that the standard error is the square root of the
diagonals of the covariance matrix, where the latter is the inverse of the observed
information matrix (OIM).
Louis's method
1
Let Data
f ull
and Data
obs
denote the complete (full) and incomplete (observed)
data, respectively. The former includes the observed time (T, C)
obs
, (which is
equivalent to (Y (z), R(z)) in survival analysis, where Y (z) = min(T (z), C(z))
and R(z) is the censoring indicator), assigned treatment Z = z, received treat-
1
Equation (3.2 ) in Louis (1982) is strictly not correct. We believe that the rule of a non-
exchangeable vector order in the product was ignored. Details of our correction is given in
Appendix 10.
61

ment D(z) and the compliance type U . Thereby Data
f ull
= ((T, C)
obs
, Z =
z, D(z), U ). As compliance type U is unobservable, we have Data
obs
= ((T, C)
obs
, Z =
z, D(z)). By following equation (3.3) in Louis (1982), the observed information
matrix (OIM), I
obs
= I(^
), which can be viewed as the matrix at the MLEs,
where matrix I can be expressed in two parts: the conditional expected full data
OIM, I
f ull
, and the expected information for the conditional distribution of the
complete data I
f ull
|obs
. This is represented as follows:
I
obs
= I(^
) = I
f ull
(^
) - I
f ull
|obs
(^
).
(8.17)
Under the independent and identically distributed (i.i.d) assumption, these
two matrices I
f ull
and I
f ull
|obs
can be specified as:
I
f ull
= E
u
{-
n
i=1
2
2
logL
i
(
, u)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
} (8.18)
I
f ull
|obs
=
i
E
u
logl
F
i
(
)
T
logl
F
i
(
) |(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
+
i=j
E
u
logl
F
i
(
)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
T
×
E
u
logl
F
j
(
)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d, U
i
= u
(8.19)
where
T
denotes transpose. Consequently, the observed information matrix I
obs
62

can be expressed
2
, after correcting the classical method of Louis (1982) as:
I
obs
= I(^
)
=
n
i=1
E
u
-
2
2
logl
i
(
, u)|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d
|=^
-
i
E
u
logl
i
(
, u)
T
logl
i
(, u)
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d
|=^
-
i=j
E
u
logl
i
(, u)
|(T
i
, C
i
)
obs
, Z
i
= z, D
i
(z) = d
T
×
E
u
logl
j
(, u)
|(T
j
, C
j
)
obs
, Z
j
= z, D
j
(z) = d
|=^
.
(8.20)
Equation (8.20) (also see equation (D-28) in Appendix10) guarantees that the OIM
is symmetric. (Details are given on page 63.) As the inverse of I
obs
is the asymptotic
variance-covariance matrix of the MLEs, ^
, this can be denoted as V(^) = I
-1
(^
). Its
diagonal vector is the variance vector associated with each of the components of ^
and
is denoted by SE
^
. The standard error of ^
can then be calculated directly by taking
the square root of
2
^
, where
2
^
=
{V(^)}
ii
(i = 1, 2,
· · · , p), for p parameters.
8.4.2
The observed information matrix (OIM)
Note that the first term in equation (8.17) is equivalent to the second order partial
derivative of Q-function Q(
,
k
), which can be written as
2
Q(
,
(k)
)
2
and
is a interest
parameter vector with length p. If we write
= (
1
,
2
, . . . ,
p
), the element at the ith
row and the jth column in the matrix
2
Q(
,
(k)
)
2
, the second order derivative matrix
of Q-function Q(
,
(k)
) with respect to the ith and jth components of the interest
parameter
,
i
and
j
, can be written as follows:
2
Q(
,
(k)
)
2
=
2
Q(
,
(k)
)
i
j
ij
for i and j from 1 to p.
(8.21)
Since in the PPO-survival model, the interest parameter vector
, given in equation
2
This equation can be regarded as our correction of equation (3.2') of Louis (1982). Louis's
subscripts should specify that i = j rather than i < j.
63

(5.15), consists of the parameters associated with the distributions of failure and cen-
soring times for three subgroups. This leads to the fact of that majority of elements in
the matrix
2
Q(
,
(k)
)
2
are zeros. For instance, the second order partial derivative of the
Q-function,Q(
,
(k)
),
2
Q(
,
(k)
)
2
with respect to two scale parameters
1
=
T,c1
and
2
=
T,c0
(in Weibull PPO-survival model) is zero. That is:
2
Q(
,
(k)
)
i
j
=
2
Q(
,
(k)
)
1
2
= 0
(8.22)
As no an explicit relationship exists between
1
and
2
, which represent the scale pa-
rameter from two different subgroups here.
Even though in the same subgroup, however, with respect to the parameters asso-
ciated with the distributions of failure and censoring times,
2
Q(
,
(k)
)
i
j
can also be zero.
For example, let i = 1 and j = 5, in Weibull PPO-survival model we have
1
=
T,c1
and
5
=
C,c1
, where
2
Q(
,
(k)
)
1
5
= 0 always holds. This is because there is not an explicit
relationship between these two parameters
T,c1
and
C,c1
which are from two different
distributions. Recall that
T,c1
denotes the scale parameter of Weibull for failure time
in and
C,c1
denotes the scale parameter of Weibull for censoring time in subgroup1).
Therefore, in the PPO-survival model (including both Weibull and Log-normal
form), the first term of equation (8.20) can be simplified directly without knowing
the estimated value of parameter ^
. The simplified form of the matrix is given in Table
6 (page 65) where
i
and
j
denote any components of parameter
, for the specific
model, say the NIGN-WW or the NIGN-LL models,
is defined in Table 5 (on page
50). Note that Table 6 shows that in the matrix, [
2
Q(
,
(k)
)
i
j
]
10×10
, only few of elements
are not zero (denoted by
in the table) which can be calculated directly through the
standard likelihood function. In this thesis, the standard likelihood function were given
as either Weibull or log-normal form.
From Table 6, it is easy to see that [
2
Q(
,
(k)
)
i
j
]
10×10
, the second order partial deriva-
tive matrix of Q-function Q(
,
(k)
) is a symmetric matrix. In fact, the second term in
64

Table 6: The patten of second order partial derivative matrix of Q(
,
(k)
).
2
Q(
,
(k)
)
i
j
1
2
3
4
5
6
7
8
9
10
1
0
0
0
0
0
0
0
0
2
0
0
0
0
0
0
0
0
3
0
0
0
0
0
0
0
0
4
0
0
0
0
0
0
0
0
5
0
0
0
0
0
6
0
0
0
0
0
0
0
0
7
0
0
0
0
0
0
0
0
8
0
0
0
0
0
0
0
0
9
0
0
0
0
0
0
0
0
10
0
0
0
0
0
All these components of
are defined in Table 5 on page 50.
equation (8.20) also denotes a symmetric matrix as it is a summation of a product of
a vector and it's transpose; the third term in the same equation represents a symmet-
ric too, even though two different vectors were involved in the product term. This is
because that the later summation is taken over on i = j only, which means that those
single matrix that satisfied i = j was not included in the summation. Therefore, the
OIM, defined by equation (8.20) is really a symmetric matrix as their three components
are all symmetric matrices.
9
Application
This section gives demonstrations of using PPO-survival models via the NIGN-WW and
the NIGN-LL models which were used to re-analyse the HIP breast cancer data. Recall
that the background of this data was introduced in Section 2.
9.1
Pre-processing of the HIP breast cancer data
In the re-analysis of the HIP breast cancer data, two types of data subsets were used.
65

The original HIP data (Shapiro et al., 1988; Baker, 1998) includes 60,696 observa-
tions, in which the enrolment date and the last follow-up date for all individuals had
been recorded, as well as their survival status. The latter indicates whether an individ-
ual died of breast cancer during the observation period or not. Note that in the HIP
breast cancer trial, there were two examiners independently took the responsibility of
determining the course of each single case of death. For those individuals who died but
not due to breast cancer, they were treated as censored individuals in the trial (Shapiro
et al., 1988).
The observed time (from entry to death or last follow-up) was given in days in the
original HIP data. In this thesis, the survival status of an individual is the so-called
censoring indicator (denoted by R = 1 for death from breast cancer, 0 otherwise) and
observed times are counted in years, as in Baker (1998), which was obtained by the
following formula:
observed time (in years) =
days for entry to death or last follow-up
365
.
(9.1)
We note that 164 individuals in the original HIP data have the same date for their
enrolment and their last follow-up date. This implies that these individuals did not, in
fact, take part in the HIP study and thus should not be counted as "valid " individuals
in the HIP trial. Therefore, in this thesis, we focus on analyzing the subset of the HIP
data (60,696 - 160 = 60,532 women) rather than the original HIP data.
The HIP
18
dataset:
The first data subset studied follows Shapiro's report on the
HIP study (Shapiro et al., 1988), wherein the maximum observed time is fixed at 18
years. This is because Shapiro et al. (1988) purports that in the HIP study, the maxi-
mum follow-up period is 18 years. Thus, women whose observed times are greater than
18 years (in the original HIP data set) are all preset at 18. Their censoring indicators
are given as 0, as they can be viewed as censored at year 18 (after entry).
66

The HIP
7
dataset:
The second data subset is obtained by following the arguments
of Baker (1998). Baker mentions that:
"by seven years, the number of breast can-
cers in the control and screen groups have equalized, so presumably they represent all
breast cancers that could have benefitted from screening. " To reduce the variance of
estimators, Baker (1998) focused only on analyzing those individuals with breast cancer
diagnosed in their first seven years of follow-up. Baker also points out that the choice of
seven years (for the HIP study) was conservative; other authors have used earlier times
of so-called equalisation (Baker, 1998).
As with Baker (1998), we also chose to analyse the HIP data with seven years as
the maximum follow-up period. The benefit from the screening examination should
therefore be considered
"realised "
in seven rather than in eighteen years (Baker,
1998). Note that, as we are interested in investigating how the screening examination
impacts the longevity of women who are in general good health (rather than breast
cancer patients). Hence, unlike Baker (1998), in this thesis, we focus on analysing all
women in the HIP trial rather than those who had been diagnosed with breast cancer
during the observation period.
After truncating the observed time at 7 years, individuals whose observed times
were greater than seven years have been given 0 as their censored indicators value,
which shows that these individuals had not died at that time (7 years). Therefore, in
this HIP dataset, denoted by HIP
7
, the maximum observed time is 7 years; individuals
whose observed time are greater than 7 years (in the original HIP data) have R
i
= 0.
This indicates that individuals whose observed time has been truncated (in the HIP
7
data) did not die from breast cancer during the the follow-up period (7 years after
entry). Summary statistics for the three HIP datasets (the original HIP, HIP
18
and
HIP
7
) are given in Table 7 (on page 69).
As stated before, we focused on analysing the two aforementioned HIP data subsets,
namely HIP
18
and HIP
7
, with our NIGN-WW and NIGN-LL survival models. The
results are given in Section 9.2. Furthermore, for the purpose of comparison, these two
67

HIP data subsets are also re-analysed by the Frangakis-Rubin (F-R) model (Frangakis
& Rubin, 1999). To date, the F-R model has not yet been applied to any time-to-event
data in the literature. The results of the re-analysis of the HIP data subsets are given
in Section 9.3.
68

T
able
7:
Summary
statistics
of
HIP
data
sets
Original
HIP
data
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
N
60,696
30,131
30,565
20,147
9,984
40,549
Mean
of
observ
ed
time
¯ Y
i
14.051
14.1578
13.9457
14.905
12.6499
13.6266
Prop
ortion
of
ev
en
t
¯ R
i
0.0129
0.0123
0.0135
0.0123
0.0125
0.0133
Prop
ortion
of
screen
in
vited
¯ Z
i
0.4964
1
0
1
1
0.2462
Prop
ortion
of
screen
receiv
ed
¯ D
i
0.3319
0.6686
0
1
0
0
HIP
18
data
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
N
60,532
30,058
30,474
20,132
9,926
40,400
Mean
of
observ
ed
time
¯ Y
i
13.9756
14.0816
13.8709
14.8112
12.6018
13.5591
Prop
ortion
of
ev
en
t
¯ R
i
0.0126
0.0119
0.0134
0.0119
0.0119
0.0130
Prop
ortion
of
screen
in
vited
¯ Z
i
0.4966
1
0
1
1
0.2457
Prop
ortion
of
screen
receiv
ed
¯ D
i
0.3326
0.6698
0
1
0
0
HIP
7
data
T
otal
Screen
in
vited
Con
trol
Screen
receiv
ed
Screen
refused
No
Screen
receiv
ed
Z
=1
Z
=0
Z
=1
,D
=1
Z
=1
,D
=0
D
=0
Sample
size
N
60,532
30,058
30,474
20,132
9,926
4,0400
Mean
of
observ
ed
time
¯ Y
i
6.5503
6.5813
6.5197
6.7393
6.2607
6.4561
Prop
ortion
of
ev
en
t
¯ R
i
0.0030
0.0021
0.0038
0.0020
0.0024
0.0035
Prop
ortion
of
screen
in
vited
¯ Z
i
0.4966
1.0000
0
1.0000
1.0000
0.2457
Prop
ortion
of
screen
receiv
ed
¯ D
i
0.3326
0.6698
0
1
0
0
Z
i
=
1
:
in
vitation
for
screening;
D
i
=
1
:
receipt
of
screening
and
R
i
=
1
:
death
from
breast
cancer.
69

9.2
HIP data analysis: via the non-ignorable censoring
mechanism PPO-survival models
For the HIP trial, it is assumed that whether a participant died from breast cancer or not
would not be related to any other participant's treatment status. We can then suppose
Assumption 1 (given in Section 3.2), the SUTVA assumption is satisfied. Assumption
2 (random assignment) clearly holds as all participants in the HIP trial were randomly
assigned to the screening or the control group (Shapiro et al., 1988). Some participants
assigned to the screening group refused to take the screening examination (Shapiro
et al., 1988; Baker, 1998). This implies that these individuals did not follow their
allocated randomised assigned treatment (Z
i
). Assumption 3, namely that of exclusion
restriction, can then be assumed. This means that the assigned treatment (Z
i
) affects
the outcome of interest only through the actual received treatment (D
i
(Z
i
)). We also
suppose Assumption 4, that of monotonicity, holds in the HIP trial, as it is more
acceptable to suppose that no defiers exist than to have defiers in a RCT (Angrist
et al., 1996; Frangakis & Rubin, 1999; Baker, 1998; Yau & Little, 2001; Peng et al.,
2004; O'Malley & Normand, 2005; Mattei & Mealli, 2007). Finally, we suppose that the
latent ignorability assumption holds in the HIP trial, i.e. that the potential failure time
and censoring time are independent, conditional on the latent compliance type. Hence,
all assumptions given in Section 3.2 are assumed satisfied in the HIP trial. In addition,
for all individuals in this trial, there are only two possible types of compliance behavior,
namely compliers or never-takers. Note that no woman assigned to the control group
could receive the screening examination, which implies that no always-takers exist in
the HIP breast screening trial. In other words, the assumption of no access to treatment
for the control (Loeys & Goetghebeur, 2003; Frangakis & Rubin, 1999) holds in the HIP
breast screening trial. However, it is not a necessary assumption for our PPO-survival
models (derived in Section 4).
70

9.2.1
Model specification for the HIP trial
Since no always-takers exist in the HIP trial, the number of compliance types in the
HIP data has thus been reduced to two, and the number of subgroups of interest in the
HIP data are thus reduced to three (Subgroup j ; j = 1, 2, 3), specified as follows:
Subgroup1 : compliers (U
i
= c) from the screening group (Z
i
= 1);
Subgroup2 : compliers (U
i
= c) from the control group (Z
i
= 0);
Subgroup3 : never-takers (U
i
= n) from either the screening (Z
i
= 1) or the control
(Z
i
= 0) group.
Consequently, for the HIP breast cancer data, the model parameter vector
, given
in equation (5.7), is simplified as follows:
= (, ); where
= (
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
),
and
= (
c,00
).
(9.2)
Recall that only
need to be estimated using the EM algorithm, is the interest of
model parameter.
Note that the parameters corresponding to the always-takers subgroup in equation
(5.7) are thus excluded in equation (9.2), because no always-takers exist in the HIP trial.
This can be denoted by
a
= 0, which indicates that the ith individual in the HIP trial,
has 0 (zero) probability of being an always-taker. Therefore, the parameter
c,11
, defined
by the first equation in (5.11), is fixed as 1 (because
a
= 0), i.e.
c,11
1. So for the
HIP data,
c,11
is clearly not a parameter that needs to be estimated. Furthermore, the
parameter
c,00
, defined by the second equation in (5.11) is equivalent to
c
, (
c,00
c
)
because
c
+
n
1 in the case of the HIP study.
Hence, for the NIGN-WW model, which assumes that both failure and censoring
time follow a Weibull distribution, the interest model parameter vector,
W W
, includes
71

eight elements, i.e.:
W W
= (
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
)
T
,
(9.3)
where
c1
,
c0
and
n
represent the scale parameters of three Weibull distributions for
failure time (T
i
) associated with the relevant subgroups (Subgroup1, Subgroup2 and
Subgroup3), with
T
as the common shape parameter for these Weibull distributions;
and
C,c1
,
C,c0
and
C,n
represent the scale parameters of three Weibull distributions
for censored time (C
i
) associated with the same three relevant subgroups (Subgroup1,
Subgroup2 and Subgroup3) and with
c
as the common shape parameter for these
distributions.
When the log-normal is assumed to be the distribution for failure and censoring
time , the interest NIGN-LL model parameter vector, denoted by
LL
, is as follows:
LL
= (
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
),
(9.4)
where
c1
,
c0
and
n
denote the mean of the logarithm of failure time, corresponding to
the three extant subgroups (Subgroup1, Subgroup2 and Subgroup3);
T
is their common
variance (of the logarithm of failure time); and
C,c1
,
C,c0
,
C,n
and
C
represent the
parameters corresponding to the logarithm of censoring time.
The model parameters corresponding to the specific non-ignorable PPO-survival
models (for the HIP trial) are given in Table 8.
Table 8: Non-ignorable PPO-survival model parameters for the HIP trial.
Model name
model parameter vector
NIGN-WW
(
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
)
NIGN-LL
(
T,c1
,
T,c0
,
T,n
,
T
,
C,c1
,
C,c0
,
C,n
,
C
)
The estimated values of the interest model parameter vectors
, associated with the
two non-ignorable survival models, namely the NIGN-WW and NIGN-LL models, are
72

given in Appendix10. Specifically, Tables B-1 and B-2 (on page 118) give analogous
estimates for the HIP
18
data; Tables B-3 and B-4 (on page 119), for the HIP
7
data.
Details on the iteration process used in the EM algorithm (Dempster et al., 1977) for
these two specific non-ignorable PPO-survival models are detailed in Appendix
, where
Tables B-5 (on page 120) and B-6 (on page 121) give the iteration of convergence in the
EM algorithm for the HIP
18
dataset, and Tables B-7 (on page 122) and B-8 (on page
123), for the HIP
7
dataset.
9.2.2
Estimation of causal effects: CACE(t) and ACE(t)
Table 9: Estimated CACE(t) and associated standard errors per model.
Models
NIGN-WW model
NIGN-LL model
Data
HIP
18
Years
CACE (SE
CACE
)
CACE ( SE
CACE
)
1
0.19E-4 (0.10E-4)
0.15E-4 (0.07E-4)
5
3.56E-4 (1.86E-4)
5.22E-4 (2.19E-4)
10
12.6E-4 (6.57E-4)
17.9E-4 (7.17E-4)
15
26.2E-4 (13.7E-4)
32.1E-4 (13.2E-4)
Data
HIP
7
Years
CACE (SE
CACE
)
CACE (SE
CACE
)
1
0.66E-4 (0.24E-4)
0.62E-4 (0.25E-4)
5
14.8E-4 (3.83E-4)
15.7E-4 (4.02E-4)
For causal inference in time-to-event studies, we focus on estimating the causal
effects in terms of CACE(t) and ACE(t).
Tables 9 (on page 73) and 10 (on page 74) give estimated
CACE(t) and ACE(t) for the two non-ignorable PPO survival models (the NIGN-WW
and the NIGN-LL model). Their associated standard errors are also given in Tables 9
and 10, which are calculated by following the formulation given in equations (C-15) and
(C-24).
73
A

Table 10: Estimated ACE(t) and associated standard errors per model.
Models
N IGN
- W W model NIGN - LL model
Data
HIP
18
Years
ACE ( SE
ACE
)
ACE (SE
ACE
)
1
0.13E-4 (0.07E-4)
0.10E-4 (0.05E-4)
5
2.38E-4 (1.25E-4)
3.49E-4 (1.47E-4)
10
8.41E-4 (4.40E-4)
11.6E-4 (4.80E-4)
15
17.5E-4 (9.19E-4)
21.5E-4 (8.81E-4)
Data
HIP
7
Years
ACE (SE
ACE
)
ACE (SE
ACE
)
1
0.44E-4 (0.16E-4)
0.41E-4 (0.16E-4)
5
9.90E-4 (2.56E-4)
10.5E-4 (2.69E-4)
Since CACE(t) and ACE(t) both are the functions of the MLE and appealing to
large-sample normality of the MLE, the 95% lower and upper confidence limits (LCL
and UCL) of CACE(t) are calculated easily as follows:
LCL
cace
(t) = CACE(t)
- 1.96SE
cace
(t)
U CL
cace
(t) = CACE(t) + 1.96SE
cace
(t).
(9.5)
Similarly, the 95% lower and the upper confidence limits (LCL and UCL) of ACE(t)
are calculated via the following formulae:
LCL
ace
(t) = ACE(t)
- 1.96SE
ace
(t)
U CL
ace
(t) = ACE(t) + 1.96SE
ace
(t).
(9.6)
The associated Z values of CACE(t) and ACE(t) are also given for all analyses. The
Z value is calculated by using the following equations:
Z
- value
cace
(t) =
CACE(t)
- 0
SE
cace
(t)
(9.7)
74

and
Z
- value
ace
(t) =
ACE(t)
- 0
SE
ace
(t)
;
(9.8)
where 0 is the assumed under the null mean value of CACE(t) or ACE(t). The p-
value derives from 1
- (Z - value) (one tail), where () denotes the standard normal
cumulative probability function.
Further details regarding the estimated ACE(t) and CACE(t), obtained via our
non-ignorable PPO-survival models for the HIP
18
and HIP
7
data sets are given in
Appendix10, where Tables B-10, B-11 give the estimated ACE(t) for the HIP
18
dataset
via the NIGN-WW and NIGN-LL model, respectively, and Tables B-13 and B-14 give
the estimated CACE(t) for the HIP
18
data set via the same two models. Tables B-17,
B-18, B-19 and B-20 are analogous for the analysis of the HIP
7
data set.
Based on these estimated values of ACE(t) and CACE(t), the curves of the estimated
CACE(t) and ACE(t) against survival time (years survival post entry) are plotted in
the following figures per data subset analysed. For the the HIP
18
data, Figures 1 and
2 (on page 76) show the estimated CACE(t) curves obtained using the NIGN-WW and
NIGN-LL model, respectively, and their associated ACE(t) curves are given in Figures 3
and 4 (on page 77); the corresponding curves of estimated CACE(t) and ACE(t) (with
the same names) for the HIP
7
data are given in Figures 5 and 6 (on page 78) and