Diploma Thesis, 2004, 101 Pages
1,0
“It is almost universally accepted that technological change and other kinds of innovations are the most important sources of productivity and increased material welfare - and that this has been so for centuries”. ^{ 1}
At the corporate level, it has come to be recognized that implementing and maintaining successful innovation management is the key contribution to competitiveness and future growth. For this reason, there is great interest in understanding the processes of innovation and their subsequent diffusion in order to formulate appropriate policies.
Within the last decades, researchers in management and marketing science have contributed greatly to the development of adoption- and diffusion theory by suggesting analytical models for describing and forecasting the diffusion of an innovation in a social system. The main reason for this has been the perceived high failure rate of new products and the consequent need to improve the related management and marketing decisions.
The reason why firms do not adopt a new technology immediately after its commercialisation (i.e. why diffusion is a time-intensive process) can be traced to the different theories of innovation diffusion advocated in the literature. According to early epidemic theories of inter-firm diffusion, ^{ 2} diffusion is a disequilibrium process resulting from information asymmetries between potential adopters. ^{ 3} In contrast to epidemic models, contemporary approaches to technology diffusion are characterised by the dismissal of information spreading as the key explanatory variable of innovation diffusion. ^{ 4} Rather, these models generally assume that firms behave optimally (i.e. are profit maximizers) and that information pertaining to the technological and economic characteristics of the innovation is perfect. Within this equilibrium approach, three categories of models have been developed in the literature: rank (or probit) models, stock (or game-theoretic) models, and order effect models.
In rank or probit models ^{ 5} potential adopters of technology have different inherent characteristics and as a result obtain different gross returns from its use.
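To make the rank-effect mechanism concrete, here is a minimal Python sketch. It rests on assumptions that are not in the source: firms' gross returns are normally distributed, and the cost of adopting the technology falls over time. All numbers are hypothetical. A firm adopts once the cost drops below its own return, so the adopted share at any cost level is a normal tail probability. This is why such threshold models are associated with the probit.

```python
from statistics import NormalDist

# Hypothetical assumption: gross returns across firms ~ Normal(mean=10, sd=2)
returns_dist = NormalDist(mu=10.0, sigma=2.0)

def adopted_share(cost: float) -> float:
    """Share of firms whose (heterogeneous) gross return exceeds the adoption cost."""
    return 1.0 - returns_dist.cdf(cost)

# Hypothetical cost path: the technology becomes cheaper every year
for year, cost in enumerate([16, 14, 12, 10, 8, 6, 4]):
    print(year, round(adopted_share(cost), 3))
```

Because firms differ only in their inherent characteristics, they reach the adoption threshold at different times even though every firm has perfect information.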
The essence of stock effect models is that the benefit to the marginal adopter from acquisition decreases as the number of previous adopters increases. ^{ 6}
Order effect models are similar to rank effect models in that the gross return of a firm adopting a new technology depends upon its position in the order of adoption, with firms higher in the order (i.e. earlier adopters) achieving a greater return than later adopters. ^{ 7}
Despite the continuing progress of contemporary approaches, the main impetus underlying diffusion research is still the epidemic Bass model. ^{ 8} Like the majority of other models, whether derived from it or developed independently, this model addresses the market in the aggregate. The typical variable measured is the number of adopters who purchase the product by a certain time t. The emphasis is on the total market response rather than on the individual adopter. Here, the individual characteristics of potential adopters and their impact on the decision process remain entirely unexplored. The focus of the analysis is not the individual who decides whether to adopt or reject an innovation, but the distribution of adoption decisions over time as a function of marketing variables. ^{ 9} These models cannot explain why a particular individual adopts or rejects an innovation at a specific point in time. Consequently, these models achieve no adequate aggregation of individual adoption decisions. Although the specific managerial implications of these models should not be questioned in general, they remain limited by the aggregate perspective they take.
In fact, diffusion theory faces a constant dilemma between disaggregate and aggregate diffusion modelling. Although it is unquestionable that the diffusion process is built upon individual adoption decisions, the conviction that diffusion
^{ 4} Gourlat, Pentecost (2000).
One reason undoubtedly lies in the substantial modelling obstacles that theory has faced so far in pursuing this.
Most models that shed light on individual adoption behaviour are static in nature and thus fail to capture the inherent dynamics of the diffusion process, which makes plausible aggregation nearly impossible. This dilemma has forced an explicit distinction between adoption theory and diffusion theory. Although this distinction is often taken to frame the sort of analysis that is performed, it is actually forced by the inability of most diffusion models to convincingly incorporate the individual perspective that is naturally inherent in the process.
Since the diffusion process is built upon individual adoption decisions, adoption theory should be recognized and modelled as the key foundation of diffusion theory rather than as a theory that differs from it conceptually and in content. The implication is that diffusion models that take the individual perspective simultaneously perform an adoption analysis.
Moreover, diffusion models based on individual adoption decisions offer an opportunity to study the actual pattern of social communication and its impact on product perceptions, preferences and ultimate adoption. Nonetheless, the first attempts to establish the diffusion process on the basis of individual adoption decisions faced severe problems in achieving the final aggregation. ^{ 10} Only the study by Chatterjee and Eliashberg (1989) provided encouraging empirical evidence for a useful aggregation of individual adoption decisions. ^{ 11} Indeed, it has been recognized only recently that the dilemma described above can be solved.
So-called event history data can capture the dynamics of the diffusion process while simultaneously preserving the individual perspective (micro level).
Eventually, with the introduction of hazard models ^{ 12} into diffusion theory, various micro models were found that could effectively deal with event history data and thus allowed individual heterogeneity among adopters to be considered by incorporating covariate effects into diffusion models. Up to now, most models from the widely applied field of event history analysis have been applied to diffusion theory, too. ^{ 13} It should be said, however, that these applications have taken place only recently, so the use of event history data is still a novel idea in diffusion theory.
The main reason for this may lie in the extent of data collection necessary to perform such an analysis. Especially in economic theory, where the necessity for event history data is not obvious, this may prove a serious obstacle; keeping track of each individual and his adoption decision is undoubtedly a more challenging task than simply taking the aggregate approach. Fortunately, with growing technological possibilities, the applicability of event history models has risen, too.
With the extension of the non-parametric classification and regression tree method (CART ^{ TM} ) ^{ 14} to the analysis of censored event data, we now have the opportunity to move research forward by examining the usefulness and applicability of that method for the analysis and forecast of innovation diffusion. The development of the so-called “survival trees” was strongly motivated by the need to develop meaningful prognostic rules in medical science. ^{ 15} As will be shown later, there are a number of essential parallels between survival analysis in medical science and diffusion analysis in economics. New methods emerging in that field are therefore likely to prove applicable in adoption- and diffusion theory (ADT), too.
As the CART ^{ TM} method itself is still new to economic theory, it should come as no surprise that no application of survival trees in an economic context is known so far. Indeed, even for the CART ^{ TM} method only two applications in an economic context are known. ^{ 16} Both methods, CART ^{ TM} and survival trees, were developed in medical science and seem to spread only slowly to other scientific areas. Economists and other non-medical scientists alike will have to be persuaded of the new insights that these methods offer. As for survival trees, this thesis is the first attempt to do so.
The method offers additional insights into causal relations that traditional methods fail to provide and could therefore represent a powerful contribution to modern diffusion theory. Its interpretive power makes it likely that the method will meet with widespread acceptance.
I want to briefly describe the structure of the thesis, which is already summarized in the table of contents, as I believe this will make the thesis easier to understand and more coherent. Additionally, I find it important that the reader is aware of the internal conventions underlying this thesis. By this, I mean simple decisions on formatting and terminology.
Let us start with the structure: In the course of the thesis, the survival tree method will be introduced within the context of ADT. For this reason, I will provide arguments in favour of dynamic micro models as a means to analyse and forecast innovation diffusion (section 2.1).
As event history data enables us to do this, I will set up the common concepts and ideas of event history data modelling as well as the classical methods from this area, all within the context of ADT (2.2, 2.3, and 2.4). This is intended to convey an understanding of the interpretation and functionality of event history patterns within the ADT context and is considered essential for understanding the survival tree method and its usefulness in forecasting innovation diffusion.
Survival trees have been derived from CART ^{ TM} and consequently both methods share essential conceptual features. After a general introduction to the CART ^{ TM} methodology (3.1) and a first introduction to the area of survival trees (3.2), I will attempt to classify the proposals that have been made for the construction of survival trees into three building blocks that are commonly used in the construction of CART ^{ TM} (3.3).
Subsequently, the various proposals for the construction of survival trees will be evaluated, and the merits as well as the deficiencies of the method will be discussed (3.4).
I will describe in detail the software applications available for survival tree calculations to facilitate future work on them (4.1). The data to which the method is applied are presented and the way the data were handled is documented (4.2) before I state which of the various options was chosen (4.3).
Analysing the results, we will see whether the method can offer new insights into ADT and whether the previously discussed merits and deficiencies of the method hold true or have to be reconsidered in this context (4.4).
Then, I will discuss the central question of the method's usefulness for forecasting innovation diffusion. I will try to relate the method's results and their implications to economic practice. Other related issues and thoughts will be discussed as well (4.5). Finally, the main patterns and findings of the thesis will be summarized (4.6).
Let me now explain the internal conventions of the thesis, i.e. the measures that were taken to improve its usability and readability.
The problem of inconsistent terminology is particularly apparent in event history analysis. If we take, for instance, the term “event history data”, we can easily find at least five other terms, all used interchangeably, which may sometimes hamper understanding substantially. I will thus point out these cases when they appear and state explicitly which of the various terms I will use. Additionally, I have compiled an index of synonyms in Appendix 7.3 to prevent any confusion.
Further confusion is likely to be caused by the various terms used in ADT. No definite rule can be established as to whether one should use adoption theory or diffusion theory for a specific field under investigation. In this thesis, I claim that these two areas essentially belong together. I will therefore make no distinction between them and use the single term adoption- and diffusion theory (ADT) throughout this thesis.
Besides, there is no generally agreed structure in this area as to which model belongs to which class of models. The classification of models into micro and macro, static and dynamic models is by no means generally agreed upon and was adapted from Litfin (2000).
For easier readability and in order to put emphasis on sentences that I consider vital, I will format the respective text in bold or italics. Words representing important issues are formatted in bold to make them easier to locate. ^{ 17} Italic formatting is used for sentences that I consider vital for overall understanding.
I have noticed that the literature on survival trees has picked up momentum, especially in 2003 and 2004. This made it difficult to incorporate all new literature into the thesis, as it was published while this thesis was being written. Nevertheless, I believe I have succeeded in including all literature published up to the end of November 2004.
Sometimes, I will sum up findings or provide a brief outlook at the very beginning of a section. I do this to make sure one does not lose track of the findings and is always aware of why a certain section was written.
In this chapter, I will provide reasons why innovation diffusion analysis and forecasting should be performed on the basis of dynamic micro models. These models can be established only on the basis of event history data. As all models from the area of event history analysis are either directly or indirectly based on the hazard rate framework, I will establish this framework to ease understanding of the upcoming presentation of the various parametric, semi-parametric and non-parametric models.
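For reference, the hazard rate framework is commonly written as follows. This uses standard textbook notation, which is assumed here rather than taken from this thesis: T is the (continuous) time until adoption, S(t) is the probability of not having adopted by t, and f(t) is the density of T. The hazard h(t) is the instantaneous rate of adoption among those who have not yet adopted.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad S(t) = P(T > t),
\qquad h(t) = \frac{f(t)}{S(t)},
\qquad S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```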
For the upcoming introduction of survival trees, it is important to understand the conceptual parallels between diffusion theory and survival analysis. These parallels allow us to use models from the area of survival analysis for ADT.
“An innovation is an idea, practice or object that is perceived as new by an individual or another unit of adoption” ^{ 21} . Generally speaking, innovation diffusion theory addresses how new ideas, products and social practices spread within a society or from one society to another. Adoption theory, in turn, analyzes the process of innovation adoption by an individual. Both theories aim to identify explanatory variables that drive and determine the respective process. The adoption process of each individual can differ in starting point and duration. In this way, the adoption decisions of the members of a social system are spread across time. Consequently, adoption theory forms the foundation of innovation diffusion theory and is thus part of it.
While, by definition, adoption theory is mainly concerned with the exploration of the determinants of adoption, the diffusion theory focuses on the aggregate analysis of all adoption decisions of the members of a social system.
However, since the diffusion process is built upon individual adoption decisions, adoption theory should be recognized and modelled much more as the key foundation of diffusion theory rather than as a theory that is conceptually and in
The diffusion of an innovation has traditionally been defined as the process by which “an innovation is communicated through certain channels over time among the members of a social system” ^{ 23} . This definition, with its reference to innovation, communication (and the respective communication channels), time and the members of a social system, names the four key components widely recognized as driving innovation diffusion. Although the diffusion process is undoubtedly dynamic, the majority of the models that have emerged in diffusion theory have captured this essential feature only insufficiently. ^{ 24} Empirical research for the analysis and forecast of the diffusion process is still dominated by aggregate diffusion models that mostly aim to capture the influence of marketing variables on the success of an
These approaches are convenient in practical terms, but they raise the following question: Can a genuine diffusion model be constructed by aggregating demand from consumers who behave in the neoclassical way? That is, what if consumers are smart and not just carriers of information? They would then maximize some objective function, such as the expected utility or benefit from the product, taking into account the uncertainty associated with their understanding of its attributes, its price, pressure from other adopters to adopt it, and their budget. Because the decision to adopt is individual-specific, not all potential adopters have the same probability of adopting the innovation in a given time period. Is it possible to develop the adoption curve at the aggregate market level, given the heterogeneity among potential adopters in terms of adopting the innovation at any time t? ^{ 25}
In fact, aggregate models cannot explain why an individual adopts or rejects an innovation at a specific point in time. As a result, these models achieve no adequate aggregation of individual adoption decisions, and analysis and forecast of adoption processes by means of these models is hardly convincing. While attempts have been made to disaggregate the aggregate level by classifying adopters ex post into a scheme, they could not eliminate the shortcomings of the underlying assumption of adopter homogeneity. ^{ 26}
The general scheme used for adopter classification is that of Rogers. Rogers divided individual responses to technology into five ideal categories: innovators, early adopters, early majority, late majority, and laggards. ^{ 27} According to him, the main concern of the innovation diffusion research is how innovations are adopted and why innovations are adopted at different rates. Furthermore, he identified five characteristics of innovations that help to explain differences in adoption rates: relative advantage, compatibility, complexity, trialability, and observability. His work has become fundamental to innovation diffusion research and has been documented and quoted in many papers and books.
Although a wide variety of innovations and diffusion processes have been investigated, one research finding keeps recurring. If the cumulative adoption time path or temporal pattern of the diffusion process is plotted, the resulting distribution can generally be described as taking the form of an s-shaped (sigmoid) curve. ^{ 28} The observed regularity in the diffusion process results from the fact that initially only few members of the social system adopt the innovation in each time period. In subsequent time periods, however, an increasing number of adoptions per period occurs as the diffusion process begins to unfold more fully. Finally, the trajectory of the diffusion curve slows down and begins to level off, ultimately reaching an upper asymptote. At this point diffusion is complete. ^{ 29}
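The S-shaped pattern can be reproduced with the aggregate Bass model mentioned earlier. The following Python sketch uses the standard closed-form cumulative adoption share of the Bass (1969) model. The coefficients p (innovation) and q (imitation) are hypothetical values chosen only for illustration and are not estimates from any study.

```python
import math

def bass_cumulative(t: float, p: float, q: float) -> float:
    """Cumulative share F(t) of the market potential that has adopted by time t
    in the Bass model (p = coefficient of innovation, q = coefficient of imitation)."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

# Hypothetical coefficients, for illustration only
p, q = 0.03, 0.38
curve = [bass_cumulative(t, p, q) for t in range(0, 21)]

# Print every fourth period: slow start, rapid middle phase, levelling off near 1
for t in range(0, 21, 4):
    print(t, round(curve[t], 3))
```

The adoptions per period (the differences between successive values of F) first rise and then fall, which is the sigmoid shape described above. Because the model describes only the market as a whole, it says nothing about which individual adopts when.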
In entrepreneurial reality, information about the diffusion process is crucial to the success of new product marketing. If this information is provided only at the aggregate level, however, the marketing implications are limited: a company will not know whom to target to drive the diffusion process forward. These shortcomings may have led to an unquantifiable waste of resources, as companies are likely to have targeted late adopters in the early stages of the innovation's market launch and vice versa. A tool that can identify crucial target groups at every stage of the diffusion process is therefore of utmost importance in marketing. So far, there is no method that is capable of providing this insight.
Besides, the observed one-sided reliance on aggregate models may have led to a great number of incorrect diffusion prognoses. The most prominent example of an (ex post) erroneous forecast based on an aggregate model is described in a diffusion study by Berndt and Altobelli (1991) ^{ 30} . Other erroneous forecasts further illustrate the insufficient predictive power of these diffusion models. ^{ 31}
In practice, companies need information about target clients and the factors that drive their decisions; something aggregate models cannot provide. This growing recognition has materialized in a mounting demand for rapid integration of micro models to identify and analyse the adoption and diffusion process. Next to the widely used macro models, these micro models can contribute decisively to the analysis and forecast of adoption behaviour and the resulting diffusion process.
Even though the adoption behaviour is nothing but the disaggregated form of the diffusion process, the areas of adoption theory and diffusion theory have been largely separated so far. In fact, not all micro models can be used to analyse and forecast innovation diffusion.
Generally, all micro models consider the heterogeneity of individuals and allow for the integration of co-variables. There is only one type of model, however, that can adequately model censored event data in order to capture the dynamics of the diffusion process. Thus, I claim that only dynamic micro models can be used to forecast innovation diffusion adequately.
To illustrate this, a static model and a dynamic micro model will be compared. ^{ 32}
If the focus of analysis is on finding out whether a specific individual adopts or rejects an innovation at a specific point in time and which explanatory variables can be identified, then logistic regression is often employed. ^{ 33} This method explains one dichotomous dependent variable through a number of independent variables. Within the framework of ADT, the two values of the dichotomous variable can be labelled “adoption of innovation” and “rejection of innovation”, always with respect to one specific point in time. ^{ 34} Independent variables could be all sorts of individual characteristics. In ADT, one often differentiates between product-, adopter- and environment-specific independent variables. ^{ 35}
For logistic regression, the usual restrictive assumptions known from linear regression have to be made. ^{ 36} A violation of these premises can lead to distorted and inefficient estimates of the regression coefficients and eventually to invalid statistical inferences. Here, empirical research is still severely limited by multicollinearity and autocorrelation among the independent variables. Generally speaking, logistic regression establishes a functional relation between the probability that an event takes place (i.e. an individual adopts the innovation) and a number of predetermined explanatory variables (i.e. independent variables).
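As a minimal sketch of this functional relation, the logistic model maps a linear combination of covariates to an adoption probability between 0 and 1. The covariates and coefficient values below are hypothetical. In an actual study, the coefficients would be estimated from data, typically by maximum likelihood.

```python
import math

def adoption_probability(x: list[float], beta0: float, beta: list[float]) -> float:
    """Logistic model: P(adopt | x) = 1 / (1 + exp(-(beta0 + sum_j beta_j * x_j)))."""
    z = beta0 + sum(b * xj for b, xj in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical covariates: income (in 10k EUR) and a perceived-advantage score
# Hypothetical coefficients; not estimated from any data
beta0, beta = -4.0, [0.5, 0.8]
print(round(adoption_probability([4.0, 2.5], beta0, beta), 3))
```

Each exp(beta_j) can be read as the factor by which the odds of adoption change when x_j increases by one unit.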
In contrast to linear regression, the observable dependent variable in this case is not metric but dichotomous. ^{ 37} Logistic regression quantifies and thus identifies the factors driving or preventing individual innovation adoption. Heterogeneity of individuals is respected and uncovered. Nevertheless, the characteristics of the process itself are not considered at all. With logistic regression, only the result of the adoption process can be revealed. All individuals who have adopted the innovation between the market launch and the end of the observation period are classified as adopters. Individuals who have not yet adopted or will never adopt the innovation are accordingly classified as non-adopters. There is no differentiation with respect to the specific point in time of adoption or the future possibility of adoption. Logistic regression ignores time and thus merely gives a snapshot of adoption behaviour and the diffusion process.
No valid conclusions can be drawn concerning future market potential, for instance. It is beyond question, however, that the method can establish elementary relations between adoption decisions and their determinants. Nevertheless, logistic regression does not take into account the duration between market launch and adoption. There is no difference between individuals who adopt the innovation shortly after market launch and those who adopt shortly before the observation period ends.
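The loss of timing information can be illustrated with a few hypothetical individuals. The sketch below reduces adoption months to the binary outcome that a logistic regression would use. The data and the 12-month observation period are assumptions for illustration only.

```python
# Hypothetical adoption times (months after market launch); None = no adoption observed
adoption_month = {"A": 1, "B": 11, "C": None, "D": 6}
observation_end = 12  # hypothetical length of the observation period in months

# A logistic regression only sees whether adoption happened by the end of the period
labels = {i: int(m is not None and m <= observation_end) for i, m in adoption_month.items()}
print(labels)  # A and B get the same label although B adopted ten months later
```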
Yet it appears only natural that, on average, “early adopters” exhibit a higher likelihood of adoption than “late adopters”. Neglecting this information reduces the accuracy and inferential power of static estimations. ^{ 38} Besides, it is the time-related observation of the adoption process in particular that enables predictions about future adoption behaviour and thus innovation diffusion. One solution could be to integrate time-to-adoption as an independent variable, but then one could only consider the individuals who have already adopted the innovation within the observation period. For the individuals who have not adopted in this period, no time-to-adoption duration can be determined, as we do not know whether and when they will adopt the innovation after the observation period ends. These observations are “censored”. Censored data can simply be ignored and filtered out of the analysis, but this leads to distorted estimates, which is why this approach should be abandoned in the presence of censored data. It is here that the so-called event history models come in.
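A small numeric sketch with hypothetical data shows why neither ignoring censored cases nor treating them as adopters is harmless. The "true" adoption times are assumed to be known here only to expose the bias. In practice, the late adopters' times are exactly what is unobserved.

```python
from statistics import mean

# Hypothetical "true" times to adoption in months (unknown in practice for late adopters)
true_times = [2, 4, 6, 8, 10, 14, 18, 24]
observation_end = 12  # observation period ends after 12 months

# What is actually observed: (duration, event indicator) with 1 = adoption, 0 = censored
observed = [(min(t, observation_end), int(t <= observation_end)) for t in true_times]

# Naive approach 1: filter censored cases out of the analysis
dropped = mean(d for d, event in observed if event == 1)
# Naive approach 2: treat the censoring time as if it were the adoption time
as_if_adopted = mean(d for d, _ in observed)

print(mean(true_times), dropped, as_if_adopted)
```

Both naive averages fall below the true mean time to adoption. Hazard models avoid this by letting censored cases contribute the information that the event had not occurred up to the censoring time.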
The general purpose of the analysis of event history data is to explain why certain individuals are at a higher risk of experiencing the event(s) of interest than others. ^{ 39} In general, this can be accomplished by using special types of methods which, depending on the field in which they are applied, are called failure-time models, lifetime models, survival models, transition rate models, response-time models or hazard models. ^{ 40} It should be noted, however, that event history data modelling originated in medical science. ^{ 41} Given this origin and the continuing dominance of survival analysis within event history data modelling, it is not surprising that all of the models introduced shortly were developed in this area and thus carry the corresponding names.
In hazard models, the risk of experiencing an event within a short time interval is regressed on a set of covariates. ^{ 42} Two special features distinguish hazard rate models from other types of regression models: they make it possible to include censored observations in the analysis and to use time-varying explanatory variables. Censoring is, in fact, a form of partially missing information: on the one hand, it is known that the event did not occur during a given period of time; on the other hand, the time at which the event occurred is unknown. Time-varying covariates may change their value during the observation period. The ability to include such covariates in the regression makes it a truly dynamic analysis.
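As an illustration of how censored observations can be used rather than discarded, the following minimal Python sketch implements the Kaplan-Meier estimator. This is a standard non-parametric survival analysis estimator chosen here for illustration. The function is our own sketch, not a library API, and the data are hypothetical. It estimates the probability of not yet having adopted by time t. Censored individuals count as at risk until the time they drop out.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t) = P(no adoption by t) from right-censored data.
    durations: observed times; events: 1 if adoption was observed, 0 if censored.
    Returns (time, S(t)) pairs at each time with at least one adoption."""
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0  # adoptions and censorings at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk  # conditional probability of surviving t
            curve.append((t, surv))
        n_at_risk -= d + c  # everyone with duration t leaves the risk set afterwards
    return curve

# Hypothetical data: months until adoption, 0 = censored at end of observation
durations = [2, 3, 3, 5, 6, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(durations, events):
    print(t, round(s, 3))
```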
In order to understand the nature of event history data and the purpose of event history analysis, it is important to understand the following four elementary concepts: state, event, duration, and risk period. These concepts are illustrated below, first using an example from the analysis of unemployment histories. ^{ 43}
The first step in event history analysis is to define the relevant states which can be distinguished. The states are the categories of the dependent variable, the dynamics of which we want to explain. At every particular point in time, each person occupies exactly one state. In the analysis of unemployment histories, four states are generally distinguished: employment, part-time employment, re-training, and unemployment. The set of possible states is sometimes called the state space.
An event is a transition from one state to another, that is, from an origin state to a destination state. In this context, a possible event is “first employment”, which can be defined as the transition from the origin state, unemployed, to the destination state, employed. Other possible events are taking up part-time employment or job retraining. It is important to note that the states which are distinguished determine the definition of possible events. If only the states employment and unemployment were distinguished, none of the above-mentioned events could have been defined. In that case, the only events that could have been defined would be becoming employed or becoming unemployed.
Another important concept is the risk period. Clearly, not all persons can experience each of the events under study at every point in time. To be able to experience a particular event, one must occupy the origin state defining the event, that is, one must be at risk of the event concerned. The period during which someone is at risk of (or exposed to) a particular event is called the risk period. For example, someone can only become unemployed if he or she was employed before. A closely related concept is the risk set. The risk set at a particular point in time is formed by all subjects who are at risk of experiencing the event at that point in time.
Keiding (2001), pp. 4956-4962 for more details.
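The risk set can be computed directly from spell data. In the minimal Python sketch below, each hypothetical spell records when a person entered the origin state, when the spell ended, and whether it ended with the event. A subject counts as at risk at time t if it entered the origin state before t and had not yet left it, either by the event or by censoring, before t.

```python
# Hypothetical spells: (person, entry into origin state, end of spell, event occurred?)
spells = [
    ("A", 0, 4, True),
    ("B", 0, 7, False),  # censored at t = 7
    ("C", 2, 9, True),   # enters the origin state (becomes at risk) only at t = 2
    ("D", 0, 3, True),
]

def risk_set(t, spells):
    """Subjects at risk at time t: entered before t and still in the origin state up to t."""
    return sorted(person for person, entry, end, _ in spells if entry < t <= end)

print(risk_set(3, spells))
print(risk_set(8, spells))
```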
Using these concepts, event history analysis can be defined “as the analysis of the duration of the non-occurrence of an event during the risk period” ^{ 44} . This duration is usually called an episode ^{ 45} . When the event of interest is “first employment”, the analysis concerns the duration of non-occurrence of a first employment. In practice, as will be shown below, the dependent variable in event history models is not duration or time itself but a rate.
Therefore, event history analysis can also be defined as the analysis of rates of occurrence of the event during the risk period. In the first employment example, an event history model concerns a person’s employment rate during the period that he or she is in the state of never having been employed.
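Treating the dependent variable as a rate can be made concrete in discrete time. The empirical hazard for period t is the number of events in t divided by the number of individuals still at risk at the start of t. The Python sketch below uses hypothetical monthly adoption data in (duration, event) form, where event = 0 marks a censored observation.

```python
# Hypothetical data in months: (duration, event) with 1 = adoption, 0 = censored
data = [(1, 1), (2, 1), (2, 0), (3, 1), (3, 1), (4, 0), (5, 1), (6, 0)]

def discrete_hazard(data, max_t):
    """Empirical hazard per period t: events in t / number at risk at the start of t."""
    rates = {}
    for t in range(1, max_t + 1):
        at_risk = sum(1 for d, _ in data if d >= t)
        events = sum(1 for d, e in data if d == t and e == 1)
        rates[t] = events / at_risk if at_risk else float("nan")
    return rates

print(discrete_hazard(data, 6))
```

Censored individuals remain in the denominator for every period up to their censoring time, so their partial information is used rather than discarded.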
A strong point of hazard models is that one can use time-varying covariates, i.e. covariates that may change their value over time. Examples of interesting time-varying covariates in the employment history example are an individual's financial status or health status. As a matter of fact, the time variable itself and interactions between time and time-constant covariates are time-varying covariates as well.
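Time-varying covariates are commonly handled by splitting each episode into sub-episodes at the points where a covariate changes. This is often called the start-stop or counting-process data format. The Python sketch below uses a hypothetical helper that does this for one person. The function name, data layout and example values are assumptions for illustration.

```python
def split_episode(person, end, event, changes):
    """Split one spell [0, end] into sub-episodes at covariate change times.
    changes: sorted list of (time, value) pairs whose first entry is at time 0.
    Returns rows (person, start, stop, covariate value, event flag); the event
    flag can only be 1 on the sub-episode that ends at the spell's end."""
    rows = []
    for k, (start, value) in enumerate(changes):
        if start >= end:
            break  # changes after the spell ends are irrelevant
        stop = changes[k + 1][0] if k + 1 < len(changes) else end
        stop = min(stop, end)
        rows.append((person, start, stop, value, int(event and stop == end)))
    return rows

# Hypothetical person: adopts at month 10; income class changes from "low" to "high" at month 6
for row in split_episode("A", 10, True, [(0, "low"), (6, "high")]):
    print(row)
```

Each row can then enter a hazard model with the covariate value that was valid during that sub-episode.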
We now have to fit the concepts described above into the area of ADT. In ADT, one generally distinguishes between two states: “adoption” and “non-adoption” of an innovation. The event is the adoption of an innovation, which can be defined as the transition from the origin state, non-adoption, to the destination state, adoption. This event pattern is called a “single non-repeatable event”, where the term single reflects that the origin state, non-adoption, can only be left by one type of event, and the term non-repeatable indicates that the event can occur only once. We will call models that have been developed for this type of event pattern single risk models ^{ 46} . The duration measures the time until an individual adopts an innovation. Logically, an individual does not necessarily adopt within the observation period or beyond it. Individuals who do not adopt within the observation period, and of whom we do not know whether or when they will adopt afterwards, produce censored data. I will describe this phenomenon later within the current context.
^{ 43} For an example of this type of analysis, see Heckman, Borjas (1980).
By and large, this is the sort of event history pattern known from survival analysis. In both fields, we observe the duration between some predefined point in time and one single (absorbing) event. In most cases, survival analysis investigates the duration between the beginning of treatment or hospitalization and the death of an individual. Ironically enough, both the adoption decision and the death of an individual are single non-repeatable events. As both processes share the same general event pattern, survival models are likely candidates for modelling and analysing adoption and diffusion processes. It should thus come as no surprise that all models that will be introduced come from survival analysis. In effect, the vast majority of models in survival analysis have been developed to model this type of event pattern. There are, indeed, other concepts, some of which may also be used in the context of ADT. I want to briefly introduce these to make the introduction more complete.
Sometimes it is necessary, or simply desirable, to distinguish between different types of events or risks. In the analysis of death rates, for example, one may want to distinguish between different causes of death. In ADT, a distinction between various causes of adoption decisions is equally conceivable.
When there is more than one possible destination state, so that individuals may experience different types of events, the standard approach is to use multiple risk or competing risk models. ^{ 47}
Most events studied in the social sciences are repeatable, and most event history data contain information on repeatable events for each individual. This is in contrast to medical research and to ADT, where the event of greatest interest, death or adoption respectively, can occur only once. Examples of repeatable events are job changes, births of children, arrests, or promotions. In an economic context, the investigation of (repeated) product purchase decisions may prove interesting. Often events are not only repeatable but also of different types; that is, we have a multiple state situation.
When people can move through a sequence of states, events can be characterized not only by their destination states, as in competing risk models, but also by their origin states. An example is, once again, an individual’s employment history: an individual can move between the states employed, unemployed, and out of the labour force. In that case, six different kinds of transitions can be distinguished, which differ with regard to their origin and destination states.
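The count of six follows because, with three states, each state can be left towards either of the two remaining ones, giving 3 × 2 = 6 ordered origin/destination pairs. A minimal Python sketch that enumerates them; the state labels are merely illustrative:

```python
from itertools import permutations

# Illustrative state labels for the employment example
STATES = ["employed", "unemployed", "out of labour force"]


def transitions(states):
    """Return all ordered (origin, destination) pairs with origin != destination."""
    return list(permutations(states, 2))


for origin, destination in transitions(STATES):
    print(f"{origin} -> {destination}")
print(len(transitions(STATES)))  # 6
```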
Hazard models for analysing data on repeatable events and multiple-state data are special cases of the general family of multivariate hazard models. Another application of multivariate hazard models is the analysis of dependent or clustered observations. ^{ 48} Examples are the occupational careers of spouses, the educational careers of brothers, or the mortality of children in the same family. Hazard rate models can easily be generalized to situations in which there are several origin and destination states and in which there may be more than one event per observational unit. ^{ 49}
After this general overview of other event history concepts, I want to stress again that in this thesis I will exclusively introduce and apply models for the analysis of single non-repeatable events (single risk models). Moreover, time-varying explanatory variables will not be considered.
Therefore, I will model the adoption- and diffusion process as having one origin state, non-adoption, and one final non-repeatable event, adoption. In doing so, I will analyse the impact that time-constant explanatory variables have on the dynamics of this process.
Let me now explain the statistical framework of event history analysis, which is essential to understanding hazard models regardless of the specific model chosen.
Hazard models have already been introduced into ADT. ^{ 50} Moreover, in an economic context, these models have been used to analyse and forecast business and firm survival (failure). ^{ 51} Hazard models do not analyse a snapshot of adoption behaviour at a single point in time; instead, they observe adoption over time and thereby take the characteristics of the process into account. For this purpose, one needs to know for each individual not only whether an event has taken place but also the duration until it occurred.
The duration is put into a functional relationship with explanatory variables, which can reflect an individual’s subjective perception of an innovation ^{ 52} as well as the characteristics of the decision-makers. In contrast to logistic regression, this approach not only yields the adoption probability at a specific point in time but, more importantly, allows these probabilities to be determined for each individual at any point in time. This enables a more realistic forecast of adoption behaviour. Finally, aggregating the individual probabilities yields the macro level, which illustrates the diffusion process over time.
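A minimal sketch of the aggregation step, assuming (purely for illustration) that each individual’s time to adoption is exponentially distributed with rate $\lambda_i = \lambda_0 \exp(\beta x_i)$; the values of $\lambda_0$, $\beta$ and $x_i$ are hypothetical and not taken from the thesis. Each individual’s probability of having adopted by time $t$ is computed, and averaging these probabilities over all individuals gives the expected share of adopters, i.e. a diffusion curve:

```python
import math


def adoption_probability(t: float, rate: float) -> float:
    """Pr(adopted by t) for an exponential duration (illustrative assumption)."""
    return 1.0 - math.exp(-rate * t)


def expected_adopter_share(t: float, rates) -> float:
    """Macro level: average of the individual adoption probabilities at time t."""
    return sum(adoption_probability(t, r) for r in rates) / len(rates)


# Hypothetical individual rates lambda_i = LAMBDA_0 * exp(BETA * x_i)
LAMBDA_0, BETA = 0.1, 0.5
covariates = [0.0, 1.0, 2.0, -1.0]
rates = [LAMBDA_0 * math.exp(BETA * x) for x in covariates]

diffusion_curve = [expected_adopter_share(t, rates) for t in range(0, 11)]
for t, share in enumerate(diffusion_curve):
    print(f"t={t:2d}  expected share of adopters={share:.3f}")
```

Because each individual probability is rising in $t$, their average is rising too, which gives the familiar cumulative diffusion curve at the macro level.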
The process under study (i.e. the adoption process) starts with the market placing of the innovation and ends with the adoption of a sample member at time $t_i$. The duration of an episode is represented by a non-negative continuous random variable $t_i$ for the $i$th sample member. This implies that the time-to-event duration is interpreted as the realisation of a random process. ^{ 53}
As noted, the time-to-event duration $t_i$ depends on a number of explanatory variables, which are combined in the vector $X_i$. The duration $t_i \geq 0$ of episode $i$ follows a specific distribution that is represented by the distribution function $F(t_i \mid X_i)$. The respective density function is given by $f(t_i \mid X_i)$. The observation period has length $T$ and covers the interval $(0, T]$.
The following relation between the density function and the cumulative distribution function can be established:
$$F(t_i \mid X_i) = \Pr(T_i \le t_i \mid X_i) = \int_0^{t_i} f(v \mid X_i)\, dv \tag{2-1}$$
and, under the assumption that the density function is continuous:
$$f(t_i \mid X_i) = \frac{dF(t_i \mid X_i)}{dt_i} \tag{2-2}$$
In the context of hazard models, the so-called survivor function ^{ 54} plays an important role, too. This function is defined as
$$S(t_i \mid X_i) = 1 - F(t_i \mid X_i) = \Pr(T_i > t_i \mid X_i) \tag{2-3}$$
and represents the probability that the $i$th member experiences (i.e. “survives”) the point in time $t_i$, which is equivalent to the probability that the member has not yet adopted the innovation at this point.
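A short numerical sketch of relation (2-1), of the derivative relation between $F$ and $f$, and of the survivor function (2-3). It assumes, for illustration only, a Weibull distribution for $t_i$ with hypothetical shape 1.5 and scale 5, and omits covariates: integrating the density reproduces the distribution function, differentiating the distribution function recovers the density, and $S = 1 - F$ starts at 1 and declines towards 0.

```python
import math

# Hypothetical Weibull duration: shape K and scale LAM are illustrative values
K, LAM = 1.5, 5.0


def density(t: float) -> float:
    """f(t): Weibull density."""
    return (K / LAM) * (t / LAM) ** (K - 1) * math.exp(-((t / LAM) ** K))


def cdf(t: float) -> float:
    """F(t) = Pr(T <= t) in closed form."""
    return 1.0 - math.exp(-((t / LAM) ** K))


def survivor(t: float) -> float:
    """S(t) = 1 - F(t) = Pr(T > t), eq. (2-3)."""
    return 1.0 - cdf(t)


def integrate(func, a: float, b: float, n: int = 10_000) -> float:
    """Trapezoidal rule, used to evaluate the integral in eq. (2-1)."""
    h = (b - a) / n
    inner = sum(func(a + i * h) for i in range(1, n))
    return h * (0.5 * (func(a) + func(b)) + inner)


t = 4.0
print(integrate(density, 0.0, t), cdf(t))  # (2-1): F is the integral of f
eps = 1e-6
print((cdf(t + eps) - cdf(t - eps)) / (2 * eps), density(t))  # dF/dt equals f
print([round(survivor(x), 4) for x in (0.0, 2.0, 5.0, 10.0, 20.0)])  # 1.0 falling to 0
```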
Depending on the distribution assumed for $t_i$ across the members of the sample, there are a number of different survivor functions, all of which share one feature: they decline monotonically (never increase) as time proceeds. Translated into the adoption context, this means that the probability of non-adoption decreases and the probability of adoption increases with time. Furthermore, the survival probability is 1 for a duration of 0 and 0 for an infinite duration. Between these two extremes, however, the course of the function can differ, and explanatory variables can either accelerate or delay the decline of the survival probability. When time is measured continuously, we obtain the following relation:
$$S(t_i \mid X_i) = 1 - F(t_i \mid X_i) = \int_{t_i}^{\infty} f(v \mid X_i)\, dv \tag{2-4}$$
The hazard rate ^{ 55} is defined as
$$h(t_i \mid X_i) = \lim_{\Delta t_i \to 0} \frac{1}{\Delta t_i} \Pr(t_i \le T_i < t_i + \Delta t_i \mid T_i \ge t_i, X_i), \qquad \Delta t_i > 0 \tag{2-5}$$
The hazard rate quantifies the conditional probability (i.e. the risk or hazard) that the event “adoption” occurs for the $i$th member at time $t_i$. As time is a continuous variable, the probability that adoption occurs at exactly one point in time is 0. For this reason, a very small time interval $[t_i, t_i + \Delta t_i)$ is observed instead of a single point in time. The hazard rate function completely describes the probability distribution of the time until an event.
Furthermore, it is required that no adoption took place prior to that time interval; otherwise, the risk of adoption would be meaningless, since adoption can occur only once. To prevent the hazard rate from depending on the length of the time interval, two measures are taken: first, only a small time interval is considered and, second, the probability is divided by the interval length $\Delta t_i$. ^{ 56}
Hence, the hazard rate can be interpreted as the limiting value, per unit of time, of the conditional probability that adoption takes place within the time interval $[t_i, t_i + \Delta t_i)$, given that no adoption took place before the beginning of the interval and given the vector $X_i$.
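To make the limit in (2-5) concrete, the sketch below evaluates the expression inside the limit, $\Pr(t \le T < t + \Delta t \mid T \ge t)/\Delta t = (S(t) - S(t + \Delta t))/(S(t)\,\Delta t)$, for shrinking interval lengths. It assumes a hypothetical Weibull distribution (shape 1.5, scale 5, no covariates), chosen only because its hazard has the closed form $h(t) = (K/\lambda)(t/\lambda)^{K-1}$, so the approximation can be checked against the exact limit:

```python
import math

# Hypothetical Weibull duration (illustrative shape and scale, not estimated)
K, LAM = 1.5, 5.0


def survivor(t: float) -> float:
    """S(t) = Pr(T > t) for the Weibull distribution."""
    return math.exp(-((t / LAM) ** K))


def hazard_exact(t: float) -> float:
    """Closed-form Weibull hazard h(t) = (K/LAM) * (t/LAM)**(K-1)."""
    return (K / LAM) * (t / LAM) ** (K - 1)


def hazard_approx(t: float, delta: float) -> float:
    """Pr(adoption in [t, t+delta) | no adoption before t) divided by delta,
    i.e. the expression inside the limit of eq. (2-5)."""
    conditional_prob = (survivor(t) - survivor(t + delta)) / survivor(t)
    return conditional_prob / delta


t = 4.0
for delta in (1.0, 0.1, 0.01, 0.001):
    print(f"delta={delta:<6} approx={hazard_approx(t, delta):.5f}")
print(f"exact limit       {hazard_exact(t):.5f}")
```

As $\Delta t$ shrinks, the conditional probability per unit of time approaches the exact hazard, which is precisely what dividing by the interval length is meant to achieve.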
Note that, in contrast to the survivor function, which focuses on non-adoption, the hazard rate focuses on adoption, that is, on the event occurring. Thus, in some sense, the hazard function can be considered as giving the opposite side of the information given by the survivor function: loosely speaking, the more slowly $S(t)$ declines, the smaller $h(t)$ is, and vice versa. ^{ 57}
If the $i$th member “survives” the point in time $t_i$, the hazard rate gives approximate information about the probability that the event will take place in the immediate future. The course of hazard rates over time can differ greatly; the only restriction is that hazard rates are non-negative. An alternative formulation of the density function reveals its similarity to the hazard rate: ^{ 58}
$$f(t_i \mid X_i) = \lim_{\Delta t_i \to 0} \frac{1}{\Delta t_i} \Pr(t_i \le T_i < t_i + \Delta t_i \mid X_i), \qquad \Delta t_i > 0 \tag{2-6}$$
The only difference between equations (2-5) (“hazard rate”) and (2-6) (“density function”) lies in the conditioning: in equation (2-6), the probability is conditioned merely on the vector $X_i$, whereas in equation (2-5) it is additionally conditioned on adoption not having taken place before $t_i$.
The hazard rate (2-5), the density function (2-1), and the survivor function (2-3) all constitute equivalent ways of describing the continuous probability distribution of the random variable $t_i$ conditional on $X_i$. The relations between these functions can be derived from the above equations as follows: ^{ 59}
$$h(t_i \mid X_i) = \frac{f(t_i \mid X_i)}{1 - F(t_i \mid X_i)} = \frac{f(t_i \mid X_i)}{S(t_i \mid X_i)} \tag{2-7}$$
Although the process under study is fully described by any one of these functions, distinguishing between them is useful, as each function centres on a different aspect. The hazard rate can be interpreted as the “risk” that an adoption takes place within the observed time interval, given that no adoption has yet taken place. ^{ 60} The survivor function, in turn, gives the probability that the sample member survives that time period (i.e. that no adoption takes place within the observed duration). For each member of the sample, this information exists at any point in time within the observation period.
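Relation (2-7) can be checked numerically, and the check also illustrates that hazard rates can differ greatly in their course over time as long as they remain non-negative. The sketch below assumes hypothetical Weibull distributions with scale 5 and three shape values: a shape below 1 gives a falling hazard, a shape of 1 (the exponential case) a constant one, and a shape above 1 a rising one.

```python
import math

LAM = 5.0  # hypothetical Weibull scale parameter


def density(t: float, k: float) -> float:
    """Weibull density f(t) with shape k."""
    return (k / LAM) * (t / LAM) ** (k - 1) * math.exp(-((t / LAM) ** k))


def cdf(t: float, k: float) -> float:
    """Weibull distribution function F(t) with shape k."""
    return 1.0 - math.exp(-((t / LAM) ** k))


def hazard(t: float, k: float) -> float:
    """Eq. (2-7): h(t) = f(t) / (1 - F(t)) = f(t) / S(t)."""
    return density(t, k) / (1.0 - cdf(t, k))


times = [1.0, 2.0, 4.0, 8.0]
for k in (0.5, 1.0, 2.0):  # falling, constant and rising hazard
    print(f"shape {k}: " + ", ".join(f"{hazard(t, k):.4f}" for t in times))
```

For shape 1 the hazard is the constant $1/5 = 0.2$; for shape 2 it equals $2t/25$, rising linearly with time.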
In order to estimate hazard models empirically for all members of a sample of size $I$, both the duration until the event and the corresponding explanatory variables for the observation period $(0, T]$ have to be known. In most cases, the end of the observation period is predetermined by the mere fact that random samples are drawn retrospectively. As a consequence, the following censoring problems ^{ 61} can arise: ^{ 62} We speak of left-censored data when the event “adoption” has already taken place before the beginning of the observation period. In this case, we only know that the sample member has already adopted. Whereas left-censored data describe events that took place before the beginning of the observation period, right-censored data refer to events that occur after the observation period has ended. In the case of right-censored data, one only knows that the sample member has not yet adopted the innovation, but not at what point in time after the observation period the adoption will take place.
To distinguish individuals (or firms) who experience the event from those who are censored, we usually use a dichotomous variable $\delta_i$ that indicates the censoring status: $\delta_i = 1$ if firm $i$ experiences the event and $\delta_i = 0$ if the observation is censored. The standard assumption made in event history models is that the failure and censoring mechanisms are independent and that censoring is not affected by the covariates.
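A minimal sketch of how the pairs of observed duration and censoring indicator $\delta_i$ arise under right censoring at the end of the observation period $T$. It assumes, for illustration only, that the true adoption times (measured from market placing) are known, which in practice they are not beyond $T$; the function name and the data are hypothetical:

```python
from typing import List, Optional, Tuple


def censor(adoption_times: List[Optional[float]], T: float) -> List[Tuple[float, int]]:
    """Turn adoption times into (observed duration, delta) pairs.

    adoption_times: time since market placing at which each firm adopts,
    or None if it never adopts. T: end of the observation period (0, T].
    delta = 1 if adoption is observed within (0, T], else 0 (right-censored).
    """
    records = []
    for t in adoption_times:
        if t is not None and t <= T:
            records.append((t, 1))  # complete observation
        else:
            records.append((T, 0))  # still a non-adopter at the end of the period
    return records


# Hypothetical data: firm 3 adopts after the period ends, firm 4 never adopts
print(censor([2.5, 7.0, 11.0, None], T=9.0))
# [(2.5, 1), (7.0, 1), (9.0, 0), (9.0, 0)]
```

Both censored firms end up with the same record, which reflects that the data cannot tell a late adopter from a firm that never adopts.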
The problem of censoring is illustrated in figure 2-1: Observations 1 and 3 are complete, as here one can exactly identify the time-to-event duration and the respective explanatory variables. The observations for sample members 2 and 4 are censored; for them, the time-to-event duration cannot be determined. For sample member 2, one only knows that at the end of the observation period he still belongs to the group of non-adopters, and for member 4, only that he had already adopted at some time before the observation period began.
Figure 2-1: Right- and left censored data in ADT
Following Litfin (2000), p. 46
The problem of left censoring can be solved without difficulty in the current context. To this end, we let the start of the observation period coincide with the date of the innovation’s market placing, so that no adoption can possibly occur before the observation period. This is straightforward, as all data are collected retrospectively anyway.
The problem of right censoring, on the other hand, cannot be solved that easily. If the goal of the analysis is not only to examine adoption behaviour and the diffusion process ex post, but also to forecast the diffusion process, we face practical obstacles. Obviously, the end of the observation period cannot be postponed until all observations are complete. Eliminating the censored observations is out of the question, as this would lead to biased estimates. ^{ 63} Instead, all available information about the observations is used in the analysis.
Although we have no information about the duration for the right-censored observations, we know that no adoption has taken place by the end of the observation period. This information has to be incorporated into the likelihood function used for maximum likelihood estimation. Various models have been developed to take right-censored data into account in hazard models. ^{ 64} As mentioned earlier, the integration of censored data is a unique feature of hazard models.
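A minimal sketch of how right-censored observations enter the likelihood, assuming (purely for illustration) an exponential duration model with constant hazard $\lambda$. An adopter contributes its density $f(t_i) = \lambda e^{-\lambda t_i}$, a censored firm only its survival probability $S(t_i) = e^{-\lambda t_i}$, so $\log L(\lambda) = \sum_i [\delta_i \log \lambda - \lambda t_i]$ with maximiser $\hat\lambda = \sum_i \delta_i / \sum_i t_i$. The data are hypothetical; dropping the censored firms shows the distortion mentioned above:

```python
import math


def log_likelihood(rate: float, records) -> float:
    """Exponential log-likelihood with right censoring:
    adopters contribute log f(t) = log(rate) - rate * t,
    censored firms contribute log S(t) = -rate * t."""
    return sum(delta * math.log(rate) - rate * t for t, delta in records)


def mle_rate(records) -> float:
    """Closed-form maximiser: number of adoptions / total time at risk."""
    events = sum(delta for _, delta in records)
    exposure = sum(t for t, _ in records)
    return events / exposure


records = [(2.5, 1), (7.0, 1), (9.0, 0), (9.0, 0)]  # (duration, delta), hypothetical
rate_all = mle_rate(records)
rate_dropped = mle_rate([r for r in records if r[1] == 1])  # censored firms discarded
print(f"using censored data: {rate_all:.4f}")      # 0.0727
print(f"dropping censored:   {rate_dropped:.4f}")  # 0.2105, overstates the adoption risk
```

Discarding the censored firms removes their time at risk from the denominator and therefore inflates the estimated adoption rate.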
In section 2.2, I provided an introduction to the concepts and terminology of event history analysis. Subsequently, I established the statistical framework of event history models. I mentioned that comprehension of these ideas is essential to understanding the models presented in this section. I also mentioned that, in introducing the classical methods for event history analysis, I would focus on the classical methods used for survival data. As survival analysis occupies a predominant position within event history analysis, these methods can be, and indeed are, regarded as the classical methods for event history analysis, too.
We can divide hazard models into three well-known categories: non-parametric, parametric and semi-parametric models. ^{ 65} To understand what these models try to investigate, I will state, in general terms, three principal goals of survival analysis: ^{ 66}
Goal 1: To estimate time-to-event for a group of sample members ^{ 67}
Goal 2: To compare time-to-event between two or more groups ^{ 68}
Goal 3: To assess the relationship of explanatory variables to time-to-event
Goal 3, in particular, has always been of central interest to both survival analysis and adoption- and diffusion modelling.
The non-parametric methods mainly serve descriptive purposes and are generally used to estimate and compare survivor and/or hazard functions. Both the parametric and the semi-parametric approach are regressions of the hazard rate. As opposed to the parametric hazard models, Cox’s semi-parametric hazard model parametrises only the influence of the covariates, not the time dependence of the hazard rate. The advantage of this model is that one need not make assumptions concerning the distribution of the duration.
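The semi-parametric idea can be illustrated with the proportional hazards form of Cox’s model, $h(t \mid x) = h_0(t)\exp(\beta x)$: the covariate effect is parametrised by $\beta$, while the baseline hazard $h_0(t)$ is left unspecified. A consequence is that the hazard ratio of two firms does not depend on time, whatever the baseline. The sketch below uses a hypothetical $\beta$ and two arbitrary baselines and performs no estimation; it only checks this property:

```python
import math


def cox_hazard(t: float, x: float, beta: float, baseline) -> float:
    """Proportional hazards form h(t | x) = h0(t) * exp(beta * x).
    Only beta is parametrised; the baseline h0 can be any positive function."""
    return baseline(t) * math.exp(beta * x)


BETA = 0.7  # hypothetical coefficient

# Two arbitrary, very different baseline hazards (illustrative only)
baselines = {
    "increasing": lambda t: 0.05 * t,
    "bathtub": lambda t: 0.2 / (1 + t) + 0.01 * t,
}

for name, h0 in baselines.items():
    ratios = [cox_hazard(t, 1.0, BETA, h0) / cox_hazard(t, 0.0, BETA, h0)
              for t in (0.5, 2.0, 8.0)]
    print(name, [round(r, 4) for r in ratios])  # always exp(0.7), about 2.0138
```

Since $h_0(t)$ cancels in the ratio, the effect of a covariate can be estimated without specifying how the hazard develops over time.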
Because they do not make any assumptions about the distribution of the process under investigation, non-parametric methods are often used for ad-hoc examination of the survival distribution and for a first exploratory assessment of the relations between variables. As noted above, their purpose is primarily descriptive.
An obvious first numerical approach to survival data is the assessment of the median or mean survival time. Table 2-1 shows the mean adoption (survival) time of e-sale for a dataset of European enterprises, obtained as a byproduct of the Kaplan-Meier estimate calculations. ^{ 69} Additionally, a 95% confidence interval is calculated for the mean adoption time, which shows very low variability in the data. The median adoption time cannot be estimated if the estimated survivor function never falls to 0.5, which happens when, as here, far fewer than half of the observations studied experience the event.
Considering that the observation period was 9 years, the data clearly show that most of the assessed adoption durations lie at the very end of the observation period. However, censored data are not properly taken into account here, as the calculation is based on observed adoption times, which implies an adoption time of 9 years for observations censored at the end of the observation period. This is why the mean adoption time as such provides very limited insight into the data structure and is never used on its own.
Table 2-1: Mean adoption time for e-sale
Source: SPSS output for Kaplan-Meier estimates
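A minimal sketch of the Kaplan-Meier (product-limit) estimator, $\hat S(t) = \prod_{t_j \le t} (1 - d_j / n_j)$, where $d_j$ is the number of adoptions at $t_j$ and $n_j$ the number of firms still at risk. It is applied to hypothetical data, not the e-sale dataset: ten firms observed for 9 years, three of which adopt. It shows why no median can be estimated when $\hat S$ never falls to 0.5, and computes the mean restricted to the observation period as the area under $\hat S$. The helper names are illustrative, and this is not a reproduction of SPSS’s procedure:

```python
from typing import List, Optional, Tuple

Record = Tuple[float, int]  # (duration, delta): delta = 1 adoption, 0 censored


def kaplan_meier(records: List[Record]) -> List[Tuple[float, float]]:
    """Product-limit estimate: list of (event time t_j, S(t_j))."""
    event_times = sorted({t for t, d in records if d == 1})
    curve, s = [], 1.0
    for tj in event_times:
        n_j = sum(1 for t, _ in records if t >= tj)             # firms still at risk
        d_j = sum(1 for t, d in records if t == tj and d == 1)  # adoptions at t_j
        s *= 1.0 - d_j / n_j
        curve.append((tj, s))
    return curve


def restricted_mean(curve: List[Tuple[float, float]], t_max: float) -> float:
    """Area under the Kaplan-Meier step function between 0 and t_max."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for tj, sj in curve:
        if tj > t_max:
            break
        area += prev_s * (tj - prev_t)
        prev_t, prev_s = tj, sj
    return area + prev_s * (t_max - prev_t)


def km_median(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time with S(t) <= 0.5, or None if S never gets that low."""
    for tj, sj in curve:
        if sj <= 0.5:
            return tj
    return None


# Hypothetical data: adoptions after 3, 6 and 8 years, seven firms censored at 9
records = [(3.0, 1), (6.0, 1), (8.0, 1)] + [(9.0, 0)] * 7
curve = kaplan_meier(records)
print(curve)                        # S falls to about 0.9, 0.8, 0.7
print(km_median(curve))             # None: median not estimable
print(restricted_mean(curve, 9.0))  # about 8.0
```

Here the restricted mean of about 8.0 years coincides with the naive mean obtained by setting the seven censored durations to 9 years, (3 + 6 + 8 + 7 × 9)/10 = 8.0, which illustrates why such a mean says little once most observations are censored.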
In general, literature describes mainly two alternative non-parametric methods both
GRIN Publishing, located in Munich, Germany, has specialized since its foundation in 1998 in the publication of academic ebooks and books. The publishing website GRIN.com offer students, graduates and university professors the ideal platform for the presentation of scientific papers, such as research projects, theses, dissertations, and academic essays to a wide audience.