Excerpt

## Table of contents

List of Illustrations and formulas

1. Scope of work

2. Procedure

3. General description

3.1. SPSS software from IBM

3.2. Analysis of Variance (ANOVA)

3.3. Multiple Regression Analysis with two or more variables

4. Definition and formulation of the task / hypothesis

4.1. Hypothesis 1 (one variable)

4.2. Hypothesis 2 (two variables)

4.3. Hypothesis 3 (three variables)

5. Composition of the required data base

5.1. Specification of the required variables

5.2. Setting up the data structure in SPSS

6. Evaluation of the data base / examination of hypothesis 1

6.1. Examination of the preconditions

6.2. Calculation of regression with one variable and interpretation of the results

7. Evaluation of the data base / examination of hypothesis 2

7.1. Examination of preconditions

7.2. Calculation of regression with two variables and interpretation of the results

8. Evaluation of the data base / examination of hypothesis 3

8.1. Examination of the preconditions

8.2. Calculation of regression with three variables and interpretation of the results

9. Conclusion

Bibliography

## List of Illustrations and formulas

List of Illustrations:

Figure 1: variables structure

Figure 2: Normal spread of the variable mileage reading

Figure 3: Box plots of the variables price and mileage reading

Figure 4: ANOVA

Figure 5: Model summary

Figure 6: Coefficients

Figure 7: Scatter diagram Price / Service and Price / Mileage

Figure 8: ANOVA for two variables

Figure 9: Model summary for two variables

Figure 10: Coefficients with two variables

Figure 11: Box plots Price / Garage

Figure 12: ANOVA with three variables

Figure 13: Model summary for three variables

Figure 14: Coefficients for three variables

List of formulas:

Formula 1: Null hypothesis to hypothesis 1

Formula 2: Null hypothesis to hypothesis 2

Formula 3: Null hypothesis to hypothesis 3

Formula 4: Regression equation for hypothesis 1

Formula 5: Estimate for the regression equation of hypothesis 1

Formula 6: Regression equation for hypothesis 2

Formula 7: Estimate for the regression equation to hypothesis 2

Formula 8: Regression equation for hypothesis 3

Formula 9: Estimate for the regression equation to hypothesis 3

## 1. Scope of work

The scope of work comprises the following steps:

- Definition of the scope of work

- Development of a purposeful data base

- Specification of the methods applied: one method with one variable (e.g. ANOVA models) and one method with two variables (e.g. multiple regression analysis)

- Analysis of the data base using the above methods with the SPSS software from IBM

- Presentation of results

- Conclusions and interpretation of results

The project shall be written in English and shall not exceed 20 pages in length. The procedure applied is described in the following chapter.

## 2. Procedure

To solve the tasks listed in the previous chapter, some general terms are first explained and the methods applied are described. The task is then formulated and the required data base compiled. Once the data base is complete, it is analyzed and evaluated using the defined methods. Finally, the results are presented and interpreted, and conclusions are drawn from them.

## 3. General description

The following explanations of terms are intended to give easier access to the subject. Without an understanding of what lies behind the specified tasks, working through them would be difficult.

### 3.1. SPSS software from IBM

The company SPSS was founded in 1968 with the development of the program of the same name at Stanford University in the USA by Norman H. Nie, C. Hadlai (Tex) Hull and Dale Bent. At that time the name SPSS was an abbreviation for “Statistical Package for the Social Sciences”.

Today the name SPSS is used solely for the original product; because the functions of the software have developed further over the years, the name can no longer really be read as that abbreviation. The SPSS company was taken over by IBM in 2009.

Today IBM publishes SPSS Statistics, a modular program package for the statistical analysis of data. The base module enables fundamental data management and extensive statistical and graphical data analyses with the most commonly used statistical methods.^{[1]} In addition, various add-on modules can be added to the base module.

For the purposes of this paper, the author used the German version of SPSS 19. One of the statistical calculations that can be made with the software is the analysis of variance, which is described in detail in the next chapter.

### 3.2. Analysis of Variance (ANOVA)

ANOVA stands for “Analysis of Variance”. This statistical method is used to determine the differences between various conditions or groups and to compare more than two conditions with one another.^{[2]} ANOVA is used when there is one dependent variable and either a factor with three or more levels or several factors (independent variables).^{[3]} The analysis of variance compares the mean values of three or more conditions.^{[4]} It is an extension of the t-test to more than two groups or more than one independent variable; it also works with only two conditions, in which case the results are identical to those of the t-test.^{[5]} The purpose is to investigate the dependence of one variable on a second variable.

With the help of ANOVA, the variance of the data under examination can be separated into systematic variance (variance arising from experimental manipulation, “treatment effects”) and non-systematic variance (variance arising from individual differences and experimental errors).^{[6]} Since variance is directly related to the total sum of squares, this partition of the total sum of squares is known as analysis of variance, abbreviated to ANOVA.^{[7]} ANOVA only confirms whether there is a significant effect, i.e. whether there are significant differences in the mean values; it does not show exactly how the mean values differ from one another.^{[8]} The preconditions for calculating an analysis of variance are as follows:

- A variable measured at interval scale level^{[9]}

- Normal distribution of the criterion variable in the population^{[10]}

- At least one independent variable that enables a group allocation^{[11]}

- Comparison groups must comprise independent random samples^{[12]}
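The partition of the total sum of squares into a systematic and a non-systematic part can be illustrated with a minimal sketch. The three groups and their values below are hypothetical and serve only to show the arithmetic; they are not taken from the vehicle data of this paper.

```python
from statistics import mean

# Hypothetical measurements for three groups (illustrative values only).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [1.0, 2.0, 3.0],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = mean(all_values)

# Systematic variation: group means around the grand mean.
ss_between = sum(
    len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
)
# Non-systematic variation: single values around their own group mean.
ss_within = sum(
    (v - mean(values)) ** 2 for values in groups.values() for v in values
)
# Total variation: single values around the grand mean.
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# F statistic: ratio of the two mean squares.
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, ss_total, f_statistic)
```

In this sketch the two partial sums add up to the total sum of squares. A large F value indicates that the group means differ by more than the variation within the groups would suggest, but it does not say which means differ.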

SPSS offers the calculation of an ANOVA under linear regression. However, this only indicates whether there is a significant effect, i.e. whether the mean values differ significantly. It does not state which mean values differ from one another.

In addition to analyses of variance with one variable, SPSS also supports calculations with several variables, such as multiple regression analysis. This is described in more detail in the following chapter.

### 3.3. Multiple Regression Analysis with two or more variables

Following the description of ANOVA, multiple regression analysis is now described as an analysis with two or more variables. This form of multivariate analysis differs from a univariate analysis in that two or more factors are used to explain the criterion variable.^{[13]} The ANOVA described in the previous chapter can likewise be calculated with several variables.

The most widespread statistical procedure for testing or determining multivariate relationships is multiple regression analysis.^{[14]} The aim of this procedure is to establish a relationship between one dependent variable and one or more independent variables.^{[15]} The procedure is used in particular to describe relationships in quantitative terms or to forecast dependent variables.^{[16]} Many practical applications arise when there is a variable y and a number of variables X1, ..., Xp that may be related to y.^{[17]} Regression can thus be used, for example, to quantify the strength of the relationship. Mathematical methods determine a function such that the residuals are minimal.^{[18]} The residuals form the basis for an unbiased estimate of the variance of the disturbance variables.^{[19]} The form of the function depends largely on the method used: linear regression considers only linear functions, whereas logistic regression considers only logistic functions. The linear regression used in the subsequent chapters is based on the model specification that the dependent variable y is a linear combination of the regression coefficients [symbol not visible in this excerpt], but not necessarily of the independent variables x.^{[20]} The model parameters are determined by the method of least squares.^{[21]}
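The linear model and the least-squares criterion described above can be written out as follows. The coefficient symbols $\beta_0, \dots, \beta_p$, the disturbance term $\varepsilon_i$ and the index $i$ are notation assumed here for illustration, since the original symbols are not visible in this excerpt.

```latex
% Linear regression model for observation i with p explanatory variables
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i,
\qquad i = 1, \dots, n

% Least squares: choose the estimates so that the sum of squared residuals is minimal
\min_{\hat\beta_0, \dots, \hat\beta_p} \; \sum_{i=1}^{n} e_i^2
= \sum_{i=1}^{n} \bigl( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_p x_{ip} \bigr)^2
```

The model is linear in the coefficients; an explanatory variable may itself be a transformed quantity without leaving the linear regression framework.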

Following these general descriptions, the next chapters deal with the specification of the task and with the hypotheses.

## 4. Definition and formulation of the task / hypothesis

In this chapter the specific tasks and the hypotheses are presented and defined. This is necessary so that later, when the data are evaluated, each hypothesis can be confirmed or refuted. The author is currently considering what to do with his car: he must decide whether to keep using it or to trade it in as part payment for a new car. In this context the author considers several questions, which are defined here as the hypotheses.

### 4.1. Hypothesis 1 (one variable)

The author of this work assumes that there is a linear relationship between the mileage reading of the vehicles and their current selling price. It is assumed that the price falls linearly as the number of kilometers driven increases. This hypothesis is formulated on the basis of data for 100 vehicles researched on the Internet and of the book PASW Statistics by Reinhold Hatzinger and Herbert Nagel. The null hypothesis for hypothesis 1 can thus be defined as follows:

illustration not visible in this excerpt

Formula 1: Null hypothesis to hypothesis 1

With hypothesis 1 defined, hypothesis 2 is set up in the next chapter.

### 4.2. Hypothesis 2 (two variables)

The author of this work further assumes that, apart from the mileage of the vehicles, the number of customer services carried out on the vehicle also shows a linear correlation with the sales price of the vehicle. The data base is the same as for the examination of hypothesis 1. On the basis of this hypothesis, the null hypothesis can be defined as shown in formula 2:

illustration not visible in this excerpt

Formula 2: Null hypothesis to hypothesis 2

Having defined hypothesis 2, hypothesis 3 is described in the next chapter.

### 4.3. Hypothesis 3 (three variables)

In this hypothesis the author assumes that, in addition to the mileage reading of the vehicles and the customer services carried out on the vehicle, whether the vehicle was kept in a garage or in the open also stands in a linear correlation with the sales price of the vehicle. The same data base is used as for the examination of hypotheses 1 and 2. On the basis of this hypothesis, the null hypothesis can be defined as described in formula 3:

illustration not visible in this excerpt

Formula 3: Null hypothesis to hypothesis 3

Having established the hypotheses, the next chapter describes the composition of the required data base.

## 5. Composition of the required data base

To examine the hypotheses defined in the previous chapter, the data are now compiled and edited, and a data base is established that can be evaluated. The following steps are necessary for this.

### 5.1. Specification of the required variables

First, the variables required for the examination of the hypotheses are specified. The values obtained through empirical research are then recorded. The author of this paper has decided on the following variables:

- Current price

- Current mileage reading

- Number of customer services carried out

- Garage available

- Color of the vehicle

Data were researched on a total of 100 second-hand lower medium-sized vehicles. All vehicles, including the author’s vehicle, were about three years old. The research was carried out by means of vehicle offers on various Internet portals such as autoscout.de or mobile.de. If data were not available, the information was requested from the sellers by telephone. It will be seen later, in the evaluation of the data base, whether all the data were in fact necessary. The data collected in this way were then recorded in SPSS.

### 5.2. Setting up the data structure in SPSS

Since the variables had already been defined, the corresponding variables structure was now set up in SPSS. After the entries were completed, the file was saved via “File”, “Save as” in the native file format of SPSS. Figure 1 shows the variables structure that was set up:

illustration not visible in this excerpt

Figure 1: variables structure

For the variables price, kilometers and service the measurement level “Scale” was selected, and for the two remaining variables, garage and color, the measurement level “Nominal”. In addition, value labels were recorded for these nominal variables with the possible entries (e.g. “yes” or “no” for garage available). Finally, the 100 data records were entered one after the other.
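The variables structure described above can be mirrored outside SPSS as a small record type. The following sketch is only an illustration: the class name `VehicleRecord`, the Python types and the example values are assumptions, while the variable names and measurement levels follow the text.

```python
from dataclasses import dataclass


@dataclass
class VehicleRecord:
    """One of the 100 vehicle data records (hypothetical representation)."""

    price: float       # measurement level "Scale"
    kilometers: float  # measurement level "Scale"
    service: int       # measurement level "Scale": number of customer services
    garage: str        # measurement level "Nominal": "yes" or "no"
    color: str         # measurement level "Nominal"

    def __post_init__(self) -> None:
        # Mirror the value labels recorded for the garage variable.
        if self.garage not in ("yes", "no"):
            raise ValueError("garage must be 'yes' or 'no'")


# Example record with invented values.
example = VehicleRecord(
    price=12500.0, kilometers=48000.0, service=3, garage="yes", color="blue"
)
print(example)
```

Restricting `garage` to its two labels plays the same role as the value labels in SPSS: an invalid entry is rejected when the record is created rather than discovered during the evaluation.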

With the data recorded in this way, the first evaluations could be carried out in a further step.

## 6. Evaluation of the data base / examination of hypothesis 1

Now that the data base had been recorded and the variables structure fixed, the first evaluations could be generated. First it was examined whether the preconditions set forth in Chapter 3.2 were met. Then the evaluations for ANOVA could be drawn up.

### 6.1. Examination of the preconditions

The preconditions set forth in Chapter 3.2 were now examined.

The first precondition, a variable measured at interval scale level, was met. An interval scale is given by the numeric variable “price” and also by the numeric variable “mileage reading”.

The next precondition to be examined was the normal distribution of the criterion variable. The variable “mileage reading” is normally distributed; its distribution is shown in figure 2.

illustration not visible in this excerpt

Figure 2: Normal spread of the variable mileage reading

The precondition of at least one independent variable that enables group allocation is also met: the variable price can be subdivided into various price classes at any time.

In the view of the author, the samples are also independent random samples, as the data of the 100 vehicles were selected randomly from the Internet.

The two box plots shown in figure 3 behave in a similar way.

illustration not visible in this excerpt

Figure 3: Box plots of the variables price and mileage reading

The “box” is the region in which the bulk of the data is to be found.^{[22]} The lines attached to the boxes (the whiskers) represent values that occur less frequently.
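How the box and the attached lines are derived from the data can be sketched as follows. The mileage values are hypothetical, and two conventions are assumed here that the text does not state: quartiles computed with the inclusive method of Python's `statistics.quantiles`, and whiskers limited to 1.5 times the box length. SPSS may compute the box edges slightly differently.

```python
from statistics import quantiles

# Hypothetical mileage readings in thousand kilometers (illustrative only).
mileage = [20, 35, 40, 45, 50, 55, 60, 65, 120]

# The box spans the first to the third quartile; the median lies inside it.
q1, median, q3 = quantiles(mileage, n=4, method="inclusive")
box_length = q3 - q1

# Assumed convention: whiskers reach at most 1.5 box lengths beyond the box.
lower_fence = q1 - 1.5 * box_length
upper_fence = q3 + 1.5 * box_length
inside = [m for m in mileage if lower_fence <= m <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

# Values beyond the fences would be drawn as separate points.
outliers = [m for m in mileage if m < lower_fence or m > upper_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

Half of the observations lie inside the box, which is why it marks the bulk of the data, while the whiskers and any separate points cover the rarer values.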

Since all the preconditions were examined and found to be met, the linear regression with one variable can be calculated and the hypothesis examined in the following chapter.

### 6.2. Calculation of regression with one variable and interpretation of the results

With all the preconditions examined, the linear regression can be calculated together with ANOVA, and the results documented and subsequently interpreted.

For the expected value of Y (the response variable), the regression equation shown in formula 4 is set up. Here X stands for the explanatory variable, and [symbol not visible in this excerpt] and [symbol not visible in this excerpt] stand for the regression coefficients “constant” and “slope” of the regression line.

illustration not visible in this excerpt

Formula 4: Regression equation for hypothesis 1

To calculate the regression, the function “Analyze”, “Regression”, “Linear” is selected in SPSS. The price is selected as the dependent variable; the predictors are the constant and the variable “kilometers”. A significance level of 0.05 is also selected.
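What this menu function computes for a single predictor can be sketched by hand with the least-squares formulas for slope and constant. The four data points below are hypothetical and chosen to lie exactly on a line so that the arithmetic is easy to follow; they are not the author's 100 vehicles, and the helper `predict_price` is introduced only for illustration.

```python
from statistics import mean

# Hypothetical data: mileage in kilometers and selling price (illustrative only).
kilometers = [20000.0, 40000.0, 60000.0, 80000.0]
price = [14000.0, 12000.0, 10000.0, 8000.0]

x_bar = mean(kilometers)
y_bar = mean(price)

# Least-squares estimates for a regression line with one predictor.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(kilometers, price))
s_xx = sum((x - x_bar) ** 2 for x in kilometers)
slope = s_xy / s_xx                  # change in price per additional kilometer
constant = y_bar - slope * x_bar     # estimated price at a mileage of zero


def predict_price(km: float) -> float:
    """Estimated price for a given mileage under the fitted line."""
    return constant + slope * km


print(slope, constant, predict_price(50000.0))
```

A negative slope corresponds to the assumption of hypothesis 1 that the price falls as the mileage rises.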

After the calculation has been carried out, SPSS reports the sums of squares of the regression shown in figure 4.

illustration not visible in this excerpt

Figure 4: ANOVA

The total sum of squares is given in the line “total” and is 25,739,560.76; the residual sum of squares in the line “non standardized residua” is 9,005,449.877. The difference between these values is given in the line “regression” as 16,734,110.88. Comparing the total sum of squares with the residual sum of squares yields a measure of how well a line describes the data: the coefficient of determination, or R².^{[23]} The value for R² can be read off in figure 5, Model summary, in the column “R-Quadrat” (R square). The value is 0.650.
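The reported R² can be checked directly from the sums of squares quoted in the text. The short calculation below uses only the total and the regression sum of squares; the variable names are chosen here for illustration.

```python
# Sums of squares as quoted from the SPSS ANOVA table in the text.
ss_total = 25_739_560.76
ss_regression = 16_734_110.88

# The residual sum of squares is the part not explained by the regression.
ss_residual = ss_total - ss_regression

# Coefficient of determination: explained share of the total sum of squares.
r_squared = ss_regression / ss_total

print(f"residual sum of squares: {ss_residual:,.2f}")
print(f"R squared: {r_squared:.3f}")
```

Rounded to three decimals this gives 0.650, in line with the value read from the model summary, so about 65 percent of the variation in price is accounted for by the mileage reading.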

**[...]**

^{[1]} Cf. http://de.wikipedia.org/wiki/SPSS, Current as of 30.07.2011

^{[2]} Cf. Rasch / Friese / Hofmann / Naumann, 2009, page 50, translated from the German

^{[3]} Cf. Litz, 2000, page 122, translated from the German

^{[4]} Cf. Zoefel / Bühl, 2000, page 171, translated from the German

^{[5]} Cf. Rumsey, 2008, page 181, translated from the German

^{[6]} Cf. Rasch / Friese / Hofmann / Naumann, 2009, page 18, translated from the German

^{[7]} Cf. Hatzinger / Nagel, 2009, page 225, translated from the German

^{[8]} Cf. Hermann / Homburg / Klarmann, 2007, page 592, translated from the German

^{[9]} Cf. Schnell / Hill / Esser, 2008, page 147, translated from the German

^{[10]} Cf. Janssen / Laatz, 2010, page 347, translated from the German

^{[11]} Cf. Hermann / Homburg / Klarmann, 2007, page 117, translated from the German

^{[12]} Cf. Janssen / Laatz, 2010, page 347, translated from the German

^{[13]} Cf. Janssen / Laatz, 2010, page 367, translated from the German

^{[14]} Cf. Wolf / Best, 2010, page 21, translated from the German

^{[15]} Cf. Gramlich, 2002, page 59, translated from the German

^{[16]} Cf. Raithel, 2008, page 156, translated from the German

^{[17]} Cf. Urban / Mayerl, 2011, page 29, translated from the German

^{[18]} Cf. Eckstein, 2006, page 91, translated from the German

^{[19]} Cf. Hackl, 2008, page 68, translated from the German

^{[20]} Cf. Bornmann, 2003, page 95, translated from the German

^{[21]} Cf. Bortz / Schuster, 2010, page 199, translated from the German

^{[22]} Cf. Toutenburg / Knöfel, 2007, page 85, translated from the German

^{[23]} Cf. Hatzinger / Nagel, 2009, page 225, translated from the German

- Quote paper
- M.Sc. Wolfgang Illig (Author), 2011, Statistical analysis in practice and Evaluation of research results, Munich, GRIN Verlag, https://www.grin.com/document/180784
