Quantitative Methods in Business - Project Assignment
Table of Contents
I. Introduction and Purpose of this Project
II. Project Related Basics in Statistics
III. Description of Selected Data Set
A. General Description
B. Boxplot
C. Histogram
D. Scatter Diagrams
E. Seasonal Index
F. Multicollinearity
IV. Regression Analyses
A. Simple Linear Regression Analysis
B. Multiple Regression Analysis - Linear Model
C. Analysis of Residuals
D. Multiple Regression Analysis - Natural Log Transformation
V. Prediction
VI. Conclusion
VII. References
VIII. Appendix
Introduction and Purpose of this Project
Statistical analyses play an important role today. In fields such as science and economics, they are used to support assumptions and to predict future data. In business administration, modern business statistics can inform decision making in finance, marketing, or production, for instance.
The scope of the current project is to analyze the data set “Ibell” of phone calls and to predict the future quantity of phone calls based on a regression analysis. The “Ibell” data set relates to the U.S.-based company International Bell Communications (Ibell), which owns and operates direct routes throughout the world (International Bell Communications, 2008). The data set provides four variables: three independent variables and one dependent (also called response) variable. The independent, or predictor, variables are “Quarter”, “Price” (the price charged for long-distance calls in US$), and “Perinc” (the local average personal income in US$). The dependent variable is “Quantity”, the number of long-distance phone calls. The data set was provided by the professor of the QMB class. Because the author of this report did not collect the data personally, the author cannot vouch for its quality. However, “Quarter”, “Price”, and “Perinc” appear to be reasonable influences on the number of long-distance calls.
There are three major parts in this report. First, a general description of the data set is presented, including the types of variables, the characteristics of the observations, and the peculiarities of the distribution. Second, regression analyses assess the validity of a modeled relationship between the dependent and the independent variables. Finally, the researcher predicts the quantity of long-distance calls for the upcoming four quarters in order to support International Bell Communications in network capacity planning and revenue forecasting, for instance.
Project Related Basics in Statistics
Since the current data set is only a sample of a population, some crucial properties of sample statistics have to be taken into account before starting the analysis. Every sample statistic is subject to sampling error, which results from the fact that the sample represents only an extract of the total population. Besides these inherent sampling errors, there are also nonsampling errors such as measurement errors, a mismatch between sample and population, or experimenter bias (Gayle Baugh lecture notes). As previously mentioned, the researcher did not personally collect the data and can therefore only assume that the data set is free of nonsampling errors.
Assuming a perfect sample data set, certain inferences about the total population are statistically valid. Although the sample statistics are not identical to the population parameters, the distribution of the latter can be inferred from the sample. If the sample size is large enough (according to Anderson, Sweeney, and Williams, a size of 30, or 50 if the population is highly skewed), the sampling distribution of the sample mean can be approximated by a normal distribution (Central Limit Theorem). In addition, the larger the sample, the higher the probability that the sample mean falls within a specified distance of the population mean (Anderson, Sweeney, Williams, 2006). However, “because a point estimator cannot be expected to provide the exact value of the population parameter, an interval estimate is often computed by adding and subtracting a value, called the margin of error, to the point estimate.” (Anderson, Sweeney, Williams, 2006, p. 307)
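The interval estimate described above can be illustrated with a minimal sketch. It assumes a large sample, so that the normal approximation from the Central Limit Theorem applies; the function name `mean_confidence_interval` and the sample values are hypothetical and are not taken from the “Ibell” data set.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, margin_of_error) for the population mean.

    Assumption: the sample is large enough (n >= 30 as a rule of thumb)
    for the normal approximation to the sampling distribution to hold.
    """
    n = len(sample)
    point_estimate = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation (n - 1).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return point_estimate, z * standard_error


# Hypothetical sample of 36 values (illustration only).
sample = [100 + (i % 9) for i in range(36)]
estimate, margin = mean_confidence_interval(sample)
print(f"interval estimate: {estimate:.2f} +/- {margin:.2f}")
```

For small samples, a t-distribution critical value would usually replace the normal one; the sketch leaves that out to stay within the Python standard library.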
Based on the findings in the sample, assumptions can be made. The tentative assumption is called the null hypothesis; its opposite is the alternative hypothesis. “The hypothesis testing uses data from a sample to test the two competing statements indicated by null hypothesis and alternative hypothesis.” (Anderson, Sweeney, Williams, 2006, p. 347) As a general guideline, research statements should be formulated as the alternative hypothesis. Decisions based on hypothesis tests can have serious consequences. Therefore, the probability of Type I errors (rejecting the null hypothesis although it is true) and particularly of Type II errors (accepting the null hypothesis although the alternative hypothesis is true) has to be kept as low as possible. Certain checks help to decrease the error probability. For instance, “the level of significance is the probability of making a Type I error where the null hypothesis is true as an equality.” (Anderson, Sweeney, Williams, 2006, p. 350) Researchers can define a level of significance “alpha”, which represents the risk that they are willing to take. This alpha value can be compared to a probability value, the “p-value”. “The p-value is a probability, computed using the test statistic, that measures the support (or lack of support) provided by the sample for the null hypothesis.” (Anderson, Sweeney, Williams, 2006, p. 354) This means that the null hypothesis should be rejected if the p-value is less than or equal to alpha.
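The decision rule of comparing the p-value with alpha can be sketched as follows. This is an illustration under stated assumptions, not the test used later in the report: it uses a one-sided z-test on a mean, which presumes a large sample, and the helper names `one_sided_p_value` and `reject_null` as well as the data are hypothetical.

```python
import math
import statistics


def one_sided_p_value(sample, hypothesized_mean):
    """p-value for H0: mu <= hypothesized_mean against Ha: mu > hypothesized_mean.

    Assumption: large sample, so the test statistic is treated as
    standard normal (z-test) rather than t-distributed.
    """
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.mean(sample) - hypothesized_mean) / standard_error
    # Upper-tail probability under the standard normal distribution.
    return 1.0 - statistics.NormalDist().cdf(z)


def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis when the p-value is at most alpha."""
    return p_value <= alpha


# Hypothetical sample of 36 values with mean 104 (illustration only).
sample = [100 + (i % 9) for i in range(36)]
p = one_sided_p_value(sample, hypothesized_mean=103)
print(f"p-value = {p:.4f}, reject H0: {reject_null(p)}")
```

A smaller alpha lowers the risk of a Type I error but makes it harder to reject the null hypothesis, which tends to raise the risk of a Type II error.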
Statistical inference, the process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population, can also be applied to the Ibell project. The corresponding data set comprises 76 observations. This amount of data is sufficient for the statistical analysis presented in this project and hence allows predicting the future quantity of long-distance calls for the specific local region in which the data was collected. The alternative hypothesis for this project is an increased quantity of long-distance phone calls in the upcoming four quarters, 77 to 80. The opposite statement, no increase in the quantity of long-distance phone calls, is the null hypothesis. Since the predicted quantity is a very important number for International Bell Communications, the researcher defines an alpha value of .05 as the level of significance (see chapter IV B. “Multiple Regression Analysis - Linear Model”) and hence as the limit for the p-value checks.
Description of Selected Data Set
This chapter describes the data set “Ibell”. It starts with a general presentation of some properties of the data set, followed by graphical illustrations. A boxplot, a histogram, a seasonal index, and several scatter diagrams are presented as a first step in describing and analyzing the “Ibell” data set. The main focus is on the dependent variable “Quantity”.
General Description
The data set contains 76 observations and four variables: three independent variables and one dependent variable. It is complete, with no missing information. All variables are based on quantitative data and are measured on a ratio scale. “The scale of measurement for a variable is a ratio scale if the data have all the properties of interval data and the ratio of two values is meaningful.” (Anderson, Sweeney, Williams, 2006, p. 7) Moreover, the data set comprises time series data, which means that the observations were collected over several time periods. With respect to the source of the data, the researcher can only make assumptions, since the data was not personally collected. The variables “Quarter”, “Price”, and “Quantity” can presumably be extracted from existing sources such as company records. The fourth variable, “Perinc”, is presumably derived from statistical studies.
Running a Microsoft Excel based “descriptive statistics” analysis on the variables yields the results presented in Tables 1-3. Table 1 indicates the range as well as the minimum and maximum values of each variable. Based on this first analysis, there is no indication of outliers or extreme values so far. Table 2 shows measures of central location such as the mean and the median. The mean is calculated as the sum of all values divided by the number of observations. When the data are arranged in ascending or descending order, the median is the middle value of the observations. Finally, the mode is the value that occurs with the greatest frequency. The variables “Quantity” and “Quarter” do not have a mode since no observation value appears twice in the data set. In general, the values for the mean, the median, and the mode appear plausible and, combined with the minimum and maximum values, can be an indicator of an approximately normal distribution of the data. Table 3 contains derived values measuring the variability, or dispersion, of the data. The sample variance is a measure of the variability of the data set in relation to the mean. It is the sum of the squared differences between the value of each observation and the mean, divided by the number of observations minus one. The standard deviation is derived from the variance: it is defined as the positive square root of the variance (Anderson, Sweeney, Williams, 2006). The advantage of the standard deviation over the variance is that it is measured in the same unit as the original data, because taking the square root removes the squaring effect of the variance calculation.
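As a rough counterpart to the Excel output, the measures described above can be computed with the Python standard library. The sketch below is an illustration only: the function name `describe` and the six sample values are hypothetical, and the variance uses the sample formula with n - 1 in the denominator.

```python
import statistics
from collections import Counter


def describe(values):
    """Return the descriptive measures discussed for Tables 1-3."""
    most_common_value, count = Counter(values).most_common(1)[0]
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # No value occurs more than once -> report no mode.
        "mode": most_common_value if count > 1 else None,
        # Sample variance: squared deviations divided by n - 1.
        "variance": statistics.variance(values),
        # Positive square root of the sample variance.
        "std_dev": statistics.stdev(values),
    }


# Hypothetical values (illustration only, not the Ibell data).
summary = describe([2, 4, 4, 5, 7, 8])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Returning `None` for the mode mirrors the situation of “Quantity” and “Quarter”, where no value appears twice.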
With respect to the dependent variable “Quantity”, this yields a range between 10,164.84 (mean minus one standard deviation) and 18,281.19 (mean plus one standard deviation). Assuming an approximately normal distribution, as argued above, we can estimate that about 68.3 percent of the data lie within this range (the standard share covered by the mean +/- 1 standard deviation in a normal distribution). In summary, the current analysis of the dependent variable “Quantity” indicates an approximately normal distribution with no sign of extreme values. Even if there were an unusual observation, the researcher would not be in a position to correct it, since the data was not personally collected and the researcher has no indication of nonsampling errors. Nonetheless, it makes sense to look for supporting evidence by creating boxplot diagrams with KADD.
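The one-standard-deviation range can be checked against the data directly. The following sketch, with the hypothetical helper `share_within_one_sd` and made-up values, computes the bounds and the share of observations inside them, and compares that share with the theoretical value for a normal distribution.

```python
import statistics


def share_within_one_sd(values):
    """Return (lower, upper, share) for the interval mean +/- 1 standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - sd, mean + sd
    inside = sum(1 for v in values if lower <= v <= upper)
    return lower, upper, inside / len(values)


# Theoretical share under a normal distribution (roughly 68.3 percent).
standard_normal = statistics.NormalDist()
normal_share = standard_normal.cdf(1) - standard_normal.cdf(-1)

# Hypothetical values (illustration only, not the Ibell data).
lower, upper, share = share_within_one_sd([2, 4, 4, 5, 7, 8])
print(f"range: {lower:.2f} to {upper:.2f}")
print(f"observed share: {share:.3f}, normal share: {normal_share:.3f}")
```

An observed share far from the theoretical one would be a hint that the normality assumption deserves a closer look, for example with a histogram.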
Boxplot
A boxplot, invented in 1977 by the American statistician John Tukey, is a convenient way of visualizing certain statistical data. It is built from four statistics. The lower, or first, quartile cuts off the lowest 25 percent of the data. As explained in the previous chapter, the median is the middle value of the observations. The upper, or third, quartile cuts off the highest 25 percent of the data. Finally, the interquartile range (IQR) is the difference between the third and first quartiles and is a measure of statistical dispersion. It is a more stable statistic than the (standard) range presented in the previous chapter and hence is often preferred. Analogous to the range defined by the mean +/- 1 standard deviation in a normal distribution (representing about 68 percent of the data), the interquartile range defines an area covering a fixed share (50 percent) of the data. This interquartile range is represented by the “box” in the graphic in Table 4. The box is bounded at the bottom by the first quartile (vertical line below the box) and at the top by the third quartile (vertical line above the box). The horizontal line dividing the box indicates the median value.
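The quartiles and the interquartile range behind the box can be computed as in the following sketch. It is an illustration with hypothetical values; note that several quartile conventions exist, so the results of `statistics.quantiles` (default “exclusive” method) may differ slightly from those of Excel or KADD.

```python
import statistics


def box_summary(values):
    """Return the quartiles and the interquartile range that define the box."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical values 1..11 (illustration only, not the Ibell data).
box = box_summary(list(range(1, 12)))
print(box)
```

Because the quartiles depend only on the ordering of the middle of the data, a single extreme observation changes the interquartile range far less than it changes the range.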
Quote paper:
Markus Schief, 2008, Multiple Non-Linear Regression Analysis, Munich, GRIN Publishing GmbH