The purpose of this guide is to show how to conduct data analysis using the R tool. It does not aim to teach statistics or related fields; rather, it shows practically when and how inferential statistics are conducted, for readers with little knowledge of the R programming environment. It gathers the packages needed to conduct data analysis and indicates, step by step, how to choose a statistical test based on the research question. It also presents the assumptions that must be met to validate a statistical test. This guide covers normality tests; correlation analysis (numerical, ordinal, binary, and categorical); multiple regression analysis; robust regression; nonparametric regression; comparing a one-sample mean to a known standard mean; comparing the means of two independent groups; comparing the means of paired samples; comparing the means of more than two groups; independence tests; comparing proportions; goodness-of-fit tests; testing for stationarity in time series; exploratory factor analysis; confirmatory factor analysis; and structural equation modeling. Scripts and code are provided for each test, along with guidance on how to report the results of each analysis. This guide will help researchers and data analysts and will contribute to increasing the quality of their publications.
Table of Contents
0. Introduction
0.1. Introduction to data manipulation
0.1.1. Analyzing the data structure
0.1.2. Analyzing first observation
0.1.3. Counting the number of observations and variables
0.1.4. Selecting sample
0.1.5. Selecting variables
0.1.6. Dropping Variables
0.1.7. Renaming variable
0.1.8. Filtering subset data
0.1.9. Summarizing numerical variables
0.1.10. Group data by categorical variable
0.1.11. Selecting rows by position
0.1.12. Use of IF ELSE Statement
0.1.13. Selecting with condition
0.1.14. Summarizing the number of levels in factor variables
0.1.15. Identifying levels of factor variable
0.2. Data summarizing
0.2.1. Central tendency and dispersion measures for quantitative variable
0.2.2. Calculating frequency for qualitative variable
0.2.3. Analyzing quantitative and qualitative data
0.2.4. Analyzing two qualitative data
0.2.5. Calculating percentage for two qualitative data
1. Normality analysis
1.1. Analyzing normality visually
1.2. Testing normality numerically
1.3. Testing normality using skewness and kurtosis
2. Correlation Analysis
2.1. Pearson and Spearman Correlation
2.2. Partial correlation
2.3. Polyserial correlation
2.4. Point-biserial correlation
3. Multiple Regression Analysis
3.1. Assumptions of Multiple regression
3.2. Testing Multiple Regression Assumption in R
3.2.1. Check the linearity of the model
3.2.2. Analyzing the mean of residuals.
3.2.3. Testing homoscedasticity
3.2.4. Testing normality of residuals
3.2.5. Testing for Independence of residuals
3.2.6. Checking for collinearity
3.2.7. Checking for Model Outliers
3.2.8. Checking for Other Assumptions
3.3. Variable Importance
4. Mean Comparison
4.1. One-sample t-test
4.2. Comparing the means of two independent groups
4.2.1. Unpaired Two Samples T-test (parametric)
4.2.2. Unpaired Two-Samples Wilcoxon Test (non-parametric)
4.2.3. Comparing means for paired samples
4.2.3.1. Preliminary test to check paired t-test assumptions
5. Comparing the means of more than two groups
6. Test of Independence
7. Testing Association between two nominal variables
8. Comparing proportion and independence test
8.1. One-proportion Z-test
8.2. Two-proportions z-test
9. Testing for stationarity for time series
10. Factor Analysis
10.1. Exploratory Factor Analysis
10.1.1. Descriptive statistics
10.1.2. Testing correlation and sample size for factor analysis
10.1.3. Reliability
10.1.4. Identifying Optimal Number of Factors
10.1.4.1. Using Eigenvalues
10.1.4.2. Parallel analysis
10.1.5. Run the EFA with seven factors
10.2. Confirmatory Factor Analysis
10.2.1. Indices of Goodness of Fit
10.2.2. Fitting the model with CFA
10.2.2.1. Specify the model
10.2.2.2. Fitting the model
10.2.2.3. Getting the model summary, confidence intervals and goodness of fit indicators
10.2.2.4. Obtain confidence intervals for the estimated coefficients
10.2.2.5. Obtain goodness of fit indicators of the model
10.2.2.6. Reliability Analysis
10.2.2.7. Getting standardized estimates
10.2.2.8. Plotting the estimates
10.2.3. Fitting the model with CFA using SEM
10.2.3.1. Specify the model
10.2.3.2. Fitting the model
10.2.3.3. Summarizing the model
10.2.3.4. Getting goodness of fit indicators of the model
10.3. Analyzing data that are not normally distributed
11. Structural Equation Modeling
11.1. Model one
11.1.1. Model Specification
11.1.2. Model fitting
11.1.3. Summarizing the model
11.1.4. Getting goodness of fit indicators of the model
11.1.5. Reporting the goodness of fit indicators of the model
11.2. Model two
11.2.1. Model specification with residual covariance
11.2.2. Model fit
11.2.3. Model summary
11.2.4. Reporting the goodness of fit indicators of the model
12. Goodness-of-Fit Measures
13. Mediation and Moderation
13.1. Baron and Kenny procedures
13.1.1. Analyzing the Relationship between Independent variable (FDI) and mediator variable (EXP).
13.1.2. Variables EXP and GDP must be related once the effect of FDI is controlled
13.1.3. Analyzing the relationship between Independent and Dependent variables
13.1.4. Analyzing the decrease of the Relationship between FDI and GDP
13.2. Mediation analysis using Nonparametric bootstrap
13.3. Robust mediation analysis
14. Robust Regression
14.1. Investigating Data Normality
14.2. Identifying Outliers
14.3. Linear Regression
14.4. Robust regression
14.5. Comparing Robust and Linear regression
15. Nonparametric regression
15.1. Kendall–Theil Sen Siegel nonparametric linear regression
15.2. Generalized additive models
Objectives and Research Scope
This guide provides a practical, step-by-step introduction to conducting data analysis using the R environment, specifically designed for users with limited prior programming knowledge. The primary objective is to equip researchers and data analysts with the necessary R packages, code scripts, and statistical procedures to ensure valid and high-quality research outcomes.
- Practical application of statistical tests, including normality, correlation, and regression.
- Step-by-step guidance on data manipulation and preparation using R packages like dplyr.
- Methods for validating statistical assumptions to prevent errors in research findings.
- Advanced modeling techniques such as Exploratory Factor Analysis (EFA), Confirmatory Factor Analysis (CFA), and Structural Equation Modeling (SEM).
- Procedures for mean comparisons, independence tests, and robust regression analysis.
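As a minimal illustration of the dplyr verbs the guide relies on for data manipulation, the sketch below uses the built-in mtcars data set (assuming the dplyr package is installed) to filter rows, select variables, and summarize by group:

```r
# Minimal dplyr sketch using the built-in mtcars data set:
# filter rows, select variables, then summarize by group.
library(dplyr)

res <- mtcars %>%
  filter(mpg > 20) %>%                      # keep fuel-efficient cars only
  select(mpg, cyl, hp) %>%                  # keep three variables
  group_by(cyl) %>%                         # group by number of cylinders
  summarise(mean_mpg = mean(mpg),           # per-group mean mpg
            n = n())                        # per-group count
res
```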
Extract from the Book
1. Normality analysis
Several statistical procedures, such as correlation, regression, t-tests, and analysis of variance (the so-called parametric tests), are based on the assumption that the data follow a normal, or Gaussian, distribution (Ghasemi & Zahediasl, 2012). These authors add, however, that when working with large enough sample sizes (> 30 or 40), violation of the normality assumption should not cause major problems.
Normality can be checked visually or numerically. Ghasemi and Zahediasl propose the histogram, stem-and-leaf plot, boxplot, P-P plot (probability-probability plot), and Q-Q plot (quantile-quantile plot). Before conducting any statistical test, you must know the null hypothesis being tested. You must also choose a significance level, alpha (α), which constitutes the cut-off for accepting or rejecting the null hypothesis. In most publications, 5% is used as the cut-off, but one may also set α at 10% or lower.
In testing the normality assumption, the null hypothesis is that the data are normally distributed. Thus, if the p-value is less than the chosen alpha level (e.g., 5% or 10%), the null hypothesis is rejected and there is evidence that the tested data are not normally distributed. To show how a normality test is conducted in R, we generate the following data; data can also be imported into R from a file.
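A minimal base-R sketch of both the visual and the numerical check, using simulated data in place of the book's own example:

```r
# Simulate a sample; in practice you would import your own data,
# e.g. with read.csv(). The seed makes the example reproducible.
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)

# Visual checks: histogram and Q-Q plot
hist(x)
qqnorm(x)
qqline(x)

# Numerical check: Shapiro-Wilk test
# H0: the data are normally distributed
res <- shapiro.test(x)
res$p.value  # compare the p-value with the chosen alpha level
```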
Summary of Chapters
0. Introduction: Outlines the necessity of applying correct statistical techniques in R to prevent common errors and invalid conclusions in research.
1. Normality analysis: Discusses the visual and numerical methods for testing the assumption of normality, which is critical for the validity of parametric tests.
2. Correlation Analysis: Details how to measure the strength and direction of relationships between variables using techniques like Pearson and Spearman correlation.
3. Multiple Regression Analysis: Explains the assumptions and implementation of linear regression models in R, focusing on diagnostic checks for model validity.
4. Mean Comparison: Covers procedures for comparing sample means, including t-tests for independent and paired samples, as well as non-parametric Wilcoxon alternatives.
5. Comparing the means of more than two groups: Introduces one-way ANOVA for comparing multiple group means and the Kruskal-Wallis test as a non-parametric alternative.
6. Test of Independence: Explains the use of the Chi-square test to analyze associations between categorical variables in contingency tables.
7. Testing Association between two nominal variables: Focuses on measures like Phi Coefficient and Cramer's V for evaluating associations between categorical variables.
8. Comparing proportion and independence test: Discusses One-proportion and Two-proportions Z-tests for comparing observed proportions.
9. Testing for stationarity for time series: Describes the importance of stationarity in time-series data and how to test for unit roots using ADF and KPSS tests.
10. Factor Analysis: Explains both Exploratory (EFA) and Confirmatory Factor Analysis (CFA) to identify and test latent variables.
11. Structural Equation Modeling: Introduces the SEM framework for testing complex theoretical models involving multiple latent constructs.
12. Goodness-of-Fit Measures: Outlines the chi-square goodness of fit test for comparing observed versus expected distributions in discrete data.
13. Mediation and Moderation: Presents approaches for mediation analysis using the Baron and Kenny procedure and non-parametric bootstrap methods.
14. Robust Regression: Explores techniques to ensure reliable regression analysis even when the data contain outliers.
15. Nonparametric regression: Introduces robust alternatives like Kendall–Theil Sen Siegel regression and Generalized Additive Models (GAMs).
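As an illustration of the independence test summarized in Chapter 6, a minimal base-R sketch on a hypothetical 2x2 contingency table:

```r
# Hypothetical 2x2 contingency table: group membership vs. outcome.
tab <- matrix(c(30, 10,
                20, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(group   = c("A", "B"),
                              outcome = c("success", "failure")))

# H0: the two variables are independent
res <- chisq.test(tab)
res$p.value  # compare with the chosen alpha to accept or reject H0
```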
Keywords
Data analysis, R tool, correlation, multiple regression, structural equation modeling, t-test, ANOVA, independence test, normality test, factor analysis, mediation, moderation, robust regression, nonparametric, stationarity.
Frequently Asked Questions
What is this guide primarily about?
This guide serves as a practical, technical manual for researchers and analysts on how to perform various statistical data analyses using the R programming environment.
What are the central thematic fields covered?
The book covers a broad spectrum of statistical analysis including descriptive statistics, correlation, regression analysis, mean comparisons, factor analysis, structural equation modeling, and mediation analysis.
What is the primary goal of the author?
The author's goal is to bridge the gap between statistical theory and practical implementation, helping users perform tests correctly to increase the quality and reliability of their published research.
Which scientific methods are primarily used?
The book employs both parametric and non-parametric statistical methods, ranging from standard tests like t-tests and ANOVA to more complex techniques like EFA, CFA, and SEM.
What is the focus of the main content?
The main content focuses on practical R code, the interpretation of statistical output, and the validation of specific assumptions required for each test, such as normality, homoscedasticity, and independence.
How would you describe the keyword profile of this work?
The work is characterized by keywords related to empirical research methods and R programming, specifically focusing on data manipulation, regression, and complex structural modeling.
Why is it necessary to perform normality tests before conducting regressions?
Parametric tests assume normally distributed data; violating this assumption can lead to biased regression coefficients and invalid standard errors, potentially resulting in false Type I or Type II errors.
What does the book suggest when data contains outliers?
The author advises using robust regression methods, such as the `lmRob` or `mblm` functions, which down-weight the influence of deviating observations to yield more reliable results compared to standard linear regression.
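The contrast can be sketched with base tooling: the guide's examples use `lmRob` and `mblm`, but `MASS::rlm` (shipped with every R installation) illustrates the same down-weighting idea on hypothetical data with one injected outlier.

```r
library(MASS)  # rlm: robust regression via M-estimation

set.seed(42)
d <- data.frame(x = 1:20)
d$y <- 2 * d$x + rnorm(20)   # true slope is 2
d$y[20] <- 200               # inject one gross outlier

fit_ols <- lm(y ~ x, data = d)   # OLS: slope pulled toward the outlier
fit_rob <- rlm(y ~ x, data = d)  # robust fit: outlier down-weighted

coef(fit_ols)["x"]
coef(fit_rob)["x"]  # closer to the true slope of 2
```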
- Text citation
- Doctor Antoine Niyungeko (Author), 2021, Practical Guide for Data Analysis Using R Tool, Munich, GRIN Verlag, https://www.grin.com/document/1010252