CRG PhD & Masters Course 2016 Introduction to Statistics in R

From Bioinformatics Core Wiki

Latest revision as of 16:53, 10 October 2016

Course Description

This introductory course on statistics and R is offered in three consecutive modules (please see Course Syllabus below), each consisting of a two-hour practicum in a computer classroom, using RStudio.

Course Objectives

To introduce the basic concepts of statistics and show how they can be applied to real-life datasets using R. Students will write their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, if the modules are not taken in sequence, familiarity with the material from the preceding modules is recommended.

Course Instructors

  • Sarah Bonnin (Module I) sarah.bonnin@crg.eu
  • German Demidov (Module II) german.demidov@crg.eu
  • Julia Ponomarenko (organizer, Module III) julia.ponomarenko@crg.eu

Time and Location

  • Oct 5, 10, and 11, 2016. 11:00 - 13:00. PRBB Building. Bioinformatics classroom. 468. 4th floor. The hotel wing.


Course Syllabus, Schedule, and Materials

MODULE I. Introduction to R. Basic functions and plotting. Oct 5, 2016.

  • Introduction to the R programming language:
    • Introduction to RStudio.
    • Data types, variables, packages, functions, handling files/scripts.
  • Basic plots in R. The ggplot2 package.
    • Exploratory data analysis: bar-plot, histogram, box-plot, scatter-plot.
    • RStudio ggplot2 cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
  • R Markdown. How to produce html / pdf / Word reports with R.
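The Module I topics above can be illustrated with a minimal sketch in R. It is not taken from the course slides: the built-in iris data set stands in for course data, the helper range_width is a hypothetical name, and the plot is drawn only if the ggplot2 package happens to be installed.

```r
# Illustrative only: values and names below are made up for this sketch.
x <- c(4.2, 5.1, 3.8)     # numeric vector
label <- "sample"         # character string
is_big <- x > 4           # logical vector from a comparison

# A small user-defined function (hypothetical helper).
range_width <- function(v) max(v) - min(v)
width <- range_width(x)

# Inspect the column types of a data frame (built-in iris data set).
str(iris)

# Draw a box-plot with ggplot2, but only if the package is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
    geom_boxplot() +
    labs(x = "Species", y = "Sepal length (cm)")
  print(p)
}
```

Swapping geom_boxplot() for geom_histogram(), geom_bar(), or geom_point() (with suitable aesthetics) gives the other plot types listed above.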


  • Slides for Module I: 161005_Module1_Introduction_to_R.pdf
  • Template for Exercise 4: Exercise4.Rmd


MODULE II. Introduction to Probability & Hypothesis testing. Oct 10, 2016.

  • Independence, conditional probability, Bayes' formula.
  • Distributions, population mean and population variance.
  • Central limit theorem and the law of large numbers.
  • The concept of hypothesis testing, type I and type II error, false discovery rate.
  • Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
  • One-sample and two-sample tests for independent and matched samples with known and unknown variance.
  • Student's t-distribution, assumption of normality.
  • Test for proportions.
  • Download the zip-file of the module's materials (BIST_Module3_practicum.zip).
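As a rough illustration of the Module II topics, the sketch below uses only base R and simulated data. All numbers (the prior, sensitivity, specificity, sample sizes, and seed) are invented for the example and do not come from the practicum files.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Bayes' formula with made-up numbers: P(disease | positive test).
prior <- 0.01; sens <- 0.95; spec <- 0.90
posterior <- sens * prior / (sens * prior + (1 - spec) * (1 - prior))
posterior  # about 0.088

# Law of large numbers: the running mean of fair coin flips approaches 0.5.
flips <- rbinom(10000, size = 1, prob = 0.5)
running_mean <- cumsum(flips) / seq_along(flips)

# Central limit theorem: means of skewed (exponential) samples
# are approximately normal around the population mean of 1.
sample_means <- replicate(2000, mean(rexp(30, rate = 1)))

# t-tests on simulated data (variance unknown).
a <- rnorm(20, mean = 10, sd = 2)
b <- rnorm(20, mean = 11, sd = 2)
welch     <- t.test(a, b)                 # independent samples (Welch by default)
pooled    <- t.test(a, b, var.equal = TRUE)  # pooled variance
paired    <- t.test(a, b, paired = TRUE)  # matched samples
one_sided <- t.test(a, mu = 10, alternative = "greater")

# Test for a proportion: 18 successes out of 50 against p = 0.5.
prop <- prop.test(x = 18, n = 50, p = 0.5)

welch$p.value
welch$conf.int
```

Each test returns an object of class "htest", so the p-value and confidence interval can be read from the $p.value and $conf.int components as in the last two lines.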


MODULE III. Non-parametric tests & Linear regression. Oct 11, 2016.

  • Non-parametric tests: Sign test, Wilcoxon rank-sum test (Mann-Whitney U-test), Wilcoxon signed-rank test, Kruskal-Wallis test.
  • Kolmogorov-Smirnov (KS) test. Shapiro-Wilk test for normality. QQ-plot.
  • Data transformation.
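  • Download the zip-file for this part of the practicum (Module_3_Oct_11_2016.html.zip).
  • Simple linear regression model, residuals, degrees of freedom.
  • Interpretation of the slope, correlation, and determination coefficients.
  • Standard error and statistical inference in simple linear regression model.
  • Analysis of variance (ANOVA). One-way and two-way ANOVA.
  • Download the zip-file for the practicum on linear regression and ANOVA (BIST_Module5_hands_on.zip).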
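The non-parametric tests named in Module III all have counterparts in base R. The following is a minimal sketch on simulated skewed data; the groups g1, g2, g3 and the seed are assumptions made for illustration, and the sign test is expressed through binom.test because base R has no dedicated sign-test function.

```r
set.seed(2)  # arbitrary seed, for reproducibility only

# Three simulated, right-skewed groups (exponential data).
g1 <- rexp(25, rate = 1)
g2 <- rexp(25, rate = 0.5)
g3 <- rexp(25, rate = 0.25)

# Wilcoxon rank-sum (Mann-Whitney U) test for two independent samples.
rank_sum <- wilcox.test(g1, g2)

# Wilcoxon signed-rank test, treating g1 and g2 as matched pairs.
signed_rank <- wilcox.test(g1, g2, paired = TRUE)

# Sign test: a binomial test on the signs of the paired differences.
d <- g1 - g2
sign_test <- binom.test(sum(d > 0), n = sum(d != 0), p = 0.5)

# Kruskal-Wallis test for three or more independent groups.
kw <- kruskal.test(list(g1, g2, g3))

# Two-sample Kolmogorov-Smirnov test comparing the two distributions.
ks <- ks.test(g1, g2)

# Shapiro-Wilk normality test before and after a log transformation.
sw_raw <- shapiro.test(g1)
sw_log <- shapiro.test(log(g1))

# QQ-plot of the raw data against the normal distribution.
qqnorm(g1)
qqline(g1)
```

Comparing sw_raw and sw_log shows how a data transformation can change the outcome of a normality check, which in turn guides the choice between parametric and non-parametric tests.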



External Resources
