Difference between revisions of "CRG PhD Course 2017 Introduction to Statistics in R"

From Bioinformatics Core Wiki
(Created page with "__TOC__ === Course Description === This introductory course to statistics and R is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of A t...")
 
(MODULE III. Introduction to Statistical Inference. Oct 5, 2017.)
 
(20 intermediate revisions by 2 users not shown)
Line 2: Line 2:
  
 
=== Course Description ===
This introductory course to statistics and R is offered in 3 consecutive modules (please see Course Syllabus below), each consisting of a two-hour practicum in a computer class, using [https://www.rstudio.com R Studio].
+
This introductory course on exploratory data analysis and R is offered in three consecutive two-hour modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using [https://www.rstudio.com R Studio].
  
 
=== Course Objectives ===
To introduce the basic concepts of statistics and how they can be applied to real-life datasets using R. The students will produce their first scripts that can be re-used when they start analyzing their own data. Knowledge of statistics or R is not required for taking this course. However, familiarity with the material in the previous modules is recommended if the modules are not taken in a sequence.
+
The course introduces or refreshes the basic concepts of descriptive statistics and shows how to apply them to real-life datasets using R. Students will write their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, familiarity with the material in the previous modules is recommended if the modules are not taken in sequence.
  
 
=== Course Instructors ===
* Sarah Bonnin (Module I) sarah.bonnin@crg.eu
+
* Sarah Bonnin (Module I, II) sarah.bonnin@crg.eu  
* German Demidov (Module II) german.demidov@crg.eu
+
* Julia Ponomarenko (organizer, Module II, III) julia.ponomarenko@crg.eu
* Julia Ponomarenko (organizer, Module III) julia.ponomarenko@crg.eu
+
  
 
=== Time and Location ===
* Oct 5, 10, 11, 2016. 11:00 - 13:00. PRBB Building. Bioinformatics classroom. 468. 4th floor. The hotel wing.
+
* Oct 3, 4, 5, 2017. 11:00 - 13:00. PRBB Building. Bioinformatics classroom. 468. 4th floor. The hotel wing.
  
  
 
=== Course Syllabus, Schedule, and Materials ===
  
==== MODULE I. Introduction to R. Basic functions and plotting. Oct 5, 2016. ====
+
==== MODULE I. Introduction to R. Oct 3, 2017. ====
* Introduction to R programming language:  
+
* Introduction to R:  
** Introduction to R Studio.
+
** What is R?
** Data types, variables, packages, handling files/scripts, functions.
+
** Why use R?
* Basic plots in R. The ggplot2 package.
+
* R Studio:
** Exploratory data analysis: bar-plot, histogram, box-plot, scatter-plot.
+
** Local installation
** R Studio ggplot2 [https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf cheatsheet]
+
** Understand and explore panels
* R Markdown. How to produce html / pdf / Word reports with R.
+
* Basics of R language:
** R Studio RMD [https://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf cheatsheet]
+
** Simple arithmetic in R console
<br>
+
** Syntax
* Slides for Module 1: [[Media:161005_Module1_Introduction_to_R.pdf|here]]
+
** Objects
* Template for Exercise 4: [[Media:Exercise4.Rmd|here]]
+
* Functions in R:  
 +
** Use functions
 +
** Get help
 +
** Arguments in functions
 +
* Data types and data structures in R
 +
** Types: numeric, character, logical.
 +
** Structures: vectors, data frames, matrices.
 +
* Slides for Module I: [[Media:171003_Introduction_to_R_day1.pdf|Open the pdf file.]]
 
<br>
  
==== MODULE II. Introduction to Probability & Hypothesis testing. Oct 10, 2016. ====
+
==== MODULE II. Descriptive Statistics & Plots in R. Oct 4, 2017. ====
* Independence, conditional probability, Bayes formula.
+
* Part 1:
* Distributions, population mean and population variance.
+
** Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
* Central Limit theorem and the Law of large numbers.
+
** Basic plots in R: bar-plots, histograms, box-plots, scatter-plots.
* The concept of hypothesis testing, type I and type II error, false discovery rate.  
+
* Part 2:
* Significance and confidence level, p-value. Confidence intervals. One-sided and two-sided tests.
+
** Exploratory data analysis and descriptive statistical functions: summary, mean, sd, min, max, quantile.
* One-sample and two-sample tests for independent and matched samples with known and unknown variance. 
+
* Materials:
* Student t-distribution, assumption of normality.  
+
** Module II, Part 1: [[Media:171004_Introduction_to_R_day2.pdf|Open the pdf file]]
* Test for proportions.
+
** Module II, Part 2: [[Media:Module2_Descriptive_Stat_Oct_2017.html.zip|Download the zipped html-file for the practicum.]]
* [[Media:BIST_Module3_practicum.zip|Download the zip-file of the module's materials.]]  
+
** [[Media:Module_1_Lectures_June_2017.pdf|Slides: Lecture on the topic given at CRG in June 2017.]]
 
<br>
  
==== MODULE III. Non-parametric tests & Linear regression. Oct 11, 2016. ====
+
==== MODULE III. Introduction to Statistical Inference. Oct 5, 2017. ====
* Non-parametric tests: Sign test, Wilcoxon rank-sum test (Mann-Whitney U-test), Wilcoxon signed-rank test, Kruskal-Wallis test.
+
* Part 1:
* Kolmogorov-Smirnov (KS) test. Shapiro-Wilk test for normality. QQ-plot.
+
** Input / Output: reading data from a file and writing data to a file.
* Data transformation.
+
* Part 2:
*[[Media:Module_3_Oct_11_2016.html.zip|Download the zip-file for this part of the practicum.]]
+
** Continuation of exploratory data analysis: plots.
 +
* Materials:
 +
** Part 1: [[Media:171005_Introduction_to_R_day3.pdf|Download the slides for Input/Output]]
 +
** Part 1: [[Media:Davis_car.txt|Download the Davis_car.txt file]]
 +
** Part 2: Use the zipped file from Module II.
 
<br>
* Simple linear regression model, residuals, degrees of freedom.
 
* Interpretation of the slope, correlation, and determination coefficients.
 
* Standard error and statistical inference in simple linear regression model.
 
* Analysis of variance (ANOVA). One-way and two-way ANOVA.
 
* [[Media:BIST_Module5_hands_on.zip|Download the zip-file for the practicum on linear regression and ANOVA.]]
 
 
  
 
=== External Resources ===
Line 69: Line 73:
 
* [https://ryouready.wordpress.com Blog "R you ready?"]
 
* [http://www.r-statistics.com "R-statistics blog"]
 +
* [https://eda.nc3rs.org.uk/experimental-design Guide and tool for the design and analysis of biological experiments from the UK's National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs)], covering control for confounding variables, sample size, effect size, standardised effect size, power of statistical tests, and multiple testing.
 +
* [http://www.sample-size.net/ Sample/effect size online calculators for designing biomedical experiments from UC San Francisco]
 
* Self-paced online courses from UC Berkeley: [https://www.edx.org/course/introduction-statistics-descriptive-uc-berkeleyx-stat2-1x Descriptive Statistics.] [https://www.edx.org/course/introduction-statistics-probability-uc-berkeleyx-stat2-2x Probability.] [https://www.edx.org/course/introduction-statistics-inference-uc-berkeleyx-stat2-3x Inference.]
 
* [http://www.stat.berkeley.edu/~stark/SticiGui/ Online book recommended for the UC Berkeley courses]
Line 76: Line 82:
 
* [https://www.edx.org/course/data-analysis-life-sciences-1-statistics-harvardx-ph525-1x Self-paced online course from Harvard "Statistics and R"]
 
* [https://www.edx.org/course/data-analysis-life-sciences-3-harvardx-ph525-3x Self-paced online course from Harvard "Statistical Inference and Modeling for High-throughput Experiments" ]
* [http://data.bits.vib.be/pub/trainingen/StatTheory/SlidesFullDay.pdf VIB "Basic statistics theory" course slides.]
+
* [http://students.brown.edu/seeing-theory/ The Seeing Theory website visualizes the fundamental concepts covered in an introductory college statistics course, using D3.js]
* [https://www.bits.vib.be/index.php/training/180#download VIB "Basic statistics in R" course. Tutorial, exercises, cheat sheets.]
+

Latest revision as of 12:18, 5 October 2017

Course Description

This introductory course on exploratory data analysis and R is offered in three consecutive two-hour modules (please see Course Syllabus below), each consisting of a hands-on practicum in a computer class, using R Studio.

Course Objectives

The course introduces or refreshes the basic concepts of descriptive statistics and shows how to apply them to real-life datasets using R. Students will write their first scripts, which they can reuse when they start analyzing their own data. No prior knowledge of statistics or R is required. However, familiarity with the material in the previous modules is recommended if the modules are not taken in sequence.

Course Instructors

  • Sarah Bonnin (Module I, II) sarah.bonnin@crg.eu
  • Julia Ponomarenko (organizer, Module II, III) julia.ponomarenko@crg.eu

Time and Location

  • Oct 3, 4, 5, 2017. 11:00 - 13:00. PRBB Building. Bioinformatics classroom. 468. 4th floor. The hotel wing.


Course Syllabus, Schedule, and Materials

MODULE I. Introduction to R. Oct 3, 2017.

  • Introduction to R:
    • What is R?
    • Why use R?
  • R Studio:
    • Local installation
    • Understand and explore panels
  • Basics of R language:
    • Simple arithmetic in R console
    • Syntax
    • Objects
  • Functions in R:
    • Use functions
    • Get help
    • Arguments in functions
  • Data types and data structures in R
    • Types: numeric, character, logical.
    • Structures: vectors, data frames, matrices.
  • Slides for Module I: Open the pdf file.


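MODULE II. Descriptive Statistics & Plots in R. Oct 4, 2017.

  • Part 1:
    • Packages in R: find, install, load, explore/find functions and documentation, get help on functions.
    • Basic plots in R: bar-plots, histograms, box-plots, scatter-plots.
  • Part 2:
    • Exploratory data analysis and descriptive statistical functions: summary, mean, sd, min, max, quantile.
  • Materials:
    • Module II, Part 1: Open the pdf file
    • Module II, Part 2: Download the zipped html-file for the practicum.
    • Slides: Lecture on the topic given at CRG in June 2017.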


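MODULE III. Introduction to Statistical Inference. Oct 5, 2017.

  • Part 1:
    • Input / Output: reading data from a file and writing data to a file.
  • Part 2:
    • Continuation of exploratory data analysis: plots.
  • Materials:
    • Part 1: Download the slides for Input/Output
    • Part 1: Download the Davis_car.txt file
    • Part 2: Use the zipped file from Module II.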


External Resources
